OpenEvidence criticizes a recent LLM benchmarking study and calls for better benchmarks; Medmarks suggested as an alternative evaluation
Summary
OpenEvidence expressed dissatisfaction with a recent LLM benchmarking study, echoing a broader call for improved benchmarks. The author supports this view and suggests evaluating OpenEvidence on the open and transparent Medmarks benchmark suite.
Key points
OpenEvidence expressed displeasure with a recent LLM benchmarking study.
The post argues that better benchmarks are needed for evaluating medical LLMs.
The author suggests using the Medmarks benchmark suite, described as open and completely transparent.