Are Large Language Models True Healthcare Jacks
Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated their potential in delivering accurate answers to questions about world knowledge. Despite this, existing benchmarks for evaluating LLMs in healthcare predominantly focus on medical doctors, leaving other critical healthcare professions underrepresented. To fill this research gap, we introduce the Examinations for Medical Personnel in Chinese (EMPEC), a pioneering large-scale healthcare knowledge benchmark in traditional Chinese. EMPEC consists of 157,803 exam questions across 124 subjects and 20 healthcare professions, including underrepresented occupations such as Optometrists and Audiologists. Each question is tagged with its release time and source, ensuring relevance and authenticity. We conducted extensive experiments on 17 LLMs, covering proprietary and open-source models as well as general-domain and medical-specific models, and evaluated their performance under various settings. Our findings reveal that while leading models like GPT-4 achieve over 75% accuracy, they still struggle with specialized fields and alternative medicine. Surprisingly, general-purpose LLMs outperformed medical-specific models, and incorporating EMPEC's training data significantly enhanced performance. Additionally, the results on questions released after the models' training cutoff date were consistent with overall performance trends, suggesting that the models' performance on the test set can predict their effectiveness in addressing unseen healthcare-related queries. Converting the questions from traditional to simplified Chinese characters had a negligible impact on model performance, indicating robust linguistic versatility. Our study underscores the importance of expanding benchmarks to cover a broader range of healthcare professions to better assess the applicability of LLMs in real-world healthcare scenarios.
Comments: 15 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2406.11328 [cs.CL] (or arXiv:2406.11328v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2406.11328 (arXiv-issued DOI via DataCite)
Submission history
From: Zheheng Luo
[v1] Mon, 17 Jun 2024 08:40:36 UTC (2,313 KB)
