Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists
© 2024 The Authors. Psychiatry and Clinical Neurosciences © 2024 Japanese Society of Psychiatry and Neurology.
AIM: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, the potential of their application in the psychiatric domain has not been well-studied.
METHOD: In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 in the 2022 Taiwan Psychiatric Licensing Examination conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists in 10 advanced clinical scenario questions designed for psychiatric differential diagnosis.
RESULT: Only GPT-4 passed the 2022 Taiwan Psychiatric Licensing Examination (scoring 69, with ≥60 considered a passing grade), while Bard scored 36 and Llama-2 scored 25. GPT-4 outperformed Bard and Llama-2, especially in the areas of 'Pathophysiology & Epidemiology' (χ² = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ² = 15.8, P < 0.001). In the differential diagnosis task, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than those of GPT-4 (5), Bard (3), and Llama-2 (1).
CONCLUSION: Compared with Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. Moreover, GPT-4's performance in differential diagnosis closely approached that of the experienced psychiatrists. Among the three LLMs, GPT-4 showed promising potential as a valuable tool in psychiatric practice.
Media type: E-article
Year of publication: 2024
Published: 2024
Contained in: See complete record - year:2024
Contained in: Psychiatry and clinical neurosciences - (2024), 26 Feb.
Language: English
Contributors: Li, Dian-Jeng [Author]
Subjects: ChatGPT
Notes: Date Revised 26.02.2024; published: Print-Electronic; Citation Status: Publisher
DOI: 10.1111/pcn.13656
PPN (catalog ID): NLM368942198
LEADER 01000naa a22002652 4500
001    NLM368942198
003    DE-627
005    20240229160134.0
007    cr uuu---uuuuu
008    240229s2024 xx |||||o 00| ||eng c
024 7  |a 10.1111/pcn.13656 |2 doi
028 52 |a pubmed24n1306.xml
035    |a (DE-627)NLM368942198
035    |a (NLM)38404249
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Li, Dian-Jeng |e verfasserin |4 aut
245 10 |a Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists
264  1 |c 2024
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Revised 26.02.2024
500    |a published: Print-Electronic
500    |a Citation Status Publisher
520    |a © 2024 The Authors. Psychiatry and Clinical Neurosciences © 2024 Japanese Society of Psychiatry and Neurology.
520    |a AIM: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, the potential of their application in the psychiatric domain has not been well-studied
520    |a METHOD: In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 in the 2022 Taiwan Psychiatric Licensing Examination conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists in 10 advanced clinical scenario questions designed for psychiatric differential diagnosis
520    |a RESULT: Only GPT-4 passed the 2022 Taiwan Psychiatric Licensing Examination (scoring 69 and ≥ 60 being considered a passing grade), while Bard scored 36 and Llama-2 scored 25. GPT-4 outperformed Bard and Llama-2, especially in the areas of 'Pathophysiology & Epidemiology' (χ2 = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ2 = 15.8, P < 0.001). In the differential diagnosis, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than that of GPT-4 (5), Bard (3), and Llama-2 (1)
520    |a CONCLUSION: Compared to Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. Besides, GPT-4's ability for differential diagnosis closely approached that of the experienced psychiatrists. GPT-4 revealed a promising potential as a valuable tool in psychiatric practice among the three LLMs
650  4 |a Journal Article
650  4 |a ChatGPT
650  4 |a Taiwanese psychiatric licensing examination
650  4 |a chatbot
650  4 |a differential diagnosis in psychiatry
650  4 |a psychiatric application
700 1  |a Kao, Yu-Chen |e verfasserin |4 aut
700 1  |a Tsai, Shih-Jen |e verfasserin |4 aut
700 1  |a Bai, Ya-Mei |e verfasserin |4 aut
700 1  |a Yeh, Ta-Chuan |e verfasserin |4 aut
700 1  |a Chu, Che-Sheng |e verfasserin |4 aut
700 1  |a Hsu, Chih-Wei |e verfasserin |4 aut
700 1  |a Cheng, Szu-Wei |e verfasserin |4 aut
700 1  |a Hsu, Tien-Wei |e verfasserin |4 aut
700 1  |a Liang, Chih-Sung |e verfasserin |4 aut
700 1  |a Su, Kuan-Pin |e verfasserin |4 aut
773 08 |i Enthalten in |t Psychiatry and clinical neurosciences |d 1998 |g (2024) vom: 26. Feb. |w (DE-627)NLM085825468 |x 1440-1819 |7 nnns
773 18 |g year:2024 |g day:26 |g month:02
856 40 |u http://dx.doi.org/10.1111/pcn.13656 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a GBV_NLM
951    |a AR
952    |j 2024 |b 26 |c 02