Comparison of the problem-solving performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Korean emergency medicine board examination question bank
Copyright © 2024 the Author(s). Published by Wolters Kluwer Health, Inc.
Large language models (LLMs) have been deployed in diverse fields, and the potential for their application in medicine has been explored through numerous studies. This study aimed to evaluate and compare the performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Emergency Medicine Board Examination question bank in the Korean language. Of the 2353 questions in the question bank, 150 questions were randomly selected, and 27 containing figures were excluded. Questions that required abilities such as analysis, creative thinking, evaluation, and synthesis were classified as higher-order questions, and those that required only recall, memory, and factual information in response were classified as lower-order questions. The answers and explanations obtained by inputting the 123 questions into the LLMs were analyzed and compared. ChatGPT-4 (75.6%) and Bing Chat (70.7%) showed higher correct response rates than ChatGPT-3.5 (56.9%) and Bard (51.2%). ChatGPT-4 showed the highest correct response rate for the higher-order questions at 76.5%, and Bard and Bing Chat showed the highest rate for the lower-order questions at 71.4%. The appropriateness of the explanation for the answer was significantly higher for ChatGPT-4 and Bing Chat than for ChatGPT-3.5 and Bard (75.6%, 68.3%, 52.8%, and 50.4%, respectively). ChatGPT-4 and Bing Chat outperformed ChatGPT-3.5 and Bard in answering a random selection of Emergency Medicine Board Examination questions in the Korean language.
Media type: E-Article
Year of publication: 2024
Published: 2024
Contained in: Complete record - volume:103
Contained in: Medicine - 103(2024), 9, 01 March, page e37325
Language: English
Contributors: Lee, Go Un [author]
Notes: Date Completed 04.03.2024; Date Revised 04.03.2024; published: Print; Citation Status MEDLINE
DOI: 10.1097/MD.0000000000037325
PPN (catalog ID): NLM369187709
LEADER 01000caa a22002652 4500
001 NLM369187709
003 DE-627
005 20240304232843.0
007 cr uuu---uuuuu
008 240302s2024 xx |||||o 00| ||eng c
024 7 |a 10.1097/MD.0000000000037325 |2 doi
028 5 2 |a pubmed24n1316.xml
035 |a (DE-627)NLM369187709
035 |a (NLM)38428889
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
100 1 |a Lee, Go Un |e verfasserin |4 aut
245 1 0 |a Comparison of the problem-solving performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Korean emergency medicine board examination question bank
264 1 |c 2024
336 |a Text |b txt |2 rdacontent
337 |a Computermedien |b c |2 rdamedia
338 |a Online-Ressource |b cr |2 rdacarrier
500 |a Date Completed 04.03.2024
500 |a Date Revised 04.03.2024
500 |a published: Print
500 |a Citation Status MEDLINE
520 |a Copyright © 2024 the Author(s). Published by Wolters Kluwer Health, Inc.
520 |a Large language models (LLMs) have been deployed in diverse fields, and the potential for their application in medicine has been explored through numerous studies. This study aimed to evaluate and compare the performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Emergency Medicine Board Examination question bank in the Korean language. Of the 2353 questions in the question bank, 150 questions were randomly selected, and 27 containing figures were excluded. Questions that required abilities such as analysis, creative thinking, evaluation, and synthesis were classified as higher-order questions, and those that required only recall, memory, and factual information in response were classified as lower-order questions. The answers and explanations obtained by inputting the 123 questions into the LLMs were analyzed and compared. ChatGPT-4 (75.6%) and Bing Chat (70.7%) showed higher correct response rates than ChatGPT-3.5 (56.9%) and Bard (51.2%). ChatGPT-4 showed the highest correct response rate for the higher-order questions at 76.5%, and Bard and Bing Chat showed the highest rate for the lower-order questions at 71.4%. The appropriateness of the explanation for the answer was significantly higher for ChatGPT-4 and Bing Chat than for ChatGPT-3.5 and Bard (75.6%, 68.3%, 52.8%, and 50.4%, respectively). ChatGPT-4 and Bing Chat outperformed ChatGPT-3.5 and Bard in answering a random selection of Emergency Medicine Board Examination questions in the Korean language
650 4 |a Comparative Study
650 4 |a Journal Article
700 1 |a Hong, Dae Young |e verfasserin |4 aut
700 1 |a Kim, Sin Young |e verfasserin |4 aut
700 1 |a Kim, Jong Won |e verfasserin |4 aut
700 1 |a Lee, Young Hwan |e verfasserin |4 aut
700 1 |a Park, Sang O |e verfasserin |4 aut
700 1 |a Lee, Kyeong Ryong |e verfasserin |4 aut
773 0 8 |i Enthalten in |t Medicine |d 1945 |g 103(2024), 9 vom: 01. März, Seite e37325 |w (DE-627)NLM000020737 |x 1536-5964 |7 nnns
773 1 8 |g volume:103 |g year:2024 |g number:9 |g day:01 |g month:03 |g pages:e37325
856 4 0 |u http://dx.doi.org/10.1097/MD.0000000000037325 |3 Volltext
912 |a GBV_USEFLAG_A
912 |a GBV_NLM
951 |a AR
952 |d 103 |j 2024 |e 9 |b 01 |c 03 |h e37325