Publication details - Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan

Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan : Evaluation Study

©Kazuya Taira, Takahiro Itaya, Ayame Hanada. Originally published in JMIR Nursing (https://nursing.jmir.org), 27.06.2023.

BACKGROUND: ChatGPT, a large language model, has shown good performance on physician certification examinations and medical consultations. However, its performance has not been examined in languages other than English or on nursing examinations.

OBJECTIVE: We aimed to evaluate the performance of ChatGPT on the Japanese National Nurse Examinations.

METHODS: We evaluated the percentages of correct answers provided by ChatGPT (GPT-3.5) for all questions on the Japanese National Nurse Examinations from 2019 to 2023, excluding inappropriate questions and those containing images. Inappropriate questions were identified by a third-party organization and announced by the government as excluded from scoring; specifically, these include "questions with inappropriate question difficulty" and "questions with errors in the questions or choices." Each year's examination consists of 240 questions, divided into basic knowledge questions that test the basic issues of particular importance to nurses and general questions that test a wide range of specialized knowledge. Furthermore, the questions had 2 types of formats: simple-choice and situation-setup questions. Simple-choice questions are primarily knowledge-based and multiple-choice, whereas situation-setup questions require the candidate to read a description of a patient's and family's situation and select the nurse's action or the patient's response. Hence, the questions were standardized using 2 types of prompts before requesting answers from ChatGPT. Chi-square tests were conducted to compare the percentage of correct answers for each year's examination, question format, and specialty area related to the question. In addition, a Cochran-Armitage trend test was performed on the percentage of correct answers from 2019 to 2023.
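The record does not include the authors' analysis code or per-year answer counts. As a minimal sketch, assuming each group (an examination year or a question format) is summarized as counts of correct and incorrect answers, the two tests named in the Methods could be computed with the Python standard library as below. The function names `chi_square_2x2` and `cochran_armitage` and all counts are hypothetical. The chi-square part is deliberately limited to a 2x2 table (for example, simple-choice vs situation-setup) without continuity correction, whereas the study's comparisons across years and specialty areas may have used larger tables or other settings.

```python
"""Hypothetical sketch of the two tests named in the Methods.

Standard library only; function names and counts are illustrative assumptions.
"""
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table.

    Rows are two groups (e.g. simple-choice vs situation-setup questions),
    columns are (correct, incorrect) counts. No continuity correction.
    Returns (statistic, p-value).
    """
    (a, b), (c, d) = table
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def cochran_armitage(
    successes: Sequence[int], totals: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float]:
    """Cochran-Armitage test for a linear trend in proportions.

    successes[i] correct answers out of totals[i] questions in group i,
    which gets the ordinal score scores[i] (e.g. 0..4 for 2019..2023).
    Returns (z statistic, two-sided p-value from the normal approximation).
    """
    if not len(successes) == len(totals) == len(scores):
        raise ValueError("successes, totals and scores must have equal length")
    n_total = sum(totals)
    p_bar = sum(successes) / n_total  # pooled proportion correct
    # T = sum_i s_i * (x_i - n_i * p_bar)
    t_stat = sum(s * (x - n * p_bar) for x, n, s in zip(successes, totals, scores))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    # Variance of T under the null hypothesis of no trend
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Made-up counts for illustration only; NOT the study's data.
    correct = [190, 170, 175, 168, 172]
    answered = [236, 235, 237, 234, 236]
    years = [0, 1, 2, 3, 4]  # 2019..2023 as ordinal scores
    print("trend z=%.3f p=%.4f" % cochran_armitage(correct, answered, years))
    print("chi2=%.3f p=%.4f" % chi_square_2x2([[150, 40], [90, 45]]))
```

With only two groups, the trend statistic satisfies z² = the 2x2 chi-square statistic, which is a quick consistency check on the sketch; it does not reproduce any figure reported in the paper.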

RESULTS: The 5-year average percentage of correct answers for ChatGPT was 75.1% (SD 3%) for basic knowledge questions and 64.5% (SD 5%) for general questions. The highest percentages of correct answers were on the 2019 examination: 80% for basic knowledge questions and 71.2% for general questions. ChatGPT met the passing criteria for the 2019 Japanese National Nurse Examination and was close to passing the 2020-2023 examinations, needing only a few more correct answers to pass. ChatGPT had a lower percentage of correct answers in some areas, such as pharmacology, social welfare, related law and regulations, endocrinology/metabolism, and dermatology, and a higher percentage of correct answers in the areas of nutrition, pathology, hematology, ophthalmology, otolaryngology, dentistry and dental surgery, and nursing integration and practice.

CONCLUSIONS: Of the 5 most recent examinations, ChatGPT passed only the 2019 Japanese National Nurse Examination. Although it did not pass the examinations from other years, it performed very close to the passing level, even on those containing questions related to psychology, communication, and nursing.

Media type:	E-article

Year of publication:	2023
Published:	2023

Contained in:	volume:6
Contained in:	JMIR nursing - 6(2023), 27 June, page e47305

Language:	English

Contributors:	Taira, Kazuya [author]; Itaya, Takahiro [author]; Hanada, Ayame [author]

Links:	Full text

Subjects:	Artificial intelligence; ChatGPT; Japan; Journal Article; National Nurse Examination; Natural language processing; Registered nurses

Notes:	Date Revised 14.07.2023; published: Electronic; Citation Status PubMed-not-MEDLINE

doi:	10.2196/47305

PPN (catalog ID):	NLM358698111

Internal format


LEADER	01000naa a22002652 4500
001	NLM358698111
003	DE-627
005	20231226075309.0
007	cr uuu---uuuuu
008	231226s2023 xx |||||o 00| ||eng c
024	7		|a 10.2196/47305 |2 doi
028	5	2	|a pubmed24n1195.xml
035			|a (DE-627)NLM358698111
035			|a (NLM)37368470
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
100	1		|a Taira, Kazuya |e verfasserin |4 aut
245	1	0	|a Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan |b Evaluation Study
264		1	|c 2023
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
500			|a Date Revised 14.07.2023
500			|a published: Electronic
500			|a Citation Status PubMed-not-MEDLINE
520			|a ©Kazuya Taira, Takahiro Itaya, Ayame Hanada. Originally published in JMIR Nursing (https://nursing.jmir.org), 27.06.2023.
520			|a BACKGROUND: ChatGPT, a large language model, has shown good performance on physician certification examinations and medical consultations. However, its performance has not been examined in languages other than English or on nursing examinations
520			|a OBJECTIVE: We aimed to evaluate the performance of ChatGPT on the Japanese National Nurse Examinations
520			|a METHODS: We evaluated the percentages of correct answers provided by ChatGPT (GPT-3.5) for all questions on the Japanese National Nurse Examinations from 2019 to 2023, excluding inappropriate questions and those containing images. Inappropriate questions were pointed out by a third-party organization and announced by the government to be excluded from scoring. Specifically, these include "questions with inappropriate question difficulty" and "questions with errors in the questions or choices." These examinations consist of 240 questions each year, divided into basic knowledge questions that test the basic issues of particular importance to nurses and general questions that test a wide range of specialized knowledge. Furthermore, the questions had 2 types of formats: simple-choice and situation-setup questions. Simple-choice questions are primarily knowledge-based and multiple-choice, whereas situation-setup questions entail the candidate reading a patient's and family situation's description, and selecting the nurse's action or patient's response. Hence, the questions were standardized using 2 types of prompts before requesting answers from ChatGPT. Chi-square tests were conducted to compare the percentage of correct answers for each year's examination format and specialty area related to the question. In addition, a Cochran-Armitage trend test was performed with the percentage of correct answers from 2019 to 2023
520			|a RESULTS: The 5-year average percentage of correct answers for ChatGPT was 75.1% (SD 3%) for basic knowledge questions and 64.5% (SD 5%) for general questions. The highest percentage of correct answers on the 2019 examination was 80% for basic knowledge questions and 71.2% for general questions. ChatGPT met the passing criteria for the 2019 Japanese National Nurse Examination and was close to passing the 2020-2023 examinations, with only a few more correct answers required to pass. ChatGPT had a lower percentage of correct answers in some areas, such as pharmacology, social welfare, related law and regulations, endocrinology/metabolism, and dermatology, and a higher percentage of correct answers in the areas of nutrition, pathology, hematology, ophthalmology, otolaryngology, dentistry and dental surgery, and nursing integration and practice
520			|a CONCLUSIONS: ChatGPT only passed the 2019 Japanese National Nursing Examination during the most recent 5 years. Although it did not pass the examinations from other years, it performed very close to the passing level, even in those containing questions related to psychology, communication, and nursing
650		4	|a Journal Article
650		4	|a ChatGPT
650		4	|a Japan
650		4	|a National Nurse Examination
650		4	|a artificial intelligence
650		4	|a natural language processing
650		4	|a registered nurses
700	1		|a Itaya, Takahiro |e verfasserin |4 aut
700	1		|a Hanada, Ayame |e verfasserin |4 aut
773	0	8	|i Enthalten in |t JMIR nursing |d 2018 |g 6(2023) vom: 27. Juni, Seite e47305 |w (DE-627)NLM314699953 |x 2562-7600 |7 nnns
773	1	8	|g volume:6 |g year:2023 |g day:27 |g month:06 |g pages:e47305
856	4	0	|u http://dx.doi.org/10.2196/47305 |3 Volltext
912			|a GBV_USEFLAG_A
912			|a GBV_NLM
951			|a AR
952			|d 6 |j 2023 |b 27 |c 06 |h e47305
