Publication details - Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge

Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge

Copyright © 2023, Jiao et al.

OBJECTIVE: We aim to compare the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT)-3.5 and ChatGPT-4.0 (OpenAI, San Francisco, CA, USA) in addressing multiple-choice ophthalmic case challenges.

METHODS AND ANALYSIS: Both models' accuracy was compared across different ophthalmology subspecialties using multiple-choice ophthalmic clinical cases provided by the American Academy of Ophthalmology (AAO) "Diagnosis This" questions. Additional analysis was based on image content, question difficulty, the character length of the models' responses, and the models' alignment with responses from human respondents. The χ² test, Fisher's exact test, Student's t-test, and one-way analysis of variance (ANOVA) were conducted where appropriate, with p<0.05 considered significant.

RESULTS: GPT-4.0 significantly outperformed GPT-3.5 (75% versus 46%, p<0.01), with the most noticeable improvement in neuro-ophthalmology (100% versus 38%, p=0.03). While both models struggled with uveitis and refractive questions, GPT-4.0 excelled in other areas, such as pediatric questions (82%). In image-related questions, GPT-4.0 also displayed superior accuracy that trended toward significance (73% versus 46%, p=0.07). GPT-4.0 performed better with easier questions (93.8% (least difficult) versus 76.2% (middle) versus 53.3% (most), p=0.03) and generated more concise answers than GPT-3.5 (651.7±342.9 versus 1,112.9±328.8 characters, p<0.01). Moreover, GPT-4.0's answers were more in line with those of AAO respondents (57.3% versus 41.4%, p<0.01), showing a strong correlation between its accuracy and the proportion of AAO respondents who selected GPT-4.0's answer (ρ=0.713, p<0.01).

CONCLUSION AND RELEVANCE: Our study demonstrated that GPT-4.0 significantly outperforms GPT-3.5 in addressing ophthalmic case challenges, especially in neuro-ophthalmology, with improved accuracy even in image-related questions. These findings underscore the potential of advancing artificial intelligence (AI) models in enhancing ophthalmic diagnostics and medical education.

Media type:	E-article

Year of publication:	2023
Published:	2023

Contained in:	Go to parent record - volume:15
Contained in:	Cureus - 15(2023), issue 9 of 30 Sept., page e45700

Language:	English

Contributors:	Jiao, Cheng [author]; Edupuganti, Neel R [author]; Patel, Parth A [author]; Bui, Tommy [author]; Sheth, Veeral [author]

Links:	Full text

Subjects:	Artificial intelligence; ChatGPT; Journal Article; Medical education; Natural language processing models; Ophthalmology

Notes:	Date Revised 31.10.2023; published: Electronic-eCollection; Citation Status PubMed-not-MEDLINE

DOI:	10.7759/cureus.45700


PPN (catalog ID):	NLM363607188

Internal format


LEADER	01000naa a22002652 4500
001	NLM363607188
003	DE-627
005	20231226093715.0
007	cr uuu---uuuuu
008	231226s2023 xx |||||o 00| ||eng c
024	7		|a 10.7759/cureus.45700 |2 doi
028	5	2	|a pubmed24n1211.xml
035			|a (DE-627)NLM363607188
035			|a (NLM)37868408
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
100	1		|a Jiao, Cheng |e verfasserin |4 aut
245	1	0	|a Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge
264		1	|c 2023
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
500			|a Date Revised 31.10.2023
500			|a published: Electronic-eCollection
500			|a Citation Status PubMed-not-MEDLINE
520			|a Copyright © 2023, Jiao et al.
520			|a OBJECTIVE: We aim to compare the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT)-3.5 and ChatGPT-4.0 (OpenAI, San Francisco, CA, USA) in addressing multiple-choice ophthalmic case challenges
520			|a METHODS AND ANALYSIS: Both models' accuracy was compared across different ophthalmology subspecialties using multiple-choice ophthalmic clinical cases provided by the American Academy of Ophthalmology (AAO) "Diagnosis This" questions. Additional analysis was based on image content, question difficulty, character length of models' responses, and model's alignment with responses from human respondents. χ2 test, Fisher's exact test, Student's t-test, and one-way analysis of variance (ANOVA) were conducted where appropriate, with p<0.05 considered significant
520			|a RESULTS: GPT-4.0 significantly outperformed GPT-3.5 (75% versus 46%, p<0.01), with the most noticeable improvement in neuro-ophthalmology (100% versus 38%, p=0.03). While both models struggled with uveitis and refractive questions, GPT-4.0 excelled in other areas, such as pediatric questions (82%). In image-related questions, GPT-4.0 also displayed superior accuracy that trended toward significance (73% versus 46%, p=0.07). GPT-4.0 performed better with easier questions (93.8% (least difficult) versus 76.2% (middle) versus 53.3% (most), p=0.03) and generated more concise answers than GPT-3.5 (651.7±342.9 versus 1,112.9±328.8 characters, p<0.01). Moreover, GPT-4.0's answers were more in line with those of AAO respondents (57.3% versus 41.4%, p<0.01), showing a strong correlation between its accuracy and the proportion of AAO respondents who selected GPT-4.0's answer (ρ=0.713, p<0.01)
520			|a CONCLUSION AND RELEVANCE: Our study demonstrated that GPT-4.0 significantly outperforms GPT-3.5 in addressing ophthalmic case challenges, especially in neuro-ophthalmology, with improved accuracy even in image-related questions. These findings underscore the potential of advancing artificial intelligence (AI) models in enhancing ophthalmic diagnostics and medical education
650		4	|a Journal Article
650		4	|a artificial intelligence
650		4	|a chatgpt
650		4	|a medical education
650		4	|a natural language processing models
650		4	|a ophthalmology
700	1		|a Edupuganti, Neel R |e verfasserin |4 aut
700	1		|a Patel, Parth A |e verfasserin |4 aut
700	1		|a Bui, Tommy |e verfasserin |4 aut
700	1		|a Sheth, Veeral |e verfasserin |4 aut
773	0	8	|i Enthalten in |t Cureus |d 2013 |g 15(2023), 9 vom: 30. Sept., Seite e45700 |w (DE-627)NLM24118083X |x 2168-8184 |7 nnns
773	1	8	|g volume:15 |g year:2023 |g number:9 |g day:30 |g month:09 |g pages:e45700
856	4	0	|u http://dx.doi.org/10.7759/cureus.45700 |3 Volltext
912			|a GBV_USEFLAG_A
912			|a GBV_NLM
951			|a AR
952			|d 15 |j 2023 |e 9 |b 30 |c 09 |h e45700
