Publication details - Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use

Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use

© 2023. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

We aimed to assess the accuracy and completeness of Large Language Models (LLMs), namely ChatGPT 3.5, ChatGPT 4, BARD, and Bing, when answering questions about methotrexate (MTX) use in the treatment of rheumatoid arthritis. We used 23 questions about MTX concerns taken from an earlier study. These questions were entered into the LLMs, and the responses generated by each model were evaluated by two reviewers using Likert scales to assess accuracy and completeness. The GPT models achieved a 100% correct answer rate, while BARD and Bing scored 73.91%. In terms of accuracy of the outputs (completely correct responses), GPT-4 achieved a score of 100%, GPT-3.5 scored 86.96%, and BARD and Bing each scored 60.87%. BARD produced 17.39% incorrect responses and 8.7% non-responses, while Bing recorded 13.04% incorrect responses and 13.04% non-responses. The ChatGPT models produced significantly more accurate responses than Bing for the "mechanism of action" category, and the GPT-4 model showed significantly higher accuracy than BARD in the "side effects" category. There were no statistically significant differences among the models for the "lifestyle" category. GPT-4 achieved a comprehensive output of 100%, followed by GPT-3.5 at 86.96%, BARD at 60.86%, and Bing at 0%. In the "mechanism of action" category, both ChatGPT models and BARD produced significantly more comprehensive outputs than Bing. For the "side effects" and "lifestyle" categories, the ChatGPT models showed significantly higher completeness than Bing. The GPT models, particularly GPT-4, demonstrated superior performance in providing accurate and comprehensive patient information about MTX use. However, the study also identified inaccuracies and shortcomings in the generated responses.

Media type:	E-article

Year of publication:	2024
Published:	2024

Contained in:	Go to parent record - volume:44
Contained in:	Rheumatology international - 44(2024), no. 3, 1 Feb., pages 509-515

Language:	English

Contributors:	Coskun, Belkis Nihan [author]; Yagiz, Burcu [author]; Ocakoglu, Gokhan [author]; Dalkilic, Ediz [author]; Pehlivan, Yavuz [author]

Links:	Full text

Subjects:	Accuracy; Artificial intelligence; Completeness; Journal Article; Large language models; Methotrexate; YL5FZ2Y5U1

Notes:	Date Completed 16.02.2024; Date Revised 16.02.2024; published: Print-Electronic; Citation Status MEDLINE

doi:	10.1007/s00296-023-05473-5


PPN (catalog ID):	NLM36243994X

Internal format


LEADER	01000caa a22002652 4500
001	NLM36243994X
003	DE-627
005	20240216232626.0
007	cr uuu---uuuuu
008	231226s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1007/s00296-023-05473-5 \|2 doi
028	5	2	\|a pubmed24n1295.xml
035			\|a (DE-627)NLM36243994X
035			\|a (NLM)37747564
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Coskun, Belkis Nihan \|e verfasserin \|4 aut
245	1	0	\|a Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 16.02.2024
500			\|a Date Revised 16.02.2024
500			\|a published: Print-Electronic
500			\|a Citation Status MEDLINE
520			\|a © 2023. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
520			\|a We aimed to assess the accuracy and completeness of Large Language Models (LLMs), namely ChatGPT 3.5, ChatGPT 4, BARD, and Bing, when answering questions about methotrexate (MTX) use in the treatment of rheumatoid arthritis. We used 23 questions about MTX concerns taken from an earlier study. These questions were entered into the LLMs, and the responses generated by each model were evaluated by two reviewers using Likert scales to assess accuracy and completeness. The GPT models achieved a 100% correct answer rate, while BARD and Bing scored 73.91%. In terms of accuracy of the outputs (completely correct responses), GPT-4 achieved a score of 100%, GPT-3.5 scored 86.96%, and BARD and Bing each scored 60.87%. BARD produced 17.39% incorrect responses and 8.7% non-responses, while Bing recorded 13.04% incorrect responses and 13.04% non-responses. The ChatGPT models produced significantly more accurate responses than Bing for the "mechanism of action" category, and the GPT-4 model showed significantly higher accuracy than BARD in the "side effects" category. There were no statistically significant differences among the models for the "lifestyle" category. GPT-4 achieved a comprehensive output of 100%, followed by GPT-3.5 at 86.96%, BARD at 60.86%, and Bing at 0%. In the "mechanism of action" category, both ChatGPT models and BARD produced significantly more comprehensive outputs than Bing. For the "side effects" and "lifestyle" categories, the ChatGPT models showed significantly higher completeness than Bing. The GPT models, particularly GPT-4, demonstrated superior performance in providing accurate and comprehensive patient information about MTX use. However, the study also identified inaccuracies and shortcomings in the generated responses.
650		4	\|a Journal Article
650		4	\|a Accuracy
650		4	\|a Artificial intelligence
650		4	\|a Completeness
650		4	\|a Large language models
650		4	\|a Methotrexate
650		7	\|a Methotrexate \|2 NLM
650		7	\|a YL5FZ2Y5U1 \|2 NLM
700	1		\|a Yagiz, Burcu \|e verfasserin \|4 aut
700	1		\|a Ocakoglu, Gokhan \|e verfasserin \|4 aut
700	1		\|a Dalkilic, Ediz \|e verfasserin \|4 aut
700	1		\|a Pehlivan, Yavuz \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t Rheumatology international \|d 1985 \|g 44(2024), 3 vom: 01. Feb., Seite 509-515 \|w (DE-627)NLM012644315 \|x 1437-160X \|7 nnns
773	1	8	\|g volume:44 \|g year:2024 \|g number:3 \|g day:01 \|g month:02 \|g pages:509-515
856	4	0	\|u http://dx.doi.org/10.1007/s00296-023-05473-5 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 44 \|j 2024 \|e 3 \|b 01 \|c 02 \|h 509-515
