Publication details

Superior Performance of Artificial Intelligence Models in English Compared to Arabic in Infectious Disease Queries

Abstract

Background: Assessing artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to, and accuracy of, information in multilingual contexts. This study aimed to compare the efficiency of AI models in English and Arabic for infectious disease queries.

Methods: The study followed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested were ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. Two bilingual experts assessed the AI-generated content using the validated CLEAR tool.

Results: Performance varied when AI models answered infectious disease queries in English versus Arabic. English queries consistently yielded superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for influenza queries in Bing and Bard. The performance of all four AI models in English was rated as “excellent”, significantly outperforming their “above-average” Arabic counterparts (P = .002).

Conclusions: A disparity in AI model performance was observed between English and Arabic responses to infectious disease queries. This language variation can negatively affect the quality of health content that AI models deliver to native Arabic speakers. AI developers are encouraged to address this issue, with the ultimate goal of improving health outcomes.

Media type:	Preprint

Year of publication:	2024
Published:	2024

Published in:	ResearchSquare.com (2024), dated 11 Jan. 2024

Language:	English

Contributors:	Sallam, Malik [author]; Al-Mahzoum, Kholoud [author]; Alshuaib, Omaima [author]; Alhajri, Hawajer [author]; Alotaibi, Fatmah [author]; Alkhurainej, Dalal [author]; Al-Balwah, Mohammad Yahya [author]; Barakat, Muna [author]; Egger, Jan [author]

Links:	Full text [free of charge]

Subjects:	570 Biology

doi:	10.21203/rs.3.rs-3830452/v1

Funding:
Funding institution / project title:

PPN (catalog ID):	XRA042144760

Internal format (MARC 21)


LEADER	01000naa a22002652 4500
001	XRA042144760
003	DE-627
005	20240112131153.0
007	cr uuu---uuuuu
008	240112s2024 xx |||||o 00| ||eng c
024	7		|a 10.21203/rs.3.rs-3830452/v1 |2 doi
035			|a (DE-627)XRA042144760
035			|a (ResearchSquare)10.21203/rs.3.rs-3830452/v1
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
100	1		|a Sallam, Malik |e verfasserin |4 aut
245	1	0	|a Superior Performance of Artificial Intelligence Models in English Compared to Arabic in Infectious Disease Queries
264		1	|c 2024
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
520			|a Abstract Background Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access and accuracy of information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries. Methods The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool. Results In comparing AI models' performance in English and Arabic for infectious disease queries, variability was noted. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably in completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for flu queries in Bing and Bard. The four AI models' performance in English was rated as “excellent”, significantly outperforming their “above-average” Arabic counterparts (P = .002). Conclusions Disparity in AI model performance was noticed between English and Arabic in response to infectious disease queries. This language variation can negatively impact the quality of health content delivered by AI models among native speakers of Arabic. This issue is recommended to be addressed by AI developers, with the ultimate goal of enhancing health outcomes.
650		4	|a Biology |7 (dpeaa)DE-84
650		4	|a 570 |7 (dpeaa)DE-84
700	1		|a Al-Mahzoum, Kholoud |4 aut
700	1		|a Alshuaib, Omaima |4 aut
700	1		|a Alhajri, Hawajer |4 aut
700	1		|a Alotaibi, Fatmah |4 aut
700	1		|a Alkhurainej, Dalal |4 aut
700	1		|a Al-Balwah, Mohammad Yahya |4 aut
700	1		|a Barakat, Muna |4 aut
700	1		|a Egger, Jan |4 aut
773	0	8	|i Enthalten in |t ResearchSquare.com |g (2024) vom: 11. Jan.
773	1	8	|g year:2024 |g day:11 |g month:01
856	4	0	|u http://dx.doi.org/10.21203/rs.3.rs-3830452/v1 |z kostenfrei |3 Volltext
912			|a GBV_XRA
951			|a AR
952			|j 2024 |b 11 |c 01
