GPT-4 outperforms ChatGPT in answering non-English questions related to cirrhosis
Abstract
Background and Objectives: Artificial intelligence is increasingly being employed in healthcare, raising concerns about the exacerbation of disparities. This study evaluates the ability of ChatGPT and GPT-4 to comprehend and respond to cirrhosis-related questions in English, Korean, Mandarin, and Spanish, addressing language barriers that may impact patient care.
Methods: A set of 36 cirrhosis-related questions was translated into Korean, Mandarin, and Spanish and submitted to both the ChatGPT and GPT-4 models. Non-English responses were graded by native-speaking hepatologists on accuracy and similarity to the English responses. Chi-square tests were used to compare the proportions of grades between ChatGPT and GPT-4.
Results: GPT-4 showed a marked improvement over ChatGPT in the proportion of comprehensive and correct answers across all four languages (p<0.05). GPT-4 demonstrated enhanced accuracy and avoided the erroneous responses evident in ChatGPT's output. Significant improvement was observed in the Mandarin and Korean subgroups, with a smaller quality gap between English and non-English responses in GPT-4 than in ChatGPT.
Conclusions: GPT-4 exhibited significantly higher accuracy on English and non-English cirrhosis-related questions, highlighting its potential for more accurate and reliable language model applications in diverse linguistic contexts. These advancements have important implications for patients with language discordance, contributing to equalizing health literacy on a global scale.
Media type: | Preprint |
---|---|
Year of publication: | 2023 |
Published: | 2023 |
Contained in: | bioRxiv.org - (2023), 9 May - year:2023 |
Language: | English |
Contributors: | Yeo, Yee Hui [author] |
Links: | Full text [free of charge] |
Subjects: | |
doi: | 10.1101/2023.05.04.23289482 |
Funding: | |
Funding institution / project title: | |
PPN (catalog ID): | XBI03946489X |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | XBI03946489X | ||
003 | DE-627 | ||
005 | 20231205145206.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230506s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1101/2023.05.04.23289482 |2 doi | |
035 | |a (DE-627)XBI03946489X | ||
035 | |a (biorXiv)10.1101/2023.05.04.23289482 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Yeo, Yee Hui |e verfasserin |0 (orcid)0000-0002-2703-5954 |4 aut | |
245 | 1 | 0 | |a GPT-4 outperforms ChatGPT in answering non-English questions related to cirrhosis |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Abstract Background and Objectives Artificial intelligence is increasingly being employed in healthcare, raising concerns about the exacerbation of disparities. This study evaluates ChatGPT and GPT-4’s ability to comprehend and respond to cirrhosis-related questions in English, Korean, Mandarin, and Spanish, addressing language barriers that may impact patient care. Methods A set of 36 cirrhosis-related questions were translated into Korean, Mandarin, and Spanish and prompted to both ChatGPT and GPT-4 models. Non-English responses were graded by native-speaking hepatologists on accuracy and similarity to English responses. Chi-square tests were used to compare the proportions of grading between ChatGPT and GPT-4. Results GPT-4 showed a marked improvement in the proportion of comprehensive and correct answers compared to ChatGPT across all four languages (p<0.05). GPT-4 demonstrated enhanced accuracy and avoided erroneous responses evident in ChatGPT’s output. Significant improvement was observed in Mandarin and Korean subgroups, with a smaller quality gap between English and non-English responses in GPT-4 compared to ChatGPT. Conclusions GPT-4 exhibited significantly higher accuracy in English and non-English cirrhosis-related questions, highlighting its potential for more accurate and reliable language model applications in diverse linguistic contexts. These advancements have important implications for patients with language discordance, contributing to equalizing health literacy on a global scale. | ||
650 | 4 | |a Biology |7 (dpeaa)DE-84 | |
650 | 4 | |a 570 |7 (dpeaa)DE-84 | |
700 | 1 | |a Samaan, Jamil S. |4 aut | |
700 | 1 | |a Ng, Wee Han |4 aut | |
700 | 1 | |a Ma, Xiaoyan |4 aut | |
700 | 1 | |a Ting, Peng-Sheng |4 aut | |
700 | 1 | |a Kwak, Min-Sun |4 aut | |
700 | 1 | |a Panduro, Arturo |4 aut | |
700 | 1 | |a Lizaola-Mayo, Blanca |4 aut | |
700 | 1 | |a Trivedi, Hirsh |4 aut | |
700 | 1 | |a Vipani, Aarshi |4 aut | |
700 | 1 | |a Ayoub, Walid |4 aut | |
700 | 1 | |a Yang, Ju Dong |4 aut | |
700 | 1 | |a Liran, Omer |4 aut | |
700 | 1 | |a Spiegel, Brennan |0 (orcid)0000-0002-4608-6896 |4 aut | |
700 | 1 | |a Kuo, Alexander |0 (orcid)0000-0002-9106-8865 |4 aut | |
773 | 0 | 8 | |i Enthalten in |t bioRxiv.org |g (2023) vom: 09. Mai |
773 | 1 | 8 | |g year:2023 |g day:09 |g month:05 |
856 | 4 | 0 | |u http://dx.doi.org/10.1101/2023.05.04.23289482 |z kostenfrei |3 Volltext |
912 | |a GBV_XBI | ||
951 | |a AR | ||
952 | |j 2023 |b 09 |c 05 |