Response accuracy of ChatGPT 3.5, Copilot, and Gemini in interpreting biochemical laboratory data: a pilot study
© 2024. The Author(s).
With the release of ChatGPT at the end of 2022, a new era of thinking and technology use has begun. Artificial intelligence models (AIs) such as Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to affect every aspect of our lives, including the interpretation of laboratory data. This pilot study assessed the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Biochemical laboratory data from ten simulated patients, comprising serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c), and HbA1c, were interpreted by the three AIs, and the responses were then evaluated by three raters. The study used two approaches: the first supplied all biochemical data, and the second supplied only kidney function data. In the first approach, Copilot had the highest accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc analysis showed that Copilot had the highest mean rank; the pairwise comparisons revealed significant differences between Copilot and ChatGPT-3.5 (P = 0.002) and between Copilot and Gemini (P = 0.008). In the second approach, Copilot again had the highest accuracy, and the Friedman test with Dunn's post-hoc analysis again showed Copilot to have the highest mean rank. The Wilcoxon signed-rank test showed no significant difference (P = 0.5) between Copilot's responses when all laboratory data were supplied and when only kidney function data were supplied. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5. Its consistent responses across different data subsets highlight its reliability in this context.
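The abstract compares three models scored on the same cases using Friedman mean ranks. As a rough illustration of how such a statistic is computed, here is a minimal Python sketch. It rests on several assumptions: the scores are invented for illustration and are not data from the study, the function names are hypothetical, no tie correction is applied to the statistic, and the p-value shortcut is valid only for exactly three groups (two degrees of freedom). The abstract does not state which software the authors used.

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = lowest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Return (Q, mean_ranks) for rows = cases and columns = compared models.

    Uses the uncorrected formula
    Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1).
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, [r / n for r in rank_sums]


def p_value_three_groups(q):
    """Chi-square upper tail for 2 degrees of freedom, i.e. three groups only."""
    return math.exp(-q / 2.0)


# Hypothetical accuracy scores (NOT from the study).
# Columns: Copilot, Gemini, ChatGPT-3.5; one row per simulated case.
scores = [
    [5, 4, 3],
    [5, 3, 4],
    [4, 3, 2],
    [5, 4, 2],
]

q, mean_ranks = friedman_statistic(scores)
print("Friedman Q:", q)
print("Mean ranks:", mean_ranks)
print("Approximate p-value:", p_value_three_groups(q))
```

With so few cases the chi-square approximation is crude, so the printed p-value is only indicative. A pairwise follow-up such as Dunn's test would then be applied to the model pairs, which this sketch does not attempt.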
Media type | E-article
---|---
Year of publication | 2024
Published | 2024
Contained in | Scientific reports - 14(2024), 1, 08 Apr., page 8233 (parent record: volume 14)
Language | English
Contributors | Kaftan, Ahmed Naseer [author]
Subjects | AYI8EX34EU
Notes | Date Completed 10.04.2024; Date Revised 11.04.2024; published: Electronic; Citation Status MEDLINE
doi | 10.1038/s41598-024-58964-1
PPN (catalog ID) | NLM37078961X
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | NLM37078961X | ||
003 | DE-627 | ||
005 | 20240411232652.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240409s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1038/s41598-024-58964-1 |2 doi | |
028 | 5 | 2 | |a pubmed24n1372.xml |
035 | |a (DE-627)NLM37078961X | ||
035 | |a (NLM)38589613 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Kaftan, Ahmed Naseer |e verfasserin |4 aut | |
245 | 1 | 0 | |a Response accuracy of ChatGPT 3.5 Copilot and Gemini in interpreting biochemical laboratory data a pilot study |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 10.04.2024 | ||
500 | |a Date Revised 11.04.2024 | ||
500 | |a published: Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © 2024. The Author(s). | ||
520 | |a With the release of ChatGPT at the end of 2022, a new era of thinking and technology use has begun. Artificial intelligence models (AIs) like Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to impact every aspect of our lives, including laboratory data interpretation. To assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Ten simulated patients' biochemical laboratory data, including serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein (LDL-c), and high-density lipoprotein (HDL-c), in addition to HbA1c, were interpreted by three AIs: Copilot, Gemini, and ChatGPT-3.5, followed by evaluation with three raters. The study was carried out using two approaches. The first encompassed all biochemical data. The second contained only kidney function data. The first approach indicated Copilot to have the highest level of accuracy, followed by Gemini and ChatGPT-3.5. Friedman and Dunn's post-hoc test revealed that Copilot had the highest mean rank; the pairwise comparisons revealed significant differences for Copilot vs. ChatGPT-3.5 (P = 0.002) and Gemini (P = 0.008). The second approach exhibited Copilot to have the highest accuracy of performance. The Friedman test with Dunn's post-hoc analysis showed Copilot to have the highest mean rank. The Wilcoxon Signed-Rank Test demonstrated an indistinguishable response (P = 0.5) of Copilot when all laboratory data were applied vs. the application of only kidney function data. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5. Its consistent responses across different data subsets highlight its reliability in this context | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Artificial intelligence models | |
650 | 4 | |a Biochemical parameters | |
650 | 4 | |a ChatGPT-3.5 | |
650 | 4 | |a Copilot | |
650 | 4 | |a Gemini | |
650 | 4 | |a Interpretation | |
650 | 7 | |a Creatinine |2 NLM | |
650 | 7 | |a AYI8EX34EU |2 NLM | |
700 | 1 | |a Hussain, Majid Kadhum |e verfasserin |4 aut | |
700 | 1 | |a Naser, Farah Hasson |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Scientific reports |d 2011 |g 14(2024), 1 vom: 08. Apr., Seite 8233 |w (DE-627)NLM215703936 |x 2045-2322 |7 nnns |
773 | 1 | 8 | |g volume:14 |g year:2024 |g number:1 |g day:08 |g month:04 |g pages:8233 |
856 | 4 | 0 | |u http://dx.doi.org/10.1038/s41598-024-58964-1 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 14 |j 2024 |e 1 |b 08 |c 04 |h 8233 |