Even with ChatGPT, race matters
Copyright © 2024 Elsevier Inc. All rights reserved.
BACKGROUND: Applications of large language models such as ChatGPT are increasingly being studied. Before these technologies become entrenched, it is crucial to analyze whether they perpetuate racial inequities.
METHODS: We asked OpenAI's ChatGPT-3.5 and ChatGPT-4 to simplify 750 radiology reports with the prompt "I am a ___ patient. Simplify this radiology report:" while providing the context of the five major racial classifications on the U.S. census: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or other Pacific Islander. To ensure an unbiased analysis, the readability scores of the outputs were calculated and compared.
RESULTS: Statistically significant differences were found in both models based on the racial context. For ChatGPT-3.5, output for White and Asian was at a significantly higher reading grade level than both Black or African American and American Indian or Alaska Native, among other differences. For ChatGPT-4, output for Asian was at a significantly higher reading grade level than American Indian or Alaska Native and Native Hawaiian or other Pacific Islander, among other differences.
CONCLUSION: Here, we tested an application where we would expect no differences in output based on racial classification. Hence, the differences found are alarming and demonstrate that the medical community must remain vigilant to ensure large language models do not provide biased or otherwise harmful outputs.
Media type: E-Article
Year of publication: 2024
Published: 2024
Contained in: Clinical imaging - 109 (2024), 29 Apr., page 110113
Language: English
Contributors: Amin, Kanhai S [Author]; Forman, Howard P [Author]; Davis, Melissa A [Author]
Subjects: ChatGPT; Health equity; Implicit bias; Large language models; Radiology report
Notes: Date Completed 17.04.2024; Date Revised 17.04.2024; published: Print-Electronic; Citation Status MEDLINE
DOI: 10.1016/j.clinimag.2024.110113
PPN (catalog ID): NLM370419081
LEADER 01000caa a22002652 4500
001    NLM370419081
003    DE-627
005    20240417232813.0
007    cr uuu---uuuuu
008    240331s2024 xx |||||o 00| ||eng c
024 7  |a 10.1016/j.clinimag.2024.110113 |2 doi
028 52 |a pubmed24n1378.xml
035    |a (DE-627)NLM370419081
035    |a (NLM)38552383
035    |a (PII)S0899-7071(24)00043-3
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Amin, Kanhai S |e verfasserin |4 aut
245 10 |a Even with ChatGPT, race matters
264  1 |c 2024
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Completed 17.04.2024
500    |a Date Revised 17.04.2024
500    |a published: Print-Electronic
500    |a Citation Status MEDLINE
520    |a Copyright © 2024 Elsevier Inc. All rights reserved.
520    |a BACKGROUND: Applications of large language models such as ChatGPT are increasingly being studied. Before these technologies become entrenched, it is crucial to analyze whether they perpetuate racial inequities
520    |a METHODS: We asked OpenAI's ChatGPT-3.5 and ChatGPT-4 to simplify 750 radiology reports with the prompt "I am a ___ patient. Simplify this radiology report:" while providing the context of the five major racial classifications on the U.S. census: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or other Pacific Islander. To ensure an unbiased analysis, the readability scores of the outputs were calculated and compared
520    |a RESULTS: Statistically significant differences were found in both models based on the racial context. For ChatGPT-3.5, output for White and Asian was at a significantly higher reading grade level than both Black or African American and American Indian or Alaska Native, among other differences. For ChatGPT-4, output for Asian was at a significantly higher reading grade level than American Indian or Alaska Native and Native Hawaiian or other Pacific Islander, among other differences
520    |a CONCLUSION: Here, we tested an application where we would expect no differences in output based on racial classification. Hence, the differences found are alarming and demonstrate that the medical community must remain vigilant to ensure large language models do not provide biased or otherwise harmful outputs
650  4 |a Journal Article
650  4 |a ChatGPT
650  4 |a Health equity
650  4 |a Implicit bias
650  4 |a Large language models
650  4 |a Radiology report
700 1  |a Forman, Howard P |e verfasserin |4 aut
700 1  |a Davis, Melissa A |e verfasserin |4 aut
773 08 |i Enthalten in |t Clinical imaging |d 1996 |g 109(2024) vom: 29. Apr., Seite 110113 |w (DE-627)NLM012624381 |x 1873-4499 |7 nnns
773 18 |g volume:109 |g year:2024 |g day:29 |g month:04 |g pages:110113
856 40 |u http://dx.doi.org/10.1016/j.clinimag.2024.110113 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a GBV_NLM
951    |a AR
952    |d 109 |j 2024 |b 29 |c 04 |h 110113