Publication details - Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers

Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers

© 2023. The Author(s).

Large language models such as ChatGPT can produce increasingly realistic text, but the accuracy and integrity of using these models in scientific writing remain unknown. We gathered fifty research abstracts from five high-impact-factor medical journals and asked ChatGPT to generate research abstracts based on their titles and journals. Most generated abstracts were detected using an AI output detector, 'GPT-2 Output Detector', with a median 'fake' score (higher meaning more likely to be generated) of 99.98% [interquartile range (IQR) 12.73%, 99.98%], compared with a median of 0.02% [IQR 0.02%, 0.09%] for the original abstracts. The AUROC of the AI output detector was 0.94. Generated abstracts scored lower than original abstracts when run through a plagiarism detector website and iThenticate (higher scores meaning more matching text found). When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, though abstracts they suspected were generated were vaguer and more formulaic. ChatGPT writes believable scientific abstracts, though with completely generated data. Depending on publisher-specific guidelines, AI output detectors may serve as an editorial tool to help maintain scientific standards. The boundaries of ethical and acceptable use of large language models to help scientific writing are still being discussed, and different journals and conferences are adopting varying policies.

Errata etc.:	CommentIn: J Am Acad Dermatol. 2023 Sep;89(3):e127-e129. - PMID 37179029
Media type:	E-article

Year of publication:	2023
Published:	2023

Contained in:	To the full record - volume:6
Contained in:	NPJ digital medicine - 6(2023), 1, 26 Apr., page 75

Language:	English

Contributors:	Gao, Catherine A [author]; Howard, Frederick M [author]; Markov, Nikolay S [author]; Dyer, Emma C [author]; Ramesh, Siddhi [author]; Luo, Yuan [author]; Pearson, Alexander T [author]

Links:	Full text

Subjects:	Journal Article

Notes:	Date Revised 30.08.2023; published: Electronic; CommentIn: J Am Acad Dermatol. 2023 Sep;89(3):e127-e129. - PMID 37179029; Citation Status PubMed-not-MEDLINE

doi:	10.1038/s41746-023-00819-6

PPN (catalog ID):	NLM356044890

Internal format


LEADER	01000naa a22002652 4500
001	NLM356044890
003	DE-627
005	20231226065626.0
007	cr uuu---uuuuu
008	231226s2023 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1038/s41746-023-00819-6 \|2 doi
028	5	2	\|a pubmed24n1186.xml
035			\|a (DE-627)NLM356044890
035			\|a (NLM)37100871
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Gao, Catherine A \|e verfasserin \|4 aut
245	1	0	\|a Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers
264		1	\|c 2023
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 30.08.2023
500			\|a published: Electronic
500			\|a CommentIn: J Am Acad Dermatol. 2023 Sep;89(3):e127-e129. - PMID 37179029
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a © 2023. The Author(s).
520			\|a Large language models such as ChatGPT can produce increasingly realistic text, but the accuracy and integrity of using these models in scientific writing remain unknown. We gathered fifty research abstracts from five high-impact-factor medical journals and asked ChatGPT to generate research abstracts based on their titles and journals. Most generated abstracts were detected using an AI output detector, 'GPT-2 Output Detector', with a median 'fake' score (higher meaning more likely to be generated) of 99.98% [interquartile range (IQR) 12.73%, 99.98%], compared with a median of 0.02% [IQR 0.02%, 0.09%] for the original abstracts. The AUROC of the AI output detector was 0.94. Generated abstracts scored lower than original abstracts when run through a plagiarism detector website and iThenticate (higher scores meaning more matching text found). When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, though abstracts they suspected were generated were vaguer and more formulaic. ChatGPT writes believable scientific abstracts, though with completely generated data. Depending on publisher-specific guidelines, AI output detectors may serve as an editorial tool to help maintain scientific standards. The boundaries of ethical and acceptable use of large language models to help scientific writing are still being discussed, and different journals and conferences are adopting varying policies.
650		4	\|a Journal Article
700	1		\|a Howard, Frederick M \|e verfasserin \|4 aut
700	1		\|a Markov, Nikolay S \|e verfasserin \|4 aut
700	1		\|a Dyer, Emma C \|e verfasserin \|4 aut
700	1		\|a Ramesh, Siddhi \|e verfasserin \|4 aut
700	1		\|a Luo, Yuan \|e verfasserin \|4 aut
700	1		\|a Pearson, Alexander T \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t NPJ digital medicine \|d 2018 \|g 6(2023), 1 vom: 26. Apr., Seite 75 \|w (DE-627)NLM293151253 \|x 2398-6352 \|7 nnns
773	1	8	\|g volume:6 \|g year:2023 \|g number:1 \|g day:26 \|g month:04 \|g pages:75
856	4	0	\|u http://dx.doi.org/10.1038/s41746-023-00819-6 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 6 \|j 2023 \|e 1 \|b 26 \|c 04 \|h 75
