Publication details - Automated HEART score determination via ChatGPT

Automated HEART score determination via ChatGPT : Honing a framework for iterative prompt development

© 2024 The Authors. Journal of the American College of Emergency Physicians Open published by Wiley Periodicals LLC on behalf of American College of Emergency Physicians.

Objectives: This study presents a design framework to enhance the accuracy with which large language models (LLMs) like ChatGPT can extract insights from clinical notes. We highlight this framework via prompt refinement for the automated determination of HEART (History, ECG, Age, Risk factors, Troponin risk algorithm) scores in chest pain evaluation.

Methods: We developed a pipeline for LLM prompt testing, employing stochastic repeat testing and quantifying response errors relative to physician assessment. We evaluated the pipeline for automated HEART score determination across a limited set of 24 synthetic clinical notes representing four simulated patients. To assess whether iterative prompt design could improve the LLMs' ability to extract complex clinical concepts and apply rule-based logic to translate them to HEART subscores, we monitored diagnostic performance during prompt iteration.

Results: Validation included three iterative rounds of prompt improvement for three HEART subscores with 25 repeat trials totaling 1200 queries each for GPT-3.5 and GPT-4. For both LLMs, from initial to final prompt design, there was a decrease in the rate of responses with erroneous, non-numerical subscore answers. Accuracy of numerical responses for HEART subscores (discrete 0-2 point scale) improved for GPT-4 from the initial to final prompt iteration, with the mean error decreasing from 0.16 to 0.10 (95% confidence interval: 0.07-0.14) points.

Conclusion: We established a framework for iterative prompt design in the clinical space. Although the results indicate potential for integrating LLMs in structured clinical note analysis, translation to real, large-scale clinical data with appropriate data privacy safeguards is needed.
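The repeat-testing approach described in the Methods and Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (parse_subscore, evaluate_prompt, make_stub_llm) are hypothetical, the model call is replaced by a random stub, and "mean error" is assumed here to mean the mean absolute difference between the model's numerical subscore and the physician-assigned subscore, which the abstract does not define.

```python
import random
import re
from statistics import mean


def parse_subscore(response):
    """Return the subscore as an int in {0, 1, 2}, or None if the reply
    is not a bare in-range number (assumed definition of 'non-numerical')."""
    match = re.fullmatch(r"\s*([0-2])\s*", response)
    return int(match.group(1)) if match else None


def evaluate_prompt(query_llm, notes, reference, n_trials=25):
    """Query each note n_trials times and summarise errors against the
    reference (physician) subscores.

    query_llm: callable taking note text and returning the model's reply.
    notes:     dict mapping note id -> note text.
    reference: dict mapping note id -> reference subscore (0, 1, or 2).
    """
    abs_errors = []
    non_numeric = 0
    total = 0
    for note_id, note_text in notes.items():
        for _ in range(n_trials):
            total += 1
            score = parse_subscore(query_llm(note_text))
            if score is None:
                non_numeric += 1  # counted separately, excluded from the error
            else:
                abs_errors.append(abs(score - reference[note_id]))
    return {
        "queries": total,
        "non_numeric_rate": non_numeric / total,
        "mean_abs_error": mean(abs_errors) if abs_errors else None,
    }


def make_stub_llm(seed=0):
    """Hypothetical stand-in for a real model call; replies are random."""
    rng = random.Random(seed)

    def stub(note_text):
        return rng.choice(["0", "1", "2", "The subscore is likely 1."])

    return stub


if __name__ == "__main__":
    notes = {"note-1": "synthetic note text", "note-2": "synthetic note text"}
    reference = {"note-1": 1, "note-2": 2}
    print(evaluate_prompt(make_stub_llm(), notes, reference, n_trials=25))
```

Running the same evaluation for each prompt version, with the notes and reference scores held fixed, gives comparable numbers for the non-numerical response rate and the mean error across iterations.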

Media type:	E-article

Year of publication:	2024
Published:	2024

Contained in:	Parent record - volume:5
Contained in:	Journal of the American College of Emergency Physicians open - 5(2024), 2, 19 March, page e13133

Language:	English

Contributors:	Safranek, Conrad W [author]; Huang, Thomas [author]; Wright, Donald S [author]; Wright, Catherine X [author]; Socrates, Vimig [author]; Sangal, Rohit B [author]; Iscoe, Mark [author]; Chartash, David [author]; Taylor, R Andrew [author]

Links:	Full text

Subjects:	Artificial intelligence in medicine; ChatGPT; Clinical decision support systems; Clinical note analysis; Emergency department risk algorithms; HEART score; Journal Article; Large language models; Natural language processing; Prompt engineering

Notes:	Date Revised 16.03.2024; published: Electronic-eCollection; Citation Status PubMed-not-MEDLINE

doi:	10.1002/emp2.13133

funding:
Funding institution / Project title:

PPN (catalog ID):	NLM369712072

Internal format


LEADER	01000caa a22002652 4500
001	NLM369712072
003	DE-627
005	20240316233027.0
007	cr uuu---uuuuu
008	240315s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1002/emp2.13133 \|2 doi
028	5	2	\|a pubmed24n1332.xml
035			\|a (DE-627)NLM369712072
035			\|a (NLM)38481520
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Safranek, Conrad W \|e verfasserin \|4 aut
245	1	0	\|a Automated HEART score determination via ChatGPT \|b Honing a framework for iterative prompt development
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 16.03.2024
500			\|a published: Electronic-eCollection
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a © 2024 The Authors. Journal of the American College of Emergency Physicians Open published by Wiley Periodicals LLC on behalf of American College of Emergency Physicians.
520			\|a Objectives: This study presents a design framework to enhance the accuracy with which large language models (LLMs) like ChatGPT can extract insights from clinical notes. We highlight this framework via prompt refinement for the automated determination of HEART (History, ECG, Age, Risk factors, Troponin risk algorithm) scores in chest pain evaluation
520			\|a Methods: We developed a pipeline for LLM prompt testing, employing stochastic repeat testing and quantifying response errors relative to physician assessment. We evaluated the pipeline for automated HEART score determination across a limited set of 24 synthetic clinical notes representing four simulated patients. To assess whether iterative prompt design could improve the LLMs' ability to extract complex clinical concepts and apply rule-based logic to translate them to HEART subscores, we monitored diagnostic performance during prompt iteration
520			\|a Results: Validation included three iterative rounds of prompt improvement for three HEART subscores with 25 repeat trials totaling 1200 queries each for GPT-3.5 and GPT-4. For both LLMs, from initial to final prompt design, there was a decrease in the rate of responses with erroneous, non-numerical subscore answers. Accuracy of numerical responses for HEART subscores (discrete 0-2 point scale) improved for GPT-4 from the initial to final prompt iteration, with the mean error decreasing from 0.16 to 0.10 (95% confidence interval: 0.07-0.14) points
520			\|a Conclusion: We established a framework for iterative prompt design in the clinical space. Although the results indicate potential for integrating LLMs in structured clinical note analysis, translation to real, large-scale clinical data with appropriate data privacy safeguards is needed
650		4	\|a Journal Article
650		4	\|a ChatGPT
650		4	\|a HEART score
650		4	\|a artificial intelligence in medicine
650		4	\|a clinical decision support systems
650		4	\|a clinical note analysis
650		4	\|a emergency department risk algorithms
650		4	\|a large language models
650		4	\|a natural language processing
650		4	\|a prompt engineering
700	1		\|a Huang, Thomas \|e verfasserin \|4 aut
700	1		\|a Wright, Donald S \|e verfasserin \|4 aut
700	1		\|a Wright, Catherine X \|e verfasserin \|4 aut
700	1		\|a Socrates, Vimig \|e verfasserin \|4 aut
700	1		\|a Sangal, Rohit B \|e verfasserin \|4 aut
700	1		\|a Iscoe, Mark \|e verfasserin \|4 aut
700	1		\|a Chartash, David \|e verfasserin \|4 aut
700	1		\|a Taylor, R Andrew \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t Journal of the American College of Emergency Physicians open \|d 2020 \|g 5(2024), 2 vom: 19. März, Seite e13133 \|w (DE-627)NLM310096499 \|x 2688-1152 \|7 nnns
773	1	8	\|g volume:5 \|g year:2024 \|g number:2 \|g day:19 \|g month:03 \|g pages:e13133
856	4	0	\|u http://dx.doi.org/10.1002/emp2.13133 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 5 \|j 2024 \|e 2 \|b 19 \|c 03 \|h e13133