Automated HEART score determination via ChatGPT: Honing a framework for iterative prompt development

© 2024 The Authors. Journal of the American College of Emergency Physicians Open published by Wiley Periodicals LLC on behalf of American College of Emergency Physicians.

Objectives: This study presents a design framework to improve the accuracy with which large language models (LLMs) such as ChatGPT extract insights from clinical notes. We demonstrate this framework through prompt refinement for the automated determination of HEART (History, ECG, Age, Risk factors, Troponin risk algorithm) scores in chest pain evaluation.

Methods: We developed a pipeline for LLM prompt testing, employing stochastic repeat testing and quantifying response errors relative to physician assessment. We evaluated the pipeline for automated HEART score determination across a limited set of 24 synthetic clinical notes representing four simulated patients. To assess whether iterative prompt design could improve the LLMs' ability to extract complex clinical concepts and apply rule-based logic to translate them to HEART subscores, we monitored diagnostic performance during prompt iteration.
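The repeat-testing idea described in the Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: `query_llm`, `make_stub_llm`, `parse_subscore`, and `evaluate_prompt` are hypothetical names, the stub stands in for a real model call, and the use of mean absolute error against the physician-assigned subscore is an assumption about how "response error" might be quantified.

```python
import random
import re
from statistics import mean


def parse_subscore(response):
    """Return the subscore as an int in 0..2, or None for a non-numerical reply.

    Assumption: only a bare digit 0, 1, or 2 counts as a valid answer.
    """
    match = re.fullmatch(r"\s*([0-2])\s*", response)
    return int(match.group(1)) if match else None


def evaluate_prompt(query_llm, prompt, notes, physician_scores, n_trials=25):
    """Query every note n_trials times and summarize the responses.

    query_llm is a hypothetical callable (prompt, note) -> str.
    """
    errors = []
    invalid = 0
    total = 0
    for note, truth in zip(notes, physician_scores):
        for _ in range(n_trials):
            total += 1
            score = parse_subscore(query_llm(prompt, note))
            if score is None:
                invalid += 1  # erroneous, non-numerical answer
            else:
                errors.append(abs(score - truth))
    return {
        "n_queries": total,
        "non_numerical_rate": invalid / total,
        "mean_abs_error": mean(errors) if errors else None,
    }


def make_stub_llm(seed=0):
    """Hypothetical stand-in for a stochastic LLM; replace with a real client."""
    rng = random.Random(seed)

    def stub(prompt, note):
        return rng.choice(["0", "1", "2", "unclear"])

    return stub


if __name__ == "__main__":
    notes = ["synthetic note A", "synthetic note B"]
    physician_scores = [1, 2]
    print(evaluate_prompt(make_stub_llm(), "Score the History item.", notes, physician_scores))
```

Running the same evaluation for each prompt version would let the non-numerical rate and the error be compared across iterations. The strict parser is a design choice for the sketch; a more lenient one would change the non-numerical rate.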

Results: Validation included three iterative rounds of prompt improvement for three HEART subscores with 25 repeat trials totaling 1200 queries each for GPT-3.5 and GPT-4. For both LLMs, from initial to final prompt design, there was a decrease in the rate of responses with erroneous, non-numerical subscore answers. Accuracy of numerical responses for HEART subscores (discrete 0-2 point scale) improved for GPT-4 from the initial to the final prompt iteration, with mean error decreasing from 0.16 to 0.10 points (95% confidence interval: 0.07-0.14).
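A mean error with a 95% confidence interval can be estimated in several ways. The record does not say which method the authors used, so the sketch below assumes a percentile bootstrap over per-response absolute errors. The function name `bootstrap_mean_ci` and the sample data are hypothetical and for illustration only.

```python
import random
from statistics import mean


def bootstrap_mean_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap.

    Assumption: errors are per-response absolute errors on the 0-2 scale.
    """
    rng = random.Random(seed)
    n = len(errors)
    # Resample with replacement and record the mean of each resample.
    resampled_means = sorted(
        mean(rng.choices(errors, k=n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(errors), lower, upper


if __name__ == "__main__":
    # Made-up errors for illustration, not study data.
    sample_errors = [0] * 90 + [1] * 9 + [2] * 1
    print(bootstrap_mean_ci(sample_errors))
```

Because repeat trials on the same note are not independent, a resampling scheme that draws whole notes rather than individual responses may be more appropriate in practice.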

Conclusion: We established a framework for iterative prompt design in the clinical space. Although the results indicate potential for integrating LLMs in structured clinical note analysis, translation to real, large-scale clinical data with appropriate data privacy safeguards is needed.

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

Journal of the American College of Emergency Physicians Open - 5(2024), 2, 19 March, page e13133

Language:

English

Contributors:

Safranek, Conrad W [Author]
Huang, Thomas [Author]
Wright, Donald S [Author]
Wright, Catherine X [Author]
Socrates, Vimig [Author]
Sangal, Rohit B [Author]
Iscoe, Mark [Author]
Chartash, David [Author]
Taylor, R Andrew [Author]

Subjects:

Artificial intelligence in medicine
ChatGPT
Clinical decision support systems
Clinical note analysis
Emergency department risk algorithms
HEART score
Journal Article
Large language models
Natural language processing
Prompt engineering

Notes:

Date Revised 16.03.2024

published: Electronic-eCollection

Citation Status PubMed-not-MEDLINE

doi:

10.1002/emp2.13133

PPN (catalog ID):

NLM369712072