Protocol For Human Evaluation of Artificial Intelligence Chatbots in Clinical Consultations
Abstract Background: Generative artificial intelligence (AI) technology has revolutionary potential to augment clinical practice and telemedicine. The nuances of real-life patient scenarios and complex clinical environments demand a rigorous, evidence-based approach to ensure safe and effective application. Methods: We present a protocol for the systematic evaluation of generative AI large language models (LLMs) as chatbots within the context of clinical microbiology and infectious disease consultations. We aim to critically assess the clinical accuracy, comprehensiveness, coherence, and safety of recommendations produced by leading generative AI models, including Claude 2, Gemini Pro, GPT-4.0, and a GPT-4.0-based custom AI chatbot. Discussion: A standardised healthcare-specific prompt template is employed to elicit clinically impactful AI responses. Generated responses will be graded by a panel of human evaluators encompassing a wide spectrum of domain expertise in clinical microbiology, virology, and infectious diseases. Evaluations are performed using a 5-point Likert scale across four clinical domains: factual consistency, comprehensiveness, coherence, and medical harmfulness. Our study will offer insights into the feasibility, limitations, and boundaries of generative AI in healthcare, providing guidance for future research and clinical implementation. Ethical guidelines and safety guardrails should be developed to uphold patient safety and clinical standards.
Media type: Preprint
Year of publication: 2024
Published: 2024
Contained in: bioRxiv.org - (2024), 5 March - year:2024
Language: English
Contributors: Chiu, Edwin Kwan-Yeung [author]
Links: Full text [open access]
doi: 10.1101/2024.03.01.24303593
PPN (catalogue ID): XBI042712319
LEADER 01000naa a22002652 4500
001    XBI042712319
003    DE-627
005    20240306115010.0
007    cr uuu---uuuuu
008    240306s2024    xx |||||o 00| ||eng c
024 7  |a 10.1101/2024.03.01.24303593 |2 doi
035    |a (DE-627)XBI042712319
035    |a (biorXiv)10.1101/2024.03.01.24303593
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Chiu, Edwin Kwan-Yeung |e verfasserin |0 (orcid)0000-0003-1644-491X |4 aut
245 10 |a Protocol For Human Evaluation of Artificial Intelligence Chatbots in Clinical Consultations
264  1 |c 2024
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
520    |a Abstract Background: Generative artificial intelligence (AI) technology has revolutionary potential to augment clinical practice and telemedicine. The nuances of real-life patient scenarios and complex clinical environments demand a rigorous, evidence-based approach to ensure safe and effective application. Methods: We present a protocol for the systematic evaluation of generative AI large language models (LLMs) as chatbots within the context of clinical microbiology and infectious disease consultations. We aim to critically assess the clinical accuracy, comprehensiveness, coherence, and safety of recommendations produced by leading generative AI models, including Claude 2, Gemini Pro, GPT-4.0, and a GPT-4.0-based custom AI chatbot. Discussion: A standardised healthcare-specific prompt template is employed to elicit clinically impactful AI responses. Generated responses will be graded by a panel of human evaluators encompassing a wide spectrum of domain expertise in clinical microbiology, virology, and infectious diseases. Evaluations are performed using a 5-point Likert scale across four clinical domains: factual consistency, comprehensiveness, coherence, and medical harmfulness. Our study will offer insights into the feasibility, limitations, and boundaries of generative AI in healthcare, providing guidance for future research and clinical implementation. Ethical guidelines and safety guardrails should be developed to uphold patient safety and clinical standards.
650  4 |a Biology |7 (dpeaa)DE-84
650  4 |a 570 |7 (dpeaa)DE-84
700 1  |a Chung, Tom Wai-Hin |0 (orcid)0000-0003-1780-821X |4 aut
773 08 |i Enthalten in |t bioRxiv.org |g (2024) vom: 05. März
773 18 |g year:2024 |g day:05 |g month:03
856 40 |u http://dx.doi.org/10.1101/2024.03.01.24303593 |z kostenfrei |3 Volltext
912    |a GBV_XBI
951    |a AR
952    |j 2024 |b 05 |c 03