Fine-tuning Large Language Models for Rare Disease Concept Normalization
Objective: We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO).
Methods: We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) additionally includes half of each concept's synonyms. We then fine-tuned Llama 2 (Llama2-7B) on each corpus and evaluated the models using a range of sentence prompts and various phenotype terms. (A minimal sketch of the corpus-generation step appears after the abstract.)
Results: When the phenotype terms targeted for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 achieved only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced into the phenotype terms, the accuracy of NAME and NAME+SYN dropped to 10.2% and 36.1%, respectively, but rose to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from the HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy.
Conclusion: Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laypersons' terms. Our approach provides a solution for using LLMs to identify named medical entities in clinical narratives while normalizing them to standard concepts in a controlled vocabulary.
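The authors' actual script is not included in this record, but the template-based corpus generation described in the Methods can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration: the toy HPO entries, the sentence templates, the half-of-synonyms sampling, and the output file names (name_corpus.jsonl, name_syn_corpus.jsonl) are hypothetical stand-ins, not the authors' templates or data.

```python
# Hypothetical sketch of template-based fine-tuning corpus generation for
# HPO concept normalization. Term data, templates, and file names are
# illustrative assumptions, not the authors' published script.

import json
import random

# Toy stand-ins for entries parsed from an HPO release (e.g. hp.obo / hp.json).
HPO_TERMS = [
    {
        "id": "HP:0001250",
        "name": "Seizure",
        "synonyms": ["Epileptic seizure", "Seizures", "Fits"],
    },
    {
        "id": "HP:0000238",
        "name": "Hydrocephalus",
        "synonyms": ["Enlarged ventricles", "Water on the brain"],
    },
]

# Assumed sentence templates pairing a phenotype term with its HPO identifier.
TEMPLATES = [
    "The phenotype term '{term}' is normalized to the HPO concept {hpo_id}.",
    "'{term}' corresponds to the identifier {hpo_id} in the Human Phenotype Ontology.",
]

def build_corpus(include_synonyms: bool, seed: int = 0) -> list[str]:
    """Generate fine-tuning sentences: NAME uses only standardized names;
    NAME+SYN additionally uses half of each concept's synonyms."""
    rng = random.Random(seed)
    sentences = []
    for entry in HPO_TERMS:
        surface_forms = [entry["name"]]
        if include_synonyms:
            syns = entry["synonyms"]
            # Sample half of the synonyms, per the NAME+SYN corpus description.
            surface_forms += rng.sample(syns, k=len(syns) // 2)
        for term in surface_forms:
            for template in TEMPLATES:
                sentences.append(template.format(term=term, hpo_id=entry["id"]))
    return sentences

if __name__ == "__main__":
    for fname, with_syn in [("name_corpus.jsonl", False), ("name_syn_corpus.jsonl", True)]:
        with open(fname, "w", encoding="utf-8") as f:
            for sentence in build_corpus(with_syn):
                f.write(json.dumps({"text": sentence}) + "\n")
```

One JSON object per line (JSONL) is a common input format for LLM fine-tuning pipelines such as those used with Llama2-7B, though the abstract does not specify the authors' exact format.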
Media type: E-Article
Year of publication: 2024
Published: 2024
Contained in: Complete record - year:2024
Contained in: bioRxiv : the preprint server for biology - (2024), 14 Apr.
Language: English
Contributors: Wang, Andy [author]
Links:
Subjects: Concept normalization
Notes: Date Revised 25.04.2024; published: Electronic; Citation Status: PubMed-not-MEDLINE
DOI: 10.1101/2023.12.28.573586
Funding:
Funding institution / project title:
PPN (catalog ID): NLM367253194
LEADER 01000caa a22002652 4500
001    NLM367253194
003    DE-627
005    20240425232718.0
007    cr uuu---uuuuu
008    240118s2024 xx |||||o 00| ||eng c
024 7  |a 10.1101/2023.12.28.573586 |2 doi
028 52 |a pubmed24n1386.xml
035    |a (DE-627)NLM367253194
035    |a (NLM)38234802
035    |a (PII)2023.12.28.573586
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Wang, Andy |e verfasserin |4 aut
245 10 |a Fine-tuning Large Language Models for Rare Disease Concept Normalization
264  1 |c 2024
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Revised 25.04.2024
500    |a published: Electronic
500    |a Citation Status PubMed-not-MEDLINE
520    |a Objective: We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO)
520    |a Methods: We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) additionally includes half of each concept's synonyms. We then fine-tuned Llama 2 (Llama2-7B) on each corpus and evaluated the models using a range of sentence prompts and various phenotype terms
520    |a Results: When the phenotype terms targeted for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 achieved only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced into the phenotype terms, the accuracy of NAME and NAME+SYN dropped to 10.2% and 36.1%, respectively, but rose to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from the HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy
520    |a Conclusion: Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laypersons' terms. Our approach provides a solution for using LLMs to identify named medical entities in clinical narratives while normalizing them to standard concepts in a controlled vocabulary
650  4 |a Preprint
650  4 |a HPO
650  4 |a Large language model
650  4 |a Llama2
650  4 |a concept normalization
650  4 |a fine-tuning
700 1  |a Liu, Cong |e verfasserin |4 aut
700 1  |a Yang, Jingye |e verfasserin |4 aut
700 1  |a Weng, Chunhua |e verfasserin |4 aut
773 08 |i Enthalten in |t bioRxiv : the preprint server for biology |d 2020 |g (2024) vom: 14. Apr. |w (DE-627)NLM31090014X |7 nnns
773 18 |g year:2024 |g day:14 |g month:04
856 40 |u http://dx.doi.org/10.1101/2023.12.28.573586 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a GBV_NLM
951    |a AR
952    |j 2024 |b 14 |c 04