reguloGPT: Harnessing GPT for Knowledge Graph Construction of Molecular Regulatory Pathways

Motivation: Molecular Regulatory Pathways (MRPs) are crucial for understanding biological functions. Knowledge Graphs (KGs) have become vital in organizing and analyzing MRPs, providing structured representations of complex biological interactions. Current tools for mining KGs from biomedical literature are inadequate for capturing the complex, hierarchical relationships and contextual information of MRPs. Large Language Models (LLMs) like GPT-4 offer a promising solution, with advanced capabilities to decipher the intricate nuances of language. However, their potential for end-to-end KG construction, particularly for MRPs, remains largely unexplored.

Results: We present reguloGPT, a novel GPT-4-based in-context learning prompt designed for end-to-end joint named entity recognition, N-ary relationship extraction, and context prediction from a sentence that describes regulatory interactions with MRPs. Our reguloGPT approach introduces a context-aware relational graph that effectively embodies the hierarchical structure of MRPs and resolves semantic inconsistencies by embedding context directly within relational edges. We created a benchmark dataset of 400 annotated PubMed titles on N6-methyladenosine (m6A) regulation. Rigorous evaluation of reguloGPT on the benchmark dataset demonstrated marked improvement over existing algorithms. We further developed a novel G-Eval scheme, leveraging GPT-4 for annotation-free performance evaluation, and demonstrated its agreement with traditional annotation-based evaluations. Using reguloGPT predictions on m6A-related titles, we constructed the m6A-KG and demonstrated its utility in elucidating m6A's regulatory mechanisms in cancer phenotypes across various cancers. These results underscore reguloGPT's transformative potential for extracting biological knowledge from the literature.
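
As a rough illustration of what "embedding context directly within relational edges" could look like, the following minimal sketch stores context as part of each edge rather than as separate nodes. It is not taken from the reguloGPT source code: the class `RegulatoryEdge`, the helpers `build_graph` and `edges_in_context`, and the entities `GeneA`, `GeneB`, `cancer type X` and `cancer type Y` are all hypothetical placeholders chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RegulatoryEdge:
    """One directed regulatory relation with its context attached (hypothetical schema)."""
    source: str
    relation: str
    target: str
    context: tuple = ()  # (key, value) pairs, e.g. (("disease", "cancer type X"),)


def build_graph(edges):
    """Group edges by source entity to form a simple adjacency map."""
    graph = defaultdict(list)
    for edge in edges:
        graph[edge.source].append(edge)
    return dict(graph)


def edges_in_context(graph, key, value):
    """Return every edge whose context contains the given (key, value) pair."""
    return [
        edge
        for edge_list in graph.values()
        for edge in edge_list
        if (key, value) in edge.context
    ]


# Placeholder entities only; these are not findings from the paper.
edges = [
    RegulatoryEdge("GeneA", "upregulates", "GeneB", (("disease", "cancer type X"),)),
    RegulatoryEdge("GeneA", "downregulates", "GeneB", (("disease", "cancer type Y"),)),
]
graph = build_graph(edges)
matches = edges_in_context(graph, "disease", "cancer type X")
```

Under this assumed schema, two apparently contradictory relations between the same pair of entities can coexist, because each edge carries the context in which it holds.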

Availability and implementation: The source code of reguloGPT, the m6A title and benchmark datasets, and m6A-KG are available at: https://github.com/Huang-AI4Medicine-Lab/reguloGPT.

Media type:

Electronic article

Year of publication:

2024

Published:

2024

Contained in:

bioRxiv : the preprint server for biology - (2024), 30 Jan.

Language:

English

Contributors:

Wu, Xidong [Author]
Zeng, Yiming [Author]
Das, Arun [Author]
Jo, Sumin [Author]
Zhang, Tinghe [Author]
Patel, Parth [Author]
Zhang, Jianqiu [Author]
Gao, Shou-Jiang [Author]
Pratt, Dexter [Author]
Chiu, Yu-Chiao [Author]
Huang, Yufei [Author]

Subjects:

GPT
In Context Learning
Knowledge Graph
M6A mRNA Methylation
Molecular Regulatory Pathways
Preprint

Notes:

Date Revised 12.02.2024

published: Electronic

Citation Status PubMed-not-MEDLINE

doi:

10.1101/2024.01.27.577521

PPN (catalog ID):

NLM368024709