Inferring functions of coding and non-coding genes using epigenomic patterns and deciphering the effect of combinatorics of transcription factors binding at promoters

Abstract: The number of annotated genes in the human genome has increased tremendously, and understanding their biological roles through experimental methods alone is challenging. A computational approach is needed to infer gene function, particularly for non-coding RNAs, with reliable explainability. We used genomic features present at both coding and non-coding genes, such as transcription factor (TF) binding patterns, histone modifications, and DNase hypersensitivity profiles, to predict ontology-based gene functions. Our approach for gene function prediction (GFPred) made reliable predictions (>90% balanced accuracy) for 486 gene-sets. Further analysis revealed that predictability using only TF-binding patterns at promoters is also high, which paved the way for studying the effect of TF combinatorics. The predicted associations between functions and genes were validated for reliability using PubMed abstract mining. Clustering functions based on shared top predictive TFs revealed many latent groups of gene-sets involved in common major biological processes. Available CRISPR screens also supported the inferred association of genes with the major biological processes of these latent groups of gene-sets. For the explainability of our approach, we also gained further insight into the effect of TF-binding combinatorics (especially TF pairs) on association with biological functions.
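Balanced accuracy, the metric behind the ">90%" figure, is the mean of sensitivity (recall on genes annotated to a gene-set) and specificity (recall on genes not annotated to it). It matters here because most genes fall outside any given gene-set, so plain accuracy can look high for a predictor that never assigns the function. The following is a minimal illustrative sketch of the standard metric, not code from GFPred; it assumes binary per-gene labels (1 = annotated to the gene-set, 0 = not), and the function name `balanced_accuracy` is hypothetical.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels.

    Assumption (illustrative only): 1 = gene annotated to the gene-set,
    0 = gene not annotated to it.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on annotated genes
    specificity = tn / (tn + fp)  # recall on non-annotated genes
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Hypothetical gene-set: 2 member genes out of 10.
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    # A predictor that never assigns the function: 80% plain accuracy,
    # but only 0.5 balanced accuracy (no better than chance).
    print(balanced_accuracy(labels, [0] * 10))
```

In the usage example, the all-negative predictor scores 0.5 because its sensitivity is 0 and its specificity is 1, which shows why a class-imbalance-aware metric is the appropriate yardstick for gene-set prediction.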

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org (2023), 20 Nov.

Language:

English

Contributors:

Chandra, Omkar [Author]
Sharma, Madhu [Author]
Pandey, Neetesh [Author]
Jha, Indra Prakash [Author]
Mishra, Shreya [Author]
Kong, Say Li [Author]
Kumar, Vibhor [Author]

Links:

Full text [free access]

Subjects:

570
Biology

doi:

10.1101/2022.04.17.488570

funding:

Funding institution / project title:

PPN (catalog ID):

XBI03578136X