A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction

Enhancers are short regulatory DNA fragments that are bound by proteins called activators. They are free-bound, distant elements that play a vital role in controlling gene expression. Identifying enhancers and their strength is challenging because of their dynamic nature. Although some machine learning methods exist to accelerate the identification process, their prediction accuracy and efficiency still need improvement. In this regard, we propose a two-layer prediction model with an enhanced feature extraction strategy that combines features from an improved position-specific amino acid propensity (PSTKNC) method with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches are concatenated and passed through a simple artificial neural network (ANN) to identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on the benchmark nine-cell-line chromatin dataset. A 10-fold cross-validation is employed to evaluate the model's performance. The results show that the proposed model achieves an accuracy of 94.50% and a Matthews correlation coefficient (MCC) of 0.8903 in predicting enhancers, and that it also performs fairly well on an independent test when compared with all other existing methods.
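To make the feature combination described in the abstract more concrete, the following is a minimal sketch of two of the three descriptors, ENAC and CKSNAP, as they are commonly defined for nucleotide sequences, followed by their concatenation into one feature vector. It is not the authors' implementation. The function names, the window size of 5, and the maximum gap of 5 are illustrative assumptions, and the PSTKNC component is omitted because it depends on statistics estimated from labelled training data.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def enac(seq, window=5):
    """ENAC sketch: nucleotide frequencies inside each sliding window.

    `window` is an assumed default, not a value taken from the paper.
    """
    seq = seq.upper()
    features = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        # Four frequencies (A, C, G, T) per window position.
        features.extend(chunk.count(n) / window for n in NUCLEOTIDES)
    return features


def cksnap(seq, max_gap=5):
    """CKSNAP sketch: frequencies of the 16 nucleotide pairs separated by
    0..max_gap intervening positions.

    `max_gap` is an assumed default, not a value taken from the paper.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:  # skip pairs containing ambiguous bases such as N
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


def combined_features(seq, window=5, max_gap=5):
    """Concatenate the ENAC and CKSNAP vectors into one feature vector."""
    return enac(seq, window) + cksnap(seq, max_gap)
```

For a sequence of length L, this sketch yields 4 × (L − window + 1) ENAC values and 16 × (max_gap + 1) CKSNAP values. A vector built this way (extended with the third descriptor) would be the input to a classifier such as the ANN mentioned in the abstract.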

Media type:

E-article

Year of publication:

2022

Published:

2022

Contained in:

Journal of bioinformatics and computational biology - 20(2022), 2, dated 19 Apr., page 2250005

Language:

English

Contributors:

Amilpur, Santhosh [Author]
Bhukya, Raju [Author]

Links:

Full text

Subjects:

9007-49-2
Artificial neural network
DNA
Enhancers
Feature extraction
Gene regulation
Genome annotation
Journal Article
Research Support, Non-U.S. Gov't

Notes:

Date Completed 10.05.2022

Date Revised 15.07.2022

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1142/S0219720022500056

funding:

Funding institution / Project title:

PPN (catalog ID):

NLM337949573