A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction
Enhancers are short regulatory DNA fragments that are bound by proteins called activators. They are free-bound, distant elements that play a vital role in controlling gene expression. Identifying enhancers and their strength is challenging because of their dynamic nature. Although some machine learning methods exist to accelerate the identification process, their prediction accuracy and efficiency still need improvement. In this regard, we propose a two-layer prediction model with an enhanced feature extraction strategy that combines features from an improved position-specific amino acid propensity (PSTKNC) method with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches are concatenated and passed through a simple artificial neural network (ANN) to identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on a benchmark nine-cell-line chromatin dataset. A 10-fold cross-validation method is employed to evaluate the model's performance. The results show that the proposed model achieves an accuracy of 94.50% and a Matthews correlation coefficient (MCC) of 0.8903 in predicting enhancers, and that it also performs fairly well on an independent test when compared with all other existing methods.
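As an illustration of two of the feature extraction steps named in the abstract, the following is a minimal Python sketch of ENAC and CKSNAP, followed by concatenation and an MCC calculation. It assumes the commonly used definitions of these descriptors (sliding-window nucleotide frequencies for ENAC, gapped pair frequencies for CKSNAP); the function names, the window size of 5, and the maximum gap of 5 are illustrative assumptions, not the authors' settings. PSTKNC and the neural network are omitted because the abstract does not specify them.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 16 nucleotide pairs


def enac(seq, window=5):
    """ENAC sketch: A/C/G/T frequencies in each sliding window (window size is an assumption)."""
    features = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        features.extend(chunk.count(base) / window for base in ALPHABET)
    return features


def cksnap(seq, max_gap=5):
    """CKSNAP sketch: frequencies of the 16 pairs separated by 0..max_gap bases (max_gap is an assumption)."""
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:  # skip pairs containing non-ACGT symbols
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy sequence; concatenation mirrors the feature combination step.
seq = "ACGTACGTAC"
features = enac(seq) + cksnap(seq)
print(len(features))  # 120 = 6 windows * 4 bases + 6 gaps * 16 pairs
```

In a full pipeline, one such concatenated vector per sequence would be fed to a classifier and scored under 10-fold cross-validation; the toy sequence here only demonstrates the shape of the features.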
Media type | E-article
---|---
Year of publication | 2022
Published | 2022
Contained in | To the parent record - volume:20
Contained in | Journal of bioinformatics and computational biology - 20(2022), 2, 19 Apr., page 2250005
Language | English
Contributors | Amilpur, Santhosh [author]
Subjects | 9007-49-2
Notes | Date Completed 10.05.2022; Date Revised 15.07.2022; published: Print-Electronic; Citation Status MEDLINE
doi | 10.1142/S0219720022500056
PPN (catalog ID) | NLM337949573
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM337949573 | ||
003 | DE-627 | ||
005 | 20231225235557.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1142/S0219720022500056 |2 doi | |
028 | 5 | 2 | |a pubmed24n1126.xml |
035 | |a (DE-627)NLM337949573 | ||
035 | |a (NLM)35264081 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Amilpur, Santhosh |e author |4 aut | |
245 | 1 | 2 | |a A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computer media |b c |2 rdamedia | ||
338 | |a Online resource |b cr |2 rdacarrier | ||
500 | |a Date Completed 10.05.2022 | ||
500 | |a Date Revised 15.07.2022 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a Enhancers are short regulatory DNA fragments that are bound by proteins called activators. They are free-bound, distant elements that play a vital role in controlling gene expression. Identifying enhancers and their strength is challenging because of their dynamic nature. Although some machine learning methods exist to accelerate the identification process, their prediction accuracy and efficiency still need improvement. In this regard, we propose a two-layer prediction model with an enhanced feature extraction strategy that combines features from an improved position-specific amino acid propensity (PSTKNC) method with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches are concatenated and passed through a simple artificial neural network (ANN) to identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on a benchmark nine-cell-line chromatin dataset. A 10-fold cross-validation method is employed to evaluate the model's performance. The results show that the proposed model achieves an accuracy of 94.50% and a Matthews correlation coefficient (MCC) of 0.8903 in predicting enhancers, and that it also performs fairly well on an independent test when compared with all other existing methods | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
650 | 4 | |a Enhancers | |
650 | 4 | |a artificial neural network | |
650 | 4 | |a feature extraction | |
650 | 4 | |a gene regulation | |
650 | 4 | |a genome annotation | |
650 | 7 | |a DNA |2 NLM | |
650 | 7 | |a 9007-49-2 |2 NLM | |
700 | 1 | |a Bhukya, Raju |e author |4 aut | |
773 | 0 | 8 | |i Contained in |t Journal of bioinformatics and computational biology |d 2003 |g 20(2022), 2, 19 Apr., page 2250005 |w (DE-627)NLM149554192 |x 1757-6334 |7 nnns |
773 | 1 | 8 | |g volume:20 |g year:2022 |g number:2 |g day:19 |g month:04 |g pages:2250005 |
856 | 4 | 0 | |u http://dx.doi.org/10.1142/S0219720022500056 |3 Full text |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 20 |j 2022 |e 2 |b 19 |c 04 |h 2250005 |