An attention-based hybrid deep neural networks for accurate identification of transcription factor binding sites
Abstract Transcription factors (TFs) control gene expression by binding to specific regions of the DNA sequence. TFs play an important role in various disease processes, and their identification helps in understanding the gene regulation underlying disease risk. Currently, the most powerful models for predicting binding sites between TFs and DNA sequences from ChIP-Seq datasets lag in feature extraction capability. We propose two models, PCLAtt and TranAtt, for the prediction of 690 TF-cell line pairs from DNA sequence data. PCLAtt consists of two sets of convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) layers in parallel, followed by a multi-head attention layer and a weight-shared dense layer, all of which contribute to extracting efficient features from the DNA sequence. TranAtt consists of the convolution layers of a pre-trained model along with a BiLSTM layer and an attention layer. The convolutional layers act as a motif scanner, and the BiLSTM layer learns the regulatory grammar of the motifs. Further, the attention mechanism gives more importance to those regions of the DNA sequence that contain transcription factor binding motifs, resulting in better performance of the proposed models. PCLAtt outperformed other state-of-the-art methods such as DeepSEA, DanQ, TBiNet and DeepATT in predicting binding sites between TFs and DNA sequences.
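The architecture the abstract describes (parallel CNN-BiLSTM branches, multi-head attention over sequence positions, a shared dense head) can be sketched as follows. This is a minimal illustration only: the layer sizes, kernel widths, pooling step and the `PCLAttSketch` name are assumptions for demonstration, not the authors' published hyperparameters.

```python
# Minimal sketch of a PCLAtt-style model. All hyperparameters here are
# illustrative assumptions, not the values reported in the paper.
import torch
import torch.nn as nn

class PCLAttSketch(nn.Module):
    def __init__(self, n_channels=32, kernel_sizes=(8, 12), n_heads=4):
        super().__init__()
        # Two parallel CNN branches: convolutions act as motif scanners
        # with different receptive widths.
        self.convs = nn.ModuleList(
            nn.Conv1d(4, n_channels, k) for k in kernel_sizes
        )
        # One BiLSTM per branch learns the "regulatory grammar" of motifs.
        self.lstms = nn.ModuleList(
            nn.LSTM(n_channels, n_channels // 2, batch_first=True,
                    bidirectional=True)
            for _ in kernel_sizes
        )
        # Multi-head self-attention emphasizes positions carrying binding motifs.
        self.attn = nn.MultiheadAttention(n_channels, n_heads, batch_first=True)
        # A single dense layer shared across both branches' features.
        self.head = nn.Linear(n_channels, 1)

    def forward(self, x):  # x: (batch, 4, seq_len) one-hot-encoded DNA
        feats = []
        for conv, lstm in zip(self.convs, self.lstms):
            h = torch.relu(conv(x)).transpose(1, 2)  # (batch, L', channels)
            h, _ = lstm(h)                           # (batch, L', channels)
            feats.append(h)
        h = torch.cat(feats, dim=1)   # concatenate branch outputs over positions
        h, _ = self.attn(h, h, h)     # self-attention over sequence positions
        h = h.mean(dim=1)             # pool over the sequence dimension
        return torch.sigmoid(self.head(h))  # binding probability per sequence

model = PCLAttSketch()
x = torch.zeros(2, 4, 101)  # two dummy 101-bp one-hot sequences
out = model(x)
```

The sigmoid output models a per-sequence binding probability; in the paper's multi-task setting the head would instead emit one score per TF-cell line pair.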
Media type: E-Article
Year of publication: 2022
Published: 2022
Contained in: Complete record - volume:34
Contained in: Neural computing & applications - 34(2022), 21, 29 June, pages 19051-19060
Language: English
Contributors: Bhukya, Raju [author]
Links: Full text [license required]
Subjects: Convolution neural network-bidirectional long short-term memory
Notes: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
doi: 10.1007/s00521-022-07502-z
Funding:
Funding institution / project title:
PPN (catalogue ID): OLC2132439990
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | OLC2132439990 | ||
003 | DE-627 | ||
005 | 20230506080036.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230506s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1007/s00521-022-07502-z |2 doi | |
035 | |a (DE-627)OLC2132439990 | ||
035 | |a (DE-He213)s00521-022-07502-z-e | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 004 |q VZ |
100 | 1 | |a Bhukya, Raju |e verfasserin |4 aut | |
245 | 1 | 0 | |a An attention-based hybrid deep neural networks for accurate identification of transcription factor binding sites |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022 | ||
520 | |a Abstract Transcription factors (TFs) control gene expression by binding to specific regions of the DNA sequence. TFs play an important role in various disease processes, and their identification helps in understanding the gene regulation underlying disease risk. Currently, the most powerful models for predicting binding sites between TFs and DNA sequences from ChIP-Seq datasets lag in feature extraction capability. We propose two models, PCLAtt and TranAtt, for the prediction of 690 TF-cell line pairs from DNA sequence data. PCLAtt consists of two sets of convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) layers in parallel, followed by a multi-head attention layer and a weight-shared dense layer, all of which contribute to extracting efficient features from the DNA sequence. TranAtt consists of the convolution layers of a pre-trained model along with a BiLSTM layer and an attention layer. The convolutional layers act as a motif scanner, and the BiLSTM layer learns the regulatory grammar of the motifs. Further, the attention mechanism gives more importance to those regions of the DNA sequence that contain transcription factor binding motifs, resulting in better performance of the proposed models. PCLAtt outperformed other state-of-the-art methods such as DeepSEA, DanQ, TBiNet and DeepATT in predicting binding sites between TFs and DNA sequences. | ||
650 | 4 | |a Transcription factors | |
650 | 4 | |a Convolution neural network-bidirectional long short-term memory | |
650 | 4 | |a Multi-head attention | |
650 | 4 | |a Weight-shared dense | |
700 | 1 | |a Kumari, Archana |4 aut | |
700 | 1 | |a Dasari, Chandra Mohan |4 aut | |
700 | 1 | |a Amilpur, Santhosh |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Neural computing & applications |d Springer London, 1993 |g 34(2022), 21 vom: 29. Juni, Seite 19051-19060 |h Online-Ressource |w (DE-627)271595574 |w (DE-600)1480526-1 |w (DE-576)096188545 |x 1433-3058 |7 nnns |
773 | 1 | 8 | |g volume:34 |g year:2022 |g number:21 |g day:29 |g month:06 |g pages:19051-19060 |
856 | 4 | 0 | |u https://dx.doi.org/10.1007/s00521-022-07502-z |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a GBV_ILN_11 | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_31 | ||
912 | |a GBV_ILN_32 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_74 | ||
912 | |a GBV_ILN_90 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_100 | ||
912 | |a GBV_ILN_101 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_120 | ||
912 | |a GBV_ILN_138 | ||
912 | |a GBV_ILN_150 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_152 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_170 | ||
912 | |a GBV_ILN_171 | ||
912 | |a GBV_ILN_187 | ||
912 | |a GBV_ILN_206 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_224 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_250 | ||
912 | |a GBV_ILN_267 | ||
912 | |a GBV_ILN_281 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_370 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_636 | ||
912 | |a GBV_ILN_702 | ||
912 | |a GBV_ILN_2001 | ||
912 | |a GBV_ILN_2003 | ||
912 | |a GBV_ILN_2004 | ||
912 | |a GBV_ILN_2005 | ||
912 | |a GBV_ILN_2006 | ||
912 | |a GBV_ILN_2007 | ||
912 | |a GBV_ILN_2008 | ||
912 | |a GBV_ILN_2009 | ||
912 | |a GBV_ILN_2010 | ||
912 | |a GBV_ILN_2011 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_2015 | ||
912 | |a GBV_ILN_2020 | ||
912 | |a GBV_ILN_2021 | ||
912 | |a GBV_ILN_2025 | ||
912 | |a GBV_ILN_2026 | ||
912 | |a GBV_ILN_2027 | ||
912 | |a GBV_ILN_2031 | ||
912 | |a GBV_ILN_2034 | ||
912 | |a GBV_ILN_2037 | ||
912 | |a GBV_ILN_2038 | ||
912 | |a GBV_ILN_2039 | ||
912 | |a GBV_ILN_2044 | ||
912 | |a GBV_ILN_2048 | ||
912 | |a GBV_ILN_2049 | ||
912 | |a GBV_ILN_2055 | ||
912 | |a GBV_ILN_2056 | ||
912 | |a GBV_ILN_2057 | ||
912 | |a GBV_ILN_2059 | ||
912 | |a GBV_ILN_2061 | ||
912 | |a GBV_ILN_2064 | ||
912 | |a GBV_ILN_2065 | ||
912 | |a GBV_ILN_2068 | ||
912 | |a GBV_ILN_2088 | ||
912 | |a GBV_ILN_2093 | ||
912 | |a GBV_ILN_2106 | ||
912 | |a GBV_ILN_2107 | ||
912 | |a GBV_ILN_2108 | ||
912 | |a GBV_ILN_2110 | ||
912 | |a GBV_ILN_2111 | ||
912 | |a GBV_ILN_2112 | ||
912 | |a GBV_ILN_2113 | ||
912 | |a GBV_ILN_2118 | ||
912 | |a GBV_ILN_2119 | ||
912 | |a GBV_ILN_2129 | ||
912 | |a GBV_ILN_2143 | ||
912 | |a GBV_ILN_2144 | ||
912 | |a GBV_ILN_2147 | ||
912 | |a GBV_ILN_2148 | ||
912 | |a GBV_ILN_2152 | ||
912 | |a GBV_ILN_2153 | ||
912 | |a GBV_ILN_2188 | ||
912 | |a GBV_ILN_2190 | ||
912 | |a GBV_ILN_2232 | ||
912 | |a GBV_ILN_2336 | ||
912 | |a GBV_ILN_2446 | ||
912 | |a GBV_ILN_2470 | ||
912 | |a GBV_ILN_2474 | ||
912 | |a GBV_ILN_2507 | ||
912 | |a GBV_ILN_2522 | ||
912 | |a GBV_ILN_2548 | ||
912 | |a GBV_ILN_4035 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4046 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4242 | ||
912 | |a GBV_ILN_4246 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4251 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4326 | ||
912 | |a GBV_ILN_4328 | ||
912 | |a GBV_ILN_4333 | ||
912 | |a GBV_ILN_4334 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4336 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4393 | ||
912 | |a GBV_ILN_4700 | ||
951 | |a AR | ||
952 | |d 34 |j 2022 |e 21 |b 29 |c 06 |h 19051-19060 |