WSHNN: A Weakly Supervised Hybrid Neural Network for the Identification of DNA-Protein Binding Sites
Copyright © Bentham Science Publishers. For any queries, please email epub@benthamscience.net.
INTRODUCTION: Transcription factors are vital biological components that control gene expression, and their primary biological function is to recognize DNA sequences. Ongoing research has shown that the specificity of DNA-protein binding plays a significant role in gene expression, regulation, and especially gene therapy. Convolutional Neural Networks (CNNs) have become increasingly popular for predicting DNA-protein-specific binding sites, but their prediction accuracy still needs to be improved.
METHODS: We proposed WSHNN, a framework that combines Multi-Instance Learning (MIL) with a hybrid neural network. First, we used sliding windows to split each DNA sequence into multiple overlapping instances, so that each sequence formed a bag containing multiple instances. Then, the instances were encoded using K-mer encoding. Afterward, a hybrid neural network calculated the score of each instance in the same bag separately.
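The windowing and encoding steps in METHODS can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the window length, stride, value of K, or the exact K-mer representation, so `window`, `stride`, `k`, and the one-hot layout below are all assumptions, and the function names are hypothetical. Each sequence is treated as one bag whose instances are the overlapping windows, which is the usual MIL convention.

```python
from itertools import product


def sliding_windows(seq, window, stride):
    """Split one DNA sequence (a bag) into overlapping windows (its instances).

    `window` and `stride` are illustrative parameters, not values from the paper.
    """
    if window >= len(seq):
        return [seq]
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def kmer_index(k, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to a column index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def kmer_encode(instance, k, index):
    """One-hot encode each overlapping k-mer of an instance (assumed layout).

    K-mers containing symbols outside the alphabet (e.g. 'N') stay all-zero.
    """
    rows = []
    for i in range(len(instance) - k + 1):
        row = [0] * len(index)
        kmer = instance[i:i + k]
        if kmer in index:
            row[index[kmer]] = 1
        rows.append(row)
    return rows


# Toy example: a 10-base sequence, windows of 6 bases taken every 2 bases.
bag = sliding_windows("ACGTACGTAC", window=6, stride=2)
index = kmer_index(2)
encoded = kmer_encode(bag[0], 2, index)
print(bag)                           # the instances of this bag
print(len(encoded), len(encoded[0]))  # rows per instance, columns per row
```

With K = 2 there are 16 possible K-mers, so each instance of length 6 becomes a 5 x 16 matrix. A scoring network would then consume one such matrix per instance.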
RESULTS: Finally, a fully connected network produced the final prediction for that bag. The framework achieved a precision (Pre) of 90.73%, a recall of 82.77%, an accuracy (Acc) of 87.17%, an F1-score of 0.8657, and a Matthews correlation coefficient (MCC) of 0.7462. In addition, we discussed the performance of K-mer encoding. Compared with other state-of-the-art efforts, the model performs better with sequence information.
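As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two values given in RESULTS. The sketch below uses only those published numbers. Accuracy and MCC cannot be recomputed this way because the abstract does not give the underlying confusion-matrix counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (converted from percentages).
reported_precision = 0.9073
reported_recall = 0.8277

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # 0.8657
```

The recomputed value agrees with the reported F1-score to four decimal places.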
CONCLUSION: The experimental results indicate that Bi-directional Long Short-Term Memory (Bi-LSTM) can better capture long-range relationships within DNA sequences. The code and data are available at https://github.com/baowz12345/Weak_Super_Network.
Media type: E-article
Year of publication: 2024
Published: 2024
Contained in: To the parent record - year:2024
Contained in: Current computer-aided drug design - (2024), 12 Feb.
Language: English
Contributors: Bao, Wenzheng [author]
Subjects: DNA-Protein binding; Weakly supervised
Notes: Date Revised 13.02.2024; published: Print-Electronic; Citation Status: Publisher
doi: 10.2174/0115734099277249240129114123
PPN (catalog ID): NLM368379205

LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM368379205 | ||
003 | DE-627 | ||
005 | 20240213233133.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240213s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.2174/0115734099277249240129114123 |2 doi | |
028 | 5 | 2 | |a pubmed24n1291.xml |
035 | |a (DE-627)NLM368379205 | ||
035 | |a (NLM)38347788 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Bao, Wenzheng |e verfasserin |4 aut | |
245 | 1 | 0 | |a WSHNN |b A Weakly Supervised Hybrid Neural Network for the Identification of DNA-Protein Binding Sites |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Revised 13.02.2024 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status Publisher | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a DNA-Protein binding; Weakly supervised | |
650 | 4 | |a multiple-instance learning; bioinformatics; transcription factor binding site prediction | |
700 | 1 | |a Chen, Baitong |e verfasserin |4 aut | |
700 | 1 | |a Zhang, Yue |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Current computer-aided drug design |d 2008 |g (2024) vom: 12. Feb. |w (DE-627)NLM191691046 |x 1875-6697 |7 nnns |
773 | 1 | 8 | |g year:2024 |g day:12 |g month:02 |
856 | 4 | 0 | |u http://dx.doi.org/10.2174/0115734099277249240129114123 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |j 2024 |b 12 |c 02 |