Single-sequence protein structure prediction by integrating protein language models
Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs, and thus an MSA-free structure prediction method is desired. Here, we develop a single-sequence-based protein structure prediction method, RaptorX-Single, by integrating several protein language models and a structure generation module, and then study its advantage over MSA-based methods. Our experimental results indicate that, in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), proteins with very few sequence homologs, and single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of protein language models affect performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2024 |
| Published | 2024 |
| Contained in (parent record) | volume:121 |
| Contained in | Proceedings of the National Academy of Sciences of the United States of America - 121 (2024), no. 13, 26 March, page e2308788121 |
| Language | English |
| Contributors | Jing, Xiaoyang [author] |
| Topics | Antibodies |
| Notes | Date Completed 22.03.2024; Date Revised 05.04.2024; published: Print-Electronic; Citation Status MEDLINE |
| doi | 10.1073/pnas.2308788121 |
| PPN (catalog ID) | NLM369970489 |

LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | NLM369970489 | ||
003 | DE-627 | ||
005 | 20240405233912.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240322s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1073/pnas.2308788121 |2 doi | |
028 | 5 | 2 | |a pubmed24n1366.xml |
035 | |a (DE-627)NLM369970489 | ||
035 | |a (NLM)38507445 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Jing, Xiaoyang |e verfasserin |4 aut | |
245 | 1 | 0 | |a Single-sequence protein structure prediction by integrating protein language models |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 22.03.2024 | ||
500 | |a Date Revised 05.04.2024 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs, and thus an MSA-free structure prediction method is desired. Here, we develop a single-sequence-based protein structure prediction method, RaptorX-Single, by integrating several protein language models and a structure generation module, and then study its advantage over MSA-based methods. Our experimental results indicate that, in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), proteins with very few sequence homologs, and single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of protein language models affect performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs. | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a antibody structure prediction | |
650 | 4 | |a protein language model | |
650 | 4 | |a protein structure prediction | |
650 | 4 | |a single mutation effect | |
650 | 4 | |a single-sequence protein structure prediction | |
650 | 7 | |a Proteins |2 NLM | |
650 | 7 | |a Antibodies |2 NLM | |
700 | 1 | |a Wu, Fandi |e verfasserin |4 aut | |
700 | 1 | |a Luo, Xiao |e verfasserin |4 aut | |
700 | 1 | |a Xu, Jinbo |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Proceedings of the National Academy of Sciences of the United States of America |d 1915 |g 121(2024), 13 vom: 26. März, Seite e2308788121 |w (DE-627)NLM000008982 |x 1091-6490 |7 nnns |
773 | 1 | 8 | |g volume:121 |g year:2024 |g number:13 |g day:26 |g month:03 |g pages:e2308788121 |
856 | 4 | 0 | |u http://dx.doi.org/10.1073/pnas.2308788121 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 121 |j 2024 |e 13 |b 26 |c 03 |h e2308788121 |