S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure

Proteins play an essential role in many biological and engineering processes. Large protein language models (PLMs) have great potential to reshape protein research by accelerating the determination of protein function and the design of proteins with desired functions. The predictive and design capacity of PLMs relies on the representations they learn from protein sequences. However, most PLMs lack crucial 3D structure information, which limits their predictive capacity in many applications, especially those that depend heavily on 3D structure. To address this issue, we introduce S-PLM, a 3D structure-aware PLM that uses multi-view contrastive learning to align the sequence and 3D structure of a protein in a coordinated latent space. S-PLM applies a Swin-Transformer to AlphaFold-predicted protein structures to embed structural information and fuses it into the sequence-based embedding from ESM2. Additionally, we provide a library of lightweight tuning tools to adapt S-PLM to diverse protein property prediction tasks. Our results show that S-PLM outperforms sequence-only PLMs on all protein clustering and classification tasks and performs competitively with state-of-the-art methods that require both sequence and structure inputs. S-PLM and its lightweight tuning tools are available at https://github.com/duolinwang/S-PLM/.
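The abstract describes aligning sequence and structure embeddings through contrastive learning but does not state the objective. As an illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of paired embeddings, where row i of the sequence batch and row i of the structure batch come from the same protein. This is not necessarily S-PLM's exact multi-view formulation. The function names and the temperature of 0.07 are hypothetical choices, and in practice the embeddings would come from the ESM2 and Swin-Transformer encoders rather than hand-written lists.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(
    seq_emb: List[Vector],
    struct_emb: List[Vector],
    temperature: float = 0.07,  # hypothetical value, not taken from S-PLM
) -> float:
    """CLIP-style symmetric InfoNCE loss (an assumed stand-in for S-PLM's objective).

    Matching (sequence, structure) pairs share a row index; every other row in
    the batch serves as a negative example.
    """
    seq = [l2_normalize(v) for v in seq_emb]
    struct = [l2_normalize(v) for v in struct_emb]
    # logits[i][j] = cosine similarity of sequence i and structure j, scaled by 1/temperature
    logits = [
        [sum(a * b for a, b in zip(s, t)) / temperature for t in struct]
        for s in seq
    ]
    # The transpose gives the structure-to-sequence direction.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    seqs = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_contrastive_loss(seqs, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: near zero
    print(symmetric_contrastive_loss(seqs, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: large
```

Minimizing this loss pulls each protein's sequence embedding toward its own structure embedding and pushes it away from the structures of other proteins in the batch, which is the general idea behind placing both views in a coordinated latent space.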

Media type:

Preprint

Year of publication:

2024

Published:

2024

Contained in:

bioRxiv.org (2024), 14 May

Language:

English

Contributors:

Wang, Duolin [Author]
Pourmirzaei, Mahdi [Author]
Abbas, Usman L [Author]
Zeng, Shuai [Author]
Manshour, Negin [Author]
Esmaili, Farzaneh [Author]
Poudel, Biplab [Author]
Jiang, Yuexu [Author]
Shao, Qing [Author]
Chen, Jin [Author]
Xu, Dong [Author]

Links:

Full text [free access]

Subjects:

570
Biology

doi:

10.1101/2023.08.06.552203

funding:

Funding institution / project title:

PPN (catalog ID):

XBI04044824X