Bioactive Peptide Recognition Based on NLP Pre-Train Algorithm

Bioactive peptides are peptide sequences within a protein that can regulate important bodily functions through their many activities. With the development of machine learning, a growing number of computational methods have been proposed for bioactive peptide recognition, so that the task no longer relies solely on tedious and time-consuming wet-lab experiments. However, the training and testing of existing models are limited to small datasets, which affects model performance. Inspired by the success of sequence classification with unlabeled data in natural language processing, we propose a pre-training method for bioactive peptide recognition. Pre-trained on large-scale protein sequence data, our method achieved the best performance in the identification of multi-functional peptides, including anti-cancer, anti-diabetic, anti-hypertensive, anti-inflammatory and anti-microbial peptides. In 5-fold cross-validation, our model's precision, coverage, accuracy and absolute true improved by 7.2%, 6.9%, 6.1% and 4.2%, respectively, compared with the advanced model. In addition, the results indicate that the model has superior prediction performance in single-functional peptide recognition, especially for anti-cancer and anti-microbial peptides, which have longer sequences.

Media type:

E-article

Year of publication:

2023

Published:

2023

Contained in:

IEEE/ACM transactions on computational biology and bioinformatics - 20(2023), 6, 1 Nov., pages 3809-3819

Language:

English

Contributors:

Jiang, Likun [Author]
Sun, Nan [Author]
Zhang, Yue [Author]
Yu, Xinyu [Author]
Liu, Xiangrong [Author]

Subjects:

Anti-Inflammatory Agents
Journal Article
Peptides

Notes:

Date Completed 26.12.2023

Date Revised 26.12.2023

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1109/TCBB.2023.3323295

PPN (catalog ID):

NLM363091068