Development and validation of machine learning models for the prediction of SH-2 containing protein tyrosine phosphatase 2 inhibitors
© 2023. The Author(s), under exclusive licence to Springer Nature Switzerland AG.
Discovering a new drug and bringing it to market is a highly challenging and resource-intensive process. Although modern drug discovery technologies have enabled the rapid identification of lead compounds, translating these leads into successful clinical candidates remains a major challenge. In recent years, the availability of large volumes of structural and biological data on diverse small molecules and macromolecules has allowed researchers to mine these multidimensional data with artificial intelligence-based predictive tools and to draw useful insights into the structural features of biological or therapeutic significance. The aim of this study was to use the available data on small-molecule inhibitors of Src homology 2 (SH2)-containing protein tyrosine phosphatase 2 (SHP2) to build machine learning (ML) models that can predict the SHP2 inhibitory potential of new compounds. The dataset contained 2739 unique small-molecule SHP2 inhibitors obtained from BindingDB, ChEMBL and recent literature. After data curation, predictive models including XGBoost, k-nearest neighbours and neural networks were developed and validated using tenfold cross-validation. Of the seven models developed, the XGBoost model showed excellent performance, with a ROC AUC of 0.96 and an accuracy of 0.97 on the test data. In addition, the Shapley Additive Explanations (SHAP) method was applied to gain a more in-depth understanding of how individual variables influence the model's predictions. In summary, the XGBoost model developed in this study can be useful for identifying novel SHP2 inhibitors and could therefore accelerate the discovery of new therapeutics for cancer.
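The abstract names tenfold cross-validation, ROC AUC and accuracy as the evaluation scheme, and k-nearest neighbours as one of the seven models. As an illustration only, the standard-library Python sketch below shows how such an evaluation loop can work. Everything in it is an assumption rather than the authors' pipeline: compounds are represented as hypothetical sets of fingerprint bit indices, similarity is Tanimoto, k = 5, a neighbour score of at least 0.5 counts as "active", and the data are synthetic toy vectors, not the 2739 curated SHP2 inhibitors. All function names (`tanimoto`, `knn_score`, `roc_auc`, `stratified_folds`, `cross_validate`, `make_toy_data`) are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical compound representation: the set of "on" bit indices of a binary fingerprint.
Fingerprint = Set[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_score(query: Fingerprint, train_x: Sequence[Fingerprint],
              train_y: Sequence[int], k: int = 5) -> float:
    """Fraction of the k most similar training compounds that are labelled active (1)."""
    ranked = sorted(zip(train_x, train_y), key=lambda xy: tanimoto(query, xy[0]), reverse=True)
    neighbours = ranked[:k]
    return sum(label for _, label in neighbours) / len(neighbours)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive (ties = 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y: Sequence[int], n_folds: int = 10, seed: int = 0) -> List[List[int]]:
    """Deal the indices of each class round-robin into folds so every fold keeps the class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_validate(x: Sequence[Fingerprint], y: Sequence[int],
                   n_folds: int = 10, k: int = 5) -> Tuple[float, float]:
    """Mean ROC AUC and mean accuracy of the kNN scorer over stratified folds."""
    aucs, accs = [], []
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        tx = [x[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        scores = [knn_score(x[i], tx, ty, k) for i in test_idx]
        truth = [y[i] for i in test_idx]
        aucs.append(roc_auc(truth, scores))
        preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold
        accs.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
    return sum(aucs) / len(aucs), sum(accs) / len(accs)


def make_toy_data(n_per_class: int = 30, seed: int = 1) -> Tuple[List[Fingerprint], List[int]]:
    """Synthetic stand-in data: 'actives' draw from bits 0-9, 'inactives' from bits 10-19."""
    rng = random.Random(seed)
    x: List[Fingerprint] = []
    y: List[int] = []
    for label, core in ((1, range(0, 10)), (0, range(10, 20))):
        for _ in range(n_per_class):
            bits = set(rng.sample(list(core), 6))      # 6 of the 10 class-specific bits
            bits |= set(rng.sample(range(20, 60), 4))  # 4 shared "noise" bits
            x.append(bits)
            y.append(label)
    return x, y


if __name__ == "__main__":
    x, y = make_toy_data()
    auc, acc = cross_validate(x, y)
    print(f"mean ROC AUC = {auc:.2f}, mean accuracy = {acc:.2f}")
```

In practice one would more likely compute fingerprints with a cheminformatics toolkit and use a library's stratified k-fold and AUC routines; this version only makes the fold loop and the rank-based AUC explicit. It does not reproduce the XGBoost or SHAP parts of the study.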
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2023 |
| Contained in | Molecular diversity (2023), 08 Aug. |
| Language | English |
| Contributors | Adhikari, Nilanjan [author]; Ayyannan, Senthil Raja [author] |
| Subjects | Journal Article |
| Notes | Date revised 08.08.2023; published: Print-Electronic; citation status: Publisher |
| DOI | 10.1007/s11030-023-10710-x |
| PPN (catalog ID) | NLM360520812 |
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM360520812 | ||
003 | DE-627 | ||
005 | 20231226083205.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231226s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1007/s11030-023-10710-x |2 doi | |
028 | 5 | 2 | |a pubmed24n1201.xml |
035 | |a (DE-627)NLM360520812 | ||
035 | |a (NLM)37552436 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Adhikari, Nilanjan |e author |4 aut |
245 | 1 | 0 | |a Development and validation of machine learning models for the prediction of SH-2 containing protein tyrosine phosphatase 2 inhibitors |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
500 | |a Date Revised 08.08.2023 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status Publisher | ||
520 | |a © 2023. The Author(s), under exclusive licence to Springer Nature Switzerland AG. | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Machine learning | |
650 | 4 | |a QSAR | |
650 | 4 | |a SH2-containing protein tyrosine phosphatase 2 | |
650 | 4 | |a SHP2 inhibitors | |
650 | 4 | |a Virtual screening | |
700 | 1 | |a Ayyannan, Senthil Raja |e author |4 aut |
773 | 0 | 8 | |i Contained in |t Molecular diversity |d 1997 |g (2023) dated: 08 Aug. |w (DE-627)NLM091914590 |x 1573-501X |7 nnns |
773 | 1 | 8 | |g year:2023 |g day:08 |g month:08 |
856 | 4 | 0 | |u http://dx.doi.org/10.1007/s11030-023-10710-x |3 Full text |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |j 2023 |b 08 |c 08 |