"DompeKeys" : a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases
© 2024. The Author(s).
The conversion of chemical structures into computer-readable descriptors that capture key structural aspects is of pivotal importance in cheminformatics and computer-aided drug design. Molecular fingerprints are a widely employed class of descriptors; however, generating them is time-consuming for large databases and is susceptible to bias. Descriptors that accurately detect predefined structural fragments without lengthy generation procedures would therefore be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists and suitable for indexing databases with trillions of compounds. To this end, we developed the DompeKeys (DK) as an integral part of EXSCALATE, Dompé's end-to-end drug discovery platform. DK are a new substructure-based descriptor set that encodes the chemical features characterizing compounds of pharmaceutical interest. They are an exhaustive collection of curated SMARTS strings that define chemical features at different levels of complexity, from specific functional groups and structural patterns to simpler pharmacophoric points, corresponding to a network of hierarchically interconnected substructures. Because of their extended and hierarchical structure, DK perform well in different kinds of applications. In particular, we demonstrate that they are well suited for effective mapping of chemical space, as well as for substructure search and virtual screening. Notably, incorporating DK yields high-performing machine learning models for predicting both compound activity and the occurrence of metabolic reactions.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.

Field | Value
---|---
Media type | E-article
Year of publication | 2024
Published | 2024
Contained in | Journal of cheminformatics - 16(2024), 1 of 23 Feb., page 21
Language | English
Persons involved | Manelfi, Candida [author]
Subjects | Artificial intelligence
Notes | Date Revised 27.02.2024; published: Electronic; Citation Status PubMed-not-MEDLINE
doi | 10.1186/s13321-024-00813-4
PPN (catalog ID) | NLM368859479

LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM368859479 | ||
003 | DE-627 | ||
005 | 20240229164542.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240229s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/s13321-024-00813-4 |2 doi | |
028 | 5 | 2 | |a pubmed24n1309.xml |
035 | |a (DE-627)NLM368859479 | ||
035 | |a (NLM)38395961 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Manelfi, Candida |e verfasserin |4 aut | |
245 | 1 | 0 | |a "DompeKeys" |b a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Revised 27.02.2024 | ||
500 | |a published: Electronic | ||
500 | |a Citation Status PubMed-not-MEDLINE | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Artificial intelligence | |
650 | 4 | |a Chemical pattern search | |
650 | 4 | |a Chemical space | |
650 | 4 | |a Drug design | |
650 | 4 | |a Drug metabolism | |
650 | 4 | |a Machine learning | |
650 | 4 | |a SMARTS | |
650 | 4 | |a Scaffold analysis | |
700 | 1 | |a Tazzari, Valerio |e verfasserin |4 aut | |
700 | 1 | |a Lunghini, Filippo |e verfasserin |4 aut | |
700 | 1 | |a Cerchia, Carmen |e verfasserin |4 aut | |
700 | 1 | |a Fava, Anna |e verfasserin |4 aut | |
700 | 1 | |a Pedretti, Alessandro |e verfasserin |4 aut | |
700 | 1 | |a Stouten, Pieter F W |e verfasserin |4 aut | |
700 | 1 | |a Vistoli, Giulio |e verfasserin |4 aut | |
700 | 1 | |a Beccari, Andrea Rosario |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Journal of cheminformatics |d 2009 |g 16(2024), 1 vom: 23. Feb., Seite 21 |w (DE-627)NLM194976165 |x 1758-2946 |7 nnns |
773 | 1 | 8 | |g volume:16 |g year:2024 |g number:1 |g day:23 |g month:02 |g pages:21 |
856 | 4 | 0 | |u http://dx.doi.org/10.1186/s13321-024-00813-4 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 16 |j 2024 |e 1 |b 23 |c 02 |h 21 |