Application of the mol2vec Technology to Large-size Data Visualization and Analysis
© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Generative Topographic Mapping (GTM) is a dimensionality reduction method widely used for both data visualization and structure-activity modeling. High dimensionality of the initial data space may require significant computational resources and slow down GTM construction. It may therefore be worthwhile to reduce the number of descriptors used to encode molecular structures. Principal Component Analysis (PCA), a standard preprocessing tool, suffers from information loss upon dimensionality reduction. As an alternative, we propose to use the substructure vector embedding provided by the mol2vec technique. In addition to reducing data dimensionality, this technology also accounts for the proximity of substructures in molecular graphs. In this study, the dimensionality of large descriptor spaces of ISIDA fragment descriptors or Morgan fingerprints was reduced using either PCA or the mol2vec method. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks.
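The embedding step mentioned in the abstract can be illustrated with a small sketch. In mol2vec, each substructure identifier is mapped to a learned dense vector, and a molecule is represented by the sum of the vectors of its substructures. Everything concrete below is hypothetical and chosen for illustration only: the fragment names, the 3-dimensional toy vectors, and the function name `embed_molecule`. Skipping unknown substructures is a simplifying assumption, not the procedure used in the study.

```python
from typing import Dict, List, Sequence


def embed_molecule(substructures: Sequence[str],
                   embedding: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Sum the embedding vectors of a molecule's substructures."""
    vector = [0.0] * dim
    for identifier in substructures:
        sub_vector = embedding.get(identifier)
        if sub_vector is None:
            # Assumption: substructures missing from the table are skipped.
            continue
        for i, value in enumerate(sub_vector):
            vector[i] += value
    return vector


# Hypothetical toy embedding table (real embeddings are learned from a corpus).
TOY_EMBEDDING = {
    "frag_A": [0.5, 0.0, 0.25],
    "frag_B": [0.25, 0.5, 0.0],
    "frag_C": [0.0, 0.25, 0.5],
}

# A molecule written as a list of substructure identifiers; repeats are allowed.
molecule = ["frag_A", "frag_B", "frag_B", "frag_X"]
print(embed_molecule(molecule, TOY_EMBEDDING, dim=3))
```

The resulting vector has a fixed length `dim` regardless of how many distinct substructures occur in the data set, which is what would allow it to stand in for a much larger descriptor vector as input to GTM.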
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2020 |
| Published | 2020 |
| Contained in | Molecular informatics - 39(2020), 6, 23 June, page e1900170 (volume:39) |
| Language | English |
| Persons involved | Shibayama, Shojiro [author] |
| Topics | Distributed representation |
| Notes | Date Completed 09.07.2021; Date Revised 09.07.2021; published: Print-Electronic; Citation Status MEDLINE |
| DOI | 10.1002/minf.201900170 |
| PPN (catalog ID) | NLM306833379 |
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM306833379 | ||
003 | DE-627 | ||
005 | 20231225124236.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2020 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1002/minf.201900170 |2 doi | |
028 | 5 | 2 | |a pubmed24n1022.xml |
035 | |a (DE-627)NLM306833379 | ||
035 | |a (NLM)32090493 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Shibayama, Shojiro |e verfasserin |4 aut | |
245 | 1 | 0 | |a Application of the mol2vec Technology to Large-size Data Visualization and Analysis |
264 | 1 | |c 2020 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 09.07.2021 | ||
500 | |a Date Revised 09.07.2021 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. | ||
520 | |a Generative Topographic Mapping (GTM) is a dimensionality reduction method widely used for both data visualization and structure-activity modeling. High dimensionality of the initial data space may require significant computational resources and slow down GTM construction. It may therefore be worthwhile to reduce the number of descriptors used to encode molecular structures. Principal Component Analysis (PCA), a standard preprocessing tool, suffers from information loss upon dimensionality reduction. As an alternative, we propose to use the substructure vector embedding provided by the mol2vec technique. In addition to reducing data dimensionality, this technology also accounts for the proximity of substructures in molecular graphs. In this study, the dimensionality of large descriptor spaces of ISIDA fragment descriptors or Morgan fingerprints was reduced using either PCA or the mol2vec method. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks. | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
650 | 4 | |a Generative topographic mapping | |
650 | 4 | |a QSAR | |
650 | 4 | |a distributed representation | |
650 | 4 | |a fragment descriptors | |
650 | 4 | |a mol2vec | |
650 | 4 | |a substructure vector embedding | |
700 | 1 | |a Marcou, Gilles |e verfasserin |4 aut | |
700 | 1 | |a Horvath, Dragos |e verfasserin |4 aut | |
700 | 1 | |a Baskin, Igor I |e verfasserin |4 aut | |
700 | 1 | |a Funatsu, Kimito |e verfasserin |4 aut | |
700 | 1 | |a Varnek, Alexandre |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Molecular informatics |d 2010 |g 39(2020), 6 vom: 23. Juni, Seite e1900170 |w (DE-627)NLM209791799 |x 1868-1751 |7 nnns |
773 | 1 | 8 | |g volume:39 |g year:2020 |g number:6 |g day:23 |g month:06 |g pages:e1900170 |
856 | 4 | 0 | |u http://dx.doi.org/10.1002/minf.201900170 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 39 |j 2020 |e 6 |b 23 |c 06 |h e1900170 |