Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction

Copyright © 2024 Elsevier Ltd. All rights reserved.

The retention time (RT) of contaminants of emerging concern (CECs) in liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is crucial for database matching in non-targeted screening (NTS) analysis. In this study, we developed a machine learning (ML) model to predict RTs of CECs in NTS analysis. Using 1051 CEC standards, we evaluated Random Forest (RF), XGBoost, Support Vector Regression (SVR), and Artificial Neural Network (ANN) with molecular fingerprints and chemical descriptors to establish an optimal model. The SVR model utilizing chemical descriptors resulted in good predictive capacity with R²ext = 0.850 and r² = 0.925. The model was further validated through laboratory NTS compound characterization. When applied to examine CEC occurrence in a large wastewater treatment plant, we identified 40 level S1 CECs (confirmed structure by reference standard) and 234 level S2 compounds (probable structure by library spectrum match). The model predicted RTs for level S2 compounds, leading to the classification of 153 level S2 compounds with high confidence (ΔRT < 2 min). The model served as a robust filtering mechanism within the analytical framework. This study emphasizes the importance of predicted RTs in NTS analysis and highlights the potential of prediction models. Our research introduces a workflow that enhances NTS analysis by utilizing RT prediction models to determine compound confidence levels.
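
The ΔRT filter described in the abstract can be illustrated with a minimal sketch. It assumes each level S2 candidate already has an observed RT and a model-predicted RT in minutes, and it treats a candidate as high confidence when the absolute difference is below 2 min, as stated above. The class name, function names, candidate labels, and RT values are hypothetical placeholders and are not taken from the study; the RT prediction model itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A level S2 candidate (hypothetical structure for illustration)."""
    name: str
    observed_rt_min: float   # RT measured by LC-HRMS, in minutes
    predicted_rt_min: float  # RT from a prediction model, in minutes


def delta_rt(candidate: Candidate) -> float:
    """Absolute difference between observed and predicted RT, in minutes."""
    return abs(candidate.observed_rt_min - candidate.predicted_rt_min)


def split_by_rt_confidence(candidates, threshold_min=2.0):
    """Split candidates into (high, other) using the rule delta RT < threshold."""
    high, other = [], []
    for c in candidates:
        if delta_rt(c) < threshold_min:
            high.append(c)
        else:
            other.append(c)
    return high, other


# Made-up values, only to show how the filter behaves.
example = [
    Candidate("candidate_A", observed_rt_min=5.2, predicted_rt_min=6.0),
    Candidate("candidate_B", observed_rt_min=10.0, predicted_rt_min=13.5),
    Candidate("candidate_C", observed_rt_min=7.0, predicted_rt_min=9.0),
]
high, other = split_by_rt_confidence(example)
print([c.name for c in high])
print([c.name for c in other])
```

In this sketch a candidate whose ΔRT equals the threshold exactly is not counted as high confidence, because the abstract states the criterion as a strict inequality.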

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

To the parent record - volume:347

Contained in:

Environmental pollution (Barking, Essex : 1987) - 347(2024), 15 Apr., page 123763

Language:

English

Contributors:

Song, Dehao [Author]
Tang, Ting [Author]
Wang, Rui [Author]
Liu, He [Author]
Xie, Danping [Author]
Zhao, Bo [Author]
Dang, Zhi [Author]
Lu, Guining [Author]

Links:

Full text

Subjects:

Chemical descriptors
Confidence level
Contaminants of emerging concern
Journal Article
Practical applications
Support vector regression

Notes:

Date Completed 08.04.2024

Date Revised 08.04.2024

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1016/j.envpol.2024.123763

funding:

Funding institution / Project title:

PPN (catalog ID):

NLM369824334