Early detection of colorectal cancer by leveraging Dutch primary care consultation notes with free text embeddings

© 2023. The Author(s).

We aimed to assess the added predictive performance that free-text Dutch consultation notes provide for detecting colorectal cancer (CRC) in primary care, compared with currently used models. We developed, evaluated and compared three prediction models for CRC in a large primary care database of 60,641 patients. The model combining known predictive features with free-text data (TabTxt; AUROC 0.823) performs statistically significantly better (p < 0.05) than the two models that use only tabular data, as is done nowadays (Tab; AUROC 0.767), or only text data (Txt; AUROC 0.797). The specificity of the two models that use demographics and known CRC features (Tab: 0.321; TabTxt: 0.335) is higher than that of the text-only model (Txt: 0.234). The Txt model and, to a lesser degree, the TabTxt model are well calibrated, while the Tab model shows slight underprediction at both tails. As expected with an outcome prevalence below 0.01, all models show poorly calibrated predictions in the extreme upper tail (top 1%). Free-text consultation notes show promising results for improving predictive performance over established prediction models that use only structured features. Future clinical implications for our CRC use case include that such an improvement may help lower the number of referrals to medical specialists for suspected CRC.
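The abstract compares the models on AUROC, specificity and calibration in the top 1% of predicted risk. A minimal sketch of how such metrics can be computed from predicted risks and observed outcomes is given below in plain Python. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the abstract does not state the operating point at which specificity was reported (the sketch assumes a fixed target sensitivity), and the toy labels and scores are invented for demonstration only.

```python
import math
from typing import Sequence, Tuple


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUROC as the chance that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for t, s in zip(y, p) if t == 1]
    neg = [s for t, s in zip(y, p) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(y: Sequence[int], p: Sequence[float], target: float) -> float:
    """Specificity at the highest cut-off that still flags at least `target` of the cases.

    Assumption: specificity is read off at a fixed target sensitivity (hypothetical choice).
    """
    pos = sorted((s for t, s in zip(y, p) if t == 1), reverse=True)
    neg = [s for t, s in zip(y, p) if t == 0]
    k = max(1, math.ceil(target * len(pos)))  # number of cases that must be flagged
    cutoff = pos[k - 1]                        # everyone with score >= cutoff is flagged
    return sum(1 for s in neg if s < cutoff) / len(neg)


def top_fraction_calibration(
    y: Sequence[int], p: Sequence[float], fraction: float = 0.01
) -> Tuple[float, float]:
    """Mean predicted risk vs observed outcome rate among the highest-scored patients."""
    ranked = sorted(zip(p, y), reverse=True)
    n = max(1, round(fraction * len(ranked)))
    top = ranked[:n]
    expected = sum(s for s, _ in top) / n
    observed = sum(t for _, t in top) / n
    return expected, observed


# Invented toy data: 1 = CRC diagnosis, scores = predicted risk.
y = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
p = [0.05, 0.10, 0.80, 0.20, 0.40, 0.30, 0.15, 0.60, 0.70, 0.01]
print(auroc(y, p))                            # 20/21, about 0.952
print(specificity_at_sensitivity(y, p, 1.0))  # 6/7, about 0.857
print(top_fraction_calibration(y, p, 0.2))
```

With a prevalence below 0.01, the top 1% of a real cohort contains few cases, so the observed rate in that slice is a noisy estimate; this is one reason tail calibration is hard to judge in such data.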

Media type:

E-article

Year of publication:

2023

Published:

2023

Contained in:

Scientific Reports - 13(2023), 1, 4 July, page 10760

Language:

English

Contributors:

Luik, Torec T [Author]
Abu-Hanna, Ameen [Author]
van Weert, Henk C P M [Author]
Schut, Martijn C [Author]

Subjects:

Journal Article
Research Support, Non-U.S. Gov't

Notes:

Date Completed 06.07.2023

Date Revised 18.07.2023

published: Electronic

Citation Status MEDLINE

doi:

10.1038/s41598-023-37397-2

PPN (catalog ID):

NLM359039375