SAMPLER: Empirical distribution representations for rapid analysis of whole slide tissue images

Deep learning has revolutionized digital pathology, allowing for automatic analysis of hematoxylin and eosin (H&E) stained whole slide images (WSIs) for diverse tasks. In such analyses, WSIs are typically broken into smaller images called tiles, and a neural network backbone encodes each tile in a feature space. Many recent works have applied attention-based deep learning models to aggregate tile-level features into a slide-level representation, which is then used for slide-level prediction tasks. However, training attention models is computationally intensive, necessitating hyperparameter optimization and specialized training procedures. Here, we propose SAMPLER, a fully statistical approach to generate efficient and informative WSI representations by encoding the empirical cumulative distribution functions (CDFs) of multiscale tile features. We demonstrate that SAMPLER-based classifiers are as accurate as or better than state-of-the-art fully deep learning attention models for classification tasks including distinction of: subtypes of breast carcinoma (BRCA: AUC=0.911±0.029); subtypes of non-small cell lung carcinoma (NSCLC: AUC=0.940±0.018); and subtypes of renal cell carcinoma (RCC: AUC=0.987±0.006). A major advantage of the SAMPLER representation is that predictive models are >100X faster than attention models. Histopathological review confirms that SAMPLER-identified high attention tiles contain tumor morphological features specific to the tumor type, while low attention tiles contain fibrous stroma, blood, or tissue folding artifacts. We further apply SAMPLER concepts to improve the design of attention-based neural networks, yielding a context-aware multi-head attention model with increased accuracy for subtype classification within BRCA and RCC (BRCA: AUC=0.921±0.027; RCC: AUC=0.988±0.010). Finally, we provide theoretical results identifying sufficient conditions under which SAMPLER is optimal.
SAMPLER is a fast and effective approach for analyzing WSIs, with greatly improved scalability over attention methods to benefit digital pathology analysis.
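
The abstract describes encoding the empirical CDFs of multiscale tile features as a slide-level representation. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each scale yields a matrix of tile features, summarizes every feature by a fixed grid of quantiles (a discretized inverse CDF), and concatenates the results across scales. The function names, the choice of 10 quantile levels, and the random stand-in features are all assumptions made for illustration.

```python
import numpy as np


def ecdf_slide_representation(tile_features, n_quantiles=10):
    """Summarize a (n_tiles, n_features) matrix by per-feature quantiles.

    Hypothetical helper: the quantile grid is an assumed discretization
    of each feature's empirical CDF, not a value taken from the paper.
    """
    tile_features = np.asarray(tile_features, dtype=float)
    if tile_features.ndim != 2 or tile_features.shape[0] == 0:
        raise ValueError("expected a non-empty (n_tiles, n_features) array")
    # Evenly spaced probability levels strictly inside (0, 1).
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    # Quantiles along the tile axis: shape (n_quantiles, n_features).
    quantiles = np.quantile(tile_features, levels, axis=0)
    # Flatten feature-major so each feature's quantiles stay contiguous.
    return quantiles.T.reshape(-1)


def multiscale_representation(features_by_scale, n_quantiles=10):
    """Concatenate the per-scale quantile encodings into one vector."""
    parts = [ecdf_slide_representation(f, n_quantiles) for f in features_by_scale]
    return np.concatenate(parts)


# Stand-in data: two scales with different tile counts, 8 features each.
rng = np.random.default_rng(0)
slide_a = [rng.normal(size=(500, 8)), rng.normal(size=(120, 8))]
slide_b = [rng.normal(size=(90, 8)), rng.normal(size=(30, 8))]

rep_a = multiscale_representation(slide_a)
rep_b = multiscale_representation(slide_b)
print(rep_a.shape, rep_b.shape)  # (160,) (160,)
```

Because the encoding has a fixed length regardless of how many tiles a slide contains, the resulting vectors can be passed to any standard classifier without training an aggregation network.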

Errata and updates:

UpdateIn: EBioMedicine. 2023 Dec 14;99:104908. - PMID 38101298

Media type:

E-article

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv : the preprint server for biology - (2023), 3 Aug.

Language:

English

Contributors:

Mukashyaka, Patience [Author]
Sheridan, Todd B [Author]
Foroughi Pour, Ali [Author]
Chuang, Jeffrey H [Author]

Subjects:

Preprint

Notes:

Date Revised 18.01.2024

published: Electronic

Citation Status PubMed-not-MEDLINE

doi:

10.1101/2023.08.01.551468

PPN (catalog ID):

NLM36077086X