Slimmable transformer with hybrid axial-attention for medical image segmentation
Copyright © 2024 Elsevier Ltd. All rights reserved.
The transformer architecture has achieved remarkable success in medical image analysis owing to its powerful capability for capturing long-range dependencies. However, because it lacks an intrinsic inductive bias for modeling visual structural information, the transformer generally requires large-scale pre-training, limiting its clinical applicability on small-scale medical datasets, which are expensive to acquire. To this end, we propose a slimmable transformer that exploits intrinsic inductive bias via position information for medical image segmentation. Specifically, we empirically investigate how different position encoding strategies affect the prediction quality of the region of interest (ROI) and observe that ROIs are sensitive to the choice of strategy. Motivated by this, we present a novel Hybrid Axial-Attention (HAA) that incorporates pixel-level spatial structure and relative position information as inductive bias. Moreover, we introduce a gating mechanism for efficient feature selection, further improving representation quality on small-scale datasets. Experiments on the LGG and COVID-19 datasets demonstrate the superiority of our method over the baseline and previous works. Internal workflow visualizations with interpretability further validate our approach; the proposed slimmable transformer has the potential to be developed into a visual software tool for improving computer-aided lesion diagnosis and treatment planning.
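The abstract's Hybrid Axial-Attention factorizes 2D self-attention into sequential row and column passes, injecting a relative-position bias as inductive bias and gating the attended features. The paper's exact formulation is not reproduced in this record, so the sketch below is an illustrative approximation in NumPy; the function names (`axial_attention_1d`, `hybrid_axial_attention`), the scalar gate, and the additive relative-position bias table are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention_1d(x, Wq, Wk, Wv, rel_bias, gate):
    """Self-attention along one axis of length L with an additive
    relative-position bias (the positional inductive bias) and a
    scalar gate that modulates the attended features."""
    L, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                  # (L, L) content term
    idx = np.arange(L)
    # rel_bias has 2L-1 entries, one per relative offset in [-(L-1), L-1].
    scores = scores + rel_bias[idx[:, None] - idx[None, :] + L - 1]
    attn = softmax(scores, axis=-1)
    return gate * (attn @ v)                       # gated feature selection

def hybrid_axial_attention(feat, params_h, params_w):
    """Attend along the height axis, then the width axis, of an
    (H, W, d) feature map -- the axial factorization of 2D attention."""
    H, W, d = feat.shape
    out = np.stack([axial_attention_1d(feat[:, j], *params_h)
                    for j in range(W)], axis=1)    # per-column pass
    out = np.stack([axial_attention_1d(out[i], *params_w)
                    for i in range(H)], axis=0)    # per-row pass
    return out

rng = np.random.default_rng(0)
H, W, d = 8, 8, 16
feat = rng.standard_normal((H, W, d))

def make_params(L):
    # Toy random projections, bias table, and gate for axis length L.
    return (rng.standard_normal((d, d)) * 0.1,     # Wq
            rng.standard_normal((d, d)) * 0.1,     # Wk
            rng.standard_normal((d, d)) * 0.1,     # Wv
            rng.standard_normal(2 * L - 1) * 0.1,  # relative-position bias
            0.5)                                   # gate in [0, 1]

out = hybrid_axial_attention(feat, make_params(H), make_params(W))
print(out.shape)  # prints (8, 8, 16)
```

The axial factorization reduces attention cost from O((HW)^2) to O(HW(H+W)), which is one plausible reason such designs suit small-scale medical data; in the paper the gate is learned for feature selection, whereas here it is a fixed constant for illustration.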
Media type: E-Article
Year of publication: 2024
Published: 2024
Contained in: Link to the complete record - volume:173
Contained in: Computers in biology and medicine - 173(2024), 17 Apr., p. 108370
Language: English
Contributors: Hu, Yiyue [author]; Mu, Nan [author]; Liu, Lei [author]; Zhang, Lei [author]; Jiang, Jingfeng [author]; Li, Xiaoning [author]
Subjects: Axial-attention; Interpretability; Medical image segmentation; Position encoding; Slimmable transformer
Notes: Date Completed 17.04.2024; Date Revised 17.04.2024; published: Print-Electronic; Citation Status MEDLINE
DOI: 10.1016/j.compbiomed.2024.108370
PPN (catalog ID): NLM370542924
LEADER 01000caa a22002652 4500
001 NLM370542924
003 DE-627
005 20240417232839.0
007 cr uuu---uuuuu
008 240404s2024 xx |||||o 00| ||eng c
024 7 |a 10.1016/j.compbiomed.2024.108370 |2 doi
028 5 2 |a pubmed24n1378.xml
035 |a (DE-627)NLM370542924
035 |a (NLM)38564854
035 |a (PII)S0010-4825(24)00454-2
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
100 1 |a Hu, Yiyue |e verfasserin |4 aut
245 1 0 |a Slimmable transformer with hybrid axial-attention for medical image segmentation
264 1 |c 2024
336 |a Text |b txt |2 rdacontent
337 |a Computermedien |b c |2 rdamedia
338 |a Online-Ressource |b cr |2 rdacarrier
500 |a Date Completed 17.04.2024
500 |a Date Revised 17.04.2024
500 |a published: Print-Electronic
500 |a Citation Status MEDLINE
520 |a Copyright © 2024 Elsevier Ltd. All rights reserved.
520 |a The transformer architecture has achieved remarkable success in medical image analysis owing to its powerful capability for capturing long-range dependencies. However, due to the lack of intrinsic inductive bias in modeling visual structural information, the transformer generally requires a large-scale pre-training schedule, limiting the clinical applications over expensive small-scale medical data. To this end, we propose a slimmable transformer to explore intrinsic inductive bias via position information for medical image segmentation. Specifically, we empirically investigate how different position encoding strategies affect the prediction quality of the region of interest (ROI) and observe that ROIs are sensitive to different position encoding strategies. Motivated by this, we present a novel Hybrid Axial-Attention (HAA) that can be equipped with pixel-level spatial structure and relative position information as inductive bias. Moreover, we introduce a gating mechanism to achieve efficient feature selection and further improve the representation quality over small-scale datasets. Experiments on LGG and COVID-19 datasets prove the superiority of our method over the baseline and previous works. Internal workflow visualization with interpretability is conducted to validate our success better; the proposed slimmable transformer has the potential to be further developed into a visual software tool for improving computer-aided lesion diagnosis and treatment planning
650 4 |a Journal Article
650 4 |a Axial-attention
650 4 |a Interpretability
650 4 |a Medical image segmentation
650 4 |a Position encoding
650 4 |a Slimmable transformer
700 1 |a Mu, Nan |e verfasserin |4 aut
700 1 |a Liu, Lei |e verfasserin |4 aut
700 1 |a Zhang, Lei |e verfasserin |4 aut
700 1 |a Jiang, Jingfeng |e verfasserin |4 aut
700 1 |a Li, Xiaoning |e verfasserin |4 aut
773 0 8 |i Enthalten in |t Computers in biology and medicine |d 1970 |g 173(2024) vom: 17. Apr., Seite 108370 |w (DE-627)NLM000382272 |x 1879-0534 |7 nnns
773 1 8 |g volume:173 |g year:2024 |g day:17 |g month:04 |g pages:108370
856 4 0 |u http://dx.doi.org/10.1016/j.compbiomed.2024.108370 |3 Volltext
912 |a GBV_USEFLAG_A
912 |a GBV_NLM
951 |a AR
952 |d 173 |j 2024 |b 17 |c 04 |h 108370