Instrument-tissue Interaction Detection Framework for Surgical Video Understanding
Instrument-tissue interaction detection, which helps in understanding surgical activities, is vital for building computer-assisted surgery systems, yet it poses many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and cannot automatically detect instruments and tissues. Second, existing works do not fully consider intra-frame and inter-frame relations between instruments and tissues. In this paper, we propose to represent an instrument-tissue interaction as an ⟨instrument class, instrument bounding box, tissue class, tissue bounding box, action class⟩ quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect this quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships among proposals in the current frame using global context information from the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model temporal information for the same instance. For evaluation, we build a cataract surgery video dataset (PhacoQ) and a cholecystectomy surgery video dataset (CholecQ). Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.
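The quintuple representation described in the abstract can be sketched as a simple typed record. This is an illustrative assumption only: the field names, the (x1, y1, x2, y2) box convention, and the example class labels are not taken from the paper.

```python
from typing import NamedTuple, Tuple

# Hypothetical sketch of the detection target described in the abstract:
# one quintuple per detected instrument-tissue interaction in a frame.
class InteractionQuintuple(NamedTuple):
    instrument_class: str
    instrument_box: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)
    tissue_class: str
    tissue_box: Tuple[float, float, float, float]      # assumed (x1, y1, x2, y2)
    action_class: str

# Illustrative values only, not from the datasets.
q = InteractionQuintuple(
    instrument_class="forceps",
    instrument_box=(10.0, 20.0, 110.0, 140.0),
    tissue_class="lens",
    tissue_box=(50.0, 60.0, 200.0, 220.0),
    action_class="grasp",
)
print(q.action_class)
```

A frame-level prediction would then be a list of such quintuples, one per interacting instrument-tissue pair.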
Media type: E-Article
Year of publication: 2024
Published: 2024
Contained in: Full record - volume:PP
Contained in: IEEE transactions on medical imaging - PP(2024), 26 March
Language: English
Contributors: Lin, Wenjun [author]
Notes: Date Revised 26.03.2024; published: Print-Electronic; Citation Status Publisher
doi: 10.1109/TMI.2024.3381209
PPN (catalog ID): NLM370202740
LEADER | 01000naa a22002652 4500 | ||
001 | NLM370202740 | ||
003 | DE-627 | ||
005 | 20240328000642.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240328s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1109/TMI.2024.3381209 |2 doi | |
028 | 5 | 2 | |a pubmed24n1351.xml |
035 | |a (DE-627)NLM370202740 | ||
035 | |a (NLM)38530715 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Lin, Wenjun |e verfasserin |4 aut | |
245 | 1 | 0 | |a Instrument-tissue Interaction Detection Framework for Surgical Video Understanding |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Revised 26.03.2024 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status Publisher | ||
520 | |a Instrument-tissue interaction detection, which helps in understanding surgical activities, is vital for building computer-assisted surgery systems, yet it poses many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and cannot automatically detect instruments and tissues. Second, existing works do not fully consider intra-frame and inter-frame relations between instruments and tissues. In this paper, we propose to represent an instrument-tissue interaction as an ⟨instrument class, instrument bounding box, tissue class, tissue bounding box, action class⟩ quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect this quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships among proposals in the current frame using global context information from the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model temporal information for the same instance. For evaluation, we build a cataract surgery video dataset (PhacoQ) and a cholecystectomy surgery video dataset (CholecQ). Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets | ||
650 | 4 | |a Journal Article | |
700 | 1 | |a Hu, Yan |e verfasserin |4 aut | |
700 | 1 | |a Fu, Huazhu |e verfasserin |4 aut | |
700 | 1 | |a Yang, Mingming |e verfasserin |4 aut | |
700 | 1 | |a Chng, Chin-Boon |e verfasserin |4 aut | |
700 | 1 | |a Kawasaki, Ryo |e verfasserin |4 aut | |
700 | 1 | |a Chui, Cheekong |e verfasserin |4 aut | |
700 | 1 | |a Liu, Jiang |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t IEEE transactions on medical imaging |d 1982 |g PP(2024) vom: 26. März |w (DE-627)NLM082855269 |x 1558-254X |7 nnns |
773 | 1 | 8 | |g volume:PP |g year:2024 |g day:26 |g month:03 |
856 | 4 | 0 | |u http://dx.doi.org/10.1109/TMI.2024.3381209 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d PP |j 2024 |b 26 |c 03 |