C2F-TCN: A Framework for Semi- and Fully-Supervised Temporal Action Segmentation
Temporal action segmentation assigns an action label to every frame of an untrimmed input video that contains a sequence of multiple actions. For this task, we propose an encoder-decoder style architecture named C2F-TCN featuring a "coarse-to-fine" ensemble of decoder outputs. The C2F-TCN framework is enhanced with a novel, model-agnostic temporal feature augmentation strategy based on computationally inexpensive stochastic max-pooling of segments. It produces more accurate and well-calibrated supervised results on three benchmark action segmentation datasets. We show that the architecture is flexible enough for both supervised and representation learning. In line with this, we present a novel unsupervised way to learn frame-wise representations from C2F-TCN. Our unsupervised learning approach hinges on the clustering capabilities of the input features and the formation of multi-resolution features from the decoder's implicit structure. Further, we provide the first semi-supervised temporal action segmentation results by merging representation learning with conventional supervised learning. Our semi-supervised learning scheme, called "Iterative-Contrastive-Classify (ICC)", progressively improves in performance with more labeled data. ICC semi-supervised learning in C2F-TCN, with 40% labeled videos, performs similarly to its fully supervised counterparts.
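The abstract names stochastic max-pooling of segments as the augmentation but gives no details. The following is a minimal sketch of one plausible reading, not the authors' implementation: the frame sequence is cut at randomly chosen boundaries and each segment is reduced by an element-wise max. The function name `stochastic_segment_max_pool`, its parameters, and the uniform choice of cut points are all assumptions made for illustration.

```python
import random


def stochastic_segment_max_pool(features, num_segments, rng=None):
    """Downsample a T x D list of frame features to num_segments rows.

    Hypothetical helper (not from the paper): segment boundaries are
    drawn at random, then every segment is reduced with an element-wise
    max over its frames.
    """
    rng = rng or random.Random()
    num_frames = len(features)
    if not 1 <= num_segments <= num_frames:
        raise ValueError("num_segments must be between 1 and the number of frames")

    # Assumption: cut points are sampled uniformly without replacement,
    # which guarantees that every segment contains at least one frame.
    cuts = sorted(rng.sample(range(1, num_frames), num_segments - 1))
    bounds = [0] + cuts + [num_frames]

    pooled = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = features[start:end]
        # zip(*segment) iterates over feature dimensions; take the max of each.
        pooled.append([max(column) for column in zip(*segment)])
    return pooled


if __name__ == "__main__":
    # Toy input: 10 frames with 2 feature dimensions each.
    frames = [[float(t), float(-t)] for t in range(10)]
    print(stochastic_segment_max_pool(frames, 4, random.Random(0)))
```

Because the boundaries change on every call, the same video yields a different pooled sequence each time, which is what makes the operation usable as a cheap augmentation. Passing a seeded `random.Random` makes a run reproducible.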
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2023 |
| Published | 2023 |
| Contained in | To the parent record - volume:45 |
| Contained in | IEEE transactions on pattern analysis and machine intelligence - 45(2023), 10, 8 Oct., pages 11484-11501 |
| Language | English |
| Contributors | Singhania, Dipika [author] |
| Notes | Date Revised 06.09.2023; published: Print-Electronic; Citation Status PubMed-not-MEDLINE |
| DOI | 10.1109/TPAMI.2023.3284080 |
| PPN (catalog ID) | NLM35791421X |

LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM35791421X | ||
003 | DE-627 | ||
005 | 20231226073621.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231226s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1109/TPAMI.2023.3284080 |2 doi | |
028 | 5 | 2 | |a pubmed24n1192.xml |
035 | |a (DE-627)NLM35791421X | ||
035 | |a (NLM)37289603 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Singhania, Dipika |e verfasserin |4 aut | |
245 | 1 | 0 | |a C2F-TCN |b A Framework for Semi- and Fully-Supervised Temporal Action Segmentation |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Revised 06.09.2023 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status PubMed-not-MEDLINE | ||
520 | |a Temporal action segmentation tags action labels for every frame in an input untrimmed video containing multiple actions in a sequence. For the task of temporal action segmentation, we propose an encoder-decoder style architecture named C2F-TCN featuring a "coarse-to-fine" ensemble of decoder outputs. The C2F-TCN framework is enhanced with a novel model agnostic temporal feature augmentation strategy formed by the computationally inexpensive strategy of the stochastic max-pooling of segments. It produces more accurate and well-calibrated supervised results on three benchmark action segmentation datasets. We show that the architecture is flexible for both supervised and representation learning. In line with this, we present a novel unsupervised way to learn frame-wise representation from C2F-TCN. Our unsupervised learning approach hinges on the clustering capabilities of the input features and the formation of multi-resolution features from the decoder's implicit structure. Further, we provide first semi-supervised temporal action segmentation results by merging representation learning with conventional supervised learning. Our semi-supervised learning scheme, called "Iterative-Contrastive-Classify (ICC)", progressively improves in performance with more labeled data. The ICC semi-supervised learning in C2F-TCN, with 40% labeled videos, performs similar to fully supervised counterparts | ||
650 | 4 | |a Journal Article | |
700 | 1 | |a Rahaman, Rahul |e verfasserin |4 aut | |
700 | 1 | |a Yao, Angela |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t IEEE transactions on pattern analysis and machine intelligence |d 1979 |g 45(2023), 10 vom: 08. Okt., Seite 11484-11501 |w (DE-627)NLM098212257 |x 1939-3539 |7 nnns |
773 | 1 | 8 | |g volume:45 |g year:2023 |g number:10 |g day:08 |g month:10 |g pages:11484-11501 |
856 | 4 | 0 | |u http://dx.doi.org/10.1109/TPAMI.2023.3284080 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 45 |j 2023 |e 10 |b 08 |c 10 |h 11484-11501 |