Can Deep Learning Recognize Subtle Human Activities?
Deep Learning has driven recent and exciting progress in computer vision, instilling the belief that these algorithms could solve any visual task. Yet, datasets commonly used to train and test computer vision algorithms have pervasive confounding factors. Such biases make it difficult to truly estimate the performance of those algorithms and how well computer vision models can extrapolate outside the distribution in which they were trained. In this work, we propose a new action classification challenge that is performed well by humans, but poorly by state-of-the-art Deep Learning models. As a proof-of-principle, we consider three exemplary tasks: drinking, reading, and sitting. The best accuracies reached using state-of-the-art computer vision models were 61.7%, 62.8%, and 76.8%, respectively, while human participants scored above 90% accuracy on the three tasks. We propose a rigorous method to reduce confounds when creating datasets, and when comparing human versus computer vision performance. Source code and datasets are publicly available.
Media type: Article
Year of publication: 2020
Published: 2020
Contained in: Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops - 2020(2020), 01 June
Language: English
Contributors: Jacquot, Vincent [author]; Ying, Zhuofan [author]; Kreiman, Gabriel [author]
Notes: Date Revised 22.08.2022; published: Print-Electronic; Citation Status: PubMed-not-MEDLINE
PPN (catalog ID): NLM328358916
LEADER 01000naa a22002652 4500
001    NLM328358916
003    DE-627
005    20231225202753.0
007    tu
008    231225s2020 xx ||||| 00| ||eng c
028 52 |a pubmed24n1094.xml
035    |a (DE-627)NLM328358916
035    |a (NLM)34290902
035    |a (PII)19874023
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Jacquot, Vincent |e verfasserin |4 aut
245 10 |a Can Deep Learning Recognize Subtle Human Activities?
264  1 |c 2020
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a Date Revised 22.08.2022
500    |a published: Print-Electronic
500    |a Citation Status PubMed-not-MEDLINE
520    |a Deep Learning has driven recent and exciting progress in computer vision, instilling the belief that these algorithms could solve any visual task. Yet, datasets commonly used to train and test computer vision algorithms have pervasive confounding factors. Such biases make it difficult to truly estimate the performance of those algorithms and how well computer vision models can extrapolate outside the distribution in which they were trained. In this work, we propose a new action classification challenge that is performed well by humans, but poorly by state-of-the-art Deep Learning models. As a proof-of-principle, we consider three exemplary tasks: drinking, reading, and sitting. The best accuracies reached using state-of-the-art computer vision models were 61.7%, 62.8%, and 76.8%, respectively, while human participants scored above 90% accuracy on the three tasks. We propose a rigorous method to reduce confounds when creating datasets, and when comparing human versus computer vision performance. Source code and datasets are publicly available
650  4 |a Journal Article
700 1  |a Ying, Zhuofan |e verfasserin |4 aut
700 1  |a Kreiman, Gabriel |e verfasserin |4 aut
773 08 |i Enthalten in |t Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops |d 2006 |g 2020(2020) vom: 01. Juni |w (DE-627)NLM209045442 |x 2160-7508 |7 nnns
773 18 |g volume:2020 |g year:2020 |g day:01 |g month:06
912    |a GBV_USEFLAG_A
912    |a GBV_NLM
951    |a AR
952    |d 2020 |j 2020 |b 01 |c 06