Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes
To help customers make better-informed viewing choices, video-streaming services try to moderate their content and provide more visibility into which portions of their movies and TV episodes contain age-appropriate material (e.g., nudity, sex, violence, or drug-use). Supervised models to localize these sensitive activities require large amounts of clip-level labeled data which is hard to obtain, while weakly-supervised models to this end usually do not offer competitive accuracy. To address this challenge, we propose a novel Coarse2Fine network designed to make use of readily obtainable video-level weak labels in conjunction with sparse clip-level labels of age-appropriate activities. Our model aggregates frame-level predictions to make video-level classifications and is therefore able to leverage sparse clip-level labels along with video-level labels. Furthermore, by performing frame-level predictions in a hierarchical manner, our approach is able to overcome the label-imbalance problem caused due to the rare-occurrence nature of age-appropriate content. We present comparative results of our approach using 41,234 movies and TV episodes (~3 years of video-content) from 521 sub-genres and 250 countries making it by far the largest-scale empirical analysis of age-appropriate activity localization in long-form videos ever published. Our approach offers 107.2% relative mAP improvement (from 5.5% to 11.4%) over existing state-of-the-art activity-localization approaches.
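The abstract's idea of aggregating frame-level predictions into a video-level classification, so that video-level and sparse clip-level labels can be trained on together, can be illustrated with a minimal sketch. This is not the authors' Coarse2Fine network: the top-k mean pooling, the binary cross-entropy loss, the `weight` parameter, and all function names are assumptions chosen for illustration, and the hierarchical prediction step is not shown.

```python
import math


def video_score(frame_probs, k=3):
    """Aggregate frame-level probabilities into one video-level score
    by averaging the k highest frames (top-k mean pooling, an assumed choice)."""
    top = sorted(frame_probs, reverse=True)[:k]
    return sum(top) / len(top)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for a single probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def combined_loss(frame_probs, video_label, frame_labels, k=3, weight=1.0):
    """Video-level loss plus a loss over the few frames that carry labels.

    frame_labels holds 0 or 1 where a clip-level label exists, None elsewhere.
    """
    loss = bce(video_score(frame_probs, k), video_label)
    labeled = [(p, y) for p, y in zip(frame_probs, frame_labels) if y is not None]
    if labeled:
        # Average over labeled frames only, so sparse labels are not diluted.
        loss += weight * sum(bce(p, y) for p, y in labeled) / len(labeled)
    return loss


# Hypothetical data: six frames, two of which have clip-level labels.
frame_probs = [0.05, 0.10, 0.90, 0.80, 0.20, 0.05]
frame_labels = [None, None, 1, None, 0, None]
print(video_score(frame_probs, k=2))
print(combined_loss(frame_probs, 1, frame_labels, k=2))
```

Because the video-level score is computed from the frame-level outputs, a video with only a weak video-level label still produces a training signal for the frame-level predictor, while labeled frames add direct supervision where it is available.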
Media type | Preprint |
---|---|
Year of publication | 2022 |
Published | 2022 |
Contained in | arXiv.org - (2022), 16 June - year:2022 |
Language | English |
Contributors | Hao, Xiang [author] |
Links | Full text [free of charge] |
Funding institution / project title | |
PPN (catalog ID) | XAR036309605 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | XAR036309605 | ||
003 | DE-627 | ||
005 | 20230429063739.0 | ||
007 | cr uuu---uuuuu | ||
008 | 220620s2022 xx |||||o 00| ||eng c | ||
035 | |a (DE-627)XAR036309605 | ||
035 | |a (arXiv)2206.08429 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | |a 000 |q DE-84 | |
100 | 1 | |a Hao, Xiang |e verfasserin |4 aut | |
245 | 1 | 0 | |a Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a To help customers make better-informed viewing choices, video-streaming services try to moderate their content and provide more visibility into which portions of their movies and TV episodes contain age-appropriate material (e.g., nudity, sex, violence, or drug-use). Supervised models to localize these sensitive activities require large amounts of clip-level labeled data which is hard to obtain, while weakly-supervised models to this end usually do not offer competitive accuracy. To address this challenge, we propose a novel Coarse2Fine network designed to make use of readily obtainable video-level weak labels in conjunction with sparse clip-level labels of age-appropriate activities. Our model aggregates frame-level predictions to make video-level classifications and is therefore able to leverage sparse clip-level labels along with video-level labels. Furthermore, by performing frame-level predictions in a hierarchical manner, our approach is able to overcome the label-imbalance problem caused due to the rare-occurrence nature of age-appropriate content. We present comparative results of our approach using 41,234 movies and TV episodes (~3 years of video-content) from 521 sub-genres and 250 countries making it by far the largest-scale empirical analysis of age-appropriate activity localization in long-form videos ever published. Our approach offers 107.2% relative mAP improvement (from 5.5% to 11.4%) over existing state-of-the-art activity-localization approaches. | ||
700 | 1 | |a Chen, Jingxiang |e verfasserin |4 aut | |
700 | 1 | |a Chen, Shixing |e verfasserin |4 aut | |
700 | 1 | |a Saad, Ahmed |e verfasserin |4 aut | |
700 | 1 | |a Hamid, Raffay |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t arXiv.org |g (2022) vom: 16. Juni |
773 | 1 | 8 | |g year:2022 |g day:16 |g month:06 |
856 | 4 | 0 | |u https://arxiv.org/abs/2206.08429 |z kostenfrei |3 Volltext |
912 | |a GBV_XAR | ||
912 | |a SSG-OLC-PHA | ||
951 | |a AR | ||
952 | |j 2022 |b 16 |c 06 | ||
953 | |2 045F |a 000 |