Latent Attention Network With Position Perception for Visual Question Answering
To explore the complex relative position relationships among multiple objects referenced by position prepositions in a question, we propose a novel latent attention (LA) network for visual question answering (VQA), in which LA with position perception is extracted by a novel LA generation module (LAGM) and encoded along with absolute and relative position relations by our proposed position-aware module (PAM). The LAGM reconstructs the original attention into LA by capturing how visual attention shifts according to the position prepositions in the question. The LA accurately captures the complex relative position features of multiple objects and helps the model focus its attention on the correct object or region. The PAM adopts the latent state and relative position relations to enhance the model's ability to comprehend multiobject correlations. In addition, we propose a novel gated counting module (GCM) that strengthens sensitivity to quantitative knowledge, effectively improving performance on counting questions. Extensive experiments demonstrate that our method achieves excellent performance on VQA and outperforms state-of-the-art methods on the widely used VQA v2 and VQA v1 datasets.
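The abstract names three modules (LAGM, PAM, GCM) without implementation detail. As a rough illustration of the gating idea behind a gated counting module, here is a minimal PyTorch sketch: a sigmoid gate decides how strongly a soft object count (the summed object attention) is injected into the fused question-image feature. The class name `GatedCountingModule`, all shapes, and the fusion scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedCountingModule(nn.Module):
    """Hypothetical sketch of a gated counting mechanism.

    A learned sigmoid gate controls how much counting evidence
    (the summed attention over detected objects, a soft count)
    is injected into the fused question-image feature. This is
    an illustrative guess at the idea, not the paper's GCM.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.count_proj = nn.Linear(1, dim)

    def forward(self, fused: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # attn: (batch, num_objects) attention weights over detected objects;
        # their sum behaves like a soft object count.
        soft_count = attn.sum(dim=-1, keepdim=True)   # (batch, 1)
        count_feat = self.count_proj(soft_count)      # (batch, dim)
        g = self.gate(fused)                          # (batch, dim), in (0, 1)
        return fused + g * count_feat                 # gated count injection

# Toy usage: 4 questions, 36 detected objects, 512-d fused features.
gcm = GatedCountingModule(dim=512)
fused = torch.randn(4, 512)
attn = torch.softmax(torch.randn(4, 36), dim=-1)
out = gcm(fused, attn)
print(out.shape)  # torch.Size([4, 512])
```

The gate lets non-counting questions suppress the count feature, which is one plausible reading of how such a module could raise accuracy on counting questions without hurting other question types.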
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2024 |
| Published | 2024 |
| Contained in | Complete record - volume:PP |
| Contained in | IEEE transactions on neural networks and learning systems - PP(2024), 26 March |
| Language | English |
| Contributors | Zhang, Jing [author]; Liu, Xiaoqiang [author]; Wang, Zhe [author] |
| Links | |
| Subjects | |
| Notes | Date Revised 26.03.2024; published: Print-Electronic; Citation Status: Publisher |
| DOI | 10.1109/TNNLS.2024.3377636 |
| Funding | |
| Funding institution / project title | |
| PPN (catalog ID) | NLM370202848 |
LEADER 01000naa a22002652 4500
001    NLM370202848
003    DE-627
005    20240328000643.0
007    cr uuu---uuuuu
008    240328s2024 xx |||||o 00| ||eng c
024 7  |a 10.1109/TNNLS.2024.3377636 |2 doi
028 52 |a pubmed24n1351.xml
035    |a (DE-627)NLM370202848
035    |a (NLM)38530725
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Zhang, Jing |e verfasserin |4 aut
245 10 |a Latent Attention Network With Position Perception for Visual Question Answering
264  1 |c 2024
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Revised 26.03.2024
500    |a published: Print-Electronic
500    |a Citation Status Publisher
520    |a To explore the complex relative position relationships among multiple objects referenced by position prepositions in a question, we propose a novel latent attention (LA) network for visual question answering (VQA), in which LA with position perception is extracted by a novel LA generation module (LAGM) and encoded along with absolute and relative position relations by our proposed position-aware module (PAM). The LAGM reconstructs the original attention into LA by capturing how visual attention shifts according to the position prepositions in the question. The LA accurately captures the complex relative position features of multiple objects and helps the model focus its attention on the correct object or region. The PAM adopts the latent state and relative position relations to enhance the model's ability to comprehend multiobject correlations. In addition, we propose a novel gated counting module (GCM) that strengthens sensitivity to quantitative knowledge, effectively improving performance on counting questions. Extensive experiments demonstrate that our method achieves excellent performance on VQA and outperforms state-of-the-art methods on the widely used VQA v2 and VQA v1 datasets
650  4 |a Journal Article
700 1  |a Liu, Xiaoqiang |e verfasserin |4 aut
700 1  |a Wang, Zhe |e verfasserin |4 aut
773 08 |i Enthalten in |t IEEE transactions on neural networks and learning systems |d 2012 |g PP(2024) vom: 26. März |w (DE-627)NLM23236897X |x 2162-2388 |7 nnns
773 18 |g volume:PP |g year:2024 |g day:26 |g month:03
856 40 |u http://dx.doi.org/10.1109/TNNLS.2024.3377636 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a GBV_NLM
951    |a AR
952    |d PP |j 2024 |b 26 |c 03