Latent Attention Network With Position Perception for Visual Question Answering

To explore the complex relative position relationships among multiple objects referenced by position prepositions in the question, we propose a novel latent attention (LA) network for visual question answering (VQA), in which LA with position perception is extracted by a novel LA generation module (LAGM) and encoded along with absolute and relative position relations by our proposed position-aware module (PAM). The LAGM reconstructs the original attention into LA by capturing the tendency of visual attention to shift according to the position prepositions in the question. The LA accurately captures the complex relative position features of multiple objects and helps the model direct attention to the correct object or region. The PAM adopts the latent state and relative position relations to enhance the model's ability to comprehend multiobject correlations. In addition, we propose a novel gated counting module (GCM) to strengthen the model's sensitivity to quantitative knowledge, effectively improving performance on counting questions. Extensive experiments demonstrate that our method achieves excellent performance on VQA and outperforms state-of-the-art methods on the widely used VQA v2 and VQA v1 datasets.
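The abstract describes the gated counting module (GCM) only at a high level, so no implementation details are available here. As an illustration only, the general idea of gating quantitative information into a feature vector can be sketched with a standard sigmoid-gated fusion; this is a generic mechanism, not the authors' actual GCM, and all names, shapes, and weights below are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(base_feat, count_feat, W_g, b_g):
    """Generic gated fusion sketch (hypothetical, not the paper's GCM):
    a sigmoid gate decides, per dimension, how much quantitative
    (counting) information to mix into the base feature."""
    g = sigmoid(np.concatenate([base_feat, count_feat]) @ W_g + b_g)  # gate values in (0, 1)
    return g * count_feat + (1.0 - g) * base_feat  # elementwise convex combination

# Toy usage with random features and weights (all hypothetical).
rng = np.random.default_rng(0)
d = 8
base = rng.standard_normal(d)            # e.g. fused question-image feature
count = rng.standard_normal(d)           # e.g. quantitative/counting feature
W_g = rng.standard_normal((2 * d, d)) * 0.1
b_g = np.zeros(d)
fused = gated_fusion(base, count, W_g, b_g)
print(fused.shape)
```

Because the output is an elementwise convex combination, each fused value lies between the corresponding base and counting feature values, which is what makes the gate an interpretable mixing weight.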

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

IEEE Transactions on Neural Networks and Learning Systems - PP (2024), 26 March

Language:

English

Contributors:

Zhang, Jing [Author]
Liu, Xiaoqiang [Author]
Wang, Zhe [Author]

Links:

Full text

Subjects:

Journal Article

Notes:

Date Revised 26.03.2024

published: Print-Electronic

Citation Status Publisher

DOI:

10.1109/TNNLS.2024.3377636

PPN (catalog ID):

NLM370202848