Publication details

Prototype local–global alignment network for image–text retrieval

Abstract: Image–text retrieval is a challenging task because it requires thorough multimodal understanding and precise discovery of inter-modality relationships. However, most previous approaches perform only global image–text alignment and neglect fine-grained correspondence. Although some works explore local region–word alignment, they usually suffer from a heavy computational burden. In this paper, we propose a prototype local–global alignment (PLGA) network for image–text retrieval that jointly performs fine-grained local alignment and high-level global alignment. Specifically, PLGA contains two key components: a prototype-based local alignment module and a multi-scale global alignment module. The former enables efficient fine-grained local matching by combining region–prototype alignment and word–prototype alignment, and the latter helps perceive hierarchical global semantics by exploring multi-scale global correlations between the image and the text. Within the unified model, the local and global alignment modules improve each other's performance. Quantitative and qualitative experimental results on the Flickr30K and MS-COCO benchmarks demonstrate that the proposed approach performs favorably against state-of-the-art methods.
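The abstract does not spell out how region–prototype and word–prototype alignment are computed. As a rough illustration of why routing both modalities through a shared set of prototypes can be cheaper than exhaustive region–word matching, here is a minimal Python sketch under our own assumptions: the soft assignment by temperature-scaled cosine similarity, the per-prototype pooling, the overlap weighting, and all function names (`prototype_summary`, `prototype_local_similarity`) are hypothetical and are not taken from the PLGA paper. The multi-scale global alignment module is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Return v scaled to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable softmax with a temperature."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_summary(
    features: Sequence[Vector], prototypes: Sequence[Vector], temperature: float
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: soft-assign each local feature (image region or word)
    to the K shared prototypes. Returns one unit-norm pooled vector per prototype
    and the fraction of the assignment mass that each prototype received."""
    if not features:
        raise ValueError("need at least one local feature")
    dim = len(prototypes[0])
    pooled = [[0.0] * dim for _ in prototypes]
    mass = [0.0] * len(prototypes)
    unit_protos = [normalize(p) for p in prototypes]
    for feat in features:
        f = normalize(feat)
        weights = softmax([dot(f, p) for p in unit_protos], temperature)
        for j, w in enumerate(weights):
            mass[j] += w
            for i in range(dim):
                pooled[j][i] += w * f[i]
    return [normalize(p) for p in pooled], [m / len(features) for m in mass]


def prototype_local_similarity(
    regions: Sequence[Vector],
    words: Sequence[Vector],
    prototypes: Sequence[Vector],
    temperature: float = 0.1,
) -> float:
    """Hypothetical local image-text score: compare region and word summaries
    prototype by prototype instead of over all region-word pairs, so the cost is
    O((R + W) * K * d) rather than O(R * W * d)."""
    r_pool, r_mass = prototype_summary(regions, prototypes, temperature)
    w_pool, w_mass = prototype_summary(words, prototypes, temperature)
    # Weight each prototype by the overlap of the two assignment distributions,
    # so prototypes used by only one modality contribute little.
    weights = [min(a, b) for a, b in zip(r_mass, w_mass)]
    total = sum(weights) or 1.0
    return sum(w * dot(a, b) for w, a, b in zip(weights, r_pool, w_pool)) / total


if __name__ == "__main__":
    # Toy 3-d features; three prototypes along the coordinate axes.
    protos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    regions = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
    print(prototype_local_similarity(regions, [[1.0, 0.0, 0.05]], protos))
    print(prototype_local_similarity(regions, [[0.0, 1.0, 0.1]], protos))
```

The point of the sketch is the cost structure: exhaustive region–word matching compares each of R regions with each of W words, whereas here every feature is compared only with the K prototypes, and the two modalities meet only through K pooled vectors. A caption whose words fall on the same prototypes as the image regions scores higher than one whose words fall elsewhere.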

Media type:	Article

Year of publication:	2022
Published:	2022

Part of:	volume 11
Part of:	International journal of multimedia information retrieval - vol. 11 (2022), no. 4, 6 Oct., pp. 525-538

Language:	English

Contributors:	Meng, Lingtao [author]; Zhang, Feifei [author]; Zhang, Xi [author]; Xu, Changsheng [author]

Links:	Full text [license required]

BKL:	54.87 / Multimedia; 54.64 / Databases
Subjects:	Global alignment; Image–text retrieval; Local alignment; Prototype

Notes:	© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

doi:	10.1007/s13735-022-00258-1

Funding:
Funding institution / project title:

PPN (catalog ID):	OLC2080172514

Internal format (MARC 21)


LEADER	01000caa a22002652 4500
001	OLC2080172514
003	DE-627
005	20240405160100.0
007	tu
008	230131s2022 xx ||||| 00| ||eng c
024	7		|a 10.1007/s13735-022-00258-1 |2 doi
035			|a (DE-627)OLC2080172514
035			|a (DE-He213)s13735-022-00258-1-p
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
082	0	4	|a 004 |a 660 |a 070 |a 020 |q VZ
084			|a 54.87 |2 bkl
084			|a 54.64 |2 bkl
100	1		|a Meng, Lingtao |e verfasserin |4 aut
245	1	0	|a Prototype local–global alignment network for image–text retrieval
264		1	|c 2022
336			|a Text |b txt |2 rdacontent
337			|a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338			|a Band |b nc |2 rdacarrier
500			|a © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
520			|a Abstract Image–text retrieval is a challenging task due to the requirement of thorough multimodal understanding and precise inter-modality relationship discovery. However, most previous approaches resort to doing global image–text alignment and neglect fine-grained correspondence. Although some works explore local region–word alignment, they usually suffer from a heavy computing burden. In this paper, we propose a prototype local–global alignment (PLGA) network for image–text retrieval by jointly performing the fine-grained local alignment and high-level global alignment. Specifically, our PLGA contains two key components: a prototype-based local alignment module and a multi-scale global alignment module. The former enables efficient fine-grained local matching by combining region–prototype alignment and word–prototype alignment, and the latter helps perceive hierarchical global semantics by exploring multi-scale global correlations between the image and text. Overall, the local and global alignment modules can boost their performances for each other via the unified model. Quantitative and qualitative experimental results on Flickr30K and MS-COCO benchmarks demonstrate that our proposed approach performs favorably against state-of-the-art methods.
650		4	|a Image–text retrieval
650		4	|a Local alignment
650		4	|a Global alignment
650		4	|a Prototype
700	1		|a Zhang, Feifei |0 (orcid)0000-0002-8153-9977 |4 aut
700	1		|a Zhang, Xi |4 aut
700	1		|a Xu, Changsheng |4 aut
773	0	8	|i Enthalten in |t International journal of multimedia information retrieval |d Springer London, 2012 |g 11(2022), 4 vom: 06. Okt., Seite 525-538 |w (DE-627)684132834 |w (DE-600)2647391-4 |w (DE-576)9684132832 |x 2192-6611 |7 nnns
773	1	8	|g volume:11 |g year:2022 |g number:4 |g day:06 |g month:10 |g pages:525-538
856	4	1	|u https://doi.org/10.1007/s13735-022-00258-1 |z lizenzpflichtig |3 Volltext
912			|a GBV_USEFLAG_A
912			|a SYSFLAG_A
912			|a GBV_OLC
912			|a SSG-OLC-PHA
936	b	k	|a 54.87 |j Multimedia |j Multimedia |q VZ
936	b	k	|a 54.64 |j Datenbanken |j Datenbanken |q VZ
951			|a AR
952			|d 11 |j 2022 |e 4 |b 06 |c 10 |h 525-538
