Prototype local–global alignment network for image–text retrieval

Abstract: Image–text retrieval is challenging because it requires thorough multimodal understanding and precise discovery of inter-modality relationships. Most previous approaches, however, perform only global image–text alignment and neglect fine-grained correspondence. Although some works explore local region–word alignment, they usually incur a heavy computational burden. In this paper, we propose a prototype local–global alignment (PLGA) network for image–text retrieval that jointly performs fine-grained local alignment and high-level global alignment. Specifically, PLGA contains two key components: a prototype-based local alignment module and a multi-scale global alignment module. The former enables efficient fine-grained local matching by combining region–prototype alignment and word–prototype alignment; the latter helps perceive hierarchical global semantics by exploring multi-scale global correlations between the image and the text. Within the unified model, the local and global alignment modules reinforce each other. Quantitative and qualitative experimental results on the Flickr30K and MS-COCO benchmarks demonstrate that the proposed approach performs favorably against state-of-the-art methods.
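
The following is a minimal NumPy sketch of the general idea the abstract describes, not the authors' implementation. It assumes that a small set of shared prototype vectors acts as an intermediary, so that regions and words are each matched against K prototypes rather than against each other, and it replaces the multi-scale global module with a single mean-pooled global similarity. All function names, the softmax temperature, and the mixing weight `alpha` are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def prototype_pool(features, prototypes, temperature=0.1):
    """Aggregate local features (N, d) into one vector per prototype (K, d).

    Assumption: each prototype attends over the local features with a
    softmax; the source abstract does not specify the mechanism.
    """
    f = l2_normalize(features)
    p = l2_normalize(prototypes)
    attn = softmax(p @ f.T / temperature, axis=1)  # (K, N)
    return l2_normalize(attn @ f)                  # (K, d)


def local_similarity(regions, words, prototypes):
    """Mean cosine similarity between image and text prototype slots."""
    img_slots = prototype_pool(regions, prototypes)
    txt_slots = prototype_pool(words, prototypes)
    return float(np.mean(np.sum(img_slots * txt_slots, axis=1)))


def global_similarity(regions, words):
    """Cosine similarity of mean-pooled features (single scale only)."""
    g_img = l2_normalize(regions.mean(axis=0))
    g_txt = l2_normalize(words.mean(axis=0))
    return float(g_img @ g_txt)


def match_score(regions, words, prototypes, alpha=0.5):
    """Hypothetical weighted sum of the local and global similarities."""
    local = local_similarity(regions, words, prototypes)
    glob = global_similarity(regions, words)
    return alpha * local + (1.0 - alpha) * glob


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(36, 64))     # e.g. 36 region features
    words = rng.normal(size=(12, 64))       # e.g. 12 word features
    prototypes = rng.normal(size=(8, 64))   # K = 8 shared prototypes
    print(match_score(regions, words, prototypes))
```

Under these assumptions the local step costs on the order of (R + W)·K similarity computations for R regions and W words, compared with R·W for direct region–word matching, which is one plausible reading of the efficiency claim in the abstract.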

Media type:

Article

Year of publication:

2022

Published:

2022

Contained in:

International journal of multimedia information retrieval - 11(2022), 4, dated 6 Oct., pages 525-538

Language:

English

Contributors:

Meng, Lingtao [author]
Zhang, Feifei [author]
Zhang, Xi [author]
Xu, Changsheng [author]

Links:

Full text [licence required]

BKL:

54.87 / Multimedia

54.64 / Databases

Subjects:

Global alignment
Image–text retrieval
Local alignment
Prototype

Notes:

© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

doi:

10.1007/s13735-022-00258-1

PPN (catalogue ID):

OLC2080172514