Reducing Vision-Answer Biases for Multiple-Choice VQA

Multiple-choice visual question answering (VQA) is a challenging task because it requires thorough multimodal understanding and complicated inter-modality relationship reasoning. To address this challenge, previous approaches usually resort to different multimodal interaction modules. Despite their effectiveness, we find that existing methods may exploit a newly discovered bias (the vision-answer bias) to make answer predictions, leading to suboptimal VQA performance and poor generalization. To address these issues, we propose a Causality-based Multimodal Interaction Enhancement (CMIE) method, which is model-agnostic and can be seamlessly incorporated into a wide range of VQA approaches in a plug-and-play manner. Specifically, our CMIE contains two key components: a causal intervention module and a counterfactual interaction learning module. The former removes the spurious correlation between the visual content and the answer caused by the vision-answer bias, while the latter helps capture discriminative inter-modality relationships by directly supervising multimodal interaction training via an interactive loss. Extensive experimental results on three public benchmarks and one reorganized dataset show that the proposed method significantly improves seven representative VQA models, demonstrating the effectiveness and generalizability of CMIE.

Media type:

E-article

Year of publication:

2023

Published:

2023

Contained in:

Link to the complete record - volume: 32

Contained in:

IEEE Transactions on Image Processing : a publication of the IEEE Signal Processing Society - 32(2023) from: 09., pages 4621-4634

Language:

English

Contributors:

Zhang, Xi [Author]
Zhang, Feifei [Author]
Xu, Changsheng [Author]

Links:

Full text

Topics:

Journal Article

Notes:

Date revised: 16.08.2023

Published: Print-Electronic

Citation status: PubMed-not-MEDLINE

DOI:

10.1109/TIP.2023.3302162

Funding:

Funding institution / project title:

PPN (catalog ID):

NLM360559506