Multimodal: Common Datasets

VQA

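Visual Question Answering.
Given an image, the task is to answer related questions posed in natural language. Question types include multiple-choice, numeric, and open-ended questions.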

The goal of visual question answering (VQA) (Antol et al., 2015) is to answer a natural language question related to an image. We use the VQA v2.0 dataset (Goyal et al., 2017), which reduces answer bias compared to VQA v1.0. The dataset contains an average of 5.4 questions per image, and the total number of questions is 1.1M.
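The two figures quoted above also imply a rough image count for the dataset. The following is a minimal back-of-the-envelope sketch, not an official statistic: the variable names are illustrative, and both inputs are the rounded values from the text, so the result is only an approximation.

```python
# Figures quoted in the text above for VQA v2.0 (both are rounded values).
total_questions = 1_100_000   # "1.1M" questions in total
questions_per_image = 5.4     # average number of questions per image

# Implied number of images; an estimate only, because the inputs are rounded.
estimated_images = round(total_questions / questions_per_image)

print(f"approx. {estimated_images:,} images")
```

Under these assumptions the estimate comes out at a little over 200K images.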

  • Examples
    [example images not available]

References

  1. Paper: VQA
  2. Official website: visualqa.org