Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

This is an ACM SIGIR 2018 paper on cross-modal retrieval. Paper link: https://arxiv.org/pdf/1804.11146.pdf. The author is a PhD student at Université Paris 6; the author's homepage is http://webia.lip6.fr/~carvalho/static/home/. The code has not been released yet.
What the paper does (recipe retrieval):
Input: an image (or a sentence) + a dataset          Output: a ranked list of sentences (or images)
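The input/output description above can be illustrated with a small ranking sketch. It assumes the query and every dataset item have already been mapped into a shared embedding space by some encoders (not shown here), and it assumes cosine similarity as the ranking score; this post does not state the similarity measure, so treat that choice, and the names `cosine_similarity` and `rank_candidates`, as hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (candidate id, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from trained encoders.
    image_query = [1.0, 0.0]
    recipe_embeddings = {
        "recipe_a": [1.0, 0.0],
        "recipe_b": [0.0, 1.0],
        "recipe_c": [1.0, 1.0],
    }
    for name, score in rank_candidates(image_query, recipe_embeddings):
        print(name, round(score, 3))
```

The same function covers both directions: pass a sentence embedding as the query and image embeddings as the candidates to retrieve images from text.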
The experimental results shown in the paper are as follows.
The experimental comparison with the state of the art is shown below.

Method
The paper's framework is shown below.

Three key points in the paper:

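Image paired with the concatenation of ingredients and instructions.
Image paired with ingredients.
An adaptive strategy is used during training. The main idea is to count the number of non-zero values in each of the two triplet losses, then divide each triplet loss by its own count (that is, average each triplet loss over its non-zero terms only). The experiments show that this approach works better.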
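The adaptive averaging described in the third point can be sketched as follows. This is a minimal illustration and not the authors' implementation (the code was unreleased at the time of writing): the hinge form of the per-triplet loss, the function names, the toy distances, the margin, and the `weight_b` factor used to combine the two losses are all assumptions made for the example.

```python
from typing import List, Sequence


def hinge_triplet_losses(
    d_pos: Sequence[float],
    d_neg: Sequence[float],
    margin: float,
) -> List[float]:
    """Per-triplet hinge loss max(0, d(anchor, pos) - d(anchor, neg) + margin)."""
    return [max(0.0, dp - dn + margin) for dp, dn in zip(d_pos, d_neg)]


def plain_mean(losses: Sequence[float]) -> float:
    """Ordinary average over all triplets, including those with zero loss."""
    return sum(losses) / len(losses) if losses else 0.0


def adaptive_mean(losses: Sequence[float]) -> float:
    """Sum of losses divided by the number of non-zero terms (0.0 if none)."""
    active = sum(1 for value in losses if value > 0.0)
    return sum(losses) / active if active else 0.0


def adaptive_total(
    losses_a: Sequence[float],
    losses_b: Sequence[float],
    weight_b: float = 1.0,
) -> float:
    """Combine two triplet losses, each normalized by its own non-zero count.

    weight_b is a hypothetical balancing factor, not taken from the post.
    """
    return adaptive_mean(losses_a) + weight_b * adaptive_mean(losses_b)


if __name__ == "__main__":
    # Toy distances: two of the four triplets already satisfy the margin.
    d_pos = [0.25, 1.0, 0.0, 0.5]
    d_neg = [1.0, 0.5, 1.0, 0.25]
    losses = hinge_triplet_losses(d_pos, d_neg, margin=0.25)
    print("per-triplet:", losses)
    print("plain mean:", plain_mean(losses))
    print("adaptive mean:", adaptive_mean(losses))
```

With the plain mean, triplets that already satisfy the margin contribute zeros and dilute the average as training progresses; dividing by the non-zero count keeps the loss scale tied to the triplets that still violate the margin.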
