Paper Notes of CVPR-0313
HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN
Abstract
Deep learning to hash needs similarity information that is expensive to collect, which often results in a substantial loss of image retrieval quality. This paper presents HashGAN, a novel architecture for deep learning to hash, whose key idea is to augment the training data with nearly real images synthesized by a new Pair Conditional Wasserstein GAN (PC-WGAN) conditioned on the pairwise similarity information.
Introduction
This paper focuses on building data-dependent hash encoding schemes, which perform better than data-independent methods for efficient image retrieval. Since deep learning to hash needs large-scale image data and sufficient supervised information, which is not available in many image retrieval applications, the authors propose PC-WGAN, which can learn compact binary hash codes from both real and large-scale synthesized images. PC-WGAN is the first GAN that enables image synthesis by incorporating pairwise similarity information, and it can be trained end-to-end by back-propagation in a minimax optimization mechanism.
Related work
The authors review two lines of related work, Hashing Methods and Generative Models, and argue for the superiority of their method at the end of the section.
Method
The architecture of HashGAN is shown in Figure 1.
It includes two parts: a pair conditional Wasserstein GAN (generator G and discriminator D) and a hash encoder F.
The optimization problems for discriminator D, generator G and hash encoder F are respectively computed as follows:
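The formulas themselves are not reproduced in these notes. As a rough sketch, assuming PC-WGAN keeps the standard WGAN-GP form that it builds on (this is the generic objective, not the paper's exact conditional losses; x̂ denotes an interpolate between real and generated samples and λ the gradient-penalty weight):

```latex
\min_{D}\; \mathbb{E}_{\tilde{x}\sim P_g}\!\left[D(\tilde{x})\right]
         - \mathbb{E}_{x\sim P_r}\!\left[D(x)\right]
         + \lambda\,\mathbb{E}_{\hat{x}}\!\left[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\right],
\qquad
\min_{G}\; -\,\mathbb{E}_{\tilde{x}\sim P_g}\!\left[D(\tilde{x})\right]
```

The hash encoder F, judging by the variant names in the ablation study, is trained with a pairwise cosine cross-entropy loss plus a quantization loss on the codes; see the original paper for the exact conditional terms.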
For more details about HashGAN, please refer to “HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN”.
Experiment
The authors evaluate the efficacy of the proposed HashGAN approach against eight state-of-the-art shallow and deep hashing methods on three benchmark datasets. The MAP (Mean Average Precision) results are shown in Table 1.
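As a reminder of how MAP is computed for retrieval, here is a minimal sketch in plain Python (the function names are mine, not from the paper): AP averages the precision at each relevant hit in a ranked result list, and MAP averages AP over all queries.

```python
def average_precision(relevance):
    """AP over a ranked list of 0/1 relevance flags.

    Precision is evaluated at each position where a relevant item
    appears, then averaged over the number of relevant items retrieved.
    """
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: mean of per-query AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Example: relevant items at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
print(average_precision([1, 0, 1]))        # 0.8333...
print(mean_average_precision([[1, 1], [0, 1]]))  # 0.75
```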
The proposed HashGAN improves substantially from two perspectives: (1) HashGAN introduces a novel Pair Conditional Wasserstein GAN (PC-WGAN) to synthesize nearly real images as training data, which alleviates the problem of insufficient training data. (2) The model uses a new loss function that approximates the Hamming distance more accurately, learning nearly lossless hash codes.
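Why can a similarity loss approximate the Hamming distance at all? For codes in {-1, +1}^K there is a standard identity, dist_H(a, b) = (K - ⟨a, b⟩) / 2, so continuous surrogates of the inner product (or of the cosine, since such codes all have norm √K) act as surrogates of the Hamming distance. A minimal check of this identity (illustrative only; the paper's actual cosine cross-entropy loss is not reproduced here):

```python
def hamming(a, b):
    """Hamming distance between two {-1, +1} codes."""
    return sum(x != y for x, y in zip(a, b))


def hamming_from_inner_product(a, b):
    """Identity for {-1, +1}^K codes: dist_H = (K - <a, b>) / 2."""
    k = len(a)
    dot = sum(x * y for x, y in zip(a, b))
    return (k - dot) // 2  # always an integer for +/-1 codes


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(hamming(a, b), hamming_from_inner_product(a, b))  # 2 2
```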
The authors then report other metrics important to this specific task to verify the efficacy of their method.
At the end of Section 4, an ablation study and a visualization study are presented.
In the ablation study, HashGAN-B serves as the upper bound of performance; HashGAN-Q is the HashGAN variant without the proposed quantization loss; HashGAN-C is the variant that replaces the proposed cosine cross-entropy loss with the widely used inner-product cross-entropy loss; and HashGAN-G is the variant without the proposed PC-WGAN.
For the visualization study, some results are shown in Figure 5.
Conclusion
This paper proposes HashGAN, a novel model that synthesizes nearly real images conditioned on pairwise similarity information, alleviating the problem of insufficient similarity information and improving the quality of compact binary hash codes.