Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-temporal Path Proposals: Study Notes

Abstract and Introduction

In this paper, we propose a two-stage framework that incorporates complex spatio-temporal information for effectively regularizing the re-identification results.
If a vehicle is observed at both cameras A and C, and camera B lies on the route between them, the same vehicle has to appear at camera B as well. Therefore, given a pair of vehicle images at locations A and C, if an image with similar appearance is never observed at camera B at a proper time, their matching confidence should be very low.

The main contribution of our method is two-fold. (1) We propose a two-stage framework for vehicle re-identification. It first proposes a series of candidate visual-spatio-temporal paths with the query images as the starting and ending states. In the second stage, a Siamese-CNN+Path-LSTM network is utilized to determine whether each query pair has the same vehicle identity with the spatio-temporal regularization from the candidate path. In this way, all the visual-spatio-temporal states along the candidate path are effectively incorporated to estimate the validness confidence of the path. Such information is for the first time explored for vehicle re-identification. (2) To effectively generate visual-spatio-temporal path proposals, we model the paths by chain MRF, which could be optimized efficiently by the max-sum algorithm. A deep neural network is proposed to learn the pairwise visual spatio-temporal potential function.

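In my own words: (1) The framework has two stages. Stage one proposes a series of candidate visual-spatio-temporal paths that start and end at the two query images. Stage two feeds the query pair and its candidate path to a Siamese-CNN+Path-LSTM network, which decides whether the pair shows the same vehicle; every state along the path therefore contributes to the path's validity confidence. The authors state that this is the first time such information has been explored for vehicle re-identification. (2) To generate the path proposals, the paths are modeled with a chain MRF that can be optimized efficiently by the max-sum algorithm, and a deep neural network learns the pairwise visual-spatio-temporal potential function.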

Approach

3.1. Visual-spatio-temporal path proposals

3.1.1 Chain MRF model for visual-spatio-temporal path proposal
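A Markov random field (MRF), also called a Markov network, is an undirected graphical model. Each variable depends directly only on its neighbours in the graph; in a chain MRF this means a state interacts only with the states immediately before and after it in the chain. The model is specified by a set of potential functions, which are non-negative real-valued functions defined on subsets of the variables and which together define the joint probability distribution.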

Open question: I did not follow how the Markov random field is applied here, or what applying it produces.
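One way to picture the chain MRF in use, based only on the paper's statement that paths are modeled by a chain MRF and optimized with max-sum: each camera along the route contributes a set of candidate observations, and max-sum picks one observation per camera so that the total pairwise score between neighbouring picks is as large as possible. The sketch below is a generic max-sum (Viterbi-style) pass over a chain, not the authors' code. The function name `max_sum_chain`, the toy observations, and the toy `pairwise` score (which simply prefers a 10-second gap between neighbouring cameras) are all assumptions made for illustration; in the paper the pairwise potential is learned by a neural network.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")


def max_sum_chain(
    candidates: Sequence[Sequence[State]],
    pairwise: Callable[[State, State], float],
) -> Tuple[List[State], float]:
    """Pick one state per position so that the sum of pairwise scores
    between neighbouring positions is maximal (max-sum on a chain)."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate state")

    # best[k]: best total score of any partial path ending in candidates[i][k]
    best = [0.0] * len(candidates[0])
    # back[i - 1][k]: index of the best predecessor (at position i - 1)
    # for candidates[i][k]
    back: List[List[int]] = []

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for cur in candidates[i]:
            scores = [
                best[j] + pairwise(prev, cur)
                for j, prev in enumerate(candidates[i - 1])
            ]
            j_star = max(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[j_star])
            pointers.append(j_star)
        best = new_best
        back.append(pointers)

    # Backtrack from the best final state to recover the whole path.
    k = max(range(len(best)), key=best.__getitem__)
    total = best[k]
    path_idx = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path_idx.append(k)
    path_idx.reverse()
    return [candidates[i][k] for i, k in enumerate(path_idx)], total


# Toy data (hypothetical): (observation id, timestamp in seconds).
# The two query images are the only candidates at the first and last camera.
observations = [
    [("A0", 0)],
    [("B0", 3), ("B1", 10), ("B2", 40)],
    [("C0", 20)],
]


def toy_pairwise(prev: Tuple[str, int], cur: Tuple[str, int]) -> float:
    # Hypothetical score: penalise deviation from an expected 10 s gap.
    return -abs((cur[1] - prev[1]) - 10)


best_path, best_score = max_sum_chain(observations, toy_pairwise)
print([name for name, _ in best_path], best_score)
```

Under this reading, the result of applying the MRF is the path proposal itself: the sequence of observations, one per camera, that best links the two query images.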

3.1.2 Deep neural networks as pairwise potential functions

Appearance similarity and spatio-temporal similarity are combined to judge whether two images show the same vehicle.

The visual branch (Siamese-Visual) is designed as a Siamese network with a shared ResNet-50. The visual similarity between the two images is computed as the inner-product of the two “global pooling” features followed by a sigmoid function.
Given the timestamps {t_{i,k}, t_{i+1,j}} and the geolocations {l_i, l_{i+1}} of two observations at cameras i and i + 1, the input features of the other (spatio-temporal) branch are calculated as their time difference and spatial difference.

The outputs of the two branches are concatenated and input into a 2 × 1 fully-connected layer with a sigmoid function to obtain the final compatibility between the two states, which takes all visual, spatial and temporal information into consideration.
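The data flow described above can be sketched in plain Python. This is only an illustration of the shapes involved: the visual score is a sigmoid of the inner product of two feature vectors, the spatio-temporal inputs are a time difference and a spatial difference, and the two branch outputs go through a 2 × 1 fully-connected layer with a sigmoid. Everything else is an assumption: the feature vectors stand in for ResNet-50 pooled features, the spatial difference is taken as Euclidean distance, the spatio-temporal branch is reduced to a single linear unit, and all weights are made-up placeholders for values that would be learned.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def visual_similarity(feat_a: Sequence[float], feat_b: Sequence[float]) -> float:
    """Sigmoid of the inner product of two pooled feature vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return sigmoid(sum(a * b for a, b in zip(feat_a, feat_b)))


def spatio_temporal_features(
    t_a: float, t_b: float,
    loc_a: Sequence[float], loc_b: Sequence[float],
) -> Tuple[float, float]:
    """Time difference and spatial difference between two observations.
    Euclidean distance is an assumption made for this sketch."""
    return t_b - t_a, math.dist(loc_a, loc_b)


def spatio_temporal_similarity(
    time_diff: float, spatial_diff: float,
    weights: Tuple[float, float] = (-0.01, -0.1),  # placeholder, not learned
    bias: float = 1.0,                             # placeholder, not learned
) -> float:
    """Stand-in for the learned spatio-temporal branch (a single linear unit)."""
    return sigmoid(weights[0] * time_diff + weights[1] * spatial_diff + bias)


def compatibility(
    visual_sim: float, st_sim: float,
    weights: Tuple[float, float] = (2.0, 2.0),  # placeholder 2 x 1 FC weights
    bias: float = -2.0,                         # placeholder bias
) -> float:
    """Concatenate the two branch outputs, apply a 2 x 1 FC layer and a sigmoid."""
    return sigmoid(weights[0] * visual_sim + weights[1] * st_sim + bias)


# Hypothetical pair of observations at neighbouring cameras.
v = visual_similarity([0.2, 0.5, 0.1], [0.3, 0.4, 0.2])
dt, dd = spatio_temporal_features(5.0, 20.0, (0.0, 0.0), (3.0, 4.0))
score = compatibility(v, spatio_temporal_similarity(dt, dd))
print(dt, dd, score)
```

A score of this kind is what a chain model would consume as the pairwise potential between two neighbouring states.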

3.2. Siamese-CNN+Path-LSTM for query pair classification

Given a pair of queries, we utilize their candidate visual-spatio-temporal path as priors to determine whether the query pair has the same identity or not with a Siamese-CNN+path-LSTM network.

Experiments

4.1. Dataset and evaluation metric
The mean average precision (mAP), top-1 accuracy and top-10 accuracy are chosen as the evaluation metrics.
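For reference, these metrics can be computed from ranked retrieval results as in the minimal sketch below. It uses the standard definitions rather than any evaluation script from the paper. Each query is represented by a list of booleans in rank order (True where the gallery image has the same identity as the query), and the sketch assumes each list covers the whole gallery so that every true match appears somewhere in it. The function names and the toy rankings are illustrative only.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Mean of the precision values at each rank where a true match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_relevance, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Average of the per-query average precision."""
    if not rankings:
        raise ValueError("need at least one query")
    return sum(average_precision(r) for r in rankings) / len(rankings)


def top_k_accuracy(rankings: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of queries with at least one true match in the first k results."""
    if not rankings:
        raise ValueError("need at least one query")
    return sum(any(r[:k]) for r in rankings) / len(rankings)


# Two toy queries (hypothetical): True marks a gallery image of the same vehicle.
toy_rankings = [
    [True, False, True],
    [False, True, False],
]
print(mean_average_precision(toy_rankings))
print(top_k_accuracy(toy_rankings, 1), top_k_accuracy(toy_rankings, 2))
```

Top-1 and top-10 accuracy correspond to `top_k_accuracy` with `k=1` and `k=10`.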
4.2. Compared With Vehicle Re-ID methods

4.3. Experiment Results

5. Conclusions

Our proposed approach incorporates important visual-spatio-temporal path information for regularization. A chain MRF model with a deeply learned pairwise potential function is adopted to generate visual-spatio-temporal path proposals. Such candidate path proposals are evaluated by a Siamese-CNN+Path-LSTM to obtain similarity scores between pairs of queries. The proposed approach outperforms state-of-the-art methods on the VeRi-776 dataset.