三维重构(13):SGM经典算法解读

Stereo Processing by Semi-Global Matching and Mutual Information

Semi-Global Matching Algorithm is proposed by “Stereo Processing by Semi-Global Matching and Mutual Information” published in 2008. The Semi-Global Matching (SGM) stereo method uses a pixelwise, Mutual Information based matching cost for compensating radiometric differences of input images. Pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function. SGM performs a fast approximation by pathwise optimizations from all directions.

The mutual information & cost function

mutual information

Mutual information is defined as:
MII1,I2=HI1+HI2HI1,I2(1)MI_{I_1,I_2}=H_{I_1}+H_{I_2}-H_{I_1,I_2}\tag{1}, where HI=01PI(i)logPI(i)di(2)H_I=-\int_0^1P_I(i)logP_I(i)d_i\tag{2} and HI1,I2=0101PI1,I2(i1,i2)logPI1,I2(i1,i2)di1di2(3)H_{I_1,I_2}=-\int_0^1\int_0^1P_{I_1,I_2}(i_1,i_2)logP_{I_1,I_2}(i_1,i_2)d_{i_1}d_{i_2}\tag{3}. And PI(i)P_I(i) is the normalized histogram of the images, PI1,I2P_{I_1,I_2} is the joint distribution of corresponding points’ intensity. The corresponding points, for example, pixel p\textbf{p} in the left image, q\textbf{q} in the right image are corresponded, so q=ebm(p,d)=[pxd,py]\textbf{q}=e_{bm}(\textbf{p},d)=[\textbf{p}_x-d,\textbf{p}_y] with disparity dd.
PI1,I2(i,k)=1npT[(i,k)=(I1p,I2p)](4)P_{I_1,I_2}(i,k)=\frac{1}{n}\sum_{\textbf{p}}{T[(i,k)=(I_{1p},I_{2p})]}\tag{4}. T[A=B]=1,ifA=B,elseT[A=B]=0T[A=B]=1,if A=B,else T[A=B]=0

Equation 1 operates on full images and requires the disparity
image a priori, both prevent the use of MI as pixelwise matching cost. However, the calculation of the joint entropy can be transformed into a sum over pixels using Taylor expansion.
HI1,I2=phI1,I2(I1p,I2p)(5)H_{I_1,I_2}=\sum_\textbf{p}{h_{I_1,I_2}(I_{1p},I_{2p})}\tag{5}
hI1,I2(i,k)=1nlog(PI1,I2(i,k)g(i,k))g(i,k)(6)h_{I_1,I_2}(i,k)=-\frac{1}{n}log(P_{I_1,I_2}(i,k)\otimes g(i,k))\otimes g(i,k)\tag{6}
For overlapping(matching) pixels, the entropy of one image is
HI=phI(Ip)(7)H_{I}=\sum_\textbf{p}{h_{I}(I_{p})}\tag{7}
hI(i)=1nlog(PI(i)g(i))g(i)(8)h_I(i)=-\frac{1}{n}log(P_{I}(i)\otimes g(i))\otimes g(i)\tag{8}
With the same idea, the mutual information could be described as MII1,I2=pmiI1,I2(I1p,I2p)(9)MI_{I_1,I_2}=\sum_\textbf{p}{mi_{I_1,I_2}(I_{1p},I_{2p})}\tag{9} and miI1,I2(I1p,I2p)=hi1(i)+hi2(k)hI1,I2(i,k)(10)mi_{I_1,I_2}(I_{1p},I_{2p})=h_{i_1}(i)+h_{i_2}(k)-h_{I_1,I_2}(i,k)\tag{10}

To summarize, if we have the optimal disparity map DD, and the mutual information of the two images will be maximized. At the same time, the joint distribution of the corresponding points’ intensity will be like shown below, which means the corresponding points share similar intensities.
三维重构(13):SGM经典算法解读

cost function

The mutual information cost is defined as CMI(p,d)=miIb,fD(Im)(Ibp,Imq)(11)C_{MI}(\textbf{p},d)=-mi_{I_b,f_D(I_m)}(I_{b\textbf{p}},I_{m\textbf{q}})\tag{11} where fD(Im)f_D(I_m) is the transformed right image with disparity map DD.
Here, we will have a question, if we want to get the cost, we have to get the disparity map, but the disparity map is the final result we want. So the author provided an optimization method.
First, considering pixelwise cost calculation is generally ambiguous and wrong matches can easily have a lower cost than correct ones, due to noise; an additional constraint is added that supports smoothness by penalizing changes of neighboring disparities. The energy E(D)E(D) that depends on the disparity image DD.
E(D)=p(C(p,Dp)+qNpP1T[DpDq=1]+qNpP2T[DpDq>1])(12)E(D)=\sum_{\textbf{p}}{(C(\textbf{p},D_{\textbf{p}})+\sum_{\textbf{q}\in{N_{\textbf{p}}}}{P_1T[|D_{\textbf{p}}-D_{\textbf{q}}|=1]}+\sum_{\textbf{q}\in{N_{\textbf{p}}}}{P_2T[|D_{\textbf{p}}-D_{\textbf{q}}|>1]})}\tag{12}

The first term is the ‘MI’ pixel matching cost for the disparities of DD. The second and third term are penalties. Usually P2P1P_2\gg P_1. To minimize the energy E(D)E(D) is NP complete for many discontinuity preserving energies. So Dynamic Programming solution is applied by optimizing several 1D constraints from all directions to fit 2D conditions. The aggregated (smoothed) cost S(p,d)S(\textbf{p},d) for a pixel p\textbf{p} and disparity dd is calculated by summing the costs of all 1D minimum cost paths that end in pixel p\textbf{p} at disparity dd.
Finally,
Lr(p,d)=C(p,d)L_{\textbf{r}}(\textbf{p},d)=C(\textbf{p},d)
+min(Lr(p-r,d),+min(L_{\textbf{r}}(\textbf{p-r},d),
Lr(p-r,d1)+P1,L_{\textbf{r}}(\textbf{p-r},d-1)+P_1,
Lr(p-r,d+1)+P1,miniLr(p-r,i)+P2)minkLr(p-r,k)L_{\textbf{r}}(\textbf{p-r},d+1)+P_1,min_i L_{\textbf{r}}(\textbf{p-r},i)+P_2)-min_kL_{\textbf{r}}(\textbf{p-r},k)
And, S(p,d)=rLr(p,d)(13)S(\textbf{p},d)=\sum_{\textbf{r}}{L_{\textbf{r}}(\textbf{p},d)}\tag{13}
choose dd that minimize S(p,d)S(\textbf{p},d).

To summarize, the cost function finally transforms from a global energy E(D)E(D) into a pixelwize and Dynamic Programming S(p,d)S(\textbf{p},d).

Disparity Refinement

This part include:

  1. Intensity Consistent Disparity Check
  2. remove outliers
  3. Discontinuity Preserving Interpolating

Opencv result

input:
三维重构(13):SGM经典算法解读
三维重构(13):SGM经典算法解读
output:
三维重构(13):SGM经典算法解读

Summary

This blog focuses on two main questions.

  1. Apply mutual information as part of the cost. By the way, the mutual information is based on grayscale distributions and the joint distribution is about the corresponding pixels’ grayscale couple(like shown below, I1I_1 and I2I_2).三维重构(13):SGM经典算法解读
  2. Turn the cost function from energy E(D)E(D) to S(p,d)S(\textbf{p},d).

Remaining Questions

  1. Dynamic Programming process is not so clear in detail. I do not know how the question turn into a Dynamic Programming. The theory is not clear for me.
  2. Source Code in Opencv is still complex for me. The cost is not mutual information in Opencv(Block). And they spent much on memory processing, which make the code complex. So I first learn BM source code in Opencv, which would help for SGM.

Reference:
Stereo Processing by Semi-Global Matching and Mutual Information