Paper: LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood
Authors and affiliations: University of Utah & Mitsubishi & University of Manchester
Venue and date: CVPR 2020


Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We model these as mixed random variables and estimate them using a deep network trained with our proposed Location, Uncertainty, and Visibility Likelihood (LUVLi) loss. In addition, we release an entirely new labeling of a large face alignment dataset with over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. Not only does our joint estimation yield accurate estimates of the uncertainty of predicted landmark locations, but it also yields state-of-the-art estimates for the landmark locations themselves on multiple standard face alignment datasets. Our method’s estimates of the uncertainty of predicted landmark locations could be used to automatically identify input images on which face alignment fails, which can be critical for downstream tasks.


Modern methods for face alignment (facial landmark localization) perform quite well most of the time, but all of them fail some percentage of the time. Unfortunately, almost all of the state-of-the-art (SOTA) methods simply output predicted landmark locations, with no assessment of whether (or how much) downstream tasks should trust these landmark locations. This is concerning, as face alignment is a key pre-processing step in numerous safety-critical applications, including advanced driver assistance systems (ADAS), driver monitoring, and remote measurement of vital signs [57]. As deep neural networks are notorious for producing overconfident predictions [33], similar concerns have been raised for other neural network technologies [46], and they become even more acute in the era of adversarial machine learning where adversarial images may pose a great threat to a system [14]. However, previous work in face alignment (and landmark localization in general) has largely ignored the area of uncertainty estimation.

To address this need, we propose a method to jointly estimate facial landmark locations and a parametric probability distribution representing the uncertainty of each estimated location. Our model also jointly estimates the visibility of landmarks, which predicts whether each landmark is occluded due to extreme head pose.

We find that the choice of methods for calculating mean and covariance is crucial. Landmark locations are best obtained using heatmaps, rather than by direct regression. To estimate landmark locations in a differentiable manner using heatmaps, we do not select the location of the maximum (argmax) of each landmark’s heatmap, but instead propose to use the spatial mean of the positive elements of each heatmap. Unlike landmark locations, uncertainty distribution parameters are best obtained by direct regression rather than from heatmaps. To estimate the uncertainty of the predicted locations, we add a Cholesky Estimator Network (CEN) branch to estimate the covariance matrix of a multivariate Gaussian or Laplacian probability distribution. To estimate visibility of each landmark, we add a Visibility Estimator Network (VEN). We combine these estimates using a joint loss function that we call the Location, Uncertainty and Visibility Likelihood (LUVLi) loss. Our primary goal in designing this model was to estimate uncertainty in landmark localization. In the process, not only does our method yield accurate uncertainty estimation, but it also produces SOTA landmark localization results on several face alignment datasets.
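The two estimators described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the fallback for a heatmap with no positive mass, and the use of softplus to keep the Cholesky diagonal positive are all assumptions made here for illustration. The first function computes a landmark location as the mean of pixel coordinates weighted by the positive part of a heatmap; the second builds a 2x2 covariance matrix from three directly regressed Cholesky coefficients.

```python
import numpy as np


def positive_spatial_mean(heatmap):
    """Return (x, y): pixel coordinates averaged with the positive heatmap values as weights."""
    h, w = heatmap.shape
    weights = np.maximum(heatmap, 0.0)  # keep only the positive elements
    total = weights.sum()
    if total <= 0:
        # Assumption for this sketch: with no positive mass, fall back to the center.
        return np.array([(w - 1) / 2.0, (h - 1) / 2.0])
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    x = (weights * xs).sum() / total
    y = (weights * ys).sum() / total
    return np.array([x, y])


def softplus(v):
    """Numerically stable log(1 + exp(v)); always positive."""
    return np.logaddexp(0.0, v)


def covariance_from_cholesky(raw_l11, raw_l21, raw_l22):
    """Build Sigma = L @ L.T from three unconstrained regressed values.

    Assumption for this sketch: softplus keeps the diagonal of L positive,
    which makes Sigma symmetric positive definite by construction.
    """
    L = np.array([[softplus(raw_l11), 0.0],
                  [raw_l21, softplus(raw_l22)]])
    return L @ L.T


# Usage: a heatmap with two equal positive peaks and one negative value.
hm = np.zeros((5, 8))
hm[2, 3] = 1.0
hm[2, 5] = 1.0
hm[0, 0] = -5.0  # ignored because it is not positive
print(positive_spatial_mean(hm))
print(covariance_from_cholesky(0.3, -1.2, -0.7))
```

Unlike argmax, the weighted mean is a smooth function of the heatmap values, which is what allows gradients to flow through the location estimate. Regressing a Cholesky factor rather than the covariance entries themselves guarantees a valid covariance matrix for any network output.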

Uncertainty can be broadly classified into two categories [41]: epistemic uncertainty is related to a lack of knowledge about the model that generated the observed data, and aleatoric uncertainty is related to the noise inherent in the observations, e.g., sensor or labeling noise. The ground-truth landmark locations marked on an image by human labelers would vary across multiple labelings of an image by different human labelers (or even by the same human labeler). Furthermore, this variation will itself vary across different images and landmarks (e.g., it will vary more for occluded landmarks and poorly lit images). The goal of our method is to estimate this aleatoric uncertainty.

The fact that each image only has one ground-truth labeled location per landmark makes estimating this uncertainty distribution difficult, but not impossible. To do so, we use a parametric model for the uncertainty distribution. We train a neural network to estimate the parameters of the model for each landmark of each input face image so as to maximize the likelihood under the model of the ground-truth location of that landmark (summed across all landmarks of all training faces).
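To make the maximum-likelihood idea concrete, here is a minimal sketch of the negative log-likelihood of one ground-truth location under a 2D Gaussian whose covariance is parameterized by a lower-triangular Cholesky factor. It covers only the Gaussian location term; the visibility term and the Laplacian variant are left out, and the function names and argument layout are assumptions of this sketch rather than the paper's code.

```python
import numpy as np


def gaussian_nll_2d(target, mean, L):
    """Negative log-likelihood of a 2D point under N(mean, L @ L.T).

    L is a 2x2 lower-triangular matrix with a positive diagonal.
    """
    diff = target - mean
    z = np.linalg.solve(L, diff)  # z = L^{-1} diff, so z @ z = diff^T Sigma^{-1} diff
    log_det_sigma = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma| = 2 * sum(log L_ii)
    return 0.5 * (z @ z) + 0.5 * log_det_sigma + np.log(2.0 * np.pi)


def total_nll(targets, means, Ls):
    """Sum the per-landmark NLL over all landmarks (arrays of shape (N, 2), (N, 2), (N, 2, 2))."""
    return sum(gaussian_nll_2d(t, m, L) for t, m, L in zip(targets, means, Ls))


# Usage: the same 3-pixel error scored under a tight and a broad predicted covariance.
mean = np.zeros(2)
target = np.array([3.0, 0.0])
print(gaussian_nll_2d(target, mean, np.eye(2)))        # tight covariance
print(gaussian_nll_2d(target, mean, 3.0 * np.eye(2)))  # broad covariance
```

For a large error, the broad covariance gives the lower loss, while for a perfect prediction the tight covariance does. This trade-off is what lets a single ground-truth label per landmark still supervise the predicted uncertainty: minimizing the loss pushes the network to report a covariance that matches the size of its typical errors.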
The main contributions of this work are as follows:

  • This is the first work to introduce the concept of parametric uncertainty estimation for face alignment.
  • We propose an end-to-end trainable model for the joint estimation of landmark location, uncertainty, and visibility likelihood (LUVLi), modeled as a mixed random variable.
  • We compare our model using multivariate Gaussian and multivariate Laplacian probability distributions.
  • Our algorithm yields accurate uncertainty estimation and state-of-the-art landmark localization results on several face alignment datasets.
  • We are releasing a new dataset with manual labels of the locations of 68 landmarks on over 19,000 face images in a wide variety of poses, where each landmark is also labeled with one of three visibility categories.

2. Related Work

3. Proposed Method


4. New Dataset: MERL-RAV

