Dual Adaptive Pyramid Network for Cross-Stain Histopathology Image Segmentation

1. Authors

Xianxu Hou1,2,*, Jingxin Liu1,2,3, Bolei Xu1,2, Bozhi Liu1,2, Xin Chen4, Mohammad Ilyas5, Ian Ellis5, Jon Garibaldi4, and Guoping Qiu1,2,4
1 College of Information Engineering, Shenzhen University, Shenzhen, China
2 Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, China
3 Histo Pathology Diagnostic Center, Shanghai, China
4 School of Computer Science, University of Nottingham, Nottingham, United Kingdom
5 School of Medicine, University of Nottingham, Nottingham, United Kingdom
* Equal contribution

2. Abstract

Supervised semantic segmentation normally assumes that the test data come from a domain similar to that of the training data. In practice, however, a domain mismatch between the training data and unseen data can lead to a significant performance drop.
Obtaining accurate pixel-wise labels for images in different domains is tedious and labor-intensive, especially for histopathology images.
We tackle the domain adaptation problem on two levels:

  1. the image-level considers differences in image color and style;
  2. the feature-level addresses the spatial inconsistency between the two domains.


3. Introduction

Although excellent performance has been achieved on benchmark datasets, deep segmentation models generalize poorly to unseen datasets due to the domain shift between the training and test data. [Adversarial Discriminative Domain Adaptation]
A model trained on one (source) dataset does not generalize well when applied to another (target) dataset.
Although fine-tuning the model with labelled target data could alleviate the impact of domain shift, manual annotation is a time-consuming, expensive and subjective process in the medical domain.
Therefore, it is of great interest to develop algorithms to adapt segmentation models from a source domain to a visually different target domain without requiring additional labels in the target domain.
The main insight behind such adaptation methods is to align the visual appearance or the feature distributions of the source and target domains.
Image-level adaptation considers the overall differences between the source and target domains, such as image color and style, while feature-level adaptation addresses the spatial inconsistency between the two domains.

4. Method


4.1 Model Overview

The proposed Dual Adaptive Pyramid Network contains a semantic segmentation network $G$ and two adversarial learning modules, $D_{img}$ and $D_{feat}$.
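For concreteness, the composition of the three components can be pictured as the following minimal PyTorch skeleton; the class and attribute names are our own illustration, not the authors' code:

```python
import torch.nn as nn

class DAPNet(nn.Module):
    """Illustrative skeleton (names are assumptions): a segmentation
    network G plus two adversarial discriminators used during training."""
    def __init__(self, seg_net, d_img, d_feat):
        super().__init__()
        self.G = seg_net      # segmentation network G
        self.D_img = d_img    # image-level discriminator on PPM features
        self.D_feat = d_feat  # feature-level discriminator on fused features

    def forward(self, x):
        # At inference time only the segmentation network is used.
        return self.G(x)
```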

4.2 Segmentation Network

A dilated ResNet-18 is used as the backbone to encode the input images.
To enlarge the receptive field of the model, we apply a Pyramid Pooling Module (PPM) from PSPNet to the last layer of the backbone network.
The pooled features at different levels are then upsampled and concatenated to form the pyramid pooling global feature.
Furthermore, we adopt skip connections from U-Net and a pyramid feature fusion architecture to produce the final segmentation, as sketched below.
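As a rough illustration of the PPM, the following PyTorch sketch pools the backbone features at several bin sizes, reduces their channels, and fuses them with the input. The bin sizes (1, 2, 3, 6) follow the PSPNet paper; the channel widths and normalization choices are assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """Sketch of a PSPNet-style Pyramid Pooling Module."""
    def __init__(self, in_channels, bins=(1, 2, 3, 6)):
        super().__init__()
        reduction = in_channels // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(bin_size),                 # pool to bin_size x bin_size
                nn.Conv2d(in_channels, reduction, 1, bias=False),
                nn.BatchNorm2d(reduction),
                nn.ReLU(inplace=True),
            )
            for bin_size in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # Upsample each pooled level back to the input resolution and
        # concatenate with the original features to form the
        # pyramid pooling global feature.
        pyramids = [x] + [
            F.interpolate(stage(x), size=(h, w),
                          mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return torch.cat(pyramids, dim=1)
```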
The segmentation task is learned by minimizing a combination of the standard cross-entropy loss and a negated Dice coefficient term for images from the source domain:
$$\mathcal{L}_{seg}=\mathbb{E}_{x_{s} \sim X_{S}}\left[-y_{s} \log \widetilde{y}_{s}\right]+\alpha\, \mathbb{E}_{x_{s} \sim X_{S}}\left[-\frac{2\, y_{s}\, \widetilde{y}_{s}}{y_{s}+\widetilde{y}_{s}}\right]$$

where $x_s$ is a source image drawn from the source domain $X_S$, $y_s$ is its ground-truth mask, $\widetilde{y}_s = G(x_s)$ is the predicted probability map, and $\alpha$ weights the Dice term.
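A minimal PyTorch sketch of this loss, assuming a binary foreground probability map; the default $\alpha$ and the smoothing constant `eps` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def seg_loss(pred, target, alpha=1.0, eps=1e-7):
    """pred: predicted probabilities in [0, 1]; target: binary mask (float)."""
    # Cross-entropy term; binary_cross_entropy covers both classes, a common
    # practical stand-in for the -y*log(y~) term in the equation above.
    ce = F.binary_cross_entropy(pred, target)
    # Soft Dice coefficient; subtracting it implements the negated Dice term.
    dice = (2.0 * (pred * target).sum() + eps) / (pred.sum() + target.sum() + eps)
    return ce - alpha * dice

# Example usage with random tensors:
pred = torch.rand(4, 1, 256, 256)                       # foreground probabilities
target = torch.randint(0, 2, (4, 1, 256, 256)).float()  # binary masks
loss = seg_loss(pred, target, alpha=1.0)
```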

4.3 Domain Adaptation

4.3.1 Image-level Adaptation

In this work, the image-level representation refers to the PPM outputs of the segmentation network $G$.
Image-level adaptation helps to reduce the domain shift caused by global image differences, such as color and style, between the source and target domains.
To eliminate this distribution mismatch, we employ a discriminator $D_{img}$ to distinguish the PPM features of source images from those of target images.

In particular, we employ PatchGAN, a fully convolutional network operating on image patches, from which we obtain a two-dimensional feature map as the discriminator output.
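A PatchGAN-style discriminator can be sketched as a small stack of strided convolutions whose output is a 2-D score map, one real/fake score per receptive-field patch. The layer widths and kernel settings below follow the common PatchGAN recipe and are assumptions, not the authors' exact architecture:

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Sketch of a PatchGAN-style discriminator, e.g. for D_img."""
    def __init__(self, in_channels, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # 2-D score map
        )

    def forward(self, x):
        # Returns one logit per patch rather than a single global score.
        return self.net(x)
```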

4.3.2 Feature-level Adaptation

The feature-level representation refers to the fused feature maps before they are fed into the final segmentation classifier.
Aligning the feature-level representations helps to reduce the segmentation differences in both global layout and local context.
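The alignment follows the usual adversarial recipe: $D_{feat}$ learns to separate source from target feature maps, while $G$ is trained to fool it. The sketch below is a hypothetical helper illustrating one such alternating step, not the authors' training code; loss weights are omitted:

```python
import torch
import torch.nn.functional as F

def adversarial_feature_step(feats_src, feats_tgt, d_feat, opt_d):
    """One illustrative alignment step on feature-level representations."""
    # --- update the discriminator: source features labelled 1, target 0 ---
    opt_d.zero_grad()
    src_logits = d_feat(feats_src.detach())  # detach: do not update G here
    tgt_logits = d_feat(feats_tgt.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(src_logits, torch.ones_like(src_logits))
        + F.binary_cross_entropy_with_logits(tgt_logits, torch.zeros_like(tgt_logits))
    )
    d_loss.backward()
    opt_d.step()

    # --- adversarial term for the segmentation network: push target
    # features toward the source distribution (label them as source) ---
    tgt_logits = d_feat(feats_tgt)
    g_adv = F.binary_cross_entropy_with_logits(tgt_logits, torch.ones_like(tgt_logits))
    return g_adv  # added to the segmentation loss when updating G
```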