Unsupervised Domain Adaptation via Disentangled Representations: Application to Cross-Modality Liver Segmentation


1. Author

Junlin Yang1, Nicha C. Dvornek3, Fan Zhang2, Julius Chapiro3, MingDe Lin3, and James S. Duncan1,2,3,4
1 Department of Biomedical Engineering, Yale University, New Haven, CT, USA
2 Department of Electrical Engineering, Yale University, New Haven, CT, USA
3 Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
4 Department of Statistics & Data Science, Yale University, New Haven, CT, USA

2. Abstract

A deep learning model trained on labeled data from a source domain generally performs poorly on data from different target domains due to domain shifts.
In this work, we achieve cross-modality domain adaptation, i.e. between CT and MRI images, via disentangled representations.
Domain adaptation is achieved in two steps:

  1. Images from each domain are embedded into two spaces: a shared, domain-invariant content space and a domain-specific style space.
  2. The representation in the content space is extracted to perform the task.

Moreover, our model generalizes well to joint-domain learning, in which unpaired data from different modalities are learned jointly to improve segmentation performance on each individual modality.

3. Introduction

Real-world applications usually face varying visual domains.
The distribution differences between training and test data, i.e. domain shifts, can lead to significant performance degradation.

Data collection and manual annotation for every new task and domain are time-consuming and expensive, especially for medical imaging, where data are limited and are collected from different scanners, protocols, sites, and modalities.

To solve this problem, domain adaptation algorithms aim to build a model from a source data distribution that performs well on a different but related target data distribution (Wang, M., Deng, W.: Deep visual domain adaptation: A survey. Neurocomputing 312, 135–153 (2018)).

While MRI acquisitions capture more complex quantitative information than CT, which is useful for liver segmentation, they are often less clinically available than CT images.
Thus, it would be helpful if we could learn a liver segmentation model from the more accessible CT data that also performs well on MRI images.
Contributions:

  • This is the first work to achieve unsupervised domain adaptation for segmentation via disentangled representations in the field of medical imaging.
  • Our model decomposes images across domains into a domain-invariant content space, which preserves the anatomical information, and a domain-specific style space, which represents modality information.

4. Methodology

Our Domain Adaptation via Disentangled Representations (DADR) pipeline consists of two modules: a Disentangled Representation Learning Module (DRL Module) and a Segmentation Module (Seg Module) (see Fig. 2). Of note, the DRL Module box in the DADR pipeline at the top is expanded in the large box on the left.
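To make the two-step structure concrete, below is a minimal PyTorch-style sketch of how the two modules could be composed at inference time; the module and method names (`drl`, `seg`, `content_only_image`) are illustrative assumptions, not the authors' released code.

```python
import torch.nn as nn

class DADR(nn.Module):
    """Minimal sketch of the DADR pipeline: a disentangled representation
    learning (DRL) module followed by a segmentation (Seg) module."""

    def __init__(self, drl: nn.Module, seg: nn.Module):
        super().__init__()
        self.drl = drl  # learns content/style codes and decodes content-only images
        self.seg = seg  # segmentation network operating on content-only images

    def forward(self, x):
        # Step 1: strip domain-specific style, keeping anatomical content.
        content_only = self.drl.content_only_image(x)  # hypothetical helper
        # Step 2: segment from the domain-invariant representation.
        return self.seg(content_only)
```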


4.1 DRL Module

The module consists of two main components: a variational autoencoder (VAE) for reconstruction and a generative adversarial network (GAN) for adversarial training.

We train the VAE component for in-domain reconstruction, minimizing a reconstruction loss that encourages the encoders and generators to be inverses of each other.

The GAN component for cross-domain translation is trained to encourage the disentanglement of the latent space, decomposing it into content and style subspaces.
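As a rough illustration of how these two components fit together for one domain, the sketch below pairs a content encoder and a style encoder with a generator (the VAE reconstruction path) and a discriminator (the GAN path); all class and attribute names are assumptions for exposition, following the notation of the loss terms below.

```python
import torch.nn as nn

class DomainBranch(nn.Module):
    """Sketch of the per-domain pieces used by the DRL module: content and
    style encoders, a generator, and a discriminator (E_c, E_s, G, D)."""

    def __init__(self, E_c: nn.Module, E_s: nn.Module, G: nn.Module, D: nn.Module):
        super().__init__()
        self.E_c, self.E_s, self.G, self.D = E_c, E_s, G, D

    def reconstruct(self, x):
        # VAE path: encode content and style, then decode back to the image,
        # so the L1 reconstruction loss pushes G to invert (E_c, E_s).
        return self.G(self.E_c(x), self.E_s(x))

    def translate_from(self, content_code, style_code):
        # GAN path: decode a content code taken from the *other* domain with a
        # style code of this domain; the discriminator D judges the result.
        return self.G(content_code, style_code)
```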

The overall loss function is defined as the weighted sum of the three components:
L_{\text{total}} = \alpha L_{\text{recon}} + \beta L_{\text{adv}} + \gamma L_{\text{latent}}

  1. In-domain reconstruction, L_{\text{recon}} = L_{\text{recon}}^{1} + L_{\text{recon}}^{2}, where
    L_{\text{recon}}^{i} = \mathbb{E}_{x_{i} \sim X_{i}} \left\| G_{i}\left(E_{c_{i}}(x_{i}), E_{s_{i}}(x_{i})\right) - x_{i} \right\|_{1}
  2. Cross-domain translation, L_{\text{adv}} = L_{\text{adv}}^{1 \rightarrow 2} + L_{\text{adv}}^{2 \rightarrow 1}, where
    L_{\text{adv}}^{1 \rightarrow 2} = \mathbb{E}_{c_{1} \sim p(c_{1}),\, s_{2} \sim p(s_{2})}\left[\log\left(1 - D_{2}(x_{1 \rightarrow 2})\right)\right] + \mathbb{E}_{x_{2} \sim X_{2}}\left[\log D_{2}(x_{2})\right]
  3. Latent space reconstruction, L_{\text{latent}} = L_{\text{recon}}^{c_{1}} + L_{\text{recon}}^{s_{1}} + L_{\text{recon}}^{c_{2}} + L_{\text{recon}}^{s_{2}}, where
    L_{\text{recon}}^{c_{1}} = \left\| E_{c_{2}}(x_{1 \rightarrow 2}) - c_{1} \right\|_{1}, \quad L_{\text{recon}}^{s_{2}} = \left\| E_{s_{2}}(x_{1 \rightarrow 2}) - s_{2} \right\|_{1}
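As a hedged sketch of how these terms could be computed for the 1 → 2 direction (the 2 → 1 direction is symmetric), assuming encoders E_c*/E_s*, generators G*, and a discriminator D2 that outputs probabilities; function and argument names are illustrative, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def drl_loss_1to2(x1, x2, E_c1, E_s1, E_c2, E_s2, G1, G2, D2,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """Illustrative computation of the three loss terms for domain 1 -> 2;
    the symmetric 2 -> 1 terms would be added in the same way."""
    # 1. In-domain reconstruction: G1(E_c1(x1), E_s1(x1)) should match x1 (L1 norm).
    c1, s1 = E_c1(x1), E_s1(x1)
    L_recon = F.l1_loss(G1(c1, s1), x1)

    # 2. Cross-domain translation: decode c1 with a style code sampled from the
    #    domain-2 prior and score it with discriminator D2 (probability output).
    s2 = torch.randn_like(E_s2(x2))  # s2 ~ p(s2), assumed Gaussian prior
    x_1to2 = G2(c1, s2)
    L_adv = (torch.log(1.0 - D2(x_1to2)) + torch.log(D2(x2))).mean()

    # 3. Latent reconstruction: re-encoding the translated image should recover
    #    the original content code c1 and the sampled style code s2.
    L_latent = F.l1_loss(E_c2(x_1to2), c1) + F.l1_loss(E_s2(x_1to2), s2)

    return alpha * L_recon + beta * L_adv + gamma * L_latent
```

In practice the adversarial term is optimized in a minimax fashion, alternating discriminator and encoder/generator updates; the sketch above only evaluates the terms as written in the equations.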

4.2 Domain Adaptation with Content-only Images

Once the disentangled representations are learned, content-only images can be reconstructed by using the content code c_{i} without the style code s_{i}.
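One simple way to realize this is sketched below, under the assumption that a fixed (e.g. zero) style code is a reasonable stand-in for "no style"; the helper and parameter names are illustrative.

```python
import torch

@torch.no_grad()
def content_only_image(x, E_c, G, style_dim):
    """Sketch: decode the domain-invariant content code with a fixed style
    code, discarding modality-specific appearance. Using a zero style code
    of shape (batch, style_dim) is an assumption for illustration."""
    c = E_c(x)                                                # content code c_i
    s0 = torch.zeros(x.size(0), style_dim, device=x.device)  # placeholder style
    return G(c, s0)
```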

4.3 Joint-domain Learning

Joint-domain learning aims to train a single model on data from both domains that performs well on both and outperforms models trained and tested separately on each domain.
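A minimal sketch of such a joint training step, assuming content-only images and labels are available from both modalities and a single segmentation network is shared between them (all names and the choice of loss are illustrative assumptions):

```python
import torch.nn as nn

def joint_domain_step(seg_model, optimizer, ct_batch, mr_batch,
                      criterion=nn.CrossEntropyLoss()):
    """Illustrative joint-domain update: one shared segmentation model is
    trained on unpaired CT and MR (content-only) images in the same step."""
    (x_ct, y_ct), (x_mr, y_mr) = ct_batch, mr_batch
    optimizer.zero_grad()
    loss = criterion(seg_model(x_ct), y_ct) + criterion(seg_model(x_mr), y_mr)
    loss.backward()
    optimizer.step()
    return float(loss)
```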