11-777 lecture 1.1 introduction

background

Recently, I found a good course on multimodal machine learning. In this blog, I will study it and note down my understanding.

Objective

master the basics of multimodal machine learning

Key results

  1. what is a modality?
  2. the history of multimodal research
  3. the main areas of multimodal research

1. what is a modality?

Modality:

  • the way in which something happens or is experienced.
  • it can be a sensory form (touch, feel) or a certain type of information (image, speech).

Medium:

  • a means for storing or communicating information.

Here are examples of modalities:
[Figure: examples of modalities]

2. history of multimodal research

  1. The “behavioral” era (1970s until late 1980s)
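    The McGurk Effect (1976): when the sound "ba" is dubbed over a video of lips saying "ga", many listeners perceive "da", which shows that vision influences speech perception.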

  2. The “computational” era (late 1980s until 2000)
    Audio-Visual Speech Recognition (AVSR)
    Affective Computing

  3. The “interaction” era (2000 - 2010)
    Research on how humans interact multimodally.

  4. The “deep learning” era (2010s until …)

3. main areas of multimodal research

The field of multimodal machine learning covers 5 core challenges, 37 applications, and 235 related works.
Here are the five core areas.

1. Representation

Definition: Learning how to represent and summarize multimodal data in a way
that exploits the complementarity and redundancy of multiple modalities.

Demo:
[Figure: representation demo]

Main frameworks: joint representations and coordinated representations.
Coordinated representations aim to maximize the correlation between related vectors across modalities while keeping unrelated vectors clearly distinct.
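To make the coordinated idea concrete, here is a minimal Python sketch. It assumes each image and each caption already has a feature vector, and that `W_img` and `W_txt` are hypothetical learned projection matrices (training is not shown). The code computes a max-margin ranking loss. This is one common way to train coordinated representations, but it is not necessarily the exact objective from the lecture. The loss pushes each matched pair to be more similar than any mismatched pair by at least `margin`.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length, nonzero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project(matrix, x):
    # Linear projection W @ x, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def ranking_loss(img_vecs, txt_vecs, W_img, W_txt, margin=0.2):
    """Hinge ranking loss; img_vecs[i] and txt_vecs[i] are a matched pair.

    W_img / W_txt are hypothetical projections, one per modality,
    mapping each modality into its own (coordinated) space.
    """
    img = [project(W_img, x) for x in img_vecs]
    txt = [project(W_txt, y) for y in txt_vecs]
    loss = 0.0
    for i in range(len(img)):
        pos = cosine(img[i], txt[i])  # similarity of the true pair
        for j in range(len(txt)):
            if j != i:
                neg = cosine(img[i], txt[j])  # similarity of a mismatched pair
                # Penalize unless the true pair beats the mismatch by `margin`.
                loss += max(0.0, margin - pos + neg)
    return loss
```

With identity projections, perfectly matched pairs give zero loss. If the captions are swapped, every mismatched pair becomes more similar than its true match, and the loss turns positive.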

2. Alignment

Definition: Find correspondences between elements of different modalities.

Demo:
[Figure: alignment demo]
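Dynamic time warping (DTW) is one classic way to find correspondences between two sequences of different lengths, such as audio frames and video frames. The sketch below is my own illustration, not code from the lecture. Plain numbers stand in for per-frame features, and the distance function is an assumption you would swap for a real feature distance.

```python
def dtw(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Return (total cost, alignment path) between two sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = best cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or match both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which elements were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` aligns the single `2` in the first sequence with both `2`s in the second, at zero total cost.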

3. Fusion

Definition: To join information from two or more modalities to perform a
prediction task.

  1. Model-agnostic approaches: these are not tied to a specific model; they focus on when, how, and what to fuse (for example, early vs. late fusion).
  2. Model-based (intermediate) approaches:
    1. Deep neural networks
    2. Kernel-based methods
    3. Graphical models
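The difference between fusing early and fusing late can be sketched with toy linear scorers. This is a minimal illustration under stated assumptions: the modality names (`audio`, `video`), the weights, and the averaging weight `alpha` are all hypothetical. Early fusion concatenates the features and applies one model. Late fusion runs one model per modality and then combines their outputs.

```python
def linear_score(weights, features, bias=0.0):
    # Score of a simple linear classifier: w . x + b
    return sum(w * x for w, x in zip(weights, features)) + bias


def early_fusion(audio, video, weights, bias=0.0):
    # Early fusion: concatenate the feature lists, then apply ONE model.
    return linear_score(weights, audio + video, bias)


def late_fusion(audio, video, w_audio, w_video, alpha=0.5):
    # Late fusion: score each modality separately, then combine the scores.
    s_audio = linear_score(w_audio, audio)
    s_video = linear_score(w_video, video)
    return alpha * s_audio + (1 - alpha) * s_video
```

Early fusion lets the model learn interactions between features from different modalities. Late fusion keeps each modality's model independent, which makes it easier to handle a missing modality.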

4. Translation

Definition: Process of changing data from one modality to another, where the
translation relationship can often be open-ended or subjective.
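One simple way to translate between modalities is example-based (retrieval) translation. Given a new image, you find the most similar stored image and reuse its caption. The sketch below assumes hypothetical precomputed image embeddings paired with captions, and it uses cosine similarity as the (assumed) similarity measure. It is an illustration only, not the lecture's method.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length, nonzero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve_caption(query, database):
    """Return the caption paired with the stored embedding closest to `query`.

    `database` is a list of (image_embedding, caption) pairs (hypothetical data).
    """
    best = max(database, key=lambda pair: cosine(query, pair[0]))
    return best[1]
```

Retrieval can only return captions it has already seen. That is one reason generative models are also used when the translation is open-ended.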

5. Co-Learning

Definition: Transfer knowledge between modalities, including their
representations and predictive models.

I will omit this part, since it is outside my research focus.

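4. summary

This lecture introduced what a modality is, the history of multimodal research, and the five core areas of multimodal machine learning: representation, alignment, fusion, translation, and co-learning.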