11-777 lecture 1.1 introduction
background
Recently, I found a good course on multimodal machine learning. In this blog, I will study it and write down my understanding.
O (Objective)
Master the basics of multimodal machine learning.
KR (Key Results)
- What is a modality?
- The history of multimodal research
- The main areas of multimodal machine learning
1. What is a modality?
Modality:
- The way in which something happens or is experienced.
- It covers sensory forms (touch, feel) as well as specific types of information (image, speech).
Medium:
- A means for storing or communicating information.
Here are some examples of modalities:
2. History of multimodal research
- The “behavioral” era (1970s until late 1980s)
  - The McGurk Effect (1976)
- The “computational” era (late 1980s until 2000)
  - Audio-Visual Speech Recognition (AVSR)
  - Affective Computing
- The “interaction” era (2000 - 2010)
  - Ways of human multimodal interaction
- The “deep learning” era (2010s until …)
3. Main areas of multimodal machine learning
Multimodal machine learning has 5 core theories, 37 applications, and 235 related works.
Here are the five areas.
1. Representation
Definition: Learning how to represent and summarize multimodal data in a way
that exploits the complementarity and redundancy.
Demo:
Main framework:
Coordinated representations aim to maximize the correlation between matching vectors from different modalities and to keep unrelated vectors clearly apart.
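To make the idea of a coordinated representation more concrete, here is a minimal sketch in plain Python. It assumes each modality already has its own embedding vectors (the names `img_emb` and `txt_emb` are hypothetical) and uses a margin-based ranking loss over cosine similarity. This is just one common way to express "pull matching pairs together, push the others apart", not necessarily the formulation used in the lecture.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def coordination_loss(img_emb, txt_emb, margin=0.2):
    """Margin-based ranking loss over paired embeddings.

    Assumption: img_emb[i] and txt_emb[i] describe the same item, and every
    other combination is treated as a mismatched pair. Needs at least 2 pairs.
    """
    n = len(img_emb)
    total = 0.0
    for i in range(n):
        pos = cosine(img_emb[i], txt_emb[i])          # similarity of the true pair
        for j in range(n):
            if j != i:
                neg = cosine(img_emb[i], txt_emb[j])  # similarity of a mismatched pair
                # Penalise mismatches that score within `margin` of the true pair.
                total += max(0.0, margin + neg - pos)
    return total / (n * (n - 1))

# Toy usage: three items whose two "modalities" share the same embedding.
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
loss_matched = coordination_loss(eye, eye)
loss_shuffled = coordination_loss(eye, [eye[1], eye[2], eye[0]])
```

In this toy case the loss is zero when the pairs line up and becomes positive once the text embeddings are shuffled, which is the behaviour a coordinated space is meant to reward.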
2. Alignment
Definition: Find correspondences between elements of different modalities.
Demo:
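As a small illustration of alignment, the sketch below uses dynamic time warping (DTW), a classic technique for matching two sequences that move at different speeds. DTW is my own choice of example here and is not taken from the notes above. The function name `dtw_align` and the toy sequences are hypothetical, and each sequence element is assumed to be a feature vector of the same dimension.

```python
import math

def dtw_align(a, b):
    """Align two sequences of feature vectors with dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) pairs
    meaning that a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    # acc[i][j] = cheapest cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the end of both sequences to recover the matches.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # advance both sequences
            (acc[i - 1][j], i - 1, j),          # advance only a
            (acc[i][j - 1], i, j - 1),          # advance only b
        ]
        _, i, j = min(moves)
    path.reverse()
    return acc[n][m], path

# Toy usage: the second sequence repeats its middle element.
seq_a = [[0.0], [1.0], [2.0]]
seq_b = [[0.0], [1.0], [1.0], [2.0]]
cost, path = dtw_align(seq_a, seq_b)
# path == [(0, 0), (1, 1), (1, 2), (2, 3)]
```

Here the middle element of `seq_a` is matched to both repeated elements of `seq_b`. This is the kind of element-to-element correspondence that alignment is looking for, for example between video frames and spoken words.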
3. Fusion
Definition: To join information from two or more modalities to perform a
prediction task.
- The focus here is not on specific model names but on when, how, and what to fuse.
- Model-Based (Intermediate) Approaches
  - Deep neural networks
  - Kernel-based methods
  - Graphical models
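To show what "intermediate" fusion with a neural network can look like, here is a rough sketch in plain Python. Each modality gets its own small encoder, the hidden features are concatenated inside the model, and a shared head makes the prediction. The audio/video setting, the layer sizes, and the names `init_params` and `intermediate_fusion` are all assumptions made for illustration. The weights are random and untrained, so only the data flow is meaningful.

```python
import math
import random

def linear(x, weights):
    """Apply a weight matrix stored as a list of output rows to vector x."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in weights]

def relu(v):
    return [max(0.0, t) for t in v]

def init_params(d_audio, d_video, d_hidden, seed=0):
    """Random (untrained) weights for two encoders and one prediction head."""
    rng = random.Random(seed)

    def matrix(d_in, d_out):
        return [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]

    return {
        "audio": matrix(d_audio, d_hidden),
        "video": matrix(d_video, d_hidden),
        "head": matrix(2 * d_hidden, 1),
    }

def intermediate_fusion(audio, video, params):
    """Encode each modality separately, fuse the hidden features, then predict."""
    h_audio = relu(linear(audio, params["audio"]))  # audio-specific features
    h_video = relu(linear(video, params["video"]))  # video-specific features
    fused = h_audio + h_video                       # list concatenation = fusion step
    logit = linear(fused, params["head"])[0]
    return 1.0 / (1.0 + math.exp(-logit))           # probability of the positive class

# Toy usage with 3 audio features and 5 video features.
params = init_params(d_audio=3, d_video=5, d_hidden=4)
prob = intermediate_fusion([0.1, 0.2, 0.3], [1.0] * 5, params)
```

The point of the sketch is the "when" question. Fusion happens after each modality has been encoded but before the final prediction, so the head can learn interactions between the two sets of hidden features.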
4. Translation
Definition: Process of changing data from one modality to another, where the
translation relationship can often be open-ended or subjective.
5. Co-Learning
Definition: Transfer knowledge between modalities, including their
representations and predictive models.
I will skip this topic because it is not something I research.