Paper Notes: Learning Latent Dynamics for Planning from Pixels

Learning Latent Dynamics for Planning from Pixels

1 Introduction

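The input image passes through an encoder network (gray trapezoids in the figure) and becomes a hidden state (green). The hidden state can then be mapped to a reward and decoded back into an image.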
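To plan, the current image is passed through the encoder network to obtain the current hidden state (green). From that state the model rolls out imagined predictions for many candidate action sequences, computes the reward of each sequence, and finally executes the first action of the best sequence.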

2 Algorithm

Deep Planning Network (PlaNet)
planning algorithm
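The planning loop (start from the current hidden state, imagine candidate action sequences, score each by predicted reward, act on the best) can be sketched with the cross-entropy method, the search procedure the paper uses. This is a minimal sketch under loud assumptions, not the paper's implementation: `toy_dynamics` and `toy_reward` are hypothetical stand-ins for the learned latent transition and reward models, the state is a single float, and every hyperparameter is illustrative. It returns the first action of the refined mean sequence as its estimate of the best sequence.

```python
import random
import statistics


def toy_dynamics(state, action):
    # Hypothetical stand-in for the learned latent transition model.
    return state + action


def toy_reward(state):
    # Hypothetical stand-in for the learned reward model; highest at state == 1.0.
    return -(state - 1.0) ** 2


def plan_first_action(state, horizon=5, iterations=10, candidates=200,
                      top_k=20, seed=0):
    """Cross-entropy method over action sequences; returns the first action."""
    rng = random.Random(seed)
    mean = [0.0] * horizon  # belief over each action in the sequence
    std = [1.0] * horizon
    for _ in range(iterations):
        scored = []
        for _ in range(candidates):
            # Sample one candidate sequence, clamped to the range [-1, 1].
            actions = [max(-1.0, min(1.0, rng.gauss(m, s)))
                       for m, s in zip(mean, std)]
            # Imagined rollout: no real environment step is taken here.
            s, total = state, 0.0
            for a in actions:
                s = toy_dynamics(s, a)
                total += toy_reward(s)
            scored.append((total, actions))
        # Keep the top_k sequences and refit the sampling distribution to them.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        elite = [actions for _, actions in scored[:top_k]]
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elite)]
    return mean[0]
```

In a real agent this function would be called again at every environment step with the newly encoded hidden state, so only the first action of each plan is ever executed.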
training loss
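The loss can be split into two terms. The first is the MSE between the observation $o_t$ predicted from $s_t$ and the true $o_t$. The second is the KL divergence between $p(s_t \mid s_{t-1}, a_{t-1})$ (solid lines in the figure) and $q(s_t \mid s_{t-1}, a_{t-1}, o_t)$ (dashed lines in the figure).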
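As a rough illustration of the two terms for a single time step, the sketch below adds a reconstruction MSE to the KL divergence between two diagonal Gaussians. It assumes both $q$ and $p$ are diagonal Gaussians given as (mean, std) lists and takes the KL in the direction KL(q || p). The function names are hypothetical, and the weighting between the two terms is left at 1 for simplicity.

```python
import math


def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mean_q, std_q, mean_p, std_p):
        total += (math.log(sp / sq)
                  + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2)
                  - 0.5)
    return total


def mse(pred, target):
    """Mean squared error between predicted and true observations."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def step_loss(decoded_obs, true_obs, posterior, prior):
    """Reconstruction term plus KL term for one time step.

    posterior and prior are (mean, std) pairs of equal-length lists
    (an assumption of this sketch).
    """
    return mse(decoded_obs, true_obs) + gaussian_kl(*posterior, *prior)
```

For example, `step_loss([1.0, 2.0], [1.0, 4.0], ([0.0], [1.0]), ([1.0], [1.0]))` combines an MSE of 2.0 with a KL of 0.5.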

3 Three model types

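In the RNN, $h_t$ is a deterministic value. In the SSM, $s_t$ is a random variable described by a mean and a variance. The RSSM combines the two models: $s_t$ is a random variable and $h_t$ is deterministic.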
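The split between the deterministic and stochastic parts can be made concrete with one toy transition step. This is a sketch under heavy assumptions: the states are scalars, the weights are arbitrary constants, and a plain tanh cell stands in for whatever recurrent cell the real model uses. The name `rssm_step` is hypothetical.

```python
import math
import random


def rssm_step(h_prev, s_prev, a_prev, rng):
    """One toy RSSM-style transition with scalar states.

    Returns the deterministic state h_t, a sampled stochastic state s_t,
    and the (mean, std) that s_t was drawn from.
    """
    # Deterministic path: h_t is a fixed function of the previous step.
    # The weights below are made up for illustration.
    h = math.tanh(0.5 * h_prev + 0.3 * s_prev + 0.2 * a_prev)
    # Stochastic path: the mean and std of s_t are computed from h_t.
    mean = 0.8 * h
    std = math.log1p(math.exp(0.1 * h)) + 0.1  # softplus keeps std positive
    s = rng.gauss(mean, std)
    return h, s, (mean, std)
```

Calling it twice with the same inputs but different random generators gives the same $h_t$ and different samples of $s_t$, which is exactly the distinction the three models are drawing.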

4 Experimental results

Compared with earlier reinforcement learning algorithms, training efficiency improves by a factor of 50.