David Silver Deep Reinforcement Learning, Lecture 4: Model-Free Prediction

https://www.bilibili.com/video/av9831252
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching_files/MC-TD.pdf

Model-Free reinforcement learning

Method 1: Monte-Carlo reinforcement learning

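(MC methods are among the most effective and most widely used approaches.)
MC methods learn directly from episodes of experience: they need no knowledge of the MDP's transition probabilities or rewards, which is exactly what makes them model-free.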
(MC policy evaluation uses the empirical mean return instead of the expected return.)
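Since MC works with the returns actually observed in an episode, the first step is turning an episode's rewards into the return G_t = R_{t+1} + γR_{t+2} + ... for every time step. A minimal sketch follows; the function name `discounted_returns` and the convention that `rewards[t]` holds R_{t+1} are assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=1.0):
    """Return G_t for every time step t of one finished episode.

    Assumed convention: rewards[t] holds R_{t+1}, the reward received
    after the action taken at step t.
    G_t = R_{t+1} + gamma * R_{t+2} + ..., computed backwards in one pass.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g  # G_t = R_{t+1} + gamma * G_{t+1}
        returns[t] = g
    return returns


# Usage: a 3-step episode with rewards 1, 0, 2 and gamma = 0.5
print(discounted_returns([1, 0, 2], gamma=0.5))  # [1.5, 1.0, 2.0]
```

Scanning backwards reuses G_{t+1} when computing G_t, so each episode costs a single linear pass; this only works because MC waits until the episode has terminated.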

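MC policy evaluation comes in two variants:

First-visit MC policy evaluation: in each episode, only the first time step at which state s is visited contributes a return to the estimate of V(s).
Every-visit MC policy evaluation: every time step at which s is visited in an episode contributes a return to the estimate of V(s).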
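The two variants differ only in which visits are counted. The sketch below keeps a counter N(s) and a total return S(s) per state and estimates V(s) = S(s)/N(s); a single `first_visit` flag switches between the variants. The function name and the episode format (a list of `(state, reward)` pairs) are assumptions for illustration.

```python
from collections import defaultdict


def mc_evaluate(episodes, gamma=1.0, first_visit=True):
    """Estimate V(s) as the mean return observed after visits to s.

    Assumed format: each episode is a list of (state, reward) pairs, where
    reward is R_{t+1}, received after leaving `state` at step t.
    """
    returns_sum = defaultdict(float)  # S(s): total of the returns counted for s
    visit_count = defaultdict(int)    # N(s): number of returns counted for s
    for episode in episodes:
        # Compute G_t for every step by scanning the episode backwards.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            g = episode[t][1] + gamma * g
            returns[t] = g
        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # first-visit: skip later visits within this episode
            seen.add(state)
            returns_sum[state] += returns[t]
            visit_count[state] += 1
    return {s: returns_sum[s] / visit_count[s] for s in visit_count}


# Usage: state A is visited twice in one episode, B once.
episodes = [[("A", 1), ("B", 0), ("A", 2)]]
print(mc_evaluate(episodes, first_visit=True))   # {'A': 3.0, 'B': 2.0}
print(mc_evaluate(episodes, first_visit=False))  # {'A': 2.5, 'B': 2.0}
```

With gamma = 1 the returns are G_0 = 3, G_1 = 2, G_2 = 2. First-visit counts only G_0 for A, while every-visit averages G_0 and G_2, which is where the two estimates for A diverge.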

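We can use the policy to generate many trials; each trial (an episode) starts from an arbitrary initial state and runs until it reaches a terminal state.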
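To make the sampling step concrete, here is a sketch that rolls out episodes under a fixed policy. The environment is a hypothetical 5-state random walk (states 0 to 4, with 0 and 4 terminal and reward 1 only on reaching state 4); it is not taken from the lecture, and `generate_episode` and `random_policy` are illustrative names.

```python
import random

TERMINAL_STATES = {0, 4}  # hypothetical random walk: states 0..4, ends at 0 or 4


def generate_episode(policy, start_state, rng):
    """Roll out one episode; returns a list of (state, reward) pairs.

    The policy maps (state, rng) to a move of -1 or +1. Reaching state 4
    gives reward 1; every other transition gives reward 0.
    """
    episode = []
    state = start_state
    while state not in TERMINAL_STATES:
        next_state = state + policy(state, rng)
        reward = 1.0 if next_state == 4 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode


def random_policy(state, rng):
    """Uniform random policy: move left or right with equal probability."""
    return rng.choice((-1, 1))


rng = random.Random(0)  # fixed seed so the sampled episodes are reproducible
# Each episode starts from an arbitrary (random) non-terminal state.
episodes = [generate_episode(random_policy, rng.randint(1, 3), rng) for _ in range(1000)]
```

Only the sampled (state, reward) sequences are kept; the evaluation step never needs the walk's transition rule, which is the model-free property described above.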
The mean μ1, μ2, … of a sequence x1, x2, … can be computed incrementally:

$$\mu_k = \frac{1}{k}\sum_{j=1}^{k} x_j = \mu_{k-1} + \frac{1}{k}\left(x_k - \mu_{k-1}\right)$$
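The incremental form means the running mean can be updated one sample at a time without storing past samples. A minimal sketch (the generator name `incremental_means` is an assumption):

```python
def incremental_means(xs):
    """Yield mu_1, mu_2, ... using mu_k = mu_{k-1} + (x_k - mu_{k-1}) / k."""
    mu = 0.0
    for k, x in enumerate(xs, start=1):
        mu += (x - mu) / k  # move the mean toward x_k by a step of 1/k
        yield mu


print(list(incremental_means([2, 4, 6, 8])))  # [2.0, 3.0, 4.0, 5.0]
```

Applied to MC evaluation, the same update reads V(s) ← V(s) + (G_t − V(s)) / N(s) after each counted visit, so only V(s) and N(s) need to be stored per state.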