Introduction to Reinforcement Learning

Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence. The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.
The feature that most clearly distinguishes reinforcement learning from other kinds of learning is that, in reinforcement learning, training information is used to evaluate how good states and actions are, rather than to instruct what the correct policy should be.
Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.
These two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning.
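
To make these two features concrete, here is a minimal sketch of an agent interacting with an environment by trial and error, where reward arrives only at the end of an episode. The `Environment` class and all of its numbers are illustrative assumptions, not part of any particular library:

```python
import random

class Environment:
    """Hypothetical stand-in environment; states, actions, and rewards
    are illustrative. Reward is delayed: only the last step can pay off."""

    def reset(self):
        self.t = 0
        return self.t  # initial state

    def step(self, action):
        self.t += 1
        reward = 1.0 if (self.t == 10 and action == 1) else 0.0
        done = self.t == 10
        return self.t, reward, done  # next state, reward, episode over?

env = Environment()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])           # trial and error: try an action
    state, reward, done = env.step(action)   # environment responds
    total_reward += reward                   # credit may arrive only later
print(total_reward)
```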

One of the challenges that arise in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation.
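A standard way to handle this trade-off is an epsilon-greedy rule: mostly exploit the action with the highest estimated value, but occasionally explore a random one. This is a generic sketch, not a method from the source; the action-value estimates `Q` and the value of `epsilon` are assumptions:

```python
import random

def epsilon_greedy(Q, epsilon=0.1):
    """Choose an action from estimated action values Q (a list of floats):
    explore uniformly at random with probability epsilon, otherwise
    exploit the greedy (highest-valued) action."""
    if random.random() < epsilon:
        return random.randrange(len(Q))            # explore
    return max(range(len(Q)), key=lambda a: Q[a])  # exploit

# Action 2 currently looks best, but the others are still tried sometimes.
Q = [0.0, 0.5, 1.2]
action = epsilon_greedy(Q, epsilon=0.1)
```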
Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment.

One must look beyond the most obvious examples of agents and their environments to appreciate the generality of the reinforcement learning framework.

Features shared by cases where reinforcement learning can be used:
All involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment.

Elements of reinforcement learning:

  • agent

  • policy
    A policy is a mapping from perceived states of the environment to actions to be taken when in those states.

  • reward signal
    A reward signal defines the goal in a reinforcement learning problem.

  • value function
    Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow and the rewards available in those states.

  • model of the environment (optional)
    A model predicts what the environment will do next. There are two kinds of model: a transition model and a reward model. The transition model predicts the next state (i.e., the dynamics): $\mathcal{P}_{ss'}^{a} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$. The reward model predicts the next (immediate) reward: $\mathcal{R}_{s}^{a} = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$. (A minimal sketch tying these elements together follows this list.)
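
The sketch below ties these elements together in a tabular setting. The tiny two-state, two-action problem, its probabilities, rewards, and the discount factor are all illustrative assumptions, not from the source:

```python
# States: 0, 1. Actions: 0, 1. All numbers are illustrative assumptions.

# Policy: a mapping from perceived states to actions.
policy = {0: 1, 1: 0}

# Transition model: (s, a) -> {s': P[S_{t+1}=s' | S_t=s, A_t=a]}.
P = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.0, 1: 1.0},
}

# Reward model: (s, a) -> E[R_{t+1} | S_t=s, A_t=a].
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}

# Value function: long-term desirability of each state under the policy,
# computed here by repeatedly applying the Bellman expectation backup.
gamma = 0.9  # discount factor (assumed)
V = {0: 0.0, 1: 0.0}
for _ in range(100):
    V = {
        s: R[(s, policy[s])]
        + gamma * sum(p * V[s2] for s2, p in P[(s, policy[s])].items())
        for s in V
    }
print(V)  # approximate state values under the fixed policy
```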