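Reinforcement Learning Part 2 - Markov Decision Process
These are notes I took while studying David Silver's Reinforcement Learning course. My own background is limited and I worked through the untranslated English material the whole way, so some misunderstandings are inevitable; discussion and corrections are welcome.
Course materials: http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html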
1. Markov Processes
In RL, an MDP is used to describe the environment, under the assumption that the environment is fully observable,
i.e. the current state completely characterizes the process.
Many RL problems can be formulated as MDPs:
- Optimal control primarily deals with continuous MDPs
- Partially observable problems can be converted into MDPs
- Bandits are MDPs with one state
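
To make the last point concrete, here is a minimal sketch of a multi-armed bandit written as an MDP with a single state. It is only an illustration, not something taken from the course: the state name `s0`, the arm names, and the Bernoulli reward probabilities in `ARM_MEANS` are all made-up assumptions. The point it tries to show is that every action leads back to the same state, so the only thing left to learn is the expected reward of each action.

```python
import random

# Hypothetical one-state MDP that encodes a 3-armed bandit.
# The state name, arm names and reward probabilities are illustrative assumptions.
STATE = "s0"
ARM_MEANS = {"arm_a": 0.2, "arm_b": 0.5, "arm_c": 0.8}


def step(state, action, rng):
    """Take one action; return (next_state, reward).

    The transition is trivial: every action returns to the single state.
    The reward is Bernoulli with the assumed probability of the chosen arm.
    """
    reward = 1.0 if rng.random() < ARM_MEANS[action] else 0.0
    return STATE, reward


def estimate_action_values(n_pulls, seed=0):
    """Estimate the mean reward of each arm by pulling it n_pulls times."""
    rng = random.Random(seed)
    values = {}
    for action in ARM_MEANS:
        state = STATE
        total = 0.0
        for _ in range(n_pulls):
            state, reward = step(state, action, rng)
            total += reward
        values[action] = total / n_pulls
    return values


if __name__ == "__main__":
    estimates = estimate_action_values(n_pulls=5000)
    best_arm = max(estimates, key=estimates.get)
    print(estimates, best_arm)
```

Because the next state never changes, there is no sequential planning to do here; with more than one state, the choice of action would also affect which states (and therefore which future rewards) are reachable.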