- Markov Decision Process (MDP) Definitions
- Planning by Dynamic Programming (model-based)
- Model-Free Prediction
- Model-Free Control
- Value Function Approximation
- Policy Gradient
- Actor-Critic Softmax & Gaussian Policy Implementation
- Deterministic Policy Gradient
- Deep Deterministic Policy Gradient
- DDPG Implementation
- Deep Reinforcement Learning In Parameterized Action Space
- Asynchronous Methods for Deep Reinforcement Learning
- A3C Implementation
- Trust Region Policy Optimization
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Proximal Policy Optimization Algorithms
- Proximal Policy Optimization Implementation

- Integrating Learning and Planning
 
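Reinforcement Learning Article Reading Order
- Post link: https://bluefisher.github.io/2018/05/29/强化学习文章阅读顺序/
- Copyright: Unless otherwise stated, all posts on this blog are licensed under BY-NC-SA. Please credit the source when reposting.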