- Markov Decision Process (MDP): A Summary of Definitions
- Model-Based Dynamic Programming (Planning by Dynamic Programming)
- Model-Free Prediction
- Model-Free Control
- Value Function Approximation
- Policy Gradient
- Actor-Critic Softmax & Gaussian Policy: Code Implementation
- Deterministic Policy Gradient
- Deep Deterministic Policy Gradient
- DDPG: Code Implementation
- Deep Reinforcement Learning In Parameterized Action Space
- Asynchronous Methods for Deep Reinforcement Learning
- A3C: Code Implementation
- Trust Region Policy Optimization
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Proximal Policy Optimization Algorithms
- Proximal Policy Optimization: Code Implementation
- Integrating Learning and Planning
Reading Order for Reinforcement Learning Articles
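- Permalink: https://bluefisher.github.io/2018/05/29/强化学习文章阅读顺序/
- Copyright notice: Unless otherwise stated, all articles on this blog are licensed under the BY-NC-SA license. Please credit the source when reposting!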