Model-Free Offline Reinforcement Learning
- BCQ: Off-Policy Deep Reinforcement Learning without Exploration
References
[1] A. Kumar. Conservative Q-Learning for Offline Reinforcement Learning. 2020.
[2] T. Yu. COMBO: Conservative Offline Model-Based Policy Optimization. 2021.
[3] B. Ning. Double Deep Q-Learning for Optimal Execution.