Reinforcement learning resources curated
Switch branches/tags
Nothing to show
Clone or download
hshyunsookim Merge pull request #58 from oneTaken/master
remove extra white space to avoid latent confusion for beginners
Latest commit 85a4e91 Sep 22, 2018
Failed to load latest commit information. Merge pull request #58 from oneTaken/master Sep 22, 2018

Awesome Reinforcement Learning Awesome

A curated list of resources dedicated to reinforcement learning.

We have pages for other topics: awesome-rnn, awesome-deep-vision, awesome-random-forest

Maintainers: Hyunsoo Kim, Jiwon Kim

We are looking for more contributors and maintainers!


Please feel free to pull requests

Table of Contents





  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (1st Edition, 1998) [Book] [Code]
  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction (2nd Edition, in progress, 2018) [Book] [Code]
  • Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
  • David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
  • Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
  • Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]
  • Deep Reinforcement Learning in Action [Book(Manning)]


  • Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
  • S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
  • Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR, 2009. [Paper]
  • Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics, A Survey, IJRR, 2013. [Paper]
  • Michael L. Littman, "Reinforcement learning improves behaviour from evaluative feedback." Nature 521.7553 (2015): 445-451. [Paper]
  • Marc P. Deisenroth, Gerhard Neumann, Jan Peter, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]
  • Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath, A Brief Survey of Deep Rei nforcement Learning, IEEE Signal Processing Magazine, 2017. [Paper]

Papers / Thesis

Foundational Papers

  • Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper] (discusses issues in RL such as the "credit assignment problem")
  • Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper] (earliest publication on temporal-difference (TD) learning rule)


  • Dynamic Programming (DP):
    • Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
  • Monte Carlo:
    • Andrew Barto, Michael Duff, Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
    • Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
  • Temporal-Difference:
    • Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
  • Q-Learning (Off-policy TD algorithm):
    • Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
  • Sarsa (On-policy TD algorithm):
    • G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
    • Richard S. Sutton, Generalization in Reinforcement Learning: Successful examples using sparse coding, NIPS, 1996. [Paper]
  • R-Learning (learning of relative values)
    • Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
  • Function Approximation methods (Least-Square Temporal Difference, Least-Square Policy Iteration)
    • Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
    • Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
  • Policy Search / Policy Gradient
    • Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
    • Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
    • Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
    • Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
    • Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
    • Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
    • Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
    • Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
    • Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades, Jean-Baptiste Mouret, Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]
  • Hierarchical RL
    • Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
    • George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]
  • Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
    • V. Mnih, et. al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
    • Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
    • Sergey Levine, Chelsea Finn, Trevor Darrel, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies. ArXiv, 16 Oct 2015. [ArXiv]
    • Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]
    • Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
    • Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016. [ArXiv]


Game Playing

Traditional Games

  • Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
  • Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
  • Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]

Computer Games


  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
  • Robot Motor SKill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
  • Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
  • Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
  • PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
  • Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
  • Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
  • Robots that can adapt like animals (Cully, Nature 2015) [Paper] [Video] [Code]
  • Black-Box Data-efficient Policy Search for Robotics (Chatzilygeroudis, IROS 2017) [Paper] [Video] [Code]


  • An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
  • Autonomous helicopter control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]

Operations Research

  • Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
  • Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]

Human Computer Interaction

  • Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002) [Paper]

Tutorials / Websites

Online Demos

Open Source Reinforcement Learning Platforms

  • OpenAI gym - A toolkit for developing and comparing reinforcement learning algorithms
  • OpenAI universe - A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications
  • DeepMind Lab - A customisable 3D platform for agent-based AI research
  • Project Malmo - A platform for Artificial Intelligence experimentation and research built on top of Minecraft by Microsoft
  • ViZDoom - Doom-based AI research platform for reinforcement learning from raw visual information
  • Retro Learning Environment - An AI platform for reinforcement learning based on video game emulators. Currently supports SNES and Sega Genesis. Compatible with OpenAI gym.
  • torch-twrl - A package that enables reinforcement learning in Torch by Twitter
  • UETorch - A Torch plugin for Unreal Engine 4 by Facebook
  • TorchCraft - Connecting Torch to StarCraft
  • rllab - A framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym
  • TensorForce - Practical deep reinforcement learning on TensorFlow with Gitter support and OpenAI Gym/Universe/DeepMind Lab integration.
  • OpenAI lab - An experimentation system for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
  • keras-rl - State-of-the art deep reinforcement learning algorithms in Keras designed for compatibility with OpenAI.
  • BURLAP - Brown-UMBC Reinforcement Learning and Planning, a library written in Java
  • MAgent - A Platform for Many-agent Reinforcement Learning.
  • Ray RLlib - Ray RLlib is a reinforcement learning library that aims to provide both performance and composability.
  • SLM Lab - A research framework for Deep Reinforcement Learning using Unity, OpenAI Gym, PyTorch, Tensorflow.
  • Unity ML Agents - Create reinforcement learning environments using the Unity Editor
  • Intel Coach - Coach is a python reinforcement learning research framework containing implementation of many state-of-the-art algorithms.