The journey of a thousand posts begins with a single article: The One with Reinforcement Learning

“Hello guys, it’s Estelle!” Oh… My… God… I need to stop… please help :(

I know it’s been a while since my last article, so let me make up for it with this one. I assume that you are a little bit familiar with machine learning; if not, let me know so I can write an article about that. For now, let’s talk about reinforcement learning.

First, let’s look at the basic model behind the reinforcement learning approach:

Source : A Brief Survey of Deep Reinforcement Learning, 2017

We have an agent (or several) in a state. It chooses an action from a set of available actions according to a policy (which could be fixed, such as always taking the action that maximizes the next reward). This action may or may not change the environment, the agent receives a reward signal (which can be continuous or discrete, positive or negative, depending on the problem you are applying RL to), and the agent moves to a new state. That’s the main idea of reinforcement learning. Now let’s see where reinforcement learning sits among other fields:
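Before looking at that map, here is the loop just described as a minimal Python sketch. The corridor environment, its reward values (a small cost per step, +1 at the goal), and the names CorridorEnv, random_policy, and run_episode are all made up for illustration; they do not come from any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..length-1, goal at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Every episode starts at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clip the position.
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        # Assumed reward scheme: small cost per step, bonus when the goal is reached.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def random_policy(state, actions, rng):
    # A policy maps the current state to an action; this one ignores the state.
    return rng.choice(actions)


def run_episode(env, policy, actions=(-1, 1), max_steps=100, seed=0):
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state, actions, rng)          # agent picks an action
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state                            # agent moves to the new state
        if done:
            break
    return trajectory, total_reward
```

Calling run_episode(CorridorEnv(), lambda s, a, r: 1) uses a fixed "always go right" policy and reaches the goal in four steps. Swapping in random_policy usually takes longer and collects more step penalties along the way.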

Source : David Silver’s Reinforcement learning course, Introduction to RL

A little scary, right? How can you study something that needs knowledge from all these fields? Well, it’s all right: you do not have to be an expert in each of them to understand reinforcement learning. The first step is to be curious and try to understand. So, how is RL different from other machine learning paradigms such as supervised learning?

The first difference is that there’s no supervisor: the agent judges the choices it makes based only on a reward signal (this is why RL is often described as trial-and-error learning).
The feedback is delayed, not instantaneous: the effect of a choice may only show up many steps later, and only then can the agent tell whether it was a good move.
Time really matters: RL is a sequential decision-making process, which makes it a dynamic system where the data is not i.i.d. (independent and identically distributed), unlike the usual assumption in supervised or unsupervised learning.
The agent’s actions affect the subsequent data it receives. Imagine two cars waiting at a red light. When the light turns green, they could take the same road, in which case they see the same data; but if they choose different paths, they observe different things and, in RL terms, receive different rewards.
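The two-cars picture can be turned into a tiny sketch. The road map below (place names, observations, and reward values) is entirely invented for illustration. It shows two of the points above at once: the data each car sees depends on the action it took, and the consequence of the first choice only shows up in the reward one step later.

```python
# Hypothetical road map: place -> action -> (next place, what the car sees, reward)
ROADS = {
    "light": {
        "left": ("park", "trees", 0.0),
        "right": ("highway", "trucks", 0.0),
    },
    "park": {"straight": ("cafe", "terrace", 1.0)},
    "highway": {"straight": ("toll", "barrier", -1.0)},
}


def drive(actions, start="light"):
    """Follow a sequence of actions and record observations and rewards."""
    place = start
    observations, rewards = [], []
    for action in actions:
        place, seen, reward = ROADS[place][action]
        observations.append(seen)
        rewards.append(reward)
    return observations, rewards


# Both cars start at the same light but make a different first choice.
car_a = drive(["left", "straight"])
car_b = drive(["right", "straight"])
print(car_a)  # (['trees', 'terrace'], [0.0, 1.0])
print(car_b)  # (['trucks', 'barrier'], [0.0, -1.0])
```

Both cars get a reward of 0.0 right after turning, so the first step alone says nothing about which choice was better; the difference only appears on the second step.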

We can compare RL and SL like this:

Source : CIFAR Reinforcement Learning Summer School (RLSS) 2017, Montréal

I don’t want to overload you with information, so that’s all for this article. I’ll leave you with some examples of what has been done with reinforcement learning: