DEV Community

S.HARIHARA SUDHAN

Reinforcement Learning-Why And What

The Motivation for Reinforcement Learning

In supervised learning, algorithms try to make their outputs mimic the labels y given in the training set. In that setting, the labels give an unambiguous "right answer" for each input x. In contrast, for many sequential decision-making and control problems it is very difficult to provide this kind of explicit supervision to a learning algorithm. For instance, if we have just built a four-legged robot and are trying to program it to walk, we do not know in advance which actions are the "correct" ones to make it walk, and so we cannot provide explicit supervision for a learning algorithm to mimic.

In reinforcement learning, we instead provide our algorithms with only a reward function, which indicates to the learning agent when it is doing well and when it is doing poorly. In the four-legged walking example, the reward function might give the robot positive rewards for moving forward and negative rewards for either moving backward or falling over. The learning algorithm's task is then to work out how to choose actions over time so as to obtain large rewards.
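As a rough sketch of what such a reward function could look like for the walking robot: the function name, its arguments, and the numeric values below are illustrative assumptions, not taken from any real robot.

```python
def walking_reward(forward_distance: float, fell_over: bool) -> float:
    """Hypothetical reward for one time step of a walking robot.

    forward_distance: distance moved this step (negative when moving backward).
    fell_over:        True if the robot fell during this step.
    """
    FALL_PENALTY = -10.0  # assumed value; a real task would tune this
    if fell_over:
        return FALL_PENALTY
    # Positive for forward motion, negative for backward motion.
    return forward_distance


# A short trajectory: two steps forward, one step back, then a fall.
trajectory = [(0.5, False), (0.25, False), (-0.25, False), (0.0, True)]
rewards = [walking_reward(d, f) for d, f in trajectory]
total_reward = sum(rewards)
print(rewards, total_reward)
```

Note that the function never says how to move the legs. It only scores the outcome, and the learning algorithm has to discover the actions that lead to high scores.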

Major applications of reinforcement learning include autonomous helicopter flight, legged robot locomotion, cell phone network routing, marketing strategy selection, factory control, and efficient web page indexing.

The diagram below shows an autonomous helicopter flying in different directions using reinforcement learning:

Autonomous Helicopter

What is Reinforcement Learning?

Reinforcement learning is a feedback-based machine-learning technique in which an agent learns how to behave in a given environment by taking actions in each state and observing their outcomes. The agent receives a reward for each good action and a penalty (negative feedback) for each bad action.
In reinforcement learning, the agent's main objective is to maximize the total reward it collects. The agent learns by trial and error and, based on its experience, develops the skills needed to carry out the task more effectively. Thus, "Reinforcement learning is a form of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to function within that".

Consider the following scenario:
An AI agent is placed in a maze environment, and its objective is to find the diamond. The agent interacts with the environment by taking actions; depending on those actions, the agent's state changes, and it receives feedback in the form of rewards or penalties. The agent keeps doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing so it learns about and explores the environment. The agent learns which actions lead to positive feedback or rewards and which lead to negative feedback or penalties. It receives positive points for rewards and negative points for penalties.
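The interaction loop in this scenario can be sketched as follows. The maze layout, the reward values, and the names `MazeEnvironment` and `run_episode` are all assumptions made for illustration. The agent here picks actions at random, so the sketch shows only the act, change state, get feedback cycle, not learning itself.

```python
import random
from typing import Tuple

# Hypothetical 3x4 maze: S = start, D = diamond, # = wall, . = free cell
MAZE = [
    "S..#",
    ".#..",
    "...D",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class MazeEnvironment:
    def __init__(self) -> None:
        self.state: Tuple[int, int] = (0, 0)  # the agent starts on "S"

    def is_done(self) -> bool:
        row, col = self.state
        return MAZE[row][col] == "D"

    def step(self, action: str) -> float:
        """Apply an action, update the state, and return the feedback."""
        d_row, d_col = MOVES[action]
        row, col = self.state[0] + d_row, self.state[1] + d_col
        inside = 0 <= row < len(MAZE) and 0 <= col < len(MAZE[0])
        if not inside or MAZE[row][col] == "#":
            return -1.0  # penalty: blocked move, the state stays the same
        self.state = (row, col)
        # Large reward at the diamond, small penalty for every other move.
        return 10.0 if self.is_done() else -0.1


def run_episode(seed: int = 0, max_steps: int = 10_000) -> Tuple[int, float]:
    rng = random.Random(seed)
    env = MazeEnvironment()
    total, steps = 0.0, 0
    while not env.is_done() and steps < max_steps:
        action = rng.choice(list(MOVES))  # take an action
        total += env.step(action)         # state may change, feedback arrives
        steps += 1
    return steps, total


steps, total = run_episode()
print("Episode ended after {} steps with total reward {:.1f}".format(steps, total))
```

A learning agent would replace the random choice with a policy that it improves from the rewards it has seen.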

Simple Reinforcement Learning Model:

RL Model

Elements of Reinforcement Learning:

1) Policy: A policy describes how an agent behaves at a given time. It maps the perceived states of the environment to the actions to take in those states. The policy is the core element of reinforcement learning because it alone is enough to determine the agent's behavior. In some cases it may be a simple function or a lookup table, whereas in others it may involve general computation such as a search process. A policy may be deterministic or stochastic.

For a deterministic policy: a = π(s)
For a stochastic policy: π(a | s) = P[A_t = a | S_t = s]
Here π denotes the policy, s the current state, and a the action taken in that state.

2) Reward Signal: The reward signal defines the goal of reinforcement learning. At each step, the environment sends the learning agent an immediate signal known as the reward. Rewards are given according to how good or bad the agent's actions are. The agent's main objective is to maximize the total reward it receives. The reward signal can change the policy: for instance, if an action chosen by the agent yields a low reward, the policy may be changed to choose other actions in that situation in the future.

3) Value Function: The value function tells the agent how good a given state (or state and action) is in terms of the reward it can expect to collect from there onward. A reward is the immediate signal for each good or bad action, whereas a value function specifies which states and actions are good in the long run. The value function depends on the reward, because without rewards there would be no value to estimate. The purpose of estimating values is to choose actions that collect more reward.

4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave, for example by predicting the next state and reward for a given state and action.
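The four elements can be seen together in a very small example. This is only a sketch under stated assumptions: a hypothetical three-state chain (state 2 is the goal), a reward of 1 for reaching the goal, and a discount factor `GAMMA` of 0.9 that weights future rewards. The discount factor is not discussed in this article and is assumed here so that the value function is well defined.

```python
from typing import Dict

STATES = [0, 1, 2]   # state 2 is terminal (the goal)
GAMMA = 0.9          # assumed discount factor


# 4) Model: predicts the next state for a (state, action) pair.
def model(state: int, action: str) -> int:
    if state == 2:
        return 2
    step = 1 if action == "right" else -1
    return min(max(state + step, 0), 2)


# 2) Reward signal: immediate feedback for one transition.
def reward(state: int, next_state: int) -> float:
    return 1.0 if state != 2 and next_state == 2 else 0.0


# 1) Policy: deterministic a = pi(s), and stochastic pi(a | s).
deterministic_policy: Dict[int, str] = {0: "right", 1: "right"}
stochastic_policy: Dict[int, Dict[str, float]] = {
    0: {"left": 0.5, "right": 0.5},
    1: {"left": 0.5, "right": 0.5},
}


# 3) Value function: expected discounted reward from each state
#    when following a given policy (iterative policy evaluation).
def evaluate(policy: Dict[int, Dict[str, float]], sweeps: int = 200) -> Dict[int, float]:
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in (0, 1):  # the terminal state keeps value 0
            values[s] = sum(
                prob * (reward(s, model(s, a)) + GAMMA * values[model(s, a)])
                for a, prob in policy[s].items()
            )
    return values


# Express the deterministic policy as a distribution so both use evaluate().
det_values = evaluate({s: {a: 1.0} for s, a in deterministic_policy.items()})
random_values = evaluate(stochastic_policy)
print("Always right:", det_values)
print("Coin flip:   ", random_values)
```

Under these assumptions, every state has a higher value under the "always right" policy than under the coin-flip policy, which is the sense in which a value function ranks situations by long-run reward rather than by the immediate signal.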

Approaches to Implement Reinforcement Learning:

1) Value-based: The value-based approach aims to find the optimal value function, which is the maximum value achievable at a state under any policy. The agent therefore expects the long-term return at any state under the policy (π).

2) Policy-based: A policy-based approach seeks the optimal policy for maximizing future rewards without using a value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward.
The policy-based approach has two main types of policy:

Deterministic: The policy (π) produces the same action every time it is in a given state.
Stochastic: The action produced is determined by a probability distribution.

3) Model-based: In the model-based approach, a virtual model of the environment is built, and the agent explores that model to learn about the environment. There is no single solution or algorithm for this approach because the model representation differs for each environment.
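Of the three approaches, the value-based one is the easiest to sketch briefly. The example below uses value iteration, one common value-based algorithm, on a hypothetical five-cell corridor. The corridor, the rewards, and the discount factor are assumptions for illustration. The code is also given the transition model directly, so it plans with a known model rather than learning one from experience.

```python
from typing import Dict, List, Tuple

N_STATES = 5                    # corridor cells 0..4; cell 4 is the goal
GOAL = N_STATES - 1
ACTIONS: List[int] = [-1, +1]   # move left or move right
GAMMA = 0.9                     # assumed discount factor


def transition(state: int, action: int) -> Tuple[int, float]:
    """Known model of the corridor: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def action_value(values: Dict[int, float], state: int, action: int) -> float:
    next_state, reward = transition(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tolerance: float = 1e-9) -> Dict[int, float]:
    """Repeatedly back up the best action value until values stop changing."""
    values = {s: 0.0 for s in range(N_STATES)}  # the goal keeps value 0
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(action_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            return values


def greedy_policy(values: Dict[int, float]) -> Dict[int, int]:
    """Deterministic policy that picks the highest-valued action per state."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(values, s, a))
        for s in range(GOAL)
    }


values = value_iteration()
policy = greedy_policy(values)
print(values)
print(policy)
```

The optimal values shrink by a factor of `GAMMA` for each extra step between a cell and the goal, and the greedy policy read off from them moves right in every cell.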

Simple Implementation of Reinforcement Learning

The entities in RL's world are:

The agent class: A thing, or person, that tries to gain rewards through interaction. In practice, the agent is a piece of code that implements some policy.

The environment class: A model of the world that is external to the agent. It provides observations and rewards to the agent.

With this basic understanding, let's try to implement both.

To make things very simple, let's create a dummy environment that gives the agent a random reward at every step, regardless of the agent's actions.

Though this is of no practical use, it lets us focus on the implementation of the environment and agent classes.
Our environment class should be capable of handling actions received from the agent. This is done by the action() method, which checks the number of steps left and returns a random reward, ignoring the agent's action.

The __init__ constructor sets the number of steps available in the episode. The get_observation() method is supposed to return the current environment's observation to the agent, but in this case it simply returns a zero vector.

import random
from typing import List

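class SampleEnvironment:
    def __init__(self):
        # Number of steps the agent may take before the episode ends
        self.steps_left = 20

    def get_observation(self) -> List[float]:
        # Dummy observation: always a zero vector
        return [0.0, 0.0, 0.0]

    def get_actions(self) -> List[int]:
        # The two actions available to the agent
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        # Use up one step and return a random reward; the action is ignored
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()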

The agent class is simple and includes only two methods: the constructor and the method that performs one step in the environment.

Initially, the constructor sets the total reward collected to zero.

The step function accepts an environment instance as an argument and lets the agent do the following:

Observe the environment
Make a decision about the action to take based on the observations
Submit the action to the environment
Get the reward for the current step

random.choice([0,1])

This call returns either 0 or 1, chosen at random; the agent uses it to pick its action.

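class Agent:
    def __init__(self):
        # Running sum of the rewards collected so far
        self.total_reward = 0.0

    def step(self, env: SampleEnvironment):
        # Observe the environment
        current_obs = env.get_observation()
        print("Observation {}".format(current_obs))
        # Choose an action (here at random, ignoring the observation)
        actions = env.get_actions()
        print(actions)
        # Submit the action and collect the reward for this step
        reward = env.action(random.choice(actions))
        self.total_reward += reward
        print("Total Reward {}".format(self.total_reward))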

if __name__ == "__main__":
    env = SampleEnvironment()
    agent = Agent()
    i = 0

    # Keep stepping until the environment runs out of steps
    while not env.is_done():
        i += 1
        print("Steps {}".format(i))
        agent.step(env)

    print("Total reward got: %.4f" % agent.total_reward)

Rewards at each step
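Because the dummy environment pays a uniformly random reward between 0 and 1 on each of its 20 steps, the total reward for one episode should be about 10 on average. The sketch below checks this by averaging many episodes. It repeats a compact copy of the environment so that it runs on its own; the `run_episode` helper, the fixed seed, and the episode count are assumptions added here.

```python
import random
from typing import List


class SampleEnvironment:
    """Compact copy of the dummy environment from this article."""

    def __init__(self) -> None:
        self.steps_left = 20

    def get_actions(self) -> List[int]:
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()  # the action is ignored


def run_episode() -> float:
    """Play one full episode with random actions; return the total reward."""
    env = SampleEnvironment()
    total_reward = 0.0
    while not env.is_done():
        total_reward += env.action(random.choice(env.get_actions()))
    return total_reward


random.seed(0)  # assumed seed, only so that runs are repeatable
episodes = 1000
average = sum(run_episode() for _ in range(episodes)) / episodes
print("Average total reward over %d episodes: %.2f" % (episodes, average))
```

Since the reward does not depend on the action, no policy can do better than this average here. An environment whose rewards depend on the action is what gives an agent something to learn.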

Advantages of Reinforcement Learning

  1. Reinforcement learning can be used to tackle very complex problems that cannot be solved with conventional techniques.

  2. It is well suited to achieving long-term outcomes, which are very difficult to achieve otherwise.

  3. This learning paradigm closely resembles how people learn: through trial, error, and feedback.

  4. The model can correct mistakes made during training.

  5. Once the model has corrected a mistake, it is much less likely to repeat it.

Disadvantages of Reinforcement Learning

The main problem with reinforcement learning algorithms is that some factors, such as delayed feedback, can slow down learning.

Finally, if you want to know how an autonomous helicopter can fly using an RL algorithm, I recommend this research work by Andrew Y. Ng, Adam Coates, and Pieter Abbeel:

Autonomous Helicopter Flight Using Reinforcement Learning

Top comments (2)

Matt Eland

Very nice article.

One tip: when you're using blocks of code, you can specify the language for syntax highlighting by putting py immediately after the triple backticks that open the code block.

S.HARIHARA SUDHAN

Sure Thanks