
AWS Community Builders

# Fine-tuning the performance of the DeepRacer model

Vivek0712
Data Scientist | AWS Community Builder - ML | Lead @ Azure Developer Community | 15x Multi-Cloud Certified | Speaker, Mentor, 15x Hackathon Winner

In the first part of the AWS DeepRacer blog series, we saw how to create your AWS DeepRacer model. How cool is it to watch your AWS DeepRacer car autonomously navigate every corner and turn of the circuit, guided by your reinforcement learning model? Now it's time to fine-tune the model so that it clocks the best possible lap time.

In this tutorial, we will dive deeper into the action space, the reward function and hyperparameter tuning. I explain how to create and fine-tune a model with two combinations of algorithm and action space: PPO with a discrete action space and SAC with a continuous action space. You can choose either one and skip the other.

## Understanding the fundamentals

### Reinforcement Learning Algorithms

We can train models using either of two RL algorithms:

• Proximal Policy Optimization (PPO)
• Soft Actor Critic (SAC)

The main differences between the two algorithms are:

| PPO | SAC |
| --- | --- |
| Works in both discrete and continuous action spaces | Works in a continuous action space |
| On-policy | Off-policy |
| Uses entropy regularization | Adds entropy to the maximization objective |

### Action Space

The action space is the set of actions the model can choose from, constrained by the maximum speed, speed granularity, maximum steering angle and steering angle granularity. For each input image from the sensor (the front-facing camera), the model picks one of the actions and then evaluates the reward obtained for that action.

• Discrete Action Space: The set of actions is defined by the user by specifying the maximum steering angle, speed values, and their respective granularities to generate the corresponding combinations of speed and steering actions. Therefore, the policy returns a discrete distribution of actions.

• Continuous Action Space: The policy only outputs two discrete values. These values are interpreted to be the mean and standard deviation of a continuous normal distribution. You define a range for speed and steering angle. The action for an observed state is chosen from this user-defined range of speed and steering by sampling the normal distribution, defined by the mean and standard deviation returned from the policy.
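To make the continuous case concrete, here is a minimal sketch of drawing one action from a normal distribution and keeping it inside the user-defined range. It assumes one mean/standard-deviation pair per action dimension (steering and speed), and the helper name, the policy outputs and the ranges are hypothetical. Clipping to the range is also an assumption made for illustration, not a description of DeepRacer's internals.

```python
import random

def sample_continuous_action(mean, std, low, high, rng=random):
    """Draw one value from N(mean, std) and clip it to the user-defined [low, high] range.

    Clipping is an assumption for this illustration; DeepRacer may map
    samples onto the range differently.
    """
    value = rng.gauss(mean, std)
    return min(max(value, low), high)

# Hypothetical policy outputs (mean, standard deviation) for one camera frame
steering_mean, steering_std = 5.0, 8.0   # degrees
speed_mean, speed_std = 1.2, 0.3         # m/s

rng = random.Random(42)  # seeded so the sketch is reproducible
steering = sample_continuous_action(steering_mean, steering_std, -30.0, 30.0, rng)
speed = sample_continuous_action(speed_mean, speed_std, 0.5, 2.0, rng)
print(f"steering={steering:.1f} deg, speed={speed:.2f} m/s")
```

A wider standard deviation spreads the samples across more of the range, which is one way to picture how a continuous policy explores.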

### Hyperparameters

Hyperparameters are variables that control your reinforcement learning training. They can be tuned to optimize the training time and model performance.
Seven hyperparameters are listed below, plus SAC alpha, which applies only to SAC. We will try to understand how each hyperparameter influences model training. Hyperparameter tuning is all about iterative improvement through trial and error.

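• Gradient descent batch size: The number of recent vehicle experiences sampled at random from an experience buffer and used to update the neural network weights. The experience buffer is composed of images captured by the camera mounted on the AWS DeepRacer vehicle and the actions taken by the vehicle.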
• Number of epochs: The number of passes through the training data to update the neural network weights during gradient descent.
• Learning rate: The learning rate controls how much a gradient-descent (or ascent) update contributes to the network weights.
• Entropy: The degree of uncertainty used to decide when to add randomness to the policy distribution. The added uncertainty helps the AWS DeepRacer vehicle explore the action space more broadly.
• Discount factor: How much future rewards contribute to the expected reward. A discount factor of 0 means the current state is independent of future steps, whereas a discount factor of 1 means that contributions from all future steps are included.
• Loss type: The type of objective function to update the network weights.
• The number of experience episodes between each policy-updating iteration: The size of the experience buffer used to draw training data from for learning policy network weights.
• SAC (Alpha Value): You can tune the amount of entropy to use in SAC with the hyperparameter SAC alpha, with a value between 0.0 and 1.0. The maximum value of the SAC alpha uses the whole entropy value of the policy and favors exploration. The minimum value of SAC alpha recovers the standard RL objective and there is no entropy bonus to incentivize the exploration. A good SAC alpha value to kick off your first model is 0.5.
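The effect of the discount factor γ is easiest to see in the standard discounted return, G = Σ γ^t · r_t, where t counts steps into the future. The sketch below uses hypothetical rewards of 1.0 per step; it shows the textbook definition only and does not claim to reproduce how DeepRacer computes returns internally.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, where t is the number of steps into the future."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical rewards for four consecutive steps
print(discounted_return(rewards, 0.0))  # only the current step counts: 1.0
print(discounted_return(rewards, 0.5))  # 1 + 0.5 + 0.25 + 0.125 = 1.875
print(discounted_return(rewards, 1.0))  # every future step counts fully: 4.0
```

A discount factor close to 1 lets the car value rewards that arrive many steps later, such as those for a well-taken corner, at the cost of slower training.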

### Reward Function

The reward function and the action space go hand in hand. The reward function must be compatible with the values specified in the action list.
Last time, we selected one of the default reward functions and trained the model. Though that's the best way to start, to clock the best time we need to understand the vehicle's input parameters and design our own reward function.
The reward function receives a single argument, `params`, which is a Python dictionary.

I recommend reading the official documentation to understand each parameter.
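As a starting point, here is a sketch of a custom reward function that reads a few entries from `params` (`all_wheels_on_track`, `track_width`, `distance_from_center`, `speed`). The center-line bands follow the pattern of the AWS "follow the center line" sample; the speed bonus and its `SPEED_THRESHOLD` value are my own assumptions, and the threshold must lie within the speed range of your action space.

```python
def reward_function(params):
    """Reward staying close to the center line and driving fast; penalize leaving the track."""
    all_wheels_on_track = params["all_wheels_on_track"]
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    speed = params["speed"]

    # Near-zero (not exactly zero) reward when the car leaves the track
    if not all_wheels_on_track:
        return 1e-3

    # Three bands around the center line, as fractions of the track width
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3

    # Hypothetical speed bonus; keep SPEED_THRESHOLD inside your action space's speed range
    SPEED_THRESHOLD = 1.0  # m/s
    if speed >= SPEED_THRESHOLD:
        reward *= 1.5

    return float(reward)

# Hypothetical input for a quick local check
print(reward_function({"all_wheels_on_track": True, "track_width": 1.0,
                       "distance_from_center": 0.05, "speed": 1.2}))
```

If your action space never reaches 1.0 m/s, the bonus would never fire, which is exactly the kind of mismatch between reward function and action space to watch for.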

• Creating vehicle - Discrete Action Space

Click "Create Vehicle", give it a name of your choice, select any vehicle shell and click Next.

In Vehicle Mod Specifications, select "Camera" as the sensor (since we are going to race in Time Trials) and click Next.

Under Choose your action space type, select Discrete, then change the maximum and granularity values and notice that the action list gets updated.

This action list defines the behaviour of the model on the track.
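To see how the maximum and granularity values turn into an action list, here is a sketch that combines every steering angle with every speed. It assumes steering angles are spaced evenly from -max to +max and speeds are max_speed × i / granularity for i = 1..granularity; that spacing is my assumption, so compare it with the list the console actually shows. The helper name is hypothetical.

```python
from itertools import product

def build_action_list(max_steering, steering_granularity, max_speed, speed_granularity):
    """Combine evenly spaced steering angles and speeds into a discrete action list (assumed spacing)."""
    if steering_granularity < 2 or speed_granularity < 1:
        raise ValueError("need at least 2 steering values and 1 speed value")
    step = 2 * max_steering / (steering_granularity - 1)
    steering_angles = [-max_steering + i * step for i in range(steering_granularity)]
    speeds = [max_speed * i / speed_granularity for i in range(1, speed_granularity + 1)]
    # One action per (steering angle, speed) pair, numbered in order
    return [
        {"index": idx, "steering_angle": round(angle, 1), "speed": round(spd, 2)}
        for idx, (angle, spd) in enumerate(product(steering_angles, speeds))
    ]

actions = build_action_list(max_steering=30, steering_granularity=5, max_speed=1.0, speed_granularity=2)
for action in actions:
    print(action)
```

The number of actions is the product of the two granularities (here 5 × 2 = 10), so raising either granularity quickly enlarges the space the model has to explore.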

• Creating vehicle - Continuous Action Space

Follow the same steps to create a new vehicle: give it a new name and select Camera as the sensor.

For Choose your action space type, select Continuous, then define the range for speed and steering angle.

Once your vehicles are ready, they should be listed in the Garage.

## Create a model

Give a name for the model you are creating.

In this blog, I name the models deepracerblog-ppo (discrete action space) and deepracerblog-sac (continuous action space). For Environment Simulation, select "ReInvent 2018 Track".

For Race type, select the type (for this post, we select Time trial).

### Training Algorithms and Hyperparameters

• deepracerblog-ppo Model

Training Algorithm: PPO
Hyperparameters: Configure the parameters
Agent: Select the vehicle you created with the discrete action space.

• deepracerblog-sac Model

Training Algorithm: SAC
Hyperparameters: Configure the parameters
Agent: Select the vehicle you created with the continuous action space.
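Because SAC only works in a continuous action space, it is worth double-checking each model's algorithm/vehicle pairing. The helper below is hypothetical, and the console may already block invalid combinations; it simply encodes the compatibility from the comparison of the two algorithms earlier in this post.

```python
# Algorithm/action-space compatibility from the PPO vs SAC comparison
SUPPORTED_ACTION_SPACES = {
    "PPO": {"discrete", "continuous"},
    "SAC": {"continuous"},
}

def check_model_config(name, algorithm, action_space):
    """Raise ValueError if the algorithm cannot train on the vehicle's action space."""
    allowed = SUPPORTED_ACTION_SPACES.get(algorithm)
    if allowed is None:
        raise ValueError(f"{name}: unknown algorithm {algorithm!r}")
    if action_space not in allowed:
        raise ValueError(f"{name}: {algorithm} does not support a {action_space} action space")
    return True

# The two models used in this post
print(check_model_config("deepracerblog-ppo", "PPO", "discrete"))
print(check_model_config("deepracerblog-sac", "SAC", "continuous"))
```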

Once you have fine-tuned your model's hyperparameters, choose Next. Lastly, write a reward function for your vehicle; it must be compatible with the vehicle's action space. Click Validate to check the reward function.

## Training and Evaluating the model

We need to specify the stopping condition (the maximum training time) for the model. But there is a catch here. If we train the model for too short a time, there is a high chance of underfitting, that is, the car may not perform well.

If we train the model for too long, it may overfit: it may perform really well on the track it was trained on but poorly on other tracks.

To generalize well, we need a training time that is long enough for the model to converge but no longer. Fellow developers report that about 4 hours of training yields good results.
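One hypothetical way to judge whether training has converged is to watch for a plateau in track completion. The sketch assumes you have a list of per-episode completion percentages (for example, copied from the training graph or logs; that export step is not shown), and the `window` and `tolerance` values are arbitrary choices. A plateau on the training track says nothing about other tracks, so it does not rule out overfitting.

```python
def has_converged(progress_per_episode, window=10, tolerance=2.0):
    """Return True when the moving average of track completion (%) stops improving.

    Compares the mean of the last `window` episodes with the mean of the
    `window` episodes before them; a gain below `tolerance` percentage
    points (including a decline) is treated as a plateau.
    """
    if len(progress_per_episode) < 2 * window:
        return False  # not enough episodes to compare yet
    recent = progress_per_episode[-window:]
    previous = progress_per_episode[-2 * window:-window]
    gain = sum(recent) / window - sum(previous) / window
    return gain < tolerance

# Hypothetical histories: still improving vs. flat
print(has_converged(list(range(0, 100, 5))))  # steady improvement
print(has_converged([80.0] * 20))             # flat at 80% completion
```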

Keep an eye on the evaluation simulation and the model's logs so that you can improve future models.

Now we are ready to conquer the checkered flag! Do participate in the AWS DeepRacer League and aim for the top position!