
Zaynul Abedin Miah


Introduction To Reinforcement Learning With AWS DeepRacer

Machine learning is a method that enables algorithms to learn from data and make predictions without being explicitly programmed. It can be divided into three main fields: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled training data to find patterns and make predictions, while unsupervised learning finds patterns in data without labels. Reinforcement learning, by contrast, learns through experience: an agent interacts with its environment and adjusts its behaviour based on the outcomes, much like a dog learning from its experiences. It is also used for playing games, where the model learns to maximize rewards by achieving specific objectives.

Reinforcement learning

Reinforcement learning models can be used to control traffic signals, improve traffic flow, and learn how to drive safely and efficiently. A practical example of reinforcement learning is the game Hexapawn, played on a 3x3 board with two sides making moves to capture each other's pieces. The game can be played against a simple matchbox computer that learns from its wins and losses, an idea popularised by Martin Gardner in the 1960s.
To learn more about Hexapawn, you can read Martin Gardner's original article in Scientific American, "How to build a game-learning machine and then teach it to play and win": http://cs.williams.edu/~freund/cs136-073/GardnerHexapawn.pdf


AWS DeepRacer

AWS DeepRacer is a small racing car that uses reinforcement learning to drive autonomously, with the goal of getting around a track as quickly as possible. The car has a front-facing camera and can be fitted with extra sensors, such as a second camera and a lidar sensor. The software that makes the driving decisions is called the "agent", and its job is to complete laps as fast as possible.


Reinforcement learning involves an agent taking actions in its environment to collect as much reward as possible. A deterministic policy has a simple one-to-one relationship in which each state maps to a specific action, while a stochastic policy assigns a probability to each of several possible actions in a state. The value function evaluates how good actions are, and that evaluation is used to update the policy: actions that prove valuable are encouraged, while actions with low value have their probabilities reduced.
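To make the difference between the two kinds of policy concrete, here is a small, purely illustrative sketch in Python; the state and action names are invented for this example and are not DeepRacer's actual state or action space.

```python
import random

# Deterministic policy: every state maps to exactly one action.
deterministic_policy = {
    "straight": "accelerate",
    "left_curve": "steer_left",
    "right_curve": "steer_right",
}

# Stochastic policy: every state maps to a probability distribution over actions.
stochastic_policy = {
    "straight":   {"accelerate": 0.8, "steer_left": 0.1, "steer_right": 0.1},
    "left_curve": {"steer_left": 0.7, "accelerate": 0.2, "steer_right": 0.1},
}

def pick_action(policy, state):
    """Return an action for the given state under either kind of policy."""
    choice = policy[state]
    if isinstance(choice, str):
        return choice  # deterministic: always the same action
    actions = list(choice.keys())
    weights = list(choice.values())
    return random.choices(actions, weights=weights, k=1)[0]  # stochastic: sample

print(pick_action(deterministic_policy, "left_curve"))  # always "steer_left"
print(pick_action(stochastic_policy, "left_curve"))     # usually, but not always, "steer_left"
```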

Reward functions determine and hand out rewards based on how well an action helps achieve the goal. In DeepRacer, rewards are plain numbers, and higher numbers mean better behaviour. On every step, the system passes a set of input parameters to the reward function, which uses them to compute and return a reward value. Besides the basics, the reward function receives other input parameters as well, such as the vehicle's position on the track, its heading, its speed, and its progress along the lap.
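As a minimal illustration, a DeepRacer reward function is a Python function that receives a params dictionary and returns a number. The sketch below uses two of the documented input parameters and simply rewards the agent for keeping all four wheels on the track while carrying speed; treat it as a starting point rather than a tuned solution.

```python
def reward_function(params):
    """Minimal sketch: reward staying on the track, scaled by speed."""
    all_wheels_on_track = params['all_wheels_on_track']  # True while the car is on the track surface
    speed = params['speed']                              # current speed of the car

    if not all_wheels_on_track:
        return 1e-3  # near-zero reward once the car leaves the track

    # Base reward for staying on the track, plus a bonus that grows with speed
    # so the agent does not learn to crawl around the circuit.
    return float(1.0 + speed)
```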


When you create your own reward function, the validate button checks it for syntax errors, but passing validation offers no guarantee about how well the reward function will actually perform on the track. The AWS DeepRacer vehicle itself is a small racing car with Wi-Fi connectivity, up to two cameras, and an optional lidar sensor. It is powered by an onboard computer with an Intel Atom processor running Ubuntu Linux 20.04, and the machine learning model that makes the driving decisions is stored on that computer. Intel's OpenVINO toolkit is used together with AWS DeepRacer to run these decision-making models efficiently on the device.

The AWS DeepRacer vehicle uses its trained model to make decisions while driving on a real track. This inference process demands a lot of compute power and happens in real time while the vehicle is in motion, so factors such as CPU speed, GPU acceleration, and system RAM all affect it. Machine learning frameworks such as TensorFlow, PyTorch, and Apache MXNet are used for training the models in the first place.

Performing inference locally on the AWS DeepRacer device matters because it avoids the latency of sending data to a remote server, but inference at the edge is challenging due to the device's limited computing resources. That trade-off is why it helps to understand how the DeepRacer device works under the hood.

There are three steps to the optimization process:

  • Convert and optimize
  • Further tuning for performance (optional)
  • Deploy the model

The Intel OpenVINO toolkit helps optimize models and improve their performance on the DeepRacer device. It has three main components: the Model Optimizer, the Inference Engine, and a set of pre-trained models. The optimization process generates an intermediate representation (IR) that the DeepRacer device can understand. Advanced users can apply additional tools, such as the Post-Training Optimization Tool (POT), to improve performance further by reducing numerical precision, while the benchmarking and accuracy-checking tools gather data on how fast and how accurately a model runs on specific hardware.
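For illustration, converting a trained model into OpenVINO's intermediate representation might look roughly like the sketch below using the toolkit's Python conversion API. The file names are placeholders and the exact entry points vary between OpenVINO releases, so treat this as an assumption-laden sketch rather than the definitive workflow (on DeepRacer, the console normally handles this step for you).

```python
# Rough sketch of producing OpenVINO IR files from a trained model.
# "policy.onnx" is a hypothetical export of the trained policy network.
from openvino.tools.mo import convert_model   # Model Optimizer entry point (recent releases)
from openvino.runtime import serialize

# Convert the trained network into an in-memory OpenVINO model.
ov_model = convert_model("policy.onnx")

# Write the intermediate representation: an .xml graph description plus a .bin weights file.
serialize(ov_model, "policy.xml", "policy.bin")
```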


After optimizing and fine-tuning the model, it is ready to be deployed on the AWS DeepRacer vehicle for real-time inference. The OpenVINO Inference Engine, running on the Intel Atom processor, lets the device execute the optimized model efficiently.
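Once the IR files are on the vehicle, running them looks roughly like the following sketch with the OpenVINO Python runtime. The file name and input shape are placeholders and API details differ between OpenVINO versions; the real DeepRacer software handles all of this internally.

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("policy.xml")                      # IR produced by the Model Optimizer
compiled = core.compile_model(model, device_name="CPU")    # target the onboard Atom CPU

# Placeholder camera frame; the real shape depends on the trained model's input.
frame = np.zeros((1, 3, 120, 160), dtype=np.float32)

request = compiled.create_infer_request()
result = request.infer({compiled.input(0): frame})          # one forward pass per camera frame
```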

In most situations, the optimization process can be done automatically when exporting the model from the DeepRacer console, allowing users to move on to deployment without any extra steps.

How to train your first DeepRacer model

This guide will help you train your first AWS DeepRacer model using reinforcement learning. To complete the process, follow these steps:

  1. Create a model
    πŸš€ To begin, you need to access the AWS DeepRacer student console. Once you're there, click on 'create a model'.

  2. Give it a name
    It's important to give your model a specific and descriptive name, preferably including the track name and version number.

  3. Choose a track and a training algorithm
    🏁 Choose a track to use for your training. Keep in mind that harder tracks need more practice and might also require a more complicated way to measure success. Then select one of two training algorithms: Proximal Policy Optimisation (PPO) or Soft Actor-Critic (SAC).

  4. Customise the reward function
    🎁 Modify the reward function to suit your goals. For beginners, the recommended approach is to follow the centre line of the track, which rewards the agent for staying near the centre line (see the sketch after this list).

  5. Set the training time
    To begin, it is recommended to set the training time to 60 minutes.

  6. Provide a model description
    Finally, add a brief description of the chosen algorithm and reward function for the model.
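As an illustration of the centre-line approach mentioned in step 4, the sketch below pays progressively less reward the further the car drifts from the centre of the track. It is close in spirit to the example the console suggests for beginners, with parameter names taken from the documented inputs, but the thresholds here are just one reasonable choice.

```python
def reward_function(params):
    """Sketch: reward staying close to the centre line, in three bands of decreasing value."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Bands at 10%, 25% and 50% of the track width, measured from the centre line.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3  # the car is likely off the track

    return float(reward)
```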

Once the training is complete, you can easily view the model and even make clones of it for additional training or modifications.

Pro tips from top racers in the AWS DeepRacer Pro League:

  • 🏎️ Small changes in the reward function can have a big impact on training.
  • πŸ”„ Making small changes gradually helps you understand how each one affects the model's performance.
  • βš–οΈ To prevent extreme behaviour, it is important to adjust the balance of factors in the reward function.
  • 🏁 Using a consistent naming scheme makes it easier to manage and keep track of models.
  • 🎯 To improve performance, it is important to reward progress and aim for specific goals or routes.
  • πŸ’‘ Keep things simple for faster progress and optimize complex functions gradually.
  • πŸ“Š Analyze logs to understand racer performance and make better decisions.
