
Edel Prado


Eddie’s Take: Deep Reinforcement Learning with Relational Inductive Biases

Hello everyone!

I recently came across a conference paper by Zambaldi et al. on deep reinforcement learning with relational inductive biases that I found interesting, but the paper itself is a bit of a dense read. With that in mind, I decided to write up my own take on it as a high-level overview.


NOTE: I do recommend taking the time to read Deep Reinforcement Learning with Relational Inductive Biases. It explains in detail their entire process of using reinforcement learning. I would highlight the Appendix section. Very cool stuff!


Before we get into the heat of it, let's ask two general questions.

What is reinforcement learning?

Reinforcement Learning (RL) is a branch of machine learning in which an ‘agent’ learns by interacting with its environment, trying to maximize the rewards it receives. A good way to think of it is training a dog using treats as a reward.
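To make that concrete, here is a tiny, self-contained toy example (my own illustration, not anything from the paper, and using Q-learning rather than the algorithm the authors use): an agent on a line of five cells learns, purely from rewards, to walk toward the "treat" at the right end.

```python
import random

# Toy RL example (illustrative only): the agent lives on a line of 5 cells
# and gets a +1 "treat" for reaching the rightmost cell, much like rewarding
# a dog for a trick. It learns with tabular Q-learning.

N_STATES = 5
ACTIONS = [-1, +1]                     # step left or step right
q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.1, 0.9, 0.3  # learning rate, discount, exploration

for episode in range(500):
    state = 0                          # always start at the left end
    while state != N_STATES - 1:       # episode ends at the treat
        # Explore sometimes, otherwise take the action we currently value most.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])

        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0

        # Nudge the value estimate toward reward + discounted future value.
        best_next = max(q_table[(next_state, a)] for a in ACTIONS)
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)]
        )
        state = next_state

# After training, the learned policy is "go right" in every cell.
print({s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)})
```

No human ever tells the agent "go right"; it figures that out just by chasing the reward.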

What are relational inductive biases?

A relational inductive bias is a constraint built into a learning model that pushes it to reason in terms of entities and the relationships between them during the learning process.
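To make that a little less abstract, here is a toy sketch (mine, not the paper's) of one kind of relational inductive bias: represent the scene as a set of entities and apply one shared relation function to every pair of them. The constraint "reason about pairs of entities, with shared weights" is the bias.

```python
import numpy as np

# Toy relational inductive bias (illustrative only): a shared relation
# function is applied to every ordered pair of entities, then aggregated.

rng = np.random.default_rng(0)
entities = rng.normal(size=(4, 8))       # 4 entities, 8 features each
W = rng.normal(size=(8, 16)) * 0.1       # weights shared across ALL pairs

def relation(a, b):
    """One shared function that scores how entity a relates to entity b."""
    return np.tanh(W @ np.concatenate([a, b]))

# Summing the same relation function over all pairs gives a fixed-size,
# order-invariant summary of the scene's relational structure.
relational_summary = sum(
    relation(a, b)
    for i, a in enumerate(entities)
    for j, b in enumerate(entities)
    if i != j
)
print(relational_summary.shape)          # (8,)
```

Because the same function is reused for every pair, the model is forced to learn about relations, and it doesn't care how many entities there are or in what order they show up.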

With that out of the way, let’s dive in!

Introduction

The conference paper starts with a quick introduction to recent deep reinforcement learning systems. Their strength is flexibility: they learn directly from raw observations and reward signals through statistical number crunching. Think of a game of Pong between a human and an RL agent. The agent is fed frames of the game (the observations). If the agent scores a point, it receives a positive reward; if the human scores a point, it receives a negative reward. These are the reward signals. By trying to maximize its rewards, the agent learns entirely on its own.
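Purely as an illustration of that reward signal (this is not how Pong is actually implemented anywhere, just the idea):

```python
# Illustrative reward signal for the Pong example above: +1 when the agent
# scores, -1 when the human does, 0 otherwise.

def pong_reward(agent_scored: bool, human_scored: bool) -> float:
    if agent_scored:
        return +1.0   # positive reward signal
    if human_scored:
        return -1.0   # negative reward signal
    return 0.0        # nothing happened on this frame
```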

As for the negatives, RL systems tend to have low sample efficiency and generalize poorly beyond the specifics of their training environment. By poor generalization, we mean that an agent trained on Pong couldn't simply be reused on a game that isn't Pong.

To help counter these negatives, many have turned to relational inductive biases. Combining them with RL keeps the flexible statistical learning (which helps with the low sample efficiency) while adding a more structured approach (which helps with generalizing to new situations).

The authors decided to incorporate relational inductive biases into their RL agent, which let it handle challenging tasks. The agent reached state-of-the-art performance on six out of the seven StarCraft II mini-games, and even surpassed human grandmaster level on four of them! On top of that, the authors created a task they call "Box-World", which strips away complex visuals so the focus is on relational reasoning. On it, the RL agent with relational inductive biases showed higher performance, better efficiency, and an ability to generalize to problems with more complex solutions.

Here is a quick look at the Box-World that is used as input for the RL agent as well as shots of the StarCraft II mini game.

Box-World


Relational Deep RL Agent Architecture

Here, we will be diving into how the model works. For a visual representation of the agent architecture, see below.

Relational deep RL agent architecture

The authors started with a deep RL algorithm based on the A2C method. What is the A2C method? Mike Wang says it best in his blog published on Towards Data Science:

In the field of Reinforcement Learning, the Advantage Actor Critic (A2C) algorithm combines two types of Reinforcement Learning algorithms (Policy Based and Value Based) together. Policy Based agents directly learn a policy (a probability distribution of actions) mapping input states to output actions
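To ground that quote a little, here is a rough sketch of the actor-critic idea in PyTorch. This is my own simplification with made-up sizes, not the authors' actual A2C implementation: one network body feeds two heads, a policy (the "actor") and a state-value estimate (the "critic"), and the advantage tells the actor how much better an action turned out than the critic expected.

```python
import torch
import torch.nn as nn

# Sketch of the actor-critic idea (illustrative only, toy sizes).
class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, n_actions)   # actor: action logits
        self.value_head = nn.Linear(64, 1)            # critic: state value

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

model = ActorCritic(obs_dim=8, n_actions=4)
obs = torch.randn(16, 8)                  # a batch of fake observations
returns = torch.randn(16)                 # pretend discounted returns
actions = torch.randint(0, 4, (16,))      # pretend actions that were taken

logits, values = model(obs)
advantage = returns - values.detach()     # how much better than expected?
log_probs = torch.log_softmax(logits, dim=-1)[torch.arange(16), actions]

policy_loss = -(advantage * log_probs).mean()  # push up actions that beat the baseline
value_loss = (returns - values).pow(2).mean()  # train the critic toward the returns
(policy_loss + 0.5 * value_loss).backward()
```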

If it seems a bit complicated, it's because it is! To keep it simple, the process goes something like this.

  1. Input Module - the input is an image of a scene. This is where our Box-World maps come in handy! A convolutional network processes the image into a set of entity vectors for the relational module to use.

  2. Relational Module - Think of this as the brain of the RL agent. It uses attention to let every entity in the scene "look at" every other entity, computing the relations between them that point toward the best course of action (see the sketch after this list).

  3. Output Module - This takes the relational module's output and turns it into action probabilities (and a value estimate) that drive the agent's moves in the game.
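For the curious, here is a stripped-down, single-head sketch of the attention idea at the heart of the relational module. The paper actually uses multi-head dot-product attention over entities extracted by a convolutional network; this is my own simplified version with made-up sizes, not their code.

```python
import torch
import torch.nn as nn

# Single-head self-attention over a set of entities (illustrative only).
n_entities, d = 36, 64                    # e.g. a 6x6 feature map flattened to 36 entities
entities = torch.randn(1, n_entities, d)  # a batch of one scene

query, key, value = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)

q, k, v = query(entities), key(entities), value(entities)
attn = torch.softmax(q @ k.transpose(1, 2) / d**0.5, dim=-1)  # entity-to-entity weights
updated_entities = attn @ v               # each entity becomes a mix of what it attended to

print(attn.shape)              # (1, 36, 36): how much each entity "looks at" every other one
print(updated_entities.shape)  # (1, 36, 64): relation-aware entity representations
```

Each row of the attention matrix is one entity deciding how much to "look at" every other entity, which is exactly the kind of "this key goes with that box" relation the agent needs.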

This is a high level take on the RL agent architecture, so if you'd like to know the math behind it all, I would suggest checking out section 2 of the conference paper.

Experiments and Results

Box-World

This section is all about Box-World! I love the idea of boiling the problem down to a simplified yet combinatorially complex image. Box-World is a 12x12 pixel room with keys and boxes randomly scattered around it, plus the agent itself, which can move around the room.

To break down how Box-World works, let's talk about the task itself. There are loose keys (shown as a single colored pixel) that the agent needs to pick up to open boxes (depicted as two adjacent colored pixels: one for the box's contents and one for the color of the key needed to unlock it). Here's the catch: there are multiple boxes, and each box requires its own key. So the agent needs to spot the right key, move to it, pick it up, move to the corresponding box, open it, and collect what is inside. The simplified map means the agent doesn't have to process a highly detailed image, and it generalizes better, which makes the approach easier to apply in other situations.
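Here is a toy sketch of that key-and-lock logic (my own illustration of the idea, not the actual Box-World implementation):

```python
# Toy key/lock logic (illustrative only). A box is a (contents, lock color)
# pair and can only be opened with the matching key; opening a box hands the
# agent whatever was inside, which may be the key to the next box.

boxes = [("blue", "red"), ("gem", "blue")]   # (what's inside, key color needed)
inventory = "red"                            # the loose key the agent picked up

for contents, lock_color in boxes:
    if inventory == lock_color:              # do we hold the right key for this lock?
        print(f"Opened the {lock_color} lock and collected: {contents}")
        inventory = contents                 # the contents becomes our new key (or the gem)
    else:
        print(f"Can't open the {lock_color} lock with a {inventory} key")
```

In the real environment the agent has to discover this chain of "which key opens which box" relations on its own, which is exactly why the relational module helps so much here.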

Here is Box-World in action!

The agent playing Box-World

With Box-World finished, the team then began experimenting with the StarCraft II mini-games developed for the StarCraft II Learning Environment. The result: the RL agent surpassed a human grandmaster on four of them!

Here is a performance chart on this.

RL agent performance on the StarCraft II mini-games

This, in my opinion, is incredible!

The paper then continues with more detail on how it all works, as well as its applicability to other tasks.


Ending Note: All the images came from the conference paper (the topic of this blog), which can be found here. It is a well thought out and informative paper that I'm glad I decided to read.


BONUS: Check out this video of DeepMind's StarCraft II agent going up against Grzegorz 'MaNa' Komincz, who plays for Team Liquid.
