In the recent past, OpenAI has become famous amongst IT professionals for its achievements. OpenAI has a large community working towards human-like intelligence through reinforcement learning, and it provides that community with Gym.

Gym is a laboratory for engineers who want to train reinforcement learning agents. It offers many simulated environments, from games to robots. In this article, we’ll use one of these environments to train an RL agent from scratch.

Prerequisites

  1. Python
  2. TensorFlow
  3. Basic understanding of deep learning

Basics of OpenAI Gym

Before moving to code, we need to learn some important terms.

  1. State: The situation of the agent at any given time is called its state.
  2. Action: Any move the agent makes is called an action.
  3. Reward: Based on its actions, the agent receives positive or negative rewards.
  4. Episode: An episode is the time between the start of the game and the end of the game.
  5. Policy: The set of rules the agent follows in the environment at any given state, aiming for high rewards.

How does an RL agent learn?

Learning in RL is easiest to understand as a loop between the agent and the environment:

The agent receives information about its current state from the environment and, based on that state, performs an action. The action returns a reward along with the agent’s new state. Over the course of an episode, the agent tries to maximize its total reward.
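
This loop maps directly onto gym’s API. Here is a minimal sketch with a purely random agent (it assumes the classic gym interface, where env.step() returns four values; newer gym/gymnasium releases changed this):

import gym

# One episode of the agent-environment loop with random actions.
env = gym.make('Breakout-ram-v0')
state = env.reset()          # initial state from the environment
done = False
total_reward = 0

while not done:
    action = env.action_space.sample()            # pick a random action
    state, reward, done, info = env.step(action)  # reward and new state
    total_reward += reward

print('Episode reward:', total_reward)
env.close()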

Gym exposes all of these quantities through its environments. The agent itself is built from one of several available algorithms, chosen with the type of AI we want to develop in mind. Some of those algorithms are listed below:

  1. DQN (Deep Q-learning)
  2. DDPG (Deep Deterministic Policy Gradient)
  3. SARSA (State Action Reward State Action)
  4. NAF (Normalized Advantage Function)

In our example, we’ll be using the DQN algorithm, which works well with most of the environments in OpenAI gym. The main limitation of DQN is that it doesn’t support continuous action spaces.

Into the code

We’ll be training our model on Breakout-ram-v0. There are hundreds of such games and environments to choose from in OpenAI gym.

You can install the gym and keras-rl Python libraries from pip using the following command:

pip install gym keras-rl

First, we import the required libraries.

We need to feed a deep neural network (DNN) to the DQNAgent to give a brain to the otherwise random decision-maker. The DNN will be created using the Keras library.

For the RL agent, the Keras-rl library is used. We import EpsGreedyQPolicy as the policy for the agent. SequentialMemory is a replay buffer: it stores the agent’s past state-action transitions so they can be sampled again during training.
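
A minimal sketch of these imports (assuming the standalone keras package; depending on your TensorFlow/Keras versions the import paths may differ):

import numpy as np
import gym

# Keras builds the DNN that acts as the agent's brain.
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

# keras-rl provides the agent, the policy, and the replay memory.
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory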

Now, start by loading the environment from gym and setting the random seed so that runs are reproducible. Then extract the number of actions available in the environment.
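
A sketch of that setup (env.seed() works on classic gym versions; newer releases pass the seed to env.reset() instead):

ENV_NAME = 'Breakout-ram-v0'

# Load the environment and seed everything for reproducible runs.
env = gym.make(ENV_NAME)
np.random.seed(123)
env.seed(123)

# Number of discrete actions available to the agent.
nb_actions = env.action_space.n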

Let’s create a DNN model to pass into the DQNAgent. A bigger DNN can model the environment more accurately, but it also requires more memory and compute to store and update its weights, so the size should be kept in check.
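
One possible model along those lines; the layer sizes here are illustrative, not tuned. The extra leading dimension in input_shape matches keras-rl’s window_length of 1:

# The RAM observation (128 bytes) goes in; one Q-value per action comes out.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(nb_actions, activation='linear'))
print(model.summary())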

We first use the Epsilon Greedy policy to give the agent its set of rules to follow, and a SequentialMemory with a limit of 50000 transitions. Then we initialize the DQNAgent, providing the DNN model and the other components as parameters.
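
A sketch of that wiring; nb_steps_warmup, target_model_update, and the learning rate are illustrative values, not tuned:

policy = EpsGreedyQPolicy()
memory = SequentialMemory(limit=50000, window_length=1)

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=100,        # steps collected before learning starts
               target_model_update=1e-2,   # soft update of the target network
               policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])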

At last, we fit the model with data from the environment, which starts the training. Here we also visualize the agent while it trains; this helps with understanding, but it slows down learning and consumes extra memory.
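
For example (nb_steps is an illustrative budget; real Atari training needs far more steps):

# visualize=True renders the game while training; instructive but slow.
dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)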

Finally, close the environment and the visual window. This frees the resources the environment and the model used during training.
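
In code, that is simply:

env.close()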

What’s next?

In this article, we covered DQN, a model-free, value-based algorithm. Algorithms of this type are not capable of handling continuous action spaces. DDPG is used for environments with a continuous action space; I’ll cover that algorithm in a separate article.

Gym has several continuous environments to train your model on; the MuJoCo and Robotics suites contain such environments.

Conclusion

In conclusion, OpenAI gym is very useful for beginning as well as intermediate reinforcement learning developers. Researchers can use the gym to test multiple models and find the best-performing one.

Not only that, gym is open source, which makes it easier for everyone to stay up to date on developments in RL and learn at the same time.