Introduction

In the previous posts, we looked at how to extract human play data from replay files and how to implement the agent and network of AlphaStar. With these three pieces ready, we can now move on to the training process of the paper. The first step is to train the agent's network with the data extracted from replays through Supervised Learning.

Training process of AlphaStar

Model structure of TensorFlow for batch training

Unlike the inference process used in the agent's step function, training requires multiple samples to be fed in at once according to the batch size. Therefore, the network must be able to handle a variable batch size.

Core function for batch datasets

When using the reshape function in the Core created in the previous post, the tensor size of that part is set automatically according to the batch size if you pass -1 in the size argument of the reshape function. The model can be used regardless of the batch size if the same method is applied to the reshape calls of the encoders and heads. …
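The snippet below is a minimal sketch of this idea, not the code from the repository: the tensor shapes and names are illustrative, and it only shows how passing -1 to tf.reshape lets TensorFlow infer the batch dimension at run time.

```python
import tensorflow as tf

# Minimal sketch: flatten a feature map without hard-coding the batch size.
# Passing -1 as the first dimension lets tf.reshape infer it at run time,
# so the same model works for a single inference step and for a large training batch.
spatial_feature = tf.random.normal([8, 16, 16, 32])   # (batch, height, width, channels)

flattened = tf.reshape(spatial_feature, [-1, 16 * 16 * 32])
print(flattened.shape)  # (8, 8192); with batch size 1 it would be (1, 8192)
```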


Introduction

In the last post, we checked how to implement the encoder part that handles the state for Deep Reinforcement Learning. In this article, I am going to describe the head part for the action.


The code for the head network can be found at https://github.com/kimbring2/AlphaStar_Implementation/blob/master/network.py. After completing the whole network, we can train it using replay data.

Head network in agent class

Like the encoder networks, the ideal structure is to declare the head network in the agent class and use it in the step function of that class. The state information combined by the encoder networks goes to the head network via the core network.

Let's see how the AlphaStar agent selects an action by referring to the code above. First, the agent selects what to do next from the Action Type Head. For example, _BUILD_SUPPLY_DEPOT, _BUILD_BARRACKS, _BUILD_REFINERY, _TRAIN_MARINE, _TRAIN_MARAUDER, _ATTACK_MINIMAP, and _BUILD_TECHLAB can be the list of action types for a simple Terran agent. Second, the Selected Units Head decides which unit will execute that action type. If the action is _BUILD_SUPPLY_DEPOT, it would be desirable for the network to select one of the SCVs. …
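The listing below is only a hypothetical sketch of this two-step, autoregressive selection order. The head networks, the unit mask, and the action list are stand-ins for illustration and are not the code from network.py.

```python
import tensorflow as tf

# Hypothetical action types for a simple Terran agent (indices are illustrative).
ACTION_TYPES = ['_BUILD_SUPPLY_DEPOT', '_BUILD_BARRACKS', '_TRAIN_MARINE', '_ATTACK_MINIMAP']

def select_action(core_output, action_type_head, selected_units_head, unit_mask):
    """Sketch of the head order: first what to do, then which unit does it.

    core_output: (1, core_dim) tensor from the core network.
    unit_mask:   (1, num_units) tensor, 1 for units allowed to execute the action
                 (e.g. only SCVs when the action is _BUILD_SUPPLY_DEPOT).
    """
    # 1. Action Type Head: choose what to do next.
    action_type_logits = action_type_head(core_output)      # (1, num_action_types)
    action_type = tf.random.categorical(action_type_logits, 1)[0, 0]

    # 2. Selected Units Head: choose which unit executes that action type.
    unit_logits = selected_units_head(core_output)           # (1, num_units)
    unit_logits += (1.0 - unit_mask) * -1e9                  # mask out units that cannot act
    selected_unit = tf.random.categorical(unit_logits, 1)[0, 0]

    return int(action_type), int(selected_unit)
```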


Introduction

In the past posts of this series, we looked at how to implement the overall training structure of AlphaStar in code, except for the Neural Network part.

In this post, let's implement the Neural Network in code, which is a prerequisite for both Reinforcement Learning and Supervised Learning of the agent.


Let's take a look at how to implement the three encoders using TensorFlow, and how to build their inputs by processing the observations of PySC2.

The code for each network can be found at https://github.com/kimbring2/AlphaStar_Implementation/blob/master/network.py. Due to post length limits, this post only shows the code for observation preprocessing.

Encoder network in agent class

First, let's take a quick look at the class structure of the AlphaStar agent. The scalar encoder, spatial encoder, entity encoder, and core network are declared in the init function. Next, in the step function, each encoder calculates its output value from the processed observation. After that, the core network concatenates these values and sends the result to the action heads. …
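The class below is an illustrative sketch of that layout only, assuming placeholder layer types and sizes; the actual encoders in network.py are more elaborate.

```python
import tensorflow as tf

class AlphaStarAgent:
    """Sketch of the class structure described above; the real encoders live in network.py."""

    def __init__(self):
        # Encoders and the core network are declared once in the init function.
        self.scalar_encoder  = tf.keras.layers.Dense(64, activation='relu')
        self.entity_encoder  = tf.keras.layers.Dense(128, activation='relu')
        self.spatial_encoder = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')
        self.core = tf.keras.layers.LSTM(256, return_state=True)

    def step(self, scalar_obs, entity_obs, spatial_obs, core_state):
        # Each encoder computes its output value from the processed observation.
        scalar_out  = self.scalar_encoder(scalar_obs)                        # (batch, 64)
        entity_out  = self.entity_encoder(entity_obs)                        # (batch, 128)
        spatial_out = tf.keras.layers.Flatten()(self.spatial_encoder(spatial_obs))

        # Concatenate the encoder outputs and feed them to the core;
        # the core output then goes to the action heads.
        core_input = tf.concat([scalar_out, entity_out, spatial_out], axis=-1)
        core_input = tf.expand_dims(core_input, axis=1)                      # add a time dimension
        core_output, *new_state = self.core(core_input, initial_state=core_state)
        return core_output, new_state
```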


Training infrastructure of AlphaStar

Introduction

The main file of AlphaStar is alphastar.py. After completing the entire code, you run the AlphaStar program with the 'python alphastar.py' command. This file contains four Python classes: SC2Environment, Coordinator, ActorLoop, and Learner. In this post, let's take a closer look at the role of each class.
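The skeleton below only restates the structure described in this paragraph. The class bodies and any method names beyond step and reset are assumptions for illustration, not the real alphastar.py code.

```python
# Rough skeleton of alphastar.py as described above.

class SC2Environment:
    """Wraps PySC2 so its inputs and outputs follow the AlphaStar format."""
    def reset(self):
        ...
    def step(self, actions):  # actions = [home_action, away_action] for self-play
        ...

class Coordinator:
    """Ties the actor loops and the learner together."""
    ...

class ActorLoop:
    """Runs episodes with the current agent and collects trajectories."""
    ...

class Learner:
    """Updates the network parameters from the collected trajectories."""
    ...
```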

SC2Environment class

The SC2Environment class has the role of converting the environment input and output of PySC2 into the AlphaStar format, and consists of step and reset functions. Furthermore, AlphaStar uses the Self-Play method for training, and the states of the agent itself and the enemy agent can be obtained from observation[0] and observation[1], respectively. Likewise, the action of each agent is passed to the step function in the form [home_action, away_action]. Finally, detecting the end of each episode in PySC2 is a little different from OpenAI Gym. At the end of an episode, observation[0][0] and observation[1][0] give the value StepType.LAST. Additionally, you can check StepType.FIRST at the beginning and StepType.MID in the middle of the episode. …
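The function below is a minimal sketch of one self-play step with an environment wrapper like the one described above; the env object and the two action arguments are placeholders, and a real agent would compute the actions from its policy.

```python
from pysc2.env import environment

def run_step(env, home_action, away_action):
    # Both agents' actions are passed together, as [home_action, away_action].
    observation = env.step([home_action, away_action])

    home_obs, away_obs = observation[0], observation[1]

    # PySC2 marks episode boundaries with a StepType field (the first entry of each
    # TimeStep) instead of a 'done' flag like OpenAI Gym:
    # FIRST at the start, MID in the middle, LAST at the end of an episode.
    done = home_obs[0] == environment.StepType.LAST
    return home_obs, away_obs, done
```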



Introduction

In 2019, a competition (http://minerl.io/competition/) was held for Minecraft, a famous game, in which participants used human gameplay data to train a Deep Reinforcement Learning agent.

I participated at the time, but my result was not great because I was not familiar with methods for applying human demonstration data to Deep Reinforcement Learning.

The winning agent of the competition

The final winner of the contest used a method called Hierarchical Deep Q-Network from Imperfect Demonstrations. They published the method in a paper (https://arxiv.org/pdf/1912.08664v2.pdf). However, it seems that the source code has not been released yet.

Therefore, I set a goal to implement the winner's paper, and I will record my work here. …


Introduction

I recently made a simple Terran agent with a Rule-based system using DeepMind's PySC2. Up to the Marauder, the same method works, but the program becomes too complicated when trying to control higher-tech units.

Because of that problem, I want to use a Deep Learning method instead of the Rule-based method. Thus, I decided to read the AlphaStar paper, which shows the best performance in the StarCraft II domain.

While reading the AlphaStar paper published by DeepMind, I felt that there was not enough reference material for implementing the contents of the paper. …

About

Dohyeong Kim

I am a Deep Reinforcement Learning researcher from South Korea. My final goal is to build an AI robot that can cook and clean for me using Deep Learning.
