AlphaStar implementation series - part 1


I recently built a simple Terran agent with a rule-based system using DeepMind's PySC2. The same approach works up to the Marauder, but the program becomes too complicated when I try to control higher-tech units.

Because of that problem, I want to use a deep learning method instead of the rule-based one. So I decided to read the AlphaStar paper, which reports the best performance in the StarCraft II domain.

When I read DeepMind's earlier AlphaGo paper, I felt there was not enough reference material to implement its contents. This time, however, replication seems feasible because the AlphaStar paper comes with enough supplementary material.

You can download the supplementary materials, which consist of a detailed architecture description and pseudocode. Let’s look through them and use them to build your own AlphaStar.

Architecture material

The first supplementary material is the detailed architecture document, which gives a detailed description of the corresponding figure in the paper.

It covers details such as the input preprocessing method, the type and structure of the neural networks, and the output processing method, so it is essential for implementation.

Pseudocode material

The second material is pseudocode that roughly implements what the paper describes. To see why it is useful, consider this sentence from the supervised learning part of the paper: “From each replay, we extract a statistic z that encodes each player’s build order, defined as the first 20 constructed buildings and units, and cumulative statistics, defined as the units, buildings, effects, and upgrades that were present during a game”.

Given only this sentence, converting it directly into code would take some time and would very likely produce wrong code. However, you can see the code that corresponds to this sentence in a file included in the pseudocode.

# Extract build order and build units vectors from replay
bo = replay.get_BO(replay.home_player)
bu = replay.get_BU(replay.home_player)

From this code I can tell that I need to write code that extracts the build order and unit information from the replay file. After trying various approaches, I managed to do this.
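To make the shape of these two vectors more concrete, here is a minimal sketch in plain Python. It assumes the build order is stored as one one-hot row per produced item, padded to 20 rows, and that the build units are stored as a presence vector. The function names `encode_build_order` and `encode_presence` and the tiny vocabulary in the usage note are hypothetical and do not come from the pseudocode.

```python
from typing import List, Sequence

# The paper defines the build order as the first 20 constructed buildings and units.
BUILD_ORDER_LENGTH = 20


def encode_build_order(names: Sequence[str],
                       vocabulary: Sequence[str],
                       length: int = BUILD_ORDER_LENGTH) -> List[List[int]]:
    """One-hot encode the first `length` produced items (hypothetical helper).

    Names missing from `vocabulary` raise a KeyError so that gaps are noticed.
    """
    index = {name: i for i, name in enumerate(vocabulary)}
    rows = []
    for name in list(names)[:length]:
        row = [0] * len(vocabulary)
        row[index[name]] = 1
        rows.append(row)
    # Pad short games with all-zero rows so every replay has the same shape.
    while len(rows) < length:
        rows.append([0] * len(vocabulary))
    return rows


def encode_presence(names: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Mark which vocabulary entries appeared at least once (hypothetical helper)."""
    present = set(names)
    return [1 if name in present else 0 for name in vocabulary]
```

For example, with the vocabulary `["Probe", "Pylon", "Gateway"]`, calling `encode_build_order(["Probe", "Pylon"], vocabulary)` gives 20 rows whose first two rows are `[1, 0, 0]` and `[0, 1, 0]`, while the remaining rows are zeros.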

Below are the GitHub links I used for that process. Thanks again to the authors for saving my day.

  1. Downloading replay files :
  2. Extracting observations and actions from replay files :

After trying various methods, I confirmed that the unit and building production information over the course of a game must be obtained from the history of actions the player performed during the game.

For example, the first 20 units and buildings that a Protoss player produced early in one replay can be obtained as shown on the left. This information can be used for the build_order and build_unit inputs described in the paper and pseudocode.

The Python code for getting that information is shown below.
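As a rough sketch of the idea (not the code used for this post), the build order can be recovered by scanning the action history in game-loop order and keeping only the production actions. The sketch assumes each action is already available as a `(game_loop, function_name)` pair, and that production actions use PySC2-style names such as `Train_Probe_quick` or `Build_Pylon_screen`. The helper `extract_build_order` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: production actions start with one of these prefixes,
# as in PySC2 function names like "Train_Probe_quick" or "Build_Pylon_screen".
PRODUCTION_PREFIXES = ("Build", "Train")


def extract_build_order(actions: Iterable[Tuple[int, str]],
                        limit: int = 20) -> List[str]:
    """Return the first `limit` produced unit/building names from an action history."""
    build_order: List[str] = []
    # Sort by game loop so the result follows the order of play.
    for _game_loop, function_name in sorted(actions, key=lambda a: a[0]):
        parts = function_name.split("_")
        if len(parts) >= 2 and parts[0] in PRODUCTION_PREFIXES:
            # The second token is taken as the unit or building name.
            build_order.append(parts[1])
            if len(build_order) == limit:
                break
    return build_order
```

Given the actions `[(0, "select_point"), (12, "Train_Probe_quick"), (300, "Build_Pylon_screen")]`, this returns `["Probe", "Pylon"]`. Note that an issued action does not guarantee the unit was actually completed, so this is only an approximation of what was built.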

The other piece of information the paper requires is the cumulative statistics: the minerals, gas, units, and buildings collected and consumed during the game. This can be obtained from the observations of the replay.
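A minimal sketch of collecting these statistics could look like the following. It assumes each observation has already been converted into a plain dictionary with a `score` entry (running resource counters) and a `unit_names` entry (units and buildings visible at that step). Both keys, and the helper `summarize_replay`, are hypothetical; the counter names only loosely mirror the cumulative score fields exposed by PySC2.

```python
from typing import Dict, Iterable, List, Union

# Assumption: these counters are already cumulative in each observation.
RESOURCE_FIELDS = ("collected_minerals", "collected_vespene",
                   "spent_minerals", "spent_vespene")


def summarize_replay(observations: Iterable[dict]) -> Dict[str, Union[Dict[str, int], List[str]]]:
    """Collect final resource counters and every unit/building seen in a replay."""
    totals = {field: 0 for field in RESOURCE_FIELDS}
    seen = set()
    for obs in observations:
        # The counters grow over time, so the latest value replaces the old one.
        for field in RESOURCE_FIELDS:
            totals[field] = obs["score"].get(field, totals[field])
        # Remember everything that was present at any point in the game.
        seen.update(obs["unit_names"])
    return {"resources": totals, "units_seen": sorted(seen)}
```

The `units_seen` list is the part that corresponds to "units and buildings that were present during a game" in the paper's wording; the resource counters are extra bookkeeping that may or may not be needed for training.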

Player information in the replay file

Finally, the replay files used for training are selected based on the race and score of the player. This information is stored in the replay file as well.
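The selection step can be sketched as a simple filter. This assumes the player information has already been read out of each replay into a small record, and that "score" here means the player's MMR. The `ReplayPlayer` class and `select_replays` function are hypothetical, and the default threshold of 3500 follows the MMR figure reported in the AlphaStar paper, so treat it as an adjustable assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReplayPlayer:
    """Hypothetical record of one player's metadata taken from a replay file."""
    replay_path: str
    race: str
    mmr: int


def select_replays(players: Iterable[ReplayPlayer],
                   race: str,
                   min_mmr: int = 3500) -> List[str]:
    """Keep replay paths where the player has the wanted race and an MMR above min_mmr."""
    return [p.replay_path for p in players
            if p.race == race and p.mmr > min_mmr]
```

For instance, `select_replays(players, "Protoss")` keeps only the replays of Protoss players rated above 3500, which keeps low-skill games out of the training set.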

Thank you for reading. I’ll post the next part of the series soon.

Written by

I am a deep reinforcement learning researcher from South Korea. My final goal is to make an AI robot that can cook and clean for me using deep learning.
