Introduction

In the previous posts of this series, we looked at how to implement the overall training structure of AlphaStar in code, except for the neural network part.

In this post, let’s implement the neural network in code, which is a prerequisite for both the Reinforcement Learning and Supervised Learning of the agent.


Let’s take a look at how to implement the three encoders using TensorFlow, and how to build their inputs by processing the observations of PySC2.

The code for each network can be found at https://github.com/kimbring2/AlphaStar_Implementation/blob/master/network.py. To keep this post at a reasonable length, only the code for observation preprocessing is covered here.

Encoder networks in the agent class

First, let’s take a quick look at the class structure of the AlphaStar agent. The scalar encoder, spatial encoder, entity encoder, and core network are declared in the init function. Next, in the step function, each encoder calculates its output value from the preprocessed observation. After that, the core network concatenates these values and sends the result to the action heads.
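Below is a minimal sketch of this structure, assuming TensorFlow 2 and Keras. The class name AlphaStarAgent and the Dense layers standing in for the real encoders are my own placeholders; the actual definitions are in network.py of the repository.

```python
import tensorflow as tf

class AlphaStarAgent(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # The encoders are declared in the init function. Here each one is
        # reduced to a single Dense layer over a flattened observation as a
        # placeholder; the real definitions live in network.py.
        self.scalar_encoder = tf.keras.layers.Dense(64, activation='relu')
        self.spatial_encoder = tf.keras.layers.Dense(64, activation='relu')
        self.entity_encoder = tf.keras.layers.Dense(64, activation='relu')
        self.core = tf.keras.layers.LSTM(256, return_state=True)

    def step(self, scalar_obs, spatial_obs, entity_obs, core_state):
        # Each encoder calculates its output from the preprocessed observation.
        scalar_out = self.scalar_encoder(scalar_obs)
        spatial_out = self.spatial_encoder(spatial_obs)
        entity_out = self.entity_encoder(entity_obs)

        # Concatenate the encoder outputs and pass them through the LSTM
        # core; the core output then goes to the action heads (not shown).
        core_in = tf.concat([scalar_out, spatial_out, entity_out], axis=-1)
        core_in = tf.expand_dims(core_in, axis=1)  # (batch, time=1, features)
        core_out, state_h, state_c = self.core(core_in, initial_state=core_state)
        return core_out, [state_h, state_c]
```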

Scalar Encoder

The scalar encoder needs the agent_statistics, race, upgrades, and game time information. First, agent_statistics is obtained from the score_by_category value, a two-dimensional array included in the PySC2 observation. That array is flattened into a one-dimensional array, and a logarithm is applied after adding 1 to each element. For the race value, a number from 0 to 3 is determined by the race of each player, Protoss(0), Terran(1), Zerg(2), or Unknown(3), and encoded as a one-dimensional one-hot array with a length of 5. For the upgrades value, a one-dimensional array of zeros, with one entry for every upgrade that exists in PySC2, is created first, and an entry is changed to 1 when the corresponding upgrade appears during the game. A list of all upgrades can be found at https://github.com/deepmind/pysc2/blob/master/pysc2/lib/upgrades.py. The last value is the duration of the game, which usually ranges from 0 to 16000 and is encoded by applying the positional encoding used in the Transformer. More details can be found at https://www.tensorflow.org/tutorials/text/transformer#positional_encoding. A preprocessing sketch is shown after the input size below.

The information below is the exact size of the input.

  1. agent_statistics.shape: (55,)
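Below is a minimal sketch of the scalar preprocessing described above. The function name preprocess_scalar, the arguments home_race_id and game_loop, and the 64-dimensional time encoding are my own illustrative assumptions, not the exact code of the repository.

```python
import numpy as np
from pysc2.lib import upgrades as sc2_upgrades

def preprocess_scalar(obs, home_race_id, game_loop):
    # agent_statistics: flatten the (11, 5) score_by_category array and
    # apply log(x + 1), giving a vector of length 55.
    agent_statistics = np.log(
        np.array(obs.observation.score_by_category).flatten() + 1.0)

    # race: one-hot of length 5 (Protoss=0, Terran=1, Zerg=2, Unknown=3).
    race = np.zeros(5, dtype=np.float32)
    race[home_race_id] = 1.0

    # upgrades: a vector of zeros over every upgrade defined in pysc2; an
    # entry becomes 1 when that upgrade appears during the game.
    upgrade_ids = [u.value for u in sc2_upgrades.Upgrades]
    upgrades = np.zeros(len(upgrade_ids), dtype=np.float32)
    for u in obs.observation.upgrades:
        if u in upgrade_ids:
            upgrades[upgrade_ids.index(u)] = 1.0

    # time: encode the game loop (roughly 0 to 16000) with the Transformer
    # positional-encoding formula; 64 dimensions is an arbitrary choice here.
    dims = np.arange(64)
    angles = game_loop / np.power(10000, (2 * (dims // 2)) / 64)
    time = np.where(dims % 2 == 0, np.sin(angles), np.cos(angles))

    return agent_statistics, race, upgrades, time
```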

Spatial Encoder

The spatial encoder uses the screen image of the game as input, so there is no need for major preprocessing. A sketch of such an encoder is shown after the input size below.

The information below is the exact size of the input.

  1. feature_screen.shape: (27, 128, 128)
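The sketch below assumes a simple stack of strided convolutions; the layer sizes are illustrative choices of my own (the AlphaStar paper uses a ResNet here), and the transpose handles the channels-first layout of the PySC2 screen.

```python
import tensorflow as tf

class SpatialEncoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Three strided convolutions downsample the 128x128 screen; the
        # filter counts are illustrative assumptions.
        self.convs = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 4, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(64, 4, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(128, 4, strides=2, padding='same', activation='relu'),
        ])
        self.flatten = tf.keras.layers.Flatten()
        self.fc = tf.keras.layers.Dense(256, activation='relu')

    def call(self, feature_screen):
        # PySC2 returns the screen channels-first: (batch, 27, 128, 128).
        # Keras Conv2D expects channels-last, so transpose first.
        x = tf.transpose(feature_screen, perm=[0, 2, 3, 1])
        x = self.convs(x)
        return self.fc(self.flatten(x))
```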

Entity Encoder

The entity encoder is responsible for the details of each unit and building that exists on the screen. That information consists of unit_type, current_health, current_shields, current_energy, x_position, y_position, assigned_harvesters, ideal_harvesters, weapon_cooldown, weapon_upgrades, armor_upgrades, shield_upgrades, and is_selected, which are the fields of the paper that I could understand clearly. Those values are converted into one-hot or boolean arrays and then combined into a two-dimensional array; see the sketch after the input size below.

The information below is the exact size of the input.

  1. embedded_feature_units.shape: (512, 464)
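Below is a sketch of the idea with a few representative fields only. The one-hot depths and the health scaling are illustrative assumptions; the full field list in network.py is what produces rows of length 464.

```python
import numpy as np

def one_hot(index, depth):
    vec = np.zeros(depth, dtype=np.float32)
    vec[int(index) % depth] = 1.0
    return vec

def preprocess_entities(feature_units, max_entities=512):
    # One row per unit, built from one-hot and boolean fields. Only a few
    # representative fields are shown; the depths are illustrative.
    rows = []
    for unit in feature_units[:max_entities]:
        row = np.concatenate([
            one_hot(unit.unit_type, 256),  # unit_type (depth is an assumption)
            one_hot(unit.x, 128),          # x_position on the screen
            one_hot(unit.y, 128),          # y_position on the screen
            [unit.health / 1500.0],        # current_health, scaled down
            [float(unit.is_selected)],     # is_selected as a boolean flag
        ])
        rows.append(row)
    # Pad with all-zero rows up to the fixed entity count.
    row_length = 256 + 128 + 128 + 2
    while len(rows) < max_entities:
        rows.append(np.zeros(row_length, dtype=np.float32))
    return np.stack(rows)  # shape: (max_entities, row_length)
```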

Core Network

The observations preprocessed by the methods above pass through the encoders of the agent and are concatenated together. Finally, they pass through the core network, which consists of an LSTM, and become the reference value the agent uses to select an action.
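As a usage sketch, this flow can be exercised end to end with dummy tensors, reusing the illustrative AlphaStarAgent class from the beginning of this post (its encoders are Dense placeholders, so every observation is passed in flattened).

```python
import tensorflow as tf

agent = AlphaStarAgent()

scalar_obs = tf.zeros((1, 55))               # agent_statistics
spatial_obs = tf.zeros((1, 27 * 128 * 128))  # feature_screen, flattened
entity_obs = tf.zeros((1, 512 * 464))        # embedded_feature_units, flattened
state = [tf.zeros((1, 256)), tf.zeros((1, 256))]  # initial LSTM state

core_out, state = agent.step(scalar_obs, spatial_obs, entity_obs, state)
print(core_out.shape)  # (1, 256): the value forwarded to the action heads
```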

Conclusion

In this post, I investigated the process of acquiring and processing the game information of the AlphaStar agent. In the next post, let’s use this information to select actions for playing the game.

Written by

I am a Deep Reinforcement Learning researcher from South Korea. My final goal is to build an AI robot that can cook and clean for me using Deep Learning.
