In the previous posts, we looked at how to extract human play data from replay files and how to implement the agent and network of AlphaStar. With these three pieces ready, we can now proceed to the training process of the paper. The first step is to train the agent's network through Supervised Learning, using the data extracted from replays.
Unlike the inference process used in the step function of the agent, training requires feeding multiple samples at once, grouped into batches. Therefore, the network must be able to handle a variable batch size.
When using the reshape function in the Core created in the previous post, the size of that dimension is set automatically according to the batch size if you put -1 in the size argument of the reshape function. The model can then be used regardless of the batch size if the same method is applied to the reshape calls of the encoders and heads. …
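As a minimal sketch of this trick (assuming TensorFlow; the feature sizes here are illustrative, not the exact ones from the repository), passing -1 as the batch dimension lets the same reshape work for a single observation at inference time and for a full batch at training time:

```python
import tensorflow as tf

# Illustrative spatial feature size: 16 x 16 with 32 channels = 8192 features.
def flatten_spatial(x):
    # -1 tells TensorFlow to infer the batch dimension at run time,
    # so the same code works for batch size 1 (inference) or N (training).
    return tf.reshape(x, [-1, 16 * 16 * 32])

single = tf.zeros([1, 16, 16, 32])   # one observation, as in the step function
batch = tf.zeros([64, 16, 16, 32])   # a training batch

print(flatten_spatial(single).shape)  # (1, 8192)
print(flatten_spatial(batch).shape)   # (64, 8192)
```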
In the last post, we checked how to implement the encoder part that handles the state for Deep Reinforcement Learning. In this article, I am going to describe the head part for the action.
The code for the head network can be found at https://github.com/kimbring2/AlphaStar_Implementation/blob/master/network.py. After completing the whole network, we can train it using replay data.
Like the encoder network, the ideal structure is to declare the heads in the agent class and use them in the step function of that class. The state information combined by the encoder network goes to the head network via the core network.
Let’s see how the AlphaStar agent selects an action by referring to the code above. First, the agent selects what to do next from the Action Type Head. For example, _BUILD_SUPPLY_DEPOT, _BUILD_BARRACKS, _BUILD_REFINERY, _TRAIN_MARINE, _TRAIN_MARAUDER, _ATTACK_MINIMAP, and _BUILD_TECHLAB could be the list of action types for a simple Terran agent. Second, the Selected Units Head decides which unit will execute that action type. If the action is _BUILD_SUPPLY_DEPOT, it would be desirable for the network to select one of the SCVs. …
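The two-stage selection above can be sketched as follows. This is a hypothetical, heavily simplified version of the heads (the layer sizes, names, and the use of plain Dense layers are my assumptions, not the repository's actual architecture); the key point is that the Selected Units Head is conditioned on the action type chosen by the Action Type Head:

```python
import tensorflow as tf

# Hypothetical sizes for a simple Terran agent.
num_actions = 7     # e.g. _BUILD_SUPPLY_DEPOT, _TRAIN_MARINE, ...
max_entities = 10   # number of candidate units (e.g. SCVs)

action_type_layer = tf.keras.layers.Dense(num_actions)
selected_units_layer = tf.keras.layers.Dense(max_entities)

def select_action(core_output):
    # 1. Action Type Head: decide what to do next.
    action_logits = action_type_layer(core_output)
    action_type = tf.argmax(action_logits, axis=-1)

    # 2. Selected Units Head: decide which unit executes that action,
    #    conditioned on the action type chosen in step 1.
    one_hot = tf.one_hot(action_type, num_actions)
    unit_logits = selected_units_layer(
        tf.concat([core_output, one_hot], axis=-1))
    selected_unit = tf.argmax(unit_logits, axis=-1)
    return action_type, selected_unit

core_output = tf.random.normal([1, 128])  # output of the core network
action_type, selected_unit = select_action(core_output)
```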
In the past series, we looked at how to implement the overall training structure of AlphaStar in code, except for the Neural Network part.
In this series, let’s implement the Neural Network in code, which is a prerequisite for both Reinforcement Learning and Supervised Learning of the agent.
Let’s take a look at how to implement the three encoders using TensorFlow and how to produce their inputs by processing the observations of PySC2.
The code for each network can be found at https://github.com/kimbring2/AlphaStar_Implementation/blob/master/network.py. Due to post-length limits, this post only includes the code for observation preprocessing.
First, let’s take a quick look at the class structure of the AlphaStar agent. The scalar encoder, spatial encoder, entity encoder, and core network are declared in the init function. Next, in the step function, each encoder calculates its output value from the processed observation. After that, the core network sends these values to the action heads after concatenating them. …
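The structure just described can be sketched like this. It is only an outline under my own assumptions: the encoder internals are placeholder Dense layers (the real spatial encoder uses convolutions and the real core is an LSTM), and the input sizes are invented for illustration:

```python
import tensorflow as tf

class AlphaStarAgent(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Encoders and core are declared once in the init function.
        # Placeholder layers; the repository's actual layers differ.
        self.scalar_encoder = tf.keras.layers.Dense(32)
        self.entity_encoder = tf.keras.layers.Dense(32)
        self.spatial_encoder = tf.keras.layers.Dense(32)
        self.core = tf.keras.layers.Dense(128)  # the real core is an LSTM

    def step(self, scalar_obs, entity_obs, spatial_obs):
        # Each encoder processes its part of the observation...
        scalar_out = self.scalar_encoder(scalar_obs)
        entity_out = self.entity_encoder(entity_obs)
        spatial_out = self.spatial_encoder(spatial_obs)
        # ...then the concatenated result goes through the core
        # on its way to the action heads.
        combined = tf.concat([scalar_out, entity_out, spatial_out], axis=-1)
        return self.core(combined)

agent = AlphaStarAgent()
out = agent.step(tf.zeros([1, 10]), tf.zeros([1, 20]), tf.zeros([1, 30]))
```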
The main file in AlphaStar is alphastar.py. After completing the entire code, you can run the AlphaStar program using the ‘python alphastar.py’ command. This file contains four Python classes: SC2Environment, Coordinator, ActorLoop, and Learner. In this post, let's take a closer look at the role of each class.
The SC2Environment class has the role of adapting the environment input and output of PySC2 to the AlphaStar format, and it consists of step and reset functions. Furthermore, AlphaStar uses the Self-Play method for training, so the states of the agent itself and the enemy agent are obtained from the observation of each player, respectively. Likewise, the actions of both agents are passed to the step function in the form [home_action, away_action]. Finally, detecting the end of an episode works differently in PySC2 than in OpenAI Gym: at the end of an episode, each agent's observation carries StepType.LAST. Additionally, you can check StepType.FIRST at the beginning and StepType.MID in the middle of an episode. …
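The episode-boundary check above can be sketched with a minimal stand-in environment. The StepType/TimeStep stub below mirrors PySC2's `pysc2.env.environment` interface (a stub so the loop runs without StarCraft II installed); the episode length and observation values are invented for illustration:

```python
import collections
from enum import Enum

# Stub mirroring PySC2's StepType (pysc2.env.environment).
class StepType(Enum):
    FIRST = 0
    MID = 1
    LAST = 2

TimeStep = collections.namedtuple("TimeStep", ["step_type", "observation"])

class FakeSC2Env:
    """Two-player stub: returns one TimeStep per agent, episode length 3."""
    def reset(self):
        self._t = 0
        return [TimeStep(StepType.FIRST, obs) for obs in ("home", "away")]

    def step(self, actions):  # actions == [home_action, away_action]
        self._t += 1
        kind = StepType.LAST if self._t >= 3 else StepType.MID
        return [TimeStep(kind, obs) for obs in ("home", "away")]

def run_episode(env):
    timesteps = env.reset()
    steps = 0
    # Unlike OpenAI Gym's `done` flag, the episode end is read
    # from the step_type of the returned observations.
    while timesteps[0].step_type != StepType.LAST:
        timesteps = env.step(["home_action", "away_action"])
        steps += 1
    return steps

print(run_episode(FakeSC2Env()))  # 3
```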
In 2019, a competition (http://minerl.io/competition/) was held for Minecraft, a famous game, in which participants used human gameplay data for training a Deep Reinforcement Learning agent.
I participated at the time, but my result was not great because I was not familiar with methods for applying human demonstration data to Deep Reinforcement Learning.
The final winner of the contest used a method called Hierarchical Deep Q-Network from Imperfect Demonstrations. They published the method in a paper (https://arxiv.org/pdf/1912.08664v2.pdf). However, it seems that the source code has not been released yet.
Therefore, I set a goal to implement the winner's paper, and I will record my work here. …
I recently made a simple Terran agent with a Rule-based system using PySC2 of DeepMind. Up to the Marauder, I could use the same method, but the program felt too complicated once I tried to control higher-tech units.
Because of that problem, I want to use a Deep Learning method instead of the Rule-based method. Thus, I decided to read the AlphaStar paper, which shows the best performance in the Starcraft 2 area.
While reading the AlphaStar paper published by DeepMind, I felt that there was not enough reference material for implementing the contents of the paper. …