
Introduction
The main file of this AlphaStar implementation is alphastar.py. After completing the entire code, you can run the AlphaStar program with the ‘python alphastar.py’ command. This file contains four Python classes: SC2Environment, Coordinator, ActorLoop, and Learner. In this post, let's take a closer look at the role of each class.
SC2Environment class
The SC2Environment class adapts the input and output of the PySC2 environment to the format AlphaStar expects, and consists of step and reset functions. Because AlphaStar uses self-play during training, the states of the home agent and the enemy agent can be obtained from observation[0] and observation[1], respectively. Likewise, the actions of both agents are passed to the step function as a list of the form [home_action, away_action]. Finally, detecting the end of an episode is a peculiarity of PySC2 that differs from OpenAI Gym: when an episode ends, observation[0][0] and observation[1][0] return StepType.LAST. Similarly, you can check for StepType.FIRST at the beginning and StepType.MID in the middle of an episode. I will omit an explanation of the reset function, since its structure is the same as that of the step function.
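Below is a minimal sketch of such a wrapper. The Simple64 map, the two Terran agents, and the 32x32 feature layers are placeholder choices of mine, so the repository's actual constructor arguments may differ.

```python
from pysc2.env import sc2_env
from pysc2.env.environment import StepType
from pysc2.lib import features


class SC2Environment:
  def __init__(self):
    # Two Agent players make PySC2 run the match in self-play mode.
    self._env = sc2_env.SC2Env(
        map_name='Simple64',
        players=[sc2_env.Agent(sc2_env.Race.terran),
                 sc2_env.Agent(sc2_env.Race.terran)],
        agent_interface_format=features.AgentInterfaceFormat(
            feature_dimensions=features.Dimensions(screen=32, minimap=32)))

  def step(self, actions):
    # actions is [home_action, away_action]; PySC2 returns one TimeStep
    # per player: observation[0] for home, observation[1] for away.
    observation = self._env.step(actions)
    # The first field of a TimeStep is its StepType, so observation[0][0]
    # equals StepType.FIRST, StepType.MID, or StepType.LAST.
    done = observation[0][0] == StepType.LAST
    return observation, done

  def reset(self):
    return self._env.reset()
```

For example, stepping both agents with a no-op would look like env.step([actions.FUNCTIONS.no_op(), actions.FUNCTIONS.no_op()]) after importing actions from pysc2.lib.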
Coordinator class
If you look at the main function of the alphastar.py file, you can see that the League class is imported from another Python file. The Coordinator class of this section records the wins and losses between the players belonging to this League class. In addition, it decides which players will be matched in the next round. This part seems to have been carried over directly from the original pseudocode, so nothing needs to be written from scratch. You do not have to add code for this class yet.
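For reference, a sketch along the lines of the DeepMind pseudocode looks like the following; it assumes the League class exposes update and add_player methods, and that players expose ready_to_checkpoint and checkpoint.

```python
class Coordinator:
  """Records match results in the league and checkpoints players."""

  def __init__(self, league):
    self.league = league

  def send_outcome(self, home_player, away_player, outcome):
    # Update the league's payoff record with the finished match.
    self.league.update(home_player, away_player, outcome)
    # Freeze a copy of the home player as a new league member once
    # it has trained long enough.
    if home_player.ready_to_checkpoint():
      self.league.add_player(home_player.checkpoint())
```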
ActorLoop class
The ActorLoop class runs matches between players on the SC2Environment declared earlier, determining each agent's action from the current state. Furthermore, it collects the trajectory data of each agent and sends it to the Learner class.
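A simplified sketch of what such a loop might do is shown below. The get_match, agent.step, and send_trajectory methods are illustrative names of mine, not the repository's exact API.

```python
class ActorLoop:
  def __init__(self, player, environment, learner):
    self.player = player
    self.environment = environment
    self.learner = learner

  def run(self):
    while True:
      # Ask the league for an opponent for the next match.
      opponent = self.player.get_match()
      observation = self.environment.reset()
      done = False
      trajectory = []
      while not done:
        # Each agent picks an action from its own observation.
        home_action = self.player.agent.step(observation[0])
        away_action = opponent.agent.step(observation[1])
        observation, done = self.environment.step(
            [home_action, away_action])
        # Store the home agent's experience for training.
        trajectory.append((observation[0], home_action))
      # Hand the finished episode to the Learner.
      self.learner.send_trajectory(trajectory)
```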
Learner class
As with the Coordinator class, I do not add code for the Learner class yet. I am going to add TensorFlow code after analyzing the rl.py file, which includes the loss functions for training.
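Even so, its eventual shape can be sketched as a placeholder that simply queues the trajectories arriving from the ActorLoop; the actual update step is left empty until rl.py is analyzed.

```python
import collections


class Learner:
  def __init__(self, player):
    self.player = player
    self.trajectories = collections.deque()

  def send_trajectory(self, trajectory):
    # Called by the ActorLoop after every finished episode.
    self.trajectories.append(trajectory)

  def update_parameters(self):
    # The TensorFlow losses defined in rl.py will be computed and
    # applied here in a later post; for now, just consume the queue.
    if self.trajectories:
      self.trajectories.popleft()
```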
Conclusion
When you run alphastar.py after downloading all the files from https://github.com/kimbring2/AlphaStar_Implementation/tree/master/pseudocode, you can confirm that the PySC2 environment is created and the code runs without errors, even though no training is done yet.
In the next post of this series, I am going to write about the training method of the supervised agent, which uses information from replays.