AlphaStar implementation series — Network
Unlike the Atari games commonly used in Deep Learning research, Starcraft 2 has many kinds of states and actions. For that reason, a Starcraft 2 agent needs multiple Neural Networks, whereas an Atari agent can get by with a single Encoder network for the state and a single Head network for the action.
Network Architecture
There are a total of 9 states in Starcraft 2 that I use: feature_screen, feature_minimap, player, game_loop, available_actions, build_queue, single_select, multi_select, and score_cumulative. An action consists of up to 14 elements: the action type plus the screen, minimap, screen2, queued, control_group_act, control_group_id, select_point_act, select_add, select_unit_act, select_unit_id, select_worker, build_queue_id, and unload_id arguments; which arguments are required is determined by the action type. One more network, called the baseline, is added for Reinforcement Learning.
The Encoder networks and Head networks are connected through the Core network. To decide an action for the current state, the action type Head network first receives the output of the Core network. Information about the selected action type is then delivered to the Head networks of the arguments; this scheme is called an auto-regressive policy. Finally, the screen and minimap states are connected directly to the Head networks of the screen, minimap, and screen2 arguments.
When implementing this as code, a spatial encoder network is used for the feature_screen and feature_minimap states.
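Below is a minimal sketch of such a spatial encoder in TensorFlow 2. The class name, layer sizes, and strides are illustrative assumptions, not the exact values used in the repository.

```python
import tensorflow as tf

class SpatialEncoder(tf.keras.Model):
    """Encodes 2D feature layers (feature_screen or feature_minimap) into a spatial embedding."""
    def __init__(self, filters=32):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters, 4, strides=2, padding='same', activation='relu')
        self.conv2 = tf.keras.layers.Conv2D(filters * 2, 3, strides=1, padding='same', activation='relu')

    def call(self, spatial_obs):
        # spatial_obs: (batch, height, width, channels)
        x = self.conv1(spatial_obs)
        return self.conv2(x)
```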
A scalar encoder network is used for the player, game_loop, available_actions, build_queue, single_select, multi_select, and score_cumulative states.
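The scalar encoder can be as simple as a fully connected layer; a sketch, assuming the scalar states have already been concatenated into one flat vector per batch element:

```python
import tensorflow as tf

class ScalarEncoder(tf.keras.Model):
    """Encodes 1D observations (player, game_loop, available_actions, ...) into a flat embedding."""
    def __init__(self, output_dim=64):
        super().__init__()
        self.dense = tf.keras.layers.Dense(output_dim, activation='relu')

    def call(self, scalar_obs):
        # scalar_obs: (batch, feature_dim), the concatenation of the scalar states
        return self.dense(scalar_obs)
```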
An LSTM is used for the Core network so that the agent can recognize sequential data.
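A sketch of the Core, again with assumed sizes: it flattens the spatial embedding, joins it with the scalar embedding, and runs one LSTM step while returning the updated memory.

```python
import tensorflow as tf

class Core(tf.keras.Model):
    """LSTM core that fuses the encoder outputs and keeps memory across time steps."""
    def __init__(self, units=256):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)

    def call(self, spatial_embedding, scalar_embedding, memory_state, carry_state):
        # Flatten the spatial embedding and join it with the scalar embedding.
        x = tf.concat([self.flatten(spatial_embedding), scalar_embedding], axis=-1)
        x = tf.expand_dims(x, axis=1)  # add a time dimension of length 1
        core_output, memory_state, carry_state = self.lstm(
            x, initial_state=[memory_state, carry_state])
        return tf.squeeze(core_output, axis=1), memory_state, carry_state
```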
The Head network for the action type returns an additional embedding that is passed to the argument Head networks.
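A sketch of the action type head. Here the autoregressive embedding is derived from the core output and the action type logits, which is a simplification; the AlphaStar paper conditions the embedding on the sampled action instead.

```python
import tensorflow as tf

class ActionTypeHead(tf.keras.Model):
    """Outputs action type logits plus an embedding that is passed on to the argument heads."""
    def __init__(self, num_action_types, embedding_dim=256):
        super().__init__()
        self.logits = tf.keras.layers.Dense(num_action_types)
        self.embed = tf.keras.layers.Dense(embedding_dim, activation='relu')

    def call(self, core_output):
        action_type_logits = self.logits(core_output)
        # Autoregressive embedding: conditions the argument heads on the chosen action type.
        autoregressive_embedding = self.embed(
            tf.concat([core_output, action_type_logits], axis=-1))
        return action_type_logits, autoregressive_embedding
```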
The Head networks for the screen, minimap, and screen2 arguments receive values from the Core and action type Head networks. In addition, the screen and minimap states are fed in directly as a skip connection.
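A sketch of a spatial argument head with that skip connection. For brevity, the logits are produced at the encoder's downsampled resolution; a more faithful implementation would upsample back to the full screen size.

```python
import tensorflow as tf

class SpatialArgumentHead(tf.keras.Model):
    """Predicts a screen/minimap coordinate, with a skip connection from the spatial encoding."""
    def __init__(self):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(1, 1, padding='same')

    def call(self, spatial_embedding, core_output, autoregressive_embedding):
        # Broadcast the flat vectors over every spatial position, then score each position.
        height = tf.shape(spatial_embedding)[1]
        width = tf.shape(spatial_embedding)[2]
        flat = tf.concat([core_output, autoregressive_embedding], axis=-1)
        flat = tf.tile(flat[:, tf.newaxis, tf.newaxis, :], tf.stack([1, height, width, 1]))
        x = tf.concat([spatial_embedding, flat], axis=-1)
        logits = self.conv(x)  # (batch, height, width, 1)
        return tf.reshape(logits, [tf.shape(logits)[0], -1])  # flatten to per-pixel logits
```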
The Head networks for the remaining arguments, other than screen, minimap, and screen2, receive values only from the Core and action type Head networks.
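These non-spatial argument heads are then just small classifiers over the concatenated core output and autoregressive embedding; a sketch:

```python
import tensorflow as tf

class ArgumentHead(tf.keras.Model):
    """Predicts one non-spatial argument (queued, control_group_id, select_point_act, ...)."""
    def __init__(self, num_choices):
        super().__init__()
        self.dense = tf.keras.layers.Dense(num_choices)

    def call(self, core_output, autoregressive_embedding):
        return self.dense(tf.concat([core_output, autoregressive_embedding], axis=-1))
```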
The Baseline network used for Reinforcement Learning receives its value from the Core network.
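A sketch of the baseline, here a simple value head on top of the core output; the layer sizes are assumptions.

```python
import tensorflow as tf

class Baseline(tf.keras.Model):
    """Value-function head used by the Reinforcement Learning algorithm as a critic."""
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation='relu')
        self.value = tf.keras.layers.Dense(1)

    def call(self, core_output):
        return tf.squeeze(self.value(self.hidden(core_output)), axis=-1)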
Then, we can create the entire network using the spatial encoder, scalar encoder, core, action type head, argument head, and baseline networks listed above.
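A sketch of how the pieces could be wired together, reusing the classes sketched above. Only a couple of argument heads are shown, and the screen and minimap are assumed to share the same resolution; the real network has one head per argument.

```python
import tensorflow as tf

class AlphaStarLikeModel(tf.keras.Model):
    """Wires the encoders, core, heads, and baseline together; a simplified sketch."""
    def __init__(self, num_action_types, num_queued=2):
        super().__init__()
        self.screen_encoder = SpatialEncoder()
        self.minimap_encoder = SpatialEncoder()
        self.scalar_encoder = ScalarEncoder()
        self.core = Core()
        self.action_type_head = ActionTypeHead(num_action_types)
        self.screen_head = SpatialArgumentHead()
        self.queued_head = ArgumentHead(num_queued)
        self.baseline = Baseline()

    def call(self, feature_screen, feature_minimap, scalar_obs, memory_state, carry_state):
        screen_embedding = self.screen_encoder(feature_screen)
        minimap_embedding = self.minimap_encoder(feature_minimap)
        scalar_embedding = self.scalar_encoder(scalar_obs)

        # Join the two spatial embeddings along the channel axis before the core.
        spatial_embedding = tf.concat([screen_embedding, minimap_embedding], axis=-1)
        core_output, memory_state, carry_state = self.core(
            spatial_embedding, scalar_embedding, memory_state, carry_state)

        action_type_logits, autoregressive_embedding = self.action_type_head(core_output)
        screen_logits = self.screen_head(screen_embedding, core_output, autoregressive_embedding)
        queued_logits = self.queued_head(core_output, autoregressive_embedding)
        value = self.baseline(core_output)

        return (action_type_logits, screen_logits, queued_logits, value,
                memory_state, carry_state)
```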
The memory state of the LSTM needs to be managed separately, as both an input and an output of the network.
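A short usage sketch of that bookkeeping: the LSTM state is initialized to zeros, passed into the model at every step, and the updated state returned by the model is fed back in on the next step. All shapes and counts here are hypothetical placeholders.

```python
import numpy as np
import tensorflow as tf

model = AlphaStarLikeModel(num_action_types=573)  # assumed number of action types
memory_state = tf.zeros([1, 256])  # LSTM hidden state, matching the Core's unit count
carry_state = tf.zeros([1, 256])   # LSTM cell state

for step in range(8):
    # Placeholder observations; in practice these come from the pysc2 environment.
    feature_screen = np.zeros([1, 32, 32, 13], dtype=np.float32)
    feature_minimap = np.zeros([1, 32, 32, 7], dtype=np.float32)
    scalar_obs = np.zeros([1, 11], dtype=np.float32)

    outputs = model(feature_screen, feature_minimap, scalar_obs, memory_state, carry_state)
    # The updated LSTM state comes back as part of the output and must be fed in again.
    memory_state, carry_state = outputs[-2], outputs[-1]
```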
You can see the full code for this post at https://github.com/kimbring2/AlphaStar_Implementation/blob/master/network.py.
Conclusion
In this post, we explored how to create a network that can handle the complex states and actions of Starcraft 2. In the next post, we are going to train this network using Reinforcement Learning.