AlphaStar implementation series — Network

2 min read · Nov 15, 2020

Starcraft 2 has many kinds of states and actions, unlike the Atari games commonly used in Deep Learning research. For that reason, a Starcraft 2 agent needs multiple neural networks, whereas an Atari agent needs only one Encoder network for the state and one Head network for the action.

Network Architecture

I use a total of 9 states from Starcraft 2: feature_screen, feature_minimap, player, game_loop, available_actions, build_queue, single_select, multi_select, and score_cumulative. The action consists of 14 elements: the action type, plus the screen, minimap, screen2, queued, control_group_act, control_group_id, select_point_act, select_add, select_unit_act, select_unit_id, select_worker, build_queue_id, and unload_id arguments. Which arguments are required depends on the action type. Another network, called the baseline, is added for use in Reinforcement Learning.
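To keep these names straight, the inventory above can be written down as plain Python data. This is only a bookkeeping sketch: the constant names (`STATE_NAMES`, `ARGUMENT_NAMES`, and so on) are hypothetical and do not come from the original code, and splitting the states by the `feature_` prefix is simply a convenient way to separate the two spatial states from the rest.

```python
# Observation components ("states") listed in the text.
STATE_NAMES = [
    "feature_screen", "feature_minimap", "player", "game_loop",
    "available_actions", "build_queue", "single_select",
    "multi_select", "score_cumulative",
]

# Argument names listed in the text; which ones apply depends on the action type.
ARGUMENT_NAMES = [
    "screen", "minimap", "screen2", "queued", "control_group_act",
    "control_group_id", "select_point_act", "select_add",
    "select_unit_act", "select_unit_id", "select_worker",
    "build_queue_id", "unload_id",
]

# A full action is the action type plus its arguments.
ACTION_ELEMENTS = ["action_type"] + ARGUMENT_NAMES

# The two feature layers are spatial; everything else is treated as scalar.
SPATIAL_STATES = [name for name in STATE_NAMES if name.startswith("feature_")]
SCALAR_STATES = [name for name in STATE_NAMES if name not in SPATIAL_STATES]

print(len(STATE_NAMES), "states,", len(ACTION_ELEMENTS), "action elements")
```

Counting the lists is a quick check that nothing from the 9 states or 14 action elements was dropped.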

The Encoder networks and Head networks are connected through the Core network. To decide an action according to the state, the Action Type network first receives the output of the Core network. Information about the selected action type is then delivered to the other networks, which produce the arguments. This method is called an auto-regressive policy. Finally, the screen and minimap states are connected directly to the networks for the screen, minimap, and screen2 arguments.
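The auto-regressive step can be illustrated with a toy sampler in plain Python. It is a sketch under loud assumptions: the `REQUIRED_ARGS` table is a small illustrative subset, the helper functions are hypothetical, and fixed numbers stand in for the outputs of real Head networks.

```python
import math
import random

# Illustrative subset: which arguments each action type needs (assumed for this sketch).
REQUIRED_ARGS = {
    "no_op": [],
    "select_point": ["select_point_act", "screen"],
    "Move_screen": ["queued", "screen"],
}

def softmax(logits):
    """Turn raw scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, rng):
    """Draw one index from the categorical distribution defined by logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_action(core_output, rng):
    """Pick the action type first, then only the arguments it requires."""
    action_names = list(REQUIRED_ARGS)
    # Stand-in for the action type head: the core output is used directly as logits.
    type_index = sample(core_output, rng)
    action_type = action_names[type_index]

    # Stand-in for the embedding passed on to the argument heads: a one-hot vector.
    embedding = [1.0 if i == type_index else 0.0 for i in range(len(action_names))]

    arguments = {}
    for name in REQUIRED_ARGS[action_type]:
        # Each argument head sees the core output concatenated with the embedding;
        # here that concatenated vector is used directly as logits.
        head_input = core_output + embedding
        arguments[name] = sample(head_input, rng)
    return action_type, arguments

rng = random.Random(0)
action_type, arguments = sample_action([0.1, 2.0, 0.5], rng)
print(action_type, arguments)
```

The point of the ordering is that the argument heads are conditioned on the action type that was already chosen, so arguments that the chosen action does not need are never sampled at all.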

When implementing this in code, a spatial encoder network is used for the feature_screen and feature_minimap states.

Encoder for spatial feature

A scalar encoder network is used for the player, game_loop, available_actions, build_queue, single_select, multi_select, and score_cumulative states.

Encoder for scalar feature

An LSTM is used for the Core network so that it can handle sequential data.

Core network to connect between encoder and head

The Head network for the action type returns an additional embedding, which is passed to the argument Head networks.

Head network for action type

The Head networks for the screen, minimap, and screen2 arguments receive values from the Core network and the action type Head network. In addition, the screen and minimap states are fed into them directly.

Head network for spatial argument

The Head networks for the remaining arguments, that is, all except screen, minimap, and screen2, receive values from the Core network and the action type Head network.

Head network for scalar argument

The Baseline network used in Reinforcement Learning receives the output of the Core network.

Baseline network for Reinforcement Learning

Then, we can create the entire network from the spatial encoder, scalar encoder, core, action type head, argument heads, and baseline network listed above.

Main network of AlphaStar
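To make the wiring concrete, here is a rough data-flow sketch in plain Python. It is not the implementation from the repository: every function is a hypothetical stand-in that uses simple arithmetic instead of a neural network, and only three of the nine states and a few of the heads are wired up to keep it short. What it does show is which component receives which input.

```python
def spatial_encoder(feature_map):
    """Stand-in for the spatial encoder: summarize a 2D map as one number."""
    cells = [float(v) for row in feature_map for v in row]
    return [sum(cells) / len(cells)]

def scalar_encoder(values):
    """Stand-in for the scalar encoder: summarize a vector as one number."""
    return [sum(float(v) for v in values) / len(values)]

def core(encoded, memory):
    """Stand-in for the LSTM core: blend the new input into a running memory."""
    next_memory = [0.5 * m + 0.5 * e for m, e in zip(memory, encoded)]
    return next_memory, next_memory  # (core output, next memory)

def action_type_head(core_output, num_actions=3):
    """Return action type scores plus an embedding for the argument heads."""
    total = sum(core_output)
    logits = [total * (i + 1) for i in range(num_actions)]
    embedding = [float(logits.index(max(logits)))]
    return logits, embedding

def scalar_argument_head(core_output, embedding):
    """Scalar argument heads see the core output and the action type embedding."""
    return sum(core_output) + sum(embedding)

def spatial_argument_head(core_output, embedding, feature_map):
    """Spatial argument heads additionally see the raw screen or minimap state."""
    offset = sum(core_output) + sum(embedding)
    return [[float(v) + offset for v in row] for row in feature_map]

def baseline(core_output):
    """Value estimate computed from the core output only."""
    return sum(core_output)

def forward(observation, memory):
    """Encoders -> core -> action type head -> argument heads and baseline."""
    screen = observation["feature_screen"]
    minimap = observation["feature_minimap"]
    encoded = (spatial_encoder(screen) + spatial_encoder(minimap)
               + scalar_encoder(observation["player"]))
    core_output, next_memory = core(encoded, memory)
    logits, embedding = action_type_head(core_output)
    outputs = {
        "action_type": logits,
        "screen": spatial_argument_head(core_output, embedding, screen),
        "minimap": spatial_argument_head(core_output, embedding, minimap),
        "queued": scalar_argument_head(core_output, embedding),
        "baseline": baseline(core_output),
    }
    return outputs, next_memory

observation = {
    "feature_screen": [[0, 1], [1, 0]],
    "feature_minimap": [[1, 1], [0, 0]],
    "player": [2, 4],
}
outputs, memory = forward(observation, memory=[0.0, 0.0, 0.0])
print(sorted(outputs), memory)
```

Note that `forward` returns the updated memory alongside the outputs, so the caller decides when to carry it forward and when to reset it.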

The state memory of the LSTM needs to be managed separately, as both an input and an output of the network.
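A minimal sketch of that pattern, using a toy single-unit LSTM written in plain Python. The shared `weight`, the function names, and the choice to reset the memory to zeros at the start of an episode are all assumptions made for illustration, not details taken from the original code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, state, weight=0.5):
    """One step of a toy LSTM with a single unit.

    `state` is the pair (hidden, cell). The updated pair is returned so the
    caller can feed it back in on the next step.
    """
    hidden, cell = state
    pre = weight * (x + hidden)   # toy choice: all gates share one pre-activation
    forget_gate = sigmoid(pre)
    input_gate = sigmoid(pre)
    output_gate = sigmoid(pre)
    candidate = math.tanh(pre)
    new_cell = forget_gate * cell + input_gate * candidate
    new_hidden = output_gate * math.tanh(new_cell)
    return new_hidden, (new_hidden, new_cell)

def run_episode(observations):
    """Thread the memory through every step of one episode."""
    state = (0.0, 0.0)            # assumed: memory starts from zeros each episode
    outputs = []
    for x in observations:
        # Memory goes in as an input and comes back out as an output.
        output, state = lstm_step(x, state)
        outputs.append(output)
    return outputs, state

outputs, final_state = run_episode([1.0, 1.0, 1.0])
print(outputs)
```

Even though every input is the same, each step produces a different output because the memory carried over from the previous step differs. In Keras, the same idea is usually expressed by building the LSTM layer with `return_state=True` and passing `initial_state` when calling it.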

You can see the full code for this post at https://github.com/kimbring2/AlphaStar_Implementation/blob/master/network.py.

Conclusion

In this post, we explored how to build a network that handles the complex states and actions of Starcraft 2. In the next post, we will train this network using Reinforcement Learning.

Written by Dohyeong Kim
