In the last post, we looked at how to implement the encoder part that handles the state for Deep Reinforcement Learning. In this article, I am going to describe the head part, which handles actions.

The code for the head network can be found at . After completing the whole network, we can train it using replay data.

Head network in agent class

Like the encoder network, the ideal structure is to declare the head network in the agent class and use it in the class's step function. The state information combined by the encoder network goes to the head network via the core network.
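The flow above can be sketched as follows. This is a minimal illustration of the encoder → core → head wiring inside an agent's step function; the class and function names are my own placeholders, not the post's actual code, and the dummy networks are simple callables just to show the data flow.

```python
import numpy as np

class SimpleAgent:
    """Illustrative agent: declare the networks in the class, use them in step()."""

    def __init__(self, encoder, core, action_type_head):
        self.encoder = encoder                    # observation -> embedded state
        self.core = core                          # embedded state -> lstm_output
        self.action_type_head = action_type_head  # lstm_output -> action type

    def step(self, observation):
        # Encoder output reaches the head network via the core network.
        embedded_state = self.encoder(observation)
        lstm_output = self.core(embedded_state)
        return self.action_type_head(lstm_output)

# Dummy stand-ins for the real networks, just to demonstrate the call order.
agent = SimpleAgent(
    encoder=lambda obs: np.asarray(obs, dtype=np.float32),
    core=lambda x: x * 2.0,
    action_type_head=lambda x: int(np.argmax(x)),
)
print(agent.step([0.1, 0.9, 0.3]))  # -> 1
```

In the real agent each of these would be a trained neural network, but the step function keeps the same shape: state in, action out.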

Let’s see how the AlphaStar agent selects an action, referring to the code above. First, the agent selects what to do next from the Action Type Head. For example, _BUILD_SUPPLY_DEPOT, _BUILD_BARRACKS, _BUILD_REFINERY, _TRAIN_MARINE, _TRAIN_MARAUDER, _ATTACK_MINIMAP, and _BUILD_TECHLAB could be the list of action types for a simple Terran agent. Second, the Selected Units Head decides which unit will execute that action type. If the action is _BUILD_SUPPLY_DEPOT, it would be desirable for the network to select one of the SCVs.

Next, once the action type and selected unit are determined, the Target Unit Head and Location Head give the unit or position to which the selected unit applies the action type.

If an SCV needs to perform the _BUILD_SUPPLY_DEPOT action, the Location Head should choose an empty spot of ground on the game screen. In this case, the Target Unit Head does not need to give a value. As can be seen here, the first two heads must have a value in every case, while the last two heads can have a ‘None’ value depending on the situation.
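This rule can be sketched as a small table mapping action types to the heads that must fire. The table below is an assumption for demonstration only, not AlphaStar's actual action specification; the point is that action type and selected units are always present, while target unit and location may come back as None.

```python
# Hypothetical mapping: which optional heads each action type needs.
# (Illustrative only -- not taken from AlphaStar's real action spec.)
REQUIRED_HEADS = {
    "_BUILD_SUPPLY_DEPOT": {"location"},     # build on empty ground
    "_TRAIN_MARINE": set(),                  # barracks just trains, no target
    "_ATTACK_MINIMAP": {"location"},         # attack-move to a minimap point
}

def assemble_action(action_type, selected_units, target_unit, target_location):
    """Keep only the head outputs that apply to this action type."""
    needed = REQUIRED_HEADS[action_type]
    return {
        "action_type": action_type,          # first head: always has a value
        "selected_units": selected_units,    # second head: always has a value
        "target_unit": target_unit if "target_unit" in needed else None,
        "target_location": target_location if "location" in needed else None,
    }

action = assemble_action("_BUILD_SUPPLY_DEPOT", ["SCV_3"], "enemy_7", (32, 18))
print(action["target_unit"])      # None -- building needs a location, not a unit
print(action["target_location"])  # (32, 18)
```

An action type such as attacking a specific enemy unit would instead list "target_unit" in its entry and leave the location as None.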

Next, let’s take a quick look at the structure, input, and output of each head.

Action Type Head

  1. Input: lstm_output, scalar_context
  2. Output: action_type_logits, action_type, autoregressive_embedding

Selected Units Head

  1. Input: autoregressive_embedding, action_acceptable_entity_type_binary, entity_embeddings
  2. Output: units_logits, units, autoregressive_embedding

Target Unit Head

  1. Input: autoregressive_embedding, action_acceptable_entity_type_binary, entity_embeddings
  2. Output: target_unit_logits, target_unit

Location Head

  1. Input: autoregressive_embedding, action_acceptable_entity_type, map_
  2. Output: target_location_logits, target_location
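To make the first of these interfaces concrete, here is a sketch of the Action Type Head listed above. The layer sizes and the plain linear projection are my assumptions for illustration (the real AlphaStar head is a residual MLP gated by scalar_context); only the input and output names follow the list.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def action_type_head(lstm_output, scalar_context, W):
    """Sketch of the Action Type Head: (lstm_output, scalar_context) ->
    (action_type_logits, action_type, autoregressive_embedding)."""
    # Condition the core output on the scalar context, project to logits.
    x = np.concatenate([lstm_output, scalar_context])
    action_type_logits = W @ x                          # one logit per action type
    action_type = int(np.argmax(softmax(action_type_logits)))
    # Fold the chosen action back into an embedding for the later heads.
    autoregressive_embedding = lstm_output + W[action_type, : len(lstm_output)]
    return action_type_logits, action_type, autoregressive_embedding

rng = np.random.default_rng(0)
lstm_output = rng.standard_normal(8)
scalar_context = rng.standard_normal(4)
W = rng.standard_normal((6, 12))                        # 6 hypothetical action types
logits, action, embedding = action_type_head(lstm_output, scalar_context, W)
print(logits.shape, embedding.shape)                    # (6,) (8,)
```

The autoregressive_embedding output is what makes the heads autoregressive: each later head (Selected Units, Target Unit, Location) receives an embedding that already encodes the choices made before it.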


In this post, we examined how AlphaStar chooses an action. To summarize briefly, the head network receives processed information from the encoder network through the core network and determines the final action based on the action type, selected units, target unit, and position values.

Written by

I am a Deep Reinforcement Learning researcher from South Korea. My final goal is to build an AI robot that can cook and clean for me using Deep Learning.
