How to build the Deep Learning agent for Minecraft with code: Tutorial 2

Dohyeong Kim
Mar 4, 2022
Minecraft Furnace

In the previous post, I created the two models needed to build a Deep Learning agent for Minecraft.

Learning-Based Model

In this post, we are going to learn how to train a Learning-Based Model using the Supervised Learning and Reinforcement Learning methods.

You can find all the code for this post at https://github.com/kimbring2/minecraft_ai.

Supervised Learning

We can acquire data for training by using the data API of MineRL.

TreeTrajetoryDataset function

In Minecraft, we can discretize the action space because it is not too complicated.
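As a rough illustration of what discretizing means here, the sketch below maps a MineRL-style action dictionary (binary keys such as `attack`, `forward`, `jump`, plus a `camera` pair of pitch and yaw deltas) to a single integer. The action table, the priority order, and `CAMERA_STEP` are assumptions made for this example and are not taken from the repository.

```python
# Hypothetical discrete action table. The entries, their order, and the
# camera threshold are illustrative assumptions, not the repository's mapping.
CAMERA_STEP = 10.0  # degrees treated as one discrete camera move (assumed)

ACTION_NAMES = [
    "noop", "forward", "jump_forward", "attack",
    "camera_yaw_neg", "camera_yaw_pos", "camera_pitch_neg", "camera_pitch_pos",
]


def discretize_action(action):
    """Map a MineRL-style action dict to one index in ACTION_NAMES."""
    pitch, yaw = action.get("camera", (0.0, 0.0))
    # Camera movement takes priority when it is large enough to matter.
    if abs(yaw) >= abs(pitch) and abs(yaw) >= CAMERA_STEP / 2:
        return ACTION_NAMES.index("camera_yaw_pos" if yaw > 0 else "camera_yaw_neg")
    if abs(pitch) >= CAMERA_STEP / 2:
        return ACTION_NAMES.index("camera_pitch_pos" if pitch > 0 else "camera_pitch_neg")
    if action.get("attack", 0):
        return ACTION_NAMES.index("attack")
    if action.get("forward", 0):
        name = "jump_forward" if action.get("jump", 0) else "forward"
        return ACTION_NAMES.index(name)
    return ACTION_NAMES.index("noop")
```

With a table like this, every expert step becomes a single class label, which is what a classification-style loss needs.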

Supervised Train function

We need to calculate the action policy of the network by feeding the states of the expert data into the network.

Supervised Replay function

Then, we can obtain the loss for training the network from the difference between the policy and the actual actions in the expert data.
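One common way to measure that difference for discrete actions is the cross-entropy between the policy and the expert's action. The sketch below assumes that choice and uses plain Python lists in place of tensors; `softmax` and `cross_entropy_loss` are names invented for this example, not functions from the repository.

```python
import math


def softmax(logits):
    """Convert raw network outputs into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(batch_logits, expert_actions):
    """Mean negative log-probability the policy assigns to the expert actions."""
    losses = []
    for logits, action in zip(batch_logits, expert_actions):
        probs = softmax(logits)
        losses.append(-math.log(probs[action]))
    return sum(losses) / len(losses)


# A policy that already prefers the expert's action gets a lower loss.
good = cross_entropy_loss([[5.0, 0.0]], [0])
bad = cross_entropy_loss([[5.0, 0.0]], [1])
```

The loss goes to zero only when the policy puts all of its probability on the action the expert actually took.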

Loss of Supervised Learning

In the case of Supervised Learning, a decreasing loss means the network is being trained well.

Reinforcement Learning

A model trained through Supervised Learning alone is still not good enough to play the game well, so the agent needs to interact with the environment to improve. This is where Reinforcement Learning comes in. In this project, we are going to use the IMPALA technique, which allows multiple environments to be used for training at the same time.

First, we need to make the Actor, which produces the training data for the Learner. Communication between the Learner and the Actor can be done using the ZeroMQ package.

Actor for Minecraft

The Learner stores the data sent from the Actor in a queue as fixed-length sequences and sends the action calculated by the network back to the Actor.
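The bookkeeping on the Learner side can be sketched without any networking. The `UnrollBuffer` class and `UNROLL_LENGTH` below are hypothetical names chosen for this example: steps arriving from one Actor are collected until a fixed length is reached, and the finished sequence is pushed to a shared queue.

```python
import queue

UNROLL_LENGTH = 4  # assumed sequence length, for illustration only


class UnrollBuffer:
    """Collects steps from one Actor and emits fixed-length sequences."""

    def __init__(self, unroll_queue, unroll_length=UNROLL_LENGTH):
        self.unroll_queue = unroll_queue
        self.unroll_length = unroll_length
        self.steps = []

    def add_step(self, state, action, reward, done):
        self.steps.append((state, action, reward, done))
        if len(self.steps) == self.unroll_length:
            # Hand a copy to the queue and start a fresh sequence.
            self.unroll_queue.put(list(self.steps))
            self.steps.clear()


unroll_queue = queue.Queue(maxsize=128)
buffer = UnrollBuffer(unroll_queue)
for t in range(9):
    # Integers stand in for real observations in this sketch.
    buffer.add_step(state=t, action=0, reward=0.0, done=False)
```

After nine steps, two complete sequences sit in the queue and one step is still waiting in the buffer.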

Data Thread for Minecraft

The data stored in the queue is periodically pulled out by another thread in the Learner to train the network.
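A minimal sketch of such a training thread, using only the standard library, might look like the following. `train_loop`, `BATCH_SIZE`, and the `train_step` callback are assumptions for this example; in the real project the callback would run a gradient update.

```python
import queue
import threading

BATCH_SIZE = 2  # assumed number of sequences per update


def train_loop(unroll_queue, train_step, stop_event):
    """Pull sequences from the queue and call train_step on each full batch."""
    batch = []
    # Keep draining the queue after a stop request so no queued data is lost.
    while not stop_event.is_set() or not unroll_queue.empty():
        try:
            batch.append(unroll_queue.get(timeout=0.1))
        except queue.Empty:
            continue
        if len(batch) == BATCH_SIZE:
            train_step(batch)
            batch = []


unroll_queue = queue.Queue()
stop_event = threading.Event()
trained_batches = []  # stands in for real network updates

trainer = threading.Thread(
    target=train_loop, args=(unroll_queue, trained_batches.append, stop_event)
)
trainer.start()
for i in range(4):
    unroll_queue.put(f"unroll-{i}")
stop_event.set()
trainer.join()
```

Keeping data collection and training in separate threads means the Actors never have to wait for a gradient step to finish.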

Train Thread for Minecraft

In addition to the loss that is commonly used when training an Actor-Critic network, a loss calculated from the KL divergence between the expert network and the agent network is used for faster training.
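To make the extra term concrete, here is a small sketch of a KL penalty added to an existing Actor-Critic loss value. The direction of the divergence (expert relative to agent), the `kl_weight` coefficient, and the function names are assumptions for illustration; the repository may combine the terms differently.

```python
import math


def kl_divergence(expert_probs, agent_probs, eps=1e-8):
    """KL(expert || agent) for two discrete action distributions."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(expert_probs, agent_probs)
    )


def total_loss(actor_critic_loss, expert_probs, agent_probs, kl_weight=0.1):
    """Actor-critic loss plus a weighted KL penalty toward the expert policy."""
    return actor_critic_loss + kl_weight * kl_divergence(expert_probs, agent_probs)


# The penalty vanishes when the agent matches the expert and grows as they diverge.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([0.9, 0.1], [0.5, 0.5])
```

The penalty keeps the agent close to the behavior learned from expert data while the reward signal is still sparse.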

Update function of MineRL

In the case of Reinforcement Learning, an increasing reward means that training is going well.

The Reward of Reinforcement Learning

After training finishes, we can see that the agent searches for trees and collects them.

Game Play of Trained Agent

Conclusion

In this post, we investigated how to train a Learning-Based Model using the Supervised Learning and Reinforcement Learning methods.

Loss for MineRL agent
Training Method for MineRL

In the next post, we are going to apply the method of this post to the MineRLObtainDiamondDense-v0 environment with a Rule-Based model to solve the multi-task job.

Dohyeong Kim

I am a Deep Learning researcher. Currently, I am trying to make an AI agent for various situations such as MOBA, RTS, and Soccer games.