How to Build a Deep Learning Agent for Minecraft with Code: Tutorial 2
In the previous post, I created the two models needed to build a Deep Learning agent for Minecraft.
In this post, we will learn how to train the Learning-Based model using Supervised Learning and Reinforcement Learning.
You can find all the code for this post at https://github.com/kimbring2/minecraft_ai.
Supervised Learning
We can acquire the training data through the data API of MineRL.
The action space in Minecraft is not too complicated, so we can discretize it into a fixed set of actions.
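To make the discretization idea concrete, here is a minimal sketch of how a discrete action index could be mapped back to a MineRL-style action dictionary. The particular set of five actions and the 10-degree camera step are assumptions chosen for illustration, not the table used in the repository.

```python
import copy

# Keys follow the MineRL Treechop action dictionary.
# "camera" is a [pitch, yaw] delta in degrees.
NOOP = {
    "attack": 0, "back": 0, "forward": 0, "jump": 0, "left": 0,
    "right": 0, "sneak": 0, "sprint": 0, "camera": [0.0, 0.0],
}

# Hypothetical discrete action table: each entry only lists the keys
# that differ from the no-op action.
DISCRETE_ACTIONS = [
    {"attack": 1},               # 0: hit the block in front
    {"forward": 1},              # 1: walk forward
    {"forward": 1, "jump": 1},   # 2: jump forward
    {"camera": [0.0, -10.0]},    # 3: turn left
    {"camera": [0.0, 10.0]},     # 4: turn right
]
NUM_ACTIONS = len(DISCRETE_ACTIONS)


def to_env_action(index):
    """Convert a discrete action index into a full action dictionary."""
    # deepcopy so callers cannot mutate the shared camera lists
    return copy.deepcopy({**NOOP, **DISCRETE_ACTIONS[index]})
```

With a table like this, the network only needs one output per entry, and the index it picks is converted with to_env_action before being passed to the environment.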
We compute the action policy of the network by feeding it the states from the expert data.
Then, we obtain the training loss from the difference between that policy and the action the expert actually took.
In Supervised Learning, a decreasing loss indicates that the network is being trained well.
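The loss described above can be sketched in plain Python. This example assumes the "difference" is measured with cross-entropy between the softmax policy and the expert's discrete action, which is a common choice for this kind of imitation but is not confirmed by the post. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw network outputs into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, expert_action):
    """Negative log-probability the policy assigns to the expert's action."""
    probs = softmax(logits)
    return -math.log(probs[expert_action])


def batch_loss(batch_logits, expert_actions):
    """Mean cross-entropy over a batch of (logits, expert action) pairs."""
    losses = [cross_entropy(l, a) for l, a in zip(batch_logits, expert_actions)]
    return sum(losses) / len(losses)
```

The loss shrinks as the policy puts more probability on the action the expert chose, which is why a falling loss curve is a sign of progress here.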
Reinforcement Learning
A model trained through Supervised Learning alone is still not good enough to play the game well, so the agent needs to improve by interacting with the environment. This is where Reinforcement Learning comes in. In this project, we are going to use IMPALA, which allows multiple environments to be used for training.
First, we need to build the Actor, which produces the training data for the Learner. The Learner and the Actor can communicate using the ZeroMQ package.
The Learner stores fixed-length sequences of the data sent from the Actor in a queue and sends the action calculated by the network back to the Actor.
A separate thread in the Learner periodically pulls the data stored in the queue to train the network.
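The queue-and-thread structure of the Learner can be sketched with the Python standard library alone. To keep the example self-contained, the ZeroMQ transport is left out and the Actor's steps are generated in a simple loop. The unroll length, batch size, and all class and function names are assumptions for illustration, and the "training step" is only a placeholder that records the batch size.

```python
import queue
import threading

UNROLL_LENGTH = 4  # hypothetical number of steps per stored trajectory


class TrajectoryBuffer:
    """Collects steps from one Actor and emits fixed-length trajectories."""

    def __init__(self, unroll_length):
        self.unroll_length = unroll_length
        self.steps = []

    def add(self, step):
        self.steps.append(step)
        if len(self.steps) == self.unroll_length:
            trajectory, self.steps = self.steps, []
            return trajectory
        return None


def training_thread(traj_queue, batch_size, num_updates, log):
    """Pull batches of trajectories from the queue, one batch per update."""
    for _ in range(num_updates):
        batch = [traj_queue.get() for _ in range(batch_size)]
        log.append(len(batch))  # placeholder for a gradient step on the batch


def demo():
    traj_queue = queue.Queue(maxsize=16)
    log = []
    trainer = threading.Thread(
        target=training_thread, args=(traj_queue, 2, 2, log))
    trainer.start()

    # Stand-in for steps arriving from an Actor over the network.
    buffer = TrajectoryBuffer(UNROLL_LENGTH)
    for t in range(16):
        trajectory = buffer.add({"state": t, "action": 0, "reward": 0.0})
        if trajectory is not None:
            traj_queue.put(trajectory)

    trainer.join()
    return log
```

Because the queue is bounded, put blocks when the training thread falls behind, which keeps the Actors from running too far ahead of the network being trained.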
In addition to the loss commonly used when training an Actor-Critic network, a loss calculated from the KL divergence between the expert network and the agent network is used to speed up training.
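As a rough sketch of how such a term could be combined with the usual loss, the example below adds a weighted KL divergence between the two policies to an already computed Actor-Critic loss. The direction of the divergence (expert relative to agent), the weight of 0.1, and the function names are assumptions, since the post does not specify them.

```python
import math


def softmax(logits):
    """Turn raw network outputs into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def total_loss(actor_critic_loss, expert_logits, agent_logits, kl_weight=0.1):
    """Actor-Critic loss plus a penalty for drifting away from the expert."""
    expert_policy = softmax(expert_logits)
    agent_policy = softmax(agent_logits)
    return actor_critic_loss + kl_weight * kl_divergence(expert_policy, agent_policy)
```

When the agent's policy matches the expert's, the extra term is zero and only the Actor-Critic loss remains. The further the agent drifts from the expert, the larger the penalty becomes.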
In Reinforcement Learning, an increasing reward indicates that learning is going well.
After training finishes, we can see that the agent searches for trees and collects them.
Conclusion
In this post, we looked at how to train a Learning-Based model using Supervised Learning and Reinforcement Learning.
In the next post, we are going to combine the method from this post with a Rule-Based model in the MineRLObtainDiamondDense-v0 environment to solve a multi-task problem.