Deep Q-learning from Demonstrations (DQfD) for Minecraft — Tutorial 1
In 2019, the MineRL competition (http://minerl.io/competition/) was held for Minecraft, the famous sandbox game, in which participants used human gameplay data to train Deep Reinforcement Learning agents.
I participated at the time, but my result was not great because I was not familiar with methods for applying human demonstration data to Deep Reinforcement Learning.
The final winners of the contest used a method called Hierarchical Deep Q-Network from Imperfect Demonstrations, which they published in a paper (https://arxiv.org/pdf/1912.08664v2.pdf). However, their source code does not appear to have been released yet.
Therefore, I set myself the goal of implementing the winners' paper, and I will record my work here.
Collecting wood in Minecraft
Wood is a very important resource in Minecraft. The Crafting Table needed to make a Wooden Axe can be made quickly by collecting only wood.
DQfD TensorFlow code for collecting wood can be found in my GitHub repository (https://github.com/kimbring2/MineRL/blob/master/dqfd_treechop.py).
The network structure in the code is the same as that of a standard DQN; it differs from the existing DQN in that human demonstrations are used for training.
The code first trains the Deep Q-network on human demonstration data in a supervised fashion. After that pre-training phase, the agent starts to interact with the real environment. It collects new data from that process and uses it, together with the human demonstration data, to train the network with Reinforcement Learning.
This method is called Deep Q-learning from Demonstrations (DQfD) and was published by DeepMind in 2017. It is often used in hard environments that are difficult to solve with Deep Q-learning alone.
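The second phase described above draws training batches from both the agent's own experience and the demonstrations. As a minimal sketch, here is one simple way to mix the two sources with a fixed demonstration ratio; note this is my simplification for illustration (the function name and the `demo_ratio` parameter are mine) and the original DQfD paper instead uses a single prioritized replay buffer with a priority bonus for demonstration transitions:

```python
import random

def sample_mixed_batch(demo_buffer, agent_buffer, batch_size=32, demo_ratio=0.25):
    """Sample a batch containing a fixed fraction of demonstration transitions.

    demo_buffer:  list of human-demonstration transitions (never discarded)
    agent_buffer: list of transitions collected by the agent itself
    """
    n_demo = int(batch_size * demo_ratio)
    n_agent = batch_size - n_demo
    # Early in training the agent buffer may be too small; fall back to demos only.
    if len(agent_buffer) < n_agent:
        return random.sample(demo_buffer, min(batch_size, len(demo_buffer)))
    return random.sample(demo_buffer, n_demo) + random.sample(agent_buffer, n_agent)
```

With `demo_ratio=0.25` and a batch size of 32, each batch contains 8 demonstration transitions and 24 agent transitions once enough agent data has been collected.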
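The key ingredient of DQfD's supervised phase is a large-margin classification loss, which pushes the Q-value of the expert's action above the Q-values of all other actions by at least a margin. As a rough sketch (the function name and the margin value 0.8 are my choices, not taken from the winner's code), it can be written like this:

```python
def large_margin_loss(q_values, expert_action, margin=0.8):
    """DQfD large-margin loss: max_a [Q(s,a) + l(a_E, a)] - Q(s, a_E),
    where l(a_E, a) = margin if a != a_E, else 0.

    q_values:      list of Q(s, a) for every action a
    expert_action: index of the action the human demonstrator took
    """
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_values)]
    # Loss is zero only when the expert action's Q-value exceeds
    # every other action's Q-value by at least the margin.
    return max(augmented) - q_values[expert_action]
```

This term is added to the usual TD losses during pre-training, so that the pre-trained network not only satisfies the Bellman equation on demonstration data but also imitates the demonstrator's action choices.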
Collecting wood in Minecraft is not easy from a Reinforcement Learning perspective. It is relatively easy for an agent to get near a tree. However, the agent must then attack a specific location a certain number of times.
If Reinforcement Learning alone were used for that task, training would take a very long time.
Ways to improve agent performance
Once you have confirmed that DQfD training works with the first code, you can improve the performance of the agent by changing the network structure and the action type. Since Minecraft is a first-person game, the information the agent can perceive at any one moment is limited. Therefore, it is necessary to use an RNN in addition to the CNN.
DRQfD TensorFlow code for collecting wood can be found in my GitHub repository (https://github.com/kimbring2/DQFD_Minecraft/blob/master/drqfd_treechop.py).
Unlike the network structure of the first code, you can see that an RNN is added on top of the CNN.
The agent with the RNN shows better performance than the plain DQfD agent. In particular, when the RNN is added, the agent tries to collect the remaining trees after chopping the tree in front of it.
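To illustrate the idea, here is a minimal tf.keras sketch of such a CNN+RNN (DRQN-style) Q-network: a convolutional encoder is applied to each frame of a short observation sequence, and an LSTM integrates the per-frame features before the Q-value head. The layer sizes and the 64x64x3 observation shape are my assumptions for illustration, not taken from the repository code:

```python
import tensorflow as tf

def build_drqn(num_actions, seq_len=8, height=64, width=64, channels=3):
    """Sketch of a recurrent Q-network: per-frame CNN features fed into an LSTM."""
    inputs = tf.keras.Input(shape=(seq_len, height, width, channels))
    # Convolutional encoder applied independently to every frame in the sequence.
    frame_encoder = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu'),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation='relu'),
        tf.keras.layers.Flatten(),
    ])
    x = tf.keras.layers.TimeDistributed(frame_encoder)(inputs)
    # LSTM summarizes the frame features over time, giving the agent memory
    # of what it has already seen outside the current field of view.
    x = tf.keras.layers.LSTM(256)(x)
    q_values = tf.keras.layers.Dense(num_actions)(x)
    return tf.keras.Model(inputs, q_values)
```

The recurrent state is what lets the agent remember a partially chopped tree even after turning away from it, which matches the behavior difference described above.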