Playing the Pong game using CNN, not DNN
The Pong game has been widely used to measure the performance of Reinforcement Learning algorithms. In most of those examples, a DNN policy works well even when the observation is an image frame.
CNN vs. DNN comparison on an image classification task
However, a DNN is usually not the right choice for image data, because it cannot learn spatial information. You can easily see this from an MNIST classification experiment.
Based on that result, we would expect to obtain better results with a CNN than with a DNN.
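As a quick illustration, here is a minimal Keras sketch that contrasts a dense-only classifier with a small CNN on MNIST. The layer sizes and training setup are assumptions for illustration, not the exact experiment referenced above.

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Dense-only (DNN) baseline: flattens the image, discarding spatial structure.
dnn = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Small CNN: convolutions keep the 2-D layout and learn local spatial features.
cnn = tf.keras.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

for name, model in [('DNN', dnn), ('CNN', cnn)]:
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f'{name} test accuracy: {acc:.4f}')
```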
Train the RL agent using a CNN policy
Let’s train our CNN model and see the result using the code below.
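For reference, this is a minimal sketch of what such a CNN policy could look like in TensorFlow. The 80x80 grayscale preprocessing, the layer sizes, and the REINFORCE-style loss are my assumptions, not necessarily the exact setup used in this post.

```python
import tensorflow as tf

def build_cnn_policy(num_actions=3):
    """A small CNN policy; the input is a preprocessed 80x80 grayscale frame."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(80, 80, 1)),
        tf.keras.layers.Conv2D(16, 8, strides=4, activation='relu'),
        tf.keras.layers.Conv2D(32, 4, strides=2, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(num_actions),  # action logits
    ])

policy = build_cnn_policy()
optimizer = tf.keras.optimizers.Adam(1e-4)

def policy_gradient_loss(logits, actions, returns):
    """REINFORCE-style loss: -log pi(a|s) weighted by the discounted return."""
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)
    return tf.reduce_mean(neg_log_prob * returns)
```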
The reason I want to use a CNN in Pong is that the multiplayer version of Pong requires the agent to learn a wider variety of movements. The opponent in single-player Pong follows a very simple mechanism, so it can be defeated with a DNN.
Interestingly, the results using the CNN are much poorer than those of the DNN. Why doesn’t the CNN work as well in RL as it does in classification?
According to this paper, the main reason is that for CNNs to learn good representations (upon which a policy can be trained), they require large amounts of training data.
Recently, adding a Variational Autoencoder (VAE) during training has been used to address this problem when using CNNs in RL.
To implement this, the encoded feature of the VAE is used as the input to the RL policy, and the reconstruction loss of the VAE is added to the RL loss.
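A rough sketch of how the two losses can be combined is shown below. The `encode`, `reparameterize`, and `decode` methods follow the interface of the TensorFlow CVAE tutorial model, and the weighting factor `recon_weight` is a placeholder of my own.

```python
import tensorflow as tf

def combined_loss(vae, policy, frames, actions, returns, recon_weight=1.0):
    """Sketch: feed the VAE latent code to the policy and add the
    VAE reconstruction loss to the usual policy-gradient loss."""
    mean, logvar = vae.encode(frames)        # encoder output
    z = vae.reparameterize(mean, logvar)     # latent feature used as the RL input
    logits = policy(z)

    # Policy-gradient (REINFORCE-style) term.
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)
    rl_loss = tf.reduce_mean(neg_log_prob * returns)

    # VAE reconstruction term (a simple pixel-wise error here).
    recon = vae.decode(z)
    recon_loss = tf.reduce_mean(tf.square(frames - recon))

    return rl_loss + recon_weight * recon_loss
```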
Train the RL agent using a CVAE policy
Let’s add the CVAE network to the previous code. The official TensorFlow tutorial code is used.
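Below is a condensed sketch of a CVAE in the spirit of that tutorial, adapted to 80x80 grayscale Pong frames; the input size and latent dimension are my assumptions.

```python
import tensorflow as tf

class CVAE(tf.keras.Model):
    """Convolutional VAE, condensed from the TensorFlow CVAE tutorial and
    adapted to 80x80 grayscale Pong frames."""

    def __init__(self, latent_dim=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(80, 80, 1)),
            tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(latent_dim * 2),  # mean and log-variance
        ])
        self.decoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(latent_dim,)),
            tf.keras.layers.Dense(20 * 20 * 64, activation='relu'),
            tf.keras.layers.Reshape((20, 20, 64)),
            tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same',
                                            activation='relu'),
            tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same',
                                            activation='relu'),
            tf.keras.layers.Conv2DTranspose(1, 3, strides=1, padding='same'),
        ])

    def encode(self, x):
        mean, logvar = tf.split(self.encoder(x), num_or_size_splits=2, axis=1)
        return mean, logvar

    def reparameterize(self, mean, logvar):
        eps = tf.random.normal(shape=tf.shape(mean))
        return eps * tf.exp(logvar * 0.5) + mean

    def decode(self, z):
        return self.decoder(z)
```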
Interestingly, the results using the CVAE are much better than those of the CNN alone.
Let’s look at the reconstructed images at each training step to confirm that the CVAE learns a representation of the image frame.
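A small helper like the following can be used to dump a reconstruction at a given step; this is a hypothetical helper of my own, not part of the original code.

```python
import matplotlib.pyplot as plt

def save_reconstruction(cvae, frame, step):
    """Save a side-by-side view of an input frame and its CVAE reconstruction."""
    mean, logvar = cvae.encode(frame[None, ...])   # frame: (80, 80, 1) float array
    z = cvae.reparameterize(mean, logvar)
    recon = cvae.decode(z)[0, ..., 0].numpy()

    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(frame[..., 0], cmap='gray')
    axes[0].set_title('input')
    axes[1].imshow(recon, cmap='gray')
    axes[1].set_title('reconstruction')
    fig.savefig(f'recon_step_{step}.png')
    plt.close(fig)
```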
Even though the learning is not yet perfect, the CVAE reconstructs the shape of the paddles on both sides of the Pong game better and better as training proceeds.
Conclusion
In this post, we looked at how to use a CVAE policy in RL when the input is an image frame. As the experimental results show, learning a representation of the image itself is also important. In the next post, we will apply this method to multiplayer Pong games, where the agents’ movements must be more complex than in single-player games.