Playing the Pong game using CNN, not DNN
The Pong game has been widely used to measure the performance of Reinforcement Learning algorithms. In most of those examples, a DNN policy works well even when the observation is an image frame.
CNN vs. DNN comparison on an image classification task
However, a DNN is usually not the right choice for image data, because it cannot learn spatial information. You can easily see this from an MNIST classification experiment.
Based on that result, we would expect to obtain better results with a CNN than with a DNN.
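As a quick illustration, here is a minimal Keras sketch that contrasts a dense-only classifier with a small CNN on MNIST. The layer sizes and training setup are assumptions for illustration, not the exact experiment referenced above.

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Dense-only (DNN) baseline: flattens the image, discarding spatial structure.
dnn = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Small CNN: convolutions keep the 2-D layout and learn local spatial features.
cnn = tf.keras.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

for name, model in [('DNN', dnn), ('CNN', cnn)]:
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f'{name} test accuracy: {acc:.4f}')
```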
Train the RL agent using a CNN policy
Let’s train our CNN model and see the result using the code below.
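For reference, this is a minimal sketch of what such a CNN policy could look like in TensorFlow. The 80x80 grayscale preprocessing, the layer sizes, and the REINFORCE-style loss are my assumptions, not necessarily the exact setup used in this post.

```python
import tensorflow as tf

def build_cnn_policy(num_actions=3):
    """A small CNN policy; the input is a preprocessed 80x80 grayscale frame."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(80, 80, 1)),
        tf.keras.layers.Conv2D(16, 8, strides=4, activation='relu'),
        tf.keras.layers.Conv2D(32, 4, strides=2, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(num_actions),  # action logits
    ])

policy = build_cnn_policy()
optimizer = tf.keras.optimizers.Adam(1e-4)

def policy_gradient_loss(logits, actions, returns):
    """REINFORCE-style loss: -log pi(a|s) weighted by the discounted return."""
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)
    return tf.reduce_mean(neg_log_prob * returns)
```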
The reason I want to use a CNN in Pong is that the multiplayer version of Pong requires the agent to learn a wider variety of movements. The opponent in single-player Pong follows a very simple mechanism, so it can be defeated with a DNN.
Interestingly, the results using the CNN are much poorer than those of the DNN. Why doesn’t the CNN work as well in RL as it does in classification?
According to this paper, the main reason is that for CNNs to learn good representations (upon which a policy can be trained), they require large amounts of training data.
Recently, adding a Variational Autoencoder (VAE) during training has been used to address this problem when using CNNs in RL.
To implement this, the encoded feature of the VAE is used as the input to the RL policy, and the reconstruction loss of the VAE is added to the RL loss.
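A rough sketch of how the two losses can be combined is shown below. The `encode`, `reparameterize`, and `decode` methods follow the interface of the TensorFlow CVAE tutorial model, and the weighting factor `recon_weight` is a placeholder of my own.

```python
import tensorflow as tf

def combined_loss(vae, policy, frames, actions, returns, recon_weight=1.0):
    """Sketch: feed the VAE latent code to the policy and add the
    VAE reconstruction loss to the usual policy-gradient loss."""
    mean, logvar = vae.encode(frames)        # encoder output
    z = vae.reparameterize(mean, logvar)     # latent feature used as the RL input
    logits = policy(z)

    # Policy-gradient (REINFORCE-style) term.
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)
    rl_loss = tf.reduce_mean(neg_log_prob * returns)

    # VAE reconstruction term (a simple pixel-wise error here).
    recon = vae.decode(z)
    recon_loss = tf.reduce_mean(tf.square(frames - recon))

    return rl_loss + recon_weight * recon_loss
```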
Train the RL agent using a CVAE policy
Let’s add the CVAE network to the previous code. The official TensorFlow tutorial code is used.
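Below is a condensed sketch of a CVAE in the spirit of that tutorial, adapted to 80x80 grayscale Pong frames; the input size and latent dimension are my assumptions.

```python
import tensorflow as tf

class CVAE(tf.keras.Model):
    """Convolutional VAE, condensed from the TensorFlow CVAE tutorial and
    adapted to 80x80 grayscale Pong frames."""

    def __init__(self, latent_dim=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(80, 80, 1)),
            tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(latent_dim * 2),  # mean and log-variance
        ])
        self.decoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(latent_dim,)),
            tf.keras.layers.Dense(20 * 20 * 64, activation='relu'),
            tf.keras.layers.Reshape((20, 20, 64)),
            tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same',
                                            activation='relu'),
            tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same',
                                            activation='relu'),
            tf.keras.layers.Conv2DTranspose(1, 3, strides=1, padding='same'),
        ])

    def encode(self, x):
        mean, logvar = tf.split(self.encoder(x), num_or_size_splits=2, axis=1)
        return mean, logvar

    def reparameterize(self, mean, logvar):
        eps = tf.random.normal(shape=tf.shape(mean))
        return eps * tf.exp(logvar * 0.5) + mean

    def decode(self, z):
        return self.decoder(z)
```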
Interestingly, the results using the CVAE are much better than those of the CNN alone.
Let’s look at the reconstructed images at each training step to confirm that the CVAE learns a representation of the image frame.
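A small helper like the following can be used to dump a reconstruction at a given step; this is a hypothetical helper of my own, not part of the original code.

```python
import matplotlib.pyplot as plt

def save_reconstruction(cvae, frame, step):
    """Save a side-by-side view of an input frame and its CVAE reconstruction."""
    mean, logvar = cvae.encode(frame[None, ...])   # frame: (80, 80, 1) float array
    z = cvae.reparameterize(mean, logvar)
    recon = cvae.decode(z)[0, ..., 0].numpy()

    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(frame[..., 0], cmap='gray')
    axes[0].set_title('input')
    axes[1].imshow(recon, cmap='gray')
    axes[1].set_title('reconstruction')
    fig.savefig(f'recon_step_{step}.png')
    plt.close(fig)
```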
Even though the learning is not yet perfect, the CVAE reconstructs the shape of the paddles on both sides of the Pong game better and better as training proceeds.
Conclusion
In this post, we looked at how to use a CVAE policy in RL when the input is an image frame. As the experimental results show, learning a representation of the image itself is also important. In the next post, we will apply this method to multiplayer Pong games, where the agents’ movements must be more complex than in single-player games.