Playing Sonic The Hedgehog 2 using Deep Learning — Part 1


Poster of the Sonic The Hedgehog movie (2020)

2D side-scrolling video games like Sonic are suitable environments for testing the performance of AI agents because they contain various obstacles, enemies, and puzzle elements.

From April 5 to June 5, 2018, OpenAI held an AI agent competition on this Sonic game called the Retro Contest. Thankfully, they released an API in the form of OpenAI Gym (Gym Retro) so that anyone can run the Sonic game from Python.

The result of that competition is available at https://openai.com/blog/first-retro-contest-retrospective/.
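As a rough illustration of that API, the environment can be created and stepped from Python as shown below. This is a minimal sketch: the game and state names follow Gym Retro's conventions, and it assumes you have already imported your own Sonic 2 ROM with `python -m retro.import`.

```python
# Minimal sketch: create the Sonic 2 environment with Gym Retro and run random actions.
# Assumes the Sonic The Hedgehog 2 ROM has been imported beforehand
# with `python -m retro.import <path-to-rom>`.
import retro

env = retro.make(game='SonicTheHedgehog2-Genesis', state='EmeraldHillZone.Act1')

obs = env.reset()
done = False
while not done:
    # Sample a random 12-button action just to step the environment.
    obs, reward, done, info = env.step(env.action_space.sample())
    env.render()
env.close()
```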

I started using the Sonic environment again for research because it requires relatively simple actions compared to Minecraft, while the tasks to be handled are similar.

The source code of this post is available from my GitHub.

Network for Sonic

The actions of Sonic consist of key combinations drawn from ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]. Among all possible combinations, only 23 are actually used in human play.

Agent Network Version 1

Meanwhile, the screen frame, resized to (84, 84, 3), is used as the state.
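A minimal preprocessing sketch is shown below, assuming OpenCV is used for resizing; scaling the pixel values to [0, 1] is my assumption, not something stated above.

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Resize the raw game screen to the (84, 84, 3) state used by the network."""
    resized = cv2.resize(frame, (84, 84), interpolation=cv2.INTER_AREA)
    # Scale pixel values to [0, 1] before feeding the network (assumed normalization).
    return resized.astype(np.float32) / 255.0
```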

Training Method

In the case of Sonic's spin dash action, the required key sequence is too complicated to emerge from random exploration alone.

Spin dash action of Sonic (from the author's playing)

Therefore, it is unreasonable to train the agent with Deep Reinforcement Learning alone.

Loss for Sonic agent

First, the agent is trained with Supervised Learning on human expert data. However, Supervised Learning cannot deal with every situation in the game. Thus, the trained model is used as an aid when running Deep Reinforcement Learning.
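As an illustration, the Supervised Learning step can be written as a behavior-cloning update that matches the policy output to the expert's action with a cross-entropy loss. This is a minimal sketch assuming a policy model that returns action logits and a value output; the author's exact loss (shown in the figure above) may differ.

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

@tf.function
def supervised_step(model, screens, expert_actions):
    """One behavior-cloning update: push the policy toward the expert's actions.

    Assumes `model` returns (action_logits, value); only the logits are used here."""
    with tf.GradientTape() as tape:
        action_logits, _value = model(screens, training=True)
        loss = cross_entropy(expert_actions, action_logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```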

For Deep Reinforcement Learning, the IMPALA method is used, in which a single Learner is trained on data passed from multiple Actors to speed up training.

Training method for Sonic agent

Because an episode of the Sonic game is long, usually over 10,000 steps, each Actor's trajectory is divided into segments of length 100. Therefore, the Learner includes code for off-policy corrections (V-trace, as in IMPALA).
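A minimal sketch of that segmentation is shown below, assuming a trajectory is stored as a list of per-step transitions (the exact data layout is my assumption, not the author's code).

```python
def split_trajectory(trajectory, segment_length=100):
    """Split one long episode (a list of per-step transitions) into
    fixed-length segments that the IMPALA Learner can consume.

    Each transition is assumed to also carry the Actor's behaviour-policy
    logits so the Learner can apply off-policy corrections (V-trace)."""
    segments = []
    for start in range(0, len(trajectory), segment_length):
        segment = trajectory[start:start + segment_length]
        if len(segment) == segment_length:  # drop the short tail for simplicity
            segments.append(segment)
    return segments
```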

Collecting the Human Expert Dataset

Unlike the previous competition, where boss fights were excluded and only single levels were tested, this series includes boss fights and all available levels of the Sonic 2 game, listed below, in order to build a complete agent.

EmeraldHillZone.Act1
EmeraldHillZone.Act2
AquaticRuinZone.Act1
AquaticRuinZone.Act2
ChemicalPlantZone.Act1
ChemicalPlantZone.Act2
MetropolisZone.Act1
MetropolisZone.Act2
MetropolisZone.Act3
OilOceanZone.Act1
OilOceanZone.Act2
MysticCaveZone.Act1
MysticCaveZone.Act2
HillTopZone.Act1
HillTopZone.Act2
CasinoNightZone.Act1
CasinoNightZone.Act2
WingFortressZone.Act1

The levels get harder toward the end; in particular, the last level requires extremely precise control. All datasets are available from my Google Drive.
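If the demonstrations are stored as Gym Retro .bk2 recordings (an assumption; the format of the dataset on the Drive is not described here), the screens and button presses can be recovered by replaying each movie, roughly as follows.

```python
import retro

def replay_movie(path):
    """Replay a .bk2 recording and yield (observation, button-presses) pairs."""
    movie = retro.Movie(path)
    movie.step()  # advance to the first frame of the recording

    env = retro.make(game=movie.get_game(), state=None,
                     use_restricted_actions=retro.Actions.ALL,
                     players=movie.players)
    env.initial_state = movie.get_state()
    obs = env.reset()

    while movie.step():
        # Read the recorded button presses for player 0 on this frame.
        keys = [movie.get_key(i, 0) for i in range(env.num_buttons)]
        yield obs, keys
        obs, _reward, _done, _info = env.step(keys)
    env.close()
```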

Parsing the Human Expert Dataset

For ease of training, it is better for the agent to use a single action network with a discrete output. In that case, we must manually convert the key combinations used in the human expert data into single integer indices.

After analyzing all of the human expert data, a total of 23 actions are used, as shown in the table above. Therefore, the action network should have 23 outputs.
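The conversion can be implemented as an action wrapper that maps each allowed key combination to an integer id, similar to the discretizer used in the OpenAI retro baselines. The combinations below are only illustrative; in practice, the full table of 23 combinations extracted from the expert data should be used.

```python
import gym
import numpy as np

class SonicDiscretizer(gym.ActionWrapper):
    """Map a small set of button combinations to integer action ids.

    The combinations below are illustrative only; replace COMBOS with the
    23 combinations found in the human expert data."""

    BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]
    COMBOS = [[], ["LEFT"], ["RIGHT"], ["B"], ["RIGHT", "B"], ["DOWN", "B"]]

    def __init__(self, env):
        super().__init__(env)
        self._actions = []
        for combo in self.COMBOS:
            arr = np.zeros(len(self.BUTTONS), dtype=np.int8)
            for button in combo:
                arr[self.BUTTONS.index(button)] = 1
            self._actions.append(arr)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, act):
        # Convert an integer action id back into the 12-button array.
        return self._actions[act].copy()
```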

Training result — Emerald Hill Zone.Act 1

Before training the agent on the harder levels, the easiest level is used to confirm that every part of the code works as planned. First, we can confirm that the Supervised Learning loss decreases well.

The Supervised Learning Result

The next step is to play the game with the model saved from Supervised Learning. It shows that, for the simplest tasks, the agent can finish the stage without further Reinforcement Learning training.

The Evaluation Result Of Supervised Learning

The following graph shows the results of training by the Reinforcement Learning method. After a certain number of episodes, the agent starts to clear the stage with a high probability of over 90%.

The Reinforcement Learning Result

The agent using the trained model from Reinforcement Learning is also able to clear the stage smoothly.

The Evaluation Result Of the Reinforcement Learning

Training result — Emerald Hill Zone.Act 2

The agent can clear Act 1 easily once it learns the dash action properly. For that reason, there was no need to add more networks to the model, which is composed of a CNN + LSTM as in the Minecraft TreeChop case.
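For reference, a CNN + LSTM policy/value model of that kind can be sketched roughly as follows; the layer sizes are illustrative guesses, not the author's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn_lstm_model(num_actions=23):
    """Illustrative CNN + LSTM policy/value network (layer sizes are guesses)."""
    screens = layers.Input(shape=(None, 84, 84, 3))  # (time, height, width, channels)
    x = layers.TimeDistributed(layers.Conv2D(32, 8, strides=4, activation='relu'))(screens)
    x = layers.TimeDistributed(layers.Conv2D(64, 4, strides=2, activation='relu'))(x)
    x = layers.TimeDistributed(layers.Conv2D(64, 3, strides=1, activation='relu'))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    x = layers.LSTM(512, return_sequences=True)(x)
    policy_logits = layers.Dense(num_actions)(x)  # one logit per discrete action
    value = layers.Dense(1)(x)                    # state-value estimate
    return tf.keras.Model(inputs=screens, outputs=[policy_logits, value])
```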

However, that method did not work well in Act 2 because a boss appears at the end of the episode. After a long period of testing, it was confirmed that an additional network is needed to clear this level because of Sonic's long jump action. To perform the long jump, the jump key must be held down continuously for a certain amount of time.

Various jump actions of Sonic (from the author's playing)

However, the current network cannot learn how long the same action needs to be held, because it only uses the game screen as the state.

An additional problem is that training is very slow: it took almost 7 days to learn a 10,000-step episode. As confirmed in another test, this problem occurred when the CNN was used directly as the feature extractor.

To solve the two problems mentioned above, two networks are added to the existing model. The first is a CVAE network that learns game-screen features separately from the policy network. The other is a CNN that takes the history of previous actions as an additional input state.

Code for making the action history
Model architecture for using the action history and CVAE

The network code is available from here.
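As a rough illustration of the action-history idea (not the author's exact implementation, which is linked above), the last few actions can be kept as one-hot vectors in a fixed-length buffer and fed to the model alongside the screen; the window length below is an arbitrary choice.

```python
from collections import deque
import numpy as np

NUM_ACTIONS = 23     # size of the discrete action set found in the expert data
HISTORY_LENGTH = 16  # illustrative window size; the real value may differ

class ActionHistory:
    """Keep the last HISTORY_LENGTH actions as one-hot vectors."""

    def __init__(self):
        self.buffer = deque(maxlen=HISTORY_LENGTH)
        self.reset()

    def reset(self):
        self.buffer.clear()
        for _ in range(HISTORY_LENGTH):
            self.buffer.append(np.zeros(NUM_ACTIONS, dtype=np.float32))

    def push(self, action_id):
        one_hot = np.zeros(NUM_ACTIONS, dtype=np.float32)
        one_hot[action_id] = 1.0
        self.buffer.append(one_hot)

    def as_state(self):
        # Shape (HISTORY_LENGTH, NUM_ACTIONS): fed to the additional CNN branch.
        return np.stack(self.buffer, axis=0)
```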

The results of reconstructing the game screen with the learned CVAE network are shown below. It seems that most of the game-screen information could be learned.

Reconstructed image from CVAE
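For reference, a generic VAE-style loss combining a pixel reconstruction term with a KL term looks like the sketch below; the author's CVAE may weight or condition these terms differently.

```python
import tensorflow as tf

def vae_loss(reconstruction, frame, z_mean, z_log_var):
    """Generic VAE-style loss: pixel reconstruction error plus a KL term.

    Only a sketch; the exact reconstruction term and KL weighting used by
    the author's CVAE may differ."""
    recon = tf.reduce_mean(
        tf.reduce_sum(tf.square(frame - reconstruction), axis=[1, 2, 3]))
    kl = -0.5 * tf.reduce_mean(
        tf.reduce_sum(1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
    return recon + kl
```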

Below are the training loss curves. It was confirmed that both the policy loss and the CVAE loss decreased as training progressed.

Training result when using the action history as input

Evaluating with the weights of the saved model, the agent can defeat the boss well even without the help of Reinforcement Learning.

Evaluation video when using the action history as input

Additionally, you can check the effect of using the action history as a training input by comparing it with the case trained without the action history.

Effect of using the action history as input for training loss
Evaluation video of not using the action history as input

As in the Act 1 training result, the agent can still learn the dash action, but it cannot close in on the boss the way the action-history agent does.

Conclusion

In this post, we covered the basic methods of collecting data and training an agent to clear the stages of the Sonic game with Deep Learning. It was confirmed that, unlike in games such as Minecraft, using the action history as part of the state is essential for performing Sonic's complex actions.

In the case of the Sonic game, more precise control is required as the stages progress. In the next post, we will check whether ChemicalPlantZone can be cleared using the existing method.

Written by Dohyeong Kim

I am a Deep Learning researcher. Currently, I am trying to make an AI agent for various situations such as MOBA, RTS, and Soccer games.
