Playing Sonic The Hedgehog 2 using Deep Learning — Part 1


Poster of the Sonic The Hedgehog movie (2020)

2D side-scrolling video games like Sonic are suitable environments for testing the performance of AI agents because they contain various obstacles, enemies, and puzzle elements.

From April 5 to June 5, 2018, OpenAI held an AI agent competition on this Sonic game called the Retro Contest. Thankfully, they released an API in the form of OpenAI Gym (Gym Retro) so that anyone can run the Sonic game from Python.

The result of that competition is available at https://openai.com/blog/first-retro-contest-retrospective/.
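As a rough illustration of that API, the environment can be created and stepped from Python as shown below. This is a minimal sketch: the game and state names follow Gym Retro's conventions, and it assumes you have already imported your own Sonic 2 ROM with `python -m retro.import`.

```python
# Minimal sketch: create the Sonic 2 environment with Gym Retro and run random actions.
# Assumes the Sonic The Hedgehog 2 ROM has been imported beforehand
# with `python -m retro.import <path-to-rom>`.
import retro

env = retro.make(game='SonicTheHedgehog2-Genesis', state='EmeraldHillZone.Act1')

obs = env.reset()
done = False
while not done:
    # Sample a random 12-button action just to step the environment.
    obs, reward, done, info = env.step(env.action_space.sample())
    env.render()
env.close()
```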

I started using the Sonic environment again for research because it requires relatively simple actions compared to Minecraft, while the tasks to be handled are similar.

The source code of this post is available from my GitHub.

Network for Sonic

The actions of Sonic consist of key combinations drawn from ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]. Among all possible combinations, only 23 are actually used in human play.

Agent Network Version 1

Meanwhile, the screen frame, resized to (84, 84, 3), is used as the state.
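A minimal preprocessing sketch is shown below, assuming OpenCV is used for resizing; scaling the pixel values to [0, 1] is my assumption, not something stated above.

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Resize the raw game screen to the (84, 84, 3) state used by the network."""
    resized = cv2.resize(frame, (84, 84), interpolation=cv2.INTER_AREA)
    # Scale pixel values to [0, 1] before feeding the network (assumed normalization).
    return resized.astype(np.float32) / 255.0
```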

Training Method

In the case of Sonic's spin dash action, the required key sequence is too complicated to emerge from random exploration alone.

Spin dash action of Sonic (from the author's playing)

Therefore, it is unreasonable to train the agent with Deep Reinforcement Learning alone.

Loss for Sonic agent

First, the agent is trained with Supervised Learning on human expert data. However, Supervised Learning cannot deal with every situation in the game. Thus, the trained model is used as an aid when running Deep Reinforcement Learning.
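As an illustration, the Supervised Learning step can be written as a behavior-cloning update that matches the policy output to the expert's action with a cross-entropy loss. This is a minimal sketch assuming a policy model that returns action logits and a value output; the author's exact loss (shown in the figure above) may differ.

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

@tf.function
def supervised_step(model, screens, expert_actions):
    """One behavior-cloning update: push the policy toward the expert's actions.

    Assumes `model` returns (action_logits, value); only the logits are used here."""
    with tf.GradientTape() as tape:
        action_logits, _value = model(screens, training=True)
        loss = cross_entropy(expert_actions, action_logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```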

For Deep Reinforcement Learning, the IMPALA method is used, in which a single Learner is trained on data passed from multiple Actors to speed up training.

Training method for Sonic agent

Because an episode of the Sonic game is long, usually over 10,000 steps, each Actor's trajectory is divided into segments of length 100. Therefore, the Learner includes code for off-policy corrections (V-trace, as in IMPALA).
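A minimal sketch of that segmentation is shown below, assuming a trajectory is stored as a list of per-step transitions (the exact data layout is my assumption, not the author's code).

```python
def split_trajectory(trajectory, segment_length=100):
    """Split one long episode (a list of per-step transitions) into
    fixed-length segments that the IMPALA Learner can consume.

    Each transition is assumed to also carry the Actor's behaviour-policy
    logits so the Learner can apply off-policy corrections (V-trace)."""
    segments = []
    for start in range(0, len(trajectory), segment_length):
        segment = trajectory[start:start + segment_length]
        if len(segment) == segment_length:  # drop the short tail for simplicity
            segments.append(segment)
    return segments
```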

Collecting the Human Expert Dataset

Unlike the previous competition, where boss fights were excluded and only single levels were tested, this series includes boss fights and all available levels of the Sonic 2 game, listed below, in order to build a complete agent.

EmeraldHillZone.Act1
EmeraldHillZone.Act2
AquaticRuinZone.Act1
AquaticRuinZone.Act2
ChemicalPlantZone.Act1
ChemicalPlantZone.Act2
MetropolisZone.Act1
MetropolisZone.Act2
MetropolisZone.Act3
OilOceanZone.Act1
OilOceanZone.Act2
MysticCaveZone.Act1
MysticCaveZone.Act2
HillTopZone.Act1
HillTopZone.Act2
CasinoNightZone.Act1
CasinoNightZone.Act2
WingFortressZone.Act1

The levels get harder toward the end; in particular, the last level requires extremely precise control. All datasets are available from my Google Drive.
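If the demonstrations are stored as Gym Retro .bk2 recordings (an assumption; the format of the dataset on the Drive is not described here), the screens and button presses can be recovered by replaying each movie, roughly as follows.

```python
import retro

def replay_movie(path):
    """Replay a .bk2 recording and yield (observation, button-presses) pairs."""
    movie = retro.Movie(path)
    movie.step()  # advance to the first frame of the recording

    env = retro.make(game=movie.get_game(), state=None,
                     use_restricted_actions=retro.Actions.ALL,
                     players=movie.players)
    env.initial_state = movie.get_state()
    obs = env.reset()

    while movie.step():
        # Read the recorded button presses for player 0 on this frame.
        keys = [movie.get_key(i, 0) for i in range(env.num_buttons)]
        yield obs, keys
        obs, _reward, _done, _info = env.step(keys)
    env.close()
```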

Parsing the Human Expert Dataset

For ease of training, it is better for the agent to use a single action network with a discrete output. In that case, we must manually convert the key combinations used in the human expert data into single integer indices.

After analyzing all of the human expert data, a total of 23 actions are used, as shown in the table above. Therefore, the action network should have 23 outputs.
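The conversion can be implemented as an action wrapper that maps each allowed key combination to an integer id, similar to the discretizer used in the OpenAI retro baselines. The combinations below are only illustrative; in practice, the full table of 23 combinations extracted from the expert data should be used.

```python
import gym
import numpy as np

class SonicDiscretizer(gym.ActionWrapper):
    """Map a small set of button combinations to integer action ids.

    The combinations below are illustrative only; replace COMBOS with the
    23 combinations found in the human expert data."""

    BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]
    COMBOS = [[], ["LEFT"], ["RIGHT"], ["B"], ["RIGHT", "B"], ["DOWN", "B"]]

    def __init__(self, env):
        super().__init__(env)
        self._actions = []
        for combo in self.COMBOS:
            arr = np.zeros(len(self.BUTTONS), dtype=np.int8)
            for button in combo:
                arr[self.BUTTONS.index(button)] = 1
            self._actions.append(arr)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, act):
        # Convert an integer action id back into the 12-button array.
        return self._actions[act].copy()
```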

Training result — Emerald Hill Zone.Act 1

Before training the agent on the harder levels, the easiest level is used to confirm that every part of the code works as planned. First, we can confirm that the Supervised Learning loss decreases well.

The Supervised Learning Result

The next step is to play the game with the model saved from Supervised Learning. It shows that, for the simplest tasks, the agent can finish the stage without further Reinforcement Learning training.

The Evaluation Result Of Supervised Learning

The following graph shows the results of training by the Reinforcement Learning method. After a certain number of episodes, the agent starts to clear the stage with a high probability of over 90%.

The Reinforcement Learning Result

The agent using the trained model from Reinforcement Learning is also able to clear the stage smoothly.

The Evaluation Result Of the Reinforcement Learning

Training result — Emerald Hill Zone.Act 2

The agent can clear Act 1 easily once it learns the dash action properly. For that reason, there was no need to add more networks to the model, which is composed of a CNN + LSTM as in the Minecraft TreeChop case.
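For reference, a CNN + LSTM policy/value model of that kind can be sketched roughly as follows; the layer sizes are illustrative guesses, not the author's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn_lstm_model(num_actions=23):
    """Illustrative CNN + LSTM policy/value network (layer sizes are guesses)."""
    screens = layers.Input(shape=(None, 84, 84, 3))  # (time, height, width, channels)
    x = layers.TimeDistributed(layers.Conv2D(32, 8, strides=4, activation='relu'))(screens)
    x = layers.TimeDistributed(layers.Conv2D(64, 4, strides=2, activation='relu'))(x)
    x = layers.TimeDistributed(layers.Conv2D(64, 3, strides=1, activation='relu'))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    x = layers.LSTM(512, return_sequences=True)(x)
    policy_logits = layers.Dense(num_actions)(x)  # one logit per discrete action
    value = layers.Dense(1)(x)                    # state-value estimate
    return tf.keras.Model(inputs=screens, outputs=[policy_logits, value])
```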

However, that method did not work well in Act 2 because a boss appears at the end of the episode. After a long period of testing, it was confirmed that an additional network is needed to clear this level because of Sonic's long jump action. To perform the long jump, the jump key must be held down continuously for a certain amount of time.

Various jump actions of Sonic (from the author's playing)

However, the current network cannot learn how long the same action needs to be held, because it only uses the game screen as the state.

An additional problem is that training is very slow: it took almost 7 days to learn a 10,000-step episode. As confirmed in another test, this problem occurred when the CNN was used directly as the feature extractor.

To solve the two problems mentioned above, two networks are added to the existing model. The first is a CVAE network that learns game-screen features separately from the policy network. The other is a CNN that takes the history of previous actions as an additional input state.

Code for making the action history
Model architecture for using the action history and CVAE

The network code is available from here.
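As a rough illustration of the action-history idea (not the author's exact implementation, which is linked above), the last few actions can be kept as one-hot vectors in a fixed-length buffer and fed to the model alongside the screen; the window length below is an arbitrary choice.

```python
from collections import deque
import numpy as np

NUM_ACTIONS = 23     # size of the discrete action set found in the expert data
HISTORY_LENGTH = 16  # illustrative window size; the real value may differ

class ActionHistory:
    """Keep the last HISTORY_LENGTH actions as one-hot vectors."""

    def __init__(self):
        self.buffer = deque(maxlen=HISTORY_LENGTH)
        self.reset()

    def reset(self):
        self.buffer.clear()
        for _ in range(HISTORY_LENGTH):
            self.buffer.append(np.zeros(NUM_ACTIONS, dtype=np.float32))

    def push(self, action_id):
        one_hot = np.zeros(NUM_ACTIONS, dtype=np.float32)
        one_hot[action_id] = 1.0
        self.buffer.append(one_hot)

    def as_state(self):
        # Shape (HISTORY_LENGTH, NUM_ACTIONS): fed to the additional CNN branch.
        return np.stack(self.buffer, axis=0)
```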

The results of reconstructing the game screen with the learned CVAE network are shown below. It seems that most of the game-screen information could be learned.

Reconstructed image from CVAE
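For reference, a generic VAE-style loss combining a pixel reconstruction term with a KL term looks like the sketch below; the author's CVAE may weight or condition these terms differently.

```python
import tensorflow as tf

def vae_loss(reconstruction, frame, z_mean, z_log_var):
    """Generic VAE-style loss: pixel reconstruction error plus a KL term.

    Only a sketch; the exact reconstruction term and KL weighting used by
    the author's CVAE may differ."""
    recon = tf.reduce_mean(
        tf.reduce_sum(tf.square(frame - reconstruction), axis=[1, 2, 3]))
    kl = -0.5 * tf.reduce_mean(
        tf.reduce_sum(1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
    return recon + kl
```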

Below are the training loss curves. It was confirmed that both the policy loss and the CVAE loss decreased as training progressed.

Training result when using the action history as input

Evaluating with the weights of the saved model, the agent can defeat the boss well even without the help of Reinforcement Learning.

Evaluation video when using the action history as input

Additionally, you can check the effect of using the action history as a training input by comparing it with the case trained without the action history.

Effect of using the action history as input for training loss
Evaluation video of not using the action history as input

As in the Act 1 training result, the agent can still learn the dash action, but it cannot close in on the boss the way the action-history agent does.

Conclusion

In this post, we covered the basic methods of collecting data and training an agent to clear the stages of the Sonic game with Deep Learning. It was confirmed that, unlike in games such as Minecraft, using the action history as part of the state is essential for performing Sonic's complex actions.

In the case of the Sonic game, more precise control is required as the stages progress. In the next post, we will check whether ChemicalPlantZone can be cleared using the existing method.

Written by Dohyeong Kim

I am a Deep Learning researcher. Currently, I am trying to make an AI agent for various situations such as MOBA, RTS, and Soccer games.
