Playing MOBA game using Deep Reinforcement Learning — part 2
In the last post, we learned how to train an agent for a simple MOBA game using Deep Reinforcement Learning. In this post, I am going to explain what we need to know before applying the same method to Dota2.
Code for this post can be found here: https://github.com/kimbring2/MOBA_RL/blob/main/dota2/env_test.py
You just need to run the DotaService and that code together on the same PC.
Training Environment
Unlike Derk training, each headless Dota2 environment requires more than 1GB of RAM. Because DRL training usually works better with many environments, it is better to dedicate a separate PC to running only the environments. Of course, that PC does not need a GPU because the training code does not run there.
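To plan the environment PC, a rough memory budget helps. The snippet below is only a back-of-the-envelope sketch: the function name and the example numbers (32 GB of RAM, 4 GB reserved for the OS, 1.5 GB per environment) are my assumptions, not measured values.

```python
def max_parallel_envs(total_ram_gb: float, reserved_gb: float, per_env_gb: float) -> int:
    """Return how many environments fit in RAM after reserving some for the OS."""
    if per_env_gb <= 0:
        raise ValueError("per_env_gb must be positive")
    usable_gb = total_ram_gb - reserved_gb
    # Floor division: a partially fitting environment does not count.
    return max(0, int(usable_gb // per_env_gb))


# Hypothetical machine: 32 GB RAM, 4 GB kept for the OS, 1.5 GB per environment.
print(max_parallel_envs(32, 4, 1.5))  # 18
```

Measure the real memory use of one headless environment on your own machine before trusting such a number.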
The PC that runs Seed RL does need a GPU because we are going to use TensorFlow there. The DotaService and Seed RL can communicate over a socket if both PCs are behind the same router and we know the IP address of the DotaService PC.
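Before starting a long training run, it can be useful to check from the training PC that the environment PC is reachable at all. The helper below is a minimal sketch using only the Python standard library; `can_reach` is a name I made up, and the host and port you pass in are whatever your own DotaService is listening on.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, `can_reach("192.168.0.10", 13337)` would test a hypothetical LAN address and port. This only confirms that the port is open; it does not verify that the service behind it is the DotaService.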
Network Structure
The first difference between Dota2 and Derk is that each hero has its own unique properties, such as abilities and stats. This means that we need to write different code for each hero.
Shadow Fiend, for example, has a total of 4 non-passive abilities. Therefore, its action network for abilities has 4 outputs.
On the other hand, the network for Omniknight needs only 3 outputs.
The observation network does not need to change because the observation is the same for every hero.
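The per-hero output size can be sketched as follows. This is not the network from the repository; it is a toy linear head in plain Python that only shows how the number of outputs follows the number of non-passive abilities. The hero keys, the `AbilityHead` class, and the hidden size of 8 are assumptions for illustration.

```python
import math
import random

# Number of non-passive abilities per hero, taken from the text above.
NUM_ACTIVE_ABILITIES = {"shadow_fiend": 4, "omniknight": 3}


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class AbilityHead:
    """Toy linear layer: shared hidden vector -> one probability per ability."""

    def __init__(self, hero, hidden_size, seed=0):
        rng = random.Random(seed)
        self.num_outputs = NUM_ACTIVE_ABILITIES[hero]
        # One weight row per castable ability of this hero.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(self.num_outputs)
        ]

    def __call__(self, hidden):
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.weights]
        return softmax(logits)


hidden = [0.5] * 8  # stand-in for the output of the shared observation network
sf_probs = AbilityHead("shadow_fiend", hidden_size=8)(hidden)
omni_probs = AbilityHead("omniknight", hidden_size=8)(hidden)
print(len(sf_probs), len(omni_probs))  # 4 3
```

Only the last layer changes between heroes here, which matches the idea that the observation part of the network can be shared.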
Managing Items and Abilities
The second difference is that a hero in Dota2 can choose various items and abilities during the game.
Furthermore, each item has a different target type and activation method.
For example, the Tango is the most basic item and can be purchased at the store at the start of the game. A hero can use it on a nearby tree to regenerate health.
The hero can purchase and use Tango items as in the video below.
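One way to handle the different target types in code is a small lookup table that is checked before an action is sent. The sketch below is a simplified illustration: the `TargetType` enum, the item keys, and the dictionary format of the action are all hypothetical and are not the DotaService message format.

```python
from enum import Enum


class TargetType(Enum):
    NONE = "none"          # used immediately, no target
    TREE = "tree"          # used on a tree
    POSITION = "position"  # used on a map location


# Simplified, illustrative table; real items can accept more target kinds.
ITEM_TARGETS = {
    "tango": TargetType.TREE,
    "magic_wand": TargetType.NONE,
    "town_portal_scroll": TargetType.POSITION,
}


def make_item_action(item, target=None):
    """Build a use-item action and check that the target matches the item."""
    target_type = ITEM_TARGETS[item]
    if target_type is TargetType.NONE:
        if target is not None:
            raise ValueError(f"{item} does not take a target")
        return {"item": item, "target_type": "none"}
    if target is None:
        raise ValueError(f"{item} needs a {target_type.value} target")
    return {"item": item, "target_type": target_type.value, "target": target}


# Use a Tango on a hypothetical tree with id 17.
print(make_item_action("tango", target=17))
```

Keeping this check outside the network means the agent only has to pick an item and a target, while invalid combinations are rejected early.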
Each ability also has a different target type and activation method.
Shadowraze is the basic ability of the Shadow Fiend hero. It does not require a target. Instead, it strikes at a fixed distance in front of the hero, so the hero should cast it according to the distance to the enemy.
It is best to use that ability when an enemy hero or creep is within its range, as in the video below.
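The range check for Shadowraze can be sketched as simple geometry. In the sketch below I assume three cast distances (200, 450, and 700 units) and a blast radius of 250 units; these numbers are assumptions that may differ between game patches, and `pick_shadowraze` is a hypothetical helper, not a game API.

```python
import math

# Assumed values; check them against the current game version.
RAZE_DISTANCES = {"near": 200.0, "medium": 450.0, "far": 700.0}
RAZE_RADIUS = 250.0


def pick_shadowraze(hero_xy, facing_deg, enemy_xy):
    """Return the raze whose blast center is closest to the enemy, or None."""
    facing = math.radians(facing_deg)
    best_name, best_miss = None, None
    for name, distance in RAZE_DISTANCES.items():
        # Blast center lies `distance` units in front of the hero.
        center_x = hero_xy[0] + distance * math.cos(facing)
        center_y = hero_xy[1] + distance * math.sin(facing)
        miss = math.hypot(enemy_xy[0] - center_x, enemy_xy[1] - center_y)
        if miss <= RAZE_RADIUS and (best_miss is None or miss < best_miss):
            best_name, best_miss = name, miss
    return best_name


# Hero at the origin facing along +x, enemy 460 units straight ahead.
print(pick_shadowraze((0.0, 0.0), 0.0, (460.0, 0.0)))  # medium
```

If the function returns `None`, no raze would hit, so the hero should move or turn first instead of casting.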
A Dota2 hero can upgrade low-level items into a higher-level one by using the recipe system.
The video below shows how to obtain the Magic Wand from its recipe.
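An agent needs to know which components are still missing before an upgrade can happen. The sketch below checks an inventory against a recipe table. The component list for the Magic Wand (one Magic Stick, two Iron Branches, one recipe) and the item keys are assumptions for illustration; the real recipe can change between patches.

```python
from collections import Counter

# Assumed recipe table; item keys are hypothetical names.
RECIPES = {
    "magic_wand": Counter(
        {"magic_stick": 1, "branches": 2, "recipe_magic_wand": 1}
    ),
}


def can_combine(inventory, result):
    """Return True if the inventory holds every component of the recipe."""
    have = Counter(inventory)
    return all(have[item] >= count for item, count in RECIPES[result].items())


def missing_components(inventory, result):
    """Return the components (with counts) that still have to be bought."""
    # Counter subtraction keeps only the positive remainders.
    return dict(RECIPES[result] - Counter(inventory))


inventory = ["magic_stick", "branches"]
print(can_combine(inventory, "magic_wand"))  # False
print(missing_components(inventory, "magic_wand"))
```

The result of `missing_components` can be used directly as a shopping list for the next purchase actions.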
Unlike the Derk game, where the map is small, the distance between the starting point and the battle point in Dota2 is long.
Therefore, the hero should use the Town Portal Scroll to join and leave a battle quickly. The hero receives one TP scroll at the beginning of the game; after using it, more must be purchased from the store.
The video below shows how to return quickly from the battle point to the starting point using the TP scroll.
The Town Portal Scroll is usually used to escape from an emergency situation. To buy items quickly without walking back to the starting point every time, the hero can use the Courier to deliver them.
If you buy an item while the hero isn't near the store, it is stored in the stash. The Courier can either retrieve the item from the stash or buy the item at the secret shop instead, and then give it to the hero.
The video below shows how a hero at the battle point obtains an item through the Courier without moving to the starting point.
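The stash and Courier flow described above can be written as two small decisions. This is only a sketch of the logic: the function names, the step names, and the `shop_radius` parameter are hypothetical and do not correspond to actual DotaService actions.

```python
def purchase_destination(hero_distance_to_shop, shop_radius):
    """Return where a newly bought item ends up."""
    # Near the shop the item goes straight to the hero, otherwise to the stash.
    return "inventory" if hero_distance_to_shop <= shop_radius else "stash"


def courier_plan(stash_items):
    """Return the ordered Courier steps needed to deliver the stash items."""
    if not stash_items:
        return []  # nothing to deliver, leave the Courier where it is
    return ["take_stash_items", "transfer_items_to_hero", "return_to_base"]


# Hero is far from the shop, so the item lands in the stash first.
destination = purchase_destination(hero_distance_to_shop=5000.0, shop_radius=600.0)
print(destination, courier_plan(["magic_stick"]))
```

In a real agent, each step name would be mapped to the corresponding Courier action of the environment.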
In MOBA games, some heroes are mainly in charge of attacking, and other heroes with support abilities assist them.
For example, the Omniknight hero has an ability that restores the HP of a teammate so that the teammate can keep fighting.
The following video shows an example of restoring the HP of a teammate hero.
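For a support ability, the main question is which teammate to target. A simple scripted baseline, sketched below, picks the ally with the lowest health fraction inside the cast range. The `Ally` class, the `cast_range` value, and the 60% health threshold are my assumptions for illustration; a trained agent could learn this choice instead.

```python
import math
from dataclasses import dataclass


@dataclass
class Ally:
    name: str
    x: float
    y: float
    health: float
    health_max: float


def pick_heal_target(caster_xy, allies, cast_range, hp_threshold=0.6):
    """Return the living ally in range with the lowest health fraction, or None."""
    candidates = [
        ally
        for ally in allies
        if ally.health > 0
        and math.hypot(ally.x - caster_xy[0], ally.y - caster_xy[1]) <= cast_range
        and ally.health / ally.health_max < hp_threshold
    ]
    return min(candidates, key=lambda a: a.health / a.health_max, default=None)


allies = [
    Ally("ally_a", 100.0, 0.0, 300.0, 1000.0),   # in range, 30% health
    Ally("ally_b", 200.0, 0.0, 500.0, 1000.0),   # in range, 50% health
    Ally("ally_c", 3000.0, 0.0, 100.0, 1000.0),  # lowest health but too far away
]
target = pick_heal_target((0.0, 0.0), allies, cast_range=500.0)
print(target.name)  # ally_a
```

Returning `None` when nobody needs healing keeps the ability off cooldown for when it matters.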
Conclusion
In this post, we saw how to use the functions related to the items and abilities of Dota2. In the next post, I will explain how to use those functions together with the Deep Reinforcement Learning from the previous post.