# 2D Missile Dodger
Attempt at Training a 2D Spaceship to Dodge Waves of Missiles with Curriculum-Based Reinforcement Learning

## Introduction

I recently became interested in experimenting with machine learning, so I decided this would be a great way to get my hands dirty. I went with a dodging game where the goal is to survive for a certain amount of time while waves of missiles auto-track your ship and try to destroy it.

## Setup

After the initial setup of the game mechanics, I integrated the ML-Agents framework into the project. The action space is three simple discrete choices: nothing, rotate left, and rotate right. The nothing action was given a small reward to discourage jitteriness. For state I went with nine continuous observations: the normalized rotation of the ship and of the closest in-range missile, the distance between them in 2D space, and the academy parameters.
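A minimal sketch of how this action and observation setup might look with the current ML-Agents API (the class, field, and reward values here are illustrative assumptions, not the project's actual code, and the API has changed since this was written):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class DodgerAgent : Agent
{
    [SerializeField] float _rotationSpeed = 180f; // degrees/sec, hypothetical tuning value
    Transform _closestMissile;                    // set elsewhere by a nearest-missile search

    public override void CollectObservations(VectorSensor sensor)
    {
        // Normalized ship and missile rotation plus their distance, as described above.
        sensor.AddObservation(transform.eulerAngles.z / 360f);
        if (_closestMissile != null)
        {
            sensor.AddObservation(_closestMissile.eulerAngles.z / 360f);
            sensor.AddObservation(Vector2.Distance(transform.position, _closestMissile.position));
        }
        else
        {
            sensor.AddObservation(0f);
            sensor.AddObservation(0f);
        }
        // ...plus the academy (curriculum) parameters to fill out the nine observations.
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        switch (actions.DiscreteActions[0])
        {
            case 0: AddReward(0.005f); break; // small "do nothing" bonus against jitter (value assumed)
            case 1: transform.Rotate(0f, 0f, _rotationSpeed * Time.deltaTime); break;  // rotate left
            case 2: transform.Rotate(0f, 0f, -_rotationSpeed * Time.deltaTime); break; // rotate right
        }
    }
}
```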
There are four avenues for reward. The two main stimuli are survival and destruction: +10 for reaching the goal survival time (the hardest outcome to achieve, and it's best to go for positive reinforcement there) and -1 for being hit by a missile. The other two were meant to be more subtle, nudging the spaceship toward the right dodging mentality. I gave it an increasing reward for staying alive without being tracked:
```csharp
reward = Mathf.Clamp(0.01f * _timeSinceLastTracked, 0.01f, 0.04f);
```
And then an increasing punishment based on proximity to the ship and length of tracking:
```csharp
reward = Mathf.Clamp(-0.03f * (1f / _distanceFromTargetSqr) * _timeTracking, -0.03f, -0.01f);
```
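Taken together, the two shaping terms might be applied once per agent step, something like this (a sketch only; `ApplyShapingRewards` and the branching condition are my assumptions, while the clamp expressions and field names come from the snippets above):

```csharp
// Sketch: apply the two shaping terms once per agent step.
void ApplyShapingRewards()
{
    if (_timeTracking > 0f)
    {
        // Growing penalty the longer and the closer a missile has been tracking
        // the ship, clamped to [-0.03, -0.01] per step.
        AddReward(Mathf.Clamp(-0.03f * (1f / _distanceFromTargetSqr) * _timeTracking, -0.03f, -0.01f));
    }
    else
    {
        // Growing bonus for staying alive untracked, clamped to [0.01, 0.04] per step.
        AddReward(Mathf.Clamp(0.01f * _timeSinceLastTracked, 0.01f, 0.04f));
    }
}
```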

## Curriculum Training

The curriculum plan is based on four parameters: the length of the goal timer, the number of missiles spawned per wave, their max speed, and the amount of fuel a missile burns before self-destructing and spawning a new wave. Most were incremented at each lesson; speed was the exception, held back to accommodate the much-increased complexity of the larger missile counts.
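In the ML-Agents versions of that era, such a plan was expressed as a curriculum JSON file; a hedged sketch of the shape (the parameter names and every value below are invented for illustration, not the project's actual curriculum):

```json
{
  "measure": "reward",
  "thresholds": [3.0, 3.0, 3.0],
  "min_lesson_length": 100,
  "signal_smoothing": true,
  "parameters": {
    "goal_time": [10.0, 15.0, 20.0, 30.0],
    "missiles_per_wave": [1.0, 2.0, 4.0, 6.0],
    "missile_max_speed": [4.0, 4.0, 3.5, 3.5],
    "missile_fuel": [4.0, 5.0, 6.0, 7.0]
  }
}
```

Each parameter gets one value per lesson (one more than the number of thresholds), and the trainer advances a lesson when the measured reward crosses the next threshold.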
Hyperparameters were kept mostly at the PPO defaults, aside from a few tweaks here and there.
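For reference, a PPO section in that era's `trainer_config.yaml` looked roughly like the stock defaults below (shown only as a baseline; the brain name is hypothetical and my tweaks aren't recorded here):

```yaml
DodgerBrain:
  trainer: ppo
  batch_size: 1024
  buffer_size: 10240
  beta: 5.0e-3
  epsilon: 0.2
  gamma: 0.99
  lambd: 0.95
  learning_rate: 3.0e-4
  hidden_units: 128
  num_layers: 2
  num_epoch: 3
  time_horizon: 64
```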
After many iterations, this is the result of around one hour of curriculum training:
You can see the initial bump due to the easiness of the first couple of lessons, then the dip once the missiles start increasing in number. What is interesting is the gradual climb afterwards as the agent begins to adapt, until it hits the next lesson.
The rest of the TensorBoard stats after one hour of training:

## Conclusion

Although the agent manages some impressive dodges on its own, it still feels very random at times and can be just straight-up suicidal. I unfortunately didn't have the time to figure out the best approach to get it to survive any swarm I throw at it. A lot of getting the training right still seems to be guesswork combined with patience: I'm not sure whether the problems lie in the state, in finding better rewards, in some other esoteric issue, or in a combination of all three. One problem with my approach, I think, is that I decided to constantly search for the nearest missile in range. Besides the performance hit this adds, it seems impossible for the ship to survive a massive swarm when it can't see the positions of other inbound missiles that haven't yet come into closer proximity. However, it doesn't currently seem possible to give your state a dynamic size to accommodate this.
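The nearest-missile search mentioned above might look something like this (a hypothetical helper, not the project's code; note it scans every live missile each step, which is where the performance hit comes from):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical O(n) scan for the closest missile within sensing range.
Transform FindClosestMissile(Vector2 shipPos, IReadOnlyList<Transform> missiles, float range)
{
    Transform closest = null;
    float bestSqr = range * range; // compare squared distances to avoid a sqrt per missile
    foreach (var missile in missiles)
    {
        float dSqr = ((Vector2)missile.position - shipPos).sqrMagnitude;
        if (dSqr < bestSqr)
        {
            bestSqr = dSqr;
            closest = missile;
        }
    }
    return closest; // null when no missile is in range
}
```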

## Future improvements/changes

• Goal area instead of time.
• Variety of missile types and waves.
• Spaceship health levels, better flight mechanics.
• Special effects (time dilation on successful dodge, screen shake on hit).
• Fine-tuned hyperparameters, rewards and state.

## Lessons learned

• Watch out for time-dependent functionality when messing with the timescale. Things that work at a normal timescale might not work the same at a timescale of 100.
• Better to ramp up quickly in curriculum lessons; it lets you know sooner when things aren't working correctly.
• Do an initial pass with inference to make sure it is working as intended. Do not just depend on the TensorBoard stats.
• AI will find avenues to gain rewards unconventionally. Thinking that it will behave a certain way based on your initial ideas of punishment and reward will lead to folly and tears.
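The timescale pitfall in the first lesson above usually comes from timing code that ignores Unity's scaled clock; a sketch of the safe versus unsafe pattern (the field name is illustrative):

```csharp
void Update()
{
    // Safe: Time.deltaTime is scaled by Time.timeScale, so a timer accumulated
    // this way behaves the same whether training runs at timescale 1 or 100.
    _timeTracking += Time.deltaTime;

    // Unsafe: unscaled or wall-clock time diverges from game time at high
    // timescales, e.g.
    // _timeTracking += Time.unscaledDeltaTime;
}
```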
Ultimately I had a great time working on this and look forward to what the future will bring with the combination of ML and game development.
Link to GitHub repo:
https://github.com/paulmg/Unity-ML-Challenge

Paul Graffam
Senior Interactive Developer - Programmer