Shooter
Published 9 months ago
My objective was to train an ML-Agent to hit targets that were spawned at random positions in the field.
I was not able to get the results I was expecting. However, it was a very pleasant experience, and it made me really want to learn more about Machine Learning.

The Agent had 3 types of actions:
1 - Rotate around the X axis
2 - Rotate around the Y axis
3 - Shoot
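
As a rough illustration of how these three actions could be applied each step, here is a minimal Python sketch. It assumes continuous actions in [-1, 1] and a positive third value meaning "fire". The post does not say whether the actions were continuous or discrete, and `Weapon`, `apply_action` and `MAX_TURN_DEG` are hypothetical names, not taken from the project.

```python
from dataclasses import dataclass

MAX_TURN_DEG = 2.0  # hypothetical: largest rotation applied in a single step


@dataclass
class Weapon:
    pitch: float = 0.0    # rotation around the X axis, in degrees
    yaw: float = 0.0      # rotation around the Y axis, in degrees
    shots_fired: int = 0


def clamp(value, low, high):
    """Keep value inside [low, high]."""
    return max(low, min(high, value))


def apply_action(weapon, action):
    """Apply one (rotate_x, rotate_y, shoot) action vector to the weapon."""
    rotate_x, rotate_y, shoot = action
    weapon.pitch += clamp(rotate_x, -1.0, 1.0) * MAX_TURN_DEG
    weapon.yaw += clamp(rotate_y, -1.0, 1.0) * MAX_TURN_DEG
    if shoot > 0.0:  # assumed convention: a positive value means "fire"
        weapon.shots_fired += 1
    return weapon


weapon = apply_action(Weapon(), [0.5, -1.0, 0.3])
print(weapon)
```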

I tried many different ways of capturing the state of the game, and in the end I used a total of 70 normalized states.
The states included the rotation of the weapon, whether a shot had been fired, and information about the target and the bullets that had been shot.
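
To make "normalized" concrete, the sketch below shows one common way to scale such values into [-1, 1] before handing them to the Agent. This is an assumption about how the normalization might look, not the code used in the project. The function names are hypothetical, and the ranges are simply the widest spawn bounds that appear in the curriculum file.

```python
def normalize(value, low, high):
    """Map value from [low, high] to [-1, 1], clamping out-of-range input."""
    if high == low:
        return 0.0
    scaled = 2.0 * (value - low) / (high - low) - 1.0
    return max(-1.0, min(1.0, scaled))


def normalize_angle(degrees):
    """Map any angle in degrees to [-1, 1), so that 270 and -90 give the same value."""
    wrapped = (degrees + 180.0) % 360.0 - 180.0
    return wrapped / 180.0


# Assumed ranges: the widest X, Y and Z spawn bounds from the curriculum file.
X_RANGE, Y_RANGE, Z_RANGE = (-5.0, 5.0), (0.0, 5.0), (3.5, 20.0)


def target_observation(x, y, z):
    """Normalized position of one target."""
    return [normalize(x, *X_RANGE), normalize(y, *Y_RANGE), normalize(z, *Z_RANGE)]


# Weapon angle, a "shot was fired" flag, then the position of one target.
obs = [normalize_angle(270.0), 1.0] + target_observation(0.0, 5.0, 3.5)
print(obs)
```

Keeping every value in the same small range stops a large raw number, such as a Z distance of 20, from dominating values like a 0/1 flag.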

I experimented a lot with the hyperparameters, and I also had to experiment a lot with how to design the reward.

For the hyperparameters, I am not knowledgeable enough in the area to say what I could have done better, but for the rewards I think I can make some guesses.

My initial idea was to give a penalty for each shot that missed and a reward for each target destroyed. However, I think the time between the action of shooting and the bullet hitting or missing a target was too long, so the Agent was not able to correlate the action with the reward or penalty, and it was taking too long to learn anything.
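
One way to picture the effect of that delay, assuming a learner that uses discounted returns with a discount factor `gamma`, is to look at how much of a reward is still credited to an action taken some steps earlier. The value 0.99 below is only a typical choice, not necessarily the one used in this project, and `credit_for_delayed_reward` is a hypothetical helper.

```python
def credit_for_delayed_reward(reward, gamma, delay_steps):
    """Discounted value of a reward as seen from an action taken delay_steps earlier."""
    return reward * gamma ** delay_steps


GAMMA = 0.99  # assumed discount factor

for delay in (1, 50, 200):
    credit = credit_for_delayed_reward(1.0, GAMMA, delay)
    print(f"reward arrives {delay:3d} steps after the shot: credit {credit:.3f}")
```

The longer the bullet is in flight, the weaker the signal that reaches the shooting action, and every other action taken during the flight shares in it as well.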

So I decided to apply the penalty at the moment the shot was fired and to give a larger reward for hitting a target. But that led the Agent to conclude that it was better not to shoot, since in the beginning it misses most of its shots.

As I wanted it to at least start hitting the targets, I decided to remove the penalty for shooting, give a very small penalty each step, and end the episode when it reached 20 points. That was the scenario I used for training.
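
The three reward schemes described above can be compared side by side in a small Python sketch. All reward magnitudes are hypothetical, since the post does not give the real values, and `StepEvents` and the `reward_v*` functions are illustrative names only. "20 points" is assumed here to be an accumulated score.

```python
from dataclasses import dataclass


@dataclass
class StepEvents:
    """What happened during one step of the simulation."""
    shots_fired: int = 0
    targets_hit: int = 0
    shots_missed: int = 0  # bullets that expired without hitting a target


def reward_v1(e):
    """First idea: penalty when a bullet misses, reward when a target is destroyed."""
    return 1.0 * e.targets_hit - 0.1 * e.shots_missed


def reward_v2(e):
    """Second idea: penalty at the moment of shooting, larger reward per hit."""
    return 2.0 * e.targets_hit - 0.1 * e.shots_fired


def reward_v3(e):
    """Last idea: no shooting penalty, only a very small penalty every step."""
    return 1.0 * e.targets_hit - 0.001


def episode_done(score, goal=20):
    """End the episode once the accumulated score reaches the goal."""
    return score >= goal


idle = StepEvents()
wild_shot = StepEvents(shots_fired=1)
print("v2 idle vs. shooting:", reward_v2(idle), reward_v2(wild_shot))
print("v3 idle vs. shooting:", reward_v3(idle), reward_v3(wild_shot))
```

Under the second scheme an untrained Agent that shoots is always worse off than one that does nothing, which matches the "better not to shoot" behaviour. Under the third scheme shooting costs nothing, and doing nothing slowly accumulates a penalty.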

I used Curriculum Learning to achieve these results. I started with a fixed position for the targets and introduced target movement in later lessons, but my Agents were not able to reach the lessons where the targets start to move.
This was my curriculum file:
{
  "measure": "reward",
  "thresholds": [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
  "min_lesson_length": 10,
  "signal_smoothing": true,
  "parameters": {
    "MaxTargetsInScene": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 5.0, 7.0, 10.0],
    "SpawningRandomXMin": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, -2.5, -3.5, -5.0, 0.0, 0.0, 0.0, 0.0, -5.0, -2.0, -3.0, -4.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0],
    "SpawningRandomXMax": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.5, 3.5, 5.0, 0.0, 0.0, 0.0, 0.0, 5.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "SpawningRandomYMin": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "SpawningRandomYMax": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.5, 3.5, 5.0, 5.0, 1.0, 2.0, 3.5, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "SpawningRandomZMin": [3.5, 4.0, 4.5, 5.0, 3.5, 3.5, 3.5, 3.5, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "SpawningRandomZMax": [3.5, 4.0, 4.5, 5.0, 5.0, 6.5, 8.5, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 7.0, 10.0, 15.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0],
    "SpawningRandomSpeedXMin": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0],
    "SpawningRandomSpeedXMax": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "SpawningRandomSpeedYMin": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0],
    "SpawningRandomSpeedYMax": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "SpawningRandomSpeedZMin": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0],
    "SpawningRandomSpeedZMax": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "MaxShootsAlive": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 50.0, 40.0, 30.0, 20.0, 20.0, 20.0, 20.0, 20.0]
  }
}
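
A file like this is easy to get wrong by one entry, so a small check can help before training. The sketch below uses a shortened, hypothetical three-lesson curriculum with two of the parameter names from the file above. It assumes, as I understand this curriculum format, that each parameter list needs exactly one more value than `thresholds`, and that a lesson advances once the reward measure passes the current threshold after at least `min_lesson_length` episodes. The helper names and the exact comparison are assumptions, not ML-Agents API.

```python
import json

# Shortened, hypothetical curriculum: 3 lessons, therefore 2 thresholds.
CURRICULUM = json.loads("""
{
  "measure": "reward",
  "thresholds": [10, 10],
  "min_lesson_length": 10,
  "signal_smoothing": true,
  "parameters": {
    "MaxTargetsInScene": [1.0, 1.0, 3.0],
    "SpawningRandomZMax": [3.5, 10.0, 20.0]
  }
}
""")


def lesson_count(curriculum):
    """Return the number of lessons, raising if any parameter list has the wrong length."""
    lessons = len(curriculum["thresholds"]) + 1
    for name, values in curriculum["parameters"].items():
        if len(values) != lessons:
            raise ValueError(f"{name}: expected {lessons} values, got {len(values)}")
    return lessons


def lesson_parameters(curriculum, lesson):
    """Pick the value of every parameter for the given lesson index."""
    return {name: values[lesson] for name, values in curriculum["parameters"].items()}


def next_lesson(curriculum, lesson, mean_reward, episodes_in_lesson):
    """Assumed rule: advance when the reward passes the threshold after enough episodes."""
    thresholds = curriculum["thresholds"]
    if lesson >= len(thresholds):
        return lesson  # already in the last lesson
    if episodes_in_lesson < curriculum["min_lesson_length"]:
        return lesson
    return lesson + 1 if mean_reward > thresholds[lesson] else lesson


print(lesson_count(CURRICULUM))
lesson = next_lesson(CURRICULUM, 0, mean_reward=12.0, episodes_in_lesson=10)
print(lesson, lesson_parameters(CURRICULUM, lesson))
```

Running `lesson_count` on the real file would confirm that all parameter lists line up with the thresholds before a long training run is started.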
This is the final result

Guilherme
Programmer