I downloaded the free Space Shooter application from the Asset Store and modified it so that I could run multiple instances of the game in one scene. I imported the ML-AgentsWithPlugin package, then started experimenting with the Python TensorFlow toolkit and curriculum learning.
The first video shows the results with one brain, trained with the PPO Jupyter notebook. The notebook's maximum number of steps is set to 150,000, and the agents run 5,000 steps before the academy evaluates the results. The curriculum consists of 5 lessons, each with one more target than the previous lesson: the first lesson starts with two targets, and the last lesson adds an enemy that fires at the player.
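A curriculum like the one described above can be expressed in the JSON-style format that older ML-Agents versions read; the sketch below shows it as a Python dict. The reset-parameter names (`num_targets`, `enemy_fires`) and the threshold values are illustrative assumptions, not values taken from the actual project.

```python
# Sketch of an ML-Agents-style curriculum definition. Parameter names
# and thresholds are assumptions made for illustration.
curriculum = {
    "measure": "reward",                  # promote lessons based on mean reward
    "thresholds": [300, 500, 700, 900],   # 4 thresholds -> 5 lessons
    "min_lesson_length": 2,
    "signal_smoothing": True,
    "parameters": {
        # Lesson 0 starts with two targets; each lesson adds one more,
        # and the last lesson also enables an enemy that fires back.
        "num_targets": [2, 3, 4, 5, 6],
        "enemy_fires": [0, 0, 0, 0, 1],
    },
}

def lesson_for(reward_measure, thresholds):
    """Return the lesson index implied by how many reward
    thresholds the running measure has crossed."""
    lesson = 0
    for t in thresholds:
        if reward_measure >= t:
            lesson += 1
    return lesson

print(lesson_for(650, curriculum["thresholds"]))  # -> 2 (third lesson)
```

Promotion works by counting crossed thresholds, so four thresholds produce five lessons, matching the five-lesson curriculum described above.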
The reward algorithm rewards the player for firing at targets and for hitting them. It also encourages the player to stay near the center of the game's boundaries: the reward decreases as the player moves away from the center. Finally, the player is penalized when it is killed.
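A per-step reward in that spirit might look like the sketch below. The project's actual reward shaping lives in the Unity C# agent; this Python version and all of its coefficients are made-up assumptions, shown only to illustrate the three components (firing/hitting, centering, death penalty).

```python
import math

def step_reward(fired, hits, pos_x, pos_y, half_width, half_height, died):
    """Illustrative per-step reward: small bonus for firing, larger
    bonus per hit, a centering bonus that decays toward the edges,
    and a penalty on death. All coefficients are assumptions."""
    reward = 0.0
    if fired:
        reward += 0.01          # reward for firing at targets
    reward += 0.5 * hits        # larger reward per target hit
    # Centering bonus: maximal at the center of the play area,
    # falling to zero at the boundary.
    dist = math.hypot(pos_x / half_width, pos_y / half_height)
    reward += 0.05 * max(0.0, 1.0 - dist)
    if died:
        reward -= 1.0           # penalty for being killed
    return reward
```

Keeping the centering bonus small relative to the hit reward prevents the agent from simply parking in the middle instead of engaging targets.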
Notice in the first video that the score reaches approximately 1200 points after 5000 steps.
The second video shows the results with two brains, trained with the same PPO Jupyter notebook. The second brain in this experiment is a Heuristic brain attached to the player; it augments the weapons system.
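A Heuristic brain is just hand-written decision logic rather than a trained policy. As a toy sketch of a weapons-system heuristic, the rule below fires whenever any target is roughly aligned with the player; the alignment rule and tolerance are assumptions, not the project's actual logic.

```python
def heuristic_fire(player_x, targets, aim_tolerance=0.5):
    """Toy heuristic 'brain' for the weapons system: fire whenever any
    target is roughly aligned with the player on the x axis.
    `targets` is a list of (x, y) positions; the tolerance and the
    alignment rule are illustrative assumptions."""
    return any(abs(t_x - player_x) <= aim_tolerance for t_x, t_y in targets)
```

Because this brain fires from the first step without any training, it explains why the second experiment starts from a higher reward than the first.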
Notice in the second video that the score reaches approximately 2300 points after 5000 steps.
The image of the first set of TensorBoard graphs was taken after the first experiment, with readings taken every 1,000 steps. The values appear jagged because each new wave placed the enemies randomly, which disrupted the brain's policy; as you can see, it recovers. The same dip occurred after every lesson, when additional enemies were added to the wave.
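Jagged curves like these are easier to read with TensorBoard's smoothing slider, which applies an exponential moving average. The sketch below reproduces that smoothing for a list of scalar readings.

```python
def smooth(values, weight=0.6):
    """Exponential moving average in the style of TensorBoard's
    smoothing slider; `weight` in [0, 1) controls how much the
    previous smoothed value dominates each new reading."""
    smoothed, last = [], values[0]
    for v in values:
        last = last * weight + (1 - weight) * v
        smoothed.append(last)
    return smoothed
```

Higher weights flatten the dips caused by random enemy placement at the cost of lagging behind the real recovery.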
The image of the second set of TensorBoard graphs was taken after the second experiment and shows similarly jagged lines. The value graph starts at a higher initial reward because of the additional brain's weapon system. The randomness of the attack waves nonetheless disrupts the policy, and the brain took some time to reestablish control over each wave.
The third image superimposes the first two images.
You can see the project in the GitHub repository: https://github.com/rmgalante/Unity-ML-Agents-Challenge.