At first, the curriculum file was not defined as shown above. The first max distance was set to 1.1, equal to the min distance, and the threshold was set to 10.
After a while I realized that this might over-fit to the spacing between boxes, so I changed the first max distance to 1.2 and the threshold to 5.
Even after a long time training, the lesson did not switch. I searched the GitHub issues and found that the academy needs to be set to done, or max steps must not be 0, for the lesson to advance. I then set the academy to done on game over.
A long training process followed...
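To illustrate why the lesson never switched, here is a toy model of the behavior described above. It is only a sketch and not the ML-Agents implementation: the class `ToyCurriculum`, the function `train`, and the "strictly greater than the threshold" rule are all assumptions made for illustration. The point is simply that if the lesson check runs only when the academy is done, an academy that never ends will never advance the lesson, no matter how high the reward is.

```python
class ToyCurriculum:
    """Hypothetical stand-in for a curriculum: one threshold per lesson switch."""

    def __init__(self, thresholds):
        self.thresholds = thresholds
        self.lesson = 0

    def on_academy_done(self, mean_reward):
        # Assumption: the lesson is only re-evaluated when the academy is done.
        has_next = self.lesson < len(self.thresholds)
        if has_next and mean_reward > self.thresholds[self.lesson]:
            self.lesson += 1
        return self.lesson


def train(curriculum, mean_rewards, academy_done_on_game_over):
    """Feed a sequence of mean rewards; return the lesson reached at the end."""
    for reward in mean_rewards:
        if academy_done_on_game_over:
            curriculum.on_academy_done(reward)
        # Otherwise the academy never ends, so the lesson is never checked.
    return curriculum.lesson
```

With a threshold of 5, `train(ToyCurriculum([5]), [6, 7, 8], False)` stays on lesson 0, while the same call with `True` moves to lesson 1.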
Brief Training Process - Model#2
The model above did not perform well, so I started thinking about how to simplify the model to make it easier to learn.
I made the following changes to speed up training:
Restructure the scene so that multiple games run at the same time
Reduce the state size to 2: the distance from the player to the next box, and the size of the next box
Remove the random direction during training. Since the input is now a distance, the direction doesn't matter
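The reduced state can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `build_state`, the use of 2D ground-plane coordinates, and measuring the distance between centers are all assumptions. It shows why the direction no longer matters: two boxes at the same distance but in opposite directions produce the same state.

```python
import math


def build_state(player_xz, next_box_xz, next_box_size):
    """Return the 2-value state: [distance to next box, size of next box].

    Positions are (x, z) pairs on the ground plane (an assumption here).
    """
    dx = next_box_xz[0] - player_xz[0]
    dz = next_box_xz[1] - player_xz[1]
    distance = math.hypot(dx, dz)
    return [distance, next_box_size]
```

For example, `build_state((0, 0), (3, 4), 1.0)` and `build_state((0, 0), (-3, -4), 1.0)` both give a distance of 5.0, so the agent sees identical inputs for both directions.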
In this model, the reward is +0.5 when the player jumps to the next box, +1 when the player jumps to the center of the next box, and -1 when the player jumps to the ground.
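As a rough sketch, the reward scheme could be written as a lookup table. The names `Landing` and `jump_reward` are hypothetical. Two things are assumptions not stated in the text: that the +1 for the center replaces the +0.5 rather than adding to it, and that landing back on the same box gives a reward of 0.

```python
from enum import Enum


class Landing(Enum):
    NEXT_BOX_CENTER = "next_box_center"
    NEXT_BOX = "next_box"
    SAME_BOX = "same_box"
    GROUND = "ground"


# Reward values taken from the text above.
REWARDS = {
    Landing.NEXT_BOX_CENTER: 1.0,
    Landing.NEXT_BOX: 0.5,
    Landing.GROUND: -1.0,
}


def jump_reward(landing):
    """Return the reward for one jump; unlisted outcomes get 0 (assumption)."""
    return REWARDS.get(landing, 0.0)
```

Under these assumptions, a jump that stays on the same box is never punished, which is worth keeping in mind when looking at what the agent learns to do.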
After 3 hours of training, the results were quite amazing, as the gif above shows. However, there was a problem: the players tried to make very small jumps to stay on the same box and avoid the -1 penalty for jumping to the ground.
Brief Training Process - Model#3
The small-jump problem came from a wrong correspondence between states and rewards. Frame to Skip was previously set to 30, which was not enough: the player could still be in the air 30 frames later, so the reward for a jump could arrive after the next decision had already been made.
I tried unsubscribing from and resubscribing to the brain, and this now works with multiple game instances running at the same time.
Finally, after fixing the bugs in the game, I found the Frame to Skip value that worked best, and here is the result.
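The timing issue can be made concrete with a little arithmetic. This is a simplified sketch, assuming decisions are taken every `frame_skip` frames and that a jump started at a decision point lands `flight_frames` later. Both function names and the example flight time of 45 frames are hypothetical, since the text does not give the actual flight time.

```python
import math


def decisions_until_reward(flight_frames, frame_skip):
    """Number of decision intervals that pass before the landing is observed."""
    if flight_frames <= 0 or frame_skip <= 0:
        raise ValueError("flight_frames and frame_skip must be positive")
    return math.ceil(flight_frames / frame_skip)


def reward_is_aligned(flight_frames, frame_skip):
    """True if the landing is seen at the very next decision after the jump."""
    return decisions_until_reward(flight_frames, frame_skip) == 1
```

With a hypothetical 45-frame flight, `reward_is_aligned(45, 30)` is `False` because the player is still in the air at the next decision, while `reward_is_aligned(45, 60)` is `True`.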
The agent's max steps value is set to 100, and the maximum cumulative_reward is 100. The mean reward reached 70 after 200K steps.
Step: 349000. Mean Reward: 88.70098911968347. Std of Reward: 10.67634076995865.
Step: 350000. Mean Reward: 89.22838773491591. Std of Reward: 9.210647049058082.
Step: 351000. Mean Reward: 89.44072978303747. Std of Reward: 12.138629221972357.
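To follow progress over a long run, log lines in the format shown above can be parsed with a small helper. This is only a sketch: `Summary` and `parse_summary` are hypothetical names, and the pattern assumes every line follows exactly the `Step: ... Mean Reward: ... Std of Reward: ...` layout seen here.

```python
import re
from typing import NamedTuple, Optional


class Summary(NamedTuple):
    step: int
    mean_reward: float
    std_reward: float


# Matches lines like "Step: 349000. Mean Reward: 88.7. Std of Reward: 10.6."
_PATTERN = re.compile(
    r"Step: (\d+)\. "
    r"Mean Reward: (-?\d+(?:\.\d+)?)\. "
    r"Std of Reward: (-?\d+(?:\.\d+)?)\.?"
)


def parse_summary(line: str) -> Optional[Summary]:
    """Parse one training log line; return None if it does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    step, mean, std = match.groups()
    return Summary(int(step), float(mean), float(std))
```

Lines that do not match, such as blank lines or other log output, return `None` and can simply be skipped.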
Thanks to Unity for the opportunity to take on a challenge like this.
The projects here are all excellent, and I learned a lot about Unity and machine learning.