A Practical Example of Unity ML-Agents Usage in a Pong Game
I wanted to work on a small but practical example of Unity ML-Agents usage in a game, since training agents on a computer with weak processing power takes a lot of time, which I was short on. So I decided to create a Pong game that could use the trained model as a rival player and provide a computer-versus-player experience.
You can find the repository of the code and the playable game below.
Training and Results
First, for simplicity and as a proof of concept, the agent (the paddle) observed only the normalized y-axis positions of the paddle and the ball. In each training episode the ball is thrown at a random angle toward the right side of the scene. If it hits the paddle, the episode ends with reward = 1; if it goes out of the play area, the episode ends with reward = -1. It was that simple. The paddle had 3 actions to take: go up, go down, or stay still. This approach proved to work quickly, after about 15e4 frames of training.
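The initial setup described above can be sketched as follows. This is an illustrative Python sketch, not the project's actual Unity C# code, and the half-height used for normalization is an assumed placeholder value:

```python
# Minimal sketch of the first proof-of-concept setup:
# 2 observations, 3 discrete actions, a terminal reward of +1/-1.

HALF_HEIGHT = 5.0  # assumed half-height of the play area, for normalization

def collect_observations(paddle_y, ball_y):
    """Return the 2-element state: normalized y of paddle and ball."""
    return [paddle_y / HALF_HEIGHT, ball_y / HALF_HEIGHT]

def end_of_episode_reward(ball_hit_paddle):
    """+1 if the ball hits the paddle, -1 if it leaves the play area."""
    return 1.0 if ball_hit_paddle else -1.0

ACTIONS = ["up", "down", "stay"]  # the 3 discrete actions

print(collect_observations(2.5, -5.0))  # [0.5, -1.0]
```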
One downside of this simple approach was that the paddle followed the ball wherever it went, which created the look and feel of a scripted, unbeatable opponent. Additionally, the paddle was very shaky, even after a 1e6-frame training run.
Before beginning the new training runs, I started using curriculum learning to reach better results in a shorter time. The only variable I changed was the size of the paddle; the curriculum had 3 lessons that scale the paddle down from 10 units to 4 units.
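The lesson schedule can be sketched like this. The linear interpolation is my own assumption about how the scale steps down; the post only states the endpoints (10 units down to 4 units over 3 lessons):

```python
# Hypothetical sketch of the 3-lesson curriculum: the paddle is
# scaled down from 10 units to 4 units as lessons advance.

def paddle_scale(lesson, n_lessons=3, start=10.0, end=4.0):
    """Linearly interpolate the paddle size for a 0-indexed lesson."""
    t = lesson / (n_lessons - 1)
    return start + (end - start) * t

print([paddle_scale(l) for l in range(3)])  # [10.0, 7.0, 4.0]
```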
To reduce these problems, several steps had to be taken. First, to enable the agent to figure out when the ball is moving toward it and when toward the other side of the scene, the direction of the ball was added to the collected states. Second, to let the agent know how close the ball is and adjust its speed accordingly, the x-axis position of the ball was also added to the collected states. Last, to solve the shakiness problem and to support the first two changes, staying still was encouraged with a small reward of 0.005.
These assumptions were purely hypothetical yet proved to work to some extent. The first thing I had to change was decreasing the stay-still reward from 0.005 to 0.001, because the larger value taught the agent that not moving at all is more profitable than moving. Then I tried different combinations of the 3 newly added states; the best result came from using all 3 of them. The paddle started to reduce its speed as the ball moved away, and the shakiness was largely cured.
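A back-of-the-envelope check shows why 0.005 per frame was too generous: over a long episode, the accumulated stay-still reward can rival the +1 hit reward, so the agent learns to freeze. The episode length here is an assumed example value, not measured from the project:

```python
# Compare accumulated idle reward against the +1 hit reward.

def total_idle_reward(per_frame, frames):
    """Total reward for standing still for a whole episode."""
    return round(per_frame * frames, 3)

FRAMES = 300  # assumed episode length in frames

print(total_idle_reward(0.005, FRAMES))  # 1.5 > 1.0: idling beats hitting
print(total_idle_reward(0.001, FRAMES))  # 0.3 < 1.0: hitting still dominates
```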
I expected that training longer might solve the reduced-accuracy problem and went for a 1e7-frame training run. It proved that if the model is flawed, longer training does not make the agent do better. So I decided to examine Unity's own ML-Agents examples some more and improve the outcome.
One thing I noticed was that, in the examples, the changing variables of the curriculum were also saved as states. I added one more variable to the curriculum, the ball speed, and added both the ball speed and the paddle scale to the states, for a total of 7 states. I also increased the number of lessons and reduced the change in the values per lesson, which helped me avoid sudden drops in the cumulative reward and let the agent adapt slowly. This new model was trained for 2e6 frames and produced the best outcome among all the training results I had gotten so far.
Finally, here is how I finished the project:
2e6-frame training with the default PPO hyperparameter values
7 states: paddle size [0, 1], ball speed [0, 1], ball direction x [-1, 1], ball direction y [-1, 1], paddle position y [-1, 1], ball position x [-1, 1], ball position y [-1, 1]
+1 reward if the ball hits the paddle, -1 reward if the ball goes out of the play area, 0.001 reward if the paddle chooses to stay still
6 lessons in the curriculum, with 2 changing variables: paddle scale and ball speed
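The final setup above can be sketched as follows. Again this is Python for illustration, not the project's C#, and the normalization bounds (maximum scale and speed, play-area half-sizes) are assumed placeholder values:

```python
# Sketch of the final setup: the 7-element state vector with the
# ranges listed above, plus the three reward rules.

MAX_SCALE, MAX_SPEED = 10.0, 20.0  # assumed normalization bounds
HALF_W, HALF_H = 9.0, 5.0          # assumed play-area half-sizes

def final_observations(paddle_scale, ball_speed, ball_dir,
                       paddle_y, ball_pos):
    """7 states: paddle size, ball speed, ball direction (x, y),
    paddle y, ball position (x, y)."""
    return [
        paddle_scale / MAX_SCALE,  # [0, 1]
        ball_speed / MAX_SPEED,    # [0, 1]
        ball_dir[0],               # [-1, 1], unit direction
        ball_dir[1],               # [-1, 1]
        paddle_y / HALF_H,         # [-1, 1]
        ball_pos[0] / HALF_W,      # [-1, 1]
        ball_pos[1] / HALF_H,      # [-1, 1]
    ]

def step_reward(hit=False, out=False, stayed_still=False):
    """+1 on a hit, -1 on a miss, 0.001 for choosing to stay still."""
    if hit:
        return 1.0
    if out:
        return -1.0
    return 0.001 if stayed_still else 0.0

obs = final_observations(5.0, 10.0, (0.6, -0.8), 2.5, (4.5, -2.5))
print(len(obs))  # 7
```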
To turn on the play mode of the game, simply mark the isTraining flag as true and press play. You can use W and S to move up and down, or alternatively the up and down arrow keys. The AI stays on the left side of the scene.
You can find further instructions on how to play the game in the repository.