Updated a year ago
Kivi (Platform Runner with ML-Agents)
This is a small pet project that aims to teach an agent to complete a track and pick up collectibles. (The plan also included obstacles and a vs. Player mode, but these could not be implemented due to lack of time.)
The following video shows the trained agents completing the empty track.
(It might look confusing, but it is the block that moves, not the track):


  • Teach the agent to stay on the platform.
  • Introduce and train Collectible Collection.
  • Introduce and train Obstacle Avoidance. (Not achieved)
  • Have a human player compete against it. (Not achieved)


The Brain receives three pieces of state information:
  1. X-coordinate of the Agent.
  2. X-coordinate of the Collectible.
  3. Z-coordinate of the Collectible.
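
As a rough illustration of the state described above, the observation can be thought of as a flat list of three floats. This is a minimal Python sketch, not the project's actual code: `Vec3` and `collect_observations` are hypothetical names introduced here, and the sample positions are made up.

```python
from typing import List, NamedTuple


class Vec3(NamedTuple):
    """Hypothetical stand-in for a 3D world position."""
    x: float
    y: float
    z: float


def collect_observations(agent_pos: Vec3, collectible_pos: Vec3) -> List[float]:
    """Build the three-value state vector listed above (assumed ordering)."""
    return [
        agent_pos.x,        # 1. X-coordinate of the Agent
        collectible_pos.x,  # 2. X-coordinate of the Collectible
        collectible_pos.z,  # 3. Z-coordinate of the Collectible
    ]


# Example with made-up positions: agent slightly right of centre,
# collectible to the left and 12 units ahead.
obs = collect_observations(Vec3(0.5, 0.0, 0.0), Vec3(-1.0, 0.0, 12.0))
```

Note that the Agent's own Z-coordinate is not part of the state; only the Collectible's position along the track is observed.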


The Agent was initially allowed two possible actions, moveLeft and moveRight, but I later realized that doing nothing was also a valid move, so I added a third action, doNothing. This introduced a new problem: the agent would stay still most of the time and ignore Collectibles (seeing them as a risk). To tackle this, I introduced a small punishment for spamming doNothing.
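
The three-action scheme with a small idle penalty could look roughly like the sketch below. It is written in Python for illustration only; the `Action` enum, `apply_action`, the step size and the penalty value are all assumptions, not values taken from the project.

```python
from enum import IntEnum
from typing import Tuple


class Action(IntEnum):
    """Discrete action space: the two original moves plus doNothing."""
    DO_NOTHING = 0
    MOVE_LEFT = 1
    MOVE_RIGHT = 2


STEP_SIZE = 0.25       # assumed sideways movement per decision
IDLE_PENALTY = -0.125  # assumed small punishment for choosing doNothing


def apply_action(agent_x: float, action: Action) -> Tuple[float, float]:
    """Return (new_x, reward_delta) for a single decision step."""
    if action == Action.MOVE_LEFT:
        return agent_x - STEP_SIZE, 0.0
    if action == Action.MOVE_RIGHT:
        return agent_x + STEP_SIZE, 0.0
    # doNothing keeps the position but costs a little, so the agent
    # does not learn to sit still for the whole episode.
    return agent_x, IDLE_PENALTY
```

The penalty has to stay small relative to the other rewards; otherwise the agent may be pushed into moving constantly even when staying put is the right choice.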

Reward Condition

The Agent is rewarded under the following conditions:
  • Agent reaches the end of the track.
  • Agent collects a Collectible.
  • Agent stays on the track.

Punishment Condition

The Agent is punished under the following conditions:
  • Agent falls off the background.
  • Agent misses a Collectible (the punishment increases with each consecutive miss).
  • Agent plays it safe (takes no action).
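
The reward and punishment conditions above can be summarised in a small bookkeeping class. This is a hedged Python sketch: the class name `RewardTracker`, every numeric value, and the choice to scale the miss penalty linearly with the number of consecutive misses are assumptions made for illustration. The source only states that the punishment grows with each consecutive miss.

```python
from dataclasses import dataclass


@dataclass
class RewardTracker:
    # All magnitudes below are assumed, not taken from the project.
    finish_reward: float = 1.0     # Agent reaches the end of the track
    collect_reward: float = 0.5    # Agent collects a Collectible
    on_track_reward: float = 0.0625  # Agent stays on the track (per step)
    fall_penalty: float = -1.0     # Agent falls off
    miss_penalty: float = -0.25    # base penalty for a missed Collectible
    consecutive_misses: int = 0

    def on_collect(self) -> float:
        """Collecting resets the miss streak and gives a reward."""
        self.consecutive_misses = 0
        return self.collect_reward

    def on_miss(self) -> float:
        """Each consecutive miss is punished harder (assumed linear growth)."""
        self.consecutive_misses += 1
        return self.miss_penalty * self.consecutive_misses

    def on_step(self, on_track: bool) -> float:
        """Per-step signal: small reward on the track, penalty for falling."""
        return self.on_track_reward if on_track else self.fall_penalty

    def on_finish(self) -> float:
        return self.finish_reward
```

For example, two misses in a row cost -0.25 and then -0.5 under these assumed values, and the streak returns to zero as soon as a Collectible is picked up.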


  • The model was first trained and tested with no Collectibles to ensure feasibility.
  • Hyperparameters were adjusted to improve performance.
  • A curriculum was used to first teach the agent the simplest task, that is, to stay on and finish the track. This was followed by the introduction of Collectibles. Later curriculum stages increased the length of the track to be covered along with the frequency of Collectibles.
  • The training was performed on a GTX 1080.
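
The curriculum described above (empty track first, then Collectibles, then longer tracks with more Collectibles) can be pictured as an ordered list of lessons. The Python sketch below is illustrative only: the `Lesson` fields, the concrete numbers, and the rule of advancing once the mean reward passes a threshold are assumptions, and it does not reproduce the ML-Agents curriculum configuration format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Lesson:
    track_length: float           # assumed units
    collectible_frequency: float  # assumed: Collectibles per unit of track
    min_mean_reward: float        # assumed threshold to move on


# Hypothetical lesson plan following the order described in the list above.
LESSONS = (
    Lesson(track_length=50.0, collectible_frequency=0.0, min_mean_reward=0.75),
    Lesson(track_length=50.0, collectible_frequency=0.125, min_mean_reward=0.75),
    Lesson(track_length=100.0, collectible_frequency=0.25, min_mean_reward=0.75),
)


def next_lesson_index(current: int, mean_reward: float,
                      lessons: Sequence[Lesson] = LESSONS) -> int:
    """Advance one lesson when the threshold is met; stay on the last one."""
    is_last = current >= len(lessons) - 1
    if not is_last and mean_reward >= lessons[current].min_mean_reward:
        return current + 1
    return current
```

Starting with a Collectible frequency of zero mirrors the first training step, where the agent only has to stay on and finish the track.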


The following is the TensorBoard output of the final curriculum training.


The Agent learns to stay on the track pretty well, but has trouble understanding Collectibles. There is definitely a lot of room for improvement, and the project has high potential to become a fun game to play. A better reward function or more finely tuned hyperparameters may help in training. I'll continue development on this in the time to come. Here is the final video of the Agent playing.
GitHub Repo:
Prabhav Bhatnagar
Student AI Researcher / Aspiring Game Dev - Student