Here's my entry to the Unity ML-Agents Challenge. It's an attempt to auto-solve a very simplified version of Buttons!!, a puzzle game I published some years ago on the Apple App Store. As I'm a mere hobbyist, I lacked the time to update the project when it was required, so it is currently not available for iOS 11. It may still run on older iOS versions, though.
I plan to relaunch it sometime this year or next, using some of the graphics from this challenge.
You can see the trained agent in action here:
The goal of the game is to move the round button with the orange circle, the actor (Button Nr 3), onto the orange highlighted target platform (the target).
To move things around there are two types of buttons: turn-buttons (Button Nr 1) and blank-buttons (the aforementioned actor, Button Nr 3, and a non-orange helper, Button Nr 2).
If a turn-button is pressed, all adjacent, unpressed buttons move around the turn-button clockwise; a pressed button is jumped over.
If a blank-button is pressed, all adjacent, pressed buttons are 'unpressed'.
To press a button, hit the corresponding number on the keypad.
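Since the Unity implementation isn't shown here, a minimal Python sketch of the turn-button rule might look like this (it assumes a square grid with eight neighbour slots; the names, the geometry, and the clockwise ordering are all illustrative assumptions, not the game's actual code):

```python
from dataclasses import dataclass

@dataclass
class Button:
    name: str
    pressed: bool = False

# The eight neighbour offsets of a turn-button, in clockwise order
# (an assumption -- the real game may use a different geometry).
CLOCKWISE = [(-1, 0), (-1, 1), (0, 1), (1, 1),
             (1, 0), (1, -1), (0, -1), (-1, -1)]

def press_turn_button(board, tx, ty):
    """Rotate all adjacent, unpressed buttons one slot clockwise around
    the turn-button at (tx, ty). Slots holding pressed buttons stay
    fixed and are jumped over. `board` maps (x, y) -> Button."""
    ring = [(tx + dx, ty + dy) for dx, dy in CLOCKWISE]
    # slots with pressed buttons do not take part in the rotation
    free = [p for p in ring if not (p in board and board[p].pressed)]
    occupants = [board.get(p) for p in free]
    # shift every occupant one participating slot clockwise
    rotated = occupants[-1:] + occupants[:-1]
    for pos, btn in zip(free, rotated):
        if btn is None:
            board.pop(pos, None)
        else:
            board[pos] = btn
```

So an unpressed button directly above the turn-button would move one slot clockwise, and if that slot held a pressed button it would jump straight past it.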
The plan for training the ML-Agent is to start with a very simple board where the actor and target aren't too far apart; the higher the lesson, the further apart the buttons are placed.
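In the ML-Agents versions of that era, such a lesson progression was described by a curriculum JSON file. A sketch along these lines (the parameter name, thresholds, and lesson count are made up for illustration, and the exact schema depends on the ML-Agents version) would be:

```json
{
    "measure": "reward",
    "thresholds": [0.8, 0.8],
    "min_lesson_length": 100,
    "signal_smoothing": true,
    "parameters": {
        "spawn_distance": [1, 2, 3]
    }
}
```

Each time the smoothed reward crosses a threshold, the trainer advances to the next lesson and the hypothetical `spawn_distance` parameter grows.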
Here you can see the training graphs for a simple board with 3 lessons: the graphs look like a textbook example of curriculum learning :)
Findings, Thoughts and Questions
Learning seemed to stop at around 3'000'000 steps, no matter how many lessons I introduced. Can anyone tell me whether that's a vanishing-gradient problem?
And if so, whether it can be overcome with a different learning-rate schedule?
My intuition would also be to at least partly reset the learning rate with every lesson - but maybe that would break everything?
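That idea could be prototyped with a schedule along these lines (a sketch with made-up constants, not ML-Agents' actual scheduler):

```python
def lr_with_lesson_reset(step, lesson_start, base_lr=3e-4,
                         decay_steps=3_000_000, reset_peak=0.5):
    """Linear decay, but each new lesson restarts a second decay from a
    reduced peak, so a fresh lesson can revive an almost-dead rate.
    All constants here are illustrative."""
    # the usual global linear decay
    global_lr = base_lr * max(0.0, 1.0 - step / decay_steps)
    # a decay restarted at the current lesson, from half the base rate
    local_lr = (reset_peak * base_lr
                * max(0.0, 1.0 - (step - lesson_start) / decay_steps))
    # whichever is higher wins
    return max(global_lr, local_lr)
```

Early in training the global schedule dominates; once it has decayed away, each lesson boundary bumps the rate back up to half its initial value.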
The reward for keeping Button Nr 3 close to Button Nr 2 should not be based on their distance on the board, but on the number of hops between them (as they can only move clockwise). I will improve this in the future.
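Because the buttons only ever rotate clockwise, that hop distance is directional. A minimal sketch (assuming an eight-slot ring and a made-up penalty scale, which may not match the real board) would be:

```python
def hop_distance(i, j, ring_size=8):
    """Clockwise-only hops needed to get from slot i to slot j on a
    ring of ring_size slots. Deliberately asymmetric: going 'back'
    one slot costs a full lap minus one."""
    return (j - i) % ring_size

def proximity_reward(actor_slot, helper_slot, scale=0.01):
    """Small shaping penalty proportional to the hop distance between
    the actor and the helper (the scale is an illustrative constant)."""
    return -scale * hop_distance(actor_slot, helper_slot)
```

Two slots that are neighbours in board distance can still be seven hops apart, which is exactly the asymmetry a Euclidean reward misses.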
I wasn't really sure whether increasing memory would help, or what it does exactly. I couldn't find any documentation about it, nor any examples that use it. Does anyone have an idea about memory and memory size? Is it comparable to the sequence length in LSTMs and GRUs, or am I confusing things here?
Would my whole setup benefit from having parallel agents learning like in the balancing example?
Where to go from here
I intended to increase the board size (set lessonNr to 4 in the Academy to get the idea) and thus create even bigger distances between the actor and the target. However, as you can see in the graphs below, everything went pear-shaped when the new board was introduced. One theory I have is that the second turn-button was never used in the earlier training, so the weights for it were completely neglected; and as the learning rate continually decreases, it was no longer high enough to pick it up in later stages. If you have other ideas, or see anything else in my setup that might cause this, please let me know.
The original puzzle (and the one I plan to relaunch) had many more levels, bigger and more complicated boards, and more types of buttons with different actions. Seeing how the ML-Agent struggles with my little board here, I wonder whether my process is fundamentally flawed. Does anyone have an idea what I could improve to get past these baby steps?
To try it out
Follow the instructions on the GitHub page and let me know if anything is broken. I had to fiddle with some of the bigger files due to GitHub's size restrictions.
And let me know if you have any thoughts on my musings above.