1st Prize
Pass the Butter // Pancake bot
Updated 3 years ago
What is my purpose?

Update: Added the Pass the Butter robot (see the bottom of the article).

My goal is to train a robotic arm to make pancakes. As a first test, I used curriculum learning to get the arm to toss a pancake onto a plate. My motivation for the project is my complete lack of cooking ability.
Initially, I connected three rigidbody limbs with configurable joints. All components used gravity, and the machine learning system could apply torque to each joint.
The first reward system was simple: a small reward was given for every frame of the session, and the session ended when the pancake hit the floor. I thought this would incentivize the algorithm to keep the pancake in the pan as long as possible.
What it actually did was fling the pancake as far as it possibly could, maximizing its time in the air. Although keeping the pancake in the pan would have earned more total reward, the agent seemed to have gotten stuck in this local optimum. Score: PancakeBot 1, Me 0.
I revised my reward function to give points based on how close the pancake is to the center of the pan. A small reward was still given for each frame the pancake remained in play, and the session still ended when the pancake hit the floor.
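The two reward schemes above can be written as per-frame functions. This is a minimal Python sketch (the project itself uses C# scripts in Unity); the function names, the per-frame value, the pan radius, and the unweighted linear centering term are all assumptions, since the post gives no numbers.

```python
import math

# Hypothetical constants: the post does not give these values.
SURVIVAL_REWARD = 0.01   # assumed "small reward" per frame
PAN_RADIUS = 0.15        # assumed pan radius in meters


def reward_v1(pancake_on_floor) -> tuple:
    """First attempt: pay a little every frame until the pancake hits the floor.

    Returns (reward, episode_done).
    """
    if pancake_on_floor:
        return 0.0, True
    return SURVIVAL_REWARD, False


def reward_v2(pancake_pos, pan_center, pancake_on_floor) -> tuple:
    """Second attempt: add a term that grows as the pancake nears the pan's center.

    Positions are 3D tuples, so a pancake flung high or far gets no centering reward.
    """
    if pancake_on_floor:
        return 0.0, True
    dist = math.dist(pancake_pos, pan_center)
    centering = max(0.0, 1.0 - dist / PAN_RADIUS)  # 1 at the center, 0 at the rim or beyond
    return SURVIVAL_REWARD + centering, False
```

Under `reward_v1` a pancake in mid-air earns exactly as much per frame as one resting in the pan, which is the loophole the first agent exploited; `reward_v2` only pays the extra term while the pancake is close to the pan's center.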
This time PancakeBot did a great job of keeping the pancake in the pan until the end of the session! The movement was hilariously noisy, but it improved after tuning the physics settings (increasing friction, angular drag, etc.).
To get PancakeBot to perform more complex tasks, I realized I had to design its actions more like those of a real robotic arm. I made the limbs kinematic and applied rotations directly to each joint.
Side note: for anyone implementing a system like this, I found that I had to use Rigidbody.MoveRotation in FixedUpdate. When I rotated the limbs by setting their transforms directly, the pan teleported around and the pancake fell straight through the colliders. See BreakfastAgentRotate.cs on GitHub.
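The switch from torques to kinematic rotations can be illustrated with a small joint-update step. In this hypothetical Python sketch, each action in [-1, 1] nudges one joint's angle by at most `MAX_DEG_PER_STEP` degrees per physics step, clamped to per-joint limits; the step size, the limits, and the `Joint`/`step_joints` names are assumptions. In Unity, the target rotation built from each new angle would be what gets passed to Rigidbody.MoveRotation inside FixedUpdate.

```python
from dataclasses import dataclass

MAX_DEG_PER_STEP = 2.0  # hypothetical cap on rotation per fixed physics step


@dataclass
class Joint:
    angle: float      # current angle in degrees
    min_angle: float  # assumed lower joint limit
    max_angle: float  # assumed upper joint limit


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def step_joints(joints, actions):
    """Apply one fixed step: each action in [-1, 1] rotates its joint a bounded amount.

    Returns the new target angles, one per joint.
    """
    if len(joints) != len(actions):
        raise ValueError("need exactly one action per joint")
    targets = []
    for joint, action in zip(joints, actions):
        action = clamp(action, -1.0, 1.0)  # ignore out-of-range policy outputs
        joint.angle = clamp(joint.angle + action * MAX_DEG_PER_STEP,
                            joint.min_angle, joint.max_angle)
        targets.append(joint.angle)
    return targets


arm = [Joint(0.0, -90.0, 90.0), Joint(89.0, -90.0, 90.0), Joint(0.0, -10.0, 10.0)]
print(step_joints(arm, [0.5, 1.0, -3.0]))  # -> [1.0, 90.0, -2.0]
```

Bounding the per-step change keeps each rotation small, which is the same reason incremental MoveRotation calls behave better than teleporting the limbs.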
I was getting good results training the bot to land the pancake on a stationary target (the plate), but had difficulty hitting a moving target. The error was higher with a moving target, but curriculum learning made it possible to generalize to different target locations. First, the robotic arm learns to land the pancake on a stationary target. Once the reward exceeds 60, the target is placed at a random location each session, within a small distance of the initial target (0.4 meters in x and z). Once the reward reaches 60 again, that range is increased to 0.8 meters, and then to 1.2 meters.
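The lesson schedule above (fixed target, then 0.4, 0.8 and 1.2 m, advancing at a reward of 60) can be sketched as a tiny state machine. This Python version is illustrative only: uniform sampling of ±range in x and z and the `TargetCurriculum` name are assumptions. ML-Agents can also run lesson-based curricula with reward thresholds from its trainer configuration, though the config format differs between versions.

```python
import random

LESSON_RANGES = [0.0, 0.4, 0.8, 1.2]  # meters, from the post
REWARD_THRESHOLD = 60.0               # from the post


class TargetCurriculum:
    """Hypothetical helper that widens the plate's spawn area as the agent improves."""

    def __init__(self, base_xz):
        self.base_xz = base_xz  # (x, z) of the initial, stationary target
        self.lesson = 0

    def report_reward(self, mean_reward):
        """Advance one lesson once the mean reward clears the threshold."""
        if mean_reward > REWARD_THRESHOLD and self.lesson < len(LESSON_RANGES) - 1:
            self.lesson += 1

    def sample_target(self, rng=random):
        """Place the plate within +/- range of the initial target in x and z (assumed uniform)."""
        r = LESSON_RANGES[self.lesson]
        x, z = self.base_xz
        return (x + rng.uniform(-r, r), z + rng.uniform(-r, r))
```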
Next up, making another robot whose purpose is to pass the butter:

Pass the butter

The butter robot starts at one end of the counter and must travel to the plate. The bot can only swivel around the y-axis and move forward (it reverses if the action is negative). Obstacles are randomly placed on the counter, and the placement function ensures that they are not too close together, so they never completely block the robot’s path.
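One common way to enforce the spacing rule described above is rejection sampling: draw random positions and keep only those far enough from every obstacle already placed. This Python sketch assumes a rectangular counter, point-like obstacles, and a single `min_gap` value; the post doesn't say how its placement function works, and `place_obstacles` is a hypothetical name.

```python
import math
import random


def place_obstacles(n, x_range, z_range, min_gap, max_tries=1000, rng=random):
    """Rejection-sample n (x, z) positions that are all at least min_gap apart.

    For the path to stay open, min_gap should exceed the robot's width plus the
    obstacle size; that relationship is an assumption of this sketch.
    """
    placed = []
    tries = 0
    while len(placed) < n:
        if tries >= max_tries:
            raise RuntimeError("could not place all obstacles; reduce n or min_gap")
        tries += 1
        p = (rng.uniform(*x_range), rng.uniform(*z_range))
        if all(math.dist(p, q) >= min_gap for q in placed):
            placed.append(p)
    return placed
```

The `max_tries` cap keeps a crowded counter from looping forever; an impossible layout fails loudly instead.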
The ML model knows the position of the plate, but not the positions of any of the obstacles. Instead, it uses 9 raycasts originating from the “camera” on its head; each returns 1 if it hits an obstacle and 0 otherwise. Each raycast extends less than a meter from the robot, which keeps its knowledge of the scene fairly limited.
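The 9-ray observation can be reproduced in a simplified top-down (2D) model. In this Python sketch obstacles are circles, the rays fan evenly across an assumed 120° field of view, and the ray length is an assumed 0.9 m ("less than a meter" in the post); the function names are hypothetical. Each ray contributes 1.0 on a hit and 0.0 otherwise, matching the encoding described above.

```python
import math

NUM_RAYS = 9
FOV_DEG = 120.0   # assumed spread of the ray fan
RAY_LENGTH = 0.9  # assumed; the post only says "less than a meter"


def ray_hits_circle(origin, direction, center, radius, max_len):
    """Does the segment origin + t*direction, 0 <= t <= max_len, touch the circle?

    direction must be a unit vector.
    """
    fx, fz = origin[0] - center[0], origin[1] - center[1]
    b = fx * direction[0] + fz * direction[1]
    c = fx * fx + fz * fz - radius * radius
    disc = b * b - c
    if disc < 0:
        return False  # the infinite line misses the circle entirely
    root = math.sqrt(disc)
    t_near, t_far = -b - root, -b + root
    return t_far >= 0 and t_near <= max_len  # intersection overlaps [0, max_len]


def ray_observations(pos, heading_rad, obstacles):
    """Return NUM_RAYS binary readings for obstacles given as ((x, z), radius)."""
    obs = []
    for i in range(NUM_RAYS):
        offset = math.radians(-FOV_DEG / 2 + i * FOV_DEG / (NUM_RAYS - 1))
        angle = heading_rad + offset
        d = (math.cos(angle), math.sin(angle))
        hit = any(ray_hits_circle(pos, d, c, r, RAY_LENGTH) for c, r in obstacles)
        obs.append(1.0 if hit else 0.0)
    return obs
```

With an obstacle of radius 0.1 m half a meter straight ahead, only the middle ray fires; the same obstacle two meters away is invisible, which shows how short rays limit what the agent knows.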

Reward function:

  • +2 if the butter reaches the plate (and the session ends)
  • -2 if the bot hits the floor (and the session ends)
  • -0.05 for every frame (to encourage the bot to reach the goal quickly)
  • -0.15 if the bot hits an obstacle
  • +0.075 × a proximity term that grows as the bot gets closer to the goal (to help guide it toward the larger reward of reaching the plate)
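The reward list above can be combined into a single per-frame function. This Python sketch uses the post's constants; the shape of the proximity term (linear falloff over an assumed `max_dist` of 3 m), the choice to return only ±2 on terminal frames, and the function name are assumptions.

```python
import math


def butter_step_reward(bot_xz, plate_xz, reached_plate, fell_off, hit_obstacle,
                       max_dist=3.0):
    """Per-frame reward for the butter bot; returns (reward, episode_done).

    max_dist (assumed counter length) normalizes the proximity term to [0, 1].
    """
    if reached_plate:
        return 2.0, True    # success ends the session
    if fell_off:
        return -2.0, True   # hitting the floor ends the session
    reward = -0.05          # time penalty every frame
    if hit_obstacle:
        reward -= 0.15
    dist = math.dist(bot_xz, plate_xz)
    proximity = max(0.0, 1.0 - dist / max_dist)  # assumed shape of the "how close" term
    reward += 0.075 * proximity
    return reward, False
```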
I had originally set this last term (the reward for getting close to the goal) too high. The robot would get as close as possible to the plate and then stop, so the session would not end and it could keep racking up points. Outsmarted again!
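The loophole above comes down to arithmetic. If parking near the plate nets a positive reward every frame, enough frames of parking outscore the one-time +2 for finishing. The post gives neither the original weight nor the scale of the proximity term, so the inputs to this Python back-of-the-envelope check are hypothetical, and it ignores discounting.

```python
import math

GOAL_REWARD = 2.0    # from the reward list
TIME_PENALTY = 0.05  # from the reward list


def frames_to_outscore_goal(proximity_weight, proximity):
    """Smallest number of parked frames whose total reward exceeds finishing (+2).

    proximity is an assumed value of the "how close" term while parked.
    Returns None when parking loses reward every frame.
    """
    net_per_frame = proximity_weight * proximity - TIME_PENALTY
    if net_per_frame <= 0:
        return None
    return math.floor(GOAL_REWARD / net_per_frame) + 1


print(frames_to_outscore_goal(0.5, 1.0))   # -> 5
print(frames_to_outscore_goal(0.05, 1.0))  # -> None
```

With a hypothetical weight of 0.5, five frames of hovering already beat reaching the plate; balancing the function means keeping the per-frame net near the goal at or below zero.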
After balancing the reward function, the model performed quite well!

Next steps

Next, I’ll update the scripts to use curriculum learning to progressively add more obstacles.
I’m not sure yet how the observation camera works, but I’d love for the whole system to run on camera input rather than on raycasts and outright knowledge of the goal’s position. (I’ll call this omniscient knowledge of the scene “easy mode” for the rest of the post.)
Rather than just feeding in a pixel array and asking the system to learn to reach the goal on its own, it would be interesting to train a model in “easy mode” first and, once it is well trained, record pairings of the camera feed with the actions that model takes at each moment. Then, instead of giving a new model the camera feed and rewarding it only when it reaches the goal, you could give it the camera feed and reward it whenever it performs the same action as the “easy mode” model. I’d love advice from anyone who knows how to integrate a convolutional neural network into the ML-Agents plug-in, or who would want to collaborate on something like this!
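The teacher/student idea above is essentially a form of imitation learning: a well-trained “easy mode” teacher supplies the actions a camera-based student should copy. This Python sketch shows only the two pieces described: recording (camera frame, teacher action) pairs and scoring the student by how closely its action matches. The class and function names are hypothetical, and the linear matching reward assumes continuous actions in [-1, 1]. ML-Agents also has its own demonstration recording and imitation-learning features (behavioral cloning, GAIL), which may be worth checking before hand-rolling this.

```python
from dataclasses import dataclass, field


@dataclass
class DemoRecorder:
    """Collects (camera_frame, teacher_action) pairs from the trained "easy mode" policy."""
    pairs: list = field(default_factory=list)

    def record(self, camera_frame, teacher_action):
        self.pairs.append((camera_frame, list(teacher_action)))


def imitation_reward(student_action, teacher_action):
    """1.0 for an exact match, falling linearly to 0.0 for opposite actions.

    Assumes each action dimension lies in [-1, 1], so a per-dimension gap is at most 2.
    """
    if len(student_action) != len(teacher_action):
        raise ValueError("action vectors must have the same length")
    gaps = [abs(s - t) / 2.0 for s, t in zip(student_action, teacher_action)]
    return 1.0 - sum(gaps) / len(gaps)
```

With a recorded dataset, the student's network could alternatively be trained directly with supervised learning on the pairs (behavioral cloning) instead of through a matching reward.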
Christine Barron
3 months ago
This is cool work, thanks for sharing :)
Saurabh Saxena
a year ago
Founder of Delhi Technology Club
Amazing :) It would be awesome if it were connected to a real robot arm.
agun domoro
a year ago
Property jakarta
Wow, this is a technology that must be developed. I really like this kind of technology; hopefully someone will build it in the future @Christine Barron
Eli Mather
a year ago
This is Tremendous. Two thumbs up! I love the idea, the challenge, the description and the narrative. Great story!
Christine Barron
2 years ago
Abhimanyu Aryan: I need some help. Does anyone know his Twitter handle/email or something?
Her*, and here is the GitHub link: Wanted to mention that it will differ from the screenshots, because third-party assets had to be stripped out to be shared publicly. However, the C# scripts should give you an idea of how it runs.