Update: Added Pass the Butter Robot, see bottom of article
My goal is to train a robotic arm to make pancakes. As a first test, I used curriculum learning to get the arm to toss a pancake onto a plate. My motivation for the project is my complete lack of cooking ability.
Initially, I connected three rigidbody limbs with configurable joints. All components used gravity, and the machine learning system could apply torque to each joint.
The first reward system was simple: a small reward was given for every frame in the session, and the session ended when the pancake hit the floor. I thought this would incentivize the algorithm to keep the pancake in the pan as long as possible.
What it actually did was try to fling the pancake as far as it possibly could, maximizing its time in the air. While it would have achieved more total points by keeping the pancake in the pan, it seemed to have gotten itself stuck in this local optimum. Score - PancakeBot: 1, Me: 0.
I revised my reward function to give points based on how close the pancake is to the center of the pan. A small reward was still given for each frame the pancake remained in the session, and the session ended when the pancake hit the floor.
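To make the difference between the two reward schemes concrete, here is a minimal Python sketch. It is not the project's actual code: the per-frame value, the pan radius, and the linear falloff are all assumptions chosen for illustration.

```python
import math

def survival_reward(per_frame=0.01):
    """First attempt: a flat reward for every frame the session is alive.
    The per_frame value is a made-up placeholder."""
    return per_frame

def centered_reward(pancake_xz, pan_center_xz, pan_radius=0.15, per_frame=0.01):
    """Revised attempt: the flat per-frame reward plus a bonus that grows as
    the pancake gets closer to the pan center (hypothetical linear falloff)."""
    distance = math.dist(pancake_xz, pan_center_xz)
    # 1.0 at the center of the pan, 0.0 at the rim or anywhere beyond it
    closeness = max(0.0, 1.0 - distance / pan_radius)
    return per_frame + closeness
```

Under the first scheme, a pancake flying through the air earns exactly as much per frame as one resting in the pan. Under the second, an airborne pancake far from the pan earns only the small per-frame amount.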
This time the pancake bot did a great job at keeping the pancake in the pan until the end of the session! The movement was hilariously noisy, but it improved after I tuned the physics settings (increasing friction, angular drag, etc.).
In order to get the pancakebot to perform more complex tasks, I realized I had to design the input actions more like a real robotic arm. I made the limbs kinematic, and the actions now apply rotations to each joint instead of torque.
** Side note: For anyone implementing a system like this, I found that I had to use Rigidbody.MoveRotation in FixedUpdate. If I rotated the limbs normally, the pan teleported around, and the pancake fell straight through the colliders. See BreakfastAgentRotate.cs on GitHub.
I was getting good results training the bot to land the pancake on a stationary target (the plate), but it had difficulty when the target moved between sessions, and the error increased. Curriculum learning made it possible to generalize to different target locations. First, the robotic arm learns to land the pancake on a stationary target. Once the system achieves a reward above 60, the target is placed at a random location each session, a small distance from the initial target (up to 0.4 meters in x and z). Once the reward reaches 60 again, the distance the target can move is increased to 0.8 meters, and then to 1.2 meters.
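The curriculum logic can be sketched in a few lines of plain Python. This only illustrates the schedule described above, not the ML-Agents curriculum configuration itself. The class and method names are hypothetical, and it assumes the reward being compared to the threshold is some running average.

```python
import random

class TargetCurriculum:
    """Widens the range of target positions each time the reward passes a threshold."""

    def __init__(self, threshold=60.0, ranges=(0.0, 0.4, 0.8, 1.2)):
        self.threshold = threshold
        self.ranges = ranges  # max offset in meters per stage; 0.0 = stationary plate
        self.stage = 0

    @property
    def max_offset(self):
        return self.ranges[self.stage]

    def update(self, mean_reward):
        # Advance one stage when the reward clears the threshold, until the last stage
        if mean_reward > self.threshold and self.stage < len(self.ranges) - 1:
            self.stage += 1

    def sample_target(self, base_xz, rng=random):
        # Offset the plate independently in x and z by up to max_offset
        r = self.max_offset
        return (base_xz[0] + rng.uniform(-r, r), base_xz[1] + rng.uniform(-r, r))
```

At the start of each session you would call `sample_target` with the plate's original position, and call `update` whenever a new reward average is available.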
Next up, making another robot whose purpose is to pass the butter:
Pass the butter
The butter robot starts at one end of the counter, and must travel to the plate. The bot can only swivel around the y-axis, and move forward (it can reverse if the action is negative). Obstacles are randomly placed on the counter, and the placement function ensures that obstacles are not too close together, so they do not completely block the robot’s path.
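One simple way to keep obstacles from bunching up is rejection sampling: pick a random spot and discard it if it is too close to an obstacle already placed. The sketch below is an assumption about how such a placement function could work, not the project's actual script. The function name, the counter dimensions, and `min_gap` are all hypothetical.

```python
import math
import random

def place_obstacles(count, width, depth, min_gap, rng=None, max_tries=1000):
    """Place up to `count` obstacles on a width x depth counter so that no two
    are closer than `min_gap`. Gives up after `max_tries` candidate positions."""
    rng = rng or random.Random()
    placed = []
    tries = 0
    while len(placed) < count and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(0, width), rng.uniform(0, depth))
        # Keep the candidate only if it is far enough from every earlier obstacle
        if all(math.dist(candidate, other) >= min_gap for other in placed):
            placed.append(candidate)
    return placed
```

Choosing `min_gap` wider than the robot leaves a gap it can fit through between any two obstacles.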
The ML model knows the position of the plate, but it does not know the positions of any of the obstacles. Instead, it uses 9 raycasts originating from the “camera” on its head, each returning 1 if the raycast hits an obstacle, and 0 otherwise. Each raycast extends less than a meter from the robot, keeping its knowledge of the scene fairly limited.
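Here is a rough top-down (x, z) sketch of what a 9-ray binary observation looks like, written in plain Python rather than with Unity's physics raycasts. The 120 degree field of view, the 0.9 meter ray length, and the circular obstacles are assumptions for illustration. The post only says there are 9 rays, each shorter than a meter.

```python
import math

def ray_hits_circle(origin, direction, max_len, center, radius):
    """True if a ray of length max_len from origin (unit direction) touches the circle."""
    to_center = (center[0] - origin[0], center[1] - origin[1])
    # Distance along the ray to the point nearest the circle center, clamped to the ray
    along = to_center[0] * direction[0] + to_center[1] * direction[1]
    along = min(max(along, 0.0), max_len)
    closest = (origin[0] + along * direction[0], origin[1] + along * direction[1])
    return math.dist(closest, center) <= radius

def raycast_observation(origin, heading_deg, obstacles, n_rays=9, fov_deg=120.0, max_len=0.9):
    """Return n_rays values: 1.0 where a ray hits any obstacle, 0.0 otherwise.
    Obstacles are (center_xz, radius) pairs; heading 0 points along +z."""
    observation = []
    for i in range(n_rays):
        angle = math.radians(heading_deg - fov_deg / 2 + i * fov_deg / (n_rays - 1))
        direction = (math.sin(angle), math.cos(angle))
        hit = any(ray_hits_circle(origin, direction, max_len, c, r) for c, r in obstacles)
        observation.append(1.0 if hit else 0.0)
    return observation
```

With a small obstacle directly ahead and within range, only the middle ray reports a hit. The same obstacle placed two meters away is invisible to the robot.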
The reward function:
+2 if the butter reaches the plate (and the session ends)
-2 if the bot hits the floor (and the session ends)
-0.05 for every frame (to encourage the bot to reach the goal quickly)
-0.15 if the bot hits an obstacle
+0.075 * a closeness term that grows as the bot nears the goal (to help guide it to the larger reward of reaching the plate)
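Put together, the per-frame reward could look like the Python sketch below. The numbers come from the list above, but the function name and signature are hypothetical, and the post does not say how the closeness term is computed or scaled, so `closeness` is just passed in. Whether the terminal rewards also include the per-frame penalty is likewise an assumption.

```python
def butter_step_reward(reached_plate, fell_off, hit_obstacle, closeness):
    """Return (reward, done) for one frame of the butter robot's session.
    `closeness` is an unspecified term that is larger when the bot is nearer the plate."""
    if reached_plate:
        return 2.0, True       # success ends the session
    if fell_off:
        return -2.0, True      # falling to the floor ends the session
    reward = -0.05             # time penalty on every frame
    if hit_obstacle:
        reward -= 0.15
    reward += 0.075 * closeness
    return reward, False
```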
I had originally set the last part of the reward (for getting close to the goal) too high. The robot would get as close as possible to the plate and stop, to prevent the session from ending and keep racking up points. Outsmarted again!
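A quick back-of-the-envelope check shows why this happens. If the closeness bonus per frame outweighs the per-frame penalty, then sitting next to the plate earns a positive amount every frame, and over a long enough session that beats the one-time +2 for finishing. The sketch below uses made-up weights and session lengths, and assumes undiscounted rewards.

```python
def hover_return(weight, closeness, frames_left, step_penalty=0.05):
    """Total reward from parking near the plate for the rest of the session."""
    return frames_left * (weight * closeness - step_penalty)

def prefers_hovering(weight, closeness, frames_left, goal_reward=2.0, step_penalty=0.05):
    """True if hovering near the plate pays more than finishing right now."""
    return hover_return(weight, closeness, frames_left, step_penalty) > goal_reward
```

For example, with a hypothetical weight of 0.5 and closeness of 0.9, each hovering frame nets about +0.4, so 1000 remaining frames are worth far more than +2. Drop the weight to 0.05 and each hovering frame costs points, so finishing wins.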
After balancing the reward function, the model performed quite well!
Next, I’ll update the scripts to use curriculum learning to progressively add more obstacles.
I’m not sure what the functionality of the observation camera is, but I’d love for the whole system to be run by a camera input, rather than using raycasts and knowing the position of the goal outright. (btw, I’m calling this omniscient knowledge of the scene “easy mode” for the rest of the post.)
Rather than just feeding in a pixel array and asking the system to train itself to reach the goal, it would be interesting to run the system in “easy mode,” and once the model is well trained, record pairings of the camera feed with the instructions that the model was trained to perform at that moment in time. Instead of just giving the model a camera feed input, and rewarding it when it reaches the goal, you could input the camera feed, and reward the model when it performs the same action as the “easy mode” model. Would love if anyone had advice on how to integrate a convolutional neural network into the ml-agents plug-in, or would want to collaborate on something like this!
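This idea is close to what is usually called behavioral cloning or policy distillation. As a very rough Python sketch of the two pieces described above (recording pairs, then rewarding agreement), with every name hypothetical and no neural network or ML-Agents integration involved:

```python
import math

def record_demonstrations(teacher_policy, episodes):
    """Pair each camera frame with the action the "easy mode" teacher took.
    Each episode is a list of (camera_frame, easy_mode_observation) tuples."""
    dataset = []
    for episode in episodes:
        for camera_frame, easy_mode_obs in episode:
            dataset.append((camera_frame, teacher_policy(easy_mode_obs)))
    return dataset

def imitation_reward(student_action, teacher_action):
    """0 for a perfect match, more negative the further the actions diverge."""
    return -math.dist(student_action, teacher_action)
```

The camera-based model would then be trained on the recorded pairs, or rewarded with something like `imitation_reward`, instead of on the goal reward alone.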