Project is at: https://github.com/XBLDev/Unity3dMLAgentChallenge1
At the beginning I tried to train the drone to fly over the wall the same way the sample trains the block to jump over it: by gradually increasing the height of the wall (curriculum learning).
It didn't work: the mean reward showed no obvious increase for a very long time.
The drone/RL agent in my case has 5 actions: four of them increase the forces applied to the 4 corners of the drone to simulate the effect of propellers, and the fifth simply does nothing. The no-op matters because once the drone flies over the wall I want it to land on a goal on the ground, which requires dropping itself by doing nothing.
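The action scheme above can be sketched as follows. This is a hypothetical Python illustration, not the project's actual Unity C# agent code; the function name, thrust magnitude, and force representation are all assumptions made for clarity.

```python
import numpy as np

# Hypothetical sketch of the drone's 5-action control scheme:
# actions 0-3 each add thrust at one corner (simulating a propeller),
# action 4 is a no-op that lets the drone drop under gravity.
NUM_PROPELLERS = 4
THRUST = 15.0  # assumed per-step force magnitude, not the project's value

def apply_action(action: int, corner_forces: np.ndarray) -> np.ndarray:
    """Return updated per-corner forces for one discrete action."""
    forces = corner_forces.copy()
    if action < NUM_PROPELLERS:
        # actions 0..3: increase the force at one corner
        forces[action] += THRUST
    # action 4: do nothing, so the drone falls (used for landing on the goal)
    return forces

forces = apply_action(2, np.zeros(NUM_PROPELLERS))
```

In Unity the equivalent would be something like calling `Rigidbody.AddForceAtPosition` once per corner each physics step.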
The movements generated by these 5 actions are far more complex than those of the block, which only jumps and moves in 4 directions without rotating. The result of that complexity is that the drone never reaches the goal on the other side, or even gets over the wall. Curriculum learning, whose point is to let the agent achieve easier goals first and then increase the difficulty, is therefore useless here: the agent takes forever to fly to the goal on the other side even with the wall height set to 0.
Eventually I found a very hacky way to make it fly over: instead of increasing the height of the wall, I changed the position of the goal over the lessons, from very close to the drone, to on top of the wall, to the other side of the wall, with the x position randomized within a range.
Because the drone can reach the goal easily at the beginning, it can then reach goals whose distance from the drone increases gradually over the curriculum.
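The goal-placement curriculum above can be sketched like this. The lesson coordinates, ranges, and function names here are invented for illustration (the real project configures lessons through ML-Agents' curriculum YAML, and the actual positions differ); only the structure of the idea is taken from the text.

```python
import random

# Hypothetical sketch of the goal-position curriculum: instead of raising
# the wall, each lesson moves the goal farther from the drone --
# near the start, then on top of the wall, then past it -- with the
# x position randomized within a per-lesson range.
LESSONS = [
    {"z": 2.0,  "x_range": (-1.0, 1.0)},   # lesson 0: goal near the drone
    {"z": 8.0,  "x_range": (-2.0, 2.0)},   # lesson 1: goal on top of the wall
    {"z": 14.0, "x_range": (-3.0, 3.0)},   # lesson 2: goal past the wall
]

def sample_goal(lesson: int):
    """Pick a randomized goal position for the current lesson."""
    cfg = LESSONS[lesson]
    x = random.uniform(*cfg["x_range"])
    return (x, cfg["z"])
```

Each episode reset would call `sample_goal` with the lesson index supplied by the curriculum, so early lessons stay trivially reachable while later ones require crossing the wall.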
Behaviorally, with the stimuli in the environment changing gradually, the agent first acts on what it learned in the previous lesson, explores actions that may or may not receive positive reward, and adjusts the weight it gives to certain actions under certain stimuli based on the results. Eventually this leads the drone to consistently choose the actions that take it to the goal.
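The learning dynamic described above can be illustrated with a toy preference-update rule. To be clear, ML-Agents actually trains with PPO, not this scheme; the sketch below only shows the intuition that rewarded actions become more likely under the same stimulus. All names and the learning rate are made up.

```python
import math
import random

# Toy illustration (NOT the PPO algorithm ML-Agents uses): keep one
# preference per drone action and nudge it by the reward received,
# so rewarded actions are sampled more often next time.
NUM_ACTIONS = 5
ALPHA = 0.1  # assumed learning rate

def choose(prefs):
    """Sample an action from a softmax over preferences."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    r = random.random()
    cum = 0.0
    for action, e in enumerate(exps):
        cum += e / total
        if r <= cum:
            return action
    return NUM_ACTIONS - 1  # guard against float rounding

def update(prefs, action, reward):
    """Shift the taken action's preference by the reward it produced."""
    prefs[action] += ALPHA * reward
```

After many episodes of `choose` and `update`, actions that moved the drone toward the goal dominate the softmax, which is the "changing the weight of certain actions" effect described above.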
The downside is that training takes too long: about 500,000 steps just to pass lesson 1, where the goal is really close to the drone. I'd like to find a more efficient way to train it. Also, the drone is effectively given "way points" during training, so in even a slightly different environment (say, a higher wall), what it learned will very likely stop working.