As you can see in this experiment's video, there are two types of agents: workers, who push the ball, and aggressors, who attack them. I was inspired by SoccerTwos. The two agent types have different observations and actions, so they had to be trained separately from each other. First I trained the worker, and it wasn't effective! So I decided to simplify the scene: in the beginning I took away the aggressors, then the other workers, until finally one worker agent was left, so it could learn to move the ball to the goal alone. But it turned out that a more effective setup was two workers from opposite teams left to learn overnight ;) In the latest version the trained worker model has a little glitch: it sometimes stays near the top border.
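Since ML-Agents trains each behavior under its own behavior name, the two agent types can simply get separate entries in the trainer configuration. A minimal sketch of what that looks like (the behavior names `Worker` and `Aggressor` and all hyperparameter values here are illustrative assumptions, not my actual settings):

```yaml
behaviors:
  Worker:                 # behavior name set on the worker's Behavior Parameters
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 5.0e6

  Aggressor:              # different observation/action spaces, so its own trainer
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 5.0e6
```

With separate entries like this, each agent type gets its own policy and its own TensorBoard curves, which is what makes it possible to train one type at a time while the other is removed from the scene.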
I wrote earlier in this post that the aggressor's training had a good start. I was wrong. The aggressors attacked the trained workers very poorly, even though they are similar to the SoccerTwos agents. Overturning a worker turned out to be a very difficult goal for this agent. The aggressor would sometimes accelerate hard and crash into the crowd of workers, but that wasn't a good result, and imitation learning didn't help either. On top of that, I was starting to get tired of this project and decided to find an easier solution. So the aggressor got two triggers, one at each end: when workers touch one of the aggressor's triggers, they are forced to jump and overturn. I also simplified the scene a bit (the workers run slower). After two days my agent learned to find and attack workers. Now the fight looked something like a real battle! Judging by the TensorBoard curves, max_steps could have been greatly reduced, but after a successful result I didn't want to experiment anymore.