PPOL - A Machine Learning approach to crowd modeling
Published 2 years ago
N. Bisagno, N. Garau, A. Montagner


In this demo, we use ML-Agents to model a crowded environment in which agents navigate a simulated scenario containing realistic 3D models of obstacles and objects.
Each agent is provided with a neural-network brain, trained with the Proximal Policy Optimization (PPO) reinforcement learning algorithm provided by Unity ML-Agents. During training, a reward system teaches the agents how to reach their goal while avoiding collisions with other people and objects, in a similar fashion to the Social Force model.
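A reward system of this kind can be sketched as follows. This is a minimal sketch, assuming a reward made of a per-step progress term, a collision penalty, a small time penalty and a one-off goal bonus; the constants and the `step_reward` helper are hypothetical, since the actual reward values used in the project are not published here.

```python
import math

# Hypothetical reward constants (not the project's actual values).
GOAL_REWARD = 1.0         # given once when the agent reaches its goal
COLLISION_PENALTY = -0.5  # given when the agent hits another agent or an obstacle
PROGRESS_SCALE = 0.1      # scales how much closer to the goal the agent got this step
TIME_PENALTY = -0.001     # small per-step cost that discourages loitering
GOAL_RADIUS = 0.5         # distance (scene units) at which the goal counts as reached


def step_reward(prev_pos, pos, goal, collided):
    """Return (reward, done) for one simulation step of a single agent."""
    prev_dist = math.dist(prev_pos, goal)
    dist = math.dist(pos, goal)
    # Reward progress towards the goal, minus a constant time cost.
    reward = TIME_PENALTY + PROGRESS_SCALE * (prev_dist - dist)
    if collided:
        reward += COLLISION_PENALTY
    if dist < GOAL_RADIUS:
        return reward + GOAL_REWARD, True
    return reward, False
```

Rewarding the change in distance, rather than the distance itself, keeps the per-step signal small and lets the collision penalty dominate when an agent cuts through a crowd.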


Training followed a curriculum: step by step, we increased the number of agents in the scene, as well as the agents' speed and the number of directions in which they can move.
Training each brain to avoid collisions with other agents took 5,500,000 iterations (less than 6 hours).
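The curriculum idea can be illustrated with a small lesson scheduler. This is a minimal sketch, assuming that the agent advances to a harder lesson once its mean reward passes a threshold and it has spent a minimum number of episodes in the current lesson; the lesson values, thresholds and the `Curriculum` class are hypothetical and are not the ML-Agents curriculum API or the project's actual schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    num_agents: int      # agents spawned in the scene
    max_speed: float     # maximum agent speed
    num_directions: int  # number of discrete movement directions


# Hypothetical schedule: each lesson makes the scene harder.
LESSONS = [
    Lesson(num_agents=2, max_speed=1.0, num_directions=4),
    Lesson(num_agents=8, max_speed=1.5, num_directions=8),
    Lesson(num_agents=20, max_speed=2.0, num_directions=8),
]
THRESHOLDS = [0.5, 0.7]  # mean reward needed to leave lesson 0 and lesson 1
MIN_LESSON_LENGTH = 100  # episodes to spend in a lesson before it may advance


class Curriculum:
    def __init__(self):
        self.index = 0
        self.episodes_in_lesson = 0

    @property
    def lesson(self):
        return LESSONS[self.index]

    def report(self, mean_reward):
        """Record one finished episode and advance if the threshold is met."""
        self.episodes_in_lesson += 1
        if (self.index < len(THRESHOLDS)
                and self.episodes_in_lesson >= MIN_LESSON_LENGTH
                and mean_reward >= THRESHOLDS[self.index]):
            self.index += 1
            self.episodes_in_lesson = 0
        return self.lesson
```

The environment would read `curriculum.lesson` at every reset to decide how many agents to spawn and how fast and in how many directions they may move.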
At the following link you can find a visual comparison of our method with existing crowd-modeling solutions that do not exploit machine learning techniques but are based on hard-coded rules, such as:
  • Heuristic models
  • Agent-based models, such as the Social Force model
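For contrast with the learned brains, a rule-based model such as the Social Force model (Helbing and Molnár) computes each pedestrian's acceleration from a fixed formula: a driving term that relaxes the velocity towards a desired velocity aimed at the goal, plus repulsive terms from nearby pedestrians. This is a minimal sketch of that general idea, not the implementation used in the comparison; the parameter values are hypothetical, chosen only for illustration, and obstacle (wall) forces are omitted.

```python
import math

# Hypothetical parameter values, chosen for illustration only.
TAU = 0.5     # relaxation time (s)
A = 2.0       # repulsion strength (m/s^2)
B = 0.3       # repulsion range (m)
RADIUS = 0.3  # body radius of each pedestrian (m)


def social_force(pos, vel, goal, desired_speed, others):
    """Acceleration of one pedestrian: goal attraction plus pairwise repulsion."""
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    norm = math.hypot(gx, gy) or 1.0
    # Driving term: relax towards the desired velocity pointing at the goal.
    ax = (desired_speed * gx / norm - vel[0]) / TAU
    ay = (desired_speed * gy / norm - vel[1]) / TAU
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if d == 0:
            continue
        # Repulsion decays exponentially as the gap between the two bodies grows.
        mag = A * math.exp((2 * RADIUS - d) / B)
        ax += mag * dx / d
        ay += mag * dy / d
    return ax, ay
```

Every behaviour here is hand-tuned through `TAU`, `A` and `B`, whereas the PPO agents have to discover collision avoidance from the reward signal.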


The training parameters that worked well for our training phase are the following:

```python
### General parameters
max_steps = 5e10  ##5e5 # Set maximum number of steps to run environment.
run_path = "ppo"  # The sub-directory name for model and summary statistics.
load_model = False  # Whether to load a saved model.
train_model = True  # Whether to train the model.
summary_freq = 10000  # Frequency at which to save training statistics.
save_freq = 50000  # Frequency at which to save model.
env_name = "ObstacleCurriculum"  # Name of the training environment file.
curriculum_file = None

### Algorithm-specific parameters for tuning
gamma = 0.99  #0.99 # Reward discount rate.
lambd = 0.95  ##0.95 # Lambda parameter for GAE.
time_horizon = 2048  ##2048 # How many steps to collect per agent before adding to buffer.
beta = 1e-3  ##1e-3 # Strength of entropy regularization.
num_epoch = 5  ##5 # Number of gradient descent steps per batch of experiences.
num_layers = 2  ##2 # Number of hidden layers between state/observation encoding and value/policy layers.
epsilon = 0.2  ##0.2 # Acceptable threshold around ratio of old and new policy probabilities.
buffer_size = 4096  ##2048 # How large the experience buffer should be before gradient descent.
learning_rate = 3e-4  # Model learning rate.
hidden_units = 128  ##64 # Number of units in hidden layer.
batch_size = 32  ##64 # How many experiences per gradient descent update step.
normalize = False

### Logging dictionary for hyperparameters
hyperparameter_dict = {
    'max_steps': max_steps,
    'run_path': run_path,
    'env_name': env_name,
    'curriculum_file': curriculum_file,
    'gamma': gamma,
    'lambd': lambd,
    'time_horizon': time_horizon,
    'beta': beta,
    'num_epoch': num_epoch,
    'epsilon': epsilon,
    'buffer_size': buffer_size,      # key was misspelled 'buffe_size'
    'learning_rate': learning_rate,  # key was misspelled 'leaning_rate'
    'hidden_units': hidden_units,
    'batch_size': batch_size,
}
```
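To make `gamma`, `lambd` and `epsilon` more concrete, here is a minimal pure-Python sketch of Generalized Advantage Estimation (GAE) and the PPO clipped surrogate objective that these parameters control. It is illustrative only: ML-Agents computes these quantities internally in its own trainer, and the function names below are hypothetical.

```python
import math


def gae_advantages(rewards, values, last_value, gamma=0.99, lambd=0.95):
    """Generalized Advantage Estimation over one trajectory segment.

    `last_value` is the value estimate used to bootstrap after the segment.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lambd * running           # discounted sum of residuals
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(new_logp, old_logp, advantages, epsilon=0.2):
    """PPO clipped objective (to be maximised), averaged over a batch."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # new/old action probability
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With `epsilon = 0.2`, an update can gain at most a 20% change in an action's probability ratio per batch, which is what keeps PPO's policy updates conservative.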

Additional Features

We also added two further features to the project (Figure 2):
  • An advanced camera model, which can reproduce a variety of effects, such as zoom, focal length, PTZ (pan-tilt-zoom) movement, chromatic aberration, lens distortion and sensor noise. By integrating OpenCV code in Unity, we are also able to track each person's movement in the scene.
  • Unity ML-Agents monitors, to better visualise some aspects of the agents' behaviour, such as their distance from the goal.
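The lens-distortion part of such a camera model can be illustrated with the Brown-Conrady model, the same radial/tangential model whose coefficients (k1, k2, p1, p2, k3) OpenCV uses in camera calibration. This is a minimal sketch assuming a pinhole camera and a point already expressed in camera coordinates; `project_point` is a hypothetical helper, not the project's Unity/OpenCV code.

```python
def project_point(X, Y, Z, fx, fy, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Project a 3D camera-frame point to pixel coordinates with lens distortion."""
    if Z <= 0:
        raise ValueError("point is behind the camera")
    x, y = X / Z, Y / Z  # normalised image coordinates
    r2 = x * x + y * y
    # Radial distortion grows with the distance from the optical axis.
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    # Tangential terms model a lens that is not perfectly parallel to the sensor.
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # Focal lengths in pixels: zooming in corresponds to larger fx, fy.
    return fx * x_d + cx, fy * y_d + cy
```

For example, `project_point(1.0, 1.0, 2.0, 100.0, 100.0, 320.0, 240.0)` returns `(370.0, 290.0)` with no distortion; a positive `k1` pushes off-axis points further from the image centre (barrel-type correction inverted to pincushion).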

Conclusions and Future Work


This demo shows how easy it can be to train a brain in Unity with ML-Agents to model simple crowd behaviour. However, this method does not produce a proper model of human behaviour, because the agents learn only from the reward signal, without supervision from real human trajectories.
The resulting crowd flow is therefore not an exact representation of a real crowd. Nonetheless, agents in the scene move in a pedestrian-like fashion, avoiding each other and obstacles in accordance with social zones.
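Social zones are usually described with Hall's proxemic distances. The sketch below classifies the distance between two agents into those zones and turns intrusions into a small penalty. The bounds are approximate (they vary between sources), and using them as a reward term is a hypothetical illustration, not a description of how the project's reward was built.

```python
# Approximate proxemic bounds in metres, after E. T. Hall (exact values vary by source).
ZONES = [
    (0.45, "intimate"),
    (1.2, "personal"),
    (3.6, "social"),
]


def proxemic_zone(distance):
    """Return the proxemic zone for a distance between two people (metres)."""
    for bound, name in ZONES:
        if distance < bound:
            return name
    return "public"


def proximity_penalty(distance, weight=0.05):
    """Hypothetical shaping term: penalise entering another agent's personal space."""
    zone = proxemic_zone(distance)
    if zone == "intimate":
        return -2 * weight
    if zone == "personal":
        return -weight
    return 0.0
```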

Future Work

We are currently working on two main developments:
  • Improving the current PPO-based crowd model by introducing continuous actions instead of discrete ones, tuning the training parameters better and allowing more time for the learning phase.
  • Embedding the Social LSTM implementation into Unity in place of PPO, thus relying on a supervised method, which could lead to a proper model of human behaviour in crowded environments.
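For reference, the core of Social LSTM (Alahi et al., 2016) is a social-pooling step that summarises the hidden states of neighbouring pedestrians on a grid centred on each person. A minimal pure-Python sketch of that pooling step follows, assuming a square grid of `grid_size` × `grid_size` cells of side `cell`; the LSTM itself, the embedding layers and training are omitted, and `social_pooling` is a hypothetical name.

```python
def social_pooling(positions, hidden, i, grid_size=4, cell=0.5):
    """Social-pooling tensor for pedestrian i.

    positions: list of (x, y); hidden: list of equal-length hidden-state vectors.
    Returns a grid_size x grid_size grid whose cells hold the summed hidden
    states of the neighbours that fall in that cell around pedestrian i.
    """
    dim = len(hidden[i])
    grid = [[[0.0] * dim for _ in range(grid_size)] for _ in range(grid_size)]
    half = grid_size * cell / 2
    xi, yi = positions[i]
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        # Shift relative offsets so the grid spans [0, grid_size * cell).
        dx, dy = xj - xi + half, yj - yi + half
        if 0 <= dx < 2 * half and 0 <= dy < 2 * half:
            # Clamp guards against floating-point rounding at the far edge.
            m = min(int(dx // cell), grid_size - 1)
            n = min(int(dy // cell), grid_size - 1)
            for k in range(dim):
                grid[m][n][k] += hidden[j][k]
    return grid
```

Neighbours outside the grid are ignored, so each pedestrian's LSTM only sees the people close enough to influence its next step.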


YouTube Link (Video 1):
GitHub repository: