Running Multiple Simulations
FLAME GPU 2 provides CUDAEnsemble as a facility for executing batch runs of multiple configurations of a model.
Creating a CUDAEnsemble
An ensemble is a group of simulations executed in batch, optionally using all available GPUs. To use an ensemble, construct a RunPlanVector and CUDAEnsemble instead of a CUDASimulation.
First, define a model as usual, then create a CUDAEnsemble:
flamegpu::ModelDescription model("example model");
// Fully define the model
...
// Create a CUDAEnsemble
flamegpu::CUDAEnsemble ensemble(model);
// Handle any runtime args
ensemble.initialise(argc, argv);
model = pyflamegpu.ModelDescription("example model")
# Fully define the model
...
# Create a CUDAEnsemble
ensemble = pyflamegpu.CUDAEnsemble(model)
# Handle any runtime args
ensemble.initialise(sys.argv)
Creating a RunPlanVector
RunPlanVector is a data structure for building run configurations, specifying the simulation seed, the number of steps and the initial values of environment properties. It is a std::vector of RunPlan (which was introduced in the previous chapter), with additional methods that make it easy to configure batches of runs.
Operations performed on the vector apply to all elements, while individual elements can also be updated directly.
It is also possible to specify a subdirectory for a particular run's logging output, which can be useful when constructing large batch runs or parameter sweeps:
// Create a template run plan
flamegpu::RunPlanVector runs_control(model, 128);
// Ensure that repeated runs use the same Random values within the RunPlans
runs_control.setRandomPropertySeed(34523);
{ // Initialise values across the whole vector
// All runs require 3600 steps
runs_control.setSteps(3600);
// Random seeds for each run should take the values (12, 13, 14, 15, etc)
runs_control.setRandomSimulationSeed(12, 1);
// Initialise environment property 'lerp_float' with values uniformly distributed between 1 and 128
runs_control.setPropertyLerpRange<float>("lerp_float", 1.0f, 128.0f);
// Initialise environment property 'random_int' with values uniformly distributed in the range [0, 10]
runs_control.setPropertyUniformRandom<int>("random_int", 0, 10);
// Initialise environment property 'random_float' with values from the normal dist (mean: 1, stddev: 2)
runs_control.setPropertyNormalRandom<float>("random_float", 1.0f, 2.0f);
// Initialise environment property 'random_double' with values from the log normal dist (mean: 2, stddev: 1)
runs_control.setPropertyLogNormalRandom<double>("random_double", 2.0, 1.0);
// Initialise environment property array 'int_array_3' with [1, 3, 5]
runs_control.setProperty<int, 3>("int_array_3", {1, 3, 5});
// Iterate vector to manually assign properties
for (flamegpu::RunPlan &plan : runs_control) {
// e.g. manually set all 'manual_float' to 32
plan.setProperty<float>("manual_float", 32.0f);
}
}
// Create an empty RunPlanVector, that we will construct by mutating and copying runs_control several times
flamegpu::RunPlanVector runs(model, 0);
for (const float &mutation : {0.2f, 0.5f, 0.8f, 1.5f, 1.9f, 2.5f}) {
// Dynamically generate a name for mutation sub directory
char subdir[24];
snprintf(subdir, sizeof(subdir), "mutation_%g", mutation);
runs_control.setOutputSubdirectory(subdir);
// Fill in specialised parameters
runs_control.setProperty<float>("mutation", mutation);
// Append to the main run plan vector
runs += runs_control;
}
# Create a template run plan
runs_control = pyflamegpu.RunPlanVector(model, 128)
# Ensure that repeated runs use the same Random values within the RunPlans
runs_control.setRandomPropertySeed(34523)
# Initialise values across the whole vector
# All runs require 3600 steps
runs_control.setSteps(3600)
# Random seeds for each run should take the values (12, 13, 14, 15, etc)
runs_control.setRandomSimulationSeed(12, 1)
# Initialise environment property 'lerp_float' with values uniformly distributed between 1 and 128
runs_control.setPropertyLerpRangeFloat("lerp_float", 1.0, 128.0)
# Initialise environment property 'random_int' with values uniformly distributed in the range [0, 10]
runs_control.setPropertyUniformRandomInt("random_int", 0, 10)
# Initialise environment property 'random_float' with values from the normal dist (mean: 1, stddev: 2)
runs_control.setPropertyNormalRandomFloat("random_float", 1.0, 2.0)
# Initialise environment property 'random_double' with values from the log normal dist (mean: 2, stddev: 1)
runs_control.setPropertyLogNormalRandomDouble("random_double", 2.0, 1.0)
# Initialise environment property array 'int_array_3' with [1, 3, 5]
runs_control.setPropertyArrayInt("int_array_3", (1, 3, 5))
# Iterate vector to manually assign properties
for plan in runs_control:
# e.g. manually set all 'manual_float' to 32
plan.setPropertyFloat("manual_float", 32.0)
# Create an empty RunPlanVector, that we will construct by mutating and copying runs_control several times
runs = pyflamegpu.RunPlanVector(model, 0)
for mutation in [0.2, 0.5, 0.8, 1.5, 1.9, 2.5]:
# Dynamically generate a name for mutation sub directory
runs_control.setOutputSubdirectory("mutation_%g"%(mutation))
# Fill in specialised parameters
runs_control.setPropertyFloat("mutation", mutation)
# Append to the main run plan vector
runs += runs_control
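To make the effect of the vector-wide setters more concrete, the following plain-Python sketch reproduces the values that the example above is expected to generate, without requiring pyflamegpu. The helpers `lerp_range` and `stepped_seeds` are hypothetical names introduced only for illustration, and the assumption that the interpolation includes both endpoints (first run gets the minimum, last run gets the maximum) should be checked against the library documentation.

```python
# Plain-Python sketch; not the pyflamegpu API. Helper names are hypothetical.

def lerp_range(lo, hi, count):
    """Linearly interpolate `count` values from lo to hi.

    Assumption: both endpoints are included, so the first run receives lo
    and the last run receives hi.
    """
    if count == 1:
        return [lo]
    return [lo + (hi - lo) * i / (count - 1) for i in range(count)]


def stepped_seeds(first, step, count):
    """Seeds first, first + step, first + 2 * step, ..."""
    return [first + step * i for i in range(count)]


RUNS_PER_BATCH = 128
MUTATIONS = [0.2, 0.5, 0.8, 1.5, 1.9, 2.5]

# Mirrors the lerp call for 'lerp_float' with bounds 1 and 128
lerp_values = lerp_range(1.0, 128.0, RUNS_PER_BATCH)
# Mirrors setRandomSimulationSeed(12, 1)
seeds = stepped_seeds(12, 1, RUNS_PER_BATCH)
# One output subdirectory per mutation value, as in the loop above
subdirs = ["mutation_%g" % m for m in MUTATIONS]
# The template vector is appended once per mutation value
total_runs = RUNS_PER_BATCH * len(MUTATIONS)

print(lerp_values[:3], lerp_values[-1])
print(seeds[:4])
print(subdirs)
print(total_runs)
```

Because the template vector of 128 plans is appended once for each of the six mutation values, the final vector holds 768 plans, grouped into six output subdirectories.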
Creating a Logging Configuration
Next, decide which data will be collected, as it is not possible to export full agent states from a CUDAEnsemble.
A short example is shown below; refer to the previous chapter for the comprehensive guide.
One benefit of using CUDAEnsemble to carry out experiments is that the specific RunPlan data is included in each log file, allowing the logs to be automatically processed and used for reproducible research. However, this does not identify the particular version or build of your model.
// Specify the desired LoggingConfig or StepLoggingConfig
flamegpu::StepLoggingConfig step_log_cfg(model);
{
// Log every step (not available to LoggingConfig, for exit logs)
step_log_cfg.setFrequency(1);
step_log_cfg.logEnvironment("random_float");
step_log_cfg.agent("boid").logCount();
step_log_cfg.agent("boid").logMean<float>("speed");
}
flamegpu::LoggingConfig exit_log_cfg(model);
exit_log_cfg.logEnvironment("lerp_float");
// Pass the logging configs to the CUDAEnsemble
ensemble.setStepLog(step_log_cfg);
ensemble.setExitLog(exit_log_cfg);
# Specify the desired LoggingConfig or StepLoggingConfig
step_log_cfg = pyflamegpu.StepLoggingConfig(model)
# Log every step (not available to LoggingConfig, for exit logs)
step_log_cfg.setFrequency(1)
step_log_cfg.logEnvironment("random_float")
step_log_cfg.agent("boid").logCount()
step_log_cfg.agent("boid").logMeanFloat("speed")
exit_log_cfg = pyflamegpu.LoggingConfig(model)
exit_log_cfg.logEnvironment("lerp_float")
# Pass the logging configs to the CUDAEnsemble
ensemble.setStepLog(step_log_cfg)
ensemble.setExitLog(exit_log_cfg)
Configuring & Running the Ensemble
Now you can execute the CUDAEnsemble from the command line using the parameters below. It will execute the runs and log the collected data to file.
Long Argument | Short Argument | Description |
---|---|---|
 | | Print the command line guide and exit. |
 | | Comma separated list of GPU ids to be used to execute the ensemble. By default all devices will be used. |
 | | The number of concurrent simulations to run per GPU. By default 4 concurrent simulations will run per GPU. |
 | | Directory and format (JSON/XML) for ensemble logging. |
 | | Don’t print ensemble progress to console. |
 | | Print config, progress and timing (-t) information to console. |
 | | Output timing information to console at exit. |
 | | Silence warnings for unknown arguments passed after this flag. |
 | | The |
 | | Allow the operating system to enter standby during ensemble execution. The standby blocking feature is currently only supported on Windows, where it is enabled by default. |
You may also wish to specify your own defaults by setting the values prior to calling initialise():
// Fully declare a ModelDescription, RunPlanVector and LoggingConfig/StepLoggingConfig
...
// Create a CUDAEnsemble to run the RunPlanVector
flamegpu::CUDAEnsemble ensemble(model);
// Override config defaults
ensemble.Config().out_directory = "results";
ensemble.Config().out_format = "json";
ensemble.Config().concurrent_runs = 1;
ensemble.Config().timing = true;
ensemble.Config().error_level = flamegpu::CUDAEnsemble::EnsembleConfig::Fast;
ensemble.Config().devices = {0};
// Handle any runtime args
// If initialise() were instead called before the overrides above, the overrides would replace any values passed on the command line
ensemble.initialise(argc, argv);
// Pass the logging configs to the CUDAEnsemble
ensemble.setStepLog(step_log_cfg);
ensemble.setExitLog(exit_log_cfg);
// Execute the ensemble using the specified RunPlans
const unsigned int errs = ensemble.simulate(runs);
# Fully declare a ModelDescription, RunPlanVector and LoggingConfig/StepLoggingConfig
...
# Create a CUDAEnsemble to execute the RunPlanVector
ensemble = pyflamegpu.CUDAEnsemble(model)
# Override config defaults
ensemble.Config().out_directory = "results"
ensemble.Config().out_format = "json"
ensemble.Config().concurrent_runs = 1
ensemble.Config().timing = True
ensemble.Config().error_level = pyflamegpu.CUDAEnsembleConfig.Fast
ensemble.Config().devices = pyflamegpu.IntSet([0])
# Handle any runtime args
# If initialise() were instead called before the overrides above, the overrides would replace any values passed on the command line
ensemble.initialise(sys.argv)
# Pass the logging configs to the CUDAEnsemble
ensemble.setStepLog(step_log_cfg)
ensemble.setExitLog(exit_log_cfg)
# Execute the ensemble using the specified RunPlans
errs = ensemble.simulate(runs)
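The ordering of the overrides relative to initialise() matters because whichever step happens last wins. The plain-Python sketch below models this with a hypothetical `Config` class and a hypothetical `initialise` function; it is not the pyflamegpu implementation. The `-t` flag is taken from the table above, while the `--concurrent` flag name is an assumption used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the ensemble configuration."""
    concurrent_runs: int = 4   # library default stated in the table above
    timing: bool = False


def initialise(config, argv):
    """Hypothetical parser: overwrite only the options present in argv."""
    args = iter(argv[1:])
    for arg in args:
        if arg == "-t":
            config.timing = True
        elif arg == "--concurrent":  # assumed flag name
            config.concurrent_runs = int(next(args))


argv = ["model", "--concurrent", "2"]

# Overrides first, then initialise(): the command line has the final say
early = Config()
early.concurrent_runs = 1
initialise(early, argv)

# initialise() first, then overrides: the command line value is discarded
late = Config()
initialise(late, argv)
late.concurrent_runs = 1

print(early.concurrent_runs, late.concurrent_runs)
```

Setting values before initialise() therefore treats them as defaults that a user can still change at launch, whereas setting them afterwards hard-codes them.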
Error Handling Within Ensembles
CUDAEnsemble has three supported levels of error handling.
Level | Name | Description |
---|---|---|
0 | Off | Runs which fail do not cause an exception to be raised. |
1 | Slow | If any runs fail, an |
2 | Fast | An |
The default error level is “Slow” (1), which will cause an exception to be raised if any of the simulations fail to complete. However, all simulations will be attempted first, so partial results will be available.
Alternatively, when the error level is set to “Off” (0), calls to simulate() return the number of errors. Failed runs can therefore be detected manually by checking whether the return value of simulate() is non-zero.
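The difference between the levels can be illustrated with a small plain-Python model. This is a sketch only and does not use pyflamegpu: `simulate`, `EnsembleFailure` and `outcome` are hypothetical names. The "Off" and "Slow" behaviour follows the description above, whereas the behaviour modelled for "Fast" (stopping at the first failed run) is an assumption based on its name.

```python
# Plain-Python model of the three error levels; not the pyflamegpu API.
OFF, SLOW, FAST = 0, 1, 2


class EnsembleFailure(Exception):
    """Hypothetical stand-in for the exception raised by an ensemble."""


def simulate(runs, error_level=SLOW):
    """Execute each callable in `runs` and return the number that failed."""
    errors = 0
    for run in runs:
        try:
            run()
        except Exception:
            errors += 1
            if error_level == FAST:
                # Assumed behaviour: give up as soon as one run fails
                raise EnsembleFailure("a run failed, remaining runs skipped")
    if errors and error_level == SLOW:
        # All runs were attempted before the failure is reported
        raise EnsembleFailure("%d run(s) failed" % errors)
    return errors


def outcome(error_level):
    """Run three plans, the second of which fails, and report what happened."""
    attempted = []

    def plan(name, fail=False):
        def run():
            attempted.append(name)
            if fail:
                raise RuntimeError(name + " failed")
        return run

    runs = [plan("a"), plan("b", fail=True), plan("c")]
    try:
        result = simulate(runs, error_level)
    except EnsembleFailure:
        result = "raised"
    return attempted, result


for level in (OFF, SLOW, FAST):
    print(level, outcome(level))
```

In this model only the "Off" level lets the caller inspect the returned error count; the other two levels report failure through an exception instead.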