CORE - Using a Pipeline

Dedicated code in Core/Pipelines module.

General Policy

Several Pipelines are available with DeepPhysX, allowing the user to:

  • Generate synthetic data from simulations → DataGeneration

  • Train artificial neural networks with synthetic data → Training

  • Use the predictions of trained networks inside a simulation → Prediction

A Pipeline is always associated with a working session, whether it already exists or whether it is automatically created when the Pipeline is launched.

The main policy for creating and using a Pipeline is to first define a Configuration for each component involved in this Pipeline. Depending on the Pipeline, the Configurations for the Dataset, the Network and the Environment may be mandatory. Once these Configurations are defined, the Pipeline can be created and launched.

Note

More details are provided for each Configuration in dedicated sections.
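
In practice, every Pipeline script follows the same three steps: import the Pipeline and Config objects, define one Configuration per component, then create and launch the Pipeline. A minimal sketch of this pattern is given below; the Config import paths are assumptions based on the Core module layout, and the constructor options are left as placeholders (they are detailed in the dedicated sections):

# Import Config objects (module paths are assumed from the Core layout)
from DeepPhysX.Core.Dataset.BaseDatasetConfig import BaseDatasetConfig
from DeepPhysX.Core.Environment.BaseEnvironmentConfig import BaseEnvironmentConfig
from DeepPhysX.Core.Network.BaseNetworkConfig import BaseNetworkConfig

# Define one Configuration per component involved in the Pipeline
dataset_config = BaseDatasetConfig(...)
environment_config = BaseEnvironmentConfig(...)
network_config = BaseNetworkConfig(...)

# The Pipeline is then created with these Configurations and launched
# with its execute() method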

Pipeline - Data generation

The DataGeneration only involves an Environment and a Dataset, so this Pipeline only requires the two corresponding Configurations.

As the purpose of this Pipeline is only to produce synthetic data, a new working session is always created when the Pipeline is launched. The name of the session repository to create can be provided as a parameter.

Furthermore, users have to define how much data to produce:

  • The number of batches to produce

  • The number of samples in a single batch

Warning

This Pipeline does not involve a Network, so the get_prediction request from the Environment is disabled. Triggering this request will lead to an error, so make sure the data-production procedure of your Environment does not use it.
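
To illustrate, the data-production step of an Environment should compute its ground truth directly from the simulation instead of requesting it from a Network. The sketch below is hypothetical: the asynchronous step method and the set_training_data call are assumptions about the BaseEnvironment API, and the two helpers are placeholders:

# Hypothetical data-production Environment (method names are assumptions)
from DeepPhysX.Core.Environment.BaseEnvironment import BaseEnvironment

class MyEnvironment(BaseEnvironment):

    async def step(self):
        # Compute both the Network input and the ground truth directly
        # from the simulation state (hypothetical helpers)
        net_input = self.compute_input()
        ground_truth = self.compute_solution()
        # Send the sample to the Dataset
        self.set_training_data(input_array=net_input,
                               output_array=ground_truth)
        # Do NOT call self.get_prediction() here: no Network is available
        # in the DataGeneration Pipeline, so the request would fail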

See the following example:

# Import BaseDataGeneration and Config objects
from DeepPhysX.Core.Pipelines.BaseDataGeneration import BaseDataGeneration
...

# Define configs
dataset_config = ...
environment_config = ...

# Create the pipeline
data_generator = BaseDataGeneration(session_dir='sessions',
                                    session_name='my_data_generation',
                                    dataset_config=dataset_config,
                                    environment_config=environment_config,
                                    batch_nb=500,
                                    batch_size=16)

# Launch the pipeline
data_generator.execute()

Pipeline - Training

The Training can involve an Environment, a Dataset and a Network, so this Pipeline might require the corresponding Configurations. There are several ways to use this pipeline:

Training a Network from scratch

To train a Network from scratch, the Training requires the whole set of Configurations. A new working session will be created, whose name can be set as a parameter.

Training a Network with an existing Dataset

Training a new Network with an existing Dataset still creates a new working session. The path to the Dataset to use has to be provided as a parameter. Using an Environment is not mandatory, since the training data may already have the right format to feed the Network. If some computation must be performed between the Dataset and the Network, an Environment can be created with the specific Configuration (see more in the dedicated section).
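
As an illustration, pointing the Dataset Configuration at an existing data repository and dropping the Environment could look like the sketch below; the dataset_dir parameter name and the Dataset path are assumptions to adapt to the actual BaseDatasetConfig signature:

# Hypothetical sketch: train from an existing Dataset without an Environment
from DeepPhysX.Core.Pipelines.BaseTraining import BaseTraining
from DeepPhysX.Core.Dataset.BaseDatasetConfig import BaseDatasetConfig

# 'dataset_dir' (assumed parameter) points to the Dataset of a previous session
dataset_config = BaseDatasetConfig(dataset_dir='sessions/my_data_generation/dataset')
network_config = ...

# No environment_config: the stored samples are assumed to already have
# the right format to feed the Network
trainer = BaseTraining(session_dir='sessions',
                       session_name='my_training_from_dataset',
                       dataset_config=dataset_config,
                       network_config=network_config,
                       epoch_nb=100,
                       batch_nb=500,
                       batch_size=16)
trainer.execute()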

Training a Network from an existing Network state

Training from an existing Network state can be done either in the existing session or in a new one. If you want to keep working in the same session, you have to configure the Training accordingly; otherwise, a new working session is automatically created. In the same session, a new set of trained parameters will be added to the network repository, trained either with data from an external Dataset (whose path must be provided) or with data from the Environment (whose Configuration must be provided).
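
Continuing a training in the same session might then be configured as in the sketch below; the new_session flag is an assumption used to illustrate the "same session" behaviour described above:

# Hypothetical sketch: resume training from the Network state of an
# existing session
from DeepPhysX.Core.Pipelines.BaseTraining import BaseTraining

dataset_config = ...
network_config = ...

# 'new_session=False' (assumed flag) reuses the existing session instead of
# creating a new one; the new set of trained parameters is then added to
# the network repository of that session
trainer = BaseTraining(session_dir='sessions',
                       session_name='my_training',
                       dataset_config=dataset_config,
                       network_config=network_config,
                       new_session=False,
                       epoch_nb=100,
                       batch_nb=500,
                       batch_size=16)
trainer.execute()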

The last parameters to set in the Training are:

  • The number of epochs the training loop should complete during the session

  • The number of batches used during a single epoch

  • The number of samples in a single batch

Note

By default, the training data will be produced inside the Environment during the first epoch and then re-used from the Dataset for the remaining epochs. If you always need to use data from the Environment, you can specify this in its Configuration.
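
For example, forcing data production in the Environment at every epoch might be expressed in its Configuration; the always_create_data flag below is an assumption used to illustrate the Note:

# Hypothetical sketch: always produce training data in the Environment
from DeepPhysX.Core.Environment.BaseEnvironmentConfig import BaseEnvironmentConfig

# 'MyEnvironment' stands for the user's Environment class;
# 'always_create_data=True' (assumed flag) makes every epoch use freshly
# produced Environment data instead of re-using Dataset samples
environment_config = BaseEnvironmentConfig(environment_class=MyEnvironment,
                                           always_create_data=True)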

See the following example:

# Import BaseTraining and Config objects
from DeepPhysX.Core.Pipelines.BaseTraining import BaseTraining
...

# Define configs
dataset_config = ...
environment_config = ...
network_config = ...

# Create the pipeline
trainer = BaseTraining(session_dir='sessions',
                       session_name='my_training',
                       dataset_config=dataset_config,
                       environment_config=environment_config,
                       network_config=network_config,
                       epoch_nb=100,
                       batch_nb=500,
                       batch_size=16)

# Launch the pipeline
trainer.execute()

Pipeline - Prediction

The Prediction always requires a Network to compute predictions and an Environment to apply them, so this Pipeline will always require the corresponding Configurations.

This Pipeline always works within an existing session; a Prediction never creates a new one. The path to the session is therefore required, and it is assumed to contain a trained Network.

The Prediction can either run a specified number of steps or run an infinite loop.

A Dataset configuration can be provided. In this case, the Prediction can record prediction data. Each sample computed during the prediction phase will then be added to the Dataset in dedicated partitions. With a Dataset, the Prediction can also load its data to replay stored samples.

See the following example:

# Import BasePrediction and Config objects
from DeepPhysX.Core.Pipelines.BasePrediction import BasePrediction
...

# Define configs
dataset_config = ...
environment_config = ...
network_config = ...

# Create the pipeline
runner = BasePrediction(session_dir='sessions',
                        session_name='my_training',  # existing session containing the trained Network
                        dataset_config=dataset_config,
                        environment_config=environment_config,
                        network_config=network_config,
                        step_nb=-1)  # -1 runs an infinite loop; set a positive value for a fixed number of steps

# Launch the pipeline
runner.execute()