Merged
32 commits
cb7665f
(Tuto) Create tutorial for parallelizing training of the pendulum.
kdesnos Dec 12, 2025
0c6d355
(Tuto) Prepare code for parallel tuto solution.
kdesnos Dec 12, 2025
0d0de07
(RelEng) Prepare evolution of python script for preparing template.
kdesnos Dec 12, 2025
e273f2b
(Releng) Factorize code.
kdesnos Dec 15, 2025
1c4522d
(RelEng) Improve multi-tuto templating.
kdesnos Dec 15, 2025
85fef7c
(RelEng) Create result of parallel tutorial.
kdesnos Dec 15, 2025
147aab2
(RelEng) Clean zip file generation.
kdesnos Dec 16, 2025
5c1b3fa
(RelEng) Start building CI for testing tutorial build.
kdesnos Dec 16, 2025
76f5128
(RelEng) Build tuto (ubuntu only for now)
kdesnos Dec 16, 2025
2739bd8
(RelEng) Install gegelati for build.
kdesnos Dec 16, 2025
b9522fc
(RelEng) Add build targets.
kdesnos Dec 16, 2025
faf7e30
(Releng) Run the training.
kdesnos Dec 16, 2025
d1a295c
(Tuto) Deactivate rendering code when relevant.
kdesnos Dec 16, 2025
b17b74e
(RelEng) Compile for test
kdesnos Dec 16, 2025
f2c1a36
(Tuto) Start developping code for multi-episode evaluation.
kdesnos Dec 17, 2025
02d1f89
(Tuto) Filter params.json for initial and parallel tuto archives.
kdesnos Dec 17, 2025
07c6b41
(Tuto) Activate validation
kdesnos Dec 17, 2025
4c9f471
(Tuto) Fix params.json double include in zip
kdesnos Dec 17, 2025
2b88afd
(Tuto) Start tuto on multi-episode training (separate from paralleliz…
kdesnos Dec 18, 2025
384b0ed
(Tuto) Put solution guards for strengthening tuto (breaks CI).
kdesnos Dec 18, 2025
cb0dd30
(RelEng) Update prepare template to support nested ifdef.
kdesnos Dec 19, 2025
856da54
(RelEng) Strengthen template prepa
kdesnos Dec 19, 2025
8baff4c
(RelEng) Test Strenghtening tutorial archive.
kdesnos Dec 19, 2025
d87f43d
(RelEng) Parallel archive now based on strengthening.
kdesnos Dec 19, 2025
da36fa0
(RelEng) Better filtrate else block in templates
kdesnos Dec 20, 2025
a2de253
(Tuto) Keep only relevant params.
kdesnos Dec 20, 2025
38c60bd
(Tuto) Make it possible to sync replay with training, or vice-versa.
kdesnos Dec 20, 2025
7dc94b6
(Tuto) Fix string size.
kdesnos Dec 20, 2025
5918b47
(RelEng) trigger msvc build
kdesnos Dec 20, 2025
7dc94d1
(Tuto) Fix manual control display.
kdesnos Dec 20, 2025
caaf2a3
(RelEng) Fix parallel build with cmake
kdesnos Dec 20, 2025
0f61f1a
(Releng) Fix msvc run.
kdesnos Dec 20, 2025
123 changes: 123 additions & 0 deletions .github/workflows/test-tuto.yml
@@ -0,0 +1,123 @@
name: Build Archives (CI Only)

on:
  push:
    branches:
      - '**'

jobs:
  build_archives:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Prepare template files
        run: |
          python ./scripts/prepare_template.py

      - name: Prepare archives
        run: |
          python ./scripts/prepare_archives.py

      - name: Upload archives as artifacts
        uses: actions/upload-artifact@v4
        with:
          name: tutorial-archives
          path: |
            docs/data/gegelati-tutorial.zip
            docs/data/gegelati-tutorial-solution.zip
            docs/data/gegelati-tutorial-strengthening-solution.zip
            docs/data/gegelati-tutorial-parallel-solution.zip

  test_archives:
    needs: build_archives
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        archive:
          - gegelati-tutorial.zip
          - gegelati-tutorial-solution.zip
          - gegelati-tutorial-strengthening-solution.zip
          - gegelati-tutorial-parallel-solution.zip
        os: [ubuntu-latest, windows-latest] ##, windows-latest, macos-latest]
        compiler: [gcc, msvc] ##, clang, msvc]
        exclude:
          # Exclude MSVC on non-Windows
          - os: ubuntu-latest
            compiler: msvc
          ## - os: macos-latest
          ##   compiler: msvc
          # Exclude GCC and Clang on Windows (unless you want to setup MinGW/LLVM)
          - os: windows-latest
            compiler: gcc
          ## - os: windows-latest
          ##   compiler: clang

    name: Test ${{ matrix.archive }} on ${{ matrix.os }} with ${{ matrix.compiler }}
    steps:
      - uses: actions/checkout@v3

      - name: Download archives artifact
        uses: actions/download-artifact@v4
        with:
          name: tutorial-archives
          path: archives

      - name: Unzip archive
        run: |
          unzip -q archives/${{ matrix.archive }} -d tutorial
        shell: bash

      - name: Set up compiler
        if: matrix.compiler == 'gcc'
        uses: egor-tensin/setup-gcc@v1
        with:
          version: latest
          # GCC is default on Linux, but this ensures it's available

      - name: Set up Clang
        if: matrix.compiler == 'clang'
        uses: egor-tensin/setup-clang@v1
        with:
          version: latest

      - name: Set up MSVC
        if: matrix.compiler == 'msvc' && runner.os == 'Windows'
        uses: ilammy/msvc-dev-cmd@v1

      - name: Install Libraries (Linux)
        if: matrix.os == 'ubuntu-latest'
        run: |
          sudo apt install -y libsdl2-dev libsdl2-image-dev libsdl2-ttf-dev

      - name: Build Gegelati (Linux/MacOS)
        if: matrix.os == 'ubuntu-latest' || matrix.os == 'macos-latest'
        run: |
          git clone -b master https://github.com/gegelati/gegelati.git lib/gegelati
          cd lib/gegelati/bin
          cmake .. -DBUILD_TESTING=OFF -DSKIP_DOXYGEN_BUILD=ON -DCMAKE_BUILD_TYPE=Release
          sudo cmake --build . --target install --parallel $(nproc)
        shell: bash

      - name: Configure CMake project
        run: |
          mkdir -p tutorial/build
          cmake -S tutorial/gegelati-tutorial -B build -DTESTING=ON
        env:
          CC: ${{ matrix.compiler == 'gcc' && 'gcc' || matrix.compiler == 'clang' && 'clang' || '' }}
          CXX: ${{ matrix.compiler == 'gcc' && 'g++' || matrix.compiler == 'clang' && 'clang++' || '' }}
        shell: bash

      - name: Build manual-control target
        run: |
          cmake --build build --config Release --target manual-control --parallel $(nproc)
        shell: bash

      - name: Build and run tpg-training target
        if: matrix.archive != 'gegelati-tutorial.zip'
        run: |
          sed -i 's/"nbGenerations": [0-9]*/"nbGenerations": 4/' tutorial/gegelati-tutorial/params.json
          cmake --build build --config Release --target tpg-training --parallel $(nproc)
          cd build && ./Release/tpg-training # cd needed for windows dll
        shell: bash
6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -52,7 +52,7 @@ endif()
# Add definitions for testing purposes
if(${TESTING})
MESSAGE("Testing mode")
-add_definitions(-DNO_CONSOLE_CONTROL -DNB_GENERATIONS=2)
+add_definitions(-DNO_CONSOLE_CONTROL -DNB_GENERATIONS=2 -DDEACTIVATE_DISPLAY=1)
endif()

# *******************************************
@@ -97,7 +97,7 @@ add_executable(tpg-training ${pendulum_files} ${training_files})
target_link_libraries(tpg-training ${GEGELATI_LIBRARIES} ${SDL2_LIBRARY} ${SDL2_IMAGE_LIBRARY} ${SDL2TTF_LIBRARY})
target_compile_definitions(tpg-training PRIVATE ROOT_DIR="${CMAKE_SOURCE_DIR}")

-#ifdef SOLUTION
+#ifdef SOLUTION_INFERENCE
# Sub project for inference
file(GLOB
inference_files
@@ -111,4 +111,4 @@ include_directories(${GEGELATI_INCLUDE_DIRS} ${SDL2_INCLUDE_DIR} ${SDL2_IMAGE_IN
add_executable(tpg-inference ${pendulum_files} ${inference_files})
target_link_libraries(tpg-inference ${GEGELATI_LIBRARIES} ${SDL2_LIBRARY} ${SDL2_IMAGE_LIBRARY} ${SDL2TTF_LIBRARY})
target_compile_definitions(tpg-inference PRIVATE ROOT_DIR="${CMAKE_SOURCE_DIR}")
-#endif // SOLUTION
+#endif // SOLUTION_INFERENCE
134 changes: 134 additions & 0 deletions docs/_pages/parallel_training.md
@@ -0,0 +1,134 @@
---
title: Parallel Training of Tangled Program Graphs
permalink: /tutos/parallel-training
toc: true
toc_sticky: true
---

The objective of this tutorial is to activate parallel training of Tangled Program Graphs (TPGs) with <span style="font-variant: small-caps;">Gegelati</span> by:
- instantiating a `ParallelLearningAgent`, and
- making the `PendulumWrapper` safely copyable so worker threads receive independent environments.

The starting point of this tutorial is the C++ project obtained at the end of the _[GEGELATI introductory tutorial](/gegelati-tutorial)_. While completing the introductory tutorial is strongly advised, a copy of the project resulting from it can be downloaded at the following link: [pendulum_wrapper_solution.zip](/gegelati-tutorial/data/gegelati-tutorial-solution.zip).

## Why make the environment copyable?

The learning process of TPGs involves two main time-consuming steps per generation:
- Evaluation of the fitness of each individual TPG root within the `PendulumWrapper` learning environment. The duration of this step, in seconds, is reported in the `T_eval` column of the training log at each generation.
- Mutation of the TPG population. The duration of this step, in seconds, is reported in the `T_mutat` column of the training log at each generation.

When using a `LearningAgent`, both steps are performed sequentially on a single thread. To accelerate training, it is possible to parallelize these steps across multiple threads/cores by using `ParallelLearningAgent`.

To better appreciate the benefits of parallel training, keep a copy of the log produced by the sequential training, before any modification, for comparison.

An important feature of <span style="font-variant: small-caps;">Gegelati</span> is that the parallelization of training is fully deterministic, which means that running the same training with the same random seed will always produce the same results, regardless of the number of threads used. This is achieved by ensuring that each worker thread operates on its own independent copy of the learning environment.
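The principle stated above can be illustrated with a small, self-contained C++ sketch. This is not the <span style="font-variant: small-caps;">Gegelati</span> implementation: `ToyEnv`, `evaluateJob`, and `evaluateAll` are hypothetical names, and `std::mt19937_64` stands in for the library's own random number generator. Each worker thread clones its own environment, and the seed of each evaluation job depends only on the job index, never on the thread that happens to run it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>
#include <thread>
#include <vector>

// Hypothetical toy environment (not a Gegelati class): all of its state,
// including its random number generator, belongs to the instance.
class ToyEnv {
  public:
    void reset(uint64_t seed) {
        rng.seed(seed);
        state = 0.0;
    }
    double step() {
        std::uniform_real_distribution<double> noise(-1.0, 1.0);
        state += noise(rng);
        return state;
    }
    std::unique_ptr<ToyEnv> clone() const { return std::make_unique<ToyEnv>(*this); }

  private:
    std::mt19937_64 rng;
    double state = 0.0;
};

// Evaluates job number i. The seed depends only on i, never on the thread.
double evaluateJob(ToyEnv& env, std::size_t i) {
    env.reset(1000 + i);
    double score = 0.0;
    for (int t = 0; t < 10; ++t) {
        score += env.step();
    }
    return score;
}

// Distributes nbJobs over nbThreads workers, each owning a cloned environment.
std::vector<double> evaluateAll(const ToyEnv& prototype, std::size_t nbJobs, std::size_t nbThreads) {
    assert(nbThreads > 0);
    std::vector<double> scores(nbJobs);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < nbThreads; ++w) {
        workers.emplace_back([&prototype, &scores, nbJobs, nbThreads, w]() {
            std::unique_ptr<ToyEnv> env = prototype.clone(); // independent copy
            for (std::size_t i = w; i < nbJobs; i += nbThreads) {
                scores[i] = evaluateJob(*env, i); // each slot written by one thread only
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return scores;
}
```

Under these assumptions, `evaluateAll(prototype, 8, 1)` and `evaluateAll(prototype, 8, 4)` return identical score vectors. Sharing a single environment between threads, or drawing seeds from a shared generator in thread-arrival order, would instead make the results depend on thread scheduling.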

## 0. Parallelize mutations

To enable parallel mutations, the sequential `LearningAgent` must be replaced with `ParallelLearningAgent`. By default, the number of threads is set to the number of available hardware threads on the machine.

#### TODO 1:
Edit the `/gegelati-tutorial/src/training/main-training.cpp` file by replacing the line that instantiates the `LearningAgent` with a line that instantiates a `ParallelLearningAgent`:

{% details Solution to #1 (Click to expand) %}
```cpp
/* main-training.cpp */
// Instantiate and initialize the Learning Agent (LA)
Learn::ParallelLearningAgent la(pendulumLE, instructionSet, params);
```

{% enddetails %}

Build and run the `tpg-training` target of the project. You should observe that `T_mutat` times have slightly decreased compared to the sequential training log. The other columns, which describe the characteristics of the trained TPG (`NbVert`, `NbActR`, `NbTeamR`) and the fitness of agents (`Min`, `Avg`, `Max`), should remain identical to those of the sequential training.

## 1. Parallelize evaluations

To enable parallel evaluations, the `PendulumWrapper` must be made safely copyable. This is done by first implementing the copy constructor of the `PendulumWrapper` class, then overriding the `clone()` method inherited from the `LearningEnvironment` base class, and finally overriding the `isCopyable()` method of the same base class.

#### TODO 2:
Edit the `/gegelati-tutorial/src/environments/pendulum_wrapper.h` and `/gegelati-tutorial/src/environments/pendulum_wrapper.cpp` files to add a copy constructor `PendulumWrapper(const PendulumWrapper& other)` to the class.

It is important to note that the default copy constructor generated by the compiler would perform a shallow copy of the member variables, which is not suitable in this case. Therefore, a custom copy constructor must be implemented to ensure that all member variables are properly duplicated.

Special care must be taken with the `std::vector<Data::PointerWrapper<double>> data` attribute. This attribute must first be copy-constructed from `other.data`. Then, the pointers contained in the vector must be updated to point to the attributes of `this->pendulum`, and not to those of `other.pendulum`, which is what they point to right after copy-construction.
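The pitfall can be reproduced without <span style="font-variant: small-caps;">Gegelati</span>. In the following sketch, `ToyPendulum`, `NaiveWrapper`, and `SafeWrapper` are hypothetical stand-ins, and a plain `std::vector<double*>` plays the role of the vector of `Data::PointerWrapper<double>`. With the compiler-generated copy constructor, the copy's pointers still target the original pendulum. The custom copy constructor re-targets them to the copy's own pendulum.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the pendulum: two observable attributes.
struct ToyPendulum {
    double angle = 0.0;
    double velocity = 0.0;
};

// Relies on the compiler-generated copy constructor (member-wise copy):
// the copied pointers keep pointing into the ORIGINAL object.
struct NaiveWrapper {
    ToyPendulum pendulum;
    std::vector<double*> data;
    NaiveWrapper() : data{&pendulum.angle, &pendulum.velocity} {}
};

// Custom copy constructor: copy the members, then re-target the pointers.
struct SafeWrapper {
    ToyPendulum pendulum;
    std::vector<double*> data;
    SafeWrapper() : data{&pendulum.angle, &pendulum.velocity} {}
    SafeWrapper(const SafeWrapper& other) : pendulum(other.pendulum), data(other.data) {
        data.at(0) = &this->pendulum.angle;
        data.at(1) = &this->pendulum.velocity;
    }
    // The implicit copy assignment would have the same pitfall; disabled here.
    SafeWrapper& operator=(const SafeWrapper&) = delete;
};
```

With `NaiveWrapper`, the learning agent observing a copy would read the state of the original pendulum instead of the copy's own state, and would read dangling pointers once the original is destroyed.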


{% details Solution to #2 (Click to expand) %}
```cpp
/* pendulum_wrapper.h */
// Copy constructor
PendulumWrapper(const PendulumWrapper& other);
```

```cpp
/* pendulum_wrapper.cpp */
// Copy constructor implementation
PendulumWrapper::PendulumWrapper(const PendulumWrapper& other)
: LearningEnvironment(other), // Call base class copy constructor
pendulum(other.pendulum), // Copy-construct the pendulum
data(other.data) // Copy-construct the data vector
{
// Update pointers in data to point to this->pendulum's attributes
data.at(0).setPointer(&this->pendulum.getAngle());
data.at(1).setPointer(&this->pendulum.getVelocity());
}
```

{% enddetails %}

#### TODO 3:
Next, override the `clone()` method in the `PendulumWrapper` class to return a new instance of `PendulumWrapper` created using the copy constructor.

{% details Solution to #3 (Click to expand) %}
```cpp
/* pendulum_wrapper.h */
// Override clone method
Learn::LearningEnvironment* clone() const override;
```

```cpp
/* pendulum_wrapper.cpp */
// Override clone method implementation
Learn::LearningEnvironment* PendulumWrapper::clone() const {
return new PendulumWrapper(*this); // Use copy constructor
}
```

{% enddetails %}


#### TODO 4:
To signal to <span style="font-variant: small-caps;">Gegelati</span> that the `PendulumWrapper` can be safely copied for parallel evaluation, the `LearningEnvironment::isCopyable()` method must be overridden to return `true`.

{% details Solution to #4 (Click to expand) %}
```cpp
/* pendulum_wrapper.h */
// Override isCopyable method
bool isCopyable() const override;
```

```cpp
/* pendulum_wrapper.cpp */
// Override isCopyable method implementation
bool PendulumWrapper::isCopyable() const {
return true; // Indicate that this environment is copyable
}
```

{% enddetails %}
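The `isCopyable()` and `clone()` overrides work as a pair: the first tells the learning agent whether it may call the second. The following sketch shows one plausible way a parallel evaluator could use this pair. `BaseEnv`, `CopyableEnv`, `chooseWorkerCount`, and `makeWorkerCopy` are hypothetical names, and the actual decision logic inside `ParallelLearningAgent` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical base class mirroring the clone()/isCopyable() pair.
class BaseEnv {
  public:
    virtual ~BaseEnv() = default;
    // By default, the environment is not copyable and clone() returns nothing.
    virtual bool isCopyable() const { return false; }
    virtual BaseEnv* clone() const { return nullptr; }
};

// Derived environment that opts in, like PendulumWrapper after TODO 2 to 4.
class CopyableEnv : public BaseEnv {
  public:
    bool isCopyable() const override { return true; }
    BaseEnv* clone() const override { return new CopyableEnv(*this); }
};

// A non-copyable environment can only be evaluated on a single thread.
std::size_t chooseWorkerCount(const BaseEnv& env, std::size_t requestedThreads) {
    assert(requestedThreads > 0);
    return env.isCopyable() ? requestedThreads : 1;
}

// Takes ownership of the raw pointer returned by clone().
std::unique_ptr<BaseEnv> makeWorkerCopy(const BaseEnv& env) {
    return std::unique_ptr<BaseEnv>(env.clone());
}
```

This fallback would also explain why, after TODO 1 alone, only `T_mutat` decreases: mutations are parallelized, but a non-copyable environment keeps evaluations sequential.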

#### Test parallel evaluations
Build and run the `tpg-training` target of the project. You should observe that `T_eval` times have significantly decreased compared to the sequential training log. The other columns, which describe the characteristics of the trained TPG (`NbVert`, `NbActR`, `NbTeamR`) and the fitness of agents (`Min`, `Avg`, `Max`), should remain identical to those of the sequential training.

It is possible to control the number of threads used by the `ParallelLearningAgent` by setting the `nbThreads` parameter in the `/gegelati-tutorial/params.json` file as follows:

```json
"nbThreads": 4,
```

## Conclusion
In this tutorial, you have successfully enabled parallel training of Tangled Program Graphs (TPGs) in <span style="font-variant: small-caps;">Gegelati</span> by replacing the sequential `LearningAgent` with `ParallelLearningAgent` and making the `PendulumWrapper` safely copyable.

More information about parallel training with <span style="font-variant: small-caps;">Gegelati</span> can be found in the following publication:

[_K. Desnos, N. Sourbier, P.-Y. Raumer, O. Gesny and M. Pelcat. GEGELATI: Lightweight Artificial Intelligence through Generic and Evolvable Tangled Program Graphs. In Workshop on Design and Architectures for Signal and Image Processing (DASIP), ACM, 2021_](https://arxiv.org/pdf/2012.08296)
91 changes: 91 additions & 0 deletions docs/_pages/strengthening_agents.md
@@ -0,0 +1,91 @@
---
title: Strengthening Reinforcement Learning Agents with Multi-Episode Evaluation and Validation Phases
permalink: /tutos/strengthening-agents
toc: true
toc_sticky: true
---

The objective of this tutorial is two-fold:
1. Strengthen the trained reinforcement learning agents by evaluating them over multiple episodes during training, and
2. Activate a validation phase at the end of each generation to monitor potential overfitting.

The starting point of this tutorial is the C++ project obtained at the end of the _[GEGELATI introductory tutorial](/gegelati-tutorial)_. While completing the introductory tutorial is strongly advised, a copy of the project resulting from it can be downloaded at the following link: [pendulum_wrapper_solution.zip](/gegelati-tutorial/data/gegelati-tutorial-solution.zip).

## Multi-episode evaluation setup
### Why evaluate over multiple episodes?
An episode refers to a complete sequence of interactions between a reinforcement learning agent and its environment, starting from an initial state and ending when a terminal condition is met. For example, in the initial tutorial, an episode consists of the agent attempting to balance the pendulum for a fixed duration of 1500 time steps, as defined by the `maxNbActionsPerEval` parameter in `params.json`.

In reinforcement learning, evaluating an agent's performance over multiple episodes is crucial for obtaining a reliable estimate of its true capabilities. This is because the performance of an agent can vary significantly from one episode to another, due to the inherent stochasticity of the environment (for example, a randomized initial state) and, when applicable, of the agent's policy. By averaging the results over multiple episodes, we can mitigate the effects of randomness and strengthen the robustness of the learned policy.
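To make the averaging argument concrete, the following C++ sketch scores a hypothetical deterministic policy over several episodes, each starting from an initial angle drawn from its own seed. `episodeScore` and `meanScore` are invented names, `episodeScore` is a toy stand-in for one pendulum episode, and `std::mt19937_64` stands in for `Mutator::RNG`. Only the overall structure reflects this tutorial: one seed per episode and a mean over episodes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical single-episode score: this toy policy scores better when the
// random initial angle is close to the upright position (0 rad).
double episodeScore(uint64_t seed) {
    const double pi = std::acos(-1.0);
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> angleDist(-pi, pi);
    const double initialAngle = angleDist(rng);
    return -std::fabs(initialAngle);
}

// Mean score over nbEpisodes episodes, each with its own seed.
double meanScore(uint64_t baseSeed, std::size_t nbEpisodes) {
    assert(nbEpisodes > 0);
    double total = 0.0;
    for (std::size_t e = 0; e < nbEpisodes; ++e) {
        total += episodeScore(baseSeed + e);
    }
    return total / static_cast<double>(nbEpisodes);
}
```

With a single episode, the score of an agent largely reflects how lucky its starting state was. For independent episodes, the variance of the mean shrinks in proportion to `1 / nbEpisodes`, so a ranking based on `meanScore` depends less on luck.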

Implementing multi-episode evaluation in <span style="font-variant: small-caps;">Gegelati</span> involves modifying the `PendulumWrapper` class to support multiple episodes during the evaluation phase. To vary the starting conditions of each episode, the pendulum's angle and angular velocity will be randomly initialized at the beginning of each episode.

### 0. Modify PendulumWrapper to support multi-episode evaluation
To implement multi-episode evaluation, we will first modify the `PendulumWrapper` class to support a stochastic reset of the pendulum's state at the beginning of each episode.

To support random initialization, we will use a pseudo-random number generator to generate random values for the pendulum's angle and angular velocity within specified ranges.

#### TODO 1:
Edit the `/gegelati-tutorial/src/environments/pendulum_wrapper.h` file to add a random number generator as a member variable of the `PendulumWrapper` class. This pseudo-random number generator is provided in <span style="font-variant: small-caps;">Gegelati</span> by the `Mutator::RNG` class.

{% details Solution to #1 (Click to expand) %}
```cpp
/* pendulum_wrapper.h */
class PendulumWrapper : public Learn::LearningEnvironment {
public:
// Existing code...

/// Random Number Generator for the environment
Mutator::RNG rng;

// Remaining members...
};
```

{% enddetails %}

#### TODO 2:
Next, we will modify the `reset(size_t seed, Learn::LearningMode mode, uint16_t iterationNumber, uint64_t generationNumber)` method of the `PendulumWrapper` class to randomly initialize the pendulum's angle and angular velocity at the beginning of each episode.

When calling the `reset(...)` method of the environment, <span style="font-variant: small-caps;">Gegelati</span> notably provides a `seed` parameter that can be used to seed the environment's random number generator with the `Mutator::RNG::setSeed(...)` method. Using this seeding mechanism ensures that the random initialization is deterministically reproducible across runs.

Once the RNG is seeded, we will use the `Mutator::RNG::getDouble(double min, double max)` method to generate random values within specified ranges. For example, we can set the angle to be randomly initialized between -π and π radians, and the angular velocity to be randomly initialized between -1.0 and 1.0 radians per second.

{% details Solution to #2 (Click to expand) %}
The `reset(...)` method can be modified as follows:
```cpp
/* pendulum_wrapper.cpp */
void PendulumWrapper::reset(size_t seed, Learn::LearningMode mode, uint16_t iterationNumber, uint64_t generationNumber) {
// Seed the RNG differently for each iteration
this->rng.setSeed(seed);

// Randomize the initial angle between [-pi, pi]
double initialAngle = this->rng.getDouble(-M_PI, M_PI);
this->pendulum.setAngle(initialAngle);
// Randomize the initial velocity between [-1.0, 1.0]
double initialVelocity = this->rng.getDouble(-1.0, 1.0);
this->pendulum.setVelocity(initialVelocity);
}
```

{% enddetails %}
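This `reset(...)` is reproducible because the initial state is a pure function of `seed`. The sketch below checks this property in isolation. `ToyRNG` is a hypothetical stand-in for `Mutator::RNG`, built on `std::mt19937_64` and `std::uniform_real_distribution`, and exposing the two methods used in the solution above. The distribution actually used by `Mutator::RNG::getDouble` may differ. `InitialState` and `drawInitialState` are invented names that mirror the ranges of the solution.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical stand-in for Mutator::RNG, exposing setSeed() and getDouble().
class ToyRNG {
  public:
    void setSeed(uint64_t seed) { engine.seed(seed); }
    double getDouble(double min, double max) {
        std::uniform_real_distribution<double> dist(min, max);
        return dist(engine);
    }

  private:
    std::mt19937_64 engine;
};

struct InitialState {
    double angle;
    double velocity;
};

// Mirrors the solution: reseed, then draw the angle before the velocity.
InitialState drawInitialState(ToyRNG& rng, uint64_t seed) {
    const double pi = std::acos(-1.0);
    rng.setSeed(seed);
    InitialState s;
    s.angle = rng.getDouble(-pi, pi);
    s.velocity = rng.getDouble(-1.0, 1.0);
    return s;
}
```

Because the generator is reseeded at the start of every reset, the initial state does not depend on how many random numbers were consumed earlier, for example by previous episodes or by another copy of the environment running on a different thread.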

### 1. Configure multi-episode evaluation in params.json
To enable multi-episode evaluation during training, we need to modify the training parameters in the `params.json` file of the project.

#### TODO 3:
Edit the `/gegelati-tutorial/params.json` file to set the `nbEpisodesPerEval` parameter to a value greater than 1. This parameter specifies the number of episodes over which each agent will be evaluated during training. For this tutorial, set it to 5.

{% details Solution to #3 (Click to expand) %}
```json
{
// Existing parameters...
"nbEpisodesPerEval": 5,
// Existing parameters...
}
```

{% enddetails %}

## Conclusion
In this tutorial, you have successfully enabled multi-episode evaluation for reinforcement learning agents in <span style="font-variant: small-caps;">Gegelati</span>. By evaluating agents over multiple episodes, you have strengthened the robustness of the learned policy and mitigated the effects of randomness in the environment.

More information about reinforcement learning with <span style="font-variant: small-caps;">Gegelati</span> can be found in the following publication:

[_K. Desnos, N. Sourbier, P.-Y. Raumer, O. Gesny and M. Pelcat. GEGELATI: Lightweight Artificial Intelligence through Generic and Evolvable Tangled Program Graphs. In Workshop on Design and Architectures for Signal and Image Processing (DASIP), ACM, 2021_](https://arxiv.org/pdf/2012.08296)