🌟 EvoGP: A GPU-accelerated Framework for Tree-Based Genetic Programming 🌟
EvoGP is a fully GPU-accelerated Tree-based Genetic Programming (TGP) framework built on PyTorch, leveraging custom CUDA kernels for core evolutionary operations like tree generation, mutation, crossover, and fitness evaluation. It supports multi-output trees and includes built-in tools for symbolic regression, policy optimization, and classification, along with standardized benchmarks for evaluation and tuning. EvoGP combines the flexibility of Python with the computational power of GPUs, making it an ideal platform for TGP research and applications. EvoGP is a sister project of EvoX.
CUDA-based parallel approach for TGP:
- Leverages specialized CUDA kernels to optimize critical TGP operations.
- Improves computational efficiency, especially for large populations, enabling faster execution than traditional TGP methods.
GPU-accelerated framework in Python:
- Integrates CUDA kernels into Python via custom operators of PyTorch, ensuring compatibility with modern computational ecosystems.
- Achieves up to a 100x speedup compared to existing TGP implementations while maintaining or improving solution quality.
Rich feature set:
- Offers a range of genetic operation variants, allowing users to tailor configurations for specific tasks.
- Supports multi-output trees, making it suitable for complex problems like classification and policy optimization.
- Supports Symbolic Regression, Classification, and Policy Optimization (Brax) benchmarks.
To install EvoGP, please follow the steps below:
Ensure you have the NVIDIA CUDA Toolkit installed, including nvcc. You can download it from NVIDIA's official website.
- Check your CUDA version:
nvcc --version
Ensure you have a compatible C++ compiler installed:
- Linux/macOS: Install GCC (9.x or later is recommended).
sudo apt install build-essential  # On Ubuntu
gcc --version
- Windows: Install the Visual C++ Build Tools, available from Microsoft's Visual Studio downloads page. During installation, ensure that the C++ workload is selected.
Install the version of PyTorch that matches your installed CUDA Toolkit version.
For example, if you are using CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Important: Make sure to select the PyTorch version compatible with the CUDA Toolkit version (nvcc -V), not the NVIDIA driver version.
You can find more details on the PyTorch installation page.
Finally, install EvoGP:
pip install git+https://github.com/EMI-Group/evogp.git
Note: This process might take a significant amount of time, as it includes the compilation of CUDA kernels.
To verify the installation, run the bundled test module:
python -m evogp.sr_test
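For the PyTorch step above, the wheel tag in the index URL (cu118 in the example) follows from the toolkit version that nvcc reports. The sketch below shows that mapping. The helper names `parse_nvcc_release` and `wheel_index_url` are hypothetical and not part of EvoGP or PyTorch, and the sample text only imitates the format of nvcc output. PyTorch publishes wheels for some CUDA versions only, so confirm the tag on the PyTorch installation page.

```python
import re


def parse_nvcc_release(nvcc_output: str) -> str:
    """Extract the 'major.minor' toolkit version from `nvcc --version` text."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def wheel_index_url(cuda_version: str) -> str:
    """Build the PyTorch wheel index URL for a toolkit version such as '11.8'."""
    major, minor = cuda_version.split(".")
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


# Sample text in the format nvcc prints; replace it with your own output.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
version = parse_nvcc_release(sample)
print(version, wheel_index_url(version))
```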
Start your journey with EvoGP in a few simple steps:
- Import necessary modules:
import torch
from evogp.tree import Forest, GenerateDiscriptor
from evogp.algorithm import (
    GeneticProgramming,
    DefaultSelection,
    DefaultMutation,
    DefaultCrossover,
)
from evogp.problem import SymbolicRegression
from evogp.pipeline import StandardPipeline
- Define a problem (here, symbolic regression on the 3-input XOR function):
XOR_INPUTS = torch.tensor(
    [
        [0, 0, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 1, 1],
        [1, 0, 0],
        [1, 0, 1],
        [1, 1, 0],
        [1, 1, 1],
    ],
    dtype=torch.float,
    device="cuda",
)
XOR_OUTPUTS = torch.tensor(
    [[0], [1], [1], [0], [1], [0], [0], [1]],
    dtype=torch.float,
    device="cuda",
)
problem = SymbolicRegression(datapoints=XOR_INPUTS, labels=XOR_OUTPUTS)
- Configure the algorithm:
# create the descriptor used for generating new trees
descriptor = GenerateDiscriptor(
    max_tree_len=32,
    input_len=problem.problem_dim,
    output_len=problem.solution_dim,
    using_funcs=["+", "-", "*", "/"],
    max_layer_cnt=4,
    const_samples=[-1, 0, 1],
)
# create the algorithm
algorithm = GeneticProgramming(
    initial_forest=Forest.random_generate(pop_size=5000, descriptor=descriptor),
    crossover=DefaultCrossover(),
    mutation=DefaultMutation(
        mutation_rate=0.2, descriptor=descriptor.update(max_layer_cnt=3)
    ),
    selection=DefaultSelection(survival_rate=0.3, elite_rate=0.01),
)
- Run the pipeline:
pipeline = StandardPipeline(
    algorithm,
    problem,
    generation_limit=100,
)
best = pipeline.run()
- Check the details of the best tree:
Check the predictions:
pred_res = best.forward(XOR_INPUTS)
print(pred_res)
The output looks like this:
tensor([[ 1.0000e-09],
        [ 1.0000e+00],
        [ 1.0000e+00],
        [-1.0000e-09],
        [ 1.0000e+00],
        [ 1.0000e-09],
        [ 1.0000e-09],
        [ 1.0000e+00]], device='cuda:0')
Mathematical formula (SymPy expression):
sympy_expression = best.to_sympy_expr()
print(sympy_expression)
The output looks like this:
(-x2*(x0 + x1) + 1.0)*(1.0*x2*(-x2*(x0 + x1) + 1.0) + (x0 - x1)**2)
Visualize:
best.to_png("./imgs/xor_tree.png")
This writes an image of the best tree to ./imgs/xor_tree.png.
The complete code is available in the repository.
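The formula printed in the walkthrough above can be checked without a GPU. The sketch below transcribes that SymPy expression into plain Python and compares it with 3-input XOR (parity) on all eight binary inputs. The function names `evolved_formula` and `xor3` are illustrative and not part of EvoGP, and a different run may evolve a different expression.

```python
from itertools import product


def evolved_formula(x0: float, x1: float, x2: float) -> float:
    """The SymPy expression printed above, transcribed as plain Python."""
    a = -x2 * (x0 + x1) + 1.0
    return a * (1.0 * x2 * a + (x0 - x1) ** 2)


def xor3(x0: int, x1: int, x2: int) -> int:
    """Three-input XOR (parity), the regression target in the walkthrough."""
    return x0 ^ x1 ^ x2


rows = list(product([0, 1], repeat=3))  # same row order as XOR_INPUTS
targets = [xor3(*row) for row in rows]
predictions = [evolved_formula(*map(float, row)) for row in rows]

for row, target, pred in zip(rows, targets, predictions):
    print(row, target, pred)
```

On binary inputs the expression matches the XOR labels exactly, which is consistent with the near-zero and near-one values in the predicted tensor.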
EvoGP includes multiple genetic operators, allowing users to freely assemble them to build customized TGP algorithms.
Type | Name |
---|---|
Selection | DefaultSelection |
Selection | RouletteSelection |
Selection | TruncationSelection |
Selection | RankSelection |
Selection | TournamentSelection |
Crossover | DefaultCrossover |
Crossover | DiversityCrossover |
Crossover | LeafBiasedCrossover |
Mutation | DefaultMutation |
Mutation | HoistMutation |
Mutation | SinglePointMutation |
Mutation | MultiPointMutation |
Mutation | InsertMutation |
Mutation | DeleteMutation |
Mutation | SingleConstMutation |
Mutation | MultiConstMutation |
Mutation | CombinedMutation |
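To give a sense of what one of these operators does, here is a generic, textbook-style sketch of tournament selection in plain Python. It is not EvoGP's implementation, which runs on the GPU. The function name, its parameters, and the convention that higher fitness is better are assumptions made for illustration.

```python
import random


def tournament_select(fitness, num_winners, tournament_size, rng):
    """Return indices of `num_winners` individuals chosen by tournament.

    Each tournament samples `tournament_size` distinct individuals and keeps
    the one with the highest fitness (higher is assumed to be better).
    """
    population = range(len(fitness))
    winners = []
    for _ in range(num_winners):
        contestants = rng.sample(population, tournament_size)
        winners.append(max(contestants, key=lambda i: fitness[i]))
    return winners


rng = random.Random(0)  # fixed seed so repeated runs pick the same winners
fitness = [0.1, 0.9, 0.4, 0.7, 0.2]
print(tournament_select(fitness, num_winners=3, tournament_size=2, rng=rng))
```

A larger tournament size raises selection pressure: when every individual takes part in each tournament, the best one always wins.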
EvoGP supports symbolic regression tasks.
You can construct a Problem with your custom dataset:
from evogp.problem import SymbolicRegression
problem = SymbolicRegression(datapoints=YOUR_DATA, labels=YOUR_LABELS)
Or use a predefined function to generate data:
def func(x):
    val = x[0] ** 4 / (x[0] ** 4 + 1) + x[1] ** 4 / (x[1] ** 4 + 1)
    return val.reshape(-1)

problem = SymbolicRegression(
    func=func,
    num_inputs=2,
    num_data=20000,
    lower_bounds=-5,
    upper_bounds=5,
)
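As a rough picture of what function-based data generation involves, the sketch below samples points from the box given by the bounds and labels each point with the target function. It is plain Python, and it assumes uniform sampling: how EvoGP samples internally is not stated here. The helper `sample_dataset` and its parameter names are hypothetical.

```python
import random


def func(x):
    """Same target as above, written for a single point x = [x0, x1]."""
    return x[0] ** 4 / (x[0] ** 4 + 1) + x[1] ** 4 / (x[1] ** 4 + 1)


def sample_dataset(func, num_inputs, num_data, lower_bound, upper_bound, seed=0):
    """Draw points uniformly from the box and label each one with func."""
    rng = random.Random(seed)
    inputs = [
        [rng.uniform(lower_bound, upper_bound) for _ in range(num_inputs)]
        for _ in range(num_data)
    ]
    labels = [func(x) for x in inputs]
    return inputs, labels


inputs, labels = sample_dataset(
    func, num_inputs=2, num_data=1000, lower_bound=-5, upper_bound=5
)
print(len(inputs), min(labels), max(labels))
```

Each term of this target lies in [0, 1), so every label falls in [0, 2).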
EvoGP supports classification tasks.
You can construct a Problem with your custom dataset:
from evogp.problem import Classification
problem = Classification(datapoints=YOUR_DATA, labels=YOUR_LABELS)
Or use the provided datasets:
dataset_name = ["iris", "wine", "breast_cancer", "digits"]
problem = Classification(dataset="iris")
EvoGP supports feature transformation tasks, allowing the generation of new features from raw data to improve model performance. You can create a Problem with your custom dataset:
from evogp.problem import Transformation
problem = Transformation(datapoints=YOUR_DATA, labels=YOUR_LABELS)
Or use a built-in dataset like "diabetes":
problem = Transformation(dataset="diabetes")
During execution, EvoGP automatically generates features optimized for correlation with the target label. These new features can then be accessed through the new_feature interface.
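To make "optimized for correlation with the target label" concrete, the sketch below scores candidate features by the absolute Pearson correlation with a target. This is an assumption for illustration: the exact metric EvoGP uses is not specified here, and the data and the `pearson` helper are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical raw features and target: the target depends on x0 * x1.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [2.0, 1.0, 4.0, 3.0, 6.0]
target = [a * b for a, b in zip(x0, x1)]

candidate = [a * b for a, b in zip(x0, x1)]  # a feature a tree might build
print(abs(pearson(x0, target)), abs(pearson(candidate, target)))
```

Under this scoring, the constructed feature x0 * x1 beats the raw feature x0, which is the kind of improvement a transformation tree is searching for.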
EvoGP supports robotics control tasks. You can create a Brax task with the following code:
from evogp.problem import BraxProblem
problem = BraxProblem("swimmer")
Note: Using BraxProblem requires additional installation of the JAX and Brax packages.
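Before constructing a BraxProblem, it can help to confirm that the optional packages are importable. The sketch below uses only the Python standard library. The helper `missing_packages` is hypothetical, and the import names `jax` and `brax` are assumed to match the package names.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["jax", "brax"])
if missing:
    print("Install before using BraxProblem:", ", ".join(missing))
else:
    print("JAX and Brax are available.")
```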
Once you have created your problem, you can use the following code to solve it:
problem = YOUR_PROBLEM  # the problem object you created above

from evogp.tree import Forest, GenerateDiscriptor
from evogp.algorithm import (
    GeneticProgramming,
    DefaultSelection,
    DefaultMutation,
    DefaultCrossover,
)
from evogp.pipeline import StandardPipeline

# descriptor used for generating new trees
descriptor = GenerateDiscriptor(
    max_tree_len=128,
    input_len=problem.problem_dim,
    output_len=problem.solution_dim,
    using_funcs=["+", "-", "*", "/"],
    max_layer_cnt=5,
    const_samples=[-1, 0, 1],
)

algorithm = GeneticProgramming(
    initial_forest=Forest.random_generate(pop_size=1000, descriptor=descriptor),
    crossover=DefaultCrossover(),
    mutation=DefaultMutation(
        mutation_rate=0.2, descriptor=descriptor.update(max_layer_cnt=3)
    ),
    selection=DefaultSelection(survival_rate=0.3, elite_rate=0.01),
)

pipeline = StandardPipeline(
    algorithm,
    problem,
    generation_limit=10,
)
pipeline.run()
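The descriptor settings above limit tree size in two ways: max_layer_cnt caps the depth of generated trees, and max_tree_len caps the number of nodes. The sketch below estimates the largest tree a given depth allows, assuming that every function takes two arguments (true for "+", "-", "*", "/") and that a tree with d layers is at most a full binary tree with 2^d - 1 nodes. How EvoGP counts layers is an assumption here, and `max_nodes` is a hypothetical helper.

```python
def max_nodes(layer_cnt: int, max_arity: int = 2) -> int:
    """Node count of a full tree with `layer_cnt` layers and the given arity."""
    return sum(max_arity ** level for level in range(layer_cnt))


# (max_layer_cnt, max_tree_len) pairs taken from the two descriptors above
for layer_cnt, max_tree_len in [(4, 32), (5, 128)]:
    nodes = max_nodes(layer_cnt)
    print(layer_cnt, nodes, nodes <= max_tree_len)
```

Under these assumptions, both configurations leave headroom between the largest freshly generated tree and max_tree_len.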
Detailed examples for the above tasks are available in the examples directory.
EvoGP is a new project, and we will continue to maintain it in the future. We warmly welcome suggestions for improvement!
- Engage in discussions and share your experiences on GitHub Issues.
- Join our QQ group (ID: 297969717).
- Improve EvoGP documentation and tutorials.
- Implement more GP-related algorithms, such as LGP, CGP, GEP.
- Add more multi-output methods for EvoGP.
- Further optimize EvoGP to increase computation speed and reduce memory usage.
We warmly welcome community developers to contribute to EvoGP and look forward to your pull requests!
- Thanks to John R. Koza for the Genetic Programming (GP) algorithm, which provided an excellent automatic programming technique and laid the foundation for the development of EvoGP.
- Thanks to PyTorch and CUDA for providing flexible and efficient GPU-accelerated tools, which are essential for optimizing the performance of EvoGP.
- Thanks to the following projects for their valuable contributions to GP research, which provided inspiration and guidance for EvoGP's design: DEAP, gplearn, Karoo GP, TensorGP and SymbolicRegressionGPU.
- Thanks to scikit-learn and Brax for their benchmarking frameworks, which have provided a convenient environment for performance evaluation in EvoGP.
- Thanks to EvoX for providing a flexible framework that allows EvoGP to integrate with other evolutionary algorithms, expanding its potential.
If you use EvoGP in your research and want to cite it in your work, please use: