42 changes: 19 additions & 23 deletions .pre-commit-config.yaml
@@ -1,26 +1,22 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
fail_fast: true
# Usage
# uv run pre-commit install
# uv run pre-commit run --all-files

repos:
- repo: https://github.com/psf/black
rev: 25.1.0
hooks:
- id: black
args: [--config, pyproject.toml]
types: [python]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: mixed-line-ending
- id: trailing-whitespace

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.0
hooks:
- id: ruff
args: [ --fix ]

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-toml
- id: check-yaml
- id: detect-private-key
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.13.2
hooks:
- id: ruff-check
args: [ --fix ]
- id: ruff-format
types_or: [ python, pyi ]
2 changes: 1 addition & 1 deletion configs/experiment/graph/am.yaml
@@ -21,7 +21,7 @@ model:
test_data_size: 1000
optimizer_kwargs:
lr: 1e-4

trainer:
max_epochs: 100

2 changes: 1 addition & 1 deletion configs/experiment/routing/mdpomo.yaml
@@ -11,7 +11,7 @@ env:
generator_params:
num_loc: 50
loc_distribution: "mix_distribution"


logger:
wandb:
4 changes: 2 additions & 2 deletions docs/content/api/zoo/improvement.md
@@ -2,7 +2,7 @@

These methods are trained to improve existing solutions iteratively, akin to local search algorithms. They focus on refining existing solutions rather than generating them from scratch.

### DACT
### DACT

:::models.zoo.dact.encoder
options:
@@ -19,7 +19,7 @@ These methods are trained to improve existing solutions iteratively, akin to loc
:::models.zoo.dact.model
options:
show_root_heading: false


### N2S

2 changes: 1 addition & 1 deletion docs/content/general/faq.md
@@ -2,7 +2,7 @@



You can submit your questions via [GitHub Issues](https://github.com/ai4co/rl4co/issues) or [Discussions](https://github.com/ai4co/rl4co/discussions).
You can submit your questions via [GitHub Issues](https://github.com/ai4co/rl4co/issues) or [Discussions](https://github.com/ai4co/rl4co/discussions).

You may search for your question in the existing issues or discussions before submitting a new one. If asked more than a few times, we will add it here!

32 changes: 16 additions & 16 deletions docs/content/intro/environments.md
@@ -60,32 +60,32 @@ Click [here](../api/envs/routing.md) for API documentation on routing problems.

## Scheduling Problems

Scheduling problems are a fundamental class of problems in operations research and industrial engineering, where the objective is to optimize the allocation of resources over time. These problems are critical in various industries, such as manufacturing, computer science, and project management.
Scheduling problems are a fundamental class of problems in operations research and industrial engineering, where the objective is to optimize the allocation of resources over time. These problems are critical in various industries, such as manufacturing, computer science, and project management.



### MDP

Here we show a general constructive MDP formulation based on the Job Shop Scheduling Problem (JSSP), a well-known scheduling problem, which can be adapted to other scheduling problems.

- **State** $s_t \in \mathcal{S}$:
- **State** $s_t \in \mathcal{S}$:
The state is represented by a disjunctive graph, where:
- Operations are nodes
- Processing orders between operations are shown by directed arcs
- This graph encapsulates both the problem instance and the current partial schedule

- **Action** $a_t \in \mathcal{A}$:
- **Action** $a_t \in \mathcal{A}$:
An action involves selecting a feasible operation to assign to its designated machine, a process often referred to as dispatching. The action space consists of all operations that can be feasibly scheduled at the current state.

- **Transition** $\mathcal{T}$:
- **Transition** $\mathcal{T}$:
The transition function deterministically updates the disjunctive graph based on the dispatched operation. This includes:
- Modifying the graph's topology (e.g., adding new connections between operations)
- Updating operation attributes (e.g., start times)

- **Reward** $\mathcal{R}$:
- **Reward** $\mathcal{R}$:
The reward function is designed to align with the optimization objective. For instance, if minimizing makespan is the goal, the reward could be the negative change in makespan resulting from the latest action.

- **Policy** $\pi$:
- **Policy** $\pi$:
The policy, typically stochastic, takes the current disjunctive graph as input and outputs a probability distribution over feasible dispatching actions. This process continues until a complete schedule is constructed.
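
To make the formulation above concrete, the following is a minimal, self-contained sketch of a constructive rollout on a toy JSSP instance. The instance, the random dispatching policy, and all variable names are illustrative assumptions only; they do not correspond to RL4CO's actual scheduling environments.

```python
import random

# Toy instance: each job is a list of (machine, processing_time) operations.
jobs = [
    [(0, 3), (1, 2)],  # job 0: machine 0 first, then machine 1
    [(1, 4), (0, 1)],  # job 1: machine 1 first, then machine 0
]
num_machines = 2

# State: next operation per job plus machine/job availability times.
next_op = [0] * len(jobs)
machine_free = [0] * num_machines
job_ready = [0] * len(jobs)
makespan = 0

def feasible_actions():
    # Action = dispatching the next operation of any job with work remaining.
    return [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]

random.seed(0)
while feasible_actions():
    # Policy: uniform random over feasible dispatches; a learned policy would
    # instead condition a distribution on the current disjunctive graph.
    a = random.choice(feasible_actions())

    # Transition: schedule the chosen operation as early as possible and
    # update machine/job availability (the graph/attribute update).
    machine, proc_time = jobs[a][next_op[a]]
    start = max(machine_free[machine], job_ready[a])
    end = start + proc_time
    machine_free[machine] = end
    job_ready[a] = end
    next_op[a] += 1

    # Reward: negative change in makespan caused by this dispatch.
    reward = -(max(makespan, end) - makespan)
    makespan = max(makespan, end)
    print(f"dispatch job {a} on machine {machine}: start={start}, end={end}, reward={reward}")

print("final makespan:", makespan)
```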


Expand All @@ -103,25 +103,25 @@ Electronic Design Automation (EDA) is a sophisticated process that involves the
EDA encompasses many problem types; here we'll focus on placement problems, which are fundamental in the physical design of integrated circuits and printed circuit boards. We'll use the Decap Placement Problem (DPP) as an example to illustrate a typical MDP formulation for EDA placement problems.


- **State** $s_t \in \mathcal{S}$:
- **State** $s_t \in \mathcal{S}$:
The state typically represents the current configuration of the design space, which may include:
- Locations of fixed elements (e.g., ports, keepout regions)
- Current placements of movable elements
- Remaining resources or components to be placed

- **Action** $a_t \in \mathcal{A}$:
- **Action** $a_t \in \mathcal{A}$:
An action usually involves placing a component at a valid location within the design space. The action space consists of all feasible placement locations, considering design rules and constraints.

- **Transition** $\mathcal{T}$:
- **Transition** $\mathcal{T}$:
The transition function updates the design state based on the placement action, which may include:
- Updating the placement map
- Adjusting available resources or remaining components
- Recalculating relevant metrics (e.g., wire length, power distribution)

- **Reward** $\mathcal{R}$:
- **Reward** $\mathcal{R}$:
The reward is typically based on the improvement in the design objective resulting from the latest placement action. This could involve metrics such as area efficiency, signal integrity, or power consumption.

- **Policy** $\pi$:
- **Policy** $\pi$:
The policy takes the current design state as input and outputs a probability distribution over possible placement actions.

Note that specific problems may introduce additional complexities or constraints.
@@ -142,26 +142,26 @@ In graph problems, we typically work with a graph $G = (V, E)$, where $V$ is a s

Graph problems can be effectively modeled using a Markov Decision Process (MDP) framework in a constructive fashion. Here, we outline the key components of the MDP formulation for graph problems:

- **State** $s_t \in \mathcal{S}$:
- **State** $s_t \in \mathcal{S}$:
The state encapsulates the current configuration of the graph and the optimization progress. It typically includes:
- The graph structure (vertices and edges)
- Attributes associated with vertices or edges
- The set of elements (vertices, edges, or subgraphs) selected so far
- Problem-specific information, such as remaining selections or resources

- **Action** $a_t \in \mathcal{A}$:
- **Action** $a_t \in \mathcal{A}$:
An action usually involves selecting a graph element (e.g., a vertex, edge, or subgraph). The action space comprises all valid selections based on the problem constraints and the current state.

- **Transition** $\mathcal{T}$:
- **Transition** $\mathcal{T}$:
The transition function $\mathcal{T}(s_t, a_t) \rightarrow s_{t+1}$ updates the graph state based on the selected action. This typically involves:
- Updating the set of selected elements
- Modifying graph attributes affected by the selection
- Updating problem-specific information (e.g., remaining selections or resources)

- **Reward** $\mathcal{R}$:
- **Reward** $\mathcal{R}$:
The reward function $\mathcal{R}(s_t, a_t)$ quantifies the quality of the action taken. It is typically based on the improvement in the optimization objective resulting from the latest selection. This could involve metrics such as coverage, distance, connectivity, or any other problem-specific criteria.

- **Policy** $\pi$:
- **Policy** $\pi$:
The policy $\pi(a_t|s_t)$ is a probability distribution over possible actions given the current state. It guides the decision-making process, determining which graph elements to select at each step to optimize the objective.

Specific problems may introduce additional complexities or constraints, which can often be incorporated through careful design of the state space, action space, and reward function.
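
As a concrete, deliberately tiny illustration of this loop, the sketch below runs a constructive selection rollout for a max-coverage-style task on a toy graph. The adjacency list, the budget `k`, and the uniform random policy are illustrative assumptions rather than an actual RL4CO environment.

```python
import random

# Toy graph as an adjacency list; the task is to pick k vertices that,
# together with their neighbors, cover as many vertices as possible.
adjacency = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
k = 2

random.seed(0)
selected, covered = set(), set()  # state: selections so far + derived problem info
for _ in range(k):
    candidates = [v for v in adjacency if v not in selected]  # action space
    a = random.choice(candidates)       # policy: uniform stand-in for a learned one
    newly_covered = ({a} | adjacency[a]) - covered
    reward = len(newly_covered)         # reward: marginal improvement in coverage
    selected.add(a)                     # transition: update the selected set ...
    covered |= {a} | adjacency[a]       # ... and the problem-specific information
    print(f"select {a}: reward={reward}")

print("selected:", selected, "covered:", covered)
```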
2 changes: 1 addition & 1 deletion docs/content/intro/policies.md
@@ -11,7 +11,7 @@ A policy $\pi$ is used to construct a solution from scratch for a given problem
An AR policy is composed of an encoder $f$ that maps the instance $\mathbf{x}$ into an embedding space $\mathbf{h}=f(\mathbf{x})$ and by a decoder $g$ that iteratively determines a sequence of actions $\mathbf{a}$ as follows:

$$
a_t \sim g(a_t | a_{t-1}, ... ,a_0, s_t, \mathbf{h}), \quad
a_t \sim g(a_t | a_{t-1}, ... ,a_0, s_t, \mathbf{h}), \quad
\pi(\mathbf{a}|\mathbf{x}) \triangleq \prod_{t=1}^{T-1} g(a_{t} | a_{t-1}, \ldots ,a_0, s_t, \mathbf{h}).
$$
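
The sketch below mirrors this factorization with a minimal PyTorch stand-in: a one-shot encoder produces $\mathbf{h}$, and a decoding loop samples $a_t$ step by step while masking visited nodes and accumulating $\log \pi(\mathbf{a}|\mathbf{x})$. The linear encoder/decoder, the context summary, and all names are placeholder assumptions, not RL4CO's actual policy classes.

```python
import torch

torch.manual_seed(0)
embed_dim, num_nodes = 16, 10
encoder = torch.nn.Linear(2, embed_dim)          # stand-in for f: x -> h
decoder = torch.nn.Linear(embed_dim, embed_dim)  # stand-in for g: builds a query over h

x = torch.rand(num_nodes, 2)   # instance: node coordinates
h = encoder(x)                 # h = f(x), computed once per instance

visited = torch.zeros(num_nodes, dtype=torch.bool)
context = h.mean(dim=0)        # crude summary of the state s_t
actions, log_probs = [], []

for _ in range(num_nodes):
    query = decoder(context)
    logits = h @ query                                   # score every node
    logits = logits.masked_fill(visited, float("-inf"))  # mask infeasible actions
    probs = torch.softmax(logits, dim=-1)
    a_t = torch.multinomial(probs, 1).item()             # a_t ~ g(a_t | a_{t-1}, ..., s_t, h)
    actions.append(a_t)
    log_probs.append(torch.log(probs[a_t]))
    visited[a_t] = True
    context = h[a_t]                                      # state summary reflects the last action

# log pi(a|x) is the sum of per-step log-probabilities, i.e. the product above in log space.
log_likelihood = torch.stack(log_probs).sum()
print(actions, log_likelihood.item())
```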

2 changes: 1 addition & 1 deletion docs/content/intro/rl.md
@@ -19,7 +19,7 @@
\nabla_{\theta} \mathcal{L}_a(\theta|\mathbf{x}) = \mathbb{E}_{\pi(\mathbf{a}|\mathbf{x})} \left[(R(\mathbf{a}, \mathbf{x}) - b(\mathbf{x})) \nabla_{\theta}\log \pi(\mathbf{a}|\mathbf{x})\right],
$$

where $b(\cdot)$ is a baseline function used to stabilize training and reduce gradient variance.
where $b(\cdot)$ is a baseline function used to stabilize training and reduce gradient variance.
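
In code, this estimator is usually implemented as a surrogate loss in which the advantage $(R - b)$ is detached so that gradients flow only through $\log \pi(\mathbf{a}|\mathbf{x})$. The snippet below is a generic sketch with a shared mean-reward baseline; the sign convention and the baseline choice are illustrative assumptions, not necessarily the exact setup used in RL4CO.

```python
import torch

def reinforce_loss(reward, log_likelihood, baseline):
    # (R - b) is treated as a constant w.r.t. theta, so only log pi(a|x)
    # contributes gradients; minimizing this surrogate ascends expected reward.
    advantage = (reward - baseline).detach()
    return -(advantage * log_likelihood).mean()

# Toy usage: a batch of 3 sampled solutions with a shared mean-reward baseline.
reward = torch.tensor([-10.2, -9.8, -11.0])  # e.g. negative tour lengths
log_likelihood = torch.tensor([-3.1, -2.9, -3.4], requires_grad=True)
loss = reinforce_loss(reward, log_likelihood, baseline=reward.mean())
loss.backward()
print(loss.item(), log_likelihood.grad)
```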

We also distinguish between two types of RL (pre)training:

4 changes: 2 additions & 2 deletions docs/content/start/hydra.md
@@ -102,7 +102,7 @@ defaults:
This section sets the default configuration for the model, environment, callbacks, trainer, and logger. This means that if a key is not specified in the experiment configuration, the default value will be used. Note that these are set in the root [configs/](https://github.com/ai4co/rl4co/tree/main/configs) folder, and are useful for better organization and reusability.

```yaml linenums="11"
env:
env:
generator_params:
loc_distribution: "uniform"
num_loc: 50
@@ -153,7 +153,7 @@ logger:

Finally, this section specifies the logger configuration. In this case, we are using Weights & Biases (WandB) to log the results of the experiment. We specify the project name, tags, group, and name of the experiment.

That's it! 🎉
That's it! 🎉


!!! tip
4 changes: 2 additions & 2 deletions docs/hooks.py
@@ -26,12 +26,12 @@ def on_startup(*args, **kwargs):
def append_tricks_to_readme(file_path):
# read the tricks from docs/overrides/fancylogo.txt
# and put them at the beginning of the file
with open("docs/overrides/fancylogo.txt", "r") as fancylogo:
with open("docs/overrides/fancylogo.txt") as fancylogo:
tricks = fancylogo.read()
if not os.path.exists(file_path):
print(f"Error: The file {file_path} does not exist.")
return
with open(file_path, "r") as original:
with open(file_path) as original:
data = original.read()
# remove first 33 lines. yeah, it's a hack to remove unneded stuff lol
data = "\n".join(data.split("\n")[33:])
8 changes: 4 additions & 4 deletions docs/js/autolink.js
@@ -3,16 +3,16 @@ const convertLinks = ( input ) => {
let text = input;
const linksFound = text.match( /(?:www|https?)[^\s]+/g );
const aLink = [];

if ( linksFound != null ) {

for ( let i=0; i<linksFound.length; i++ ) {
let replace = linksFound[i];
if ( !( linksFound[i].match( /(http(s?)):\/\// ) ) ) { replace = 'http://' + linksFound[i] }
let linkText = replace.split( '/' )[2];
if ( linkText.substring( 0, 3 ) == 'www' ) { linkText = linkText.replace( 'www.', '' ) }
if ( linkText.match( /youtu/ ) ) {

let youtubeID = replace.split( '/' ).slice(-1)[0];
aLink.push( '<div class="video-wrapper"><iframe src="https://www.youtube.com/embed/' + youtubeID + '" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div>' )
}
@@ -26,7 +26,7 @@ const convertLinks = ( input ) => {
text = text.split( linksFound[i] ).map(item => { return aLink[i].includes('iframe') ? item.trim() : item } ).join( aLink[i] );
}
return text;

}
else {
return input;
2 changes: 1 addition & 1 deletion docs/js/katex.js
@@ -1,4 +1,4 @@
document$.subscribe(({ body }) => {
document$.subscribe(({ body }) => {
renderMathInElement(body, {
delimiters: [
{ left: "$$", right: "$$", display: true },
28 changes: 14 additions & 14 deletions docs/overrides/fancylogo.txt
@@ -2,16 +2,16 @@
hide:
- navigation
- toc
---
---

<div>
<div>
<style type="text/css">
.md-typeset h1,
.md-content__button {
display: none;
}
</style>
</div>
</style>
</div>


<div class="md-content" data-md-component="content">
@@ -83,11 +83,11 @@ hide:
const setContainerDimensions = () => {
const container = document.querySelector('.md-main__inner #particles-container');
const mainContent = document.querySelector('.md-main__inner');

if (mainContent && container) {
const containerWidth = mainContent.offsetWidth;
container.style.width = `${containerWidth}px`;

// Calculate height based on the aspect ratio and 60% width
const imageWidth = containerWidth * 0.6;
const imageHeight = imageWidth * ASPECT_RATIO;
@@ -103,16 +103,16 @@ hide:
const backgroundColor = computedStyle.backgroundColor;

mask.style.background = `
linear-gradient(to right,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
linear-gradient(to right,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
${backgroundColor} 100%
),
linear-gradient(to bottom,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
linear-gradient(to bottom,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
${backgroundColor} 100%
)
`;