42 changes: 19 additions & 23 deletions .pre-commit-config.yaml
@@ -1,26 +1,22 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
fail_fast: true
# Usage
# uv run pre-commit install
# uv run pre-commit run --all-files

repos:
- repo: https://github.com/psf/black
rev: 25.1.0
hooks:
- id: black
args: [--config, pyproject.toml]
types: [python]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: mixed-line-ending
- id: trailing-whitespace

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.0
hooks:
- id: ruff
args: [ --fix ]

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-toml
- id: check-yaml
- id: detect-private-key
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.13.2
hooks:
- id: ruff-check
args: [ --fix ]
- id: ruff-format
types_or: [ python, pyi ]
2 changes: 1 addition & 1 deletion configs/experiment/graph/am.yaml
@@ -21,7 +21,7 @@ model:
test_data_size: 1000
optimizer_kwargs:
lr: 1e-4

trainer:
max_epochs: 100

2 changes: 1 addition & 1 deletion configs/experiment/routing/mdpomo.yaml
@@ -11,7 +11,7 @@ env:
generator_params:
num_loc: 50
loc_distribution: "mix_distribution"


logger:
wandb:
4 changes: 2 additions & 2 deletions docs/content/api/zoo/improvement.md
@@ -2,7 +2,7 @@

These methods are trained to improve existing solutions iteratively, akin to local search algorithms. They focus on refining existing solutions rather than generating them from scratch.

### DACT
### DACT

:::models.zoo.dact.encoder
options:
@@ -19,7 +19,7 @@ These methods are trained to improve existing solutions iteratively, akin to loc
:::models.zoo.dact.model
options:
show_root_heading: false


### N2S

2 changes: 1 addition & 1 deletion docs/content/general/faq.md
@@ -2,7 +2,7 @@



You can submit your questions via [GitHub Issues](https://github.com/ai4co/rl4co/issues) or [Discussions](https://github.com/ai4co/rl4co/discussions).
You can submit your questions via [GitHub Issues](https://github.com/ai4co/rl4co/issues) or [Discussions](https://github.com/ai4co/rl4co/discussions).

You may search for your question in the existing issues or discussions before submitting a new one. If asked more than a few times, we will add it here!

32 changes: 16 additions & 16 deletions docs/content/intro/environments.md
@@ -60,32 +60,32 @@ Click [here](../api/envs/routing.md) for API documentation on routing problems.

## Scheduling Problems

Scheduling problems are a fundamental class of problems in operations research and industrial engineering, where the objective is to optimize the allocation of resources over time. These problems are critical in various industries, such as manufacturing, computer science, and project management.
Scheduling problems are a fundamental class of problems in operations research and industrial engineering, where the objective is to optimize the allocation of resources over time. These problems are critical in various industries, such as manufacturing, computer science, and project management.



### MDP

Here we show a general constructive MDP formulation based on the Job Shop Scheduling Problem (JSSP), a well-known scheduling problem, which can be adapted to other scheduling problems.

- **State** $s_t \in \mathcal{S}$:
- **State** $s_t \in \mathcal{S}$:
The state is represented by a disjunctive graph, where:
- Operations are nodes
- Processing orders between operations are shown by directed arcs
- This graph encapsulates both the problem instance and the current partial schedule

- **Action** $a_t \in \mathcal{A}$:
- **Action** $a_t \in \mathcal{A}$:
An action involves selecting a feasible operation to assign to its designated machine, a process often referred to as dispatching. The action space consists of all operations that can be feasibly scheduled at the current state.

- **Transition** $\mathcal{T}$:
- **Transition** $\mathcal{T}$:
The transition function deterministically updates the disjunctive graph based on the dispatched operation. This includes:
- Modifying the graph's topology (e.g., adding new connections between operations)
- Updating operation attributes (e.g., start times)

- **Reward** $\mathcal{R}$:
- **Reward** $\mathcal{R}$:
The reward function is designed to align with the optimization objective. For instance, if minimizing makespan is the goal, the reward could be the negative change in makespan resulting from the latest action.

- **Policy** $\pi$:
- **Policy** $\pi$:
The policy, typically stochastic, takes the current disjunctive graph as input and outputs a probability distribution over feasible dispatching actions. This process continues until a complete schedule is constructed.
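
To make the formulation above concrete, the following is a minimal, self-contained sketch of a constructive rollout on a toy JSSP instance. The instance, the random dispatching policy, and all variable names are illustrative assumptions only; they do not correspond to RL4CO's actual scheduling environments.

```python
import random

# Toy instance: each job is a list of (machine, processing_time) operations.
jobs = [
    [(0, 3), (1, 2)],  # job 0: machine 0 first, then machine 1
    [(1, 4), (0, 1)],  # job 1: machine 1 first, then machine 0
]
num_machines = 2

# State: next operation per job plus machine/job availability times.
next_op = [0] * len(jobs)
machine_free = [0] * num_machines
job_ready = [0] * len(jobs)
makespan = 0

def feasible_actions():
    # Action = dispatching the next operation of any job with work remaining.
    return [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]

random.seed(0)
while feasible_actions():
    # Policy: uniform random over feasible dispatches; a learned policy would
    # instead condition a distribution on the current disjunctive graph.
    a = random.choice(feasible_actions())

    # Transition: schedule the chosen operation as early as possible and
    # update machine/job availability (the graph/attribute update).
    machine, proc_time = jobs[a][next_op[a]]
    start = max(machine_free[machine], job_ready[a])
    end = start + proc_time
    machine_free[machine] = end
    job_ready[a] = end
    next_op[a] += 1

    # Reward: negative change in makespan caused by this dispatch.
    reward = -(max(makespan, end) - makespan)
    makespan = max(makespan, end)
    print(f"dispatch job {a} on machine {machine}: start={start}, end={end}, reward={reward}")

print("final makespan:", makespan)
```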


Expand All @@ -103,25 +103,25 @@ Electronic Design Automation (EDA) is a sophisticated process that involves the
EDA encompasses many problem types; here we'll focus on placement problems, which are fundamental in the physical design of integrated circuits and printed circuit boards. We'll use the Decap Placement Problem (DPP) as an example to illustrate a typical MDP formulation for EDA placement problems.


- **State** $s_t \in \mathcal{S}$:
- **State** $s_t \in \mathcal{S}$:
The state typically represents the current configuration of the design space, which may include:
- Locations of fixed elements (e.g., ports, keepout regions)
- Current placements of movable elements
- Remaining resources or components to be placed

- **Action** $a_t \in \mathcal{A}$:
- **Action** $a_t \in \mathcal{A}$:
An action usually involves placing a component at a valid location within the design space. The action space consists of all feasible placement locations, considering design rules and constraints.

- **Transition** $\mathcal{T}$:
- **Transition** $\mathcal{T}$:
The transition function updates the design state based on the placement action, which may include:
- Updating the placement map
- Adjusting available resources or remaining components
- Recalculating relevant metrics (e.g., wire length, power distribution)

- **Reward** $\mathcal{R}$:
- **Reward** $\mathcal{R}$:
The reward is typically based on the improvement in the design objective resulting from the latest placement action. This could involve metrics such as area efficiency, signal integrity, or power consumption.

- **Policy** $\pi$:
- **Policy** $\pi$:
The policy takes the current design state as input and outputs a probability distribution over possible placement actions.

Note that specific problems may introduce additional complexities or constraints.
@@ -142,26 +142,26 @@ In graph problems, we typically work with a graph $G = (V, E)$, where $V$ is a s

Graph problems can be effectively modeled using a Markov Decision Process (MDP) framework in a constructive fashion. Here, we outline the key components of the MDP formulation for graph problems:

- **State** $s_t \in \mathcal{S}$:
- **State** $s_t \in \mathcal{S}$:
The state encapsulates the current configuration of the graph and the optimization progress. It typically includes:
- The graph structure (vertices and edges)
- Attributes associated with vertices or edges
- The set of elements (vertices, edges, or subgraphs) selected so far
- Problem-specific information, such as remaining selections or resources

- **Action** $a_t \in \mathcal{A}$:
- **Action** $a_t \in \mathcal{A}$:
An action usually involves selecting a graph element (e.g., a vertex, edge, or subgraph). The action space comprises all valid selections based on the problem constraints and the current state.

- **Transition** $\mathcal{T}$:
- **Transition** $\mathcal{T}$:
The transition function $\mathcal{T}(s_t, a_t) \rightarrow s_{t+1}$ updates the graph state based on the selected action. This typically involves:
- Updating the set of selected elements
- Modifying graph attributes affected by the selection
- Updating problem-specific information (e.g., remaining selections or resources)

- **Reward** $\mathcal{R}$:
- **Reward** $\mathcal{R}$:
The reward function $\mathcal{R}(s_t, a_t)$ quantifies the quality of the action taken. It is typically based on the improvement in the optimization objective resulting from the latest selection. This could involve metrics such as coverage, distance, connectivity, or any other problem-specific criteria.

- **Policy** $\pi$:
- **Policy** $\pi$:
The policy $\pi(a_t|s_t)$ is a probability distribution over possible actions given the current state. It guides the decision-making process, determining which graph elements to select at each step to optimize the objective.

Specific problems may introduce additional complexities or constraints, which can often be incorporated through careful design of the state space, action space, and reward function.
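
As a concrete, deliberately tiny illustration of this loop, the sketch below runs a constructive selection rollout for a max-coverage-style task on a toy graph. The adjacency list, the budget `k`, and the uniform random policy are illustrative assumptions rather than an actual RL4CO environment.

```python
import random

# Toy graph as an adjacency list; the task is to pick k vertices that,
# together with their neighbors, cover as many vertices as possible.
adjacency = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
k = 2

random.seed(0)
selected, covered = set(), set()  # state: selections so far + derived problem info
for _ in range(k):
    candidates = [v for v in adjacency if v not in selected]  # action space
    a = random.choice(candidates)       # policy: uniform stand-in for a learned one
    newly_covered = ({a} | adjacency[a]) - covered
    reward = len(newly_covered)         # reward: marginal improvement in coverage
    selected.add(a)                     # transition: update the selected set ...
    covered |= {a} | adjacency[a]       # ... and the problem-specific information
    print(f"select {a}: reward={reward}")

print("selected:", selected, "covered:", covered)
```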
2 changes: 1 addition & 1 deletion docs/content/intro/policies.md
@@ -11,7 +11,7 @@ A policy $\pi$ is used to construct a solution from scratch for a given problem
An AR policy is composed of an encoder $f$ that maps the instance $\mathbf{x}$ into an embedding space $\mathbf{h}=f(\mathbf{x})$ and by a decoder $g$ that iteratively determines a sequence of actions $\mathbf{a}$ as follows:

$$
a_t \sim g(a_t | a_{t-1}, ... ,a_0, s_t, \mathbf{h}), \quad
a_t \sim g(a_t | a_{t-1}, ... ,a_0, s_t, \mathbf{h}), \quad
\pi(\mathbf{a}|\mathbf{x}) \triangleq \prod_{t=1}^{T-1} g(a_{t} | a_{t-1}, \ldots ,a_0, s_t, \mathbf{h}).
$$
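
The sketch below mirrors this factorization with a minimal PyTorch stand-in: a one-shot encoder produces $\mathbf{h}$, and a decoding loop samples $a_t$ step by step while masking visited nodes and accumulating $\log \pi(\mathbf{a}|\mathbf{x})$. The linear encoder/decoder, the context summary, and all names are placeholder assumptions, not RL4CO's actual policy classes.

```python
import torch

torch.manual_seed(0)
embed_dim, num_nodes = 16, 10
encoder = torch.nn.Linear(2, embed_dim)          # stand-in for f: x -> h
decoder = torch.nn.Linear(embed_dim, embed_dim)  # stand-in for g: builds a query over h

x = torch.rand(num_nodes, 2)   # instance: node coordinates
h = encoder(x)                 # h = f(x), computed once per instance

visited = torch.zeros(num_nodes, dtype=torch.bool)
context = h.mean(dim=0)        # crude summary of the state s_t
actions, log_probs = [], []

for _ in range(num_nodes):
    query = decoder(context)
    logits = h @ query                                   # score every node
    logits = logits.masked_fill(visited, float("-inf"))  # mask infeasible actions
    probs = torch.softmax(logits, dim=-1)
    a_t = torch.multinomial(probs, 1).item()             # a_t ~ g(a_t | a_{t-1}, ..., s_t, h)
    actions.append(a_t)
    log_probs.append(torch.log(probs[a_t]))
    visited[a_t] = True
    context = h[a_t]                                      # state summary reflects the last action

# log pi(a|x) is the sum of per-step log-probabilities, i.e. the product above in log space.
log_likelihood = torch.stack(log_probs).sum()
print(actions, log_likelihood.item())
```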

2 changes: 1 addition & 1 deletion docs/content/intro/rl.md
@@ -19,7 +19,7 @@
\nabla_{\theta} \mathcal{L}_a(\theta|\mathbf{x}) = \mathbb{E}_{\pi(\mathbf{a}|\mathbf{x})} \left[(R(\mathbf{a}, \mathbf{x}) - b(\mathbf{x})) \nabla_{\theta}\log \pi(\mathbf{a}|\mathbf{x})\right],
$$

where $b(\cdot)$ is a baseline function used to stabilize training and reduce gradient variance.
where $b(\cdot)$ is a baseline function used to stabilize training and reduce gradient variance.
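
In code, this estimator is usually implemented as a surrogate loss in which the advantage $(R - b)$ is detached so that gradients flow only through $\log \pi(\mathbf{a}|\mathbf{x})$. The snippet below is a generic sketch with a shared mean-reward baseline; the sign convention and the baseline choice are illustrative assumptions, not necessarily the exact setup used in RL4CO.

```python
import torch

def reinforce_loss(reward, log_likelihood, baseline):
    # (R - b) is treated as a constant w.r.t. theta, so only log pi(a|x)
    # contributes gradients; minimizing this surrogate ascends expected reward.
    advantage = (reward - baseline).detach()
    return -(advantage * log_likelihood).mean()

# Toy usage: a batch of 3 sampled solutions with a shared mean-reward baseline.
reward = torch.tensor([-10.2, -9.8, -11.0])  # e.g. negative tour lengths
log_likelihood = torch.tensor([-3.1, -2.9, -3.4], requires_grad=True)
loss = reinforce_loss(reward, log_likelihood, baseline=reward.mean())
loss.backward()
print(loss.item(), log_likelihood.grad)
```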

We also distinguish between two types of RL (pre)training:

4 changes: 2 additions & 2 deletions docs/content/start/hydra.md
@@ -102,7 +102,7 @@ defaults:
This section sets the default configuration for the model, environment, callbacks, trainer, and logger. This means that if a key is not specified in the experiment configuration, the default value will be used. Note that these are set in the root [configs/](https://github.com/ai4co/rl4co/tree/main/configs) folder, and are useful for better organization and reusability.

```yaml linenums="11"
env:
env:
generator_params:
loc_distribution: "uniform"
num_loc: 50
@@ -153,7 +153,7 @@ logger:

Finally, this section specifies the logger configuration. In this case, we are using Weights & Biases (WandB) to log the results of the experiment. We specify the project name, tags, group, and name of the experiment.

That's it! 🎉
That's it! 🎉


!!! tip
4 changes: 2 additions & 2 deletions docs/hooks.py
@@ -26,12 +26,12 @@ def on_startup(*args, **kwargs):
def append_tricks_to_readme(file_path):
# read the tricks from docs/overrides/fancylogo.txt
# and put them at the beginning of the file
with open("docs/overrides/fancylogo.txt", "r") as fancylogo:
with open("docs/overrides/fancylogo.txt") as fancylogo:
tricks = fancylogo.read()
if not os.path.exists(file_path):
print(f"Error: The file {file_path} does not exist.")
return
with open(file_path, "r") as original:
with open(file_path) as original:
data = original.read()
# remove first 33 lines. yeah, it's a hack to remove unneded stuff lol
data = "\n".join(data.split("\n")[33:])
8 changes: 4 additions & 4 deletions docs/js/autolink.js
@@ -3,16 +3,16 @@ const convertLinks = ( input ) => {
let text = input;
const linksFound = text.match( /(?:www|https?)[^\s]+/g );
const aLink = [];

if ( linksFound != null ) {

for ( let i=0; i<linksFound.length; i++ ) {
let replace = linksFound[i];
if ( !( linksFound[i].match( /(http(s?)):\/\// ) ) ) { replace = 'http://' + linksFound[i] }
let linkText = replace.split( '/' )[2];
if ( linkText.substring( 0, 3 ) == 'www' ) { linkText = linkText.replace( 'www.', '' ) }
if ( linkText.match( /youtu/ ) ) {

let youtubeID = replace.split( '/' ).slice(-1)[0];
aLink.push( '<div class="video-wrapper"><iframe src="https://www.youtube.com/embed/' + youtubeID + '" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div>' )
}
@@ -26,7 +26,7 @@ const convertLinks = ( input ) => {
text = text.split( linksFound[i] ).map(item => { return aLink[i].includes('iframe') ? item.trim() : item } ).join( aLink[i] );
}
return text;

}
else {
return input;
2 changes: 1 addition & 1 deletion docs/js/katex.js
@@ -1,4 +1,4 @@
document$.subscribe(({ body }) => {
document$.subscribe(({ body }) => {
renderMathInElement(body, {
delimiters: [
{ left: "$$", right: "$$", display: true },
28 changes: 14 additions & 14 deletions docs/overrides/fancylogo.txt
@@ -2,16 +2,16 @@
hide:
- navigation
- toc
---
---

<div>
<div>
<style type="text/css">
.md-typeset h1,
.md-content__button {
display: none;
}
</style>
</div>
</style>
</div>


<div class="md-content" data-md-component="content">
@@ -83,11 +83,11 @@ hide:
const setContainerDimensions = () => {
const container = document.querySelector('.md-main__inner #particles-container');
const mainContent = document.querySelector('.md-main__inner');

if (mainContent && container) {
const containerWidth = mainContent.offsetWidth;
container.style.width = `${containerWidth}px`;

// Calculate height based on the aspect ratio and 60% width
const imageWidth = containerWidth * 0.6;
const imageHeight = imageWidth * ASPECT_RATIO;
@@ -103,16 +103,16 @@ hide:
const backgroundColor = computedStyle.backgroundColor;

mask.style.background = `
linear-gradient(to right,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
linear-gradient(to right,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
${backgroundColor} 100%
),
linear-gradient(to bottom,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
linear-gradient(to bottom,
${backgroundColor} 0%,
rgba(0,0,0,0) 10%,
rgba(0,0,0,0) 90%,
${backgroundColor} 100%
)
`;