4 changes: 2 additions & 2 deletions docs/component/rl/framework.rst
@@ -3,7 +3,7 @@ The Framework of QlibRL

QlibRL contains a full set of components that cover the entire lifecycle of an RL pipeline, including building the simulator of the market, shaping states & actions, training policies (strategies), and backtesting strategies in the simulated environment.

-QlibRL is basically implemented with the support of Tianshou and Gym frameworks. The high-level structure of QlibRL is demonstrated below:
+QlibRL is basically implemented with the support of Tianshou and Gymnasium frameworks. The high-level structure of QlibRL is demonstrated below:

.. image:: ../../_static/img/QlibRL_framework.png
:width: 600
@@ -15,7 +15,7 @@ EnvWrapper
------------
EnvWrapper is the complete encapsulation of the simulated environment. It receives actions from the outside (policy/strategy/agent), simulates the changes in the market, and then returns rewards and updated states, thus forming an interaction loop.

-In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:
+In QlibRL, EnvWrapper is a subclass of gymnasium.Env, so it implements all necessary interfaces of gymnasium.Env. Any classes or pipelines that accept gymnasium.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:

- `Simulator`
The simulator is the core component responsible for the environment simulation. Developers can implement all the logic directly related to the environment simulation in the Simulator in any way they like. QlibRL already provides two Simulator implementations for single-asset trading: 1) ``SingleAssetOrderExecution``, which is built on Qlib's backtest toolkits and therefore accounts for many practical trading details, but is slow; 2) ``SimpleSingleAssetOrderExecution``, which is built on a simplified trading simulator that ignores many details (e.g., trading limitations, rounding) but is quite fast.
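
For orientation, the interface contract referenced above can be shown with a minimal sketch (a hypothetical `ToyTradingEnv`, not part of this PR or of QlibRL). The behavioral difference that motivates this migration is that Gymnasium's `reset()` returns an `(obs, info)` pair and `step()` returns a 5-tuple with separate `terminated`/`truncated` flags, where classic gym returned a bare observation and a 4-tuple:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class ToyTradingEnv(gym.Env):
    """Hypothetical minimal env; EnvWrapper must expose this same interface."""

    def __init__(self) -> None:
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)  # e.g. sell / hold / buy

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        return np.zeros(4, dtype=np.float32), {}  # Gymnasium: (obs, info)

    def step(self, action):
        obs = self.observation_space.sample()
        terminated = True   # the MDP itself ended the episode
        truncated = False   # no external time limit was hit
        return obs, 0.0, terminated, truncated, {}  # Gymnasium: 5-tuple
```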
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -45,7 +45,7 @@ dependencies = [
"pymongo",
"loguru",
"lightgbm",
"gym",
"gymnasium",
"cvxpy",
"joblib",
"matplotlib",
4 changes: 2 additions & 2 deletions qlib/rl/interpreter.py
@@ -5,9 +5,9 @@

from typing import Any, Generic, TypeVar

-import gym
+import gymnasium as gym
import numpy as np
-from gym import spaces
+from gymnasium import spaces

from qlib.typehint import final
from .simulator import ActType, StateType
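
As a hedged illustration of what these imports are used for: an interpreter declares the space its observations live in via `gymnasium.spaces`. The shape and bounds below are made-up placeholders, not QlibRL's actual ones:

```python
import numpy as np
from gymnasium import spaces

# Hypothetical observation space: 30 time steps x 6 features, unbounded floats.
obs_space = spaces.Box(low=-np.inf, high=np.inf, shape=(30, 6), dtype=np.float32)

sample = obs_space.sample()        # draws a random observation of the declared shape/dtype
assert obs_space.contains(sample)  # spaces can validate membership
```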
2 changes: 1 addition & 1 deletion qlib/rl/order_execution/interpreter.py
@@ -8,7 +8,7 @@

import numpy as np
import pandas as pd
-from gym import spaces
+from gymnasium import spaces

from qlib.constant import EPS
from qlib.rl.data.base import ProcessedDataProvider
4 changes: 2 additions & 2 deletions qlib/rl/order_execution/policy.py
@@ -6,11 +6,11 @@
from pathlib import Path
from typing import Any, Dict, Generator, Iterable, Optional, OrderedDict, Tuple, cast

-import gym
+import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn
-from gym.spaces import Discrete
+from gymnasium.spaces import Discrete
from tianshou.data import Batch, ReplayBuffer, to_torch
from tianshou.policy import BasePolicy, PPOPolicy, DQNPolicy
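
One note on the aliased import: binding `gymnasium` to the name `gym` keeps the rest of the module source-compatible, because existing references such as `gym.Space` now resolve to Gymnasium types without further edits. A standalone sketch (not QlibRL code):

```python
import gymnasium as gym
from gymnasium.spaces import Discrete

action_space = Discrete(4)                  # e.g. four discretized execution actions
assert isinstance(action_space, gym.Space)  # the alias resolves to gymnasium.Space
print(action_space.n, action_space.sample())
```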

4 changes: 2 additions & 2 deletions qlib/rl/utils/env_wrapper.py
@@ -6,8 +6,8 @@
import weakref
from typing import Any, Callable, cast, Dict, Generic, Iterable, Iterator, Optional, Tuple

-import gym
-from gym import Space
+import gymnasium as gym
+from gymnasium import Space

from qlib.rl.aux_info import AuxiliaryInfoCollector
from qlib.rl.interpreter import ActionInterpreter, ObsType, PolicyActType, StateInterpreter
2 changes: 1 addition & 1 deletion qlib/rl/utils/finite_env.py
@@ -13,7 +13,7 @@
from contextlib import contextmanager
from typing import Any, Callable, Dict, Generator, List, Optional, Set, Tuple, Type, Union, cast

-import gym
+import gymnasium as gym
import numpy as np
from tianshou.env import BaseVectorEnv, DummyVectorEnv, ShmemVectorEnv, SubprocVectorEnv
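
For context on why `gymnasium` and the Tianshou vector envs are imported together: Tianshou builds vectorized environments from callables that each construct one `gymnasium.Env`. A minimal sketch, assuming a Gymnasium-era Tianshou release (the factory below is illustrative, not QlibRL's):

```python
import gymnasium as gym
from tianshou.env import DummyVectorEnv


def make_env() -> gym.Env:
    # Any gymnasium.Env works here, e.g. a QlibRL EnvWrapper.
    return gym.make("CartPole-v1")


# Four independent copies in-process; Shmem/Subproc variants parallelize instead.
vector_env = DummyVectorEnv([make_env for _ in range(4)])
obs, info = vector_env.reset()  # batched reset; Gymnasium-style (obs, info) return
```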

2 changes: 1 addition & 1 deletion scripts/collect_info.py
@@ -45,7 +45,7 @@ def qlib(self):
"pymongo",
"loguru",
"lightgbm",
"gym",
"gymnasium",
"cvxpy",
"joblib",
"matplotlib",
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion tests/rl/test_finite_env.py
@@ -3,7 +3,7 @@

from collections import Counter

-import gym
+import gymnasium as gym
import numpy as np
from tianshou.data import Batch, Collector
from tianshou.policy import BasePolicy
4 changes: 2 additions & 2 deletions tests/rl/test_logger.py
@@ -7,10 +7,10 @@
import re
from typing import Any, Tuple

-import gym
+import gymnasium as gym
import numpy as np
import pandas as pd
-from gym import spaces
+from gymnasium import spaces
from tianshou.data import Collector, Batch
from tianshou.policy import BasePolicy

2 changes: 1 addition & 1 deletion tests/rl/test_trainer.py
@@ -7,7 +7,7 @@

import torch
import torch.nn as nn
-from gym import spaces
+from gymnasium import spaces
from tianshou.policy import PPOPolicy

from qlib.config import C