Training Method
The training procedure consists of separately training two Deep Q-Network (DQN) agents:
- Buyer Agent: Decides whether to buy or wait.
- Seller Agent: After a buy, decides whether to sell or hold.
Buyer Agent
- Input: Normalized recent closing prices over a look-back window (see the state sketch below).
- Action Space:
  - 0 = Wait
  - 1 = Buy
- Reward:
  - 0 if waiting.
  - Trading profit if a successful sell occurs.
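Below is a minimal sketch of how the buyer's state and reward could be formed. The helper names (`make_buyer_state`, `buyer_reward`), the window length of 30, and the scale-free normalization are illustrative assumptions, not the repository's exact code.

```python
from typing import Optional

import numpy as np

WINDOW = 30          # look-back window length (assumed value)
WAIT, BUY = 0, 1     # buyer action space

def make_buyer_state(closes: np.ndarray, t: int, window: int = WINDOW) -> np.ndarray:
    """Normalized closing prices of the `window` bars ending at index t."""
    recent = closes[t - window + 1 : t + 1]
    # Divide by the latest close so the state is scale-free (one possible normalization).
    return (recent / recent[-1] - 1.0).astype(np.float32)

def buyer_reward(action: int, realized_profit: Optional[float]) -> float:
    """0 while waiting; the trading profit is credited once the subsequent sell completes."""
    if action == WAIT or realized_profit is None:
        return 0.0
    return realized_profit
```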
Seller Agent
- Input (see the feature sketch below):
  - Return from the buy price (%)
  - Return from the average price (%)
  - RSI (Relative Strength Index)
  - Elapsed holding time (log-scaled)
- Action Space:
  - 0 = Hold
  - 1 = Sell
- Reward:
  - 0 if holding.
  - Profit percentage if selling.
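A sketch of how these four features might be assembled. It assumes "average price" means the mean close over the supplied window and uses a standard 14-period RSI; the function names and the exact normalization are assumptions for illustration.

```python
import numpy as np

def rsi(closes: np.ndarray, period: int = 14) -> float:
    """Relative Strength Index over the last `period` price changes."""
    deltas = np.diff(closes[-(period + 1):])
    avg_gain = np.clip(deltas, 0.0, None).mean()
    avg_loss = np.clip(-deltas, 0.0, None).mean()
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def make_seller_state(window_closes: np.ndarray, buy_price: float, steps_held: int) -> np.ndarray:
    """Four-dimensional seller state built from the current price window and the open position."""
    price = window_closes[-1]
    avg_price = window_closes.mean()                      # average price over the supplied window
    ret_from_buy = (price / buy_price - 1.0) * 100.0      # return from buy price (%)
    ret_from_avg = (price / avg_price - 1.0) * 100.0      # return from average price (%)
    holding = np.log1p(steps_held)                        # elapsed holding time, log-scaled
    return np.array([ret_from_buy, ret_from_avg, rsi(window_closes), holding], dtype=np.float32)
```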
- Experiences (state, action, reward, next state, done) are stored in replay buffers.
- Random batches are sampled for training to break correlations.
- Each agent maintains a separate target network.
- Target networks are updated periodically to stabilize training.
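The replay buffer, the periodic target-network copy, and a standard DQN update from a sampled batch could look like the sketch below. It assumes a Keras-style model interface (`predict`, `train_on_batch`, `get_weights`/`set_weights`); the buffer capacity, batch size, and discount factor `gamma` are illustrative values not specified on this page.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tuples and samples random batches."""

    def __init__(self, capacity=10_000):          # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlations in the collected experiences.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def update_target(online_model, target_model):
    """Periodically hard-copy the online network's weights into the target network."""
    target_model.set_weights(online_model.get_weights())


def train_step(model, target_model, buffer, batch_size=32, gamma=0.99):
    """One DQN update from a random replay batch (gamma is an assumed discount factor)."""
    if len(buffer) < batch_size:
        return
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    q_next = target_model.predict(next_states, verbose=0).max(axis=1)
    targets = model.predict(states, verbose=0)
    targets[np.arange(batch_size), actions] = rewards + gamma * q_next * (1.0 - dones)
    model.train_on_batch(states, targets)
```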
- Agents initially choose random actions with high probability.
- The exploration rate epsilon decays over time, encouraging exploitation of learned policies.
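A sketch of the epsilon-greedy policy with multiplicative decay; the schedule constants are assumed values, and the model is again assumed to expose a Keras-style `predict`.

```python
import random

import numpy as np

EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.01, 0.995   # assumed schedule: start high, decay toward a floor

def select_action(model, state: np.ndarray, epsilon: float, n_actions: int = 2) -> int:
    """Epsilon-greedy: a random action with probability epsilon, otherwise the greedy Q action."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    q_values = model.predict(state[np.newaxis, :], verbose=0)[0]
    return int(np.argmax(q_values))

def decay_epsilon(epsilon: float) -> float:
    """Multiplicative decay applied after each episode (or step)."""
    return max(EPS_MIN, epsilon * EPS_DECAY)
```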
- Loss function: selected from mean squared error, mean absolute error, Huber loss, etc. (randomized).
- Optimizer: selected from Adam, RMSProp, SGD, etc. (randomized).
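A minimal sketch of this randomized selection, assuming a Keras/TensorFlow backend (the framework is not stated on this page); the candidate lists mirror the bullets above.

```python
import random
import tensorflow as tf

def random_loss():
    """Pick this trial's loss function at random from the candidates above."""
    return random.choice([
        tf.keras.losses.MeanSquaredError(),
        tf.keras.losses.MeanAbsoluteError(),
        tf.keras.losses.Huber(),
    ])

def random_optimizer(learning_rate):
    """Pick this trial's optimizer at random from the candidates above."""
    optimizer_cls = random.choice([
        tf.keras.optimizers.Adam,
        tf.keras.optimizers.RMSprop,
        tf.keras.optimizers.SGD,
    ])
    return optimizer_cls(learning_rate=learning_rate)

# Example: model.compile(loss=random_loss(), optimizer=random_optimizer(1e-3))
```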
- For each episode (see the loop sketch below):
  - The buyer agent searches for buying opportunities.
  - Once a buy is made, the seller agent manages the holding period and decides when to sell.
  - Both agents are updated during the episode using the collected experiences.
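The loop sketch below wires the described flow together. It reuses the helpers sketched above (`make_buyer_state`, `make_seller_state`, `select_action`, `train_step`, `ReplayBuffer`); those names, the window length, and the bookkeeping that credits the buyer after the sell are assumptions about how the steps could be connected, not the repository's exact code.

```python
# Action codes follow the sections above: buyer 0 = Wait / 1 = Buy, seller 0 = Hold / 1 = Sell.
WAIT, BUY = 0, 1
HOLD, SELL = 0, 1

def run_episode(closes, buyer_model, buyer_target, buyer_buffer,
                seller_model, seller_target, seller_buffer,
                epsilon, window=30):
    """One training episode over a closing-price series (a NumPy array)."""
    position_open, buy_price, buy_t, entry_state = False, 0.0, 0, None
    for t in range(window, len(closes) - 1):
        win = closes[t - window + 1 : t + 1]
        win_next = closes[t - window + 2 : t + 2]
        if not position_open:
            # Buyer phase: search for a buying opportunity.
            state = make_buyer_state(closes, t, window)
            action = select_action(buyer_model, state, epsilon)
            if action == BUY:
                position_open, buy_price, buy_t = True, closes[t], t
                entry_state = state          # the trade profit is credited after the sell
            else:
                buyer_buffer.push(state, WAIT, 0.0,
                                  make_buyer_state(closes, t + 1, window), False)
        else:
            # Seller phase: manage the open position until it is sold.
            state = make_seller_state(win, buy_price, t - buy_t)
            next_state = make_seller_state(win_next, buy_price, t + 1 - buy_t)
            action = select_action(seller_model, state, epsilon)
            if action == SELL:
                profit = (closes[t] / buy_price - 1.0) * 100.0     # profit percentage
                seller_buffer.push(state, SELL, profit, next_state, True)
                buyer_buffer.push(entry_state, BUY, profit,
                                  make_buyer_state(closes, t + 1, window), True)
                position_open = False
            else:
                seller_buffer.push(state, HOLD, 0.0, next_state, False)

        # Both agents keep learning from their replay buffers during the episode.
        train_step(buyer_model, buyer_target, buyer_buffer)
        train_step(seller_model, seller_target, seller_buffer)
```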
- Hyperparameters such as the number of nodes, batch size, learning rate, optimizer type, and activation function are randomly selected for each trial, performing a basic random hyperparameter search (see the sketch below).
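A sketch of this per-trial random sampling and of building a small Q-network from the sampled configuration. The candidate values, layer count, and the Keras/TensorFlow API choice are assumptions; the loss would likewise be drawn at random as in the earlier sketch.

```python
import random
import tensorflow as tf

def sample_trial_config():
    """Randomly sample one hyperparameter configuration per trial (candidate values are assumptions)."""
    return {
        "nodes": random.choice([32, 64, 128]),
        "batch_size": random.choice([16, 32, 64]),
        "learning_rate": random.choice([1e-2, 1e-3, 1e-4]),
        "activation": random.choice(["relu", "tanh", "sigmoid"]),
        "optimizer": random.choice([tf.keras.optimizers.Adam,
                                    tf.keras.optimizers.RMSprop,
                                    tf.keras.optimizers.SGD]),
    }

def build_q_network(n_inputs, n_actions, cfg):
    """Small fully connected Q-network built from a sampled configuration."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(cfg["nodes"], activation=cfg["activation"]),
        tf.keras.layers.Dense(cfg["nodes"], activation=cfg["activation"]),
        tf.keras.layers.Dense(n_actions, activation="linear"),
    ])
    # The loss is also randomized in practice (see the earlier sketch); Huber is shown here.
    model.compile(loss=tf.keras.losses.Huber(),
                  optimizer=cfg["optimizer"](learning_rate=cfg["learning_rate"]))
    return model

# Example: cfg = sample_trial_config(); buyer_model = build_q_network(30, 2, cfg)
```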