Training Method

The training procedure trains two Deep Q-Network (DQN) agents separately:

  • Buyer Agent: Decides whether to buy or wait.
  • Seller Agent: After a buy, decides whether to sell or hold.

Overview

1. Buyer Agent

  • Input: Normalized recent closing prices (look-back window).
  • Action Space:
    • 0 = Wait
    • 1 = Buy
  • Reward:
    • 0 if waiting.
    • Realized trading profit once the position is later sold by the seller agent.
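
As a rough sketch of this setup (the function name, window size, and normalization scheme are illustrative assumptions, not the repository's code), the buyer state can be built from a look-back window of closing prices normalized against the most recent close:

```python
import numpy as np

def buyer_state(closes, t, window=32):
    """Hypothetical buyer state: the last `window` closes up to index t,
    expressed as relative change versus the most recent close."""
    recent = np.asarray(closes[t - window + 1 : t + 1], dtype=np.float64)
    return recent / recent[-1] - 1.0

# Buyer action space
WAIT, BUY = 0, 1
```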

2. Seller Agent

  • Input:
    • Return from buy price (%)
    • Return from average price (%)
    • RSI (Relative Strength Index)
    • Elapsed holding time (log-scaled)
  • Action Space:
    • 0 = Hold
    • 1 = Sell
  • Reward:
    • 0 if holding.
    • Profit percentage if selling.
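
A sketch of the seller's four-dimensional state along these lines, assuming a simple RSI helper and per-trade bookkeeping (all names and defaults are illustrative):

```python
import numpy as np

def rsi(closes, period=14):
    """Simple-average RSI over the last `period` price changes."""
    diffs = np.diff(np.asarray(closes, dtype=np.float64)[-(period + 1):])
    gains = diffs[diffs > 0].sum()
    losses = -diffs[diffs < 0].sum()
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)

def seller_state(price, buy_price, avg_price, closes, steps_held):
    """Hypothetical seller state: returns vs. buy/average price, RSI, log-scaled holding time."""
    return np.array([
        (price - buy_price) / buy_price * 100.0,   # return from buy price (%)
        (price - avg_price) / avg_price * 100.0,   # return from average price (%)
        rsi(closes),                               # Relative Strength Index
        np.log1p(steps_held),                      # elapsed holding time, log-scaled
    ])

# Seller action space
HOLD, SELL = 0, 1
```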

Training Strategy

Replay Memory

  • Experiences (state, action, reward, next state, done) are stored in replay buffers.
  • Random batches are sampled for training to break correlations.
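
A minimal replay buffer in this spirit (a generic sketch, not the project's exact class):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```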

Target Network

  • Each agent maintains a separate target network.
  • Target networks are updated periodically to stabilize training.
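
The periodic hard update could look like this, assuming Keras-style models; the copy interval is an illustrative value, not the project's setting:

```python
UPDATE_EVERY = 100  # assumed interval, in training steps

def maybe_update_target(step, online_model, target_model, update_every=UPDATE_EVERY):
    """Copy the online network's weights into the target network every `update_every` steps."""
    if step % update_every == 0:
        target_model.set_weights(online_model.get_weights())
```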

Epsilon-Greedy Exploration

  • Agents initially choose random actions with high probability.
  • The exploration rate epsilon decays over time, encouraging exploitation of learned policies.
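
A standard epsilon-greedy action selection with multiplicative decay (the schedule values below are illustrative):

```python
import random
import numpy as np

def select_action(q_values, epsilon):
    """Random action with probability epsilon, otherwise the greedy (argmax-Q) action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

# Decay epsilon toward a floor, applied once per step or episode.
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995
epsilon = max(epsilon_min, epsilon * epsilon_decay)
```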

Optimization

  • Loss function: randomly selected per trial from mean squared error, mean absolute error, Huber loss, etc.
  • Optimizer: randomly selected per trial from Adam, RMSProp, SGD, etc.
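
One way this randomized choice could be expressed with Keras (the candidate lists mirror the bullets above; the network architecture and exact options in the project may differ):

```python
import random
from tensorflow import keras

def build_q_network(state_dim, n_actions, learning_rate=1e-3):
    """Hypothetical Q-network with a randomly chosen loss and optimizer per trial."""
    loss_fn = random.choice([
        keras.losses.MeanSquaredError(),
        keras.losses.MeanAbsoluteError(),
        keras.losses.Huber(),
    ])
    optimizer_cls = random.choice(
        [keras.optimizers.Adam, keras.optimizers.RMSprop, keras.optimizers.SGD]
    )
    model = keras.Sequential([
        keras.Input(shape=(state_dim,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_actions, activation="linear"),  # one Q-value per action
    ])
    model.compile(optimizer=optimizer_cls(learning_rate=learning_rate), loss=loss_fn)
    return model
```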

Episode Structure

  • For each episode:
    1. Buyer agent searches for buying opportunities.
    2. Upon buying, seller agent manages the holding period and sells.
    3. Buyer and seller agents are updated during the episode using collected experiences.
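
In sketch form, a single episode might be structured as follows; the agent and environment interfaces are placeholders, not the repository's actual API:

```python
def run_episode(env, buyer, seller):
    """Illustrative episode loop: the buyer looks for an entry,
    then the seller manages the open position until it sells."""
    state = env.reset()
    done = False
    while not done:
        # 1. Buyer agent searches for a buying opportunity.
        action = buyer.act(state)                      # 0 = wait, 1 = buy
        next_state, reward, done = env.buyer_step(action)
        buyer.remember(state, action, reward, next_state, done)
        state = next_state

        if action == 1 and not done:
            # 2. Upon buying, the seller agent manages the holding period.
            sell_state = env.seller_state()
            sold = False
            while not (sold or done):
                sell_action = seller.act(sell_state)   # 0 = hold, 1 = sell
                next_sell_state, sell_reward, done = env.seller_step(sell_action)
                seller.remember(sell_state, sell_action, sell_reward,
                                next_sell_state, done)
                sell_state = next_sell_state
                sold = sell_action == 1
            state = env.buyer_state()                  # resume the buyer's view

        # 3. Both agents are updated during the episode from their replay memories.
        buyer.replay()
        seller.replay()
```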

Hyperparameter Randomization

  • Hyperparameters such as the number of hidden nodes, batch size, learning rate, optimizer type, and activation function are randomly selected for each trial, forming a basic random hyperparameter search.
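
A basic random-search draw could be written like this; the candidate values are illustrative, not the project's actual ranges:

```python
import random

def sample_hyperparameters():
    """Randomly draw one configuration per trial (candidate values are assumptions)."""
    return {
        "n_nodes": random.choice([32, 64, 128, 256]),
        "batch_size": random.choice([16, 32, 64]),
        "learning_rate": random.choice([1e-2, 1e-3, 1e-4]),
        "optimizer": random.choice(["adam", "rmsprop", "sgd"]),
        "activation": random.choice(["relu", "tanh", "elu"]),
    }

config = sample_hyperparameters()
```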