Training Method

The training procedure trains two Deep Q-Network (DQN) agents separately:

  • Buyer Agent: Decides whether to buy or wait.
  • Seller Agent: After a buy, decides whether to sell or hold.

Overview

1. Buyer Agent

  • Input: Normalized recent closing prices (look-back window).
  • Action Space:
    • 0 = Wait
    • 1 = Buy
  • Reward:
    • 0 if waiting.
    • Realized trading profit once the position is later sold by the seller agent.
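
As a rough sketch of this setup (the function name, window size, and normalization scheme are illustrative assumptions, not the repository's code), the buyer state can be built from a look-back window of closing prices normalized against the most recent close:

```python
import numpy as np

def buyer_state(closes, t, window=32):
    """Hypothetical buyer state: the last `window` closes up to index t,
    expressed as relative change versus the most recent close."""
    recent = np.asarray(closes[t - window + 1 : t + 1], dtype=np.float64)
    return recent / recent[-1] - 1.0

# Buyer action space
WAIT, BUY = 0, 1
```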

2. Seller Agent

  • Input:
    • Return from buy price (%)
    • Return from average price (%)
    • RSI (Relative Strength Index)
    • Elapsed holding time (log-scaled)
  • Action Space:
    • 0 = Hold
    • 1 = Sell
  • Reward:
    • 0 if holding.
    • Profit percentage if selling.
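
A sketch of the seller's four-dimensional state along these lines, assuming a simple RSI helper and per-trade bookkeeping (all names and defaults are illustrative):

```python
import numpy as np

def rsi(closes, period=14):
    """Simple-average RSI over the last `period` price changes."""
    diffs = np.diff(np.asarray(closes, dtype=np.float64)[-(period + 1):])
    gains = diffs[diffs > 0].sum()
    losses = -diffs[diffs < 0].sum()
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)

def seller_state(price, buy_price, avg_price, closes, steps_held):
    """Hypothetical seller state: returns vs. buy/average price, RSI, log-scaled holding time."""
    return np.array([
        (price - buy_price) / buy_price * 100.0,   # return from buy price (%)
        (price - avg_price) / avg_price * 100.0,   # return from average price (%)
        rsi(closes),                               # Relative Strength Index
        np.log1p(steps_held),                      # elapsed holding time, log-scaled
    ])

# Seller action space
HOLD, SELL = 0, 1
```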

Training Strategy

Replay Memory

  • Experiences (state, action, reward, next state, done) are stored in replay buffers.
  • Random batches are sampled for training to break correlations.
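
A minimal replay buffer in this spirit (a generic sketch, not the project's exact class):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```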

Target Network

  • Each agent maintains a separate target network.
  • Target networks are updated periodically to stabilize training.
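
The periodic hard update could look like this, assuming Keras-style models; the copy interval is an illustrative value, not the project's setting:

```python
UPDATE_EVERY = 100  # assumed interval, in training steps

def maybe_update_target(step, online_model, target_model, update_every=UPDATE_EVERY):
    """Copy the online network's weights into the target network every `update_every` steps."""
    if step % update_every == 0:
        target_model.set_weights(online_model.get_weights())
```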

Epsilon-Greedy Exploration

  • Agents initially choose random actions with high probability.
  • The exploration rate epsilon decays over time, encouraging exploitation of learned policies.
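
A standard epsilon-greedy action selection with multiplicative decay (the schedule values below are illustrative):

```python
import random
import numpy as np

def select_action(q_values, epsilon):
    """Random action with probability epsilon, otherwise the greedy (argmax-Q) action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

# Decay epsilon toward a floor, applied once per step or episode.
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995
epsilon = max(epsilon_min, epsilon * epsilon_decay)
```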

Optimization

  • Loss function: randomly selected per trial from mean squared error, mean absolute error, Huber loss, etc.
  • Optimizer: randomly selected per trial from Adam, RMSProp, SGD, etc.
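
One way this randomized choice could be expressed with Keras (the candidate lists mirror the bullets above; the network architecture and exact options in the project may differ):

```python
import random
from tensorflow import keras

def build_q_network(state_dim, n_actions, learning_rate=1e-3):
    """Hypothetical Q-network with a randomly chosen loss and optimizer per trial."""
    loss_fn = random.choice([
        keras.losses.MeanSquaredError(),
        keras.losses.MeanAbsoluteError(),
        keras.losses.Huber(),
    ])
    optimizer_cls = random.choice(
        [keras.optimizers.Adam, keras.optimizers.RMSprop, keras.optimizers.SGD]
    )
    model = keras.Sequential([
        keras.Input(shape=(state_dim,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_actions, activation="linear"),  # one Q-value per action
    ])
    model.compile(optimizer=optimizer_cls(learning_rate=learning_rate), loss=loss_fn)
    return model
```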

Episode Structure

  • For each episode:
    1. Buyer agent searches for buying opportunities.
    2. Upon buying, seller agent manages the holding period and sells.
    3. Buyer and seller agents are updated during the episode using collected experiences.
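
In sketch form, a single episode might be structured as follows; the agent and environment interfaces are placeholders, not the repository's actual API:

```python
def run_episode(env, buyer, seller):
    """Illustrative episode loop: the buyer looks for an entry,
    then the seller manages the open position until it sells."""
    state = env.reset()
    done = False
    while not done:
        # 1. Buyer agent searches for a buying opportunity.
        action = buyer.act(state)                      # 0 = wait, 1 = buy
        next_state, reward, done = env.buyer_step(action)
        buyer.remember(state, action, reward, next_state, done)
        state = next_state

        if action == 1 and not done:
            # 2. Upon buying, the seller agent manages the holding period.
            sell_state = env.seller_state()
            sold = False
            while not (sold or done):
                sell_action = seller.act(sell_state)   # 0 = hold, 1 = sell
                next_sell_state, sell_reward, done = env.seller_step(sell_action)
                seller.remember(sell_state, sell_action, sell_reward,
                                next_sell_state, done)
                sell_state = next_sell_state
                sold = sell_action == 1
            state = env.buyer_state()                  # resume the buyer's view

        # 3. Both agents are updated during the episode from their replay memories.
        buyer.replay()
        seller.replay()
```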

Hyperparameter Randomization

  • Hyperparameters such as the number of hidden nodes, batch size, learning rate, optimizer type, and activation function are randomly selected for each trial, forming a basic random hyperparameter search.
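
A basic random-search draw could be written like this; the candidate values are illustrative, not the project's actual ranges:

```python
import random

def sample_hyperparameters():
    """Randomly draw one configuration per trial (candidate values are assumptions)."""
    return {
        "n_nodes": random.choice([32, 64, 128, 256]),
        "batch_size": random.choice([16, 32, 64]),
        "learning_rate": random.choice([1e-2, 1e-3, 1e-4]),
        "optimizer": random.choice(["adam", "rmsprop", "sgd"]),
        "activation": random.choice(["relu", "tanh", "elu"]),
    }

config = sample_hyperparameters()
```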