This project computes a dynamic, daily asset allocation that aims to minimise risk and maximise profit, using Deep Reinforcement Learning.
Developed a model based on the Deep Deterministic Policy Gradient (DDPG) algorithm, an off-policy actor-critic method that combines Q-learning with a deterministic policy gradient, to optimize portfolio construction over a continuous action space. The model reallocates assets dynamically based on daily volatility, with the aim of limiting losses during financial crises.
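The two update rules at the core of standard DDPG can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names and the default values of `gamma` and `tau` are assumptions chosen for the example.

```python
def td_target(reward, next_q, gamma=0.99, done=False):
    """Critic regression target: y = r + gamma * Q'(s', mu'(s')).

    `next_q` stands for the target critic's value of the target actor's
    action in the next state; both networks are assumed to exist elsewhere.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


# Hypothetical values, for illustration only.
y = td_target(reward=0.01, next_q=0.5)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

The slowly moving target networks are what keep the bootstrapped Q-learning target stable while the online networks are trained on replayed, off-policy transitions.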
Trained two deep neural networks (DNNs) and built a custom Trading Environment that incorporates various constraints and investor risk metrics. One DNN (the actor) interacts with this environment to determine the asset allocations that form the portfolio; the other (the critic) evaluates those weights by estimating the Q-function, which the actor is trained to maximize.
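One common way to turn an actor's unconstrained outputs into a valid allocation is a softmax, which yields non-negative weights that sum to one. The sketch below assumes a long-only, fully invested portfolio; that constraint, and the names `to_portfolio_weights` and `portfolio_return`, are illustrative assumptions rather than details taken from the project.

```python
import math


def to_portfolio_weights(raw_scores):
    """Map raw actor outputs to long-only weights that sum to 1 (softmax)."""
    m = max(raw_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_return(weights, asset_returns):
    """One-period portfolio return as the weighted sum of asset returns."""
    return sum(w * r for w, r in zip(weights, asset_returns))


# Hypothetical actor outputs and daily asset returns, for illustration only.
weights = to_portfolio_weights([0.2, -1.0, 0.7])
daily_return = portfolio_return(weights, [0.01, -0.02, 0.005])
```

In an environment of this kind, the resulting `daily_return` (possibly adjusted for risk or transaction costs) would typically feed the reward that the critic learns to predict.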
Implemented Ornstein-Uhlenbeck noise with a decaying scale to encourage exploration of the Trading Environment early in training, transitioning gradually towards exploitation.
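A decaying Ornstein-Uhlenbeck process might look like the sketch below. The class name, the multiplicative decay schedule, and all default parameters (`theta`, `sigma`, `sigma_min`, `decay`) are assumptions for illustration; the project's actual schedule is not specified here.

```python
import math
import random


class DecayingOUNoise:
    """Temporally correlated exploration noise whose scale shrinks over time."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2,
                 sigma_min=0.01, decay=0.995, dt=1.0, seed=None):
        self.size, self.mu, self.theta = size, mu, theta
        self.sigma, self.sigma_min, self.decay, self.dt = sigma, sigma_min, decay, dt
        self.rng = random.Random(seed)
        self.state = [mu] * size

    def reset(self):
        """Restart the process at its long-run mean (e.g. at episode start)."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Euler step: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)

    def decay_sigma(self):
        """Shrink the noise scale, never going below sigma_min."""
        self.sigma = max(self.sigma_min, self.sigma * self.decay)


noise = DecayingOUNoise(size=3, seed=0)
perturbation = noise.sample()   # added to the actor's raw output during training
noise.decay_sigma()             # called, for example, once per episode
```

Because `sigma` shrinks towards `sigma_min`, the perturbations added to the actor's actions become smaller over time, which shifts the agent from exploring to exploiting what it has learned.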
Incorporated an Exponential Moving Average (EMA) of rewards as a baseline, giving the agent a running performance benchmark against which each new reward is judged and thereby improving reward optimization.
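One plausible reading of an EMA reward baseline is sketched below: each reward is centred by subtracting a running exponential average before it is used for learning. How the baseline enters the project's reward is not stated, so this interpretation, the class name `EMABaseline`, and the smoothing factor `alpha` are all assumptions.

```python
class EMABaseline:
    """Running exponential moving average of rewards, used as a baseline."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha   # smoothing factor; larger values react faster
        self.value = None    # initialised from the first observed reward

    def advantage(self, reward):
        """Return reward minus the current baseline, then update the baseline."""
        if self.value is None:
            self.value = reward
        adv = reward - self.value
        self.value = self.alpha * reward + (1.0 - self.alpha) * self.value
        return adv


# Hypothetical daily rewards, for illustration only.
baseline = EMABaseline(alpha=0.1)
centred = [baseline.advantage(r) for r in [0.010, 0.004, -0.002, 0.012]]
```

Centring rewards this way means the agent is rewarded for beating its own recent performance rather than for any positive return, which can reduce the variance of the learning signal.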
Results on the testing period (Aug 2022 to March 2024):
- Cumulative Return - 55.18%
- CAGR - 20.16%
- Sharpe - 2.4
- Annual Volatility - 11.65%
- Sortino - 3.37
- Calmar Ratio - 2.22
- Ulcer Index - 0.03
- Kelly Criterion - 19.42%
- Daily Value at Risk (VaR) - (-1.1%)
- Expected Shortfall (cVaR) - (-1.1%)
- Max Drawdown - (-9.06%)
- Average Drawdown - (-1.81%)
Relative to the Nifty 50 benchmark: Beta - 0.81, Alpha - 0.15, Treynor Ratio - 68.53%
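For reference, several of the metrics above can be computed from a series of daily returns roughly as follows. This is a sketch using common textbook definitions; the 252-day annualisation, the zero risk-free rate, and the function names are assumptions, and the reported figures may have been produced with different conventions or tooling.

```python
import math
import statistics


def cumulative_return(daily_returns):
    """Compound daily returns into a total return over the period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the equity curve (a value <= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods=252):
    """Annualised Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods)


def beta(portfolio_returns, benchmark_returns):
    """Sample covariance with the benchmark divided by benchmark variance."""
    mp = statistics.mean(portfolio_returns)
    mb = statistics.mean(benchmark_returns)
    cov = sum((p - mp) * (b - mb)
              for p, b in zip(portfolio_returns, benchmark_returns))
    cov /= len(portfolio_returns) - 1
    return cov / statistics.variance(benchmark_returns)


# Synthetic returns, for illustration only (not the project's data).
portfolio = [0.004, -0.002, 0.006, 0.001, -0.003]
benchmark = [0.003, -0.004, 0.005, 0.002, -0.002]
summary = (cumulative_return(portfolio), max_drawdown(portfolio),
           sharpe_ratio(portfolio), beta(portfolio, benchmark))
```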