Transformers are powerful tools used across a wide range of tasks, and many current projects are built on them. I want to learn exactly how they work and, along the way, deepen my knowledge of machine learning in general. Their main application and strength is clearly LLMs, but what else can they do? That is what this project explores: we transform a time series into a probability distribution and then use that distribution to calculate a portfolio allocation.
attention-stock-predictor is a PyTorch-based transformer model for experimenting with stock market prediction on multi-stock OHLCV data. It does not reliably predict next-day returns. It does, however, demonstrate techniques such as probability distribution modeling, Wasserstein loss, and allocation-based evaluation, applying modern deep learning to financial time series.
🔎 Experiment-focused: learn how attention behaves on time series
📊 Financial data pipeline: automated OHLCV download via Alpha Vantage
🎯 Probability modeling: target distributions instead of point predictions
📐 Wasserstein loss: a training signal that accounts for how far predicted probability mass lies from the target
📈 Portfolio-style evaluation: allocation, returns, drawdown, Sharpe ratio
🛠️ Custom optimizer: arbitrary momentum scheduling
- Quickstart
- API Key Setup
- Features
- Project Structure
- Configuration
- Data Pipeline
- What You’ll Learn
- Evaluation
- Example Outputs
- Troubleshooting
- License
git clone https://github.com/SATheinen/attention-stock-predictor.git
cd attention-stock-predictor

Tested with Python 3.12.5.

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
echo 'api_key = "YOUR_KEY"' > api_key.txt
cd data/
python get_data.py
cd ../model/
jupyter-notebook lstm.ipynb

To download or update stock data, this project uses the Alpha Vantage API. You need to provide your own API key.
- Register at Alpha Vantage to obtain an API key
- Create a file named api_key.txt in the project root
- Paste your API key into this file:
  api_key = "************"

Important: Do not share this file or include it in version control.
- Transformer model for multi-stock prediction
- Sequence-to-distribution prediction using triangular probability targets
- Wasserstein loss for training
- Forward testing via allocation-based evaluation
- Custom optimizer with arbitrary momentum scheduling
- Metrics: annualized return, drawdown, Sharpe ratio
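Suppose p and q are two distributions over the same K ordered, equally spaced return bins. In that case the 1-D Wasserstein-1 distance reduces to the summed absolute difference of their cumulative distributions, multiplied by the bin width. The sketch below is plain Python for clarity. The equal bin spacing and the function name are assumptions, and the notebook's actual loss may differ in its details.

```python
from itertools import accumulate
from typing import Sequence

def wasserstein_1d(p: Sequence[float], q: Sequence[float], bin_width: float = 1.0) -> float:
    """W1 distance between two distributions on the same ordered, equally spaced bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Cumulative distributions over the ordered bins.
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    # In 1-D, W1 is the area between the two CDFs.
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

target = [1.0, 0.0, 0.0]
near_miss = [0.0, 1.0, 0.0]
far_miss = [0.0, 0.0, 1.0]
print(wasserstein_1d(target, near_miss))  # 1.0
print(wasserstein_1d(target, far_miss))   # 2.0
```

Cross-entropy against the one-hot target scores both misses identically, because it only looks at the probability assigned to the target bin. W1, by contrast, penalizes the far miss twice as much. For batched PyTorch tensors, the same formula can be written with torch.cumsum along the bin dimension.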
.
├── api_key.txt # User-provided API key for Alpha Vantage
├── data/
│ ├── data_dump/ # Saved OHLCV datasets
│ ├── get_data.py # Script to fetch stock data
│ └── stock_names.txt # List of stocks to fetch
├── model/
│ └── lstm.ipynb # Main notebook with model training
├── requirements.txt # Python dependencies
└── README.md # This file
- API key: api_key.txt
- List of stocks: data/stock_names.txt
- Dataset path: data/data_dump/
- All hyperparameters and training controls are set in the notebook (model/lstm.ipynb).
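One of the listed features is a custom optimizer with arbitrary momentum scheduling, which is presumably among the training controls set in the notebook. The following is a hypothetical sketch of the idea, not the repository's optimizer: heavy-ball SGD whose momentum coefficient comes from any user-supplied function of the step index. The class name, the linear warm-up schedule and its values are all illustrative assumptions.

```python
from typing import Callable, List

class ScheduledMomentumSGD:
    """Hypothetical sketch: SGD with heavy-ball momentum whose coefficient follows any schedule."""

    def __init__(self, lr: float, momentum_schedule: Callable[[int], float]) -> None:
        self.lr = lr
        self.momentum_schedule = momentum_schedule  # maps step index -> momentum in [0, 1)
        self.step_count = 0
        self.velocity: List[float] = []

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.velocity:
            self.velocity = [0.0] * len(params)
        beta = self.momentum_schedule(self.step_count)
        # v <- beta * v + g ; p <- p - lr * v (same update form as torch.optim.SGD without dampening)
        self.velocity = [beta * v + g for v, g in zip(self.velocity, grads)]
        self.step_count += 1
        return [p - self.lr * v for p, v in zip(params, self.velocity)]


def linear_warmup(step: int, start: float = 0.5, end: float = 0.9, warmup_steps: int = 100) -> float:
    """Example schedule: momentum ramps linearly from start to end, then stays at end."""
    frac = min(step / warmup_steps, 1.0)
    return start + (end - start) * frac


# Minimise f(x) = x^2 (gradient 2x), starting from x = 5.
opt = ScheduledMomentumSGD(lr=0.1, momentum_schedule=linear_warmup)
x = [5.0]
for _ in range(200):
    x = opt.step(x, [2.0 * x[0]])
print(x[0])  # converges towards 0
```

With torch.optim.SGD, a similar effect can be obtained by writing a new value into each param_group["momentum"] before calling step().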
- Stock data (OHLCV) is downloaded via Alpha Vantage.
- Daily percentage changes are computed.
- Each daily change is assigned a hard class, and these hard classification targets are mapped to triangular probability distributions.
- A sequence of metrics across all stocks is fed to the model.
- Training occurs in two phases:
  - several rounds on the training set
  - one forward online-learning run over the unseen test set
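The pipeline steps above (percentage change, hard class, triangular target) can be sketched as follows. The bin edges, the number of classes and the triangle half-width are hypothetical choices, and the notebook's values are not stated here. The triangle is truncated at the outermost classes and renormalized so each target sums to 1.

```python
from bisect import bisect_right
from typing import List, Sequence

def pct_changes(closes: Sequence[float]) -> List[float]:
    """Daily percentage change between consecutive closing prices."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(closes, closes[1:])]

def to_class(change: float, edges: Sequence[float]) -> int:
    """Hard class index in 0 .. len(edges), given ascending bin edges."""
    return bisect_right(edges, change)

def triangular_target(center: int, num_classes: int, half_width: float = 2.0) -> List[float]:
    """Soft target: weight peaks at the true class and falls linearly to 0 at half_width bins away."""
    weights = [max(0.0, 1.0 - abs(k - center) / half_width) for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]  # renormalise (matters near the outer classes)

# Hypothetical bin edges in percent -> 5 classes from "strong down" to "strong up".
EDGES = [-1.0, -0.25, 0.25, 1.0]

changes = pct_changes([100.0, 102.0, 102.0])      # [2.0, 0.0]
classes = [to_class(c, EDGES) for c in changes]  # [4, 2]
targets = [triangular_target(c, len(EDGES) + 1) for c in classes]
print(targets[1])  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

Compared with a one-hot target, the triangular target also gives partial credit to neighbouring classes, so predicting an adjacent return bin is treated as a smaller error than predicting a distant one.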
- How to preprocess financial data into model-ready sequences
- Why triangular probability targets can help vs. one-hot
- How to apply causal attention in a non-language setting
- How to evaluate models like portfolios, not just with accuracy
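Causal attention on a time series means that the output for day t may only depend on days 0 through t, so no future prices leak into a prediction. Below is a single-head, plain-Python sketch of scaled dot-product attention with that mask. The toy inputs are made up, and the function is illustrative rather than taken from the notebook.

```python
import math
from typing import List

Matrix = List[List[float]]

def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention where time step t only attends to steps 0..t."""
    d = len(q[0])
    out: Matrix = []
    for t, q_t in enumerate(q):
        # Causal mask: scores are computed only for the current and earlier days.
        scores = [sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d) for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exp_scores = [math.exp(x - m) for x in scores]
        z = sum(exp_scores)
        weights = [e / z for e in exp_scores]
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))])
    return out

# Each row is one day's feature vector (toy numbers).
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]
v_future_changed = [[1.0], [2.0], [100.0]]
a = causal_attention(q, k, v)
b = causal_attention(q, k, v_future_changed)
print(a[:2] == b[:2])  # True: earlier outputs cannot see the changed last day
```

In PyTorch, the same masking is commonly applied through torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True) or an upper-triangular attn_mask. Which mechanism the notebook uses is not stated here.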
- Convert output distributions to stock allocations
- Compute:
  - Daily returns
  - Annualized return
  - Maximum drawdown
  - Sharpe ratio
- Evaluation is handled by an external post-processing script
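The evaluation steps above can be sketched as follows. The allocation rule is a hypothetical stand-in, since the notebook's exact mapping from distributions to allocations is not described. It computes each stock's expected return from the predicted distribution and assumed bin centers, then weights stocks long-only in proportion to their positive expectations. The assumptions of 252 trading days per year and a zero risk-free rate are also labeled in the code.

```python
import math
from typing import List, Sequence

TRADING_DAYS = 252  # assumption: ~252 trading days per year

def allocations(dists: Sequence[Sequence[float]], bin_centers: Sequence[float]) -> List[float]:
    """Hypothetical rule: long-only weights proportional to each stock's positive expected return."""
    expected = [sum(p * c for p, c in zip(d, bin_centers)) for d in dists]
    positive = [max(r, 0.0) for r in expected]
    total = sum(positive)
    if total == 0.0:
        return [0.0] * len(dists)  # hold cash when no stock has a positive expectation
    return [x / total for x in positive]

def portfolio_returns(weights: Sequence[Sequence[float]], realized: Sequence[Sequence[float]]) -> List[float]:
    """Daily portfolio return (as a fraction) = sum of weight * realized stock return."""
    return [sum(w * r for w, r in zip(ws, rs)) for ws, rs in zip(weights, realized)]

def annualized_return(daily: Sequence[float]) -> float:
    """Compound the daily returns and scale the growth to one year."""
    growth = math.prod(1.0 + r for r in daily)
    return growth ** (TRADING_DAYS / len(daily)) - 1.0

def max_drawdown(daily: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

def sharpe_ratio(daily: Sequence[float], risk_free_daily: float = 0.0) -> float:
    """Annualized Sharpe ratio from daily returns (sample std; undefined if returns are constant)."""
    excess = [r - risk_free_daily for r in daily]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return math.sqrt(TRADING_DAYS) * mean / math.sqrt(var)

# Two stocks, five return bins centred on -2 %, -1 %, 0 %, +1 %, +2 % (hypothetical).
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = allocations([[0.0, 0.25, 0.5, 0.25, 0.0], [0.0, 0.0, 0.0, 0.5, 0.5]], centers)
print(w)  # [0.0, 1.0]
```

Passing each day's allocations and the realized next-day stock returns through portfolio_returns produces the daily series that the three metric functions consume.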
Plots and results are generated in the notebook.
Figure: Relative allocations over time
Figure: Returns and optimal leverage heatmaps
⏳ Slow training → likely running on the MPS/CPU fallback instead of a CUDA GPU; try fewer epochs
🔑 API errors → check your Alpha Vantage key and rate limits
🗒 Empty data → verify the tickers in data/stock_names.txt
MIT License – free to use, adapt, and learn from.

