Using Transformers to Model Symbol Sequences with Memory
.
├── models/
│   ├── attention.py     # Attention mechanism implementation
│   ├── mlp.py           # MLP layer implementation
│   ├── transformer.py   # Main transformer model
│   ├── autoencoder.py   # Base autoencoder implementation
│   └── utils.py         # Utility functions for model operations
├── training/
│   ├── trainer.py        # Training infrastructure
│   ├── loss.py           # Loss functions and metrics
│   ├── sae_trainer.py    # Sparse autoencoder training infrastructure
│   └── utils.py          # Utility functions for model saving/loading
└── examples/
    └── cyclic_sequence/  # Example of training on cyclic sequences
        ├── README.md     # Example-specific documentation
        ├── train_cyclic.py  # Training script for transformer
        ├── train_sae.py     # Training script for sparse autoencoder
        ├── sae_mechanistic_intervention.py  # Intervention experiments
        └── check_hallucinations.py  # Hallucination testing
First, change into the SymbolicMemory directory and follow the instructions below to set up the repository:
- Install uv:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install Python 3.11 using uv:

  ```bash
  uv python install 3.11
  ```

- Create a virtual environment with Python 3.11 and activate it:

  ```bash
  uv venv --python 3.11
  source .venv/bin/activate
  ```

- Install the package:

  - For CPU-only (default, works on all platforms):

    ```bash
    uv sync
    ```

  - For CUDA 12.4 support (Linux only):

    ```bash
    uv sync --extra cuda
    ```
The project supports both CPU and GPU acceleration:
- CPU Support: Works on all platforms (Linux, macOS, Windows)
- GPU Support: Available on Linux systems with CUDA 12.4
- You can check available devices with:

  ```python
  import jax
  print(jax.devices())  # Shows available devices (CPU/GPU)
  ```
 
The cyclic sequence example (examples/cyclic_sequence/) demonstrates training a transformer to predict the next token in a repeating sequence, and then training a sparse autoencoder to analyze its internal representations.
- First, train the transformer model:

  ```bash
  python examples/cyclic_sequence/train_cyclic.py
  ```

- Then, train the sparse autoencoder on the transformer's activations:

  ```bash
  python examples/cyclic_sequence/train_sae.py
  ```

- Finally, run mechanistic interventions using the trained models:

  ```bash
  python examples/cyclic_sequence/sae_mechanistic_intervention.py
  ```

The scripts will:
- Generate cyclic sequence datasets
- Train the transformer model
- Train the sparse autoencoder on transformer activations
- Perform mechanistic interventions to analyze the model's behavior
- Show attention and activation visualizations
- Plot training metrics
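To make the data concrete, here is a minimal sketch of how a repeating symbol pattern can be turned into next-token-prediction pairs. It is illustrative only: the pattern, context length, and dataset size below are placeholder values, and the actual generation logic lives in train_cyclic.py and may differ.

```python
import numpy as np

# Placeholder values -- the real pattern, context length, and dataset size
# are defined inside train_cyclic.py.
pattern = np.array([3, 1, 4, 1, 5, 9, 2, 6])  # one cycle of symbols
seq_len = 32                                  # context length fed to the transformer
num_sequences = 128                           # number of training examples

def make_cyclic_dataset(pattern, seq_len, num_sequences, rng):
    """Build (input, target) pairs where the target is the input shifted by one step."""
    cycle = np.tile(pattern, seq_len // len(pattern) + 2)        # long enough to slice from
    starts = rng.integers(0, len(pattern), size=num_sequences)   # random phase offsets
    inputs = np.stack([cycle[s : s + seq_len] for s in starts])
    targets = np.stack([cycle[s + 1 : s + seq_len + 1] for s in starts])
    return inputs, targets

rng = np.random.default_rng(0)
x, y = make_cyclic_dataset(pattern, seq_len, num_sequences, rng)
print(x.shape, y.shape)  # (128, 32) (128, 32)
```

Because the sequence is perfectly periodic, a transformer that has learned the cycle can predict every next token exactly, which makes the task a clean testbed for inspecting its internal representations.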
The main components are:

- Transformer Model (models/transformer.py):
  - Implements a simple transformer architecture
  - Handles sequence prediction tasks
  - Supports mechanistic interventions
- Sparse Autoencoder (models/autoencoder.py):
  - Implements a sparse autoencoder architecture
  - Trains on transformer layer activations
  - Supports expansion factors for different inflation ratios (see the sketch after this list)
- Training Infrastructure (training/):
  - trainer.py: Base training infrastructure
  - sae_trainer.py: Specialized trainer for sparse autoencoders
  - loss.py: Loss functions for both models
  - utils.py: Model saving/loading utilities
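To make the expansion-factor idea concrete, here is a minimal sketch of a sparse autoencoder in plain JAX. It is illustrative only: the function and parameter names are hypothetical, and the repository's implementation in models/autoencoder.py may differ.

```python
import jax
import jax.numpy as jnp

def init_sae(key, d_model, expansion_factor=4):
    """Initialise encoder/decoder weights; the latent width is expansion_factor * d_model."""
    d_hidden = expansion_factor * d_model
    k_enc, k_dec = jax.random.split(key)
    return {
        "W_enc": jax.random.normal(k_enc, (d_model, d_hidden)) / jnp.sqrt(d_model),
        "b_enc": jnp.zeros(d_hidden),
        "W_dec": jax.random.normal(k_dec, (d_hidden, d_model)) / jnp.sqrt(d_hidden),
        "b_dec": jnp.zeros(d_model),
    }

def sae_forward(params, activations):
    """Encode transformer activations into a sparse latent code and reconstruct them."""
    latents = jax.nn.relu(activations @ params["W_enc"] + params["b_enc"])
    reconstruction = latents @ params["W_dec"] + params["b_dec"]
    return latents, reconstruction

def sae_loss(params, activations, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse latent codes."""
    latents, reconstruction = sae_forward(params, activations)
    mse = jnp.mean((reconstruction - activations) ** 2)
    sparsity = jnp.mean(jnp.abs(latents))
    return mse + l1_coeff * sparsity

# Example: fit activations of width 64 with a 4x-inflated latent space.
params = init_sae(jax.random.PRNGKey(0), d_model=64, expansion_factor=4)
acts = jax.random.normal(jax.random.PRNGKey(1), (8, 64))  # dummy batch of activations
print(sae_loss(params, acts))
```

The expansion factor sets the inflation ratio: the latent width is expansion_factor times the transformer's hidden size, while a sparsity penalty such as the L1 term used here pushes most latent units to zero on any given input.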
 
You can modify the following parameters in the training scripts:
- In examples/cyclic_sequence/train_cyclic.py:
  - Model dimensions
  - Number of layers
  - Training steps
  - Learning rate
- In examples/cyclic_sequence/train_sae.py:
  - Expansion factor
  - Layer to analyze
  - Training steps
  - Batch size
- In examples/cyclic_sequence/sae_mechanistic_intervention.py:
  - Intervention strength
  - Number of candidate indices
  - Sequence generation length
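The intervention strength and candidate indices control how selected sparse-autoencoder latents are modified before the activations are reconstructed and fed back into the transformer. Below is a minimal sketch of that idea with hypothetical function and parameter names; the actual logic in sae_mechanistic_intervention.py may differ.

```python
import jax.numpy as jnp

def intervene_on_latents(latents, candidate_indices, strength):
    """Scale the selected SAE latent units by `strength`, leaving the others untouched."""
    mask = jnp.zeros(latents.shape[-1]).at[jnp.array(candidate_indices)].set(1.0)
    return latents * (1.0 + (strength - 1.0) * mask)

# Example: amplify three candidate latent units by 5x across a batch of latent codes.
latents = jnp.ones((4, 512))  # (batch, d_hidden); the shape here is hypothetical
patched = intervene_on_latents(latents, candidate_indices=[7, 42, 99], strength=5.0)
print(patched[0, 7], patched[0, 8])  # 5.0 for an intervened unit, 1.0 elsewhere
```

Setting the strength to 0 ablates the chosen features, while values above 1 amplify them; the general recipe is then to decode the patched latents and substitute the reconstruction back into the transformer's forward pass to see how those features affect the next-token predictions.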