Titans-MAC is a PyTorch implementation of a neural network architecture inspired by the Titans model, built around a trainable long-term memory mechanism. The memory lets the model retain and reuse information over extended sequences, improving its ability to learn long-range patterns and generate coherent text. The core component is the Memory as Context (MAC) module, which lets the network dynamically query its learned memory and fold the retrieved information into its current processing. This is a really quick implementation; do NOT expect much from it.
- Long-Term Memory: A trainable memory module that stores and updates important information over time (a sketch follows this list).
- Memory as Context (MAC): A mechanism for dynamically integrating memory into the model's processing.
- Multi-Head Attention: Enables the model to focus on different parts of the input and memory.
- Feedforward Network (FFN): Enhances the model's ability to learn complex relationships.
- Training Script: Includes a `titans_training.py` script for training the model on text datasets.
- Text Generation: The trained model can generate coherent and contextually relevant text.
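
The snippet below is a conceptual sketch of the long-term memory feature above, assuming hypothetical names and a deliberately simplified single-layer memory (this is not the repo's actual API): the weights of a small network act as the memory, reads project the input to a query, and writes take a gradient step on an associative recall loss.

```python
import torch
import torch.nn as nn

class LongTermMemory(nn.Module):
    """Illustrative Titans-style memory: the weights of `mem` are the memory,
    read with a projected query and written via a gradient step."""

    def __init__(self, dim: int, lr: float = 0.01):
        super().__init__()
        self.key_proj = nn.Linear(dim, dim, bias=False)
        self.value_proj = nn.Linear(dim, dim, bias=False)
        self.mem = nn.Linear(dim, dim, bias=False)  # the memory itself
        self.lr = lr

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Read: query the memory with the projected input.
        return self.mem(self.key_proj(x))

    def update(self, x: torch.Tensor) -> None:
        # Write: one gradient step on the associative recall loss
        # ||mem(k) - v||^2; poorly recalled ("surprising") tokens
        # produce larger updates.
        k = self.key_proj(x).detach()
        v = self.value_proj(x).detach()
        with torch.enable_grad():
            loss = ((self.mem(k) - v) ** 2).mean()
            grad, = torch.autograd.grad(loss, self.mem.weight)
        with torch.no_grad():
            self.mem.weight -= self.lr * grad
```

The gradient here plays the role of the "surprise" signal in Titans: inputs the memory already recalls well barely change it, while unexpected ones are written in strongly.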
The model consists of the following key components:
- Embedding Layer: Converts input tokens into vector representations.
- TitansMAC Layers: A stack of layers, each containing:
  - Long-Term Memory: Processes and updates the memory based on the current input.
  - Memory Query: Generates a query vector to retrieve relevant information from the memory.
  - Multi-Head Attention: Attends to the input, memory, and a set of persistent memory tokens.
  - Feedforward Network: Further processes the attended information.
- Output Layer: Projects the final hidden state to the vocabulary space, producing probabilities for the next token.
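
To make the layer wiring concrete, here is a minimal sketch of one MAC layer in PyTorch, reusing the hypothetical `LongTermMemory` class sketched earlier; the dimensions, normalization placement, and exact attention layout are assumptions rather than the repo's implementation:

```python
import torch
import torch.nn as nn

class TitansMACLayer(nn.Module):
    """Illustrative Memory-as-Context layer (not the repo's exact API)."""

    def __init__(self, dim: int, n_heads: int, n_persist: int = 16):
        super().__init__()
        self.memory = LongTermMemory(dim)  # hypothetical class sketched above
        # Persistent memory: input-independent, learned tokens.
        self.persistent = nn.Parameter(torch.randn(1, n_persist, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        self.memory.update(x)               # write the new tokens into memory
        retrieved = self.memory(x)          # query memory for relevant context
        persist = self.persistent.expand(x.size(0), -1, -1)
        # Memory as Context: attend over [persistent tokens, memory, input].
        context = torch.cat([persist, retrieved, x], dim=1)
        x = x + self.attn(self.norm1(x), context, context)[0]
        x = x + self.ffn(self.norm2(x))
        return x
```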
- Clone the repository:

  ```bash
  git clone https://github.com/mattjohnpowell/Titans-MAC
  cd Titans-MAC
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Prepare your dataset: The training script expects a text dataset in a format compatible with the Hugging Face `datasets` library (e.g., WikiText); a loading example follows these steps. You can modify the `train_and_test_titans` function in `titans_training.py` to use a different dataset.
- Run the training script:

  ```bash
  python titans_demo.py
  ```
This will train the model and save checkpoints periodically.
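
For reference, loading a WikiText split with the `datasets` library looks like this; the exact dataset name and split used inside `train_and_test_titans` may differ:

```python
from datasets import load_dataset

# WikiText-2 (raw); swap the config/split for whatever the script expects.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
print(dataset[0]["text"])
```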
After training, you can generate text using the `generate` function in `titans_training.py`. You can modify the prompts in the main script to experiment with different starting points.
```python
import torch

from titans_training import TitansTrainer

# ... (construct your model, tokenizer, and vocab_size as during training) ...
model.load_state_dict(torch.load("path/to/your/model.pt"))  # load the trained weights
trainer = TitansTrainer(model, vocab_size, tokenizer)

prompt = "Artificial intelligence is"
generated_text = trainer.generate(prompt, max_length=100, temperature=0.7)
print(generated_text)
```
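
As a rule of thumb, `temperature` values below 1.0 make sampling more conservative and repetitive, while higher values trade coherence for diversity; `max_length` caps the length of the generated output.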