This repo provides the code for the paper Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models. Using this code, you should be able to replicate all of the experiments in the paper, though some of the data used (in particular the Penn Treebank) is not publicly / freely available, and external data (that does not originate from this project) should be downloaded from its source.
To replicate the results of the paper, please do the following. Note that these commands will, for the most part, only gather results, not plot them; see the next section for plotting.
- Using conda, create an environment from the `environment.yml` file.
- (Section 4.1, Figure 2): Run `python behavioral_evaluation.py`. For results with Gemma (Section D.1, Figure 7), run `python behavioral_evaluation.py --model_name google/gemma-2-2b`. A sketch of these setup and evaluation commands appears below.
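  A minimal sketch of the setup and evaluation commands (the environment name `incremental-processing` is an assumption; use whatever name `environment.yml` specifies):

  ```bash
  # Create and activate the conda environment (name assumed; see environment.yml)
  conda env create -f environment.yml
  conda activate incremental-processing
  # Behavioral evaluation with the default model
  python behavioral_evaluation.py
  # Behavioral evaluation with Gemma 2 (2B)
  python behavioral_evaluation.py --model_name google/gemma-2-2b
  ```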
- (Section 4.2, Figure 3, Figures 9-12, Table 2):
  - Make sure that you've gotten the contents of the submodule `feature-circuits-gp` (as well as its own submodule, `dictionary_learning`). To do this, use `git submodule update --init --recursive`.
  - Download the `pythia-70m-deduped` SAEs from here into `feature-circuits-gp/dictionaries`. For more details, see `dictionary_learning` and its README file.
  - Run `feature-circuits-gp/scripts/get_circuit_garden_path.sh`. The annotation of each figure comes from the annotation file `feature-circuits-gp/annotations/pythia-70m-deduped.jsonl`. Figures 9-12 will be output in `feature-circuits-gp/circuits/figures`; note, however, that our Figure 3 was constructed manually.
  - The same script can be run for Gemma (just change the model in the script, but also consider changing the batch size). Gemma's SAEs will download automatically, but be forewarned that they are large.
  - Use the notebook `annotate_dashboard.ipynb` to annotate the features manually. Note that you'll have to add your own Neuronpedia API key to do so.
  - To compute faithfulness, please use `feature-circuits-gp/scripts/evaluate_circuit.sh`. These steps are sketched as shell commands below.
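  A sketch of the circuit-discovery steps above (assuming the scripts run from the repo root; check the script headers for any required arguments):

  ```bash
  # Fetch feature-circuits-gp and its dictionary_learning submodule
  git submodule update --init --recursive
  # Discover the garden-path circuit; figures land in feature-circuits-gp/circuits/figures
  bash feature-circuits-gp/scripts/get_circuit_garden_path.sh
  # Compute the faithfulness of the discovered circuit
  bash feature-circuits-gp/scripts/evaluate_circuit.sh
  ```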
- (Section 4.3, Figure 4): Run `python causal_analysis.py`. For results with Gemma (Section D.1, Figure 8), run `python causal_analysis.py --model_name google/gemma-2-2b`. Note that this depends on the files `results/<model>/npz_features.csv` and `results/<model>/nps_features.csv`, which we crafted manually.
- (Appendix C, Figure 6): For large-scale results, first download this file from the SAP Benchmark to `data_csv`. Then run `python causal_analysis_largescale.py` (`pythia-70m-deduped` only). These commands are collected below.
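  The Section 4.3 and Appendix C commands in one place (the SAP Benchmark file itself must be downloaded by hand first):

  ```bash
  # Causal analysis with the default model
  python causal_analysis.py
  # Causal analysis with Gemma 2 (2B)
  python causal_analysis.py --model_name google/gemma-2-2b
  # Large-scale results, after placing the SAP Benchmark file in data_csv/
  python causal_analysis_largescale.py
  ```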
- (Section 5.1): Run `get_compare_activations.py` to get the values discussed in the paper (`pythia-70m-deduped` only).
- (Section 5.2): We provide the probes trained on `pythia-70m-deduped` needed for the structural probing experiments at this link; you just need to download the probes and put them in a folder called `standalone_probes`. So, feel free to do that and skip steps 1 and 2.
  - (Optional) First, train the structural probes on `pythia-70m-deduped`:
    - For this, you will need (our fork of) `incremental_parse_probe`. Go to that fork, and create a new conda environment based on the `environment.yml` file there.
    - Get a copy of the Penn Treebank, and put it in `incremental_parse_probe`.
    - Generate the data splits by running `incremental_parse_probe/convert_splits_to_depparse.sh`. Then follow the steps in the Preprocessing section of that repo's README (this involves using Stanford CoreNLP + Java and is rather complicated).
    - Finally, run `incremental_parse_probe/inter_train_pythia_deduped.sh`.
    - Copy the last checkpoint of each probe (in `incremental_parse_probe/experiment_checkpoints/eval/pythia-70m-deduped/StackActionProbe/layer_<layer>`) into `standalone_probes`. The probes should be named `embeddings.pt`, `layer0.pt`, ..., `layer5.pt`; note that `embeddings` corresponds to `layer_0` in the `experiment_checkpoints` folder. A sketch of this copying step appears below.
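      A hypothetical sketch, assuming `layer_1` through `layer_6` hold the probes for `layer0.pt` through `layer5.pt`, and that the most recently written `.pt` file in each folder is the last checkpoint:

      ```bash
      src=incremental_parse_probe/experiment_checkpoints/eval/pythia-70m-deduped/StackActionProbe
      mkdir -p standalone_probes
      for i in $(seq 0 6); do
        # Take the newest checkpoint in each layer folder (an assumption;
        # pick the right file by hand if this is wrong)
        ckpt=$(ls -t "$src/layer_$i"/*.pt | head -n 1)
        if [ "$i" -eq 0 ]; then
          cp "$ckpt" standalone_probes/embeddings.pt   # layer_0 = embedding layer
        else
          cp "$ckpt" "standalone_probes/layer$((i-1)).pt"
        fi
      done
      ```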
  - (Optional, Figure 14) Second, evaluate the probes by running `incremental_parse_probe/iter_eval_pythia_deduped.sh`. Copy the files in `incremental_parse_probe/results` to `results/pythia-70m-deduped/parse_probe/performance/` (see the sketch after this list).
  - (Figure 5, Appendix F, Figure 15): Run `parseprobe_behavior.py`.
  - (Appendix F.4, Figure 16): Run `parseprobe_attribution.py`.
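  The optional probe-evaluation step as commands (paths taken from the steps above):

  ```bash
  # Evaluate the trained probes
  bash incremental_parse_probe/iter_eval_pythia_deduped.sh
  # Copy the evaluation results into this repo's results folder
  mkdir -p results/pythia-70m-deduped/parse_probe/performance/
  cp incremental_parse_probe/results/* results/pythia-70m-deduped/parse_probe/performance/
  ```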
- (Section 6.1, Table 4): To evaluate on the reading comprehension questions, use the `readingcomp_evaluation.py` script. By default, this will evaluate Gemma 2 (2B) on `data_csv/garden_path_samelen_readingcomp.csv`. Use the `--model` argument to change the model (this takes a HuggingFace model identifier) or the `--data` argument to change the dataset to a different .csv from `data_csv/`.
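  For example (the alternative model and dataset here are just illustrations):

  ```bash
  # Default: Gemma 2 (2B) on data_csv/garden_path_samelen_readingcomp.csv
  python readingcomp_evaluation.py
  # Another HuggingFace model on another .csv from data_csv/
  python readingcomp_evaluation.py --model EleutherAI/pythia-70m-deduped --data data_csv/garden_path_readingcomp.csv
  ```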
- (Section 6.2): To discover a feature circuit for reading comprehension, use `feature-circuits-gp/scripts/get_circuit_garden_path.sh`, but with the garden path sentences as the data. To compute feature overlaps, use `feature-circuits-gp/feature_overlap.py`. This script takes in the nodes from two circuits discovered using the circuit discovery scripts in `feature-circuits-gp/scripts/`; only the nodes are needed for this analysis.
Once you've run the corresponding commands above, you can create each figure by running the following scripts:
- Figures 1, 9, 13: manually created
- Figures 2, 7: `plotting/behavioral-subplots-difference.py`
- Figures 4, 8: `plotting/causal-subplots-difference.py`
- Figures 5, 15: `plotting/parse-probe-behavior.py`
- Figure 6: `plotting/causal-subplots-largescale-difference.py`
- Figure 14: `plotting/parse-probe-performance.py`
- Figure 16: `plotting/parse-probe-overlap.py`
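To regenerate all scripted figures at once, a one-liner like this should work (assuming each plotting script runs without arguments):

```bash
for f in plotting/*.py; do python "$f"; done
```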
As part of this project, we created the following data files:
- `data_csv/gp_same_len.csv`: An edit of Arehalli et al.'s (2022) dataset containing both ambiguous and unambiguous sentences, all of the same length. Note that our unambiguous sentences are unambiguous because of the verb used, not, e.g., because of an added comma.
- `data_csv/garden_path_readingcomp.csv`: An adaptation of the above dataset containing complete garden path sentences and follow-up questions.
- `data_csv/garden_path_samelen_readingcomp.csv`: A version of the above dataset in which each garden path sentence contains the same number of words. This version of the dataset enables us to analyze which sparse features are most influential at specific token positions.
This paper is only available as a preprint for now; you can cite it like this:
```bibtex
@misc{hanna2024incremental,
    title={Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models},
    author={Michael Hanna and Aaron Mueller},
    year={2024},
    eprint={2412.05353},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2412.05353},
}
```
We release our materials under an MIT license.