Jing Huang (explanare)

ravel (Public)

Evaluate interpretability methods on localizing and disentangling concepts in LLMs.

Python, 46 stars, 7 forks
verbatim-memorization (Public)

Demystifying Verbatim Memorization in Large Language Models

Python, 4 stars, 2 forks
eval-neuron-explanation (Public)

A framework for evaluating auto-interp pipelines, i.e., pipelines that generate natural language explanations of neurons.

Python 2
char-iit (Public)

A causal intervention framework for learning robust and interpretable character representations inside subword-based language models.

Jupyter Notebook 1
sail-blog-new-post (Public)

Forked from StanfordVL/sail-blog-new-post

The repository for submitting new posts to the SAIL Blog.

HTML