Skip to content

replaces sae-auto-interp with delphi#113

Draft
anthonyduong9 wants to merge 1 commit intomainfrom
replace-sae-auto-interp-with-eai-delphi
Draft

replaces sae-auto-interp with delphi#113
anthonyduong9 wants to merge 1 commit intomainfrom
replace-sae-auto-interp-with-eai-delphi

Conversation

@anthonyduong9
Copy link
Contributor

@anthonyduong9 anthonyduong9 commented Jun 6, 2025

Problem

We depend on a fork at EleutherAI/delphi@8a6307e.

Fix

  • Updating the response to match eai-delphi 0.0.2, in fuzz-detection.yaml.
  • From schemas, running make setup-all-autointerp VERSION=2.0.0.
  • Updating code in apps/autointerp/neuronpedia_autointerp to match eai-delphi 0.0.2.
  • Updating apps/autointerp/pyproject.toml.

Testing

Brief description of testing that is done or added. If there are multiple or subpoints, use bullet points.

  • I changed unit tests, and changed the code until they passed.
  • From apps/autointerp, I ran poetry run uvicorn server:app --host 0.0.0.0 --port 5003 --workers 1, and made requests to POST /score/fuzz-detection, POST /score/embedding, and POST /explain/default, before and after the changes.

Fixes #27

@codecov
Copy link

codecov bot commented Jun 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 7.93%. Comparing base (45ce350) to head (1582c9f).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##            main    #113      +/-   ##
========================================
- Coverage   7.98%   7.93%   -0.05%     
========================================
  Files        121     121              
  Lines      17040   17104      +64     
  Branches     362     361       -1     
========================================
- Hits        1360    1358       -2     
- Misses     15669   15735      +66     
  Partials      11      11              
Flag Coverage Δ
autointerp 100.00% <100.00%> (ø)
inference 37.77% <ø> (ø)
webapp 5.19% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@anthonyduong9 anthonyduong9 changed the title replaces sae-autointerp with delphi replaces sae-auto-interp with delphi Jun 6, 2025
Copy link

@SrGonao SrGonao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I caught everything that I found strange

example = Example(
tokens=activation.tokens, # type: ignore
activations=torch.tensor(activation.values),
str_tokens=activation.tokens,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem correct. We are expecting str_tokens to be the decoded tokens into strings. Do you have access to those in your request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activations.tokens is a list of strings. I think what's confusing is that we're able to pass activations.tokens to Example.tokens despite https://github.com/EleutherAI/delphi/blob/db49cb78120c1926a4a3c4928c76ece6be64dcb3/delphi/latents/latents.py#L66-L73. But https://github.com/EleutherAI/delphi/blob/db49cb78120c1926a4a3c4928c76ece6be64dcb3/delphi/scorers/embedding/embedding.py#L111-L121 shows that Example.tokens can either be a list of integers or strings.

Do we want to keep both Example.tokens and Example.str_tokens in delphi but make changes so tokens can only be a list of strings?

If so, I can update the requests in this repo, in a separate PR, and make the changes to delphi as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this is definitely a mistake on our part that we let pass. Example.tokens should definitely only be a list of integers and Example.str_tokens the corresponding strings. In this case, because you don't care about the integers you can just pass a dummy list?

example = ActivatingExample(
tokens=activation.tokens, # type: ignore
activations=torch.tensor(activation.values),
str_tokens=activation.tokens,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as to #113 (comment).

@anthonyduong9
Copy link
Contributor Author

@hijohnnylin this PR was ready to merge before EleutherAI/delphi#133 was merged. But this PR isn't ready because the latter PR contains a bunch of commits. Feel free to close or edit this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[autointerp] - get on latest stable release of delphi

2 participants