fix bugs in _prepare_text and IntruderScorer prompt formatting #153

d0rbu · 2025-09-18T17:24:36Z

in delphi.scorers.classifier.sample._prepare_text, i believe i've found a bug when calling it with n_incorrect > 1 (i.e. producing a false positive sample by incorrectly highlighting more than 1 non-activating token). i've added a comment at the location of the bug with an explanation.

additionally, in delphi.scorers.classifier.intruder.IntruderScorer._build_prompt (the method used to format a list of intruder examples into a prompt for the scorer model), there is an inconsistency with few-shot examples that i've also highlighted in a comment

there are also a bunch of smaller stylistic/readability changes, feel free to push back on these.

d0rbu · 2025-09-18T17:27:04Z

delphi/scorers/classifier/sample.py

+            remaining_tokens_below_threshold.remove(token_pos)
+
            random_indices.extend(
-                random.sample(below_threshold.tolist(), n_incorrect - 1)


in the old code, token_pos could be selected again because it was still in below_threshold. after random_indices was turned into a set, it would then have one fewer element than expected, so one fewer token was incorrectly highlighted than n_incorrect specified
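A minimal sketch of the failure mode described above, not the actual delphi code. The function names pick_old and pick_fixed and the small token pool are hypothetical; only the sample-then-deduplicate pattern is taken from the diff.

```python
import random


def pick_old(below_threshold, token_pos, n_incorrect, rng):
    # Old behaviour (as described in the comment): token_pos stays in the
    # pool, so random.sample can draw it a second time.
    indices = [token_pos]
    indices.extend(rng.sample(below_threshold, n_incorrect - 1))
    return set(indices)  # a repeated token_pos collapses here


def pick_fixed(below_threshold, token_pos, n_incorrect, rng):
    # Fixed behaviour: remove token_pos from the pool before sampling the rest.
    remaining = list(below_threshold)
    remaining.remove(token_pos)
    indices = [token_pos]
    indices.extend(rng.sample(remaining, n_incorrect - 1))
    return set(indices)


rng = random.Random(0)
pool = [0, 1, 2, 3]  # hypothetical positions below the activation threshold

# Collect the distinct result sizes seen over many draws with n_incorrect = 3.
old_sizes = {len(pick_old(pool, 2, 3, rng)) for _ in range(200)}
fixed_sizes = {len(pick_fixed(pool, 2, 3, rng)) for _ in range(200)}
```

Under these assumptions the old version sometimes returns only two positions when three were requested, while the fixed version always returns three.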

CLAassistant · 2025-09-18T18:02:26Z

All committers have signed the CLA.

delphi/scorers/classifier/intruder.py

SrGonao

I would rather not have all these different changes in the same pull request, but we can do it like this. I left some comments on things I didn't quite understand, but most of it seems fine. Will wait on your updates. Reach me on Discord if you want to chat more about delphi.

SrGonao · 2025-10-01T09:45:01Z

delphi/pipeline.py

-                else:
-                    pass
+                if result is None:
+                    break


Why would we want to break?

once a pipe returns None, we iterate through the rest of the pipes and simply hit the pass until we return the result, which is None. this skips that pointless extra iteration; perhaps it may be clearer if we replace break with return result or return None?
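The point in the reply above can be sketched with a simplified, synchronous loop. This is an illustration only: run_old, run_new and the iteration counter are hypothetical, and the real code in delphi/pipeline.py may be structured differently.

```python
def run_old(pipes, item):
    """Keeps looping after a None result; the remaining iterations do nothing."""
    result, iterations = item, 0
    for pipe in pipes:
        iterations += 1
        if result is not None:
            result = pipe(result)
        else:
            pass
    return result, iterations


def run_new(pipes, item):
    """Stops as soon as a pipe returns None."""
    result, iterations = item, 0
    for pipe in pipes:
        iterations += 1
        result = pipe(result)
        if result is None:
            break
    return result, iterations


# The second pipe returns None, so the last two pipes never need to run.
pipes = [lambda x: x + 1, lambda x: None, lambda x: x * 2, lambda x: x - 1]
old_result, old_iterations = run_old(pipes, 1)
new_result, new_iterations = run_new(pipes, 1)
```

Both versions return the same result; the second only skips the idle iterations, which is why replacing break with return result would be equivalent here.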

SrGonao · 2025-10-01T09:50:01Z

delphi/scorers/classifier/sample.py

-    n_incorrect = min(n_incorrect, len(below_threshold))
+    num_tokens_to_highlight = min(n_incorrect, tokens_below_threshold.shape[0])

    # The activating token is always ctx_len - ctx_len//4


I think this should actually say "when examples are centered, the activating token is always blah".
When the examples are not centered (and it's not that easy to check whether the activating examples are centered from inside this function), this might bias the non-activating examples to always have the same tokens selected. It probably does not matter much, though, because the model being explained and the explainer model don't share a tokenizer, so that information shouldn't leak much.

good point, perhaps there should be additional assertions to make sure the activating token is centered
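One possible shape for such a check, as a sketch only. It assumes that "the activating token" means the position with the highest activation and that activations are available as a non-empty list of floats; both are assumptions, and is_centered is a hypothetical helper, not part of delphi.

```python
def expected_activating_position(ctx_len: int) -> int:
    # Position quoted in the code comment under review: ctx_len - ctx_len // 4.
    return ctx_len - ctx_len // 4


def is_centered(activations: list[float]) -> bool:
    # Assumption: the activating token is the argmax of the activations.
    ctx_len = len(activations)
    peak = max(range(ctx_len), key=lambda i: activations[i])
    return peak == expected_activating_position(ctx_len)


centered = [0.0] * 8
centered[6] = 1.0  # 8 - 8 // 4 == 6

off_center = [0.0] * 8
off_center[3] = 1.0
```

A caller could wrap this in an assert before relying on the fixed position, and skip the check for examples that are known not to be centered.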

SrGonao · 2025-10-01T09:50:44Z

delphi/scorers/classifier/sample.py

+            remaining_tokens_below_threshold.remove(token_pos)
+
            random_indices.extend(
-                random.sample(below_threshold.tolist(), n_incorrect - 1)


SrGonao · 2025-10-01T10:01:10Z

delphi/scorers/classifier/sample.py

-    while i < len(tokens):
-        if check(i):
-            result.append(L)
+    for should_highlight, token_group in tokens_grouped_by_check_fn:


I might be OK with using these fancy groupings, but to me this makes it harder to understand what is happening.

It's not obvious to me that this does what it should. If the tokens to be highlighted are consecutive, does it correctly close the brackets? I'm not sure I understand the expected behaviour of groupby.

yeah i'm fine with not making this change, groupby is kind of misleading. the python docs put it best:

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.

so, for example, using groupby() on a sequence like 001110 would result in 00 111 0 rather than 000 111. the behavior should be right but it's easy to mistake it for buggy code
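The run-splitting behaviour described above, and how it answers the question about consecutive tokens, can be sketched as follows. The highlight function is a hypothetical stand-in for the grouping in the diff; the << >> markers follow the comment in intruder.py.

```python
from itertools import groupby

# groupby starts a new group whenever the key changes, so it splits into runs.
flags = "001110"
runs = ["".join(group) for _, group in groupby(flags)]


def highlight(tokens, should_highlight):
    """Wrap each run of consecutive highlighted tokens in one << >> pair."""
    parts = []
    pairs = zip(tokens, should_highlight)
    for flag, group in groupby(pairs, key=lambda pair: pair[1]):
        text = "".join(token for token, _ in group)
        parts.append(f"<<{text}>>" if flag else text)
    return "".join(parts)


consecutive = highlight(list("abcdef"), [False, False, True, True, True, False])
separated = highlight(list("xyz"), [True, False, True])
```

Because each run becomes exactly one group, consecutive highlighted tokens share a single opening and closing marker, and separated tokens each get their own pair.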

SrGonao · 2025-10-01T10:09:44Z

delphi/scorers/classifier/intruder.py

-            majority_examples = []
-            active_tokens = 0
+            # highlights the active tokens with <<>> markers
+            examples = []


I think I like the name majority examples more

hm, my issue is that it does not contain only the activating majority samples; later on, after majority_examples.insert(intruder_index, intruder_sentence), it also includes the intruder.

i think a better solution may be to keep the name majority_examples up until that point and then reassign it to a new examples variable that contains both the majority and intruder samples
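A minimal sketch of the naming suggested in the reply above. The function build_examples and the sample strings are hypothetical; only the insert call and the variable names come from the discussion.

```python
def build_examples(majority_examples, intruder_sentence, intruder_index):
    # majority_examples keeps its meaning: only the activating majority samples.
    # The combined list gets its own name once the intruder is added.
    examples = list(majority_examples)
    examples.insert(intruder_index, intruder_sentence)
    return examples


majority_examples = ["the <<cat>> sat", "a <<cat>> ran", "my <<cat>> slept"]
examples = build_examples(majority_examples, "stocks <<fell>> today", 1)
```

Copying before the insert also leaves the original list untouched, so neither name ever refers to a list with misleading contents.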

d0rbu · 2025-10-02T19:50:28Z

I would rather not have all these different changes in the same pull request

very reasonable! i've taken them out for now, i can just put them in a separate PR. i've addressed the comments anyway to explain the changes

clean up some code and fix bug with preparing text with multiple fals…

a4b6013

…e positives

d0rbu marked this pull request as ready for review September 18, 2025 17:27

d0rbu changed the title ~~fix bug in IntruderScorer~~ fix bug in _prepare_text Sep 18, 2025

d0rbu added 4 commits September 18, 2025 13:11

clean up intruderscorer a bit more

7000df5

another cleanup in _prepare_and_batch

1e753cc

cleanup in Pipeline

a9860ef

fix inconsistency with intruder example formatting

2c99b16

d0rbu changed the title ~~fix bug in _prepare_text~~ fix bugs in _prepare_text and IntruderScorer prompt formatting Sep 18, 2025

d0rbu added 2 commits September 18, 2025 14:32

simplify _generate error reporting

1d72a34

ignore uv.lock

9ce2ce3

SrGonao reviewed Oct 1, 2025

d0rbu and others added 2 commits October 2, 2025 14:39

undo code readability/style changes

f7e1ff0

[pre-commit.ci] auto fixes from pre-commit.com hooks

513e2e4

for more information, see https://pre-commit.ci
