Skip to content

Conversation

d0rbu
Copy link

@d0rbu d0rbu commented Sep 18, 2025

in delphi.scorers.classifier.sample._prepare_text, i believe i've found a bug when calling it with n_incorrect > 1(i.e. producing a false positive sample by incorrectly highlighting more than 1 non-activating token). i've added a comment to the location of the bug with an explanation.

additionally, in delphi.scorers.classifier.intruder.IntruderScorer._build_prompt (the method used to format a list of intruder examples into a prompt for the scorer model), there is an inconsistency with few-shot examples that i've also highlighted in a comment

there are also a bunch of smaller stylistic/readability changes, feel free to push back on these.

remaining_tokens_below_threshold.remove(token_pos)

random_indices.extend(
random.sample(below_threshold.tolist(), n_incorrect - 1)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the old code, this could result in token_pos being selected again since it's still in below_threshold. then, after being turned into a set, random_indices would have one less element than expected, resulting in one fewer token being incorrectly highlighted than was specified by n_incorrect

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@d0rbu d0rbu marked this pull request as ready for review September 18, 2025 17:27
@d0rbu d0rbu changed the title fix bug in IntruderScorer fix bug in _prepare_text Sep 18, 2025
@CLAassistant
Copy link

CLAassistant commented Sep 18, 2025

CLA assistant check
All committers have signed the CLA.

@d0rbu d0rbu changed the title fix bug in _prepare_text fix bugs in _prepare_text and IntruderScorer prompt formatting Sep 18, 2025
Copy link
Collaborator

@SrGonao SrGonao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not have all these different changes in the same pull request, but we can do it like this. I left some comments on things I didnt quite understand but most of it seems fine. Will wait on your updates. Reach me on discord if you want to chat more about delphi

else:
pass
if result is None:
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to break

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once a pipe returns None, we iterate through the rest of the pipes and simply hit the pass until we return the result, which is None. this skips that pointless extra iteration; perhaps it may be clearer if we replace break with return result or return None?

n_incorrect = min(n_incorrect, len(below_threshold))
num_tokens_to_highlight = min(n_incorrect, tokens_below_threshold.shape[0])

# The activating token is always ctx_len - ctx_len//4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this actually should say "when examples are centered the activating token is always blab".
When the examples are not centered (and its not that easy to check if the activating examples are centered when we are inside this function), this might bias the non activating examples to always have the same tokens selected, but It probably does not matter that much because the model that is being explained and the explainer model don't have the same tokenizer so that information shouldn't leak that much

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, perhaps there should be additional assertions to make sure the activating token is centered

remaining_tokens_below_threshold.remove(token_pos)

random_indices.extend(
random.sample(below_threshold.tolist(), n_incorrect - 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

while i < len(tokens):
if check(i):
result.append(L)
for should_highlight, token_group in tokens_grouped_by_check_fn:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be ok with using these fancy groupings, but for me this is harder to understand what is happening

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not obvious to me that this is doing what it should do. If the tokens to be highlighted are concecutive does it correctly close the brackets? I'm not sure I understand the expected behaviour of groupby

Copy link
Author

@d0rbu d0rbu Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i'm fine with not making this change, groupby is kind of misleading. the python docs put it best:

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.

so, for example, using groupby() on a sequence like 001110 would result in 00 111 0 rather than 000 111. the behavior should be right but it's easy to mistake it for buggy code

majority_examples = []
active_tokens = 0
# highlights the active tokens with <<>> markers
examples = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the name majority examples more

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, my issue is that it does not only contain the activating majority samples; later on, after majority_examples.insert(intruder_index, intruder_sentence), it also includes the intruder.

i think a better solution may be to keep the name majority_samples up until that point and reassigning it to a new examples variable that contains both the majority and intruder samples

@d0rbu
Copy link
Author

d0rbu commented Oct 2, 2025

I would rather not have all these different changes in the same pull request

very reasonable! i've taken them out for now, i can just put them in a separate PR. i've addressed the comments anyway to explain the changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants