Bug Description
When steering (`/steer/completion-chat`), we get different results when we don't utilize the KV cache compared to when we do.
How To Reproduce Bug
On main:
- Set `use_past_kv_cache` to `False` here
- From `apps/inference`, run `poetry run pytest -s -k test_completion_chat_steered_with_features_additive`

The test will fail. If you then set `use_past_kv_cache` to `True`, the test passes.
Expected Behavior
I would expect the KV cache not to affect the steering results; it should only speed up the computation 🤔
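For intuition on how the two paths *could* diverge, here is a hypothetical toy (plain Python, not the actual TransformerLens or Goodfire code; `hook`, `next_token`, and `generate` are all made up for illustration). It assumes the steering hook edits the last position of whatever activation tensor it sees, a common pattern when hooks index with `acts[:, -1]`. With a KV cache, each new token is the only (and therefore last) position, so its steered state persists in the cache; without a cache, the full sequence is re-run each step and previously steered positions are no longer last, so they are recomputed unsteered:

```python
STEER = 0.5  # additive steering value (a scalar stand-in for a steering vector)

def hook(acts):
    """Toy steering hook: edits only the last position it is given,
    mimicking hooks that index activations with acts[:, -1]."""
    return acts[:-1] + [acts[-1] + STEER]

def next_token(acts):
    """Toy 'model head': next value from the mean of the activations."""
    return sum(acts) / len(acts) * 0.9

def generate(prompt, steps, use_past_kv_cache):
    seq = list(prompt)
    if use_past_kv_cache:
        cache = hook(list(prompt))   # prompt processed once; steered state cached
        for _ in range(steps):
            nxt = next_token(cache)
            seq.append(nxt)
            cache += hook([nxt])     # only the new token passes through the hook
        return seq
    for _ in range(steps):
        acts = hook(list(seq))       # full re-run: only the *current* last
        seq.append(next_token(acts)) # position is steered; earlier ones are not
    return seq

print(generate([1.0], 3, use_past_kv_cache=True))
print(generate([1.0], 3, use_past_kv_cache=False))  # diverges after the first step
```

Under this (assumed) hook behaviour the first generated token agrees on both paths, but later tokens drift apart, which would match a test that passes with the cache and fails without it.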
Additional Context
We are attempting to upgrade to TransformerLens v3, but v3 no longer supports the `HookedTransformerKeyValueCache` class used by `HookedTransformer`. As a result, we will need to set `use_past_kv_cache` to `False` within our fork of TransformerLens (in the `generate_stream` method), which unfortunately triggers the behaviour above.
I did notice that Bryce added a KV cache to TransformerLens v3 yesterday, but I haven't looked into it much. See here.