@danhorner

This commit adds prompt caching to claude.nvim.

  • I added cache points for the buffers as well as for the conversation, so it should be possible to wipe the claude chat buffer and still get the benefit of cached data.
  • I also updated the usage display to include costing for cached tokens.
  • I don't think I broke the Bedrock mode, but I didn't test it. I'm counting on it to ignore the cache_control instructions in the Python helper.
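
For illustration, here is a minimal sketch of how a cost estimate could account for cached tokens. It is not the plugin's code (claude.nvim itself is written in Lua). The usage field names follow the Anthropic Messages API response; the prices in `PRICING` are assumed example values per million tokens and should be checked against the current price list.

```python
# Assumed example prices in USD per million tokens (not taken from this PR).
PRICING = {
    "input_tokens": 3.00,
    "cache_creation_input_tokens": 3.75,  # cache write, assumed 1.25x base input
    "cache_read_input_tokens": 0.30,      # cache read, assumed 0.1x base input
    "output_tokens": 15.00,
}


def estimate_cost(usage: dict) -> float:
    """Return the estimated cost in USD for one response's usage dict."""
    return sum(
        usage.get(category, 0) * price / 1_000_000
        for category, price in PRICING.items()
    )


usage = {
    "input_tokens": 1_000,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 20_000,
    "output_tokens": 500,
}
print(f"estimated cost: ${estimate_cost(usage):.4f}")
```

Keeping the categories and prices in one table means a new usage category only needs a new entry, which is roughly the idea behind moving pricing into a config variable.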

Hope this is helpful.

 - Expand user messages to contain content blocks instead of strings
 - Move the shared buffers into the content block of the first user message
 - Sort buffers by modification time into old (>2min) and recent (<2min)
 - Apply cache_control break points to old buffers, new buffers, and the
   most recent conversation message
 - Update cost reporting message with pricing for new cached prompts
  - extract pricing and usage categories into a config var
 - Previous version broke tool results and diffs
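
A rough sketch of the buffer handling described in the list above, written in Python for illustration rather than the plugin's Lua. The buffer dict keys (`name`, `text`, `mtime`) and the function name are hypothetical; the `cache_control` block shape follows the Anthropic prompt caching documentation.

```python
import time

CACHE_CONTROL = {"type": "ephemeral"}
QUIET_AFTER_SECONDS = 120  # the 2-minute threshold from the commit message


def buffer_blocks(buffers, now=None):
    """Build text content blocks for shared buffers, oldest first.

    `buffers` is a list of dicts with hypothetical keys "name", "text"
    and "mtime" (seconds since the epoch).
    """
    now = time.time() if now is None else now
    ordered = sorted(buffers, key=lambda b: b["mtime"])
    old = [b for b in ordered if now - b["mtime"] > QUIET_AFTER_SECONDS]
    recent = [b for b in ordered if now - b["mtime"] <= QUIET_AFTER_SECONDS]

    blocks = []
    for group in (old, recent):
        for buf in group:
            blocks.append({"type": "text", "text": f"{buf['name']}:\n{buf['text']}"})
        if group:
            # One breakpoint on the last block of each non-empty group.
            blocks[-1]["cache_control"] = dict(CACHE_CONTROL)
    return blocks
```

The resulting blocks would go at the start of the first user message's content, ahead of the user's own text, so that the old buffers form the longest stable prefix.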
@pasky
Owner

pasky commented Dec 5, 2024

Oh, nice ideas there!

I'll test the bedrock mode once I get a moment.

In the example https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching the context is part of the system prompt, but you have it as part of messages (and also inflate all other messages to content blocks) - does it matter? (Keeping it as part of the system prompt would feel a bit cleaner to me, unless there's a reason not to.)

What about instead of busy/quiet buffer distinction simply sorting buffers by lastmodified?

@danhorner
Author

> In the example https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching the context is part of the system prompt, but you have it as part of messages ... does it matter?

I'm trying to remember. I think I tried to use the system prompt first, but the system prompt needs to be inflated from a string into content blocks in order to apply the cache breakpoints. Later, it's passed as a string to the Bedrock helper, and when I tried to modify the Bedrock helper I got scared.
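
To make the trade-off concrete, here is a small sketch of what inflating the system prompt would involve. The Messages API accepts `system` either as a string or as a list of text blocks, and only blocks can carry `cache_control`. Both function names are hypothetical, and the flattening step stands in for whatever a string-only helper would need.

```python
CACHE_CONTROL = {"type": "ephemeral"}


def system_as_blocks(instructions: str, context: str) -> list:
    """Inflate a string system prompt into text blocks, caching the shared context."""
    return [
        {"type": "text", "text": instructions},
        {"type": "text", "text": context, "cache_control": dict(CACHE_CONTROL)},
    ]


def system_as_string(blocks: list) -> str:
    """Flatten blocks back into one string for a helper that only accepts a string."""
    return "\n\n".join(block["text"] for block in blocks)
```

Flattening drops the breakpoint, so a string-only code path would simply run without caching rather than fail.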

I'm not sure if it makes a difference. It's hard to understand from the anthropic documentation whether it really matters -- different examples show both.

> (and also inflate all other messages to content blocks)

I inflate all user messages to content blocks for the same reason. The most recent user message needs to have a cache breakpoint. Actually, the multi-turn example says that the second-to-last user message also needs one, but it seems to work fine without.
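
As a sketch of that step (in Python for illustration, with hypothetical function names): every string user message becomes a list with one text block, and the breakpoint goes on the last block of the most recent user message. Passing `count=2` would also mark the second-to-last user message, as the multi-turn example in the docs does.

```python
import copy


def inflate(messages):
    """Return a copy in which every string user message is a list of text blocks."""
    out = []
    for msg in messages:
        msg = copy.deepcopy(msg)
        if msg["role"] == "user" and isinstance(msg["content"], str):
            msg["content"] = [{"type": "text", "text": msg["content"]}]
        out.append(msg)
    return out


def mark_recent_user_turns(messages, count=1):
    """Add a cache breakpoint to the last block of the `count` newest user messages.

    Expects messages that have already been through `inflate`.
    """
    marked = 0
    for msg in reversed(messages):
        if marked == count:
            break
        if msg["role"] == "user" and msg["content"]:
            msg["content"][-1]["cache_control"] = {"type": "ephemeral"}
            marked += 1
    return messages
```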

> What about instead of busy/quiet buffer distinction simply sorting buffers by lastmodified?

I'm really not sure what is best here. I think sorting alone is not sufficient. The problem is we only have 4 cache breakpoints, and I'm using one for the final message. Even if we sort, we still need to arbitrarily decide where to put the cache breakpoints and how many to use.

I decided on 2 breakpoints: one for quiet buffers (documentation / background code) and one for busy buffers that claude will edit. It might be helpful to go one more step and separate the single most recently edited buffer into its own block.
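
The "one more step" variant could look roughly like this. It is only a sketch: the buffer dict key `mtime`, the 120-second threshold, and both function names are assumptions for illustration. Each returned group would end with one breakpoint, and the total can be compared against whatever limit the API enforces.

```python
def split_buffers(buffers, now, quiet_after=120):
    """Split buffers into up to three groups: quiet, busy, and the newest buffer.

    Groups are ordered from most stable to least stable; empty groups are dropped.
    """
    ordered = sorted(buffers, key=lambda b: b["mtime"])
    quiet = [b for b in ordered if now - b["mtime"] > quiet_after]
    busy = [b for b in ordered if now - b["mtime"] <= quiet_after]
    # Pull the single most recently edited busy buffer into its own group.
    newest = [busy.pop()] if busy else []
    return [group for group in (quiet, busy, newest) if group]


def breakpoints_needed(groups, conversation_breakpoints=1):
    """Count breakpoints: one per buffer group plus those used for the conversation."""
    return len(groups) + conversation_breakpoints
```

Ordering the groups from most to least stable matters because an edit to any buffer invalidates the cache for everything after it in the prompt.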

It's hard to optimize this stuff for unknown use cases! Mostly I think it is useful to cache documentation and read-only source files that Claude will not edit.
