Making sure summarization leverages KV cache #152

eddierichter-amd · 2026-01-08T20:52:00Z

The initial implementation of summarization used the .generate() method in the ChatSDK which bypassed sending previously stored messages to the LLM. It therefore needed to send the entire prompt to the agent which was formatted slightly differently and bypassed the KV cache and had to be processed. This resulted in very high TTFT for summarization. This change reduces the TTFT of summarization to be similar to other requests.

eddierichter-amd self-assigned this Jan 8, 2026

eddierichter-amd requested review from itomek-amd and kovtcharov-amd January 9, 2026 16:53

Fixing summarization to use context caching

7a33fa9

eddierichter-amd force-pushed the eddie/summarization-kv-cache-eddie-fork branch from 4e2b5ae to 7a33fa9 Compare January 9, 2026 19:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Making sure summarization leverages KV cache #152

Making sure summarization leverages KV cache #152

Uh oh!

eddierichter-amd commented Jan 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Making sure summarization leverages KV cache #152

Are you sure you want to change the base?

Making sure summarization leverages KV cache #152

Uh oh!

Conversation

eddierichter-amd commented Jan 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant