Gemini Context caching #3212
Would love to have some utilities around this for the Google and Vertex AI providers too, especially for use cases where you want to use tools in combination with cached context. The docs describe a way to give a cachedContent key to the provider, which works, but as soon as tools are involved, it breaks.
This would likely require some reworking of how tools are passed on to the Google APIs when a cachedContent name is provided, since you might have tools defined in your cache. But for them to be executed, you still need them in your Vercel AI SDK client too, of course.
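For reference, this is roughly what the documented approach looks like without tools. The sketch below is based on my reading of the Google provider docs, so treat the exact `cachedContent` setting, model name, and cache name as assumptions; the point is that this works until `tools` is added to the call:

```ts
import { google } from "@ai-sdk/google";
import { generateText } from "ai";

// Full resource name of a cache created beforehand with the Google SDK,
// e.g. "cachedContents/abc123" (placeholder value).
const cachedContent = "cachedContents/abc123";

const { text } = await generateText({
  // The cache name is passed as a model setting. As soon as `tools` is added
  // to this call, the API rejects the request, because tools conflict with
  // the cached content.
  model: google("models/gemini-1.5-flash-001", { cachedContent }),
  prompt: "Summarize the cached document.",
});

console.log(text);
```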
I was exploring this last night and wanted to share what it would take to make this work. What I'm sharing here is quite hack-y, but it should give us some pointers on how to properly implement context caching within the Vercel AI SDK, including support for caching tools and system instructions.

Based on my understanding and experimentation, the core challenge is that the Gemini API (via Google AI or Vertex AI) doesn't allow sending `tools`, `toolConfig`, or `systemInstruction` in a request that also references `cachedContent`.

Here's a rough outline of the approach I explored:

1. Modify Provider Request for Cached Calls

When using `cachedContent`, strip the disallowed fields from the outgoing request by overriding the provider's `fetch`:

```ts
import { createVertex } from "@ai-sdk/google-vertex";
// Assuming other necessary imports
const provider = createVertex({
// ... other options
fetch: async (input, init) => {
const request = new Request(input, init);
const headers = request.headers;
const contentType = headers.get("content-type");
// Check if it's a JSON request likely targeting the generateContent endpoint
if (contentType?.includes("application/json")) {
// Clone the request to read the body safely
const clonedRequest = request.clone();
try {
const body = (await clonedRequest.json()) as Record<string, unknown | undefined>;
// If cachedContent is present, remove fields disallowed by the API
if (body?.cachedContent) {
delete body.tools;
delete body.toolConfig;
delete body.systemInstruction;
// Create a new init object with the modified body
const newInit = {
...init,
body: JSON.stringify(body), // Reserialize the modified body
};
// Fetch with the modified init object
return fetch(input, newInit);
}
} catch (error) {
console.error("Error processing request body:", error);
// Proceed with the original request if JSON parsing fails or it's not the expected structure
}
}
// Fallback to the original fetch call
return fetch(input, init);
},
});
```

2. Separate Configurations and Create Cache Explicitly

Since the configurations (`tools`, `toolConfig`, `systemInstruction`) are stripped from cached requests, they have to be baked into the cache itself when it is created with the Google SDK.

Let's first define our configurations separately.

```ts
import { GoogleGenerativeAI, FunctionCallingConfigMode, Schema } from "@google/generative-ai";
import { asSchema, convertJSONSchemaToOpenAPISchema } from "@ai-sdk/provider-utils";
import { tool } from "ai"; // needed for the tool() definitions below
// Define tools using Vercel AI SDK's tool definition
const tools = {
think: tool({ /* ... think tool definition */ }),
search: tool({ /* ... search tool definition */ }),
};
// Define system instruction
const systemInstructionText = "You're a friendly bot with a very long system instruction and potentially large files referenced in the cache.";
// Prepare tool definitions for Google API format
const functionDeclarations = Object.entries(tools).map(([name, toolDef]) => ({
name,
description: toolDef.description ?? "",
parameters: convertJSONSchemaToOpenAPISchema(
"jsonSchema" in toolDef.parameters
? toolDef.parameters.jsonSchema
: asSchema(toolDef.parameters as any).jsonSchema ?? {}
) as Schema,
}));
// ToolConfig for Google API
const toolConfig = {
functionCallingConfig: {
mode: FunctionCallingConfigMode.AUTO,
// allowedFunctionNames: Object.keys(tools), // Only needed if Mode is ANY
},
};
```

Then, create the cache using the Google AI SDK directly:

```ts
// Initialize Google SDK (ensure auth is configured)
const genai = new GoogleGenerativeAI({
vertexai: true, // or false for Google AI
project: "your-gcp-project-id",
// googleAuthOptions: { ... }, // If needed
});
// Example conversation ID or unique identifier for the cache
const conversationId = "unique-conversation-123";
// Create the cache
const cache = await genai.caches.create({
model: "gemini-1.5-flash-001", // Ensure this matches the model used later
displayName: `Cache for conversation ${conversationId}`,
ttl: "600s", // How long the cache is allowed to be valid
systemInstruction: {
parts: [{ text: systemInstructionText }],
},
tools: [{ functionDeclarations }],
toolConfig: toolConfig,
contents: [ // Include initial context/files here
{
role: "user", // Can be 'user' or 'model'
parts: [
{ text: "Here is a large document for context:" },
{
fileData: {
// Use GCS URI for large files
fileUri: "gs://your-bucket-name/your-large-file.pdf",
mimeType: "application/pdf",
// Alternatively, for small data:
// inlineData: { data: Buffer.from("...").toString("base64"), mimeType: "text/plain" }
},
},
],
},
// Add more history/context if needed
],
});
console.log(`Cache created: ${cache.name}`);
```

Finally, use the cache name with the Vercel AI SDK's `generateText` call:

```ts
import { generateText } from "ai";
import { CoreMessage } from "ai";
// Use the modified provider and pass the cache name
const model = provider(cache.model); // Use the same model name as the cache
const messages: CoreMessage[] = [
// Add only *new* messages not already in the cache's 'contents'
{ role: "user", content: "Summarize the document provided earlier." }
];
const result = await generateText({
model: model,
// Pass cache name via provider options defined in Step 1
providerOptions: {
cachedContent: cache.name,
},
system: systemInstructionText,
tools: tools, // Still required here for executing functions
messages: messages, // Only new messages
maxSteps: 20, // Maximum number of steps / tool calls
});
console.log(result.text);
```

This approach seems viable for both the Google AI and Vertex AI providers. It's definitely a bit more involved, as it requires interacting with the underlying Google SDK directly for cache management and modifying the provider's fetch behavior. I'd be happy to collaborate on refining an approach, exploring cleaner abstractions within the SDK if possible, and perhaps opening a PR with a more integrated solution if this direction seems promising.
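Purely as an illustration of what a cleaner abstraction could look like, here's a hypothetical sketch. None of these names exist in the AI SDK today; it just tries to capture where the tool/cache duplication could be hidden:

```ts
// Hypothetical sketch only: neither createGeminiContextCache nor a cache-aware
// model factory exist in the AI SDK today. The idea is that tools and the system
// instruction are declared once, used to build the cache, and reused for execution.

interface GeminiContextCacheOptions {
  model: string;                    // e.g. "gemini-1.5-flash-001"
  system?: string;                  // long system instruction to cache
  tools?: Record<string, unknown>;  // Vercel AI SDK tool definitions
  contents?: unknown[];             // large documents / prior turns to cache
  ttlSeconds?: number;              // cache lifetime
}

interface GeminiContextCache {
  name: string;                     // e.g. "cachedContents/abc123" (placeholder)
  model: string;
}

// Imagined provider helper: creates the cache via the Google SDK and remembers
// which request fields have to be stripped when the cache is referenced.
declare function createGeminiContextCache(
  options: GeminiContextCacheOptions
): Promise<GeminiContextCache>;

// Imagined usage (commented out because the helper is hypothetical):
// const cache = await createGeminiContextCache({ model, system, tools, ttlSeconds: 600 });
// const result = await generateText({ model: provider(cache.model), messages, tools });
```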
Just out of curiosity, if you're going to do all of that, is there still much point in using the Vercel library?
Feature Description
Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases such as:
Chatbots with extensive system instructions
Repetitive analysis of lengthy video files
Recurring queries against large document sets
Frequent code repository analysis or bug fixing
Use Case
In a typical AI workflow, you might pass the same input tokens over and over to a model. Using the Gemini API context caching feature, you can pass some content to the model once, cache the input tokens, and then refer to the cached tokens for subsequent requests. At certain volumes, using cached tokens is lower cost than passing in the same corpus of tokens repeatedly.
When you cache a set of tokens, you can choose how long you want the cache to exist before the tokens are automatically deleted. This caching duration is called the time to live (TTL). If not set, the TTL defaults to 1 hour. The cost for caching depends on the input token size and how long you want the tokens to persist.
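For illustration, here's a rough sketch of that flow with the Node SDK from the linked docs. The method and field names below are my best recollection of that API, so double-check them against the documentation:

```ts
import { GoogleGenerativeAI } from "@google/generative-ai";
import { GoogleAICacheManager } from "@google/generative-ai/server";

const apiKey = process.env.GOOGLE_API_KEY!; // placeholder env var name

// 1. Cache the large, reusable part of the prompt once, with an explicit TTL.
const cacheManager = new GoogleAICacheManager(apiKey);
const cache = await cacheManager.create({
  model: "models/gemini-1.5-flash-001",
  systemInstruction: "You are an expert at analyzing transcripts.",
  contents: [{ role: "user", parts: [{ text: "<very long transcript>" }] }],
  ttlSeconds: 3600, // time to live; per the docs it defaults to 1 hour if omitted
});

// 2. Subsequent short requests reference the cached tokens instead of resending them.
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModelFromCachedContent(cache);
const result = await model.generateContent("List the key decisions made in the transcript.");
console.log(result.response.text());
```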
Additional context
https://ai.google.dev/gemini-api/docs/caching?lang=node