Added gemini thinking model support, with a default of gemini-2.0-flash-thinking-exp-01-21 #56
base: main
Conversation
I'm trying to implement it, but this is from the docs:

> Large language models are powerful multitask tools. Often you can just ask Gemini for what you want, and it will do okay. The Gemini API doesn't have a JSON mode, so there are a few things to watch for when generating data structures this way.
>
> You'll solve those problems in the next section. First, try a simple natural language prompt with the schema written out as text. This has not been optimized:
>
> ```python
> MODEL_ID = "gemini-2.0-flash"
>
> prompt = """
> {"people": list[PERSON], "places": list[PLACE], "things": list[THING], "relationships": list[RELATIONSHIP]}
>
> PERSON = {"name": str, "description": str, "start_place_name": str, "end_place_name": str}
>
> All fields are required.
>
> Important: Only return a single piece of valid JSON text.
>
> Here is the story:
>
> """ + story
>
> response = client.models.generate_content(...)
> ```
>
> That returned a JSON string. Try parsing it:
>
> ```python
> import json
>
> print(json.dumps(json.loads(response.text), indent=4))
> ```
>
> That's relatively simple and often works, but you can potentially make this more strict/robust by defining the schema using the API's function calling feature.
I opted to use the official Gemini package here rather than Vercel's wrapper since I encountered that JSON mode compatibility issue. Direct integration with the official package gives better control and helps avoid abstraction-related errors. I've used an example schema as a way to make the Gemini model return JSON and markdown, which is a technique that works pretty broadly across LLMs. This example schema is in the providers.ts file.
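For anyone reading along, here's a minimal sketch of that approach using the official `@google/generative-ai` package; the schema text and field names below are illustrative placeholders, not the actual contents of `providers.ts`:

```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

// Illustrative schema prompt; the real one in this PR lives in providers.ts.
const schemaPrompt = `Important: Only return a single piece of valid JSON text matching this shape:
{"reportMarkdown": string, "learnings": string[]}`;

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: 'gemini-2.0-flash-thinking-exp-01-21',
});

// Ask the thinking model for JSON via the prompt, then parse the raw text,
// since there is no JSON mode / structured output to rely on here.
export async function generateJson(prompt: string): Promise<unknown> {
  const result = await model.generateContent(`${schemaPrompt}\n\n${prompt}`);
  return JSON.parse(result.response.text());
}
```

In practice you'd probably also want to validate the parsed object (e.g. with zod) before trusting it, since the model can still return malformed JSON.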
Wait, so is this a working implementation of Gemini?
Yeah! It's pretty speedy too; the main limitation is Firecrawl rate-limiting if you're a free user like me. You can pull it down from my fork and play around with it.
Hey, thanks for the Gemini addition, it works seamlessly. I am trying to integrate this with other search APIs since Firecrawl provides only 500 free credits and a rate limit on its free tier. Is it possible to integrate SearXNG or the Tavily API into this? Thanks.
@dzhng @Dariton4000 is this ready to merge then?
Yes, a choice of web scrapers is a sensible next step. Firecrawl's free tier is pretty limiting.
I'm self-hosting right now and struggling a bit. Also, Gemini keeps getting rate-limited, so we may need to implement a rate limit with an exponential backoff.
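Something along these lines would probably do; a rough sketch, with the retry count, delays, and the 429 check all being assumptions rather than anything in this PR:

```typescript
// Retry a Gemini call with exponential backoff plus a little jitter.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit = err?.status === 429 || /rate limit/i.test(String(err?.message));
      if (!isRateLimit || attempt >= maxRetries) throw err;
      // 1s, 2s, 4s, ... capped at 60s, plus up to 250ms of jitter.
      const delayMs = Math.min(60_000, 1_000 * 2 ** attempt) + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap each Gemini call, e.g. withBackoff(() => model.generateContent(prompt));
```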
@jamisonl When using the thinking model, I was getting an error in the middle of research, something like "JSON at position 0 unable to decode". Have you also noticed any error like that?
Hey, I have added a SearXNG instance in this repo: https://github.com/Shreyas9400/deep-research-searxng. I have deployed the instance on Hugging Face due to resource constraints; however, the processing time is high because the search and parse steps take additional time.
Hey - I spoke to the AI SDK guys and I think I'll keep this PR open but not merge it. The reason is that the only benefit of adding the complexity of Gemini-specific packages is using their thinking model, which doesn't support tool calling yet. BUT that's coming; this is just an experimental model. You CAN use the normal gemini-2.0-flash model with the AI SDK right now just fine. I'd rather keep this simple and rely on one LLM interface package (AI SDK). If you really want to try the Gemini thinking model, this is a good reference implementation.
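For reference, using the normal model through the AI SDK looks roughly like this; a sketch assuming the `@ai-sdk/google` provider and an illustrative zod schema, not the repo's actual calls:

```typescript
import { google } from '@ai-sdk/google';
import { generateObject } from 'ai';
import { z } from 'zod';

// gemini-2.0-flash supports tool calling / structured output through the AI SDK,
// so no Gemini-specific package is needed on this path.
const { object } = await generateObject({
  model: google('gemini-2.0-flash'),
  schema: z.object({
    learnings: z.array(z.string()),
    followUpQuestions: z.array(z.string()),
  }),
  prompt: 'List the key learnings and follow-up questions from the search results above.',
});

console.log(object.learnings, object.followUpQuestions);
```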
That's a rational design choice! DeepSeek also doesn't have tool use/structured output yet. I look forward to the other reasoning models having feature parity. Any interest in making an everything-but-the-kitchen-sink version?
Honestly, I don't think I have enough time to do a kitchen-sink version, and that's a looong rabbit hole to go down, haha. My guess is the architecture will evolve as new models/capabilities are unlocked, so it's better to keep a bare-minimum version that's constantly updated to match the current SOTA implementation.
Hi @dzhng, hope the screenshot below adds more context for using the @google/generative-ai package instead of the Vercel wrapper. This feature lets the user choose between o3-mini and gemini-2.0-flash-thinking (defaults to o3-mini) in the CLI. I've tested it, and it's outputting markdown reports. Wondering if there's any interest in adding r1...
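To give a sense of the UX, the model choice is just a prompt at startup, roughly like this sketch (the wording and defaulting here are illustrative, not the exact code from the fork):

```typescript
import * as readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';

type ModelChoice = 'o3-mini' | 'gemini-2.0-flash-thinking-exp-01-21';

// Ask once at startup which model to use; plain Enter keeps the o3-mini default.
async function chooseModel(): Promise<ModelChoice> {
  const rl = readline.createInterface({ input, output });
  const answer = await rl.question(
    'Model? [1] o3-mini (default)  [2] gemini-2.0-flash-thinking: ',
  );
  rl.close();
  return answer.trim() === '2' ? 'gemini-2.0-flash-thinking-exp-01-21' : 'o3-mini';
}
```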