
Added gemini thinking model support, with a default of gemini-2.0-flash-thinking-exp-01-21 #56

Open · wants to merge 1 commit into main

Conversation

@jamisonl (Author)

Hi @dzhng, hope the below screenshot adds more context for using the @google/generative-ai package instead of the Vercel wrapper. This feature lets the user choose between o3-mini and gemini-2.0-flash-thinking (defaults to o3-mini) in the CLI. I've tested it and it's outputting markdown reports. Wondering if there's any interest in adding r1..

[screenshot]

@carterlasalle

I'm trying to implement it, but this is from the docs:

Using natural language

Large language models are powerful multitask tools. Often you can just ask Gemini for what you want, and it will do okay.

The Gemini API doesn't have a JSON mode, so there are a few things to watch for when generating data structures this way:

- Sometimes parsing fails.
- The schema can't be strictly enforced.

You'll solve those problems in the next section. First, try a simple natural language prompt with the schema written out as text. This has not been optimized:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

MODEL_ID = "gemini-2.0-flash"
prompt = """
Please return JSON describing the people, places, things and relationships from this story using the following schema:

{"people": list[PERSON], "places": list[PLACE], "things": list[THING], "relationships": list[RELATIONSHIP]}

PERSON = {"name": str, "description": str, "start_place_name": str, "end_place_name": str}
PLACE = {"name": str, "description": str}
THING = {"name": str, "description": str, "start_place_name": str, "end_place_name": str}
RELATIONSHIP = {"person_1_name": str, "person_2_name": str, "relationship": str}

All fields are required.

Important: Only return a single piece of valid JSON text.

Here is the story:

""" + story

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=types.GenerateContentConfig(
        response_mime_type="application/json"
    ),
)
```

That returned a JSON string. Try parsing it:

```python
import json

print(json.dumps(json.loads(response.text), indent=4))
```

```json
{
    "people": [
        {
            "name": "Elara",
            "description": "A wisp of a girl with eyes the color of a stormy sea, who can hear the forest's secrets.",
            "start_place_name": "Havenwood",
            "end_place_name": "Havenwood"
        },
        {
            "name": "Arthur",
            "description": "Elara's father, a quiet carpenter with sawdust permanently clinging to his eyebrows.",
            "start_place_name": "Havenwood",
            "end_place_name": "Havenwood"
        },
        {
            "name": "Clara",
            "description": "Elara's mother, a baker whose cinnamon rolls were legendary.",
            "start_place_name": "Havenwood",
            "end_place_name": "Havenwood"
        },
        {
            "name": "Silas",
            "description": "A gruff, barrel-chested man with a glint of steel in his eyes who wanted to drain the magic from Havenwood.",
            "start_place_name": "Havenwood",
            "end_place_name": "Havenwood"
        },
        {
            "name": "Thomas",
            "description": "The town\u2019s librarian, a kind, bookish man.",
            "start_place_name": "Havenwood",
            "end_place_name": "Havenwood"
        }
    ],
    "places": [
        {
            "name": "Havenwood",
            "description": "A quiet town nestled beside a whispering forest."
        },
        {
            "name": "Whispering Woods",
            "description": "A forest beside Havenwood that murmurs secrets."
        },
        {
            "name": "Old Mill",
            "description": "A dilapidated old mill on the edge of Havenwood."
        }
    ],
    "things": [
        {
            "name": "Magic Backpack",
            "description": "A simple, worn, leather backpack that holds magical items.",
            "start_place_name": "Elara's House",
            "end_place_name": "Havenwood"
        },
        {
            "name": "Honey Jar",
            "description": "A bottomless jar of honey.",
            "start_place_name": "Magic Backpack",
            "end_place_name": "Magic Backpack"
        },
        {
            "name": "Wooden Bird",
            "description": "A small, intricately carved wooden bird that can fly messages.",
            "start_place_name": "Magic Backpack",
            "end_place_name": "Magic Backpack"
        },
        {
            "name": "Iridescent Dust",
            "description": "Shimmering dust that can mend anything broken.",
            "start_place_name": "Magic Backpack",
            "end_place_name": "Magic Backpack"
        },
        {
            "name": "Compass",
            "description": "A compass that points towards what is needed most.",
            "start_place_name": "Magic Backpack",
            "end_place_name": "Magic Backpack"
        },
        {
            "name": "Leather-bound Book",
            "description": "A small, leather-bound book filled with blank pages that fill with the perfect story, poem, or spell whenever needed.",
            "start_place_name": "Magic Backpack",
            "end_place_name": "Magic Backpack"
        },
        {
            "name": "Cinnamon Rolls",
            "description": "Legendary cinnamon rolls made by Clara.",
            "start_place_name": "Havenwood",
            "end_place_name": "Havenwood"
        }
    ],
    "relationships": [
        {
            "person_1_name": "Elara",
            "person_2_name": "Arthur",
            "relationship": "Daughter-Father"
        },
        {
            "person_1_name": "Elara",
            "person_2_name": "Clara",
            "relationship": "Daughter-Mother"
        },
        {
            "person_1_name": "Elara",
            "person_2_name": "Thomas",
            "relationship": "Friends"
        },
        {
            "person_1_name": "Elara",
            "person_2_name": "Silas",
            "relationship": "Adversaries"
        },
        {
            "person_1_name": "Arthur",
            "person_2_name": "Clara",
            "relationship": "Spouses"
        }
    ]
}
```
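The parse step above assumes the model returned bare JSON. In practice ("Sometimes parsing fails"), models occasionally wrap the payload in markdown fences or surround it with prose. A small defensive helper can strip that before parsing — this is a hedged sketch in TypeScript (the repo's language); `extractJson` is my own name, not part of this PR or any SDK:

```typescript
// Hypothetical helper: pull a JSON object out of model output that may be
// wrapped in ```json fences or surrounded by prose.
function extractJson(raw: string): unknown {
  // Prefer a fenced ```json ... ``` block if one is present.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : raw;
  try {
    return JSON.parse(candidate.trim());
  } catch {
    // Fall back to the outermost {...} span.
    const start = candidate.indexOf("{");
    const end = candidate.lastIndexOf("}");
    if (start === -1 || end <= start) throw new Error("no JSON object found");
    return JSON.parse(candidate.slice(start, end + 1));
  }
}
```

This recovers from the two most common failure shapes (fenced output, chatty preamble) but still throws on truly malformed JSON, which callers can treat as a retry signal.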

That's relatively simple and often works, but you can make it stricter and more robust by defining the schema using the API's function calling feature.
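The docs point to function calling as the strict fix; a lighter-weight, SDK-independent alternative is to validate the parsed object client-side and retry when it fails. Here is a minimal hand-rolled check for the story schema above — a sketch with my own function names, not code from this PR (a real project might reach for a library like zod instead):

```typescript
// Hypothetical validator for the people/places/things/relationships schema.
// Checks only the top-level shape and required string fields.
function hasStringFields(obj: unknown, fields: string[]): boolean {
  if (typeof obj !== "object" || obj === null) return false;
  return fields.every((f) => typeof (obj as Record<string, unknown>)[f] === "string");
}

function isValidStoryGraph(data: unknown): boolean {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  const moveable = ["name", "description", "start_place_name", "end_place_name"];
  return (
    Array.isArray(d.people) && d.people.every((p) => hasStringFields(p, moveable)) &&
    Array.isArray(d.places) && d.places.every((p) => hasStringFields(p, ["name", "description"])) &&
    Array.isArray(d.things) && d.things.every((t) => hasStringFields(t, moveable)) &&
    Array.isArray(d.relationships) &&
    d.relationships.every((r) => hasStringFields(r, ["person_1_name", "person_2_name", "relationship"]))
  );
}
```

If validation fails, the caller can re-prompt the model with the error, which in practice converges quickly for schemas this simple.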

@jamisonl (Author)

I opted to use the official Gemini package here rather than Vercel's wrapper since I encountered that JSON mode compatibility issue. Direct integration with the official package gives better control and helps avoid the abstraction-related errors. I've used an example schema as a way to make the gemini model return JSON and markdown, which is a technique that works pretty broadly across LLMs. This example schema is in the providers.ts file.

@carterlasalle

> I opted to use the official Gemini package here rather than Vercel's wrapper since I encountered that JSON mode compatibility issue. Direct integration with the official package gives better control and helps avoid the abstraction-related errors. I've used an example schema as a way to make the gemini model return JSON and markdown, which is a technique that works pretty broadly across LLMs. This example schema is in the providers.ts file.

Wait, so is this a working implementation of gemini?

@jamisonl (Author)

> I opted to use the official Gemini package here rather than Vercel's wrapper since I encountered that JSON mode compatibility issue. Direct integration with the official package gives better control and helps avoid the abstraction-related errors. I've used an example schema as a way to make the gemini model return JSON and markdown, which is a technique that works pretty broadly across LLMs. This example schema is in the providers.ts file.
>
> Wait, so is this a working implementation of gemini?

Yeah! It's pretty speedy too; the main limitation is Firecrawl rate-limiting if you're a free user like me. You can pull it down from my fork and play around with it.

@Shreyas9400

Hey, thanks for the Gemini addition, it works seamlessly. I am trying to integrate this with other search APIs, as Firecrawl provides only 500 free credits and a rate limit on their free tier. Is it possible to integrate the SearXNG or Tavily API into this? Thanks

@carterlasalle

@dzhng @Dariton4000 Is this ready to merge then?

@jamisonl (Author)

> Hey, thanks for the Gemini addition, it works seamlessly. I am trying to integrate this with other search APIs, as Firecrawl provides only 500 free credits and a rate limit on their free tier. Is it possible to integrate the SearXNG or Tavily API into this? Thanks

Yes, a choice of web scrapers is a sensible next step. Firecrawl's free tier is pretty limiting.

@carterlasalle

> Hey, thanks for the Gemini addition, it works seamlessly. I am trying to integrate this with other search APIs, as Firecrawl provides only 500 free credits and a rate limit on their free tier. Is it possible to integrate the SearXNG or Tavily API into this? Thanks
>
> Yes, a choice of web scrapers is a sensible next step. Firecrawl's free tier is pretty limiting.

I'm self-hosting right now and struggling a bit. Also, Gemini keeps getting:

```
GoogleGenerativeAIFetchError: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-thinking-exp-01-21:generateContent: [429 Too Many Requests] Resource has been exhausted (e.g. check quota).
    at handleResponseNotOk (/Users/rocket/deep-research-1/node_modules/@google/generative-ai/dist/index.js:414:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async makeRequest (/Users/rocket/deep-research-1/node_modules/@google/generative-ai/dist/index.js:387:9)
    at async generateContent (/Users/rocket/deep-research-1/node_modules/@google/generative-ai/dist/index.js:832:22)
    at async GeminiProvider.generateObject (/Users/rocket/deep-research-1/src/ai/providers.ts:185:20)
    at async writeFinalReport (/Users/rocket/deep-research-1/src/deep-research.ts:154:15)
    at async run (/Users/rocket/deep-research-1/src/run.ts:103:18) {
  status: 429,
  statusText: 'Too Many Requests',
  errorDetails: undefined
}
```

so we may need to implement rate limiting with exponential backoff.
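A sketch of what retry-with-backoff could look like around those calls — `backoffDelayMs` and `withRetry` are hypothetical names, not part of this PR; the `status` property check matches the error shape in the trace above:

```typescript
// Exponential backoff with jitter: delay doubles per attempt, capped at maxMs,
// then randomized into [cap/2, cap) to avoid thundering-herd retries.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60_000): number {
  const cap = Math.min(baseMs * 2 ** attempt, maxMs);
  return cap / 2 + Math.random() * (cap / 2);
}

// Retry fn only on HTTP 429; rethrow anything else or after maxRetries.
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number })?.status;
      if (status !== 429 || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Usage would be wrapping the SDK call site, e.g. `await withRetry(() => model.generateContent(prompt))` — though note free-tier quotas are per-minute, so a higher `baseMs` may be needed.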

@samyogdhital

@jamisonl When using the thinking model, I was getting an error in the middle of research, something like "JSON at position 0 unable to decode".
Sorry, I am not able to give the exact error. Will update if I see it again.

Have you also noticed any error like that?

@Shreyas9400

> Hey, thanks for the Gemini addition, it works seamlessly. I am trying to integrate this with other search APIs, as Firecrawl provides only 500 free credits and a rate limit on their free tier. Is it possible to integrate the SearXNG or Tavily API into this? Thanks
>
> Yes, a choice of web scrapers is a sensible next step. Firecrawl's free tier is pretty limiting.

Hey, I have added a SearXNG instance in this repo: https://github.com/Shreyas9400/deep-research-searxng

I have deployed the instance on Hugging Face due to resource constraints; however, the processing time is high because search and parsing take additional time.

@dzhng (Owner)

dzhng commented Feb 11, 2025

Hey - I spoke to the AI SDK guys and I think I'll keep this PR open but not merge it. The reason is that the only justification for adding the complexity of Gemini-specific packages is to use their thinking model, which doesn't support tool calling yet.

BUT, that's coming; this is just an experimental model. You CAN use the normal gemini-2.0-flash model right now with the AI SDK just fine. I'd rather keep this simple and rely on one LLM interface package (AI SDK).

If you really want to try the Gemini thinking model, this is a good reference implementation.

@jamisonl (Author)

jamisonl commented Feb 11, 2025

> Hey - I spoke to the AI SDK guys and I think I'll keep this PR open but not merge it. The reason is that the only justification for adding the complexity of Gemini-specific packages is to use their thinking model, which doesn't support tool calling yet.
>
> BUT, that's coming; this is just an experimental model. You CAN use the normal gemini-2.0-flash model right now with the AI SDK just fine. I'd rather keep this simple and rely on one LLM interface package (AI SDK).
>
> If you really want to try the Gemini thinking model, this is a good reference implementation.

That's a rational design choice! DeepSeek also doesn't have tool use/structured output yet. I look forward to the other reasoning models reaching feature parity. Any interest in making an everything-but-the-kitchen-sink version?

@dzhng (Owner)

dzhng commented Feb 17, 2025

Honestly, I don't think I have enough time to do a kitchen-sink version, and that's a looong rabbit hole to go down haha. My guess is the architecture will evolve as new models / capabilities are unlocked, so it's better to keep a bare-minimum version that's constantly updated with the current SOTA implementation.
