Observation 1: Large-context/smart LLMs (GPT-4, Gemini, Claude) can be expensive.
Ideally, you want to send as few tokens as possible when crafting your prompt, to reduce cost (and possibly also improve the quality of the answer).
This means you can't just send entire files (which would be faster/more convenient); instead you have to think about which parts of each file are relevant to your prompt and which are not, and send only the relevant parts. This takes time and effort.
Observation 2: Local (llama3-8b via Ollama, etc.) or smaller-but-remote (Groq, GPT-3.5) LLMs are free or much cheaper.
So, what if we could delegate the task of "filtering" what to send in the final prompt, to a small/local LLM?
This would work like this:
Pass 1: Extract everything from the prompt that is not a file, i.e. "the question" / "the task" the user wants done, and ask the small LLM to summarize it.
Pass 2: For each file, ask the small LLM which parts of that file are relevant to the question/task and which are not, then filter out the irrelevant parts and keep only the relevant ones.
Finally, generate the prompt keeping only the relevant parts, resulting in a (possibly much) more compact prompt, without losing any important information.
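A minimal sketch of what this could look like, assuming a local Ollama server with its /api/generate endpoint and a cheap model like llama3:8b; the prompts, helper names, and model choice here are just placeholders, not a concrete proposal for the implementation:

```python
# Rough sketch of the two-pass filtering idea, assuming a local Ollama instance
# running at the default address. All prompt wording here is illustrative only.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
SMALL_MODEL = "llama3:8b"  # any cheap local model would do


def ask_small_llm(prompt: str) -> str:
    """Send one prompt to the local model and return its text response."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": SMALL_MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


def summarize_task(user_prompt: str) -> str:
    """Pass 1: condense the non-file part of the prompt into a short task summary."""
    return ask_small_llm(
        f"Summarize the following task in a few sentences:\n\n{user_prompt}"
    )


def filter_file(task_summary: str, path: str, content: str) -> str:
    """Pass 2: keep only the parts of one file that are relevant to the task."""
    return ask_small_llm(
        f"Task: {task_summary}\n\n"
        f"From the file below ({path}), return only the sections relevant to the "
        f"task, verbatim. Omit everything else.\n\n{content}"
    )


def build_compact_prompt(user_prompt: str, files: dict[str, str]) -> str:
    """Assemble the final, smaller prompt to send to the expensive model."""
    task = summarize_task(user_prompt)
    parts = [user_prompt]
    for path, content in files.items():
        parts.append(f"--- {path} (filtered) ---\n{filter_file(task, path, content)}")
    return "\n\n".join(parts)
```

You'd then send `build_compact_prompt(question, {"src/main.py": open("src/main.py").read(), ...})` to the big model instead of the raw files.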
If this works, it would significantly reduce cost without reducing usefulness/accuracy (at the cost of a bit of time to process the initial passes, and a bit of effort to initially set things up).
Just an idea. Sorry for all the noise, I'm presuming you'd rather people give ideas even if you don't end up implementing them, tell me if I need to calm down.
Cheers.