feat: enhance AI JSON response with sources, confidence, and tags #141
sachin9058 wants to merge 2 commits into benodiwal:main
Conversation
```cpp
double calculateConfidence(const std::string& text) {
    if (text.length() > 150) return 0.9;
    if (text.length() > 50) return 0.75;
    return 0.6;
}
```
How does this calculate AI confidence for a query?
Good question ... currently the confidence score is a simple heuristic, not derived from the AI model itself.
Right now it’s based on the length of the generated explanation:
- Longer explanations are assumed to be more detailed → higher confidence
- Shorter ones → lower confidence
This is just an initial placeholder to provide a basic signal to users.
In future iterations, this could be improved by:
- Incorporating model-provided confidence (if available)
- Using response structure/quality signals
- Integrating external validation or scoring mechanisms
Happy to refine this approach based on suggestions.
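To illustrate the first improvement above, here is a minimal sketch of how a model-provided score could take precedence over the length heuristic. The `modelScore` parameter is hypothetical; the current implementation has no such input, and whether the backend exposes one at all is an assumption.

```cpp
#include <string>

// Hypothetical sketch, not the PR's implementation: prefer a model-provided
// score when the backend exposes one, and fall back to the length heuristic
// otherwise. modelScore defaults to -1.0, meaning "no model score available".
double calculateConfidence(const std::string& text, double modelScore = -1.0) {
    // Current length-based heuristic, kept as the fallback signal.
    double heuristic = 0.6;
    if (text.length() > 150) heuristic = 0.9;
    else if (text.length() > 50) heuristic = 0.75;

    // A valid model score in [0, 1] takes precedence over the heuristic.
    if (modelScore >= 0.0 && modelScore <= 1.0) return modelScore;
    return heuristic;
}
```

This keeps the existing behavior unchanged for callers that pass only the text, while leaving room to wire in a real uncertainty signal later.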
```cpp
if (text.find("ST_") != std::string::npos) {
    sources.push_back({
        {"title", "PostGIS Documentation"},
        {"url", "https://postgis.net/docs/"}});
}

if (text.find("SELECT") != std::string::npos) {
    sources.push_back({
        {"title", "PostgreSQL SELECT"},
        {"url", "https://www.postgresql.org/docs/current/sql-select.html"}});
}
```
You only cover SELECT and PostGIS queries for documentation; I don't think this should be done this way.
Thanks for the effort here! I'm a bit skeptical though about whether this belongs in the core extension.
The confidence score being derived from text length doesn't really reflect model confidence. And honestly I'm not sure
a confidence field makes sense here in general: the models that hallucinate a lot are exactly the ones where you'd
want some uncertainty signal, but they're also the ones least capable of producing a reliable one, and stronger models
are good enough that you don't really need it. So either way it feels like it could mislead users more than it helps.
The source linking feels fragile too: if nearly every response is going to point to the same top-level
PostgreSQL or PostGIS page just because SELECT or ST_ appeared somewhere, those aren't really sources, they're more
like a static footer. I think it could be genuinely useful if the links were actually tied to the specific functions
or clauses in the generated query, but as-is I feel it adds more noise than clarity.
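For what it's worth, a hedged sketch of that per-construct approach could look something like the following. The identifier-to-URL table is purely illustrative, not a vetted index, and `collectSources` is a hypothetical helper, not code from this PR:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: map specific SQL/PostGIS identifiers found in the
// generated query to their dedicated documentation pages, rather than one
// top-level link. The entries below are examples only.
std::vector<std::pair<std::string, std::string>>
collectSources(const std::string& query) {
    static const std::map<std::string, std::string> docIndex = {
        {"ST_Intersects", "https://postgis.net/docs/ST_Intersects.html"},
        {"ST_Buffer",     "https://postgis.net/docs/ST_Buffer.html"},
        {"GROUP BY",      "https://www.postgresql.org/docs/current/queries-table-expressions.html"},
    };
    std::vector<std::pair<std::string, std::string>> sources;
    for (const auto& [name, url] : docIndex) {
        // Naive substring match; a real version would tokenize the query.
        if (query.find(name) != std::string::npos)
            sources.push_back({name, url});
    }
    return sources;
}
```

Even this version inherits the maintenance burden of keeping the index current, which is part of why I'd rather see it live outside the core extension.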
More broadly I'm not sure metadata enrichment like this is really the extension's job. Its core responsibility is
generating valid SQL from natural language - annotating and interpreting the AI's explanation feels more like
something a client or UI layer should own. And once this is in, we're on the hook for keeping these heuristics
updated.
Might be worth opening a discussion thread first to see if there's consensus on whether this is the right place for
it?
cc @benodiwal @probablyArth - curious what you think about the scope here
@MohamedKamal000 @sahitya-chandra I get the concerns you raised. The current version was more of an initial attempt, and I agree with the points above.
My intention was to make responses a bit more transparent and easier to trust, but I see how putting this in the core extension adds scope and maintenance overhead. I'm open to taking a different approach here: maybe moving this to the client/UI side, or reworking it so the metadata is actually meaningful and tied to specific SQL constructs. Happy to iterate on this, or pause and open a discussion first if that makes more sense. Let me know what you think 👍
Summary
This PR enhances the JSON response format by adding sources, confidence scores, and tags.