-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retriever results #45
Conversation
…r-results # Conflicts: # src/neo4j_genai/retrievers/base.py # tests/unit/retrievers/test_vector.py
metadata: Record-related metadata, such as score. | ||
""" | ||
|
||
records: list[neo4j.Record] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to merge RetrieverRawResult and RetrieverResult, such that RetrieverResultItem either contains str_content or record_content (edited: or both of them)?
As the result user wouldn't need to think/learn when to use either of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get_search_results
method returns neo4j.Record
, but the results we need for RAG is a string, which is the type returned by the public search
method. That why we introduced both models. It also makes sense to have a string returned here, since the way to extract the text content depends on the retriever.
If not subclassing the Retriever
class, you only have to deal with RetrieverResult
. The other one will only be needed by developers who want to implement their own retriever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RetrieverRawResult
does make me think that it's a parent of RetrieverResult
though. Perhaps we can rename RetrieverRawResult
to something like RawSearchResult
? As it's a result for the retriever directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking again about that :
- The inconvenient is that users who may want to test different formatting will have to re-run the retriever.
- Advantage is that when we will have agents, we will be able to have one formatting per retriever for free, if we postpone the formatting it will be harder to have something other than
str(retriever_result)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can also replace VectorSearchRecord
with RetrieverRawResult
in the documentation. It's located in types.rst
# Conflicts: # src/neo4j_genai/retrievers/base.py # src/neo4j_genai/retrievers/external/weaviate/weaviate.py # src/neo4j_genai/retrievers/vector.py # tests/unit/retrievers/external/test_weaviate.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice ☂️
Description
Add a
RetrieverResult
class. To deal with the retrievers with aretrieval_query
, introduce aformat_record_function
that users can provide when instantiating the retriever (at the same time as the retrieval query). For retrievers with fixed outputs (VectorRetriever) or unconstrained output (Text2Cypher), a default formatting function is used.BREAKING CHANGE
The method to be instantiated by all
Retriever
is_get_search_results
, that must return a list ofneo4j.Record
and an optional metadata dictionnary (e.g.: the generated Cypher query forText2CypherRetriever
).Also, the
ExternalRetriever
now inherits fromRetriever
to be able to use the new implementation ofsearch
, which means it requires aneo4j.Driver
during instantiation. However, we do not have any constraint on the neo4j version there (no vector index), so retrievers classes have aVERIFY_NEO4J_VERSION
class attribute, which isTrue
by default but has been toggled toFalse
for external retrievers.Type of Change
Complexity
Complexity: Medium
How Has This Been Tested?
Checklist
The following requirements should have been met (depending on the changes in the branch):