Retriever results #45

Merged
merged 15 commits into from
Jun 3, 2024

Conversation

stellasia
Contributor

@stellasia stellasia commented May 28, 2024

Description

Add a RetrieverResult class. To handle retrievers that take a retrieval_query, introduce a format_record_function that users can provide when instantiating the retriever (at the same time as the retrieval query). For retrievers with fixed outputs (VectorRetriever) or unconstrained output (Text2Cypher), a default formatting function is used.
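
As a rough illustration of the formatting hook described above, here is a minimal sketch. It does not use the package's real classes: `ResultItem`, `default_format_record` and `format_movie_record` are hypothetical names, a plain `dict` stands in for `neo4j.Record`, and the `title`, `plot` and `score` columns are assumed to come from an imaginary retrieval query.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ResultItem:
    """Hypothetical stand-in for one formatted result item."""

    content: str
    metadata: Optional[dict[str, Any]] = None


def default_format_record(record: Mapping[str, Any]) -> ResultItem:
    """Fallback formatter: stringify the whole record."""
    return ResultItem(content=str(dict(record)))


def format_movie_record(record: Mapping[str, Any]) -> ResultItem:
    """Custom formatter for an assumed retrieval query that returns
    `title`, `plot` and `score` columns."""
    return ResultItem(
        content=f"{record['title']}: {record['plot']}",
        metadata={"score": record.get("score")},
    )


# A plain dict stands in for a neo4j.Record here (assumption).
record = {"title": "The Matrix", "plot": "A hacker learns the truth.", "score": 0.92}
item = format_movie_record(record)
fallback = default_format_record({"a": 1})
```

The idea is that the formatter travels with the retrieval query: whoever writes the query knows which columns it returns, so they are best placed to say how a record becomes text.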

BREAKING CHANGE
The method to be implemented by all Retriever subclasses is _get_search_results, which must return a list of neo4j.Record and an optional metadata dictionary (e.g., the generated Cypher query for Text2CypherRetriever).

Also, the ExternalRetriever now inherits from Retriever so that it can use the new implementation of search, which means it requires a neo4j.Driver during instantiation. However, there is no constraint on the Neo4j version in that case (no vector index), so retriever classes have a VERIFY_NEO4J_VERSION class attribute, which is True by default but has been set to False for external retrievers.
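
The contract described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the field names on `RetrieverRawResult` and `RetrieverResult`, the `format_record` parameter, the `_verify_version` placeholder and `FakeExternalRetriever` are all hypothetical, and a plain mapping stands in for `neo4j.Record`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional

Record = Mapping[str, Any]  # stand-in for neo4j.Record (assumption)


@dataclass
class RetrieverRawResult:
    """Records plus optional metadata, as returned by subclasses."""

    records: list[Record]
    metadata: Optional[dict[str, Any]] = None


@dataclass
class RetrieverResult:
    """String items returned to callers of the public search method."""

    items: list[str]
    metadata: Optional[dict[str, Any]] = None


class Retriever(ABC):
    # Subclasses that do not depend on a vector index can turn this off.
    VERIFY_NEO4J_VERSION = True

    def __init__(self, driver: Any, format_record: Callable[[Record], str] = str):
        self.driver = driver
        self.format_record = format_record
        self.version_checked = False
        if self.VERIFY_NEO4J_VERSION:
            self._verify_version()

    def _verify_version(self) -> None:
        # Placeholder: a real check would query the server through the driver.
        self.version_checked = True

    @abstractmethod
    def _get_search_results(self, *args: Any, **kwargs: Any) -> RetrieverRawResult:
        """Return raw records and optional metadata."""

    def search(self, *args: Any, **kwargs: Any) -> RetrieverResult:
        # Shared logic: fetch raw records, then format each one to a string.
        raw = self._get_search_results(*args, **kwargs)
        items = [self.format_record(r) for r in raw.records]
        return RetrieverResult(items=items, metadata=raw.metadata)


class FakeExternalRetriever(Retriever):
    """Hypothetical external retriever that skips the version check."""

    VERIFY_NEO4J_VERSION = False

    def _get_search_results(self, query_text: str) -> RetrieverRawResult:
        return RetrieverRawResult(
            records=[{"text": f"match for {query_text}"}],
            metadata={"source": "external"},
        )


retriever = FakeExternalRetriever(driver=None, format_record=lambda rec: rec["text"])
result = retriever.search("cats")
```

In this sketch, subclasses only produce records, while `search` owns the formatting step, which is why an external retriever can reuse it simply by inheriting from the base class.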

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Note

Please provide an estimated complexity of this PR of either Low, Medium or High

Complexity: Medium

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed

@stellasia stellasia marked this pull request as ready for review May 31, 2024 07:36
@stellasia stellasia requested review from willtai and a team May 31, 2024 07:40
metadata: Record-related metadata, such as score.
"""

records: list[neo4j.Record]

@mgozsoy-neo4j mgozsoy-neo4j May 31, 2024

Is it possible to merge RetrieverRawResult and RetrieverResult, such that RetrieverResultItem contains either str_content or record_content (edited: or both of them)?
As a result, users wouldn't need to think about or learn when to use each of them.

Contributor Author

@stellasia stellasia May 31, 2024

The get_search_results method returns neo4j.Record, but the result we need for RAG is a string, which is the type returned by the public search method. That is why we introduced both models. It also makes sense to have a string returned here, since the way to extract the text content depends on the retriever.
If you are not subclassing the Retriever class, you only have to deal with RetrieverResult. The other one will only be needed by developers who want to implement their own retriever.

Contributor

RetrieverRawResult does make me think that it's a parent of RetrieverResult though. Perhaps we can rename RetrieverRawResult to something like RawSearchResult, since it's the result returned directly by the retriever?

Contributor Author

Thinking about that again:

  • The drawback is that users who want to test different formatting will have to re-run the retriever.
  • The advantage is that once we have agents, we will get one formatting per retriever for free; if we postpone the formatting, it will be harder to have something other than str(retriever_result).

Contributor

@willtai willtai left a comment

I think you can also replace VectorSearchRecord with RetrieverRawResult in the documentation. It's located in types.rst

stellasia added 3 commits May 31, 2024 14:48
# Conflicts:
#	src/neo4j_genai/retrievers/base.py
#	src/neo4j_genai/retrievers/external/weaviate/weaviate.py
#	src/neo4j_genai/retrievers/vector.py
#	tests/unit/retrievers/external/test_weaviate.py
Member

@oskarhane oskarhane left a comment

Nice ☂️

@stellasia stellasia merged commit f1a9cde into main Jun 3, 2024
9 checks passed
@stellasia stellasia deleted the retriever-results branch June 3, 2024 12:44

4 participants