Elasticsearch / OpenSearch as destination #2173

Open
HectorBst opened this issue Dec 22, 2024 · 1 comment

Comments

@HectorBst

HectorBst commented Dec 22, 2024

Feature description

Elasticsearch (and its fork OpenSearch) is a mature and proven technology for building search engines. It can also serve as a vector store and inverted index for a robust RAG solution. Both Elasticsearch and OpenSearch are also well established in the cloud ecosystem.

Having them as destinations would be very valuable, especially in an architecture that already uses dlt as an EL or ETL tool.

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

Build a RAG solution, integrated with the Cloud Data Platform. The data platform extracts the company's documentation and knowledge base from various sources and centralizes it in the data lake, helping to implement versioning and reproducibility of the knowledge base and processing.

The vector store (and/or inverted index) would be Elasticsearch, OpenSearch or OpenSearch Serverless.

The data platform would use dlt as an extraction tool, and also as a reverse ETL/EL tool to load text and vectors into an OpenSearch or Elasticsearch vector store.

Proposed solution

Reverse ETL or EL with dlt to build a knowledge base for a RAG or search engine:

  • Extract chunks of text and their associated vectors (already computed and versioned in the data platform) from a Data Warehouse (the source) and load them into Elasticsearch / OpenSearch (the destination).
  • Extract a table from a Data Lake (the source) referencing raw documents (such as images or PDFs), split documents and compute the vectors on the fly with a developer-supplied transformer and load them into Elasticsearch / OpenSearch.
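To illustrate the second scenario, here is a minimal sketch of what a developer-supplied transformer could look like. It assumes the raw text has already been extracted from each document; the field names (`doc_id`, `text`, `chunk_id`, `embedding`), the fixed-size character windowing, and the `embed` callable are all hypothetical choices for illustration, not part of dlt.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical embedding hook: any function mapping text to a vector.
Embedder = Callable[[str], List[float]]


def split_text(text: str, size: int = 200, overlap: int = 20) -> Iterator[str]:
    """Yield character windows of `size` that overlap by `overlap` characters."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + size]
        if start + size >= len(text):
            break  # the last window already reached the end of the text


def to_chunks(
    doc: Dict[str, Any], embed: Embedder, size: int = 200, overlap: int = 20
) -> Iterator[Dict[str, Any]]:
    """Turn one document row into chunk rows carrying text and vector."""
    for n, piece in enumerate(split_text(doc["text"], size, overlap)):
        yield {
            "chunk_id": f"{doc['doc_id']}-{n}",  # stable id, reusable as the index _id
            "doc_id": doc["doc_id"],
            "text": piece,
            "embedding": embed(piece),
        }
```

A generator like `to_chunks` could presumably be wrapped in a dlt transformer so that it runs on each row yielded by the source resource.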

What is needed are Elasticsearch / OpenSearch destinations, built on the ES/OS Python clients, that can load data from existing dlt sources into indices. The Bulk API could be used to load documents in batches.
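For context on the Bulk API: the `_bulk` endpoint takes newline-delimited JSON in which each document is preceded by an action line. The sketch below only renders that body with the standard library; the index name and the `chunk_id` field are assumptions for illustration, and a real destination would more likely hand this work to the bulk helpers of the Python clients.

```python
import json
from typing import Any, Dict, Iterable, List


def bulk_body(index: str, docs: Iterable[Dict[str, Any]], id_field: str = "chunk_id") -> str:
    """Render documents as an NDJSON body for the _bulk endpoint."""
    lines: List[str] = []
    for doc in docs:
        # Action line: index (create or overwrite) the document under a stable _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc[id_field])}}))
        # Source line: the document itself, text and vector included.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)
```

Using a deterministic `_id` means that reloading the same chunk overwrites the existing document instead of duplicating it.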

Related issues

No response

@sh-rp
Collaborator

sh-rp commented Jan 2, 2025

@HectorBst just in case you missed it, we have a reverse ETL destination which you can use to send data back to an API, queue or other type of endpoint here: https://dlthub.com/docs/dlt-ecosystem/destinations/destination. I'm not super familiar with Elasticsearch, but my intuition is that we would rather build a nice code example with the reverse ETL destination instead of a full destination implementation for this.
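As a rough idea of what such a code example might look like, here is a sketch that combines the `@dlt.destination` decorator from the linked page with the `helpers.bulk` function of the `elasticsearch` Python client. It is untested against a live cluster; the index-per-table mapping, the `chunk_id` field, and the names `to_actions`, `make_destination` and `elasticsearch_sink` are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator


def to_actions(
    items: Iterable[Dict[str, Any]], index: str, id_field: str = "chunk_id"
) -> Iterator[Dict[str, Any]]:
    """Map rows to the action dicts accepted by the client's bulk helper."""
    for item in items:
        yield {"_index": index, "_id": str(item[id_field]), "_source": item}


def make_destination(es_url: str, batch_size: int = 500):
    """Build a dlt custom destination that bulk-indexes each batch of rows."""
    # Imported lazily so to_actions stays usable without dlt or the client installed.
    import dlt
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(es_url)

    @dlt.destination(batch_size=batch_size, name="elasticsearch_sink")
    def elasticsearch_sink(items, table) -> None:
        # Called once per batch; `table` is the table schema and its name
        # is reused here as the target index (an assumption of this sketch).
        helpers.bulk(client, to_actions(items, index=table["name"]))

    return elasticsearch_sink
```

It could then be passed to a pipeline, for example `dlt.pipeline(pipeline_name="kb_to_es", destination=make_destination("http://localhost:9200"))`, followed by `pipeline.run(rows, table_name="knowledge_base")`. The OpenSearch client ships a similar bulk helper, so the same shape should carry over with a different client object.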

Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants