Elasticsearch / OpenSearch as destination #2173

HectorBst · 2024-12-22T18:16:16Z

Feature description

Elasticsearch (and its fork OpenSearch) is a mature and proven technology for building search engines. It can also be used as a vector store and inverted index to build a robust RAG solution. Both Elasticsearch and OpenSearch are also well established in the cloud ecosystem.

Having them as destinations would be very useful, especially in an architecture that already uses dlt as an EL or ETL tool.

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

Build a RAG solution integrated with the Cloud Data Platform. The data platform extracts the company's documentation and knowledge base from various sources and centralizes them in the data lake, which helps with versioning and reproducibility of both the knowledge base and its processing.

The vector store (and/or inverted index) would be Elasticsearch, OpenSearch or OpenSearch Serverless.

The data platform would use dlt as an extraction tool, and also as a reverse ETL/EL tool to load text and vectors into an OpenSearch or Elasticsearch vector store.

Proposed solution

Reverse ETL or EL with dlt to build a knowledge base for a RAG or search engine:

Extract chunks of text and their associated vectors (already computed and versioned in the data platform) from a Data Warehouse (the source) and load them into Elasticsearch / OpenSearch (the destination).
Extract a table from a Data Lake (the source) that references raw documents (such as images or PDFs), split the documents, compute the vectors on the fly with a developer-supplied transformer, and load them into Elasticsearch / OpenSearch.

What is needed are Elasticsearch / OpenSearch destinations, built on the ES/OS Python clients, that can load data from existing dlt sources into indices. The Bulk API could be used to load documents in batches.
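
As a rough illustration of the Bulk API idea, the sketch below serializes a batch of rows into the newline-delimited JSON body that the `_bulk` endpoint accepts: one action line followed by one document line per row. The function name `to_bulk_body`, the index name `kb_chunks`, and the fields `chunk_id`, `text` and `embedding` are assumptions made up for this example; they are not part of dlt or of the ES/OS clients.

```python
import json
from typing import Any, Dict, Iterable


def to_bulk_body(rows: Iterable[Dict[str, Any]], index: str, id_field: str) -> str:
    """Serialize rows into an NDJSON body for the Elasticsearch / OpenSearch _bulk endpoint."""
    lines = []
    for row in rows:
        # Copy so the caller's row is not mutated when the id is popped out.
        doc = dict(row)
        doc_id = doc.pop(id_field)
        # Action line: "index" creates or overwrites the document under a stable id,
        # so reloading the same chunk does not create duplicates.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must be terminated by a newline.
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical rows, shaped like chunks with precomputed vectors.
rows = [
    {"chunk_id": "doc1-0", "text": "dlt loads data.", "embedding": [0.1, 0.2]},
    {"chunk_id": "doc1-1", "text": "OpenSearch stores vectors.", "embedding": [0.3, 0.4]},
]
body = to_bulk_body(rows, index="kb_chunks", id_field="chunk_id")
print(body)
```

In practice the Python clients ship bulk helpers that build this payload themselves, so a real destination would more likely hand them action dictionaries than raw NDJSON; the sketch only shows what a batch looks like on the wire.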

Related issues

No response

sh-rp · 2025-01-02T10:36:35Z

@HectorBst just in case you missed it, we have a reverse ETL destination which you can use to send data back to an API, queue, or other type of endpoint here: https://dlthub.com/docs/dlt-ecosystem/destinations/destination. I'm not super familiar with Elasticsearch, but my intuition is that we would rather build a nice code example with the reverse ETL destination instead of a full destination implementation for this.
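
A minimal sketch of what such a code example might look like, under stated assumptions. The sink below has the `(items, table)` shape that dlt passes to a custom destination function and turns each batch into action dictionaries of the kind the `elasticsearch.helpers.bulk` helper accepts. The names `make_sink`, `send_bulk`, `chunk_id` and `kb_chunks` are hypothetical, and the client call is replaced by a stand-in so the snippet runs without dlt or a cluster; the commented wiring at the end is an untested assumption about how it could be registered.

```python
from typing import Any, Callable, Dict, Iterable, List


def make_sink(send_bulk: Callable[[List[Dict[str, Any]]], Any], id_field: str):
    """Build a sink with the (items, table) shape used by dlt custom destinations."""

    def sink(items: Iterable[Dict[str, Any]], table: Dict[str, Any]) -> None:
        # Assumption: one index per dlt table, named after the table.
        index = table["name"]
        actions = [
            {"_index": index, "_id": str(item[id_field]), "_source": item}
            for item in items
        ]
        # Skip the network call entirely for empty batches.
        if actions:
            send_bulk(actions)

    return sink


# Stand-in for the real client call: collect the actions in a list.
sent: List[Dict[str, Any]] = []
sink = make_sink(sent.extend, id_field="chunk_id")
sink(
    [{"chunk_id": "doc1-0", "text": "hello", "embedding": [0.1, 0.2]}],
    {"name": "kb_chunks"},
)
print(sent)

# With dlt and the Elasticsearch client installed, the wiring might look
# roughly like this (untested assumption, my_source is hypothetical):
#   import dlt
#   from elasticsearch import Elasticsearch, helpers
#   client = Elasticsearch("http://localhost:9200")
#   es_sink = dlt.destination(batch_size=500, name="elasticsearch_sink")(
#       make_sink(lambda actions: helpers.bulk(client, actions), "chunk_id")
#   )
#   dlt.pipeline("kb_to_es", destination=es_sink).run(my_source)
```

Passing the bulk call in as a parameter keeps the batching logic independent of the client, so the same sink could in principle be pointed at the OpenSearch client's bulk helper instead.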

github-project-automation bot added this to dlt core library Dec 22, 2024

github-project-automation bot moved this to Todo in dlt core library Dec 22, 2024
