Feature description
Elasticsearch (and its fork OpenSearch) is a mature and proven technology for building search engines. It can also be used as a Vector Store and Inverted Index to build a robust RAG solution. Elasticsearch and OpenSearch also have a good presence in the cloud ecosystem.
Having them as dlt destinations would be very useful, especially in an architecture that already uses dlt as an EL or ETL tool.
Are you a dlt user?
Yes, I'm already a dlt user.
Use case
Build a RAG solution, integrated with the Cloud Data Platform. The data platform extracts the company's documentation and knowledge base from various sources and centralizes it in the data lake, helping to implement versioning and reproducibility of the knowledge base and processing.
The vector store (and/or inverted index) would be Elasticsearch, OpenSearch or OpenSearch Serverless.
The data platform would use dlt as an extraction tool, and also as a reverse ETL/EL tool to load text and vectors into an OpenSearch or Elasticsearch vector store.
Proposed solution
Reverse ETL or EL with dlt to build a knowledge base for a RAG or search engine:
Extract chunks of text and their associated vectors (already computed and versioned in the data platform) from a Data Warehouse (the source) and load them into Elasticsearch / OpenSearch (the destination).
Extract a table from a Data Lake (the source) referencing raw documents (such as images or PDFs), split documents and compute the vectors on the fly with a developer-supplied transformer and load them into Elasticsearch / OpenSearch.
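To make the second scenario concrete, here is a minimal sketch of the "split and embed on the fly" step, using only the standard library. The fixed-size character windows, the `chunk_id` / `doc_id` / `text` / `vector` field names, and the `embed` callable are all assumptions for illustration; a real transformer would plug in its own splitter and embedding model.

```python
from typing import Callable, Dict, Iterator, List


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> Iterator[str]:
    """Yield fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + size]
        if start + size >= len(text):
            break  # the last window already reaches the end of the text


def to_records(
    doc_id: str,
    text: str,
    embed: Callable[[str], List[float]],  # hypothetical, developer-supplied
    size: int = 200,
    overlap: int = 50,
) -> Iterator[Dict[str, object]]:
    """Turn one document into rows ready to be loaded into an index."""
    for i, chunk in enumerate(chunk_text(text, size, overlap)):
        yield {
            "chunk_id": f"{doc_id}-{i}",  # assumed to be unique per chunk
            "doc_id": doc_id,
            "text": chunk,
            "vector": embed(chunk),
        }
```

In a dlt pipeline this logic would presumably live in a transformer that receives the rows referencing the raw documents and yields one record per chunk.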
What is needed are Elasticsearch / OpenSearch destinations, built on the ES/OS Python clients, so that dlt can load data from existing dlt sources into indices. The Bulk API could be used to load documents in batches.
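As an illustration of what batch loading through the Bulk API involves, the sketch below groups rows into batches and serializes each batch into the newline-delimited JSON body that the `_bulk` endpoint expects (one action line followed by one document line, with a trailing newline). The index name, the `chunk_id` field used as `_id`, and the helper names are assumptions, not part of any existing dlt destination.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict[str, Any]], n: int) -> Iterator[List[Dict[str, Any]]]:
    """Group rows into lists of at most n items."""
    it = iter(rows)
    while batch := list(islice(it, n)):
        yield batch


def bulk_body(rows: Iterable[Dict[str, Any]], index: str, id_field: str = "chunk_id") -> str:
    """Serialize rows as a Bulk API request body (NDJSON)."""
    lines = []
    for row in rows:
        # Action line: index the document under a deterministic _id
        lines.append(json.dumps({"index": {"_index": index, "_id": row[id_field]}}))
        # Source line: the document itself
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"  # the body must end with a newline
```

Each body can then be sent to the `_bulk` endpoint of either engine. Using a stable `_id` means that re-running a load overwrites documents instead of duplicating them.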
Related issues
No response
@HectorBst just in case you missed it, we have a reverse ETL destination which you can use to send data back to an API, queue, or other type of endpoint: https://dlthub.com/docs/dlt-ecosystem/destinations/destination. I'm not super familiar with Elasticsearch, but my intuition is that we would rather build a nice code example with the reverse ETL destination than a full destination implementation for this.
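A rough sketch of what such an example could look like, assuming the `@dlt.destination` decorator from the page linked above and the `helpers.bulk` function of the `elasticsearch` Python client. The `build_es_destination` and `to_actions` helpers, the `chunk_id` field, the index name, and the batch size are hypothetical, and the sketch assumes the default loader format, where `items` arrives as a list of dicts. It has not been run against a live cluster.

```python
from typing import Any, Dict, Iterable, Iterator


def to_actions(items: Iterable[Dict[str, Any]], index: str) -> Iterator[Dict[str, Any]]:
    """Map rows to the action dicts consumed by elasticsearch.helpers.bulk."""
    for row in items:
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": row["chunk_id"],  # hypothetical unique key per chunk
            "_source": row,
        }


def build_es_destination(es_client: Any, index: str, batch_size: int = 500) -> Any:
    """Wrap an Elasticsearch client in a dlt custom destination."""
    # Imported here so that to_actions stays usable without these packages installed
    import dlt
    from dlt.common.schema import TTableSchema
    from dlt.common.typing import TDataItems
    from elasticsearch import helpers

    @dlt.destination(batch_size=batch_size, name="elasticsearch_sink")
    def es_sink(items: TDataItems, table: TTableSchema) -> None:
        # Called once per batch of rows; one Bulk API round trip per call
        helpers.bulk(es_client, to_actions(items, index))

    return es_sink


# Possible usage (not executed here):
#   from elasticsearch import Elasticsearch
#   sink = build_es_destination(Elasticsearch("http://localhost:9200"), "kb")
#   dlt.pipeline("kb_to_es", destination=sink).run(chunks_resource)
```

The `opensearch-py` client ships a similar `helpers.bulk`, so the same shape should carry over to OpenSearch with a different client object.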