Improve vector store onboarding exp (langchain-ai#6698)
This PR
- fixes the `similarity_search_by_vector` example so the code runs, and adds the example to mirror `similarity_search`
- reverts from FAISS back to Chroma to remove sharp edges and create a happy path for new developers: (1) real metadata filtering, and (2) expected functionality like `update`, `delete`, etc., to serve beyond the most trivial use cases (a sketch of both points follows below)
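
For reviewers, a minimal sketch of those two points; it is not part of the diff. The texts, metadata, and ids are made up, and it assumes the `filter` kwarg on `similarity_search` and the `delete` method on the Chroma wrapper.

```python
# Illustrative only: a tiny in-memory Chroma store showing (1) metadata
# filtering and (2) deletion. Texts, metadatas, and ids are made-up examples.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

db = Chroma(embedding_function=OpenAIEmbeddings())
db.add_texts(
    ["Chroma supports metadata filtering", "FAISS is an in-memory index"],
    metadatas=[{"topic": "chroma"}, {"topic": "faiss"}],
    ids=["doc-1", "doc-2"],
)

# (1) real metadata filtering
docs = db.similarity_search("metadata filtering", filter={"topic": "chroma"})

# (2) expected functionality like delete
db.delete(ids=["doc-2"])
```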

@hwchase17

---------

Co-authored-by: Bagatur <[email protected]>
jeffchuber and baskaryan authored Jul 18, 2023
1 parent 25a2bdf commit dc8b790
Showing 3 changed files with 113 additions and 5 deletions.
@@ -8,6 +8,8 @@ vectors, and then at query time to embed the unstructured query and retrieve the
'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search
for you.

![vector store diagram](/img/vector_stores.jpg)

## Get started

This walkthrough showcases basic functionality related to VectorStores. A key part of working with vector stores is creating the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself with the [text embedding model](/docs/modules/data_connection/text_embedding/) interfaces before diving into this.
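
As a quick illustration (not part of this diff), a minimal sketch of that embedding interface; `OpenAIEmbeddings` is just one implementation, and the sample strings are made up.

```python
# Minimal sketch of the text embedding interface this walkthrough relies on.
# Any embeddings class exposing embed_query / embed_documents works the same way.
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("How do vector stores work?")        # one string -> one vector
doc_vectors = embeddings.embed_documents(["first chunk", "second chunk"])   # many strings -> many vectors
print(len(query_vector), len(doc_vectors))
```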
Binary file added docs/docs_skeleton/static/img/vector_stores.jpg
116 changes: 111 additions & 5 deletions docs/snippets/modules/data_connection/vectorstores/get_started.mdx
@@ -1,3 +1,43 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

There are many great vector store options. Here are a few that are free, open source, and run entirely on your local machine. Review all integrations for the many great hosted offerings.

<Tabs>
<TabItem value="chroma" label="Chroma" default>

This walkthrough uses the `chroma` vector database, which runs on your local machine as a library.

```bash
pip install chromadb
```

We want to use `OpenAIEmbeddings`, so we have to get an OpenAI API key.


```python
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```

```python
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())
```

</TabItem>
<TabItem value="faiss" label="FAISS">

This walkthrough uses the `FAISS` vector database, which makes use of the Facebook AI Similarity Search (FAISS) library.

@@ -14,22 +54,71 @@
```python
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```


```python
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS


# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = FAISS.from_documents(documents, OpenAIEmbeddings())
```

</TabItem>
<TabItem value="lance" label="Lance">

This walkthrough uses the `LanceDB` vector database, which is built on the Lance data format.

```bash
pip install lancedb
```

We want to use `OpenAIEmbeddings`, so we have to get an OpenAI API key.


```python
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```

```python
import lancedb

from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import LanceDB

embeddings = OpenAIEmbeddings()

# Create (or overwrite) a LanceDB table, seeded with a single embedded row.
db = lancedb.connect("/tmp/lancedb")
table = db.create_table(
    "my_table",
    data=[
        {
            "vector": embeddings.embed_query("Hello World"),
            "text": "Hello World",
            "id": "1",
        }
    ],
    mode="overwrite",
)

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = LanceDB.from_documents(documents, embeddings, connection=table)
```

</TabItem>
</Tabs>



### Similarity search

@@ -57,6 +146,23 @@ print(docs[0].page_content)
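
The hunk above collapses the unchanged `similarity_search` example. For readability, here is a sketch of what that collapsed section looks like; the exact query string is an assumption, inferred from the sample output shown further below.

```python
# Assumed shape of the collapsed similarity_search example; the query string is
# inferred from the sample output below and may differ from the source file.
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
```
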
It is also possible to search for documents similar to a given embedding vector using `similarity_search_by_vector`, which accepts an embedding vector as a parameter instead of a string.

```python
embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)
```

The query is the same, and so the result is also the same.

<CodeOutputBlock lang="python">

```
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
```

</CodeOutputBlock>
