Updates
databyjp committed Feb 17, 2024
1 parent 34d092d commit 5821726
Showing 5 changed files with 159 additions and 56 deletions.
@@ -1,7 +1,7 @@
import os

your_wcs_url = os.getenv("WCS_DEMO_URL")
your_wcs_key = os.getenv("WCS_DEMO_ADMIN_KEY")

# WCSInstantiation
import weaviate
@@ -2,83 +2,77 @@
title: Create a collection
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import FilteredTextBlock from '@site/src/components/Documentation/FilteredTextBlock';
import PyCode from '!!raw-loader!./_snippets/20_create_collection.py';

## <i class="fa-solid fa-square-chevron-right"></i> Background

Weaviate stores data in "collections". A collection is a set of objects that share the same data structure. In our movie database, we might have a collection of movies, a collection of actors, and a collection of reviews.

Here we will create a collection of movies.

## <i class="fa-solid fa-square-chevron-right"></i> Collection creation

### <i class="fa-solid fa-code"></i> Code

This example creates a collection for the movie data:

<FilteredTextBlock
text={PyCode}
startMarker="# CreateMovieCollection"
endMarker="# END CreateMovieCollection"
language="py"
/>

Each collection definition must have a name. Beyond that, you can define additional parameters, as we have done in this example. Let's break it down:

### <i class="fa-solid fa-chalkboard"></i> Properties

Properties are the object attributes that you want to store in the collection. Each property has a name and a data type.

In our movie database, we have properties such as `title` (`TEXT`, a string), `release_date` (`DATE`), `vote_average` (`NUMBER`, a floating-point number) and `tmdb_id` (`INT`, an integer). A property can also hold an array: `genre_ids` is an array of integers (`INT_ARRAY`).
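
To make the pairing of properties and data types concrete, here is a minimal sketch that checks a Python dict against the `Movie` properties before import. The schema dict and the `matches_type` and `schema_errors` functions are hypothetical helpers written for illustration; the type names follow `wc.DataType`, but these functions are not part of the Weaviate client.

```python
from datetime import datetime, timezone

# Hypothetical schema mirroring the Movie collection above. The type names
# follow wc.DataType, but this checker is our own sketch, not a client API.
MOVIE_SCHEMA = {
    "title": "TEXT",
    "overview": "TEXT",
    "vote_average": "NUMBER",
    "genre_ids": "INT_ARRAY",
    "release_date": "DATE",
    "tmdb_id": "INT",
}


def matches_type(value, data_type):
    """Return True if a Python value fits the named data type."""
    if data_type == "TEXT":
        return isinstance(value, str)
    if data_type == "INT":
        # bool is a subclass of int in Python, so rule it out explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if data_type == "NUMBER":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if data_type == "INT_ARRAY":
        return isinstance(value, list) and all(matches_type(v, "INT") for v in value)
    if data_type == "DATE":
        return isinstance(value, datetime)
    raise ValueError(f"Unknown data type: {data_type}")


def schema_errors(obj, schema=MOVIE_SCHEMA):
    """List the properties in obj whose values do not match the schema."""
    errors = []
    for name, data_type in schema.items():
        if name in obj and not matches_type(obj[name], data_type):
            errors.append(f"{name}: expected {data_type}, got {type(obj[name]).__name__}")
    return errors


movie = {
    "title": "Example Movie",  # hypothetical values
    "vote_average": 7.5,
    "genre_ids": [18, 80],
    "release_date": datetime(1994, 9, 23, tzinfo=timezone.utc),
    "tmdb_id": 12345,
}
print(schema_errors(movie))  # []
print(schema_errors({"genre_ids": [18, "80"]}))  # ['genre_ids: expected INT_ARRAY, got list']
```

Accepting both `int` and `float` for `NUMBER` is a choice made for this sketch, since a whole-number rating is still a valid number.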

### <i class="fa-solid fa-chalkboard"></i> Vectorizer configuration

If you do not provide a vector for an object yourself, Weaviate uses the vectorizer configured for the collection to generate a vector embedding from the object's data.

In this code example, we specify the `text2vec-openai` module with default options.

<FilteredTextBlock
text={PyCode}
startMarker="# Define the vectorizer module"
endMarker="# Define the generative module"
language="py"
/>
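
Conceptually, the vectorizer acts as a fallback: a vector you supply is used as-is, and only otherwise is one generated from the object's data. The sketch below models that control flow in plain Python. `toy_vectorizer` and `resolve_vector` are hypothetical; `toy_vectorizer` is a stand-in for a module such as `text2vec-openai` that produces deterministic but meaningless vectors, and embedding all string properties is an assumption. This is not how Weaviate implements vectorization.

```python
import math
import zlib


def toy_vectorizer(text, dims=8):
    """Stand-in for a real embedding model (hypothetical, for illustration only).

    Each word is hashed into one of `dims` buckets with CRC-32 (deterministic,
    unlike Python's salted built-in hash), and the bucket counts are
    L2-normalized. The result is NOT a semantic embedding.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


def resolve_vector(properties, vector=None, vectorizer=toy_vectorizer):
    """Use the caller-supplied vector if there is one; otherwise embed the text properties."""
    if vector is not None:
        return vector
    # Assumption: concatenate the string-valued properties in insertion order.
    # Which properties a real vectorizer module uses is configurable and not modeled here.
    text = " ".join(v for v in properties.values() if isinstance(v, str))
    return vectorizer(text)


movie = {"title": "Example Movie", "overview": "A hypothetical plot summary.", "tmdb_id": 12345}
print(resolve_vector(movie, vector=[0.1, 0.2]))  # [0.1, 0.2]
print(len(resolve_vector(movie)))  # 8
```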

### <i class="fa-solid fa-chalkboard"></i> Generative configuration

If you wish to use your collection with a generative model (e.g. a large language model), you must specify the generative module.

In this code example, we specify the `openai` module (`generative-openai` is the full name) with default options.

<FilteredTextBlock
text={PyCode}
startMarker="# Define the generative module"
endMarker="# END generativeDefinition"
language="py"
/>

### <i class="fa-solid fa-code"></i> Python classes

The code example makes use of classes such as `Property`, `DataType` and `Configure`. They are defined in the `weaviate.classes.config` submodule and are used to define the collection.

For convenience, we import the submodule as `wc` and use classes from it.

<FilteredTextBlock
text={PyCode}
startMarker="# SubmoduleImport"
endMarker="# END SubmoduleImport"
language="py"
/>

import { GiscusDocComment } from '/src/components/GiscusComment';

<GiscusDocComment />
@@ -2,13 +2,23 @@
title: Import data
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import FilteredTextBlock from '@site/src/components/Documentation/FilteredTextBlock';
import PyCode from '!!raw-loader!./_snippets/30_import_data.py';

## <i class="fa-solid fa-square-chevron-right"></i> Data import

### <i class="fa-solid fa-code"></i> Code

This example imports the movie data into our collection.

<FilteredTextBlock
text={PyCode}
startMarker="# BatchImportData"
endMarker="# END BatchImportData"
language="py"
/>
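
The import loop does two things per row: it converts the raw record into the property types defined for the `Movie` collection, and it derives a deterministic UUID from the movie's `id` with `generate_uuid5`, so the same movie always maps to the same object ID across repeated imports. Below is a minimal, standard-library-only sketch of both steps. `deterministic_uuid` and `to_movie_obj` are hypothetical helpers, not part of the Weaviate client, and the UUID namespace used here is an assumption.

```python
import uuid
from datetime import datetime, timezone


def deterministic_uuid(identifier):
    """Derive a name-based (version 5) UUID from an identifier.

    Assumption: NAMESPACE_DNS is an arbitrary choice for this sketch; the
    namespace that weaviate.util.generate_uuid5 uses internally may differ.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(identifier)))


def to_movie_obj(row):
    """Convert one raw movie record into the properties defined for the Movie collection."""
    # Parse "YYYY-MM-DD" and attach UTC so the DATE value carries a timezone
    release_date = datetime.strptime(row["release_date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {
        "title": row["title"],
        "overview": row["overview"],
        "vote_average": float(row["vote_average"]),
        "genre_ids": [int(g) for g in row["genre_ids"]],
        "release_date": release_date,
        "tmdb_id": int(row["id"]),
    }


# Hypothetical record shaped like the fields the import loop reads
row = {
    "id": 12345,
    "title": "Example Movie",
    "overview": "A hypothetical plot summary.",
    "vote_average": 7.5,
    "genre_ids": [18, 80],
    "release_date": "1994-09-23",
}
obj = to_movie_obj(row)
print(obj["release_date"].isoformat())  # 1994-09-23T00:00:00+00:00
print(deterministic_uuid(row["id"]) == deterministic_uuid(12345))  # True
```

The explicit `int()` and `float()` casts are a defensive choice in this sketch: numeric values read row by row through pandas typically arrive as NumPy scalars, and casting turns them into built-in Python numbers.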

### <i class="fa-solid fa-chalkboard"></i> Theory subhead
### <i class="fa-solid fa-code"></i> Practical subhead
@@ -0,0 +1,48 @@
import os

your_wcs_url = os.getenv("WCS_DEMO_URL")
your_wcs_key = os.getenv("WCS_DEMO_ADMIN_KEY")

# CreateMovieCollection
import weaviate
# CreateMovieCollection # SubmoduleImport
import weaviate.classes.config as wc
# CreateMovieCollection # END SubmoduleImport

client = weaviate.connect_to_wcs(
    cluster_url=your_wcs_url,  # Replace with your WCS URL
    auth_credentials=weaviate.auth.AuthApiKey(your_wcs_key)  # Replace with your WCS key
)

# END CreateMovieCollection

client.close()

# Actual instantiation
client = weaviate.connect_to_local()

client.collections.delete("Movie")

# CreateMovieCollection
try:
    client.collections.create(
        name="Movie",
        properties=[
            wc.Property(name="title", data_type=wc.DataType.TEXT),
            wc.Property(name="overview", data_type=wc.DataType.TEXT),
            wc.Property(name="vote_average", data_type=wc.DataType.NUMBER),
            wc.Property(name="genre_ids", data_type=wc.DataType.INT_ARRAY),
            wc.Property(name="release_date", data_type=wc.DataType.DATE),
            wc.Property(name="tmdb_id", data_type=wc.DataType.INT),
        ],
        # Define the vectorizer module
        vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
        # Define the generative module
        generative_config=wc.Configure.Generative.openai()
        # END generativeDefinition # CreateMovieCollection
    )

finally:
    client.close()  # Release resources
# END CreateMovieCollection

@@ -0,0 +1,51 @@
import os

your_wcs_url = os.getenv("WCS_DEMO_URL")
your_wcs_key = os.getenv("WCS_DEMO_ADMIN_KEY")

# BatchImportData
import weaviate
import pandas as pd
import requests
import datetime
from weaviate.util import generate_uuid5

client = weaviate.connect_to_wcs(
    cluster_url=your_wcs_url,  # Replace with your WCS URL
    auth_credentials=weaviate.auth.AuthApiKey(your_wcs_key)  # Replace with your WCS key
)

# END BatchImportData

client.close()

# Actual instantiation
client = weaviate.connect_to_local()
client.connect()

# BatchImportData
data_url = "https://raw.githubusercontent.com/weaviate-tutorials/edu-datasets/main/movies_data_1990_2024.json"
resp = requests.get(data_url)
df = pd.DataFrame(resp.json())

try:
    movies = client.collections.get("Movie")
    with movies.batch.dynamic() as batch:
        # Import only the first 10 rows
        for i, movie in df[:10].iterrows():
            # Attach UTC explicitly so the DATE value carries a timezone
            release_date = datetime.datetime.strptime(movie["release_date"], "%Y-%m-%d").replace(
                tzinfo=datetime.timezone.utc
            )
            movie_obj = {
                "title": movie["title"],
                "overview": movie["overview"],
                "vote_average": movie["vote_average"],
                "genre_ids": movie["genre_ids"],
                "release_date": release_date,
                "tmdb_id": movie["id"]
            }
            batch.add_object(
                properties=movie_obj,
                uuid=generate_uuid5(movie["id"])  # Deterministic ID derived from the movie's id
            )

finally:
    client.close()
