Academy/ts multimodal #2825

Open
wants to merge 7 commits into
base: main
Changes from 3 commits
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
title: Weaviate TypeScript client
description: Client Setup for Multimodal Data in Weaviate
---

## <i class="fa-solid fa-code"></i> Installation

The Weaviate TypeScript client library is tested on Node v18 and later. Install the latest version with npm:

```bash
npm install weaviate-client
```

The latest major version is `v3` (e.g. `3.x.x`). You can check the latest published version like so:

```bash
npm view weaviate-client version
```
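
Since the client is tested on Node v18 and later, a script can guard against older runtimes before doing any work. The following is a minimal sketch; `isSupportedNode` is a hypothetical helper written for this page, not part of the client library.

```typescript
// Returns true when a Node.js version string (e.g. "18.19.0" or "v20.11.1")
// has a major version of at least `minMajor`.
// `isSupportedNode` is a hypothetical helper, not part of weaviate-client.
function isSupportedNode(version: string, minMajor = 18): boolean {
  const majorText = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running version without a leading "v".
if (!isSupportedNode(process.versions.node)) {
  console.warn(
    `Node ${process.versions.node} is older than v18; the client is not tested on it.`,
  );
}
```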

## <i class="fa-solid fa-code"></i> Basic usage

You can import the Weaviate client library like so:

```typescript
import weaviate, { generateUuid5, ApiKey } from "weaviate-client"
```

The client provides a set of helpers (e.g. `generateUuid5`, `ApiKey`) to make it easier to interact with Weaviate.

Next, we'll show you how to create a Weaviate instance and connect to it.


## Questions and feedback

import DocsFeedback from '/_includes/docs-feedback.mdx';

<DocsFeedback/>
@@ -0,0 +1,94 @@
---
title: Create a local Docker instance
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import FilteredTextBlock from '@site/src/components/Documentation/FilteredTextBlock';
import TSCode from '!!raw-loader!../_snippets/101_connect.ts';

:::note Can I use a cloud instance?
Generating multimodal vectors is currently only possible with local models, so this course uses a local Docker instance of Weaviate. If you are generating vectors outside of Weaviate, you can use a cloud instance. See the [Work with: your own vectors](../../starter_custom_vectors/index.md) course for more information.
:::

Here, you will create a Weaviate instance and a multi-modal vectorizer container using Docker.

### <i class="fa-solid fa-chalkboard"></i> Download and run the docker-compose file

Install Docker on your machine. We recommend following the [official Docker installation guide](https://docs.docker.com/get-docker/).

Create a new directory and navigate to it in your terminal. Then, create a new file called `docker-compose.yml` and add the following content:

```yaml
---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:||site.weaviate_version||
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      COHERE_APIKEY: $COHERE_APIKEY
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'multi2vec-cohere'
      ENABLE_MODULES: 'multi2vec-cohere,generative-cohere'
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
...

```

### <i class="fa-solid fa-chalkboard"></i> Create a Weaviate instance

Run the following command to start Weaviate:

```bash
docker compose up
```

### <i class="fa-solid fa-chalkboard"></i> Your Weaviate instance details

Once the instance is created, you can access it at `http://localhost:8080`.

### <i class="fa-solid fa-code"></i> Connect to your Weaviate instance

To connect to the Weaviate instance, use the `connectToLocal` function.

<FilteredTextBlock
text={TSCode}
startMarker="// DockerInstantiation"
endMarker="// END DockerInstantiation"
language="ts"
/>

#### Provide inference API keys

Some Weaviate modules can use inference APIs for vectorizing data or large language model integration. You can provide the API keys for these services to Weaviate at instantiation.

This course uses Cohere (for retrieval augmented generation), so you can provide the Cohere API key to Weaviate through `headers: { 'X-Cohere-Api-Key': <YOUR_KEY> }` as shown below:

<FilteredTextBlock
text={TSCode}
startMarker="// DockerAPIKeyInstantiation"
endMarker="// END DockerAPIKeyInstantiation"
language="ts"
/>
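
To avoid hard-coding the key, the headers object can be built from an environment variable. This is a minimal sketch: `buildInferenceHeaders` is a hypothetical helper, and it assumes the key is stored in `COHERE_APIKEY`, the same variable name the `docker-compose.yml` file above reads.

```typescript
// Builds the headers object that carries the Cohere API key.
// Assumption: the key lives in the COHERE_APIKEY environment variable.
// `buildInferenceHeaders` is a hypothetical helper, not part of weaviate-client.
function buildInferenceHeaders(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const key = env["COHERE_APIKEY"];
  if (!key) {
    // Fail early with a clear message instead of sending an empty header.
    throw new Error("COHERE_APIKEY is not set");
  }
  return { "X-Cohere-Api-Key": key };
}

// Usage with the client (requires `weaviate-client` and a running instance):
//   import weaviate from "weaviate-client";
//   const client = await weaviate.connectToLocal({
//     headers: buildInferenceHeaders(process.env),
//   });
```

Throwing when the variable is missing surfaces the configuration problem at startup rather than at the first request that needs the key.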

## Questions and feedback

import DocsFeedback from '/_includes/docs-feedback.mdx';

<DocsFeedback/>
@@ -0,0 +1,64 @@
---
title: Communicate with Weaviate
description: Communication Setup for Multimodal Data
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import FilteredTextBlock from '@site/src/components/Documentation/FilteredTextBlock';
import TSCode from '!!raw-loader!../_snippets/101_connect.ts';

Here, we'll perform basic operations to communicate with Weaviate using the TypeScript client library.

### <i class="fa-solid fa-code"></i> Check Weaviate status

You can check whether the Weaviate instance is up using the `isLive` function.

<FilteredTextBlock
text={TSCode}
startMarker="// PollLiveness"
endMarker="// END PollLiveness"
language="ts"
/>
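
A container can take a few seconds to start, so it can help to poll the liveness check rather than call it once. The sketch below assumes only that you have an object with an `isLive()` method returning a promise of a boolean, as the client does; `pollUntilLive`, its defaults, and the `LivenessProbe` interface are hypothetical names chosen for illustration.

```typescript
// Minimal shape needed for polling. The client object exposes isLive(),
// so it can be passed in directly. The interface name is hypothetical.
interface LivenessProbe {
  isLive(): Promise<boolean>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls isLive() until it returns true or the attempts run out.
// Errors thrown by isLive() are treated as "not live yet".
async function pollUntilLive(
  probe: LivenessProbe,
  attempts = 10,
  delayMs = 500,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await probe.isLive()) return true;
    } catch {
      // Instance not reachable yet; fall through and retry.
    }
    // Wait between attempts, but not after the last one.
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false;
}

// Usage, given a connected client:
//   const live = await pollUntilLive(client);
```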

### <i class="fa-solid fa-code"></i> Retrieve server meta information

You can retrieve meta information about the Weaviate instance using the `getMeta` function.

<FilteredTextBlock
text={TSCode}
startMarker="// GetMeta"
endMarker="// END GetMeta"
language="ts"
/>

This will print the server meta information to the console. The output will look similar to the following:

<details>
<summary>Example <code>getMeta()</code> output</summary>

<FilteredTextBlock
text={TSCode}
startMarker="// OutputGetMeta"
endMarker="// END OutputGetMeta"
language="ts"
/>
</details>

### <i class="fa-solid fa-code"></i> Close the connection

After you have finished using the Weaviate client, you should close the connection. This frees up resources and ensures that the connection is properly closed.

We suggest using a `try`-`finally` block as a best practice. For brevity, we will not include the `try`-`finally` blocks in the remaining code snippets.

<FilteredTextBlock
text={TSCode}
startMarker="// TryFinallyCloseDemo"
endMarker="// END TryFinallyCloseDemo"
language="ts"
/>
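
If you find yourself repeating the `try`-`finally` pattern, it can be wrapped once in a small helper. This is a sketch under the assumption that the client exposes a `close()` method; `withClient` and the `Closable` interface are hypothetical names, not part of the client library.

```typescript
// Anything with a close() method, which includes the Weaviate client.
// The interface name is hypothetical.
interface Closable {
  close(): Promise<void> | void;
}

// Connects, runs `work`, and always closes the connection afterwards,
// whether `work` resolves or throws.
async function withClient<C extends Closable, T>(
  connect: () => Promise<C>,
  work: (client: C) => Promise<T>,
): Promise<T> {
  const client = await connect();
  try {
    return await work(client);
  } finally {
    await client.close();
  }
}

// Usage (requires `weaviate-client` and a running instance):
//   import weaviate from "weaviate-client";
//   const meta = await withClient(
//     () => weaviate.connectToLocal(),
//     (client) => client.getMeta(),
//   );
```

Errors thrown inside `work` still propagate to the caller; the `finally` block only guarantees that `close()` runs first.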

## Questions and feedback

import DocsFeedback from '/_includes/docs-feedback.mdx';

<DocsFeedback/>
@@ -0,0 +1,26 @@
---
title: Set up Weaviate
description: Weaviate Setup for Multimodal Data
---
<!-- Like a subject number (e.g. CS101) -->

<!-- ## Overview -->

<!-- Provide context for this course, in addition to the concrete learning goals and outcomes. Why would someone want to do this unit? -->

<!-- :::warning TODO
Intro video here
::: -->

## <i class="fa-solid fa-bullseye-arrow"></i> Learning objectives

import LearningGoals from '/src/components/Academy/learningGoals.jsx';

<!-- Replace unitName with name from `unitData.js` - will pull in learning goals and outcomes -->
<LearningGoals unitName="text_setup_weaviate_ts"/>

## Questions and feedback

import DocsFeedback from '/_includes/docs-feedback.mdx';

<DocsFeedback/>
@@ -0,0 +1,36 @@
---
title: Preparation
---

In this section, you will populate your Weaviate instance with a movie dataset, using multimodal CLIP models to embed the text and image data.

### <i class="fa-solid fa-chalkboard"></i> Weaviate instance

Make sure to have your Weaviate instance set up. You should have [created an instance](../101_setup_weaviate/20_create_docker.mdx) and be able to connect to it.

### <i class="fa-solid fa-code"></i> Source data

We are going to use a movie dataset sourced from [TMDB](https://www.themoviedb.org/). The dataset can be found in this [GitHub repository](https://raw.githubusercontent.com/weaviate-tutorials/edu-datasets/main/movies_data_1990_2024.json), and it contains bibliographic information on ~700 movies released between 1990 and 2024.

As a multimodal project, we'll also use [corresponding posters for each movie](https://raw.githubusercontent.com/weaviate-tutorials/edu-datasets/main/movies_data_1990_2024_posters.zip), which are available in the same repository.

<details>
<summary>See sample text data</summary>

| | backdrop_path | genre_ids | id | original_language | original_title | overview | popularity | poster_path | release_date | title | video | vote_average | vote_count |
|---:|:---------------------------------|:----------------|-----:|:--------------------|:----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------:|:---------------------------------|:---------------|:----------------------------|:--------|---------------:|-------------:|
| 0 | /3Nn5BOM1EVw1IYrv6MsbOS6N1Ol.jpg | [14, 18, 10749] | 162 | en | Edward Scissorhands | A small suburban town receives a visit from a castaway unfinished science experiment named Edward. | 45.694 | /1RFIbuW9Z3eN9Oxw2KaQG5DfLmD.jpg | 1990-12-07 | Edward Scissorhands | False | 7.7 | 12305 |
| 1 | /sw7mordbZxgITU877yTpZCud90M.jpg | [18, 80] | 769 | en | GoodFellas | The true story of Henry Hill, a half-Irish, half-Sicilian Brooklyn kid who is adopted by neighbourhood gangsters at an early age and climbs the ranks of a Mafia family under the guidance of Jimmy Conway. | 57.228 | /aKuFiU82s5ISJpGZp7YkIr3kCUd.jpg | 1990-09-12 | GoodFellas | False | 8.5 | 12106 |
| 2 | /6uLhSLXzB1ooJ3522ydrBZ2Hh0W.jpg | [35, 10751] | 771 | en | Home Alone | Eight-year-old Kevin McCallister makes the most of the situation after his family unwittingly leaves him behind when they go on Christmas vacation. But when a pair of bungling burglars set their sights on Kevin's house, the plucky kid stands ready to defend his territory. By planting booby traps galore, adorably mischievous Kevin stands his ground as his frantic mother attempts to race home before Christmas Day. | 3.538 | /onTSipZ8R3bliBdKfPtsDuHTdlL.jpg | 1990-11-16 | Home Alone | False | 7.4 | 10599 |
| 3 | /vKp3NvqBkcjHkCHSGi6EbcP7g4J.jpg | [12, 35, 878] | 196 | en | Back to the Future Part III | The final installment of the Back to the Future trilogy finds Marty digging the trusty DeLorean out of a mineshaft and looking for Doc in the Wild West of 1885. But when their time machine breaks down, the travelers are stranded in a land of spurs. More problems arise when Doc falls for pretty schoolteacher Clara Clayton, and Marty tangles with Buford Tannen. | 28.896 | /crzoVQnMzIrRfHtQw0tLBirNfVg.jpg | 1990-05-25 | Back to the Future Part III | False | 7.5 | 9918 |
| 4 | /3tuWpnCTe14zZZPt6sI1W9ByOXx.jpg | [35, 10749] | 114 | en | Pretty Woman | When a millionaire wheeler-dealer enters a business contract with a Hollywood hooker Vivian Ward, he loses his heart in the bargain. | 97.953 | /hVHUfT801LQATGd26VPzhorIYza.jpg | 1990-03-23 | Pretty Woman | False | 7.5 | 7671 |

</details>

Next, you will create a corresponding object collection and import the data.

## Questions and feedback

import DocsFeedback from '/_includes/docs-feedback.mdx';

<DocsFeedback/>
@@ -0,0 +1,91 @@
---
title: Create a collection
description: Creating Multimodal Data Collections
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import FilteredTextBlock from '@site/src/components/Documentation/FilteredTextBlock';
import TSCode from '!!raw-loader!../_snippets/102_collection.ts';

Weaviate stores data in "collections". A collection is a set of objects that share the same data structure. In our movie database, we might have a collection of movies, a collection of actors, and a collection of reviews.

Here we will create a collection of movies.

## <i class="fa-solid fa-code"></i> Code

This example creates a collection for the movie data:

<FilteredTextBlock
text={TSCode}
startMarker="// CreateMovieCollection"
endMarker="// END CreateMovieCollection"
language="ts"
/>

Each collection definition must have a name. You can then define additional parameters, as we've done in this example.

## <i class="fa-solid fa-chalkboard"></i> Explain the code

### <i class="fa-solid fa-chalkboard"></i> Properties

Properties are the object attributes that you want to store in the collection. Each property has a name and a data type.

In our movie database, we have properties like `title`, `release_date` and `genre_ids`, with data types like `TEXT` (string), `DATE` (date), or `INT` (integer). It's also possible to have arrays of integers, like we have with `genre_ids`.
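
To illustrate how these data types line up with the source data, here is a minimal sketch that converts a raw record into typed properties. The field names come from the sample data; the `RawMovie` and `MovieProperties` interfaces and the `toMovieProperties` function are hypothetical, and mapping `DATE` to a JavaScript `Date` is an assumption of this sketch.

```typescript
// Raw record shape, using field names from the sample movie data.
interface RawMovie {
  title: string;
  overview: string;
  release_date: string; // e.g. "1990-12-07"
  genre_ids: number[];
}

// Property shape matching the data types described above (hypothetical type).
interface MovieProperties {
  title: string; // TEXT
  overview: string; // TEXT
  release_date: Date; // DATE
  genre_ids: number[]; // array of INT
}

// Converts a raw record, parsing the date string and rejecting invalid dates.
function toMovieProperties(raw: RawMovie): MovieProperties {
  const releaseDate = new Date(raw.release_date);
  if (Number.isNaN(releaseDate.getTime())) {
    throw new Error(`Invalid release_date: ${raw.release_date}`);
  }
  return {
    title: raw.title,
    overview: raw.overview,
    release_date: releaseDate,
    genre_ids: [...raw.genre_ids],
  };
}
```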

Because each movie is a multimodal object, it also has a `poster` property, which holds the image data and is saved as a `BLOB` (binary large object) data type.
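
Weaviate expects `BLOB` values as base64-encoded strings, so image files need to be encoded before import. The following is a minimal sketch using Node's built-in `Buffer`; `bytesToBase64` and `posterFileToBase64` are hypothetical helper names.

```typescript
import { readFileSync } from "node:fs";

// Encodes raw bytes as a base64 string.
// Hypothetical helper, not part of weaviate-client.
function bytesToBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Reads an image file from disk and returns its base64 encoding,
// ready to be stored in a BLOB property such as `poster`.
function posterFileToBase64(path: string): string {
  return bytesToBase64(readFileSync(path));
}
```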

#### Auto-schema

Weaviate can automatically [infer the schema](/developers/weaviate/config-refs/schema/index.md#auto-schema) from the data. However, it's a good practice to define the properties explicitly, for better control and to avoid surprises.

### <i class="fa-solid fa-chalkboard"></i> Vectorizer configuration

If you do not specify the vector yourself, Weaviate will use a specified vectorizer to generate vector embeddings from your data.

In this code example, we specify the `multi2vec-clip` module. This module uses the CLIP model to generate vector embeddings from the text and image data.

You can specify any number of text and image properties to be used for vectorization, and weight them differently. The weights are used to determine the relative importance of each property in the vector embedding generation process. In this example, we vectorize the `poster` property (an image) with a 90% weight and the `title` property (a string) with a 10% weight.

<FilteredTextBlock
text={TSCode}
startMarker="// Define & configure the vectorizer module"
endMarker="// Define the generative module"
language="ts"
/>
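
To build intuition for what the weights do, the sketch below combines two per-property vectors as a weighted mean. This only illustrates the idea of relative importance; it is not Weaviate's internal implementation, and `weightedMean` is a hypothetical function.

```typescript
// Combines per-property vectors into one vector using a weighted mean.
// Illustration only: the vectorizer module may combine vectors differently.
function weightedMean(vectors: number[][], weights: number[]): number[] {
  if (vectors.length === 0 || vectors.length !== weights.length) {
    throw new Error("Need one weight per vector");
  }
  const dim = vectors[0].length;
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (total <= 0) {
    throw new Error("Weights must sum to a positive number");
  }
  const out = new Array<number>(dim).fill(0);
  vectors.forEach((vec, i) => {
    if (vec.length !== dim) {
      throw new Error("All vectors must have the same length");
    }
    vec.forEach((x, j) => {
      // Each component contributes in proportion to its vector's weight.
      out[j] += (weights[i] * x) / total;
    });
  });
  return out;
}

// Toy two-dimensional "embeddings" for a poster and a title.
const posterVector = [1, 0];
const titleVector = [0, 1];
const combined = weightedMean([posterVector, titleVector], [0.9, 0.1]);
console.log(combined);
```

With a 0.9 weight, the combined vector stays close to the poster vector, which is why image similarity dominates search results in this configuration.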

### <i class="fa-solid fa-chalkboard"></i> Generative configuration

If you wish to use your collection with a generative model (e.g. a large language model), you must specify the generative module.

In this code example, we specify the `openai` module (`generative-openai` is the full name) with default options.

<FilteredTextBlock
text={TSCode}
startMarker="// Define the generative module"
endMarker="// END generativeDefinition"
language="ts"
/>

import MutableGenerativeConfig from '/_includes/mutable-generative-config.md';

<MutableGenerativeConfig />

### <i class="fa-solid fa-code"></i> TypeScript helpers

The code example makes use of helpers such as `dataType` and `configure`. They are exported by the `weaviate-client` package and are used to define the collection.

The following snippet shows how they are imported.

<FilteredTextBlock
text={TSCode}
startMarker="// SubmoduleImport"
endMarker="// END SubmoduleImport"
language="ts"
/>

## Questions and feedback

import DocsFeedback from '/_includes/docs-feedback.mdx';

<DocsFeedback/>