Add RAG and Doc Ingest techniques
rishabh committed Aug 22, 2024
1 parent 9e84cf3 commit fdb2fe2
Showing 30 changed files with 3,328 additions and 0 deletions.
96 changes: 96 additions & 0 deletions techniques/embed-documents/README.md
@@ -0,0 +1,96 @@
# Title

<details>
<summary>How to run this example</summary>
<br/>

```bash
# Set your API key as an environment variable.
export SUBSTRATE_API_KEY=ENTER_YOUR_KEY

# Run the TypeScript example (each `cd` below starts from this example's root folder)
cd typescript # Navigate to the typescript example
npm install # Install dependencies
ts-node example.ts # Run the example

# If using Poetry:
cd python # Navigate to the python example
poetry install # Install dependencies and build the example
poetry run main # Run the example

# If using Rye:
# Update pyproject.toml to switch to Rye.
cd python # Navigate to the python example
rye sync # Install dependencies and build the example
rye run main # Run the example
```

</details>

We follow this procedure to create consistent, high-volume content.

1. Come up with a short, readable slug (e.g. `generate-json`) and a title.
2. Create a folder in the [examples repo](https://github.com/SubstrateLabs/examples) by copying this folder.
3. Write the code in TypeScript or Python, and keep it simple. Ideally it’s just a script with no additional dependencies.
   1. Consider creating illustrative variations of the script (e.g. `ComputeText` and `MultiComputeText`; see this [example](https://github.com/SubstrateLabs/examples/tree/main/basics/generate-text)).
   2. Translate your script to the other language. (TODO: automated translation with Substrate)
   3. Make sure both examples run and produce simple, polished output.
   4. Simplify the code.
      1. Wrap lines (multi-line node declarations are easier to read).
      2. Consider inlining variables.
4. Fill out this README with walkthrough text and generate new image assets.
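
Steps 1 and 2 above can be sketched as a small script. This is only an illustration: `slugify` and `scaffold_example` are hypothetical helpers, not part of the examples repo, and the folder paths are whatever you pass in.

```python
import re
import shutil
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def scaffold_example(template_dir: Path, parent_dir: Path, title: str) -> Path:
    """Copy the template folder to parent_dir/<slug> and return the new path."""
    target = parent_dir / slugify(title)
    # copytree raises FileExistsError if the slug is already taken.
    shutil.copytree(template_dir, target)
    return target
```

For instance, `slugify("Generate JSON")` gives `generate-json`, matching the slug style in step 1.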

![hero](hero.png)

To generate text with an LLM, use [`ComputeText`](https://www.substrate.run/nodes#ComputeText).

In the code snippets below, note how we've simplified the example code to:

- Use a hardcoded API key, rather than reading from an environment variable.
- Remove the main function
- Combine getting the result of a node and printing it

Try your best to limit extraneous content in both text and code.

```python Python
# example.py
from substrate import Substrate, ComputeText

substrate = Substrate(api_key="YOUR_API_KEY")

story = ComputeText(prompt="tell me a short 2-sentence story")
res = substrate.run(story)

print(res.get(story).text)
```

```typescript TypeScript
// example.ts
import { Substrate, ComputeText } from "substrate";

const substrate = new Substrate({ apiKey: "YOUR_API_KEY" });

const story = new ComputeText({ prompt: "tell me a short 2-sentence story" });
const res = await substrate.run(story);

console.log(res.get(story).text);
```
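
For contrast, here is a rough sketch of what the unsimplified Python script might look like before the three simplifications listed above are applied. It reuses only the calls shown in the snippet (`Substrate`, `ComputeText`, `run`, `get`). The `read_api_key` helper is hypothetical, and the assumption that `poetry run main` maps to this `main` function depends on how `pyproject.toml` is set up.

```python
# example.py (unsimplified form, for comparison with the README snippet)
import os
from collections.abc import Mapping


def read_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return the Substrate API key, failing early if it is not set."""
    api_key = environ.get("SUBSTRATE_API_KEY")
    if not api_key:
        raise RuntimeError("SUBSTRATE_API_KEY is not set")
    return api_key


def main() -> None:
    # Imported here so read_api_key stays usable without the SDK installed.
    from substrate import Substrate, ComputeText

    substrate = Substrate(api_key=read_api_key())

    story = ComputeText(prompt="tell me a short 2-sentence story")
    res = substrate.run(story)

    # Two separate steps here; the README snippet combines them into one line.
    story_out = res.get(story)
    print(story_out.text)
```

The README version drops the environment lookup and the `main` wrapper so that readers see only the node declaration and the `run` call.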

When you're done, generate some images. You'll need a banner image.

- For the text, keep it simple, e.g. you can just use the name of a node: `ComputeText`.

```bash
cd _internal
poetry run marimo edit marketing.py
```

If your example is a graph, create a diagram.

![diagram](diagram.svg)

To edit the diagram, run `d2` in watch mode (`-w`), which re-renders `diagram.svg` whenever `diagram.d2` changes:

```bash
d2 -w diagram.d2 diagram.svg
```
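
The node and edge declarations at the bottom of `diagram.d2` follow a regular pattern, so they could be generated from your example's graph instead of typed by hand. A minimal sketch, assuming the `substrate`, `node`, and `edge` classes defined in this folder's `diagram.d2`; `render_d2_graph` is a hypothetical helper, not part of the repo.

```python
def render_d2_graph(labels: dict[str, str], edges: list[tuple[str, str]]) -> str:
    """Emit D2 declarations for nodes nested inside the `substrate` container."""
    lines = ["substrate.class: substrate"]
    # One class assignment per node, then one label per node.
    lines += [f"substrate.{key}.class: node" for key in labels]
    lines += [f"substrate.{key}.label: {label}" for key, label in labels.items()]
    # Doubled braces in the f-string produce literal braces in the output.
    lines += [
        f"substrate.{src}->substrate.{dst} {{ class: edge }}" for src, dst in edges
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_d2_graph(
            {"a": "heuristic", "b": "symbolic", "c": "computation"},
            [("a", "c"), ("b", "c")],
        )
    )
```

Paste the output below the `classes` block in `diagram.d2`, then re-render with `d2`.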
43 changes: 43 additions & 0 deletions techniques/embed-documents/diagram.d2
@@ -0,0 +1,43 @@
direction: right
classes: {
  substrate: {
    label: "substrate"
    style: {
      font: mono
      font-color: gray
      font-size: 20
      stroke: gray
      stroke-dash: 1
      fill: "transparent"
      border-radius: 16
    }
  }
  node: {
    style: {
      font: mono
      font-size: 24
      stroke-width: 2
      fill: transparent
      stroke: gray
      border-radius: 16
      stroke-dash: 1
      3d: true
    }
  }
  edge: {
    style: {
      stroke: "#000"
      stroke-dash: 2
    }
  }
}

substrate.class: substrate
substrate.a.class: node
substrate.b.class: node
substrate.c.class: node
substrate.a.label: heuristic
substrate.b.label: symbolic
substrate.c.label: computation
substrate.a->substrate.c { class: edge }
substrate.b->substrate.c { class: edge }
111 changes: 111 additions & 0 deletions techniques/embed-documents/diagram.svg
Binary file added techniques/embed-documents/hero.png
10 changes: 10 additions & 0 deletions techniques/embed-documents/python/.gitignore
@@ -0,0 +1,10 @@
# python generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# venv
.venv
26 changes: 26 additions & 0 deletions techniques/embed-documents/python/README.md
@@ -0,0 +1,26 @@
# Substrate Example Templates in Python

This is a Substrate example template written in Python. To run this example, first set your API key and navigate to the example directory:

```bash
# Set your API key as an environment variable.
# Get one here https://www.substrate.run/dashboard/keys if this is your first time.
export SUBSTRATE_API_KEY=<your Substrate API key>

# Navigate to the python example directory.
cd python
```

To run the example with Poetry (default), run the following.

```bash
poetry install
poetry run main
```

To run the example with Rye, comment out the Poetry sections and uncomment the Rye sections in `pyproject.toml`, then run the following.

```bash
rye sync
rye run main
```