Commit eb3bb8d

PropertyGraphStore support for Amazon Neptune (run-llama#15126)

1 parent c6c9c64 commit eb3bb8d

11 files changed: +1422 −230 lines changed
Lines changed: 320 additions & 0 deletions
# Amazon Neptune Property Graph Store

```python
%pip install boto3
%pip install llama-index-llms-bedrock
%pip install llama-index-graph-stores-neptune
%pip install llama-index-embeddings-bedrock
```

## Using Property Graph with Amazon Neptune

### Add the required imports

```python
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.core import (
    StorageContext,
    SimpleDirectoryReader,
    PropertyGraphIndex,
    Settings,
)
from llama_index.graph_stores.neptune import (
    NeptuneAnalyticsPropertyGraphStore,
    NeptuneDatabasePropertyGraphStore,
)
from IPython.display import Markdown, display
```

### Configure the LLM to use, in this case Amazon Bedrock and Claude 2

```python
llm = Bedrock(model="anthropic.claude-v2")
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v1")

Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512
```

### Building the Graph

#### Read in the sample file

```python
documents = SimpleDirectoryReader(
    "../../../../examples/paul_graham_essay/data"
).load_data()
```

### Instantiate Neptune Property Graph Indexes

When using Amazon Neptune you can choose either Neptune Database or Neptune Analytics.

Neptune Database is a serverless graph database designed for optimal scalability and availability. It provides a solution for graph database workloads that need to scale to 100,000 queries per second, Multi-AZ high availability, and multi-Region deployments. You can use Neptune Database for social networking, fraud alerting, and Customer 360 applications.

Neptune Analytics is an analytics database engine that can quickly analyze large amounts of graph data in memory to get insights and find trends. Neptune Analytics is a solution for quickly analyzing existing graph databases or graph datasets stored in a data lake. It uses popular graph analytic algorithms and low-latency analytic queries.

#### Using Neptune Database

If you choose to use [Neptune Database](https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview.html) to store your property graph index, you can create the graph store as shown below.

```python
graph_store = NeptuneDatabasePropertyGraphStore(
    host="<GRAPH NAME>.<CLUSTER ID>.<REGION>.neptune.amazonaws.com", port=8182
)
```

#### Using Neptune Analytics

If you choose to use [Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html) to store your property graph index, you can create the graph store as shown below.

```python
graph_store = NeptuneAnalyticsPropertyGraphStore(
    graph_identifier="<INSERT GRAPH IDENTIFIER>"
)
```

```python
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    storage_context=storage_context,
)

# Loading from an existing graph
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store
)
```

#### Querying the Property Graph

First, we can query and send only the values to the LLM.

```python
query_engine = index.as_query_engine(
    include_text=True,
    llm=llm,
)

response = query_engine.query("Tell me more about Interleaf")
```

```python
display(Markdown(f"<b>{response}</b>"))
```

Second, we can query using a retriever.

```python
retriever = index.as_retriever(
    include_text=True,
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
```

Third, we can use a `TextToCypherRetriever` to convert natural-language questions into dynamic openCypher queries.

```python
from llama_index.core.indices.property_graph import TextToCypherRetriever

cypher_retriever = TextToCypherRetriever(index.property_graph_store)

nodes = cypher_retriever.retrieve("What happened at Interleaf and Viaweb?")
print(nodes)
```

Finally, we can use a `CypherTemplateRetriever` to provide a more constrained version of the `TextToCypherRetriever`. Rather than giving the LLM free rein to generate any openCypher statement, we can instead provide an openCypher template and have the LLM fill in the blanks.

```python
from pydantic.v1 import BaseModel, Field
from llama_index.core.indices.property_graph import CypherTemplateRetriever

cypher_query = """
    MATCH (c:Chunk)-[:MENTIONS]->(o)
    WHERE o.name IN $names
    RETURN c.text, o.name, o.label;
    """


class TemplateParams(BaseModel):
    """Template params for a cypher query."""

    names: list[str] = Field(
        description="A list of entity names or keywords to use for lookup in a knowledge graph."
    )


cypher_retriever = CypherTemplateRetriever(
    index.property_graph_store, TemplateParams, cypher_query
)
nodes = cypher_retriever.retrieve("What happened at Interleaf and Viaweb?")
print(nodes)
```
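The `CypherTemplateRetriever` above depends on the LLM filling in the `TemplateParams` schema, whose values are then bound to the `$names` parameter of the openCypher template. A stdlib-only sketch of that parameter flow (a `dataclass` stand-in for the Pydantic model, so it runs without Neptune, Bedrock, or pydantic installed):

```python
from dataclasses import dataclass, field


@dataclass
class TemplateParams:
    """Stand-in for the Pydantic schema the LLM fills in."""

    names: list[str] = field(default_factory=list)


# The LLM extracts entity names or keywords from the question...
params = TemplateParams(names=["Interleaf", "Viaweb"])

# ...and the retriever binds them as openCypher parameters ($names).
param_map = {"names": params.names}
print(param_map)
```

The constrained template means the LLM only supplies data values, never query structure, which avoids executing arbitrary generated Cypher against the graph.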
Lines changed: 35 additions & 0 deletions
# LlamaIndex Graph_Stores Integration: Neptune

Amazon Neptune makes it easy to work with graph data in the AWS Cloud. Amazon Neptune includes both Neptune Database and Neptune Analytics.

Neptune Database is a serverless graph database designed for optimal scalability and availability. It provides a solution for graph database workloads that need to scale to 100,000 queries per second, Multi-AZ high availability, and multi-Region deployments. You can use Neptune Database for social networking, fraud alerting, and Customer 360 applications.

Neptune Analytics is an analytics database engine that can quickly analyze large amounts of graph data in memory to get insights and find trends. Neptune Analytics is a solution for quickly analyzing existing graph databases or graph datasets stored in a data lake. It uses popular graph analytic algorithms and low-latency analytic queries.

In this project, we integrate both Neptune Database and Neptune Analytics as graph stores for LlamaIndex graph data, and use openCypher to query the graph, so that you can use Neptune to interact with a LlamaIndex graph index.

- Neptune Database
  - Property Graph Store: `NeptuneDatabasePropertyGraphStore`
  - Knowledge Graph Store: `NeptuneDatabaseGraphStore`
- Neptune Analytics
  - Property Graph Store: `NeptuneAnalyticsPropertyGraphStore`
  - Knowledge Graph Store: `NeptuneAnalyticsGraphStore`

## Installation

```shell
pip install llama-index llama-index-graph-stores-neptune
```

## Usage

### Property Graph Store

Please check out this [tutorial](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/property_graph/property_graph_neptune.ipynb) to learn how to use Amazon Neptune with LlamaIndex.

### Knowledge Graph Store

Check out this [tutorial](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/index_structs/knowledge_graph/NeptuneDatabaseKGIndexDemo.ipynb) to learn how to use Amazon Neptune with LlamaIndex.
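Neptune Database cluster endpoints follow the `<GRAPH NAME>.<CLUSTER ID>.<REGION>.neptune.amazonaws.com` pattern used as the `host` value in the tutorial. A tiny illustrative helper (hypothetical, not part of `llama-index-graph-stores-neptune`) for assembling one, with placeholder values rather than a real cluster:

```python
def neptune_db_host(graph_name: str, cluster_id: str, region: str) -> str:
    """Assemble a Neptune Database cluster endpoint from its parts.

    Illustrative helper only; the library expects the finished host string.
    """
    return f"{graph_name}.{cluster_id}.{region}.neptune.amazonaws.com"


# Placeholder values for demonstration, not a real cluster.
host = neptune_db_host("mygraph", "cluster-abc123", "us-east-1")
print(host)  # → mygraph.cluster-abc123.us-east-1.neptune.amazonaws.com
```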
The package `__init__` exports are extended with the new property graph stores:

```diff
@@ -1,4 +1,15 @@
 from llama_index.graph_stores.neptune.analytics import NeptuneAnalyticsGraphStore
 from llama_index.graph_stores.neptune.database import NeptuneDatabaseGraphStore
+from llama_index.graph_stores.neptune.analytics_property_graph import (
+    NeptuneAnalyticsPropertyGraphStore,
+)
+from llama_index.graph_stores.neptune.database_property_graph import (
+    NeptuneDatabasePropertyGraphStore,
+)

-__all__ = ["NeptuneAnalyticsGraphStore", "NeptuneDatabaseGraphStore"]
+__all__ = [
+    "NeptuneAnalyticsGraphStore",
+    "NeptuneDatabaseGraphStore",
+    "NeptuneAnalyticsPropertyGraphStore",
+    "NeptuneDatabasePropertyGraphStore",
+]
```
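Widening `__all__` is what makes the new stores importable directly from `llama_index.graph_stores.neptune`. A self-contained toy (a stub module, not the real package) showing how `__all__` gates star-imports:

```python
import sys
import types

# Build a stub module in memory mimicking the package __init__ above:
# one public class listed in __all__, one name deliberately left out.
mod = types.ModuleType("neptune_stub")
exec(
    "class NeptuneDatabasePropertyGraphStore: ...\n"
    "class _InternalHelper: ...\n"
    "__all__ = ['NeptuneDatabasePropertyGraphStore']\n",
    mod.__dict__,
)
sys.modules["neptune_stub"] = mod

# Star-import only pulls in names listed in __all__.
ns = {}
exec("from neptune_stub import *", ns)
public = sorted(k for k in ns if not k.startswith("__"))
print(public)  # → ['NeptuneDatabasePropertyGraphStore']
```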