Dev to Staging #1210

Merged: 125 commits into staging from dev on Apr 1, 2025

Changes from all commits (125 commits)
26581db
Read only mode for unauthenticated users (#1046)
kartikpersistent Jan 30, 2025
449552d
langchain updates (#1048)
prakriti-solankey Jan 30, 2025
2d86c5c
testing script changed for better logging errors and results of vario…
kaustubh-darekar Jan 30, 2025
578efad
Deepseek models integration (#1051)
kaustubh-darekar Jan 30, 2025
339488a
fixed top-line of drop-area (#1049)
kartikpersistent Jan 30, 2025
1dbd902
Schema viz (#1035)
prakriti-solankey Jan 30, 2025
6f3f863
updated to new ndl minor version and fixed sources modal display for …
kartikpersistent Feb 3, 2025
38eb72e
Chunk size overlap config (#1059)
prakriti-solankey Feb 7, 2025
6e60361
fix-load-existing-schema (#1061)
dhiaaeddine16 Feb 10, 2025
228ab9b
added bug report feature request and format fixes
kartikpersistent Feb 11, 2025
fbd9d3e
configured dependenabot for python
kartikpersistent Feb 11, 2025
6893a26
configuration fix
kartikpersistent Feb 11, 2025
4b831df
Fixed the logging time issue
praveshkumar1988 Feb 11, 2025
d53ba43
Backend connection config (#1060)
prakriti-solankey Feb 11, 2025
71e013e
Unable to get the status of document node resolved due to leading spa…
kaustubh-darekar Feb 11, 2025
d9a89f8
updated dependency
kartikpersistent Feb 11, 2025
2c8fe2a
Merge branch 'dev' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Feb 11, 2025
192a1bc
always show schema button
prakriti-solankey Feb 11, 2025
e8e576f
always show schema button
prakriti-solankey Feb 11, 2025
ed69115
uri
prakriti-solankey Feb 11, 2025
3624feb
Update README.md
kartikpersistent Feb 11, 2025
738eecc
Update README.md
kartikpersistent Feb 11, 2025
69a1003
Update README.md
kartikpersistent Feb 12, 2025
4c48124
Update README.md
kartikpersistent Feb 12, 2025
bd917ae
Fixed the create community issue for backend connection configuration
praveshkumar1988 Feb 12, 2025
8b8368b
removal of unused code
prakriti-solankey Feb 12, 2025
5591741
Support added for gpt 3o mini & gemini flash 2.0 in dev (#1069)
kaustubh-darekar Feb 12, 2025
6762367
Cancelling the API's on Unmounting phase (#1068)
kartikpersistent Feb 13, 2025
7f075d5
Merge branch 'staging' into dev
prakriti-solankey Feb 13, 2025
6a6c82c
removed unused neo4j-driver
kartikpersistent Feb 13, 2025
f198ca5
added auth0 in the frame src
kartikpersistent Feb 13, 2025
a72f3cf
message change
prakriti-solankey Feb 14, 2025
43d3bed
Update docker-compose.yml
kartikpersistent Feb 16, 2025
7bd5dd3
Bump tailwindcss from 3.4.9 to 4.0.6 in /frontend (#1091)
dependabot[bot] Feb 17, 2025
c198260
message check
prakriti-solankey Feb 17, 2025
a0cd597
V0.7.1 documentation updates (#1094)
kartikpersistent Feb 17, 2025
1cc723c
Merge branch 'staging' into dev
kartikpersistent Feb 17, 2025
db20e29
Merge branch 'staging' into dev
prakriti-solankey Feb 17, 2025
896bdee
Bump react-dropzone from 14.2.3 to 14.3.5 in /frontend (#1084)
dependabot[bot] Feb 18, 2025
8938cd1
Bump @typescript-eslint/eslint-plugin from 6.21.0 to 7.0.0 in /fronte…
dependabot[bot] Feb 18, 2025
4ea5305
Bump eslint-plugin-react-hooks from 4.6.2 to 5.1.0 in /frontend (#1082)
dependabot[bot] Feb 18, 2025
74e8bdc
Bump typescript from 5.5.4 to 5.7.3 in /frontend (#1081)
dependabot[bot] Feb 18, 2025
5444e6b
fix-additional-instructions (#1089)
dhiaaeddine16 Feb 18, 2025
041837d
V0.7.1 minor fixes (#1097)
praveshkumar1988 Feb 19, 2025
e81655d
remove try except from llm.py
praveshkumar1988 Feb 19, 2025
78f1015
Remove example.env from main folder (#1099)
praveshkumar1988 Feb 19, 2025
fcb6bcc
moved to taulwind 3
kartikpersistent Feb 20, 2025
5c2029c
tailwind 4 migration
kartikpersistent Feb 20, 2025
36f2548
format fixes
kartikpersistent Feb 20, 2025
7e7a2c3
Source list api convert to post (#1102)
kartikpersistent Feb 20, 2025
7186ac4
Merge branch 'staging' into dev
prakriti-solankey Feb 20, 2025
7f8b8c9
height issue
prakriti-solankey Feb 20, 2025
cf92222
fix: Profile CSS Fix
kartikpersistent Feb 20, 2025
2274706
fix: display flex issue fix
kartikpersistent Feb 20, 2025
28780c5
Merge branch 'staging' into dev
prakriti-solankey Feb 21, 2025
c1b7b4d
Update dependabot.yml (#1122)
kaustubh-darekar Feb 24, 2025
5ca76aa
added automated linting and formatting through husky hooks
kartikpersistent Feb 24, 2025
0cf3f32
renamed the files
kartikpersistent Feb 24, 2025
2c9d1d6
husky setup fix
kartikpersistent Feb 24, 2025
381dc16
added permission
kartikpersistent Feb 24, 2025
97f0fd2
test commiy
kartikpersistent Feb 24, 2025
e4f1e91
type checking through husky hooks
kartikpersistent Feb 24, 2025
cc158d1
something bad code
kartikpersistent Feb 24, 2025
b88c7df
some bad code
kartikpersistent Feb 24, 2025
17ff72c
some bad code
kartikpersistent Feb 24, 2025
2f3f164
testing pre-commit code
kartikpersistent Feb 24, 2025
36c9c53
testing pre-commit code
kartikpersistent Feb 24, 2025
2bb53b1
lint setup on staged commits
kartikpersistent Feb 24, 2025
53da28c
test commt
kartikpersistent Feb 24, 2025
06d9b4f
test commit with errors
kartikpersistent Feb 24, 2025
2d084da
fix
kartikpersistent Feb 24, 2025
0b46eba
added pypandoc-binary package for OSError: No pandoc was found during…
kaustubh-darekar Feb 24, 2025
269d76b
added document plus icon
kartikpersistent Feb 25, 2025
f79feb8
Bump axios from 1.7.3 to 1.7.9 in /frontend (#1113)
dependabot[bot] Feb 25, 2025
ddb5852
Bump eslint-plugin-react-refresh from 0.4.9 to 0.4.19 in /frontend (#…
dependabot[bot] Feb 25, 2025
fcae55a
Bump postcss from 8.4.41 to 8.5.3 in /frontend (#1114)
dependabot[bot] Feb 25, 2025
69db442
Bump react-icons from 5.2.1 to 5.5.0 in /frontend (#1115)
dependabot[bot] Feb 25, 2025
b761f23
different url web page having same title issue fixed (#1110)
kaustubh-darekar Feb 25, 2025
3a222cc
Text file encoding issue (#1126)
kaustubh-darekar Feb 25, 2025
ede3095
Resolved UnicodeDecodeError issue for files having other than utf-8 e…
kaustubh-darekar Feb 26, 2025
370ab9e
Sanitizing additional instruction (#1130)
kaustubh-darekar Feb 26, 2025
ebbabd3
resolved UnboundLocalError: local variable 'graphDb_data_Access' refe…
kaustubh-darekar Feb 26, 2025
aca4f81
connection not there message for data resources (#1131)
prakriti-solankey Feb 26, 2025
455269b
dockerfile updates and utils functions change
prakriti-solankey Feb 26, 2025
e81aa00
fix: readonly issue fix
kartikpersistent Feb 27, 2025
a1ed635
Resolved uploaded file extraction failing on deployed version (#1136)
kaustubh-darekar Feb 27, 2025
cf11494
UI fixes v0.7.2 (#1138)
kartikpersistent Mar 3, 2025
979434d
Update BreakDownPopOver.tsx
kartikpersistent Mar 3, 2025
41b0370
chunk_count_val
prakriti-solankey Mar 3, 2025
bfe127f
type error
prakriti-solankey Mar 3, 2025
93ff881
spell fixes and protected route fixes
kartikpersistent Mar 3, 2025
a1a998e
top entities not found - bug resolved (#1150)
kaustubh-darekar Mar 4, 2025
506dfb0
limiting content fetching to current wikipedia page (#1151)
kaustubh-darekar Mar 4, 2025
3c8d669
added the link for login redirectig
kartikpersistent Mar 4, 2025
a8fb41a
removed loading statw
kartikpersistent Mar 5, 2025
de69dbd
added the padding and changed the message
kartikpersistent Mar 5, 2025
1ee0112
Bump re-resizable from 6.9.17 to 6.11.2 in /frontend (#1149)
dependabot[bot] Mar 5, 2025
e26a2e2
Bump eslint-plugin-react from 7.35.0 to 7.37.4 in /frontend (#1148)
dependabot[bot] Mar 5, 2025
2fe68f7
Bump @types/node from 20.14.14 to 22.13.9 in /frontend (#1152)
dependabot[bot] Mar 5, 2025
6f1e96d
Bump eslint-config-prettier from 8.10.0 to 10.0.2 in /frontend (#1146)
dependabot[bot] Mar 5, 2025
9e427c8
Bump react-dropzone from 14.3.5 to 14.3.8 in /frontend (#1145)
dependabot[bot] Mar 5, 2025
0090ae1
Update dependabot.yml
kartikpersistent Mar 5, 2025
738bc5b
Update the query to check DB is gds version (#1153)
praveshkumar1988 Mar 6, 2025
36c3fa4
Entity details shown for entity mode (#1154)
kaustubh-darekar Mar 6, 2025
dc0b83c
Merge branch 'staging' into dev
kartikpersistent Mar 6, 2025
e1fa2d5
bracket missing
prakriti-solankey Mar 6, 2025
ab75932
fix: auth 0 fix
kartikpersistent Mar 6, 2025
5f39980
Merge branch 'staging' into dev
kartikpersistent Mar 6, 2025
1014119
fixes (#1170)
kartikpersistent Mar 10, 2025
0f9c9a2
Bump @neo4j-nvl/react from 0.3.6 to 0.3.7 in /frontend (#1163)
dependabot[bot] Mar 17, 2025
d74b5ea
Bump @tailwindcss/postcss from 4.0.7 to 4.0.12 in /frontend (#1162)
dependabot[bot] Mar 17, 2025
11003a9
Bump prettier from 2.8.8 to 3.5.3 in /frontend (#1161)
dependabot[bot] Mar 17, 2025
cb907be
Bump @types/node from 22.13.9 to 22.13.10 in /frontend (#1160)
dependabot[bot] Mar 17, 2025
c67abd5
Bump axios from 1.7.9 to 1.8.2 in /frontend (#1159)
dependabot[bot] Mar 17, 2025
927c372
gitignore changes
kartikpersistent Mar 17, 2025
8604b5b
border missing for graph
prakriti-solankey Mar 17, 2025
7bde2ab
openai 4.5 and claude 3.7 added (#1181)
kaustubh-darekar Mar 18, 2025
59aebba
Handled deadlock errors in executing cypher query (#1187)
kaustubh-darekar Mar 19, 2025
7eb344e
Updating dependencies (#1189)
kaustubh-darekar Mar 19, 2025
93765a3
updating node & rel count in between extraction process (#1191)
kaustubh-darekar Mar 19, 2025
4dd1299
fix: Database name not being passed
kartikpersistent Mar 20, 2025
14e81c7
added generic type for queue
kartikpersistent Mar 21, 2025
7273d88
Fix : default value of function param
praveshkumar1988 Mar 24, 2025
4db55ab
log the info only when last chunk uploaded and merge the file
praveshkumar1988 Mar 24, 2025
81e7255
Product tour v1 (#1186)
kartikpersistent Mar 25, 2025
3 changes: 2 additions & 1 deletion .gitignore
@@ -172,4 +172,5 @@ google-cloud-cli-linux-x86_64.tar.gz
.vennv
newenv
files

startupbackend.sh
startupfrontend.sh
86 changes: 43 additions & 43 deletions backend/requirements.txt
@@ -1,63 +1,63 @@
asyncio==3.4.3
boto3==1.36.2
botocore==1.36.2
certifi==2024.8.30
fastapi==0.115.6
boto3==1.37.11
botocore==1.37.11
certifi==2025.1.31
fastapi==0.115.11
fastapi-health==0.4.0
google-api-core==2.24.0
google-auth==2.37.0
google-api-core==2.24.2
google-auth==2.38.0
google_auth_oauthlib==1.2.1
google-cloud-core==2.4.1
json-repair==0.30.2
google-cloud-core==2.4.3
json-repair==0.30.3
pip-install==1.3.5
langchain==0.3.15
langchain-aws==0.2.11
langchain-anthropic==0.3.3
langchain-fireworks==0.2.6
langchain-community==0.3.15
langchain-core==0.3.31
langchain==0.3.20
langchain-aws==0.2.15
langchain-anthropic==0.3.9
langchain-fireworks==0.2.7
langchain-community==0.3.19
langchain-core==0.3.45
langchain-experimental==0.3.4
langchain-google-vertexai==2.0.11
langchain-groq==0.2.3
langchain-openai==0.3.1
langchain-text-splitters==0.3.5
langchain-google-vertexai==2.0.15
langchain-groq==0.2.5
langchain-openai==0.3.8
langchain-text-splitters==0.3.6
langchain-huggingface==0.1.2
langdetect==1.0.9
langsmith==0.2.11
langsmith==0.3.13
langserve==0.3.1
neo4j-rust-ext
nltk==3.9.1
openai==1.59.9
opencv-python==4.10.0.84
psutil==6.1.0
pydantic==2.9.2
openai==1.66.2
opencv-python==4.11.0.86
psutil==7.0.0
pydantic==2.10.6
python-dotenv==1.0.1
python-magic==0.4.27
PyPDF2==3.0.1
PyMuPDF==1.24.14
starlette==0.41.3
sse-starlette==2.1.3
PyMuPDF==1.25.3
starlette==0.46.1
sse-starlette==2.2.1
starlette-session==0.4.3
tqdm==4.67.1
unstructured[all-docs]
unstructured==0.16.11
unstructured-client==0.28.1
unstructured-inference==0.8.1
urllib3==2.2.2
uvicorn==0.32.1
unstructured==0.16.25
unstructured-client==0.31.1
unstructured-inference==0.8.9
urllib3==2.3.0
uvicorn==0.34.0
gunicorn==23.0.0
wikipedia==1.4.0
wrapt==1.16.0
yarl==1.9.4
youtube-transcript-api==0.6.3
zipp==3.17.0
sentence-transformers==3.3.1
google-cloud-logging==3.11.3
pypandoc==1.13
graphdatascience==1.12
Secweb==1.11.0
ragas==0.2.11
wrapt==1.17.2
yarl==1.18.3
youtube-transcript-api==1.0.0
zipp==3.21.0
sentence-transformers==3.4.1
google-cloud-logging==3.11.4
pypandoc==1.15
graphdatascience==1.14
Secweb==1.18.1
ragas==0.2.14
rouge_score==0.1.2
langchain-neo4j==0.3.0
langchain-neo4j==0.4.0
pypandoc-binary==1.15
chardet==5.2.0
chardet==5.2.0
9 changes: 5 additions & 4 deletions backend/score.py
@@ -576,9 +576,10 @@ async def upload_large_file_into_chunks(file:UploadFile = File(...), chunkNumber
result = await asyncio.to_thread(upload_file, graph, model, file, chunkNumber, totalChunks, originalname, uri, CHUNK_DIR, MERGED_DIR)
end = time.time()
elapsed_time = end - start
json_obj = {'api_name':'upload','db_url':uri,'userName':userName, 'database':database, 'chunkNumber':chunkNumber,'totalChunks':totalChunks,
'original_file_name':originalname,'model':model, 'logging_time': formatted_time(datetime.now(timezone.utc)), 'elapsed_api_time':f'{elapsed_time:.2f}','email':email}
logger.log_struct(json_obj, "INFO")
if int(chunkNumber) == int(totalChunks):
json_obj = {'api_name':'upload','db_url':uri,'userName':userName, 'database':database, 'chunkNumber':chunkNumber,'totalChunks':totalChunks,
'original_file_name':originalname,'model':model, 'logging_time': formatted_time(datetime.now(timezone.utc)), 'elapsed_api_time':f'{elapsed_time:.2f}','email':email}
logger.log_struct(json_obj, "INFO")
if int(chunkNumber) == int(totalChunks):
return create_api_response('Success',data=result, message='Source Node Created Successfully')
else:
@@ -894,7 +895,7 @@ async def retry_processing(uri=Form(None), userName=Form(None), password=Form(No
try:
start = time.time()
graph = create_graph_database_connection(uri, userName, password, database)
chunks = graph.query(QUERY_TO_GET_CHUNKS, params={"filename":file_name})
chunks = execute_graph_query(graph,QUERY_TO_GET_CHUNKS,params={"filename":file_name})
end = time.time()
elapsed_time = end - start
json_obj = {'api_name':'retry_processing', 'db_url':uri, 'userName':userName, 'database':database, 'file_name':file_name,'retry_condition':retry_condition,
4 changes: 2 additions & 2 deletions backend/src/QA_integration.py
@@ -380,7 +380,7 @@ def create_retriever(neo_db, document_names, chat_mode_settings,search_k, score_
retriever = neo_db.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={
'k': search_k,
'top_k': search_k,
'effective_search_ratio': ef_ratio,
'score_threshold': score_threshold,
'filter': {'fileName': {'$in': document_names}}
@@ -390,7 +390,7 @@ def create_retriever(neo_db, document_names, chat_mode_settings,search_k, score_
else:
retriever = neo_db.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={'k': search_k,'effective_search_ratio': ef_ratio, 'score_threshold': score_threshold}
search_kwargs={'top_k': search_k,'effective_search_ratio': ef_ratio, 'score_threshold': score_threshold}
)
logging.info(f"Successfully created retriever with search_k={search_k}, score_threshold={score_threshold}")
return retriever
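Note on this hunk: the retriever keyword changes from 'k' to 'top_k', presumably tracking the langchain-neo4j 0.3.0 to 0.4.0 bump in backend/requirements.txt above. A minimal usage sketch, assuming neo_db is a langchain_neo4j Neo4jVector store and using illustrative values:

    retriever = neo_db.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={
            'top_k': 5,                    # renamed from 'k' in this PR
            'effective_search_ratio': 2,   # illustrative value
            'score_threshold': 0.5,        # illustrative value
            'filter': {'fileName': {'$in': ['report.pdf']}},  # optional per-file filter
        },
    )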
7 changes: 5 additions & 2 deletions backend/src/document_sources/youtube.py
@@ -1,6 +1,7 @@
from langchain.docstore.document import Document
from src.shared.llm_graph_builder_exception import LLMGraphBuilderException
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.proxies import GenericProxyConfig
import logging
from urllib.parse import urlparse,parse_qs
from difflib import SequenceMatcher
@@ -12,8 +13,10 @@
def get_youtube_transcript(youtube_id):
try:
proxy = os.environ.get("YOUTUBE_TRANSCRIPT_PROXY")
proxies = { 'https': proxy }
transcript_pieces = YouTubeTranscriptApi.get_transcript(youtube_id, proxies = proxies)
proxy_config = GenericProxyConfig(http_url=proxy, https_url=proxy) if proxy else None
youtube_api = YouTubeTranscriptApi(proxy_config=proxy_config)
transcript_pieces = youtube_api.fetch(youtube_id, preserve_formatting=True)
transcript_pieces = transcript_pieces.to_raw_data()
return transcript_pieces
except Exception as e:
message = f"Youtube transcript is not available for youtube Id: {youtube_id}"
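This hunk migrates to the youtube-transcript-api 1.0.0 interface (bumped from 0.6.3 in backend/requirements.txt): the module-level get_transcript(..., proxies=...) call is replaced by an instance configured with an optional GenericProxyConfig. A self-contained sketch of the new call path, with an illustrative video id:

    import os
    from youtube_transcript_api import YouTubeTranscriptApi
    from youtube_transcript_api.proxies import GenericProxyConfig

    proxy = os.environ.get("YOUTUBE_TRANSCRIPT_PROXY")  # optional proxy URL, may be unset
    proxy_config = GenericProxyConfig(http_url=proxy, https_url=proxy) if proxy else None
    youtube_api = YouTubeTranscriptApi(proxy_config=proxy_config)
    transcript = youtube_api.fetch("dQw4w9WgXcQ", preserve_formatting=True)  # illustrative id
    transcript_pieces = transcript.to_raw_data()  # plain list of dicts, as the old API returned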
20 changes: 17 additions & 3 deletions backend/src/graphDB_dataAccess.py
@@ -1,5 +1,7 @@
import logging
import os
import time
from neo4j.exceptions import TransientError
from langchain_neo4j import Neo4jGraph
from src.shared.common_fn import create_gcs_bucket_folder_name_hashed, delete_uploaded_local_file, load_embedding_model
from src.document_sources.gcs_bucket import delete_file_from_gcs
@@ -16,7 +18,7 @@ class graphDBdataAccess:
def __init__(self, graph: Neo4jGraph):
self.graph = graph

def update_exception_db(self, file_name, exp_msg, retry_condition):
def update_exception_db(self, file_name, exp_msg, retry_condition=None):
try:
job_status = "Failed"
result = self.get_current_status_document_node(file_name)
@@ -254,8 +256,20 @@ def connection_check_and_get_vector_dimensions(self,database):
else:
return {'message':"Connection Successful","gds_status": gds_status,"write_access":write_access}

def execute_query(self, query, param=None):
return self.graph.query(query, param)
def execute_query(self, query, param=None,max_retries=3, delay=2):
retries = 0
while retries < max_retries:
try:
return self.graph.query(query, param)
except TransientError as e:
if "DeadlockDetected" in str(e):
retries += 1
logging.info(f"Deadlock detected. Retrying {retries}/{max_retries} in {delay} seconds...")
time.sleep(delay) # Wait before retrying
else:
raise
logging.error("Failed to execute query after maximum retries due to persistent deadlocks.")
raise RuntimeError("Query execution failed after multiple retries due to deadlock.")

def get_current_status_document_node(self, file_name):
query = """
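The execute_query change above (from "Handled deadlock errors in executing cypher query (#1187)") retries Neo4j TransientError deadlocks up to three times with a fixed two-second delay before giving up; exponential backoff would be the usual alternative under heavier write contention. Callers need no changes. A usage sketch with an illustrative query:

    dao = graphDBdataAccess(graph)  # graph: an existing langchain_neo4j Neo4jGraph connection
    rows = dao.execute_query(
        "MATCH (d:Document {fileName: $name}) RETURN d.status AS status",  # illustrative query
        param={"name": "report.pdf"},
    )
    # Deadlocked queries are retried transparently; after three failed attempts
    # execute_query raises RuntimeError rather than returning partial results.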
29 changes: 9 additions & 20 deletions backend/src/main.py
@@ -400,7 +400,7 @@ async def processing_source(uri, userName, password, database, model, file_name,
obj_source_node.processing_time = processed_time
obj_source_node.processed_chunk = select_chunks_upto+select_chunks_with_retry
if retry_condition == START_FROM_BEGINNING:
result = graph.query(QUERY_TO_GET_NODES_AND_RELATIONS_OF_A_DOCUMENT, params={"filename":file_name})
result = execute_graph_query(graph,QUERY_TO_GET_NODES_AND_RELATIONS_OF_A_DOCUMENT, params={"filename":file_name})
obj_source_node.node_count = result[0]['nodes']
obj_source_node.relationship_count = result[0]['rels']
else:
@@ -503,21 +503,10 @@ async def processing_chunks(chunkId_chunkDoc_list,graph,uri, userName, password,
logging.info(f'Time taken to create relationship between chunk and entities: {elapsed_relationship:.2f} seconds')
latency_processing_chunk["relationship_between_chunk_entity"] = f'{elapsed_relationship:.2f}'

distinct_nodes = set()
relations = []
for graph_document in graph_documents:
#get distinct nodes
for node in graph_document.nodes:
node_id = node.id
node_type= node.type
if (node_id, node_type) not in distinct_nodes:
distinct_nodes.add((node_id, node_type))
#get all relations
for relation in graph_document.relationships:
relations.append(relation.type)

node_count += len(distinct_nodes)
rel_count += len(relations)
graphDb_data_Access = graphDBdataAccess(graph)
count_response = graphDb_data_Access.update_node_relationship_count(file_name)
node_count = count_response[file_name].get('nodeCount',"0")
rel_count = count_response[file_name].get('relationshipCount',"0")
return node_count,rel_count,latency_processing_chunk

def get_chunkId_chunkDoc_list(graph, file_name, pages, token_chunk_size, chunk_overlap, retry_condition):
@@ -539,7 +528,7 @@ def get_chunkId_chunkDoc_list(graph, file_name, pages, token_chunk_size, chunk_o

else:
chunkId_chunkDoc_list=[]
chunks = graph.query(QUERY_TO_GET_CHUNKS, params={"filename":file_name})
chunks = execute_graph_query(graph,QUERY_TO_GET_CHUNKS, params={"filename":file_name})

if chunks[0]['text'] is None or chunks[0]['text']=="" or not chunks :
raise LLMGraphBuilderException(f"Chunks are not created for {file_name}. Please re-upload file and try again.")
@@ -550,13 +539,13 @@ def get_chunkId_chunkDoc_list(graph, file_name, pages, token_chunk_size, chunk_o

if retry_condition == START_FROM_LAST_PROCESSED_POSITION:
logging.info(f"Retry : start_from_last_processed_position")
starting_chunk = graph.query(QUERY_TO_GET_LAST_PROCESSED_CHUNK_POSITION, params={"filename":file_name})
starting_chunk = execute_graph_query(graph,QUERY_TO_GET_LAST_PROCESSED_CHUNK_POSITION, params={"filename":file_name})

if starting_chunk and starting_chunk[0]["position"] < len(chunkId_chunkDoc_list):
return len(chunks), chunkId_chunkDoc_list[starting_chunk[0]["position"] - 1:]

elif starting_chunk and starting_chunk[0]["position"] == len(chunkId_chunkDoc_list):
starting_chunk = graph.query(QUERY_TO_GET_LAST_PROCESSED_CHUNK_WITHOUT_ENTITY, params={"filename":file_name})
starting_chunk = execute_graph_query(graph,QUERY_TO_GET_LAST_PROCESSED_CHUNK_WITHOUT_ENTITY, params={"filename":file_name})
return len(chunks), chunkId_chunkDoc_list[starting_chunk[0]["position"] - 1:]

else:
@@ -741,7 +730,7 @@ def set_status_retry(graph, file_name, retry_condition):
if retry_condition == DELETE_ENTITIES_AND_START_FROM_BEGINNING or retry_condition == START_FROM_BEGINNING:
obj_source_node.processed_chunk=0
if retry_condition == DELETE_ENTITIES_AND_START_FROM_BEGINNING:
graph.query(QUERY_TO_DELETE_EXISTING_ENTITIES, params={"filename":file_name})
execute_graph_query(graph,QUERY_TO_DELETE_EXISTING_ENTITIES, params={"filename":file_name})
obj_source_node.node_count=0
obj_source_node.relationship_count=0
logging.info(obj_source_node)
19 changes: 9 additions & 10 deletions backend/src/make_relationships.py
@@ -1,6 +1,6 @@
from langchain_neo4j import Neo4jGraph
from langchain.docstore.document import Document
from src.shared.common_fn import load_embedding_model
from src.shared.common_fn import load_embedding_model,execute_graph_query
import logging
from typing import List
import os
@@ -33,7 +33,7 @@ def merge_relationship_between_chunk_and_entites(graph: Neo4jGraph, graph_docume
CALL apoc.merge.node([data.node_type], {id: data.node_id}) YIELD node AS n
MERGE (c)-[:HAS_ENTITY]->(n)
"""
graph.query(unwind_query, params={"batch_data": batch_data})
execute_graph_query(graph,unwind_query, params={"batch_data": batch_data})


def create_chunk_embeddings(graph, chunkId_chunkDoc_list, file_name):
@@ -59,7 +59,7 @@ def create_chunk_embeddings(graph, chunkId_chunkDoc_list, file_name):
SET c.embedding = row.embeddings
MERGE (c)-[:PART_OF]->(d)
"""
graph.query(query_to_create_embedding, params={"fileName":file_name, "data":data_for_query})
execute_graph_query(graph,query_to_create_embedding, params={"fileName":file_name, "data":data_for_query})

def create_relation_between_chunks(graph, file_name, chunks: List[Document])->list:
logging.info("creating FIRST_CHUNK and NEXT_CHUNK relationships between chunks")
@@ -127,7 +127,7 @@ def create_relation_between_chunks(graph, file_name, chunks: List[Document])->li
MATCH (d:Document {fileName: data.f_name})
MERGE (c)-[:PART_OF]->(d)
"""
graph.query(query_to_create_chunk_and_PART_OF_relation, params={"batch_data": batch_data})
execute_graph_query(graph,query_to_create_chunk_and_PART_OF_relation, params={"batch_data": batch_data})

query_to_create_FIRST_relation = """
UNWIND $relationships AS relationship
@@ -136,7 +136,7 @@ def create_relation_between_chunks(graph, file_name, chunks: List[Document])->li
FOREACH(r IN CASE WHEN relationship.type = 'FIRST_CHUNK' THEN [1] ELSE [] END |
MERGE (d)-[:FIRST_CHUNK]->(c))
"""
graph.query(query_to_create_FIRST_relation, params={"f_name": file_name, "relationships": relationships})
execute_graph_query(graph,query_to_create_FIRST_relation, params={"f_name": file_name, "relationships": relationships})

query_to_create_NEXT_CHUNK_relation = """
UNWIND $relationships AS relationship
@@ -145,17 +145,16 @@ def create_relation_between_chunks(graph, file_name, chunks: List[Document])->li
MATCH (pc:Chunk {id: relationship.previous_chunk_id})
FOREACH(r IN CASE WHEN relationship.type = 'NEXT_CHUNK' THEN [1] ELSE [] END |
MERGE (c)<-[:NEXT_CHUNK]-(pc))
"""
graph.query(query_to_create_NEXT_CHUNK_relation, params={"relationships": relationships})

"""
execute_graph_query(graph,query_to_create_NEXT_CHUNK_relation, params={"relationships": relationships})
return lst_chunks_including_hash


def create_chunk_vector_index(graph):
start_time = time.time()
try:
vector_index = graph.query("SHOW INDEXES YIELD * WHERE labelsOrTypes = ['Chunk'] and type = 'VECTOR' AND name = 'vector' return options")

vector_index_query = "SHOW INDEXES YIELD * WHERE labelsOrTypes = ['Chunk'] and type = 'VECTOR' AND name = 'vector' return options"
vector_index = execute_graph_query(graph,vector_index_query)
if not vector_index:
vector_store = Neo4jVector(embedding=EMBEDDING_FUNCTION,
graph=graph,
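Throughout this file, direct graph.query(...) calls are routed through execute_graph_query, newly imported from src.shared.common_fn. Its body is not visible in this diff; a hypothetical sketch, assuming it applies the same deadlock-retry pattern that this PR adds to graphDBdataAccess.execute_query:

    import logging
    import time
    from neo4j.exceptions import TransientError

    def execute_graph_query(graph, query, params=None, max_retries=3, delay=2):
        # Hypothetical reconstruction -- the real helper lives in src/shared/common_fn.py
        # and is not shown in this PR's visible hunks.
        for attempt in range(1, max_retries + 1):
            try:
                return graph.query(query, params)
            except TransientError as e:
                if "DeadlockDetected" not in str(e):
                    raise
                logging.info(f"Deadlock detected. Retrying {attempt}/{max_retries} in {delay} seconds...")
                time.sleep(delay)  # fixed delay between attempts
        raise RuntimeError("Query execution failed after multiple retries due to deadlock.")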
14 changes: 7 additions & 7 deletions backend/src/post_processing.py
@@ -4,7 +4,7 @@
from langchain_neo4j import Neo4jGraph
import os
from src.graph_query import get_graphDB_driver
from src.shared.common_fn import load_embedding_model
from src.shared.common_fn import load_embedding_model,execute_graph_query
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from src.shared.constants import GRAPH_CLEANUP_PROMPT
@@ -179,8 +179,8 @@ def fetch_entities_for_embedding(graph):
MATCH (e)
WHERE NOT (e:Chunk OR e:Document OR e:`__Community__`) AND e.embedding IS NULL AND e.id IS NOT NULL
RETURN elementId(e) AS elementId, e.id + " " + coalesce(e.description, "") AS text
"""
result = graph.query(query)
"""
result = execute_graph_query(graph,query)
return [{"elementId": record["elementId"], "text": record["text"]} for record in result]

def update_embeddings(rows, graph):
Expand All @@ -194,7 +194,7 @@ def update_embeddings(rows, graph):
MATCH (e) WHERE elementId(e) = row.elementId
CALL db.create.setNodeVectorProperty(e, "embedding", row.embedding)
"""
return graph.query(query,params={'rows':rows})
return execute_graph_query(graph,query,params={'rows':rows})

def graph_schema_consolidation(graph):
graphDb_data_Access = graphDBdataAccess(graph)
@@ -223,14 +223,14 @@ def graph_schema_consolidation(graph):
SET n:`{new_label}`
REMOVE n:`{old_label}`
"""
graph.query(query)
execute_graph_query(graph,query)

for old_label, new_label in relation_mapping.items():
query = f"""
MATCH (n)-[r:`{old_label}`]->(m)
CREATE (n)-[r2:`{new_label}`]->(m)
DELETE r
"""
graph.query(query)
execute_graph_query(graph,query)

return None