Welcome to the resources for the Neo4j live session titled "Enhancing Text2Cypher with In-Context Learning and Fine-Tuning."
You can watch the live session recording here:
Text2Cypher generation is a crucial step in unlocking the full potential of graph databases. By bridging the gap between natural language and Cypher queries, we're making it easier to retrieve complex, interconnected data with precision. This session will guide you through the process of enhancing Text2Cypher models using in-context learning and fine-tuning techniques, enabling more accurate and efficient data retrieval.
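To make the in-context learning idea concrete, here is a minimal sketch of few-shot prompt construction for Text2Cypher. The schema string, example questions, and Cypher queries below are illustrative placeholders, not taken from the session materials:

```python
# Minimal sketch of in-context learning for Text2Cypher: the prompt carries a
# graph schema plus a few (question, Cypher) pairs so the model can imitate
# them. All schema and example content here is hypothetical.

FEW_SHOT_EXAMPLES = [
    (
        "Which companies are based in London?",
        "MATCH (c:Company)-[:BASED_IN]->(:City {name: 'London'}) RETURN c.name",
    ),
    (
        "Who works at Neo4j?",
        "MATCH (p:Person)-[:WORKS_AT]->(:Company {name: 'Neo4j'}) RETURN p.name",
    ),
]

def build_prompt(schema: str, question: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble a few-shot Text2Cypher prompt for an LLM call."""
    parts = [
        "Translate the question into a Cypher query for this graph schema:",
        schema,
        "",
    ]
    for q, c in examples:
        parts.append(f"Question: {q}\nCypher: {c}\n")
    # End with the new question and an open "Cypher:" slot for the model.
    parts.append(f"Question: {question}\nCypher:")
    return "\n".join(parts)
```

The resulting string would be passed as the prompt to whichever LLM you are evaluating; fine-tuning then bakes this mapping into the weights instead of carrying it in the prompt.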
- PPT Presentation: Enhancing Text2Cypher Presentation
- Colab Notebook: Text2Cypher Fine-Tuning Colab
The following datasets are provided to support the session:
- `dataset.csv`: A preprocessed dataset originally sourced from Kaggle. Special thanks to Manish Kumar for the LinkedIn User Profiles dataset.
- `text2cypher_questions.csv`: Questions generated with Nemotron, based on the graph schema.
- `raw_text2cypher.csv`: Cypher queries generated from the graph schema, providing an answer to each question from the previous step.
- `detailed_text2cypher.csv`: A table with additional columns (`syntax_error`, `timeout`, `returns_results`) used to verify the correctness of the generated Cypher queries.
- `final_text2cypher.csv`: The final dataset used for fine-tuning. This dataset includes the following columns: `question`, `type`, `cypher`, `syntax_error`, `timeout`, `returns_results`.
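As a rough illustration of how the three verification columns could be populated, here is a hedged sketch. `run_query` stands in for a real Neo4j driver call, and the plain `SyntaxError`/`TimeoutError` exception types are assumptions for the sketch; the session's actual code may classify errors differently:

```python
# Sketch: derive the syntax_error / timeout / returns_results flags for one
# generated Cypher query. `run_query` is a stand-in for executing the query
# against Neo4j; the exception types here are illustrative assumptions.

def validate_cypher(run_query, cypher: str) -> dict:
    """Run one generated query and record the three verification flags."""
    record = {"syntax_error": False, "timeout": False, "returns_results": False}
    try:
        rows = run_query(cypher)  # expected to raise on bad syntax or timeout
    except SyntaxError:
        record["syntax_error"] = True
    except TimeoutError:
        record["timeout"] = True
    else:
        record["returns_results"] = len(rows) > 0
    return record
```

With the real Python driver, the syntax case would instead catch something like `neo4j.exceptions.CypherSyntaxError`, and the timeout would come from a transaction timeout setting.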
Last but not least, if you want to use your fine-tuned model locally, you can also push it to Ollama. Make sure to save the quantized version, then follow the instructions in the `command.txt` file.
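The general Ollama workflow looks roughly like the following; the exact steps in `command.txt` take precedence, and the file and model names below are illustrative, not from the session:

```shell
# Write a minimal Modelfile pointing at the quantized GGUF export
# (file name and model name here are hypothetical examples):
echo 'FROM ./text2cypher-q4_k_m.gguf' > Modelfile

# Register the local model with Ollama, then try it out
ollama create text2cypher -f Modelfile
ollama run text2cypher "Which companies are based in London?"
```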