The goal of this project is to design and implement a Data Warehouse (DW) and explore the use of NoSQL technologies to manage heterogeneous data. We use Neo4j and Cypher for graph-based data representation.
- Data integration and transformation from heterogeneous sources
- Data modeling in graph form in Neo4j
- Advanced querying with Cypher
- Visualization of relationships and data exploration
- Clustering analysis and Big Data management
- Neo4j: Graph database
- Cypher: Query language for Neo4j
- Python: For data extraction and processing
- Pandas: For handling tabular data
- Graph Data Science (GDS): For advanced graph analysis
📦 data-warehouse-project
┣ 📂 SQL # Queries for the Data Warehouse
┣ 📂 df_for_Neo4j # Relations and Nodes for Cypher
┣ 📜 README.md # Documentation of the project
┣ 📜 requirements.txt # Python requirements
┣ 📜 graph.cypher # Queries for cypher
┣ 📜 ml2.ipynb # Machine learning for k-means algo
┣ 📜 prep_bdd.ipynb
┣ 📜 schema.md # Star Schema for data integration
┣ 📜 viz2.ipynb # Data visualization
- Neo4j Desktop
- Python 3.8+
pip install -r requirements.txt
- Create a local database
- Set the credentials (neo4j / password)
- Enable the Graph Data Science (GDS) plugin
Run the data loading script:
python scripts/load_data.py
Access Neo4j Browser and run queries like:
MATCH (n) RETURN n LIMIT 10;
The Jupyter notebooks contain clustering analysis and graph visualizations. To run them:
jupyter notebook
- Nom 1 (@GuillaumeDeSaintEtienne)
- Nom 2 (@MaelGalliou)
- Nom 3 (@Sachafrft)
- Nom 4 (@emiliengodet)
This project is licensed under the MIT License. You are free to use and modify it as needed.
🚀 Happy project and enjoy data exploration !