This repository contains the source code, the configuration files, and the queries used in the experimental study presented in our paper Join Ordering of SPARQL Property Path Queries.
To get started quickly, run the following commands on a single machine. They install everything you need to reproduce our experimental results.
- Clone and install the project
We use conda to manage the project dependencies. If conda is not installed on your system, you can download it from their website.
git clone https://github.com/JulienDavat/Join-Ordering-of-SPARQL-Property-Path-Queries.git xp-eswc2023
cd xp-eswc2023
conda env create -f environment.yml
conda activate xp
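Before creating the environment, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch only: the `require_cmd` and `check_prerequisites` helpers are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): report whether a
# command is available on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required command: $1" >&2
    return 1
}

# Check the tools used by the installation commands above.
check_prerequisites() {
    status=0
    for tool in git conda; do
        require_cmd "$tool" || status=1
    done
    return "$status"
}
```

Calling `check_prerequisites` before `conda env create` returns a non-zero status and names each missing tool on standard error.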
- Install HDT
In this project, we use a custom version of HDT that needs to be installed on your system.
git clone https://github.com/JulienDavat/hdt-bindings.git hdt
cd hdt
git clone git@github.com:rdfhdt/hdt-cpp.git
cd hdt-cpp
git checkout tags/v1.3.3 -b master
cd ..
python -m pip install .
- Download HDT files
- Install Virtuoso v7.2.7
wget https://github.com/openlink/virtuoso-opensource/releases/download/v7.2.7/virtuoso-opensource-7.2.7.tar.gz
tar -zxvf virtuoso-opensource-7.2.7.tar.gz
cd virtuoso-opensource-7.2.7
./configure
make
make install
The configuration file used in our experiments is available in the config directory. You only need to set the location of Virtuoso on your system in it. The same location must also be set in the server.sh script. Finally, add the bin directory of Virtuoso to your PATH variable.
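As an illustration of the PATH step, here is a minimal sketch. It assumes Virtuoso was installed under `/usr/local/virtuoso-opensource`. Both the `VIRTUOSO_HOME` variable and the `prepend_path` helper are hypothetical names introduced for this example, so adjust the prefix to your own installation.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumption: Virtuoso was installed under this prefix; change it to the
# location used on your system.
VIRTUOSO_HOME="${VIRTUOSO_HOME:-/usr/local/virtuoso-opensource}"
prepend_path "$VIRTUOSO_HOME/bin"
```

Placing these lines in your shell profile makes the change persistent, and the membership check keeps PATH from growing each time the file is sourced.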
If everything went well, you should be able to start Virtuoso with the following command:
bash server.sh start virtuoso
Virtuoso can be stopped with the same script:
bash server.sh stop virtuoso
- Install BlazeGraph v2.1.6
wget https://github.com/blazegraph/database/releases/download/BLAZEGRAPH_2_1_6_RC/bigdata.jar
The configuration file used in our experiments is available in the config directory. You only need to copy it into the same directory as the .jar file. The location of BlazeGraph must also be set in the server.sh script.
If everything went well, you should be able to start BlazeGraph with the following command:
bash server.sh start blazegraph
BlazeGraph can be stopped with the same script:
bash server.sh stop blazegraph
- Download the WDBench dataset
The dataset can be downloaded from Figshare. If you run into any problem, please refer to the official WDBench GitHub repository.
- Load data into Virtuoso
The WDBench dataset can be loaded into Virtuoso using the following commands. You just have to indicate the location of the .nt file.
isql "EXEC=ld_dir('<your file here>', '*.nt', 'http://example.com/wdbench');"
isql "EXEC=rdf_loader_run();"
isql "EXEC=checkpoint;"
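The three loading commands above can be wrapped so that a failure stops the sequence. This is a sketch under stated assumptions: `ld_dir_statement` and `load_wdbench` are hypothetical helpers, `isql` is assumed to be the Virtuoso client on your PATH, and the server is assumed to be running. Note that `ld_dir` takes the directory containing the data, and Virtuoso typically only reads from directories listed in the `DirsAllowed` setting of its configuration file.

```shell
#!/bin/sh
# Hypothetical helper: print the ld_dir statement for a data directory.
# Arguments: <directory containing the .nt file> [graph IRI]
ld_dir_statement() {
    dir=$1
    graph=${2:-http://example.com/wdbench}
    printf "ld_dir('%s', '*.nt', '%s');\n" "$dir" "$graph"
}

# Run the three loading steps in order, stopping at the first failure.
# Assumes isql is on the PATH and the Virtuoso server is running.
load_wdbench() {
    isql "EXEC=$(ld_dir_statement "$1")" &&
    isql "EXEC=rdf_loader_run();" &&
    isql "EXEC=checkpoint;"
}
```

For example, `load_wdbench /data/wdbench` (a placeholder path) would register every `.nt` file in that directory, run the loader, and then checkpoint.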
- Load data into BlazeGraph
The WDBench dataset can be loaded into BlazeGraph using the following command. You just have to indicate the location of the .nt file.
java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -defaultGraph http://example.com/wdbench blazegraph.properties <your file here>
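Since loading can take a long time, a quick check that the input files are readable avoids starting the JVM for nothing. The sketch below is illustrative: `check_readable` and `load_blazegraph` are hypothetical helpers, and the file names are the ones used in the command above, assumed to be in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every given file exists and is readable.
check_readable() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            return 1
        fi
    done
}

# Sketch of a guarded load. The first argument is the path to the .nt file.
load_blazegraph() {
    data_file=$1
    check_readable blazegraph.jar blazegraph.properties "$data_file" || return 1
    java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader \
        -defaultGraph http://example.com/wdbench blazegraph.properties "$data_file"
}
```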
Experiments are powered by Snakemake, a scientific workflow management system written in Python. To re-run our experiments, run the following commands:
# For Virtuoso
snakemake --configfile virtuoso.yaml -C runs=[1,2,3,4] timeout=900000 -c1
# For BlazeGraph
snakemake --configfile blazegraph.yaml -C runs=[1,2,3,4] timeout=900 -c1
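To run both workflows back to back, the two commands above can be wrapped in a small script. This is only a sketch: `engine_timeout` and `run_all` are hypothetical helpers, and the timeout values are copied unchanged from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the timeout used above for a given engine.
engine_timeout() {
    case "$1" in
        virtuoso)   echo 900000 ;;
        blazegraph) echo 900 ;;
        *) echo "unknown engine: $1" >&2; return 1 ;;
    esac
}

# Run the workflow for each engine in turn, stopping at the first failure.
run_all() {
    for engine in virtuoso blazegraph; do
        timeout=$(engine_timeout "$engine") || return 1
        # Quoting runs=[...] keeps the shell from treating it as a glob pattern.
        snakemake --configfile "$engine.yaml" \
            -C "runs=[1,2,3,4]" "timeout=$timeout" -c1 || return 1
    done
}
```

The quoting matters mostly in shells such as zsh, which reject an unquoted `[1,2,3,4]` when no file matches the pattern.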
The data generated by the two snakemake commands are available in the output directory. To visualize them, open the provided Jupyter notebook with the following command:
jupyter notebook eswc2023.ipynb