-
Notifications
You must be signed in to change notification settings - Fork 665
GH-3026: spatial index per graph and kryo serialization #3027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0829aa5
to
01bad5d
Compare
It should build now. Its about 90% done. One issue is, that @LorenzBuehmann used Right now I switched to JTS native WKB serialization which makes the build work normally (in principle, the geometry-serializer classname could be made part of a spatial index metadata header - such a header does not exist yet). I also added a little compatibility layer for the old index IO which used java serialization. So old index files should continue to load. Saving indexes will use the new serializer. |
The dependencies will need to be checked - kryo is 3-clause BSD so that has implications. (I haven't looked at the new dependencies, just kyro.) Apache Sedona has some entries in its LICENSE
These are a nuisance because they are not guaranteed to be maintained by dependency updates. 😞 |
About performance differences, @SimonBin found this article: https://kontinuation.one/posts/2022/12/improving-geometry-serde-performance-of-apache-sedona/ This suggests that Sedona's ShapeSerde is generally a lot faster. Not sure if the current implementation is perhaps already even one of the optimized variants. |
Some notes about the remaining issues I am working on to resolve. I am a bit worried that a hard dependency on sedona-spark may cause too many issues, so I am trying to decouple the geometry serde (serializer/deserializer) from the rest of the spatial-index serde. For this, I added a plain text JSON header field to the spatial index format with a field for the geometry serde. This field can be read and used to configure the remaining kryo serde. The index reader makes a lookup with Class.forName() and tries to create an instance via the default constructor. With this approach it would be possible to use the slower JTS-based geometry serialization for now and have a compatible upgrade path to a faster approach at a later point. Perhaps the geometrySerde header field is not needed, because I see that in Sedona the implementation is serializable and apparently their approach is to just read/write the geometry serde instance out using plain Java serialization during the kryo serialization process. So I need to check what is the best way, but something along these lines should work to support the same spatial-index serde with different geometry serdes in a compatible way. This snippet is an example for how the header could be set up: // SpatialIndexIoKryo.java
public static void writeToOutputStream(OutputStream os, SpatialIndexPerGraph index) {
// geometrySerde could later be switched to sedona's ShapeSerde.
// (Need to check whether I can readily reuse an interface
// from JTS/Sedona instead of my own GeometrySerdeAdapter)
GeometrySerdeAdapter geometrySerde = new GeometrySerdeAdapterJtsWkb();
JsonObject header = new JsonObject();
header.addProperty("type", "jena-spatial-index");
header.addProperty("version", "2.0.0");
header.addProperty("geometrySerde", geometrySerde.getClass().getName());
Kryo kryo = new Kryo();
JtsKryoRegistrator.registerClasses(kryo, geometrySerde);
try (Output output = new Output(os)) {
writeJson(output, header);
writeIndex(kryo, output, index);
output.flush();
} |
b0a9b8b
to
ae4ba13
Compare
/* This file is an adapted copy of org.locationtech.jts.index.strtree.IndexSerde. */ | ||
/* The change is, that geometry read/write is delegated to a GeometrySerdeAdapter instance. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is copy&adapt from sedona-common 1.7.0 (latest)- they actually used this package hack to access non-visible stuff. in the long run, jts should fix this design.
...c/main/java/org/apache/jena/geosparql/spatial/index/v2/serde/sedona/CustomGeometrySerde.java
Outdated
Show resolved
Hide resolved
...in/java/org/apache/jena/geosparql/spatial/index/v2/serde/sedona/CustomSpatialIndexSerde.java
Outdated
Show resolved
Hide resolved
So I had to copy and adapt a total of They all have an Apache V2 header - so it should be fine? |
The spatial index does not store any geometries - just their envelopes and feature nodes - so geometry serialization performance is not an issue in the first place 🤦♂️ - but good having talked about it 😀
|
52317c9
to
fcae47a
Compare
I now also added the benchmark that evaluates loading a spatial index from disk - so w.r.t. index loading during server startup, the kryo version is (still) significantly faster.
A geometry mix comprises 8 different geometry types (point, linestring, linearring, polygon, multipoint, multilinestring, multipolygon, geometrycollection). So the actual number of envelopes in the spatial index is |
16570ee
to
fe69be3
Compare
@afs What would be the best way to add a Fuseki module for re-building the index for selected graphs? In an ideal world, the spatial index would be synchronized with the dataset, but for the time being we created a little Fuseki-Mod, where one can select the graphs for which to rebuild the index. Back then we forked your This is what the spatial index rebuilder looks like. The html is currently a single self-contained file (ai generated) - so no extra frontend build is needed. |
6c202b9
to
164b9f7
Compare
Right, I'll look into cleaning up those warnings; especially I haven't checked the javadoc much yet. As for deprecation, I recall that adding forRemoval=true caused even more warnings, so I removed it again. As for the deprecation warnings related to SpatialIndexV1, I wanted to keep the implementation for compatibility and also comparison reasons, although it should be considered deprecated. Not sure what do best do there; for the time being perhaps disable deprecation warnings for the module (if possible) or remove the deprecations for now. |
No need to use (I find it a bit erratic as in javadoc across Eclispe and the maven javadoc plugin - it seems to be considered a different warning, sometimes) |
jena-geosparql/src/main/java/org/locationtech/jts/index/strtree/STRtreeSerializer.java
Outdated
Show resolved
Hide resolved
adc16e6
to
3aa8be2
Compare
Here are the Java warnings I am seeing in Eclipse: Eclipse Warnings
|
jena-fuseki-mod-geosparql is shaded. Making a dependency of jena-fuseki-server produces a lot (!!) of overlap warnings. Either:
|
1d1c564
to
eb50b4f
Compare
I realized that when All your reported warnings (and some more) should now be cleaned up - I updated my Eclipse error/warning level configuration accordingly. |
My intention was to go with 2 - but I had overlooked the shade configuration for the The bundle produced by the |
...geosparql/src/main/java/org/apache/jena/fuseki/mod/geosparql/SpatialIndexComputeService.java
Show resolved
Hide resolved
...geosparql/src/main/java/org/apache/jena/fuseki/mod/geosparql/SpatialIndexComputeService.java
Show resolved
Hide resolved
@@ -116,6 +116,7 @@ | |||
<ver.org.openjdk.jmh>1.37</ver.org.openjdk.jmh> | |||
|
|||
<!-- GeoSPARQL related --> | |||
<ver.kryo>4.0.3</ver.kryo> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need an entry in jena-fuseki2/apache-jena-fuseki/dist/LICENSE
(Anywhere Jena is actually shipping kryo needs to checked)
@@ -42,7 +42,7 @@ public static void init() { | |||
if ( initialized ) | |||
return ; | |||
synchronized (initLock) { | |||
if ( initialized ) { | |||
if ( initialized || System.getProperty("jena.geosparql.skip", "false").equalsIgnoreCase("true") ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's going to need documenting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LorenzBuehmann Do you recall the intention of this line?
One reason I can think of is that that jena-geosparql takes long to initialize. IIRC due to setting up a derby database with SIS data. So any command line jar that uses jena-geosparql takes long to load.
The loading time of own java cli tool increased from 200ms to 1.000ms just by adding jena-geosparql.
So a future issue is to change parts of jena-geosparql's eager init to lazy init on first use.
From this persective, the feature to disable jena-geosparql could be removed from this PR and handled in a separate PR. I think the option would not be needed if the init time was improved.
I agree this is the way to go. |
Where does this leave That loses the command line argument processing (is the geosparql assembler sufficient?) - we could ask users about this. |
According to the jena-fuseki-geosparql documentation it is a bit of a jena-geo toolkit:
So I think in the long run there should be only one Fuseki, and the features of jena-fuseki-geosparql would be either moved to mods or separate cli tools. But before that happens I guess things just stay as they are(?) |
782f293
to
21b1b5a
Compare
Fair enough. Is the geosparql assembler sufficient? |
There is:
If that is all OK then, with those last items, this is good then from my side - this is good to go. |
Two aspects I came across that I should still address:
So I should be able to resolve this issues this weekend. |
GitHub issue resolved #3026
Pull request Description: Adds support for spatial index per graph and kryo serialization.
Updated
jena-geosparql
with spatial index-per-graph(STRtreePerGraph.java)
structure and kryo serialization.The upgrade should be backward compatible: If reading of a spatial index file fails then the old (slow) deserialization method will be attempted.
Added

jena-fuseki-mod-geosparql
module underjena-fuseki2
. This module comes with a simple HTML page where users can pick which graphs to update in the index:Added
jena-fuseki-mod-geosparql
tojena-fuseki-server
. This bundles the fmod (and geosparql) with Fuseki, but--modules=true
must still be set to enable it!GeoSPARQL
module are included.spatial-indexer fuseki module
and its UI are included. Note, that the test uses Selenium to click a couple of buttons in the browser which requires a local Chrome. Therefore the test is set to ignore by default.By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.