Is the performance of boolean operation algorithms good enough? #822
Comments
@erikpols We're happy to improve the performance of our implementations. The bbox heuristic has been discussed in #649 before; we could discuss it there. Regarding retrieving the intersection: it is not available here, but see #620 for other options. PRs, or interest in implementing either of these, are very welcome; please revive the issue, and we can plan it out. |
OK, thank you. Those links are super helpful; I wasn't aware of geo-clipper, and it makes all the difference. I understand it's a C++ implementation, but that's fine for me. Just FYI, it was a 15-30x difference. |
@erikpols Could you share a minimal repo of your benchmarking code? It would be very helpful when we address the bbox issue. |
@rmanoka The easiest thing would be to benchmark and test |
Yeah, we have some benchmarks in |
I prototyped an integration of rtree into geo here to see what kind of performance wins we might get:
A good benchmark suite to compare against would be great. Lacking good benchmarks, here are some mediocre ones from benches/relate.rs:
relate overlapping 50-point polygons:
disjoint polygons:
relate norway (8854 points) with a rotated norway:
relate norway (8854 points) with an offset (but still overlapping) norway:
jts test suite:
Interestingly, benching against the entire JTS test suite takes a pretty big hit. I haven't dug into why yet. It does have an outsized emphasis on pathological geometries, so maybe that's responsible.
|
Interesting that you're seeing at most a ~99% speedup (which, to be clear, is great), vs. a purported 15x to 30x speedup from a JS M-R implementation. @erikpols it would be great to get a look at your test suite if at all possible… |
Yeah, the 99% reduction (100x speedup) is pretty wild. I'd love it if someone can scrutinize it, or at least run the benches themselves against |
Hi guys,
I prepared a repo with a minimal example of both approaches. You can find it here: The data consists of a set of cadastre plots that is intersected with a set of regulatory zones. The data is retrieved as FeatureCollections and first converted to vectors of Polygons, which are then the input for both algorithms.
results for geo
results for geo_clipping
Interestingly enough, the results are not the same. I'm guessing this has to do with the multiplier parameter that clipper needs (clipper works with integers, so all data needs to be multiplied by a factor first). |
I've clarified the benchmark a little bit to show the output of geo-clipper vs geo here: https://github.com/erikpols/vellum-public/compare/master...michaelkirk:mkirk/clarify-bench?expand=1
So initially, I'm seeing that geo is 5x-6x slower. I've improved the situation a bit by doing a bbox check to short-circuit the intersection computation here:
With that, we're better, but still about 2x as slow as the clipper implementation. It's worth pointing out that geo vs. geo-clipper implementations in your example are doing two pretty different things:
This might explain the discrepancy in the results ( One question that I haven't dug into yet is whether geo-clipper uses robust operations like we do in geo (which might also explain the discrepancy in results and perf). In any case, it sounds like ultimately you're looking to generate the actual shape of the intersection, not just check for its existence, right? AFAIK geo-booleanop is the only option for that in pure Rust. I've shown that integration here: https://github.com/erikpols/vellum-public/compare/master...michaelkirk:mkirk/geo-booleanop?expand=1 It's similarly 2x slower than geo-clipper.
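For readers following along, the bbox short-circuit mentioned above can be sketched roughly like this. This is a minimal, self-contained illustration and not the code from the linked branch: `BBox`, `bbox`, `boxes_overlap`, and `intersects_with_bbox_filter` are hypothetical names, rings are plain `(x, y)` slices rather than geo types, and the expensive predicate is passed in as a closure so the sketch stays independent of any particular library.

```rust
/// Axis-aligned bounding box (hypothetical helper type, not geo's own rectangle type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

/// Compute the bounding box of a ring of (x, y) points; `None` for an empty ring.
fn bbox(ring: &[(f64, f64)]) -> Option<BBox> {
    let (&(x0, y0), rest) = ring.split_first()?;
    let mut b = BBox { min_x: x0, min_y: y0, max_x: x0, max_y: y0 };
    for &(x, y) in rest {
        b.min_x = b.min_x.min(x);
        b.min_y = b.min_y.min(y);
        b.max_x = b.max_x.max(x);
        b.max_y = b.max_y.max(y);
    }
    Some(b)
}

/// True when the two boxes share at least one point (touching edges count).
fn boxes_overlap(a: &BBox, b: &BBox) -> bool {
    a.min_x <= b.max_x && b.min_x <= a.max_x && a.min_y <= b.max_y && b.min_y <= a.max_y
}

/// Run the `expensive` intersection test only when the bounding boxes overlap.
/// Disjoint boxes imply disjoint geometries, so returning `false` early is safe.
fn intersects_with_bbox_filter<F>(a: &[(f64, f64)], b: &[(f64, f64)], expensive: F) -> bool
where
    F: FnOnce(&[(f64, f64)], &[(f64, f64)]) -> bool,
{
    match (bbox(a), bbox(b)) {
        (Some(ba), Some(bb)) if boxes_overlap(&ba, &bb) => expensive(a, b),
        _ => false,
    }
}
```

The filter only pays off when many pairs are far apart, as with plots against zones; for pairs that mostly overlap, it adds a small constant cost per pair.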
|
Oh also, I've turned this dataset into a benchmark for the geo repo in #828. Thanks for the test cases! Hopefully we'll continue to find ways to speed it up. |
No problem, happy to help. BTW, I made a bit of a classic error: I wondered why I got 30-40x and you 5-6x, but I had run the tests in debug mode. No big implications though. Fixed below. I included the bbox tests in the repo as well, and also for the geo-clipper approach for good measure. Interestingly enough, including the bbox test with clipper also leads to an improvement. This might be explained by the fact that doing the bbox check in advance saves having to do the conversion to integers for those polygons, which is what geo-clipper does. The downside, of course, is that the bbox comparison is now done twice.
Correct, I also need the actual intersection itself, so that is a benefit of clipper for me. Knowing the dataset, the different intersection results don't surprise me. Some of that data has shared lines between the zones and the plots, so small discrepancies in the input data (like the ones the multiplier I mentioned introduces) will result in small differences in the intersection result. To fully negate this for the test, it might be possible to multiply the datasets beforehand, convert them to integers, and only then feed them to both. That should lead to equal results. I'm getting the idea that clipper is some kind of standard library/approach that has been ported to several languages 'as is'. Might it be an idea to do that as well? The TypeScript version is fairly easy to read. |
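The pre-scaling idea above (multiply first, convert to integers, then feed the same data to both libraries) might look roughly like the following. This is only a sketch under assumptions: `to_fixed`, `quantize_ring`, and `dequantize_ring` are hypothetical helpers, not part of geo or geo-clipper, and the rounding mode and the factor are choices of the sketch, not necessarily what clipper does internally.

```rust
/// Scale a floating-point coordinate by `factor` and round to the nearest integer.
/// `factor` plays the role of the multiplier discussed above (hypothetical helper).
fn to_fixed(v: f64, factor: f64) -> i64 {
    (v * factor).round() as i64
}

/// Snap a ring of (x, y) points onto the integer grid defined by `factor`.
fn quantize_ring(ring: &[(f64, f64)], factor: f64) -> Vec<(i64, i64)> {
    ring.iter()
        .map(|&(x, y)| (to_fixed(x, factor), to_fixed(y, factor)))
        .collect()
}

/// Map the integer grid back to floats, so a float-based library can be fed
/// the same snapped coordinates that the integer-based one sees.
fn dequantize_ring(ring: &[(i64, i64)], factor: f64) -> Vec<(f64, f64)> {
    ring.iter()
        .map(|&(x, y)| (x as f64 / factor, y as f64 / factor))
        .collect()
}
```

With rounding to nearest, each coordinate moves by at most half a grid step (0.5 / `factor`), which also shows how distinct but nearby vertices can collapse onto the same grid point when the factor is too small.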
All other things being equal (which is tricky, see below), integer operations will be faster than floating point on x86_64 (and probably on aarch64) in many cases, although this is an extremely general statement; pipeline length, cache, and compiler optimisations are big confounding factors here, and that's just the tip of the iceberg. At the very least we should be comparing perf on the same machine. |
Indeed, but I wonder whether the multiplication (to avoid losing too much detail when using lon/lat coordinates) as well as the casting to integers offset the benefits enough. Apparently it does, but it surprises me. |
@erikpols, in case you missed it, as of geo-0.21, we have boolean operations (constructing an Intersection, Union, Xor, or Difference) built directly into geo: https://docs.rs/geo/latest/geo/algorithm/bool_ops/trait.BooleanOps.html The approach is slightly different from geo-booleanop: it's currently slower, but does handle some edge cases better. I've updated my fork of your benchmark to test it here: https://github.com/erikpols/vellum-public-geo-performance-rust/compare/master...michaelkirk:mkirk/geo-booleanop?expand=1 |
We have a bool ops. implementation, and its performance is in the ball-park of the bool predicates implementation ( |
I've seen a mention of maybe replacing the boolean operation algorithms for example here:
#620
What is the general state of the boolean operation algorithms from a performance perspective in this crate?
The reason is that I ran a (somewhat simple) comparison test between this crate and the Turf package on npm. Turf's intersect algorithm on polygons seems to be orders of magnitude faster, even though it is written in TypeScript. I had a look at the source, and one reason I saw is that the Turf algorithm does a bounding box comparison before the actual intersect.
I would have no problem doing this myself, but it leads me to the question: are you happy with the current algorithms? What can I expect in the future?
Add-on question: Is it possible to retrieve the resulting intersection when intersecting two polygons?