Non-deterministic row counts for spatial joins in DuckDB 1.2.0 #508
Comments
This should be fixed as of yesterday. Could you try reinstalling the v1.2.0 spatial extension? I.e. execute In short, we accidentally introduced a bug in the version of the extension we distribute for v1.2.0 when trying to backport another fix. We've now replaced the binary again, with yet another patch applied that should fix this. Some more context |
Hi @Maxxen, I'm also still seeing this issue. I'm on
which is about 2 weeks old.
because a nightly v1.2.1 doesn't seem to be available. I've replicated the above on both x86 Mac OSX and x86 Ubuntu EC2 instance with fresh installs of |
@bmcandr Hello! Are you able to provide a reproduction? |
Extension version 6b3d93c is not the version that should be distributed for v1.2.1 |
Sure, I'm playing around with a Parquet file I created from a STAC Collection describing 7M+ satellite images. I'm just trying to select images intersecting a particular geometry, like the state of California:
WITH cali AS (
    SELECT ST_GeomFromGeoJSON(geometry) geometry
    FROM read_json('https://spelunker.whosonfirst.org/id/85688637/geojson')
)
SELECT COUNT(s.*)
FROM "s3://satellogic-earthview-stac-geoparquet/satellogic-earthview-stac-items.parquet" s
JOIN cali ON ST_Intersects(cali.geometry, s.geometry);
There are many images over California contained in the dataset, but the query frequently returns |
Alright, seems like we distribute the wrong extension version. This was fixed in a commit after 6b3d93c. Strangely, main DuckDB CI is pinned at 919c69f and 2905968 on both main and v1.2-branches, so I'm not sure how we ended up with 6b3d93c. I'll check in with the others once I get into the office tomorrow. |
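To tell whether a given install is affected, one option is to compare the reported spatial extension version against the commit called out in this thread. The sketch below is only an illustration: `KNOWN_BAD_BUILDS` and `is_known_bad` are hypothetical names, and the version string is assumed to be a commit hash of the kind DuckDB reports in the `extension_version` column of `duckdb_extensions()`.

```python
# Commit identified in this thread as the wrongly distributed build.
# (Hypothetical constant name; extend the set if other bad builds turn up.)
KNOWN_BAD_BUILDS = {"6b3d93c"}


def is_known_bad(extension_version: str) -> bool:
    """Return True if the version string starts with a known-bad commit hash.

    Prefix matching lets a full-length hash match the short hash
    quoted in the thread.
    """
    version = extension_version.strip().lower()
    return any(version.startswith(bad) for bad in KNOWN_BAD_BUILDS)
```

If this returns True for the installed extension, reinstalling the extension as suggested earlier in the thread is the next step.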
Thanks! |
Ok, like always caching is the culprit. We've made adjustments on our end so if you already have spatial installed a simple |
Can confirm this version works. Thanks again @Maxxen! |
What happens?
Description
I'm experiencing inconsistent query results when running the same spatial query with DuckDB on EC2. When executing the identical query multiple times in succession, I get different row counts:
First run: 35,817 records
Second run: 291,614 records
Third run: 9,555 records
The correct number of records, which I get from running locally with 1.1.3, is 1,632,012.
This appears to be non-deterministic behavior introduced in DuckDB 1.2.0.
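A quick way to confirm this kind of non-determinism is to run the same query several times and compare the counts. The sketch below assumes a hypothetical `run_count` callable that executes the query once and returns its row count (for example, a thin wrapper around whichever DuckDB client is in use). The helper names are illustrative, and the demo at the end only replays the counts reported above rather than touching a database.

```python
from collections import Counter
from typing import Callable, Dict


def count_distribution(run_count: Callable[[], int], runs: int = 5) -> Dict[int, int]:
    """Run the same query `runs` times and tally each distinct row count.

    `run_count` is a hypothetical stand-in: it should execute the query
    once and return the number of rows it produced.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return dict(Counter(run_count() for _ in range(runs)))


def is_deterministic(run_count: Callable[[], int], runs: int = 5) -> bool:
    """Return True if every run produced the same row count."""
    return len(count_distribution(run_count, runs)) == 1


# Replay the three counts reported above (no database involved).
reported = iter([35_817, 291_614, 9_555])
unstable = is_deterministic(lambda: next(reported), runs=3)

# A stable query returns the same count on every run.
stable = is_deterministic(lambda: 1_632_012, runs=3)
```

Here `unstable` ends up `False` and `stable` ends up `True`. Against a real connection, `count_distribution` also shows how many distinct counts appear, which helps when reporting the bug.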
Environment
Local: DuckDB 1.1.3 (gives consistent results)
EC2: DuckDB 1.2.0
R package: duckdb
Extensions: spatial, aws, httpfs, icu
I'm not able to install 1.1.3 on the EC2 instance; the closest I get is v1.1.3-dev165, but then the spatial extensions don't work.
To Reproduce
SQL I'm running
SETUP before running
The 'events' source is exactly the same location in S3 that the local and EC2 versions are querying. The 'route_ways' dataframe is exactly the same, and I've confirmed that both the EC2 and local queries return the correct counts from the 'events' CTE and the 'filtered_route_ways' CTE.
UPDATE:
I replaced the ST_Intersects function with the following SQL and I'm getting consistent results between local and EC2 versions
OS:
Mac M2
DuckDB Version:
1.2.0
DuckDB Client:
R
Hardware:
No response
Full Name:
Gareth Robins
Affiliation:
Robinsight
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?