Skip to content

Conversation

@bunnysocks
Copy link
Contributor

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

This PR adds new tests to improve coverage for Sedona’s GeoPandas integration, specifically for the sjoin() function using the predicate="dwithin" parameter.

Key updates:

  • Added a new test module test_sjoin_match.py under sedona/python/tests/geopandas/.
  • Implemented comparisons between Sedona’s and GeoPandas’ sjoin() outputs for dwithin() predicate.
  • Covered edge cases including:
    • Varying distance thresholds (0.5, 0.05, etc.)
    • Empty and small-distance joins
    • Type and integrity checks on returned GeoDataFrames

These tests aim to ensure Sedona’s spatial join behavior matches GeoPandas’ expected results and help detect future regressions.

How was this patch tested?

  • Manually verified using pytest locally.
  • Confirmed that:
    • Both Sedona and GeoPandas return equivalent results for valid dwithin() joins.
    • Output GeoDataFrames contain consistent left/right ID pairs.
  • CI will re-verify via automated testing upon PR submission.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API, so no documentation changes are required.

@bunnysocks bunnysocks requested a review from jiayuasu as a code owner October 31, 2025 08:51
@petern48 petern48 self-requested a review November 1, 2025 03:53
Comment on lines 48 to 49
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. It looks like you based this branch on your previous one in #2464, so the changes in your old PR render in addition to the changes you intended. Could you remove these changes from this PR? You can either remove the old commits (and force push) or add a new commit that undos these deletions.

import pytest
import geopandas as gpd
from shapely.geometry import Point
from sedona.geopandas import sjoin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from sedona.geopandas import sjoin
from sedona.spark.geopandas import sjoin

I tried running this locally and found that there are also multiple issues with this test code (more than just this), that prevent it from running properly. Simple things like data types issues. @bunnysocks Do you need help with setting up your developer environment? You can follow these directions for setting it up, so you can test locally. Getting it working by following those instructions should work pretty smoothly. The logic this PR proposes is on the right track, but it will be very helpful for you to run the tests locally to avoid simple errors. Feel free to ping if you get stuck or need help, including with dev env setup.

e.g failed CI run
https://github.com/apache/sedona/actions/runs/18967529015/job/54243890091?pr=2466

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chore(geopandas): Write sjoin match tests for dwithin()

2 participants