Skip to content
This repository has been archived by the owner on Jun 1, 2022. It is now read-only.

Validate latitude/longitude on Locations by reverse geocoding #388

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shajith
Copy link
Contributor

@shajith shajith commented Apr 21, 2021

Solving this:
image

Uses: https://github.com/thampiman/reverse-geocoder

Checks that the latitude/longitude, when geocoded, produces a spot that is in the state that the location is claimed to be in.

Does this if not:

image

@shajith shajith force-pushed the shajith/validate-lat-lng branch from 17caec4 to ec1a024 Compare April 21, 2021 22:37
@shajith shajith force-pushed the shajith/validate-lat-lng branch from ec1a024 to b2c2ad6 Compare April 21, 2021 22:51
@codecov
Copy link

codecov bot commented Apr 21, 2021

The author of this PR, shajith, is not an activated member of this organization on Codecov.
Please activate this user on Codecov to display this PR comment.
Coverage data is still being uploaded to Codecov.io for purposes of overall coverage calculations.
Please don't hesitate to email us at [email protected] with any questions.

@simonw
Copy link
Collaborator

simonw commented Apr 21, 2021

I don't think this is the right approach here, for a couple of reasons.

Firstly, this is a really heavy new dependency - it pulls in the whole of SciPy and NumPy, which is a big increase in our overall dependency size. I don't think it's worth adding that to our application server footprint for this single use-case.

More importantly, having this as a hard validation check is going to cause us problems for any location that is close to a border.

41.9950887,-123.7249003 for example is a location in California right next to the Oregon border - but when I run this:

reverse_geocoder.search((41.9950887,-123.7249003))

I get back:

[{'lat': '42.16289',
  'lon': '-123.64812',
  'name': 'Cave Junction',
  'admin1': 'Oregon',
  'admin2': 'Josephine County',
  'cc': 'US'}]

Finally, there's quite a performance overhead in running this function:

%timeit reverse_geocoder.search((41.9950887,-123.7249003))
799 ms ± 126 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Running the validator in the model itself means this will add 800ms to each save we make for a Location - which is a big problem for our bulk data import APIs like /api/importLocations and /api/updateLocations which could be run against hundreds of locations at once.

@simonw
Copy link
Collaborator

simonw commented Apr 21, 2021

Rather than implementing this as validation logic in VIAL, I suggest we run this as a separate process. We could set something up that runs on a schedule to verify that locations in VIAL aren't dramatically outside their suggested boundaries - using either this library or some other mechanism, such as the API I built at https://us-counties.datasette.io/counties/county_for_latitude_longitude?longitude=-123.7249003&latitude=41.9950887 for resolving a location to a county and state based on geographic polygons from the U.S. census.

@simonw
Copy link
Collaborator

simonw commented Apr 21, 2021

This will be a lot easier once I add the "stream all results" mechanism to the search API in #367 - since you'll be able to do things like retrieve all 10,000 CA locations in one stream of newline-delimited GeoJSON and feed that into a script that geocodes them all.

@shajith
Copy link
Contributor Author

shajith commented Apr 21, 2021

All that makes sense. This IS a terribly large dependency for this use case, and I had my misgivings about that. I do want to catch wildly off values during manual data entry though, but I'm happy wait to see whether this becomes a larger problem.

How should I go about setting up a separate process to verify or geocode lat/lng values on locations? Do we have any precedent for that in the codebase? It seems generally useful to be able to run that sort of thing on a schedule over the data.

@simonw
Copy link
Collaborator

simonw commented Apr 21, 2021

We could definitely make an improvement here for manual data entry - especially if we could show people a warning there but let them override it (for weird edge cases like a cruise ship offering vaccinations just off the coast of Puerto Rico or whatever)

For that I'd suggest some element of client-side validation in the admin interface itself. I'd like us to show the lat/Lon there on a map and let people drag to adjust the marker, which could relate into this too.

@shajith
Copy link
Contributor Author

shajith commented Apr 21, 2021

Validating client-side seems sufficient for my purposes. I can go look for ways to inject JS into django admin. Would it be appropriate to use the datasette API you linked above for this purpose (as opposed to google maps or something)?

@shajith shajith marked this pull request as draft April 22, 2021 01:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants