
New options: Percent coverage selection and weighting #136

Open · sgoodm wants to merge 26 commits into perrygeo:master from sgoodm:percent_cover
Conversation

@sgoodm (Contributor) commented Nov 21, 2016

Adds code to generate percent coverage estimates for raster cells by rasterizing vector features at a finer resolution than the raster data, then aggregating back to the raster data's resolution. An adjustable scale (the `percent_cover_scale` int arg) controls how accurate the estimates are: a scale of 10 means one order of magnitude finer resolution and is generally good enough to get estimates within 10% of actual; a scale of 100 will usually get you well under 1%, but is slower and requires quite a bit more memory*.
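For readers unfamiliar with the technique, here is a minimal sketch of the rasterize-finer-then-aggregate idea, assuming rasterio, numpy, and the affine package; the function `pct_cover` and its signature are illustrative, not the PR's actual API:

```python
import numpy as np
from affine import Affine
from rasterio import features

def pct_cover(geom, affine, shape, scale=10):
    # Rasterize the geometry onto a grid `scale` times finer than the target.
    fine_affine = affine * Affine.scale(1.0 / scale)
    fine_shape = (shape[0] * scale, shape[1] * scale)
    fine = features.rasterize(
        [(geom, 1)], out_shape=fine_shape, transform=fine_affine,
        fill=0, dtype='uint8')
    # Aggregate each scale x scale block back to the original resolution;
    # the block mean is the fraction of the coarse cell covered.
    return fine.reshape(shape[0], scale, shape[1], scale).mean(axis=(1, 3))
```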

Users can use these coverage estimates either to discard cells that do not meet a minimum coverage (the `percent_cover_selection` float arg) or to weight cells and generate (potentially) more accurate mean, count, and sum stats (the `percent_cover_weighting` bool arg).

*I have some misc memory improvements that I can put into another pull request.

(I can take care of merging this PR with the latitude correction PR I submitted, if you want to use both of them.)
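As a rough illustration of the weighting option described above, here is a hedged sketch of coverage-weighted count/sum/mean, assuming `values` is a masked array of raster data for one feature and `pct` holds fractional cover per cell (e.g. from the sketch above); this is not the PR's actual implementation:

```python
import numpy as np

def weighted_stats(values, pct):
    # Apply the raster's mask to the coverage weights so masked cells drop out.
    w = np.ma.MaskedArray(pct, mask=values.mask)
    return {
        'count': float(w.sum()),                      # fractional cell count
        'sum': float((values * w).sum()),             # coverage-weighted sum
        'mean': float((values * w).sum() / w.sum()),  # coverage-weighted mean
    }
```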

@coveralls commented Nov 21, 2016

Coverage Status

Coverage decreased (-6.4%) to 91.328% when pulling ba627d7 on sgoodm:percent_cover into 6bdb524 on perrygeo:master.

@coveralls commented Jan 11, 2017

Coverage Status

Coverage decreased (-5.9%) to 91.837% when pulling 6725da1 on sgoodm:percent_cover into 6bdb524 on perrygeo:master.

```python
if percent_cover_selection is not None:
    masked = np.ma.MaskedArray(
        fsrc.array,
        mask=(isnodata | ~rv_array | percent_cover > percent_cover_selection))
```
Reviewer:

This seems to be wrong - `percent_cover` is a boolean, and shouldn't the inequality go the other way? Right now it excludes values that are higher than the selection criterion. But maybe I am just not reading it correctly...

@sgoodm (Contributor, Author) replied:

Right, that should be `rv_array < percent_cover_selection`.

```python
else:
    masked = np.ma.MaskedArray(
        fsrc.array,
        mask=(isnodata | ~rv_array))
```
Reviewer:

This will raise a TypeError if `rv_array` comes from `rasterize_pctcover_geom` - it won't be boolean, but will instead have dtype float32, raising `TypeError: ufunc 'invert' not supported for the input types`.

@sgoodm (Contributor, Author) replied:

Good catch. I think I can switch it to `np.logical_not(rv_array)` instead.
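For illustration, a minimal numpy demo of the failure mode and of the `np.logical_not` fix; the array values here are made up:

```python
import numpy as np

rv_pct = np.array([[0.0, 0.25], [1.0, 0.5]], dtype=np.float32)
# mask = ~rv_pct  # raises TypeError: ufunc 'invert' not supported for float input
mask = np.logical_not(rv_pct)  # works: True where coverage is exactly zero
```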

```python
else:
    rv_array = rasterize_geom(
        geom, shape=fsrc.shape, affine=fsrc.affine,
        all_touched=all_touched)
```
Reviewer:

This looks like it is asking for trouble - these shouldn't be labelled the same thing, as they have different meanings and dtypes.

@sgoodm (Contributor, Author) replied:

Agreed, will change that.

cmutel and others added 3 commits March 3, 2017 14:10

…nitions

Create separate var name for pct cover rasterized array (rv_pct_array, float)
to distinguish from basic rv_array (bool). Update masks for these to use
np.logical_not to handle non-bool data. Fix bug in pct cover selection mask
(was using wrong var and wrong logical check). Updated all rv_array references
related to pct cover cases to use new rv_pct_array.
@coveralls

Coverage Status

Coverage decreased (-6.2%) to 91.575% when pulling a03eb04 on sgoodm:percent_cover into 6bdb524 on perrygeo:master.

@coveralls

Coverage Status

Coverage increased (+0.04%) to 97.802% when pulling 7000632 on sgoodm:percent_cover into 6bdb524 on perrygeo:master.

@coveralls

Coverage Status

Coverage increased (+0.04%) to 97.806% when pulling 85f62a1 on sgoodm:percent_cover into 6bdb524 on perrygeo:master.

@sgoodm (Contributor, Author) commented Mar 24, 2017

Will need to consider how using percent cover as a selection method impacts the nodata stat. It may need to be a separate field?

@coveralls

Coverage Status

Coverage increased (+0.02%) to 97.786% when pulling 644ddc3 on sgoodm:percent_cover into 6bdb524 on perrygeo:master.

@perrygeo mentioned this pull request Mar 26, 2017
@perrygeo (Owner) left a comment

Concretely, I'd like to explore a different implementation that:

  • removes the latitude correction concept (keeps the PR focused)
  • keeps code complexity in the main function to a minimum (fewer ifs)
  • removes the breaking change to the rasterize_geom function signature
  • uses a more flexible function for upsampling (I need to look more closely at this implementation; not sure I understand it fully)

Things like docs and command line interface can come in a later PR.

src/rasterstats/utils.py: 3 outdated review threads (resolved)
```python
warnings.warn('Value for `percent_cover_scale` given ({0}) '
              'was converted to int ({1}) but does not '
              'match original value'.format(
                  percent_cover_scale, int(percent_cover_scale)))
```
@perrygeo (Owner):

Can you explain why we need to limit to integers?

@sgoodm (Contributor, Author) replied:

The reshape performed in the rebin_sum function requires ints: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html

```python
# resize-with-averaging-or-rebin-a-numpy-2d-array/8090605#8090605
def rebin_sum(a, shape, dtype):
    # Aggregate 2D array `a` down to `shape` by summing each block of cells.
    sh = shape[0], a.shape[0] // shape[0], shape[1], a.shape[1] // shape[1]
    return a.reshape(sh).sum(-1, dtype=dtype).sum(1, dtype=dtype)
```
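For context, a usage sketch showing how the block sum becomes a percent-cover estimate (rebin_sum as defined above; the scale factor and arrays below are made-up stand-ins):

```python
import numpy as np

scale = 10
fine = np.random.rand(30, 40) > 0.5   # stand-in for a fine-resolution rasterization
coarse_shape = (fine.shape[0] // scale, fine.shape[1] // scale)

# Each coarse cell sums a scale x scale block of 0/1 fine cells; dividing by
# scale**2 converts that sum into the fraction of the cell that is covered.
pct = rebin_sum(fine, coarse_shape, 'float32') / (scale ** 2)
```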
@perrygeo (Owner):

I haven't dug into this code, but why choose this implementation over other methods of resampling? Specifically, using Rasterio's resampling techniques would give us more control over the resampling method versus assuming sum: https://mapbox.github.io/rasterio/topics/resampling.html

"rebin" implies categorizing pixel values, I think "upsample" or similar would be a more accurate function name.

@sgoodm (Contributor, Author) replied Feb 5, 2018:

I don't think I looked into Rasterio's resampling methods, but I tested a couple of different implementations (one was a proof of concept you put together for Rasterio, rasterio/rasterio#232; another was a more generalized aggregation scheme I pulled from another project of mine, which had way too much overhead for what was needed here), and this method was a fair bit faster with less code. My main concern with any method here is minimizing the additional time/memory required when using this feature.

Did you have a use case in mind that would require using a method other than sum?

I am on board with renaming to something more accurate, I had just kept it similar to the original function from SO I used.

```python
            (masked.shape[1] - np.sum(masked.mask, axis=1))))
    else:
        feature_stats['mean'] = float(masked.mean())
```
@perrygeo (Owner):

The latitude correction and the percent cover add branching complexity to the code, and it detracts from readability a bit. Let's implement this with fewer if statements to test.

@sgoodm (Contributor, Author) replied:

Sounds good.

@dbaston commented Jun 20, 2018

Sorry if this is off-topic/spam, but I've been working on a similar project that I thought might be of interest. It's some C++ code, exposed as an R package, that computes percent coverage exactly (it's a vector-based calculation) but also extremely quickly (it's faster than most cell-center-based implementations). The approach is described here. It seems like it should be possible to wrap it up so that it can be called on numpy arrays.

@sgoodm (Contributor, Author) commented Aug 10, 2018

Hoping to have a chance to get back to this in the next month or two.

sgoodm added 3 commits August 31, 2018 13:24

…ic to separate util functions (could expand this to a class that covers all stat options later on)
@sgoodm (Contributor, Author) commented Aug 31, 2018

Finally had some time to come back to this:

  • stripped out the "latitude correction" stuff I had accidentally merged into the PR branch
  • kept the `like` arg in the new and existing rasterization functions
  • minimized main-function complexity by moving stat calculations related to percent cover to functions in utils

@perrygeo - let me know if you still have concerns regarding `percent_cover_scale` or the `rebin_sum` function (or anything else).

Also, I would definitely be interested in checking out a wrapper of what @dbaston has put together.

@sgoodm (Contributor, Author) commented Sep 4, 2018

Triggering CI rebuild.

@sgoodm closed this Sep 4, 2018
@sgoodm reopened this Sep 4, 2018
@perrygeo self-assigned this Sep 28, 2018
@jhamman commented Apr 24, 2019

Hi all - just checking on the status here. This seems to have stalled more than once and I'd like to see if we can help move it forward, if at all possible.

@akrherz commented Sep 23, 2019

I hate to do the annoying what's-the-status-of-this-PR thing, but here I am :) Is there some known big caveat to consider before attempting to use this change?

@perrygeo (Owner):

@akrherz I haven't tested it extensively, but I believe that, from a correctness-of-output perspective, you can rely on these changes. You could help by validating the results and posting here.

The remaining work concerns mitigating performance impacts, documentation, API changes, etc., before cutting a release. Nothing too significant, but I simply don't have the time to do the work. If you can use this branch instead of an official release, please do, and let us know how it works for you.

@davidabek1:
@sgoodm, I'm a geospatial noob, but before I saw your approach to solving weighted extraction, I saw @dbaston's exact approach, which differs from yours in that yours produces estimated weights. I took an idea from the second answer to this question: How does QGIS Zonal Statistics handle partially overlapping pixels? It is a simple approach: vectorize each pixel and intersect it with the overlay polygon; dividing the areas gives the fraction of the pixel that is covered, and dividing that by the sum of the fractions gives the weight of that pixel/poly part. I was wondering if this approach has flaws?

@sgoodm (Contributor, Author) commented Dec 23, 2019

@davidabek1 - vectorizing for exact intersections will not have any flaws in terms of accuracy, but it will be more costly in terms of computation. For a few small geometries, like in that Stack Exchange question, the additional computation probably won't matter to you, but if you want exact coverage for many large geometries, the cost of those vector-based intersections over many pixels adds up.
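For concreteness, a minimal sketch of that per-pixel vector intersection approach, assuming shapely and the affine package; the helper name `pixel_weights` is hypothetical:

```python
import numpy as np
from shapely.geometry import box

def pixel_weights(geom, affine, shape):
    # Exact fractional cover: intersect every pixel footprint with the polygon.
    # Accurate, but cost grows with the number of pixels.
    cover = np.zeros(shape)
    for row in range(shape[0]):
        for col in range(shape[1]):
            # Opposite corners of this pixel in map coordinates.
            x0, y0 = affine * (col, row)
            x1, y1 = affine * (col + 1, row + 1)
            pixel = box(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
            cover[row, col] = pixel.intersection(geom).area / pixel.area
    return cover / cover.sum()  # normalize fractions into weights
```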

@dbaston commented Jan 22, 2020

@davidabek1 @sgoodm The approach of https://github.com/isciences/exactextract is equivalent to the Stack Exchange link; the difference is that exactextract does the vector-based intersections in a way that takes advantage of the fact that the raster isn't an arbitrary polygon - it's a grid. Done this way, there is no performance penalty for the exact calculation; in many cases it is faster because of all the avoided point-in-polygon tests.
