---
marp: true
theme: freud
_class: lead
class: default
footer: Building Open Climate Change Information Services in Python
header: PyCon Lithuania 2024
author: Trevor James Smith
paginate: true
backgroundColor: white
transition: fade
size: 58140
style: |
  footer {
    left: 5%;
    font-size: 20px;
    text-shadow: 0px 0px 10px #fff;
  }
  header {
    right: 10%;
    left: 60%;
    text-align: right;
    font-size: 20px;
    text-shadow: 0px 0px 10px #fff;
  }
  img[alt~="center"] {
    display: block;
    margin: 0 auto;
  }
  .container {
    display: flex;
  }
  .col {
    flex: 1;
  }
---
- Trevor James Smith
- PyCon Lithuania, April 4th, 2024
- Vilnius, Lithuania
<style scoped> li {font-size: 30px;} </style>
- Who am I? / What is Ouranos?
- What's our context?
- Climate Services?
- `xclim`: climate operations
- `finch`: `xclim` as a Service
- Climate WPS Frontends
- Open Source Climate Services
- Acknowledgements
<style scoped> p {font-size: 30px;} </style>
Trevor James Smith
github.com/Zeitsperre [email protected]
- Research software developer/packager/maintainer from Montréal, Québec, Canada 🇨🇦
- Studied climate change impacts on wine viticulture 🍇 in Southern Québec
- Making stuff with Python 🐍 for ~6.5 years
- Citizen of the Republic of Užupis 🖐️ (since 2024)
<style scoped> p { font-size: 18px; text-align: right; } </style>
What is Ouranos? 🌀
- Non-profit research consortium established in 2003 in Montréal, Québec, Canada
- Created in response to the January 1998 North American Ice Storm 🌨️
- Climate Change Adaptation Planning
- Climate Model Data Producer/Provider
- Climate Information Services
Photo credit: https://www.communitystories.ca/v2/grand-verglas-saint-jean-sur-richelieu_ice-storm/
- Climate Change is having major impacts on Earth's environmental systems
- IPCC: the global average temperature has increased by more than 1.1 °C since the 1850s
- Warming of more than 1.5 °C is considered to be beyond a safe limit
<style scoped> footer { position: absolute; bottom: 3%; font-size: 15px; } </style>
Climate science is a "Big Data" problem
- New climate models being developed every year
- More climate simulations being produced every day
- Higher resolution input and output datasets (gridded data)
- Specialised analyses and more personalised user needs
- Tailoring objectives and information to different user needs
- Providing access to climate information
- Building local mitigation/adaptation capacity
- Offering training and support
- Making sense of Big Climate Data
<style scoped> li {font-size: 30px;} </style>
Climate Indicators, e.g.:
- Hot Days (days with temperature ≥ 22 °C) 🌡️
- Beginning / End / Length of the growing season 🌷
- Average seasonal rainfall (3-month moving average precipitation) ☔
- Many more examples (see the sketch below)
Planning Tools, e.g.:
- Maps 🗺️
- Point estimates at geographic locations 📈
- Gridded values 🌐
- Not really sure what they need? ❓ ➔ Guidance from experts!
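To make the "Hot Days" indicator above concrete, here is a minimal sketch (plain `xarray`, not the `xclim` implementation) that counts hot days per year from a synthetic daily temperature series; the data and the 22 °C threshold are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily temperature (°C) -- purely illustrative values
time = pd.date_range("2022-01-01", "2023-12-31", freq="D")
tas = xr.DataArray(
    12 - 15 * np.cos(2 * np.pi * time.dayofyear / 365) + 3 * np.random.randn(time.size),
    coords={"time": time},
    dims="time",
)

# "Hot Days": number of days at or above 22 °C in each calendar year
hot_days = (tas >= 22).resample(time="YS").sum()
print(hot_days.values)  # one count per year
```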
- `MATLAB`-based in-house libraries (proprietary 💰)
- No source code review
- Issues with data storage / access / processing
- Small team unable to meet demand 😫
- Lack of output data uniformity between researchers ⁉️
- Lots of bugs 🐛 and human error 🙅
- Data analysis/requests served manually ⏳
- Software testing + data validation? Not really. 😱
What does it need to perform?
- Climate Indicators
- Units management
- Metadata management
- Ensemble statistics
- Bias adjustment
- Data quality assurance checks
Implementation goals?
- Operational: capable of handling very large ensembles of climate data
- Foolproof: automatic verification of data and metadata validity by default
- Extensible: flexible in use and able to easily provide custom indicators, as needed
- Yes
- Robust, trustworthy, and fast scientific Python libraries
- Python's Readability / Reviewability (Peer Review)
- Growing demand for climate services / products
- Let the users help themselves
- The timing was right
- Internal and external demand for common tools
- Less time writing code, more time spent doing research
<style scoped> h2{ position: absolute; top: 7%; } li { position: absolute; bottom: 10%; font-size: 35px; } </style>
- Data Structure
- Algorithms
- Data and Metadata Conventions
<style scoped> h1 { position: absolute; bottom: 45%; } p { position: absolute; bottom: 10%; } </style>
~1625 tests (baseline)
+ Doctests
+ Jupyter Notebook tests
+ Optional module tests
+ Multiplatform/Anaconda Python tests
+ ReadTheDocs (`fail-on-warning: true`)
import xarray

from xclim.core.units import declare_units


@declare_units(snd="[length]")
def snow_depth(
    snd: xarray.DataArray,
    freq: str = "YS",
) -> xarray.DataArray:
    """Mean of daily average snow depth.

    Resample the original daily mean snow depth series by taking the mean over each period.

    Parameters
    ----------
    snd : xarray.DataArray
        Mean daily snow depth.
    freq : str
        Resampling frequency.

    Returns
    -------
    xarray.DataArray, [same units as snd]
        The mean daily snow depth at the given time frequency.
    """
    return snd.resample(time=freq).mean(dim="time").assign_attrs(units=snd.units)
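A minimal, hypothetical usage sketch of the `snow_depth` function above on synthetic data: the `@declare_units` decorator checks that the input carries units of length, and the monthly means keep the `units` attribute. The centimetre series below is invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray

# One year of synthetic daily snow depth, in centimetres
time = pd.date_range("2023-01-01", periods=365, freq="D")
snd = xarray.DataArray(
    np.random.gamma(2.0, 5.0, size=time.size),
    coords={"time": time},
    dims="time",
    attrs={"units": "cm"},
)

# `declare_units` verifies that `snd` has length units before computing;
# the result keeps the "cm" units attribute.
monthly_snd = snow_depth(snd, freq="MS")
print(monthly_snd.units)  # -> "cm"
```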
`indicators` (End-User API)
- Metadata standards checks
- Data quality checks
- Time frequency checks
- Missing data compliance
- Calendar compliance

`indices` (Core API)
- For users who do not need the standards and quality checks (see the sketch below)
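A rough sketch of the difference between the two entry points, using synthetic single-point data (the temperature series and its attributes are invented for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr
import xclim

# Synthetic daily mean temperature (Kelvin) for one full year
time = pd.date_range("2023-01-01", periods=365, freq="D")
tas = xr.DataArray(
    283 - 15 * np.cos(2 * np.pi * np.arange(time.size) / 365),
    coords={"time": time},
    dims="time",
    attrs={"units": "K", "standard_name": "air_temperature"},
)

# `indicators` entry point: metadata/missing-value checks, CF-compliant output attributes
gdd_checked = xclim.atmos.growing_degree_days(tas=tas, thresh="5 degC", freq="YS")

# `indices` entry point: the bare computation, no checks
gdd_raw = xclim.indices.growing_degree_days(tas, thresh="5 degC", freq="YS")
```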
import xarray as xr
import xclim
from clisops.core import subset

# Gridded dataset containing "tas" (Kelvin) and "tas_F" (Fahrenheit) variables
ds = xr.open_dataset("my_dataset.nc")

# Data is in Kelvin, threshold is in Celsius, and other combinations
# Extract a single point location for the example
ds_pt = subset.subset_gridpoint(ds, lon=-73, lat=44)

# Calculate indicators with different units
# Kelvin and Celsius
out1 = xclim.atmos.growing_degree_days(tas=ds_pt.tas, thresh="5 degC", freq="MS")
# Fahrenheit and Celsius
out2 = xclim.atmos.growing_degree_days(tas=ds_pt.tas_F, thresh="5 degC", freq="MS")
# Fahrenheit and Kelvin
out3 = xclim.atmos.growing_degree_days(tas=ds_pt.tas_F, thresh="278.15 K", freq="MS")
<style scoped> img { position: absolute; box-shadow: 0px 0px 10px rgba(0, 0, 0, 0.5); left: 8%; size: 90%; } </style>
import xarray as xr
import xclim
from clisops.core import subset

# Gridded dataset containing "tas" (Kelvin) and "tas_F" (Fahrenheit) variables
ds = xr.open_dataset("my_dataset.nc")

# Data is in Kelvin, threshold is in Celsius, and other combinations
# Extract a single point location for the example
ds_pt = subset.subset_gridpoint(ds, lon=-73, lat=44)

# Calculate indicators with different units
# Kelvin and Celsius
out1 = xclim.atmos.growing_degree_days(tas=ds_pt.tas, thresh="5 degC", freq="MS")
# Fahrenheit and Celsius
out2 = xclim.atmos.growing_degree_days(tas=ds_pt.tas_F, thresh="5 degC", freq="MS")
# Fahrenheit and Kelvin
out3 = xclim.atmos.growing_degree_days(tas=ds_pt.tas_F, thresh="278.15 K", freq="MS")
import xarray as xr
import xclim

# Dataset assumed to contain a daily minimum temperature variable "tasmin"
ds = xr.open_dataset("my_dataset.nc")

with xclim.set_options(
    # Drop time steps with more than 5% of missing data
    check_missing="pct",
    missing_options=dict(pct={"tolerance": 0.05}),
    # Add French-language metadata
    metadata_locales=["fr"],
):
    # Calculate annual Frost Days (days with minimum temperature < 0 °C)
    FD = xclim.atmos.frost_days(tasmin=ds.tasmin, freq="YS")
<style scoped> img { box-shadow: 0px 0px 10px rgba(0, 0, 0, 0.5); left: 15%; position: absolute; top: 20%; width: 70%; } </style>
import xarray as xr
import xclim

# Dataset assumed to contain a daily minimum temperature variable "tasmin"
ds = xr.open_dataset("my_dataset.nc")

with xclim.set_options(
    # Drop time steps with more than 5% of missing data
    check_missing="pct",
    missing_options=dict(pct={"tolerance": 0.05}),
    # Add French-language metadata
    metadata_locales=["fr"],
):
    # Calculate annual Frost Days (days with minimum temperature < 0 °C)
    FD = xclim.atmos.frost_days(tasmin=ds.tasmin, freq="YS")
<style scoped> h2 { position: absolute; top: 10%; } p { bottom: 8%; position: absolute; } </style>
Average temperature over 1991-2020, averaged across 14 Regional Climate Models (extreme warming scenario: SSP3-7.0)
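For context, a minimal, hypothetical sketch of how such a multi-model average can be assembled with `xclim.ensembles`; fourteen synthetic annual series stand in for the RCM simulations here.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xclim import ensembles

# Fourteen synthetic "model" time series standing in for the RCM ensemble
time = pd.date_range("1991-01-01", periods=30, freq="YS")
members = [
    xr.Dataset(
        {
            "tas": xr.DataArray(
                288 + np.random.randn(time.size),
                coords={"time": time},
                dims="time",
                attrs={"units": "K"},
            )
        }
    )
    for _ in range(14)
]

# Stack the members along a new "realization" dimension, then reduce
ens = ensembles.create_ensemble(members)
ens_stats = ensembles.ensemble_mean_std_max_min(ens)
ens_pcts = ensembles.ensemble_percentiles(ens, values=[10, 50, 90])
```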
- Model `train`/`adjust` approach (see the sketch after this list)
- Non-standard calendar (`cftime`) support in `xarray.groupby`
- Quantile methods in `xarray.groupby`
- Non-standard calendar conversion migrated from `xclim` to `xarray`
- Climate and Forecasting (CF) unit definitions inspired by `MetPy`
- Inspiring work in `cf-xarray`
- Weighted variance, standard deviations, and quantiles in `xarray` (for ensemble statistics)
- Faster `NaN`-aware quantiles in `numpy`
- Initial polyfit function in `xarray`
- Also, we help maintain `xESMF`, `intake-esm`, `cf-xarray`, `xncml`, `climpred`, and other `xclim`-related tools
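The `train`/`adjust` pattern mentioned in the first item above, sketched with `xclim.sdba` on synthetic data (a recent xclim with the `sdba` module is assumed; the quantile-mapping settings and values are illustrative only):

```python
import numpy as np
import pandas as pd
import xarray as xr
from xclim import sdba

# Synthetic daily reference observations and a warm-biased "model" series
time = pd.date_range("1981-01-01", "2010-12-31", freq="D")
ref = xr.DataArray(
    283 - 15 * np.cos(2 * np.pi * time.dayofyear / 365) + np.random.randn(time.size),
    coords={"time": time},
    dims="time",
    attrs={"units": "K"},
)
hist = (ref + 2.0 + 0.5 * np.random.randn(time.size)).assign_attrs(units="K")

# Train a monthly empirical quantile mapping on the overlap period...
eqm = sdba.EmpiricalQuantileMapping.train(ref, hist, nquantiles=20, group="time.month")
# ...then adjust the simulated series (here, the same biased series)
scen = eqm.adjust(hist)
```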
- There's just too much data that we need to crunch:
  - The data could be spread across servers globally
  - Local computing resources are not powerful enough for the analyses
- The user knows programming, but not Python:
  - A biologist who uses `R` or a different program for their work
  - An engineer who just needs a range of estimates for future rainfall
- The user just wants to see some custom maps:
  - An agronomist who is curious about average growing conditions in 10 years?
- WMS: Web Map Service
  - e.g. Google Maps
- WFS: Web Feature Service
- WCS: Web Coverage Service
- WPS: Web Processing Service
  - Running geospatial analyses over the internet
<style scoped> h1 { position: absolute; top: 10%; } h3 { position: absolute; bottom: 10%; } h4 { position: absolute; top: 17%; right: 10%; } </style>
from birdy import WPSClient
wps = WPSClient("https://ouranos.ca/example/finch/wps")
# Using the OPeNDAP protocol
remote_dataset = "www.exampledata.lt/climate.ncml"
# The indicator call looks a lot like the one from `xclim` but
# passing a url instead of an `xarray` object.
response = wps.growing_degree_days(
remote_dataset,
thresh='10 degC',
freq='MS',
variable='tas'
)
# Returned as a streaming `xarray` data object
out = response.get(asobj=True).output_netcdf
out.growing_degree_days.plot(hue='location')
Bird-house/birdy -> PyWPS Helper Library
<style scoped> img { box-shadow: 0 0 10px rgba(0, 0, 0, 0.5); left: 10%; position: absolute; top: 15%; width: 80%; } </style>
from birdy import WPSClient
finch_url = "https://ouranos.ca/example/finch/wps"  # same Finch endpoint as before
wps = WPSClient(finch_url)
# Using the OPeNDAP protocol
remote_dataset = "www.exampledata.lt/climate.ncml"
# The indicator call looks a lot like the one from `xclim` but
# passing a url instead of an `xarray` object.
response = wps.growing_degree_days(
remote_dataset,
thresh='10 degC',
freq='MS',
variable='tas'
)
# Returned as a streaming `xarray` data object
out = response.get(asobj=True).output_netcdf
out.growing_degree_days.plot(hue='location')
Bird-house/birdy -> PyWPS Helper Library
<style scoped> h1 { background-color: white; border-radius: 30px; font-size: 40px; left: 5%; opacity: 80%; padding: 16px; position: absolute; right: auto; top: 35%; } h2 { background-color: white; border-radius: 30px; font-size: 40px; left: 10%; opacity: 80%; padding: 16px; position: absolute; right: auto; top: 50%; } </style>
- Open Source Python libraries (`numpy`, `sklearn`, `xarray`, etc.)
- Multithreading and streaming data formats (e.g. `OPeNDAP` and `Zarr`)
- Common tools built collaboratively and shared widely (`xclim`, `finch`)
- Docker-deployed Web-Service-based infrastructure
- Testing, CI/CD pipelines, and validation workflows
- Peer-reviewed software (pyOpenSci and JOSS)
<style scoped> li { font-size: 20px; } h1 { background: linear-gradient(#FFB81C, #FFB81C) top, linear-gradient(#046A38, #046A38) center, linear-gradient(#BE3A34, #BE3A34) bottom; background-size: 100% 33.33%; background-repeat: no-repeat; color: white; font-size: 75px; height: 12%; text-align: center; top: 100%; } </style>
- Pascal Bourgault
- David Huard
- Travis Logan
- Abel Aoun
- Juliette Lavoie
- Éric Dupuis
- Gabriel Rondeau-Genesse
- Carsten Ehbrecht
- Long Vu
- Sarah Gammon
- David Caron
- ...and many more contributors!
Have a great rest of PyCon Lithuania! 🇱🇹
This presentation: https://zeitsperre.github.io/PyConLT2024/