16 changes: 16 additions & 0 deletions .github/workflows/black.yml
@@ -0,0 +1,16 @@
name: Black

on: push


We can use a concurrency block to cancel in-progress runs if this one fails. It'd look something like:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.event.number || github.ref }}
  cancel-in-progress: true
```

jobs:
  black:

    runs-on: ubuntu-24.04

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - uses: psf/black@stable
        with:
          options: ". --check"
          version: "25.9.0"
23 changes: 23 additions & 0 deletions .github/workflows/docker-publish.yml
@@ -0,0 +1,23 @@
name: Docker Publishing

on:
  push:
    branches:
      - '*'
    tags:
      - '[0-9]+.[0-9]+.[0-9]+'

jobs:
  publish:

    runs-on: ubuntu-24.04

    steps:
      - uses: actions/checkout@v2
      - name: Publish to Registry
        uses: docker/build-push-action@v1
        with:
          username: ${{ secrets.pcicdevops_at_dockerhub_username }}
          password: ${{ secrets.pcicdevops_at_dockerhub_password }}
          repository: pcic/ncpartitioner
          tag_with_ref: true
35 changes: 35 additions & 0 deletions .github/workflows/python-ci.yml
@@ -0,0 +1,35 @@
name: Python CI

on: push

jobs:
  test:

    runs-on: ubuntu-22.04

    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9


Q already mentioned it, but here's a code example from pycds using the Python version matrix: https://github.com/pacificclimate/pycds/blob/master/.github/workflows/python-ci.yml#L10
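Following the pycds link above, a matrix setup for this job would look roughly like the sketch below (the version list is illustrative, not a project decision):

```yaml
jobs:
  test:
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          # each matrix entry gets its own job with this version
          python-version: ${{ matrix.python-version }}
```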


      - name: Install OS Dependencies
        run: |
          sudo apt update
          sudo apt install python3-dev
          sudo apt-get install nco
          sudo apt-get install curl

      - name: Install Poetry
        run: |
          pip install poetry==2.2

      - name: Install project
        run: |
          poetry install

      - name: Test with pytest
        run: poetry run pytest
15 changes: 15 additions & 0 deletions Dockerfile
@@ -0,0 +1,15 @@
FROM python:3.9-slim

RUN apt-get update && apt-get install -y \
    nco \
    curl

COPY . /app
WORKDIR /app

RUN pip install poetry==2.2
ENV PATH="/root/.local/bin:$PATH"
RUN poetry install

EXPOSE 5000
CMD ["poetry", "run", "gunicorn", "--workers=10", "--bind=0.0.0.0:5000", "wsgi:app", "--timeout=300"]
60 changes: 60 additions & 0 deletions README.md
@@ -0,0 +1,60 @@
# NCPartitioner

This container generates user-requested netCDF files using `ncks` and makes them available for download via THREDDS.

## Run for Development

The NCO (netCDF Operators) tools must be installed. The package itself can be installed with Poetry:

```
apt-get install nco
git clone https://github.com/pacificclimate/ncpartitioner
cd ncpartitioner
poetry install
```

End-to-end testing requires a THREDDS instance running on your workstation; the test suite itself does not. Set the following environment variables:

* `OUTPUT_DIR` - directory to put the partitioned files in; it must be accessible to THREDDS
* `THREDDS_HTTP_BASE` - base URL of the THREDDS HTTP file server (typically ending in `/fileserver`); users are redirected here to download completed files
* `THREDDS_DAP_BASE` - base URL of the THREDDS OPeNDAP server (typically ending in `/dodsC`); used to fulfill metadata requests
* `DATA_ROOT` - directory under which all data is found; files outside this directory will not be served
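For a local development setup, the variables might be set like this (paths and the THREDDS port are illustrative assumptions, not project defaults):

```shell
# Example values only; adjust for your workstation.
export OUTPUT_DIR=/storage/data/partitioned
export THREDDS_HTTP_BASE=http://localhost:8080/thredds/fileserver
export THREDDS_DAP_BASE=http://localhost:8080/thredds/dodsC
export DATA_ROOT=/storage/data
```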

Run with flask:
```
poetry run flask run
```

Run the test suite (environment variables will be provided by pytest and do not need to be set):
```
poetry run pytest
```

## Data assumptions

This server assumes all files to be downloaded are netCDF4 files with dimensions named `lat`, `lon`, and `time`, and that all variables one might wish to download have those dimensions. Timeless files or station-based geometries cannot be downloaded via this server. Only one variable may be downloaded at a time.

## Request format

Request format is indicated by concatenating an extension onto the `filepath` parameter. Some request formats require an additional `targets` parameter. Request attributes other than `targets` and `filepath` are ignored.

This server supports four request formats. Three of them are simply redirected to the THREDDS server:

### DDS request
`https://server/partition/?filepath=path/to/file.nc.dds&targets=time`

Redirects to a THREDDS page displaying metadata about the `time` dimension. This request accepts a single dimension.

### DAS request
`https://server/partition/?filepath=path/to/file.nc.das`

Redirects to a THREDDS page displaying metadata about all variables and attributes. The `targets` attribute is ignored, if present.

### ASCII request
`https://server/partition/?filepath=path/to/file.nc.ascii&targets=lat,lon`

Redirects to a THREDDS page displaying values for the requested dimension variable(s) in ASCII format. This server will only display values for dimension variables (`lat`, `lon`, and `time`) via this request type. OpenDAP standards support requesting any variable in ASCII format this way, but since THREDDS has a 500MB maximum file size for DAP requests, this server only supports requesting the dimension variables, not multidimensional data variables.

### Partition request
`https://server/partition/?filepath=path/to/file.nc.nc&targets=time[0:10],lat[0:20],lon[0:30],tasmax[0:10][0:20][0:30]`

Creates a file with the requested dimensions using `ncks`, then redirects the user to the THREDDS page to download the newly created file. Note that the variable is always trimmed to the hyperslab specified in the dimensions portion of the `targets` attribute; if the variable portion of the `targets` attribute is different, it will be overruled.
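The `targets` syntax above (`name[start:stop]`, with multiple bracketed ranges for data variables) can be parsed with a small regex. This is an illustrative sketch only; the server's actual `check_targets_slice` implementation may differ:

```python
import re

# Matches a name followed by one or more [start:stop] index ranges.
TARGET_RE = re.compile(r"(?P<name>\w+)(?P<ranges>(?:\[\d+:\d+\])+)")


def parse_targets(targets):
    """Map each target name to its list of (start, stop) index ranges."""
    parsed = {}
    for m in TARGET_RE.finditer(targets):
        ranges = re.findall(r"\[(\d+):(\d+)\]", m.group("ranges"))
        parsed[m.group("name")] = [(int(a), int(b)) for a, b in ranges]
    return parsed


parse_targets("time[0:10],lat[0:20],lon[0:30],tasmax[0:10][0:20][0:30]")
# e.g. 'tasmax' maps to [(0, 10), (0, 20), (0, 30)]
```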
15 changes: 15 additions & 0 deletions ncpartitioner/__init__.py
@@ -0,0 +1,15 @@
from flask import Flask
import logging


def create_app(config=None):
    app = Flask(__name__)
    app.config.from_object(config)
I think this is currently a no-op since we don’t pass a config. It might be worth adding a comment noting that this is intentional and keeps the app factory ready for future configuration.

    app.logger.setLevel(logging.INFO)

    with app.app_context():
        from .routes import partition

        app.register_blueprint(partition)

    return app
77 changes: 77 additions & 0 deletions ncpartitioner/response.py
@@ -0,0 +1,77 @@
"""send responses to user requests. TResponses are always a redirect to a THREDDS-served file.
In cases of DDS and DAS, the file already exists; for data requests the filemust be created first.
"""

from posixpath import dirname
import subprocess
import os
from flask import redirect
import logging

logger = logging.getLogger(__name__)


def slice(args):
    output_dir = os.getenv("OUTPUT_DIR")
    thredds_base = os.getenv("THREDDS_HTTP_BASE")

    logger.info("Slicing file")
    subprocess.run(
Should we add a timeout or capture stderr in case NCO fails or hangs?

        [
            "ncks",
            "-v",
            f"{args['variable']}",
            "-d",
            f"time,{args['time'][0]},{args['time'][1]}",
            "-d",
            f"lat,{args['lat'][0]},{args['lat'][1]}",
            "-d",
            f"lon,{args['lon'][0]},{args['lon'][1]}",
            f"/{args['dirname']}/{args['basename']}.{args['extension']}",
            os.path.join(
                output_dir,
                f"{args['basename']}_{args['timestamp']}.{args['extension']}",
            ),
        ],
        check=True,
    )

    output_filename = f"{args['basename']}_{args['timestamp']}.{args['extension']}"
    logger.info(
        f"Slice complete; file saved to {os.path.join(output_dir, output_filename)}"
    )
    logger.info(f"Sending redirect to {thredds_base}{output_dir}/{output_filename}")

    return redirect(f"{thredds_base}{output_dir}/{output_filename}")
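The reviewer's question about a timeout and stderr capture could be addressed with a wrapper along these lines. This is a sketch, not project code: the helper name `run_ncks` and the timeout value are assumptions.

```python
import subprocess


def run_ncks(cmd, timeout_s=600):
    """Run an NCO command, raising a descriptive error if it fails or hangs.

    Hypothetical helper; the timeout of 600 s is an assumed default.
    """
    try:
        result = subprocess.run(
            cmd,
            check=True,
            capture_output=True,  # collect stdout/stderr for logging
            text=True,
            timeout=timeout_s,  # abort if ncks hangs
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        raise RuntimeError(f"ncks timed out after {timeout_s}s")
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"ncks failed: {e.stderr.strip()}")
```

On failure the captured stderr from NCO ends up in the exception message, so it can be logged or returned to the user instead of being lost.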


def dap_filepath(args):
    """construct the filepath for DDS/DAS requests"""
    thredds_base = os.getenv("THREDDS_DAP_BASE")
    return f"{thredds_base}/{args['dirname']}/{args['basename']}.{args['extension']}"


def dds(args):
    filepath = dap_filepath(args)
    logger.info(f"Received DDS request: filepath={filepath}")
    if "target" in args:
        return redirect(f"{filepath}.dds?{args['target']}")
    return redirect(f"{filepath}.dds")


def das(args):
    filepath = dap_filepath(args)
    logger.info(f"Received DAS request: filepath={filepath}")

    return redirect(f"{filepath}.das")


def asc(args):
    # returns requested dimension data in ASCII format; does not return gridded data
    filepath = dap_filepath(args)
    dims = (
        args["target"] if isinstance(args["target"], str) else ",".join(args["target"])
    )
    logger.info(f"Received ASCII request: filepath={filepath}")

    return redirect(f"{filepath}.ascii?{dims}")
47 changes: 47 additions & 0 deletions ncpartitioner/routes.py
@@ -0,0 +1,47 @@
from flask import Blueprint, request, redirect
from ncpartitioner.sanitize import (
    check_filepath,
    check_targets_slice,
    check_targets_dds,
    check_ranges,
    check_targets_ascii,
)
from ncpartitioner.response import slice, dds, das, asc
import logging

logger = logging.getLogger(__name__)

partition = Blueprint("partition", __name__, url_prefix="/partition")


@partition.route("/", methods=["GET"])
def ncpartitioner():
    """creates the requested netCDF with NCO, moves it to where THREDDS can serve it, and returns a link to the user"""
    logger.info(f"received request {request.url}")
    filepath = request.args.get("filepath")
    targets = request.args.get("targets", None)

    try:
        args = check_filepath(filepath)
    except ValueError as ve:
        logger.error(f"Input error: {ve}")
        return f"Input error: {ve}", 400

    if args["request_format"] == "dds":
        args.update(check_targets_dds(targets, args))
        return dds(args)
    elif args["request_format"] == "das":
        return das(args)
    elif args["request_format"] == "nc":
        try:
            args.update(check_targets_slice(targets))
            check_ranges(args)
        except ValueError as ve:
            logger.error(f"Input error: {ve}")
            return f"Input error: {ve}", 400

        logger.info(f"Received slice request: filepath={filepath}, targets={targets}")
        return slice(args)
    elif args["request_format"] in ["ascii", "asc"]:
        args.update(check_targets_ascii(targets))
        return asc(args)