Conversation
1eda785 to
70abc7a
Compare
There was a problem hiding this comment.
Looks great! Builds fine locally (after updating Poetry) and tests run successfully (no local thredds).
I’ve now tested the demo, see my follow-up comment below. Leaving a few inline comments and two minor suggestions:
- CI is currently pinned to Python 3.9. Do we have a reason to stay there, or could we move to 3.10/3.11?
- A few workflows are still using older action versions (
actions/checkout@v2,actions/setup-python@v2,docker/build-push-action@v1). It might be worth bumping these to current releases.
| import pytest | ||
| import os | ||
|
|
||
| # WARNING: in the test suite, paths are sometimes specified absolutely and sometimes relatively. |
There was a problem hiding this comment.
Possible fix using: pathlib-parent
tests_dir = Path(__file__).resolve().parent # .../tests
os.environ["DATA_ROOT"] = str(tests_dir / "data")
| # remaining filepath must end with .nc | ||
|
|
||
| # Flask strips the leading slash from the filepath argument, so we strip it here (and add it later) | ||
| data_root = os.getenv("DATA_ROOT".lstrip("/"), "storage/") |
There was a problem hiding this comment.
I think this should be:
os.getenv("DATA_ROOT", "storage/").lstrip("/")
| thredds_base = os.getenv("THREDDS_HTTP_BASE") | ||
|
|
||
| logger.info(f"Slicing file") | ||
| subprocess.run( |
There was a problem hiding this comment.
Should we add a timeout or capture stderr in case NCO fails or hangs?
|
|
||
| def create_app(config=None): | ||
| app = Flask(__name__) | ||
| app.config.from_object(config) |
There was a problem hiding this comment.
I think this is currently a no-op since we don’t pass a config. It might be worth adding a comment noting that this is intentional and keeps the app factory ready for future configuration.
QSparks
left a comment
There was a problem hiding this comment.
Testing the demo this morning. The partition call can exceed Gunicorn’s default 30s timeout, which kills the worker. We’ll need to extend the Gunicorn timeout (and possibly add subprocess timeouts).
| - name: Set up Python | ||
| uses: actions/setup-python@v2 | ||
| with: | ||
| python-version: 3.9 |
There was a problem hiding this comment.
Q already mentioned it but heres a code example from pycds using the python version matrix: https://github.com/pacificclimate/pycds/blob/master/.github/workflows/python-ci.yml#L10
| name: Black | ||
|
|
||
| on: push | ||
|
|
There was a problem hiding this comment.
we can use a concurrency block to cancel tests if this one fails. It'd look something like
concurrency:
group: ${{ github.workflow }}-${{ github.event.number || github.ref }}
cancel-in-progress: true| # remaining filepath must end with .nc | ||
|
|
||
| # Flask strips the leading slash from the filepath argument, so we strip it here (and add it later) | ||
| data_root = os.getenv("DATA_ROOT".lstrip("/"), "storage/") |
|
|
||
| def check_filepath(filepath): | ||
| """make sure user has requested a valid file that exists""" | ||
| args = {} |
There was a problem hiding this comment.
use of a namedtuple or a dataclass here might give you better type checking over a raw dictionary
Creates a simple translation app to sit between the PDP backend and THREDDS and handle the needs of the PDP to access netCDF files in various ways. The primary issue here is that THREDDS's streaming download has a maximum OpenDAP file size of 500 MB, but users often want to download larger files. Requests under 500MB are redirected to THREDDS; requests larger than 500MB are answered by using
ncksto create the requested data subset in a new file, and forwarding the user to THREDDS' whole file server to download the new file.tasmaxandpr, which can be larger than 500MB, are not allowed.ncksand then redirecting the user's browser to THREDD's whole-file download API, which does not have a size limit.At present, it is easy to run this app out of RAM during the file-creation step; improving RAM usage is an urgently needed next step, but I felt this PR was long enough already, so those improvements will be a separate PR.
Here is a demo PDP stack using this app. Please fool around and find requests the PDP generates that aren't covered yet!
Resolves #1