This repository was archived by the owner on Jan 23, 2023. It is now read-only.
Proposed new design for /datasets API: Dataset metadata is still stored as JSON metadata in S3, but generated by a separate GitHub repo #3
Description
Background:
Currently, /v1/datasets returns dataset metadata generated by the dataset metadata generator lambda and stored on S3 (e.g. `dev-dataset-metadata.json`).
What does the metadata lambda do now?
- generates a temporal domain: the list of dates that are valid for this particular dataset
- does something similar for "sites"
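To make "temporal domain" concrete, here is a minimal Python sketch (not the generator's actual code) that derives the list of valid dates from a dataset's S3 object keys. It assumes, hypothetically, that each file's key embeds a daily date such as `2021_03_01`; the real generator's key conventions and date granularity may differ.

```python
import re
from datetime import date
from typing import Iterable, List

# Hypothetical key convention: a daily date such as 2021_03_01 appears somewhere
# in the object key. The real generator's pattern may differ.
DAY_PATTERN = re.compile(r"(\d{4})_(\d{2})_(\d{2})")


def temporal_domain(keys: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated list of ISO dates found in the S3 keys."""
    dates = set()
    for key in keys:
        match = DAY_PATTERN.search(key)
        if match is None:
            continue  # skip keys that carry no recognizable date
        year, month, day = (int(part) for part in match.groups())
        dates.add(date(year, month, day))
    return [d.isoformat() for d in sorted(dates)]


if __name__ == "__main__":
    keys = [
        "no2/no2_2021_03_02.tif",
        "no2/no2_2021_03_01.tif",
        "no2/no2_2021_03_01.tif.aux.xml",
        "no2/README.md",
    ]
    print(temporal_domain(keys))  # ['2021-03-01', '2021-03-02']
```

Sidecar files that repeat a date collapse into one entry because the dates are collected in a set before sorting.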
Goals:
- Users can PR new or updated datasets and have them automatically picked up by the datasets API (when merged to the `main` branch).
- Users can have a MosaicJSON endpoint to visualize their dataset / data collection.
Problem with this approach:
- If people want to add new layers to the dashboard, they would still need to open a PR and have it reviewed and approved.
- Alternative: people could POST new datasets to the datasets API (but this might not work).
Acceptance criteria:
- `<env>-dataset-metadata.json` stored on S3 is updated whenever a PR is merged to the new `dashboard-datasets-starter` repo.
- Config files PR'd to this new repo can include a STAC API URL and query parameters to generate a mosaic. The lambda will generate the mosaic endpoint and include that endpoint in `<env>-dataset-metadata.json`.
- For MAAP: a user can create a PR to `dashboard-datasets-maap` for data with an existing tiles endpoint, and it will add a layer to the dashboard (once merged).
- For MAAP: a user can create a PR to `dashboard-datasets-maap` for an SRTM data mosaic (using a STAC API and query parameters) to add an SRTM layer to the dashboard (once merged).
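One possible shape for a STAC-backed config, and the search request the lambda could derive from it, as a Python sketch. The config fields (`source`, `type`, `stac_api_url`, `query`), the `srtm` collection id and the `stac.example.com` URL are hypothetical; only the search body keys (`collections`, `bbox`, `limit`) come from the STAC API item-search specification. Submitting the results to a mosaic service (e.g. `POST /mosaic`) is left out because that API is not settled in this issue.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical config for a STAC-backed layer (e.g. an SRTM mosaic). The field
# names, collection id and URL are placeholders for the config-design discussion.
EXAMPLE_CONFIG = """
{
  "id": "srtm",
  "name": "SRTM elevation",
  "source": {
    "type": "stac",
    "stac_api_url": "https://stac.example.com",
    "query": {"collections": ["srtm"], "bbox": [-180, -56, 180, 60], "limit": 100}
  }
}
"""


def stac_search_request(config: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return the (url, json_body) pair for a STAC API item search (POST /search).

    The body is the config's "query" object passed through unchanged, so any
    item-search parameter (collections, bbox, datetime, limit, ...) can be used.
    """
    source = config["source"]
    if source.get("type") != "stac":
        raise ValueError(f"dataset {config['id']!r} is not STAC-backed")
    query = dict(source["query"])
    if not query.get("collections"):
        raise ValueError("a STAC-backed config must list at least one collection")
    return source["stac_api_url"].rstrip("/") + "/search", query


if __name__ == "__main__":
    url, body = stac_search_request(json.loads(EXAMPLE_CONFIG))
    print(url)   # https://stac.example.com/search
    print(body)  # {'collections': ['srtm'], 'bbox': [-180, -56, 180, 60], 'limit': 100}
```

Passing the query through unchanged keeps the config close to what users would type against the STAC API directly; a config with an existing tiles endpoint would skip this step entirely.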
Proposed solution:
Tasks:
- Create a new repo in NASA-IMPACT, `dashboard-datasets(-starter)`, and reference/reuse code from the dashboard-api-starter dataset metadata generator lambda, which generates metadata via a lambda function and stores it on S3. This code could be copy/pasted, but the acceptance criterion for this first task is just to take the existing dataset config file(s) (e.g. for MODIS) stored in the same repo and update the S3 metadata file. The S3 bucket location should be configurable.
- Lambda code (which should also be removed from dashboard-api-starter): https://github.com/NASA-IMPACT/dashboard-api-starter/blob/main/lambda/dataset_metadata_generator
- https://github.com/NASA-IMPACT/dashboard-api-starter/blob/main/dashboard_api/db/static/
- Design what a revised config file should look like (e.g. one that can use STAC API and titiler endpoints)
- Include in the new repo a GitHub workflow that triggers the lambda whenever a dataset is created or updated on the `main` branch.
- The metadata generator lambda generates mosaic(s), creates or updates dataset JSON metadata, and updates the dataset JSON on S3.
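A minimal Python sketch of the first task: merge the dataset config files in the repo into a single `<env>-dataset-metadata.json` and write it to a configurable bucket. The `{"datasets": [...]}` output shape, the `METADATA_BUCKET` environment variable and the function names are assumptions for illustration, not the existing generator's format; `boto3` is only needed for the upload step.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, List


def load_configs(config_dir: Path) -> List[Dict[str, Any]]:
    """Read every *.json dataset config in config_dir (sorted by file name)."""
    return [json.loads(path.read_text()) for path in sorted(config_dir.glob("*.json"))]


def build_metadata(configs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine configs into one metadata document, rejecting duplicate dataset ids.

    The {"datasets": [...]} shape is an assumption for this sketch.
    """
    ids = [config["id"] for config in configs]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate dataset ids: {duplicates}")
    return {"datasets": sorted(configs, key=lambda config: config["id"])}


def upload_metadata(metadata: Dict[str, Any], env: str) -> str:
    """Write <env>-dataset-metadata.json to the bucket named by METADATA_BUCKET.

    METADATA_BUCKET is a hypothetical environment variable that keeps the
    bucket location configurable.
    """
    import boto3  # imported here so the pure functions above run without boto3

    bucket = os.environ["METADATA_BUCKET"]
    key = f"{env}-dataset-metadata.json"
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(metadata).encode("utf-8"),
        ContentType="application/json",
    )
    return f"s3://{bucket}/{key}"
```

On merge to `main`, the workflow (or the lambda it triggers) could call `load_configs`, `build_metadata` and `upload_metadata` in sequence; mosaic generation would slot in between loading and building.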
Improvements:
- GitHub linting of new or updated dataset config files based on some basic checks
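A sketch of the "basic checks" idea as a Python script that a GitHub workflow could run on changed config files. The required keys, the `tiles`/`stac` source types and the `tiles_url` field are hypothetical placeholders that would follow from the config design task.

```python
import json
import sys
from pathlib import Path
from typing import Any, List

# Hypothetical minimum fields; the real list depends on the config design task.
REQUIRED_KEYS = ("id", "name", "source")
SOURCE_TYPES = {"tiles", "stac"}


def lint_config(config: Any) -> List[str]:
    """Return human-readable problems found in one dataset config (empty if none)."""
    if not isinstance(config, dict):
        return ["config must be a JSON object"]
    problems = [f"missing required key {key!r}" for key in REQUIRED_KEYS if key not in config]
    source = config.get("source")
    if isinstance(source, dict):
        kind = source.get("type")
        if kind not in SOURCE_TYPES:
            problems.append(f"source.type must be one of {sorted(SOURCE_TYPES)}, got {kind!r}")
        elif kind == "tiles" and "tiles_url" not in source:
            problems.append("tiles source needs 'tiles_url'")
        elif kind == "stac" and not {"stac_api_url", "query"} <= set(source):
            problems.append("stac source needs 'stac_api_url' and 'query'")
    elif source is not None:
        problems.append("'source' must be an object")
    return problems


def main(paths: List[str]) -> int:
    """Lint each file; return non-zero so a CI job fails when any config is invalid."""
    failed = False
    for name in paths:
        try:
            config = json.loads(Path(name).read_text())
        except json.JSONDecodeError as err:
            print(f"{name}: invalid JSON ({err})")
            failed = True
            continue
        for problem in lint_config(config):
            print(f"{name}: {problem}")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

For example, a workflow step might run `python lint_configs.py datasets/*.json` (hypothetical file and directory names); the non-zero exit code fails the check before review.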
Questions:
- Can we remove all the `/sites` code for now and re-implement it when requested?
- Can we assume the generation of the temporal domain (e.g. this function https://github.com/NASA-IMPACT/dashboard-api-starter/blob/main/lambda/dataset_metadata_generator/src/main.py#L209) can and should still work?
- Is it the right approach to have the lambda call POST /mosaic, or will users always do that themselves and then PR a config file with the mosaic URL already defined?
- For MAAP, we might want to restrict datasets which can be visualized to those published in CMR. How can we quality control datasets?
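For the CMR question, one possible check is to require a collection concept ID in each MAAP config (a hypothetical `cmr_concept_id` field) and confirm that CMR's collection search returns it. This Python sketch queries CMR's `collections.json` search with the `concept_id` parameter against the public NASA CMR; a MAAP deployment may point at a different CMR instance, and a malformed ID may yield an HTTP error rather than an empty result.

```python
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NASA CMR search endpoint; a MAAP deployment may use its own CMR instance.
DEFAULT_CMR = "https://cmr.earthdata.nasa.gov/search"


def cmr_collection_url(concept_id: str, cmr_base: str = DEFAULT_CMR) -> str:
    """Build a CMR collection search URL filtered by collection concept ID."""
    return f"{cmr_base.rstrip('/')}/collections.json?{urlencode({'concept_id': concept_id})}"


def _fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (network access required)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def is_published_in_cmr(
    concept_id: str,
    cmr_base: str = DEFAULT_CMR,
    fetch: Callable[[str], dict] = _fetch_json,
) -> bool:
    """True if CMR returns at least one collection entry for this concept ID.

    `fetch` is injectable so the check can be exercised without network access.
    """
    payload = fetch(cmr_collection_url(concept_id, cmr_base))
    return len(payload.get("feed", {}).get("entry", [])) > 0
```

A linting step could call `is_published_in_cmr` for each MAAP config and fail the PR when it returns `False`, which covers "published in CMR" but not broader quality control.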
