This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Proposed new design for /datasets API: Dataset metadata is still stored as JSON in S3, but generated by a separate GitHub repo #3

@abarciauskas-bgse


Background:
Right now /v1/datasets returns datasets generated by the dataset metadata generator and stored on S3 (e.g. dev-dataset-metadata.json).
What does the metadata lambda do now?

  • Generates a temporal domain: the list of dates that are valid for a given dataset
  • Does something similar for "sites"
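
To make the temporal-domain step concrete, here is a minimal sketch of how such a list of valid dates could be derived from object keys. It is not the code in dashboard-api-starter: the function name `temporal_domain`, the `YYYY_MM_DD` filename pattern, and the output timestamp format are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumption: each data file embeds its date as YYYY_MM_DD in the key.
DATE_PATTERN = re.compile(r"(\d{4})_(\d{2})_(\d{2})")


def temporal_domain(keys):
    """Return the sorted, de-duplicated dates found in a list of object keys."""
    dates = set()
    for key in keys:
        match = DATE_PATTERN.search(key)
        if match is None:
            continue  # not a dated data file
        try:
            parsed = datetime.strptime("-".join(match.groups()), "%Y-%m-%d")
        except ValueError:
            continue  # digits matched, but they are not a real calendar date
        dates.add(parsed.strftime("%Y-%m-%dT%H:%M:%SZ"))
    # ISO-style timestamps sort chronologically when sorted as strings.
    return sorted(dates)


if __name__ == "__main__":
    print(temporal_domain(["no2/no2_2020_03_02.tif", "no2/no2_2020_03_01.tif"]))
```

In a lambda the keys would come from listing the dataset's S3 prefix; the sketch takes a plain list so the date logic can be checked on its own.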

Goals:

  • Users can PR new or updated datasets and have them automatically picked up by the datasets API (when merged to a main branch).
  • Users can have a mosaicjson endpoint to visualize their dataset / data collection.

Problem with this approach:

  • If people want to add new layers to the dashboard, they would still need to open a PR and have it reviewed and approved.
  • Alternative: people could POST new datasets to the dataset API (but these datasets might not work)

Acceptance criteria:

  • The <env>-dataset-metadata.json stored on S3 is updated whenever a PR is merged to the new dashboard-datasets-starter repo
  • Config files PR'd to this new repo can include a STAC API URL and query parameters to generate a mosaic. The lambda will generate the mosaic endpoint and include that endpoint in the <env>-dataset-metadata.json
  • For MAAP: a user can create a PR to dashboard-datasets-maap for data with an existing tiles endpoint, and the layer is added to the dashboard once the PR is merged
  • For MAAP: a user can create a PR to dashboard-datasets-maap for an SRTM data mosaic (using a STAC API and query parameters), and the SRTM layer is added to the dashboard once the PR is merged
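
The two MAAP cases above (an existing tiles endpoint versus a mosaic built from a STAC query) could be told apart from the config file alone. The sketch below shows one possible shape. The config keys (`id`, `tiles`, `stac`, `api_url`, `query`), the example URLs, and the function `resolve_tile_source` are hypothetical; the revised config format is still an open design task.

```python
def resolve_tile_source(config):
    """Decide how the tiles for one dataset config would be obtained.

    Returns a dict with mode "existing" (tiles URL already known) or
    mode "mosaic" (a mosaic must first be generated from a STAC query).
    """
    if "tiles" in config:
        return {"mode": "existing", "tiles": config["tiles"]}
    stac = config.get("stac")
    if isinstance(stac, dict) and "api_url" in stac:
        return {
            "mode": "mosaic",
            "stac_api_url": stac["api_url"],
            "query": stac.get("query", {}),
        }
    raise ValueError(
        f"dataset {config.get('id')!r} defines neither 'tiles' nor 'stac'"
    )


# Hypothetical configs; the URLs and collection name are placeholders.
existing_tiles = {
    "id": "my-layer",
    "tiles": "https://example.com/tiles/{z}/{x}/{y}.png",
}
srtm_mosaic = {
    "id": "srtm",
    "stac": {
        "api_url": "https://example.com/stac",
        "query": {"collections": ["SRTM"]},
    },
}

if __name__ == "__main__":
    print(resolve_tile_source(existing_tiles)["mode"])
    print(resolve_tile_source(srtm_mosaic)["mode"])
```

Keeping this decision in one function would let the lambda skip mosaic generation entirely for configs that already carry a tiles URL.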

Proposed solution:

MAAP viz + dashboard use case design

Tasks:

  • Create a new repo in NASA-IMPACT, “dashboard-datasets(-starter)”, and reference or reuse code from the dashboard-api-starter dataset metadata generator lambda to generate metadata via a lambda function and store it on S3. This code could be copied and pasted; the acceptance criterion for this first task is just that the lambda takes the existing dataset config file(s) (e.g. for MODIS) stored in the same repo and updates the S3 metadata file. The S3 bucket location should be configurable.
  • Design what a revised config file should look like (e.g. one that can use STAC API and titiler endpoints)
  • Include in the new repo a GitHub workflow that triggers the lambda whenever a dataset is created or updated on the main branch
  • The metadata generator lambda generates mosaic(s), creates or updates the dataset JSON metadata, and updates the dataset JSON on S3
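
As a rough illustration of the last task, the sketch below merges new or changed dataset entries into the metadata document and writes it to a configurable bucket. The top-level `datasets` list, the `id` field, and the function names are assumptions, not the existing lambda's schema. The only external call is `put_object`, invoked the way a boto3 S3 client accepts it; the client is passed in so the logic can be exercised without AWS.

```python
import json


def metadata_key(env):
    """Object key for an environment, e.g. dev-dataset-metadata.json."""
    return f"{env}-dataset-metadata.json"


def build_metadata(existing, updated_datasets):
    """Merge updated dataset entries into the existing metadata by id."""
    by_id = {d["id"]: d for d in existing.get("datasets", [])}
    for dataset in updated_datasets:
        by_id[dataset["id"]] = dataset  # create or overwrite
    return {"datasets": sorted(by_id.values(), key=lambda d: d["id"])}


def publish_metadata(s3_client, bucket, env, metadata):
    """Write the metadata JSON to S3 and return the key that was written.

    s3_client is expected to behave like boto3.client("s3").
    """
    key = metadata_key(env)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(metadata).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

In a deployed lambda the bucket name would likely come from an environment variable or stack parameter, which is one way to satisfy the "configurable bucket location" requirement.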

Improvements:

  • GitHub linting of new or updated dataset config files, based on some basic checks
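
The "basic checks" could start as a small function run in CI against each changed config file. The sketch below is one possibility only: the required fields (`id`, `name`) and the rule that a config carries exactly one of `tiles` or `stac` are assumptions pending the revised config design.

```python
# Hypothetical minimum set of fields every dataset config must carry.
REQUIRED_FIELDS = ("id", "name")


def lint_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = config.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    has_tiles = "tiles" in config
    has_stac = "stac" in config
    if has_tiles == has_stac:
        # Either both are present or neither is.
        problems.append("exactly one of 'tiles' or 'stac' is required")

    if has_stac:
        stac = config["stac"]
        api_url = stac.get("api_url", "") if isinstance(stac, dict) else ""
        if not str(api_url).startswith(("http://", "https://")):
            problems.append("stac.api_url must be an http(s) URL")
    return problems
```

A CI job could load each changed file, call `lint_config`, and fail the check if any list is non-empty, so that malformed configs are caught before review.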

Questions:

  • Can we remove all the /sites code for now and re-implement when requested?
  • Can we assume the generation of the temporal domain (e.g. this function https://github.com/NASA-IMPACT/dashboard-api-starter/blob/main/lambda/dataset_metadata_generator/src/main.py#L209) can and should still work?
  • Is it the right approach to have the lambda call POST /mosaic, or will users always do that themselves and then PR a config file with the mosaic URL already defined?
  • For MAAP, we might want to restrict datasets which can be visualized to those published in CMR. How can we quality control datasets?
