Improve Portal Documentation (Data Catalog, How it Works, ...) #29

Open
davidgasquez opened this issue Jan 5, 2024 · 8 comments
Labels
documentation Improvements or additions to documentation

Comments

@davidgasquez
Owner

We should expose both schemas and samples for all the curated datasets.

This will improve UX and make choosing datasets easier!

davidgasquez added the documentation label on Jan 5, 2024
davidgasquez self-assigned this on Jan 5, 2024
@davidgasquez
Owner Author

@DistributedDoge suggested publishing a notebook that loops over the tables (or whatever the export dir is), fetches the schema from the .parquet files, and surfaces that.
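
A rough sketch of what such a notebook cell could look like, not the code that was actually merged. It assumes pyarrow is installed and that the exported tables sit as .parquet files in a single directory; the function names and the Markdown layout are made up for illustration.

```python
from pathlib import Path


def collect_schemas(export_dir):
    """Map each Parquet file in export_dir to a list of (column, type) pairs."""
    # Imported lazily so render_catalog below can be used without pyarrow.
    import pyarrow.parquet as pq

    schemas = {}
    for path in sorted(Path(export_dir).glob("*.parquet")):
        # read_schema only reads the file footer, not the data itself.
        schema = pq.read_schema(str(path))
        schemas[path.stem] = [
            (name, str(dtype)) for name, dtype in zip(schema.names, schema.types)
        ]
    return schemas


def render_catalog(schemas):
    """Render the collected schemas as Markdown, one table per dataset."""
    lines = []
    for table, columns in schemas.items():
        lines += [f"## {table}", "", "| Column | Type |", "| --- | --- |"]
        lines += [f"| {name} | {dtype} |" for name, dtype in columns]
        lines.append("")
    return "\n".join(lines)
```

In a notebook, the last cell could display the result of render_catalog(collect_schemas(export_dir)), where export_dir is wherever the pipeline writes its tables.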

@DistributedDoge
Collaborator

The code has landed; the catalog will be built the next time you update GitHub Pages by running make publish.

Just remember to run make run first so that the notebook can access the table schemas from which the catalog is built.

@davidgasquez
Owner Author

The code has landed; the catalog will be built the next time you update GitHub Pages by running make publish. 🤦

I thought the website was published with each push! 🤷 I got it confused with the Filecoin one.

Created an issue now: #36

@davidgasquez
Owner Author

@DistributedDoge
Collaborator

Pretty neat. Three things I will try to add later:

  • a download link for each file, made easier thanks to the stable links from Add IPNS support #18
  • file size and row count, so folks know how much data they have on hand
  • better display for columns with a nested schema
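
As a rough idea of how the last two items might be computed (a sketch only, assuming pyarrow is available; the helper names and the dotted-path display for nested columns are illustrative choices, not anything already in the repo):

```python
from pathlib import Path


def human_size(n_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def file_stats(path):
    """Return the on-disk size and row count of one Parquet file."""
    import pyarrow.parquet as pq  # lazy import: only needed for Parquet access

    path = Path(path)
    # read_metadata reads the footer only, so this stays cheap for big files.
    rows = pq.read_metadata(str(path)).num_rows
    return {"size": human_size(path.stat().st_size), "rows": rows}


def flatten_fields(name, dtype):
    """Yield (dotted_name, type) pairs, expanding struct and list columns."""
    import pyarrow as pa

    if pa.types.is_struct(dtype):
        for i in range(dtype.num_fields):
            child = dtype.field(i)
            yield from flatten_fields(f"{name}.{child.name}", child.type)
    elif pa.types.is_list(dtype):
        # Mark list elements with [] so the nesting stays visible.
        yield from flatten_fields(f"{name}[]", dtype.value_type)
    else:
        yield name, str(dtype)
```

Flattening nested columns into dotted names keeps the catalog a plain two-column table; an indented tree would be another option.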

@davidgasquez
Owner Author

Sharing it here so I remember in the future.

Would be awesome to aim for something like this: https://py-code.org/datasets

Nice UX and UI!

davidgasquez changed the title from "Surface Data Catalog in the website" to "Improve website Data Catalog" on Jan 8, 2024
@DistributedDoge
Collaborator

Also, on the catalog side:

  • Detailed descriptions for the 3 most-used tables (rounds, projects, rounds_votes)
  • An introductory paragraph (data comes mainly from the indexer, is updated at least weekly, etc.)

Inspiration:

https://docs.passport.gitcoin.co/building-with-passport/passport-api/data-dictionary

@davidgasquez
Owner Author

Cool find! I think we can do something similar with Dagster asset metadata.

Similar to what Subsets does.

Not sure how to deal with dbt models though! Perhaps we can extract the docs from the YAML files or, even better, make Dagster understand dbt docs. 🤔
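
For the "extract the docs from the YAML files" option, a minimal sketch could look like the following. It assumes PyYAML is installed and that the dbt property files follow the usual layout (a top-level models list whose entries have name, description, and columns); the function names and the models_dir argument are hypothetical.

```python
from pathlib import Path


def extract_model_docs(properties):
    """Pull model and column descriptions out of one parsed dbt properties file."""
    docs = {}
    for model in (properties or {}).get("models") or []:
        docs[model["name"]] = {
            "description": model.get("description", ""),
            "columns": {
                col["name"]: col.get("description", "")
                for col in model.get("columns") or []
            },
        }
    return docs


def load_dbt_docs(models_dir):
    """Collect descriptions from every .yml/.yaml file under models_dir."""
    import yaml  # PyYAML, imported lazily so extract_model_docs has no dependency

    docs = {}
    for pattern in ("*.yml", "*.yaml"):
        for path in sorted(Path(models_dir).rglob(pattern)):
            docs.update(extract_model_docs(yaml.safe_load(path.read_text())))
    return docs
```

The resulting dictionary could then be merged into the catalog page next to each table's schema. Making Dagster read the dbt docs directly would avoid this extra parsing step, so treat this as a fallback.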

davidgasquez changed the title from "Improve website Data Catalog" to "Improve Portal Documentation (Data Catalog, How it Works, ...)" on Apr 8, 2024