Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ⚡️ add job for dune upload #42

Merged
merged 1 commit into from
Jan 11, 2024

Conversation

DistributedDoge
Copy link
Collaborator

@DistributedDoge DistributedDoge commented Jan 9, 2024

This PR builds an op and three jobs to see if we can play nicely with new dagster primitives and address #38.

It does not modify existing workflow in any way (or at least I hope it doesn't), but allows us to use new CLI commands:

dagster job list -m ggdp
dagster job execute -m ggdp -j build_indexer_assets

For dagster UI navigate to Overview =>Jobs to see new additions:

  • build all materializes everything
  • build_indexer_assets builds assets provided by indexer, except votes for fast testing
  • refresh dune materializes an asset and then uses op to upload said asset onto Dune.

To succeed job needs envar DUNE_API_KEY to be set first.

image

Once uploaded, dataset becomes accessible like so.

Copy link
Owner

@davidgasquez davidgasquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Easier that what I was expecting. 😊

I was thinking about this yesterday and came up with an alternative way of doing it that I think might fit better the Dagster approach.

Basically, we don't use Ops and Jobs and instead rely entirely on Dagster assets.

Similar to the approach Dagster and others take in their repos, the idea is to remove the IOManager and handle the materialization with a resource. In this case, we would have a DuneAPI resource and we could do simple dune.upload(asset) from:

  • New dune_x assets that we assign a different group so we don't materialize them locally.
  • Current assets like raw_round_applications, under an if statement checking if the environment is production.

What I'd love to have and I don't know how easy it would be is for us to generate these Dune assets dynamically basically on previous assets tags. Similar to how the new dbt integration treats dbt assets.

Does all of this makes sense? As always, if this is working. Let's merge it and leave improvements for the next iteration. Just wanted to share what I had in mind.

@DistributedDoge
Copy link
Collaborator Author

DistributedDoge commented Jan 10, 2024

I think it is ready to merge. I really do apperciate the notes tough! Seem very in-line with dagster way, for future refactor:

  • think assets dune_table_x (dune group) not operations push_to_dune
  • instead of op job => refresh dune create asset job => materialize *dune group
  • DuneAPI logic can be packaged into a resource for easier re-use and configuration

What I'd love to have and I don't know how easy it would be is for us to generate these Dune assets dynamically basically on previous assets tags. Similar to how the new dbt integration treats dbt assets.

This is something I can test for Flipside where we can actually afford to load multiple tables without hitting upload limit.

@davidgasquez
Copy link
Owner

One thing before merging. Do you mind checking if re-uploading to Dune the same or new data is possible? I think it might give an error and we might need to do an upsert or something like that.

@DistributedDoge
Copy link
Collaborator Author

Last time I tried, was working as-is. The data branch of DuneAPI ` is limited to single POST endpoint, where you either:

  • upload with new table_name key, which creates new dataset
  • upload with pre-existing which overwrites old data with new one

Note that currently, there is no way to actually remove dataset, except for using interface.

@davidgasquez
Copy link
Owner

Alright! Will try to clear some time tomorrow morning (generate API keys and stuff) and get this deployed! 🚀

@davidgasquez davidgasquez merged commit 9d38bc8 into davidgasquez:main Jan 11, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants