Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduction Tutorial #281

Merged
merged 12 commits into from
Feb 18, 2022
Merged

Introduction Tutorial #281

merged 12 commits into from
Feb 18, 2022

Conversation

rwegener2
Copy link
Member

What has been built?

This PR address the first part of Issue #280. It creates Part 1 of an updated Introduction Tutorial sequence, which focuses on Defining a recipe.

How was it done?

  • Created a new folder introduction_tutorial with the 1st part of the new tutorial and an index.md for that folder (2 more parts to come)
  • Updated the main docs/index.md with the new structure
  • Removed the old intro_tutorial.md

How can it be tested?

Clicking "Details" for the docs test that gets run by CI.

Notes

As a reminder from the intro issue, my goal with the introduction tutorial is for the user to be able to understand as little information as is necessary to perform a meaningful piece of work with the tool.

There are a few decisions I made while doing this that I wanted to flag for feedback:

  • The pace of this tutorial is quite a bit slower than the old one. My reasoning here is that it is easier (and, at least for me, more mentally satisfying) to skim past material you already know than to be lost. We don't want to loose people on an introduction tutorial.
  • I spend a long time on file patterns. The reason for this is that it strikes me as the core idea of a recipe and so articulating very clearly the implicit steps we are taking, even though they seem obvious, sets the user up to be able to think more abstractly about those steps as the recipes become more complicated.
  • I chose to use the pattern_from_file_sequence structure instead of the FilePattern structure for the reason that I think it is conceptually and technically easier.
  • I propose some language in this tutorial about "generalized URLs" to help the user see the steps involved in defining a file pattern. Defining a Generalized URL refers to the step of defining a recipe where you look at a URL and assess how its structure relates to the data. That idea becomes the format string. I'm not sold on that being the best way to talk about URLs, but I would like if we all agreed on a common term/methodology for that so it can be used consistently throughout the documentation.

One other thing that also occurred to me while writing this is that FilePattern and XarrayZarrRecipe objects for me feel challenging to inspect. I'll start a separate issue on that to follow up and ask questions.

I look forward to feedback and reactions!

@rwegener2 rwegener2 linked an issue Feb 14, 2022 that may be closed by this pull request
@cisaacstern
Copy link
Member

@rwegener2 and I had a great synchronous discussion of this PR. This is definitely pointing us in the right direction for OSM.

Just to record next steps:

  • Rachel is going to work on a few stylistic points we discussed for Part 1: Define a recipe.
  • In anticipation of Part 2: Run a recipe locally, I am going to look into:
    • Setting default storage targets
    • Providing a built-in setup_logging function

Rachel and I are going to check in again on video about these items on Thursday.

@rwegener2
Copy link
Member Author

@rabernat In the "Recipe Tutorials" section you were able to get the .ipynb files to show up. I can only figure out how to get readthedocs to see a file if I convert my notebook to a markdown file. Do you recall how you got the notebooks to render?

@rabernat
Copy link
Contributor

In the "Recipe Tutorials" section you were able to get the .ipynb files to show up.

Can you clarify what you mean by "show up"?

The sphinx site is configured with the myst_nb extension

extensions = [
"myst_nb",

This allows us to treat notebooks as valid documentation pages, e.g.

```{toctree}
:maxdepth: 1
xarray_zarr/netcdf_zarr_sequential
xarray_zarr/cmip6-recipe
xarray_zarr/multi_variable_recipe
xarray_zarr/terraclimate
xarray_zarr/opendap_subset_recipe
```

These references are to ipynb files.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@rwegener2
Copy link
Member Author

rwegener2 commented Feb 18, 2022

Done

The latest push adds improvements from @cisaacstern:

  • changing the way that the file structure of OISST is conveyed visually
  • always using FilePattern as a python object, not a phrase
  • display of the example file list under "Defining a FilePattern"
  • using the full length of the time series instead of a sample month
  • remove the word "datetime"
  • add a suggestion for inspecting a FilePattern

Saved for later

Ideas that I like but are enough of a lift that I would like to save them for a separate PR:

  • a stronger visual for dataset structure (ideally usable across all types of datasets)
  • a toggle dropdown that progressively shows how the code we are writing builds up. (Example here)

Ready for review

I think this is set so I'm requesting review! If I forget someone or add an unnecessary someone of course just update the reviewers.

@rabernat
Copy link
Contributor

@rwegener2 this is great progress! I had a quick skim and am going to give this a more thorough review later today.

One general comment I have is that I really think we should not use pattern_from_file_sequence in the introduction tutorial. I understand why it may seem easier or simpler. But in my mind, it is actually more complicated, and totally redundant, to manually generate the list of files. Instead, we should be encouraging users to write a function that generates the filename and passing this directly to FilePattern.

If helpful to flesh this idea out, I could push a commit to this branch which rewrites the tutorial along these lines. If we don't like it, we could revert. How does that sound?

@rwegener2
Copy link
Member Author

rwegener2 commented Feb 18, 2022

@rabernat That sounds great. I think it's important that the way of generating a file pattern is the easiest way possible, so if you think that way is without pattern_from_file_sequence I'm interested to see that approach. Thanks for being willing to demonstrate that!

@rabernat
Copy link
Contributor

@rwegener2 - have a look at what I just pushed and let me know if you think it is clear. I think it is actually more understandable this way, but of course I wrote the code so I am biased! 🙃

@rwegener2
Copy link
Member Author

Thanks @rabernat. I really like the tone and clarity you have while writing.

My thoughts on the direct use of FilePattern: I do think that is a more conceptually challenging flow, but on the flip side it starts building a foundation for how File Pattern objects work that will be useful to a user that continues to use the system. If I were being totally honest, I would personally stick with pattern_from_file_sequence because it gets the user accomplishing the tangible end goal (making ARCO data) sooner. It may be less elegant, but if a user is energized by that small success they might stick around to learn about FilePatterns in more depth. @rabernat you sound really passionate about preferring the direct use of FilePattern, though, so I can get behind your intuition here and support FilePattern object.

So, moving forward with the direct use of FilePattern object: I like the text @rabernat wrote. To keep the amount of material more even between tutorials I want to move making the recipe object (basically just the line recipe = XarrayZarrRecipe(pattern, inputs_per_chunk=10)) to Part 2 and let Part 1 be just creating a FilePattern object. I won't be able to do that before the end of today but I'll get another commit queued up for Monday.

Thanks for all the review, everyone. I'm happy we are dedicating time to this.

@rabernat
Copy link
Contributor

I can get behind your intuition here and support FilePattern object.

Thanks for your flexibility here Rachel. I do feel strongly that this is the best way to build a foundation of understanding. But I note your concerns about the extra complexity to digest. A deliberate design decision in Pangeo Forge has been to move a lot of the complexity in the FilePattern layer. That way the Recipe itself can be pretty simple.

I want to move making the recipe object...to Part 2

👍 from me

Do you want to merge this as is and keep working via another PR?

@rwegener2
Copy link
Member Author

In the spirit of iteration I think we can merge. Just give me a 👍🏻 if that is my job. If someone else does it they should feel free to squash merge because my commits got pretty messy.

@rwegener2 rwegener2 merged commit bc773e4 into pangeo-forge:master Feb 18, 2022
@rwegener2 rwegener2 deleted the intro_tutorial branch February 18, 2022 21:42
@cisaacstern
Copy link
Member

🎉 yay!

@rwegener2 rwegener2 added the documentation Improvements or additions to documentation label Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Updated introduction tutorial
3 participants