Skip to content

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Sep 18, 2025

This refactor deals with a few related code structure issues:

  • core/datatree_io.py exists totally separate from the rest of the backends code
  • backends/api.py mixes code for opening data (which are truly part of the backend engines system) with code for saving data (which are not)
  • backends/api.py contains code that is not actually part of the public API

My changes instead move all code that has to do with writing out into a new backends/writers.py file. This means we can delete core/datatree_io.py, and everything in new backends/api.py is now actually public API.

There should be no functional changes, I'm just moving things around to separate concerns and organise better.

I'm open to better suggestions for names for the writers.py file.

Note this refactor is arguably the first step of #5954.

@TomNicholas TomNicholas added topic-backends topic-internals topic-DataTree Related to the implementation of a DataTree class labels Sep 18, 2025
@github-actions github-actions bot added the io label Sep 18, 2025
Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks!

@shoyer
Copy link
Member

shoyer commented Sep 18, 2025

Would it make sense to rename api.py now to readers.py or highlevel.py?

@TomNicholas
Copy link
Member Author

TomNicholas commented Sep 19, 2025

Would it make sense to rename api.py now to readers.py or highlevel.py?

Maybe, but I actually kind of think api.py and writers.py should not be in the backends module at all. They call the code in backends, but they don't define the backends interface nor do they actually implement specific backends. In theory they should only really even use the public API defined in the rest of the backends module.

It would make more sense to me to only keep the code in the backends module that actually has to do with the backend registration system, and instead put the contents of api.py and writers.py into core/io/readers.py and core/io/writers.py or something like that.

But I was afraid of making such a drastic changes in this one PR in case it breaks things (potentially in external libraries that should not be importing these things, but probably are anyway), so I kept this PR to changes that I thought would be less likely to cause any issues.

cc'ing @max-sixty who I bet would have an opinion on this too 😄 (ref #10089)

@ianhi
Copy link
Contributor

ianhi commented Sep 19, 2025

case it breaks things (potentially in external libraries that should not be importing these things, but probably are anyway),

An in-between is to make the change to make xarray have an improved layout, and keep api.py for a bit, importing everything from the other files with a very obnoxious warning that this is deprecated and going away soon.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

(no strong view on the api vs readers etc!)

@TomNicholas TomNicholas enabled auto-merge (squash) September 19, 2025 14:11
@TomNicholas TomNicholas merged commit 6470460 into pydata:main Sep 22, 2025
36 checks passed
@TomNicholas TomNicholas deleted the refactor-datatree-io branch September 22, 2025 16:50
TomNicholas added a commit to earth-mover/icechunk that referenced this pull request Sep 22, 2025
TomNicholas added a commit to earth-mover/icechunk that referenced this pull request Sep 22, 2025
* handle changes in pydata/xarray#10771

* remove split_every line as that will go in a follow-up PR

* attempt to satisfy myppy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io topic-backends topic-DataTree Related to the implementation of a DataTree class topic-internals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants