Repo layout proposal #10089
I agree. One specific suggestion: breaking …
I also support this.
There's another opportunity for refactoring that could split up large files in #9203. Related is the general issue of scope creep within the main repository. I think at some point we should revisit the idea of splitting out as much non-core functionality as possible into a separate package (there was a very old issue about this that I'm struggling to find right now, proposing "…").
This would also improve the performance of real-time developer tools like LSP servers, linters, auto-formatters, tree-sitter, etc. (EDIT: not reworking the repo layout itself, but reducing the LOC per file). One small suggestion: …
Start of #10089: Move compatibility-related modules from `xarray.core` to a new `xarray.compat` package:
- `array_api_compat.py`
- `dask_array_compat.py`
- `dask_array_ops.py`
- `npcompat.py`
- `pdcompat.py`
To what extent do we need to maintain backward compat? (I would obviously prefer to just move files and not build stub layers, but if other libraries rely on these file locations, rather than importing from the top level, then we can add them...)
Any library not importing from the top level is using private API, which has no backwards compatibility guarantees. That's what our docs say. We have actually made significant changes to the locations of code multiple times in the past couple of years (especially as part of the NamedArray refactor, DataTree integration, and adding the ChunkManager entrypoint), with no real complaints.
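If stub layers ever did prove necessary, one lightweight approach is a module-level `__getattr__` (PEP 562) in the old module that forwards to the new location with a `DeprecationWarning`. A self-contained sketch; the module and function names here are simulated stand-ins rather than real xarray paths:

```python
import types
import warnings

# Simulate the module the code moved to (stands in for a hypothetical
# xarray.compat.pdcompat -- the function below is illustrative only).
_new = types.ModuleType("xarray_compat_pdcompat")
_new.count_not_none = lambda *args: sum(arg is not None for arg in args)

# Simulate the old module left behind as a shim.  A module-level
# __getattr__ (PEP 562) is called for any name not found normally,
# so lookups can be forwarded to the new home with a warning.
_old = types.ModuleType("xarray_core_pdcompat")


def _shim_getattr(name):
    if hasattr(_new, name):
        warnings.warn(
            f"{_old.__name__}.{name} has moved to {_new.__name__}.{name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(_new, name)
    raise AttributeError(f"module {_old.__name__!r} has no attribute {name!r}")


_old.__getattr__ = _shim_getattr

# Old-style access still works, but now warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = _old.count_not_none(1, None, 3)

print(result)  # 2
print(caught[0].category.__name__)  # DeprecationWarning
```

In a real shim the `def __getattr__(name): ...` would simply live at the top level of the old module file; the `types.ModuleType` juggling here is only so the example runs in one file.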
So far, I've:
I'll probably let that sit for a while and see how it goes. Some things we could do next:
The more difficult thing is going to be breaking up … The alternative is to do nothing and wait for LLMs to get better with big files (Claude Code can already grab segments of files, for example). Many of the annoyances of us working with huge files, like …
That all sounds great @max-sixty . I think you can be relatively aggressive about this, and I'm happy to review things.
I asked Claude for ideas, but several of them require doing things dynamically upon import.
Results from a quick search of …
Great idea @benbovy! I don't see anything too concerning in the first few results — mostly hits for dataarray & dataset...
Wow this is interesting! So many people importing classes from …

Quite a few hits for things in the …

A lot of the rest seem to be for the purposes of type hinting (e.g. …).

I agree, overall this doesn't look too concerning.
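For anyone who wants to run a similar audit on a local checkout of a downstream project, here is a rough sketch (not the tool used above) that finds imports reaching into a private path, using the stdlib `ast` module; the `PRIVATE_PREFIX` value and sample source are illustrative:

```python
import ast

PRIVATE_PREFIX = "xarray.core"  # the private path whose importers we want to find


def _is_private(module: str) -> bool:
    # Match the prefix exactly or as a package boundary (avoid e.g. "xarray.core2").
    return module == PRIVATE_PREFIX or module.startswith(PRIVATE_PREFIX + ".")


def private_imports(source: str) -> list[str]:
    """Return the names a piece of source imports from under PRIVATE_PREFIX."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import xarray.core.alignment` style
            hits += [a.name for a in node.names if _is_private(a.name)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from xarray.core.dataset import Dataset` style
            if _is_private(node.module):
                hits += [f"{node.module}.{a.name}" for a in node.names]
    return hits


# Two private-import styles plus one public import for contrast.
sample = """
from xarray.core.dataset import Dataset
import xarray.core.alignment
import xarray
"""
print(private_imports(sample))
# ['xarray.core.dataset.Dataset', 'xarray.core.alignment']
```

To scan a whole project you would feed it each file, e.g. `for p in Path("repo").rglob("*.py"): print(p, private_imports(p.read_text()))`.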
The tests need to be reorganized too. A particular offender is …

cc @dcherian
Also @max-sixty, what do you think of having some kind of …?

But I'm not sure how to name it: would it be …?
The second approach, which Claude calls "2. Module-level functions with …". I don't think you need …

```python
# xarray/core/curvefitting.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # avoid recursive import
    from xarray.core.dataset import Dataset


def curvefit(self) -> "Dataset":  # or use typing.Self
    """Curve fitting optimization for arbitrary functions."""
    ...


# xarray/core/dataset.py
from . import curvefitting


class Dataset:
    curvefit = curvefitting.curvefit
```
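For what it's worth, this pattern should work without any import-time magic: a plain function assigned in a class body becomes a bound method on instances via the normal descriptor protocol. A minimal self-contained sketch, with all names and the "fit" logic invented purely for illustration:

```python
# Stand-in for a curvefitting.py module: a plain module-level function
# whose first parameter will become `self` once attached to a class.
def curvefit(self, factor: float) -> list[float]:
    """Pretend fit: scale the stored data (illustrative only)."""
    return [x * factor for x in self.data]


class Dataset:
    # A plain function assigned as a class attribute: Python's descriptor
    # protocol turns it into a bound method on instances automatically.
    curvefit = curvefit

    def __init__(self, data):
        self.data = data


ds = Dataset([1.0, 2.0, 3.0])
print(ds.curvefit(2.0))  # [2.0, 4.0, 6.0]
```

Static type checkers and IDE "go to definition" also follow this assignment fine, which is part of why the `TYPE_CHECKING` guard above is the only subtlety.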
I don't have a strong view on the exact layout. One option — given we don't have many data structures and would like to split them up — is …, and then similarly for the other data structures.
What did you mean by this @TomNicholas? That there is code which we apply to multiple data structures? I would have thought that then lives in …

Or is there code which is data-structure-related but exists across data structures?
* Move chunks-related functions to a new file (part of #10089)
* Move fit computation code to a dedicated new file (part of #10089)
What is your issue?
As part of the efforts described in #10039, I added #10088, and noticed the repo layout has arguably not kept up with the code growth over the past decade. This isn't the most pressing issue, but it does lower the returns to refactoring, since we're moving lines from 11K-LOC files into 1K-LOC files, rather than anything smaller.
(Even if you think LLMs aren't that useful / aren't going to get better / etc; these changes would still make the repo easier for people to navigate...)
In particular, 2/3 of our code is in `xarray/core` — 66,873 LOC vs 97,118 LOC in all of `xarray`.
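This kind of LOC breakdown is easy to reproduce. A rough sketch that tallies non-blank Python lines per immediate subdirectory; the demo tree at the bottom is invented, and on a real checkout you would point it at the `xarray/` package directory instead:

```python
import tempfile
from collections import Counter
from pathlib import Path


def loc_by_subdir(package_root: str) -> Counter:
    """Count non-blank Python lines per immediate subdirectory of a package."""
    counts: Counter = Counter()
    root = Path(package_root)
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        # Files directly under the root are grouped under ".".
        bucket = rel.parts[0] if len(rel.parts) > 1 else "."
        text = path.read_text(errors="ignore")
        counts[bucket] += sum(1 for line in text.splitlines() if line.strip())
    return counts


# Demo on a throwaway tree; on a real checkout, loc_by_subdir("xarray")
# would be expected to show `core` dominating the total.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "core").mkdir()
    (Path(tmp) / "core" / "dataset.py").write_text("a = 1\nb = 2\n")
    (Path(tmp) / "plot.py").write_text("c = 3\n")
    counts = loc_by_subdir(tmp)

print(dict(counts))
```

Exact totals will differ from the figures above depending on whether blank lines, tests, and non-`.py` files are included.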
I can imagine splitting this up into a few categories:

- `dask_array_*`, `npcompat`, `pdcompat`, `array_api_compat`
- `computation`, `arithmetic`, `nanops`, `weighted`, the `curvefit` that's currently in `dataset`, `rolling`, `rolling_exp`, maybe `missing`
- `merge`, `alignment`, `concat`

I'd propose having each of those be paths within `xarray/`. Then there's more freedom to make new files within those paths, relative to the current state where a new file means adding onto a very long list of files in `xarray/core`.

I'm not confident how much disruption that would cause to existing PRs. I think if we land these as commits which mostly just move the files, then git will mostly handle merges well. We can start slowly and see how it goes...