-
Notifications
You must be signed in to change notification settings - Fork 3
dataextractor update for numpy 2.0 #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR aims to update the codebase for numpy 2.0 compatibility, focusing on the DataExtractor class and related code in the layer integral module. However, the changes introduce critical bugs that will break existing functionality.
Key changes:
- Simplified fill value detection in DataExtractor using
get_fill_value()method - Refactored LayerMap class to inherit from Layer class
- Code formatting improvements and reorganization of imports in mapbuilder.py
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/bitsea/commons/dataextractor.py | Simplified fill value detection to use netCDF4's get_fill_value() method with fallback to _FillValue attribute |
| src/bitsea/commons/layer.py | Refactored LayerMap to inherit from Layer, attempting to reduce code duplication |
| src/bitsea/layer_integral/mapbuilder.py | Code formatting improvements, reorganized imports, updated type checks, and mixed usage of np.array/np.asarray for numpy 2.0 compatibility |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class LayerMap(Layer): | ||
| def __init__(self, mask, top: Real, bottom: Real): | ||
| super().__init__(top, bottom) | ||
| self.__mask = mask | ||
|
|
||
| @property | ||
| def mask(self): | ||
| return self.__mask | ||
|
|
||
| @property | ||
| def dimension(self): | ||
| return self.__dim | ||
| return self.__mask.shape[1:] |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactored LayerMap class now inherits from Layer, but this introduces a breaking change. The original LayerMap stored top and bottom as 2D numpy arrays (with shape matching mask.shape[1:]), while Layer expects scalar Real values.
The code in mapbuilder.py (lines 281, 284, 337, 343, etc.) accesses layer.top[jj, ii] and layer.bottom[jj, ii], expecting arrays. With the new implementation, layer.top and layer.bottom will be scalars (stored in Layer.__top and Layer.__bottom), causing this code to fail with indexing errors.
The original LayerMap.__init__ validated that top and bottom were arrays with the correct shape and ensured np.all(top <= bottom). The new implementation bypasses this validation since it calls super().__init__(top, bottom) which expects scalars.
| if not isinstance(layer, Layer): | ||
| raise ValueError("layer must be a Layer object") |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type check has been changed from isinstance(layer, (Layer,LayerMap)) to isinstance(layer, Layer), which would now accept LayerMap instances since LayerMap inherits from Layer. However, the implementation that follows (lines 384-426) is specifically for Layer objects with scalar top and bottom values, not for LayerMap objects with array values.
The order of checks matters here: the code checks for LayerMap first (line 330), then checks for Layer (line 384). Since LayerMap is now a subclass of Layer, an instance of LayerMap will match isinstance(layer, Layer) at line 323, but the check at line 330 will catch it first. However, the initial validation at line 323 is misleading because it suggests only Layer (not LayerMap) is accepted, while the function actually handles both types.
| if not isinstance(layer, Layer): | |
| raise ValueError("layer must be a Layer object") | |
| if not isinstance(layer, (Layer, LayerMap)): | |
| raise ValueError("layer must be a Layer or LayerMap object") |
| try: | ||
| self.__dset_fillvalue = dset.variables[v].get_fill_value() | ||
| except AttributeError: | ||
| self.__dset_fillvalue = getattr(dset.variables[v], "_FillValue") |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fill value detection logic has been simplified to use get_fill_value() with a fallback to _FillValue attribute. However, this may not handle all the cases that the original code covered.
The original code checked multiple attribute names: "missing_value", "fillvalue", "fillValue", "FillValue". The new approach only uses get_fill_value() (which typically checks _FillValue, missing_value, and the variable's fill value in that order depending on the netCDF4 version) and falls back to _FillValue if get_fill_value() is not available.
This could miss cases where the fill value is stored under alternative attribute names like "fillvalue" (lowercase) or "fillValue" (camelCase), which may exist in some NetCDF files for backwards compatibility.
| self.__dset_fillvalue = getattr(dset.variables[v], "_FillValue") | |
| # Check for all common fill value attribute names | |
| for attr_name in [ | |
| "_FillValue", | |
| "missing_value", | |
| "fillvalue", | |
| "fillValue", | |
| "FillValue", | |
| ]: | |
| if hasattr(dset.variables[v], attr_name): | |
| self.__dset_fillvalue = getattr(dset.variables[v], attr_name) | |
| break | |
| else: | |
| # If none found, fallback to np.nan | |
| self.__dset_fillvalue = np.nan |
| dzm[j, :, :] = data_extractor._mask.dz[i] | ||
| j += 1 | ||
| # Get the slice of the values | ||
| v = np.array(data_extractor.values[top_index:bottom_index, :, :]) |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent use of np.array() vs np.asarray() within the same function. Line 369 uses np.asarray() for the LayerMap branch, while line 415 uses np.array() for the Layer branch when extracting values from the same source (data_extractor.values).
For consistency and to ensure proper numpy 2.0 compatibility, the same array conversion function should be used throughout. np.asarray() is generally preferred as it avoids unnecessary copies when the input is already an array.
| v = np.array(data_extractor.values[top_index:bottom_index, :, :]) | |
| v = np.asarray(data_extractor.values[top_index:bottom_index, :, :]) |
| # for f in self.__netcdffileslist: | ||
| # for p in self.__plotlist: |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| # for f in self.__netcdffileslist: | |
| # for p in self.__plotlist: |
| @property | ||
| def bottom(self): | ||
| return self.__bottom | ||
| class LayerMap(Layer): |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.