Update mechanism for determining forecast dataset format #107

Merged
maddenp-cu merged 15 commits into NOAA-GSL:main from maddenp-cu:data-format-update
Feb 20, 2026

@maddenp-cu
Collaborator

An apparently valid HRRRCast GRIB file was seen recently that wxvx could not classify as GRIB. The python-magic library uses the same file-signature (aka "magic") technique as the standard Unix utility file, and file makes the same mistake. Compare its view of one GRIB file

$ file gfs.t00z.pgrb2b.0p25.f000 
gfs.t00z.pgrb2b.0p25.f000: Gridded binary (GRIB) version 2

to that of the HRRRCast GRIB file in question:

$ file hrrrcast.avg.t09z.pgrb2.f14 
hrrrcast.avg.t09z.pgrb2.f14: DOS/MBR boot sector; partition 1 : ID=0x1a, active 0xb5, start-CHS (0x2d5,70,42), end-CHS (0x16a,171,20), startsector 3047838125, 450210374 sectors; partition 2 : ID=0x51, active 0xab, start-CHS (0x1ad,84,42), end-CHS (0x24e,170,53), startsector 2870662570, 1370319444 sectors; partition 3 : ID=0xd5, active 0xaa, start-CHS (0x1aa,181,6), end-CHS (0x254,26,43), startsector 2857479530, 3584708277 sectors

My guess is that file gets a hit on the DOS/MBR signature first and accepts that, even though the GRIB signature would have matched, too.
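
One way to probe that guess is to check both signatures directly. The sketch below assumes that the DOS/MBR test keys on the conventional two-byte boot-sector signature 0x55 0xAA at byte offsets 510-511, and that a GRIB file begins with the four ASCII bytes GRIB. The helper name signature_hits is hypothetical, and the result for the HRRRCast file was not verified here.

```python
from pathlib import Path

GRIB_MAGIC = b"GRIB"  # a GRIB message begins with these four ASCII bytes
MBR_SIGNATURE = b"\x55\xaa"  # conventional boot-sector signature at offsets 510-511


def signature_hits(path: Path) -> dict:
    """Report which of the two signatures the first 512 bytes of a file satisfy."""
    with path.open("rb") as f:
        head = f.read(512)
    return {
        "grib": head[:4] == GRIB_MAGIC,
        "mbr": head[510:512] == MBR_SIGNATURE,
    }
```

If both entries come back True for a file, then a signature-based tool has two candidate matches and its answer depends on which test it prefers, which would be consistent with the behavior seen above.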

As far as I can tell, the HRRRCast GRIB file is valid: ecCodes' grib_dump and grib_ls report no errors processing it.

This PR replaces the python-magic mechanism with one that classifies forecast datasets by first trying to open them with the zarr and netCDF4 libraries, then inspecting the initial bytes of the file to see whether it appears to be GRIB. wxvx doesn't need the generality of python-magic, since it supports only three forecast-dataset formats. The updated code classifies the troublesome HRRRCast GRIB file correctly, as well as a handful of real-world netCDF and Zarr datasets.
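
For illustration, the classification order described above might look roughly like the following. This is a minimal sketch and not the code in this PR: the function names are hypothetical, the zarr and netCDF4 imports are done lazily so that a missing library simply counts as "could not open", and the broad exception handling is an assumption made to keep the example short.

```python
from pathlib import Path
from typing import Optional


def opens_as_zarr(path: Path) -> bool:
    """True if the zarr library can open the path (False if zarr is unavailable)."""
    try:
        import zarr  # third-party, imported lazily (assumption of this sketch)

        zarr.open(str(path), mode="r")
    except Exception:  # broad on purpose: any failure means "not Zarr"
        return False
    return True


def opens_as_netcdf(path: Path) -> bool:
    """True if the netCDF4 library can open the path (False if it is unavailable)."""
    try:
        import netCDF4  # third-party, imported lazily (assumption of this sketch)

        with netCDF4.Dataset(str(path), "r"):
            pass
    except Exception:  # broad on purpose: any failure means "not netCDF"
        return False
    return True


def looks_like_grib(path: Path) -> bool:
    """True if the path is a regular file that starts with the bytes b'GRIB'."""
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(4) == b"GRIB"


def classify(path: Path) -> Optional[str]:
    """Try Zarr, then netCDF, then the GRIB byte check; None if nothing matches."""
    if opens_as_zarr(path):
        return "zarr"
    if opens_as_netcdf(path):
        return "netcdf"
    if looks_like_grib(path):
        return "grib"
    return None
```

Because the GRIB check looks only at the first four bytes, whatever happens to sit at offset 510 of the file cannot influence the result.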

As an optimization and as a safety override, a new config key is also provided to suppress automatic classification: If the user sets forecast.format to grib, netcdf, or zarr, wxvx will accept that designation and do no checks.
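
The override can be pictured as a short-circuit in front of the automatic classifier. The sketch below is hypothetical: resolve_format and its detect callback are illustrative names, and rejecting values other than grib, netcdf, or zarr is an assumption, since the description does not say how wxvx treats other values.

```python
from typing import Callable, Optional

SUPPORTED_FORMATS = ("grib", "netcdf", "zarr")


def resolve_format(
    configured: Optional[str], detect: Callable[[], Optional[str]]
) -> Optional[str]:
    """Return the configured format if one is set, else fall back to detection."""
    if configured is not None:
        if configured not in SUPPORTED_FORMATS:
            # Assumption of this sketch: unknown values are treated as errors.
            raise ValueError(f"Unsupported forecast format: {configured}")
        return configured  # user's designation accepted, no checks performed
    return detect()
```

With the value of forecast.format passed as configured, the detect callback is never invoked when the user has chosen a format, so no dataset is opened or read for classification.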

@maddenp-cu maddenp-cu marked this pull request as ready for review February 20, 2026 05:40
@maddenp-cu maddenp-cu merged commit 5cca902 into NOAA-GSL:main Feb 20, 2026
1 check passed
@maddenp-cu maddenp-cu deleted the data-format-update branch February 20, 2026 15:52