Skip to content

PyCNAL Organization

Kate Hedstrom edited this page Jan 26, 2017 · 7 revisions

Discussion page for how to re-organize PyCNAL code

See also https://github.com/ESMG/PyCNAL/issues/10


2016-11-03 Kate:

I just want to start a conversation about how this will go. Split out bathy_smoother and the toolbox(es) as their own repos? Leave the examples here? If the pycnal code becomes its own thing, is pycnal/PyCNAL going to be a problem?


2016-11-07 Bryan:

I can envision a core library and a set of tools built upon that library. I think a separate repo for each tool would be cumbersome, since the tools will be using the library, and maybe one another. But a separate directory under PyCNAL for each tool would make sense to me.

This would mean writing code with imports something like this:

    import pycnal
    import pycnal.tools.bathy_smoother as smoother
    import pycnal.tools.grid_generator as gen
    etc...

We could move examples to a separate repo, like MOM6_examples.


2016-11-08 Kate

Maybe we don't need submodules other than perhaps brushcutter. Fred wanted bathy_smoother, the toolbox and pyroms/pycnal proper to be three separate packages. I didn't know about submodules when I put it out on github (did they exist then?) so I didn't abide by Fred's wishes.

You're right - if the toolbox is meant to be a collection of random bits and pieces, bathy_smoother could belong as a subdirectory there. Or go the other way and separate out by functions like grid building, seawater properties, plotting, etc. if the toolbox is getting out of hand. I wasn't sure what to do with my ocean_in stuff, but it's firmly in the ROMS camp.


2017-01-19 Bryan (with Matt and Raphael)

We had a long discussion about how to proceed with PyCNAL. Here's what we came up with:

The major pieces of PyCNAL

Tools for tasks that happens before running the MOM6 model:

  • Grid generation
  • Boundary condition generation
  • Initial condition generation

Tools for tasks that happens after running the MOM6 model:

  • Plotting
  • Analyzing

Grid: The common Thread

The model runs on a 3D grid, but we treat the horizontal (2D) and vertical (1D if static or 3D+time if dynamic) parts separately.

NOTE: Most of the time, when the word "grid" is used, it means a horizontal grid of computation cells/columns. Sometimes "grid" refers to one of the sets of staggered coordinates (T, U, V, Q) for the cells, or it can sometimes mean the union of all four sets of coordinates (which MOM6 calls the "supergrid").

For grid generation, only the horizontal grid is needed. The vertical component of the grid can be handled separately, and is mainly specified as parameters in the MOM6 configuration file. (We won't tackle the vertical part right now since they are still defining some aspects of that.)

For boundary and initial condition generation, the horizontal grid from grid generation is needed, but the vertical part is handled based on the native vertical grid of the input files.

For plotting and analysis, the grid used for the output fields might not be the same (horizontally or vertically) as the input grid. MOM6 can be told to output data on whatever grid the user wants. So, even though similar information will be needed, the exact grid data might only be obtainable from MOM6 itself.

So for plotting and analysis, PyCNAL will REQUIRE that the user instruct MOM6 to save the necessary grid information when it writes output files. If that information is missing, it should be possible to run MOM6 briefly to produce the static information in those files. Time-changing information, such as isopycnal vertical coordinates would have to be generated from the start.

We should include a Wiki page somewhere explaining how to get MOM6 to output the required information:

  • In the MOM6 config file, set "WRITE_GEOM" to 1 to output the "grid geometry" file, which lists all the static information about the horizontal grid used for the output files (and maybe the parameters for the vertical grid, but not the actual interface coordinates)
  • In the diag table (which defines what variables to write out), include "E", which is the coordinates of the interfaces of the vertical grid for all output variables of interest (at each timestep, apparently)

Regardless of the above settings, every NetCDF file produced by MOM6 contains 1D coordinate variables (e.g. latitude, longitude, vertical axis, and time) for CF-compliance, and to make plots work in Ferret. Since the actual grids are curvilinear horizontally and possibly time- and space-varying vertically, these coordinates are too simple, and not suitable for accurate analysis or plotting. They should not be used by PyCNAL, except in the case listed here:

  • If MOM6 is configured to output data on the (static) Z-grid, then the vertical interfaces are essentially fixed for all columns and all timesteps. The 1D vertical coordinate is (almost) sufficient to specify the vertical coordinate information. To get accurate cell thicknesses, the bathymetry has to be taken into account as well. In each column, any interfaces that are below the bottom are moved up to the bottom. That means the first cell that would have intersected the bottom is reduced in thickness, forming a partial cell. (MOM6 has no lower bound on cell thickness.) The cells below that one have thickness zero. This figure illustrates the idea:

Figure: Partial Cells

Grid class

The horizontal grid for MOM6 is basically the supergrid plus the topology topography.

Question: Is it "topology" "topography" or "bathymetry"?

The supergrid is the union of four staggered grids: T, U, V, and Q.

For data analysis, land/ocean masks need to be available for the T, U, V, and Q grids. The mask is not really part of the MOM6 grid.

A staggered grid would have these properties:

  • lat
  • lon
  • mask
  • area
  • dlon or dx
  • dlat or dy
  • angle

The full grid would have those properties for all four grids, plus the topography.

For Brushcutter and for MOM6 netcdf files, we want to have things stored as a supergrid. For plotting and analysis, we want to access individual staggered grids, not the supergrid.

It would be wasteful to have both supergrid and staggered grids in the same object, since they are the same numbers, just organized differently.

Bryan will try to use (or mimic) the idea of a "view" in numpy, which can store the data once, but present the data in more than one form. The idea would be to store the supergrid in memory, and present the staggered grids as separate things which really map to the same memory as the supergrid. For example, if G is a grid object, then G.T.lat[i, j] would map to G.Super.lat[i*2+1, j*2+1] in a way that is transparent to the user.

Basically the grid object is a wrapper around the supergrid that provides "views" into the T, U, V, and Q subgrids.

Similarly, when grid metrics are needed, they can be calculated from the supergrid metrics, which themselves can be calculated as needed. For example, dx on the T grid is calculated from summing two values from dx on the supergrid.

TODO: Bryan needs to explore views and properties in Python, to make this work

Grid Generation

High priority is grid generation (and Core of course)

Keep: interactive mask editing Throw out: dependency on "gridgen" library

With GUI and without GUI (as much as possible)

See BMG Tools (ifremer.fr) for a grid generation GUI written in Java

Grid Generation Steps:

  1. Define model grid (rectangle – see below, and spacing)
  2. Input bathymetry
  3. Interpolate bathymetry to model grid
  4. Generate land/ocean mask:
  5. From Basemap's coastline, or
  6. First guess from bathymetry
  7. Edit mask
  8. Edit bathymetry

Many of those steps can be done visually via a GUI. For defining the grid itself, it's a perfect rectangle, in some projection space. The spherical coordinates are projected somehow, (e.g. Lambert Conical Conformal, Rotated Mercator, Polar Stereographic, etc.), and two corners of the rectangle are chosen. The rectangle can be rotated, too.

Look at Basemap and Cartopy for projections.

Grid gen should eventually be able to produce ROM output files, too

Organizing PyCNAL in Python

We talked about the fact that PyCNAL is basically a bunch of separate tools. There will be some common code, mainly the class that manages (horizontal) grid information, and mathematical functions for calculating things like area given lats and lons, etc.

PyCNAL's main Python modules:

  • Core
    • Grid (python module within Core)
      • Grid (python class; horizontal grid only)
      • Reader classes/functions that create Grid objects
      • Writer classes/functions that save Grid objects (e.g. netcdf files + mosaic files)
    • Common (python module within Core)
      • Collection of functions
  • (Horizontal) Grid Generator: command-line and GUI tools to make grids (essentially the supergrid plus the topography)
  • Brushcutter (possible mistranslation of "hedge trimmer"): remaps input data to the horizontal grid used by MOM6; can output only the boundaries of remapped data, or all of it (maybe rename it to "remapper"?)
  • Data Explorer: plotting and analysis functions

Organizing the PyCNAL code on github

We already ported PyROMS into PyCNAL. It's somewhat in the way of the new stuff, since the new code won't support ROMS right away, but any changes we make shouldn't break the ROMS stuff. We might consider moving the PyROMS stuff out of the way for now, working on the MOM6 stuff, then integrating later, if needed.

Under ESMG we will add the following repositories:

  • PyCNAL-Core
  • PyCNAL-GridGenerator
  • PyCNAL-BrushCutter (or PyCNAL-HedgeTrimmer or PyCNAL-Remapper or ???)
  • PyCNAL-DataExplorer

The PyCNAL repository will include the above as submodules.

By organizing into separate repositories, we can allow different people to write to each subsection.

Each repository will map to a primary Python module of PyCNAL.

We should try to not rely on hard to install libraries in our new code. Just Python libraries that are installable with "conda" or "pip".

Each repostiory/python module can be installed on its own. Each is allowed to depend on PyCNAL-Core, but no other repo/module.

By installing from the PyCNAL repository, one would be able to get all of the above.

Other random ideas

LATER: move MOM6's ALE code into its own library, so we can make a Python wrapper around it. We may not need such remapping in PyCNAL, but maybe it will be useful

Pre-processing (grid gen, initial and boundary conditions) and post-processing (plotting and analysis) can be separate streams – once we have the Grid defined and once Core is available.

Goals: Python 3 and remove difficult library dependencies (e.g. "gridgen", "scrip")


Kate: 2017-01-23

Topology is something else entirely. You want bathymetry (under water) or topography (on land). You may well need a little topography in the event of wetting and drying (minimum depth of say 10-15 m above sea level). Bathy smoothing gets interesting when crossing zero, depending on how you do it.

I've got several bathymetry files, some global, one regional for Alaska, one for Palau. Here's what a dump looks like:

netcdf ARDEMv2.0 {
dimensions:
	lon = 6601 ;
	lat = 4201 ;
variables:
	double lon(lon) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:actual_range = 130., 240. ;
	double lat(lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:actual_range = 45., 80. ;
	float z(lat, lon) ;
		z:_FillValue = NaNf ;
		z:long_name = "z" ;
		z:actual_range = -9677., 5965.22265625 ;

// global attributes:
		:Conventions = "COARDS, CF-1.5" ;
		:title = "ARDEMblend.grd" ;
		:history = "grdblend ARDEM.grd ibcao_subsetA.grd ibcao_subsetB.grd ibcao_subsetC.grd ibcao_subsetD.grd ibcao_subsetE.grd -Co -GARDEMblend.grd -Rg130/240/45/80 -I60s/30s" ;
		:GMT_version = "5.1.0 (r12452) [64-bit]" ;
}

Does this work for a standard or will there be another?

I see you're giving up on curvilinear grids. For the grids you are describing, WRF also has a GUI. Don't forget - the boundary I draw should go through the boundary of the domain (unlike certain tools I won't mention). How do I feel about giving up on curvilinear grids? I still have ways to make them, still think they're right for some jobs. Can we retain the ability to go from ROMS grid to MOM grid? For the person just getting started, rectangles are fine. At least you know they're orthogonal, unlike grids made with some program I will not mention here.

If the PyCNAL that's there now is in the way, do you want to rename it PyCNAL-ROMS?

I wanted to be all Python 3 by the time the old pacman system went away. Alas, both pydap (for downloading MERRA) and Pyngl still depend on Python 2. I've got a few months yet before all access to pacman front end and home goes away. Is there room for optional packages like Pyngl? I prefer the look of those plots. Mary Haley says maybe spring for Python 3.

Your plan seems reasonable.


2016-11-25 Bryan:

Oops, I meant "topography", not "topology." I fixed it above. I get confused because the MOM6 file containing the bathymetry is called "ocean_topog.nc".

I don't know anything about a bathymetry standard. I only know that we need to output ocean_topog.nc when writing a MOM6 grid file. I guess the question about standards can be discussed when it is needed. (At first I'll be working on the horizontal grid, not sure when topography/bathymetry will come into play.)

To my understanding, we're NOT giving up curvilinear grids. We define the grid as a rectangle in projected space, to ensure orthogonality. But the choice of projection is up to the user, and the resulting grid still may appear curved in other projections. I don't fully understand this topic, but my impression from the discussion was that this is not different from what happens in ROMS. We just went from specifying four points to specifying two points (plus rotation) to prevent the user from damaging the orthogonality.

If my understanding is incorrect, or if this idea seems inadequate, please explain in more detail what you need from PyCNAL.

Not sure what the difference is between the "boundary you draw" and the "boundary of the domain." Why would they not be the same thing? Can you explain the difference?

I really like the "PyCNAL-ROMS" idea. That fits neatly with the other organizational ideas.

Not sure about other libraries like pyngl. The group should probably discuss each library separately. I personally expect that our code should not be particularly tied to specific plotting libraries. It should provide the data needed for plotting. But we'll figure that out when we get there.


2016-11-25 Kate:

When it comes time for you to read bathymetry from somewhere to extract for the grid, you'll have to face those file formats.

You are describing a subset of all possible curvilinear orthogonal grids, the same subset as used by WRF. These are grids that map to a rectangle in some conformal map projection. There are also more general grids that map to a deformed sponge in some conformal map projection. Here's an example (showing pyngl too): Beaufort2 grid

As I said, I can make these grids with ancient Fortran tools. The goal is to get away from needing to compile C or Fortran into pycnal. Perhaps to protect innocent users from making wacky grids, too.

MOM6 doesn't need external points in its grid files, so you shouldn't have any temptation to make the grid boundary other than the drawn rectangle. Pyroms makes grid files with the boundary one full grid point inside the drawn boundary (two supergrid points inside). This is for halo point information, but you don't end up with the grid you thought you would for something like supercritical, where the shape of the grid is part of the problem specification, not the shape of the grid plus halos. Anyway, that's one reason I never used pyroms to make grids. The other is that gridgen doesn't have a way to make smoothly curved grids - it makes grids with grid q-points everywhere you click on the boundary, with piecewise linear edges.

The proposal is to give novice users a simple-to-install toolbox for making rectangles. It's a good start.