Model Agnostic Analysis

Introduction

In this tutorial model agnostic analysis means writing your notebook so that it can easily be used with any CF compliant data source.

What are the CF Conventions?

From CF Metadata conventions:

The CF metadata conventions are designed to promote the processing and sharing of files created with the NetCDF API. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF convention includes a standard name table, which defines strings that identify physical quantities.

In most cases the model output data accessed through the COSIMA Cookbook complies with some version of the CF conventions, enough to be usable for model agnostic analysis.

Why bother?

Model agnostic means the same code can work for multiple models. This makes your code more usable by you and by others. You no longer need to have different versions of code for different models. It makes you and any one who uses your code more productive. It allows for common tasks to be abstracted into general methods that can be more easily reused, meaning less code needs to be written and maintained. This is an enormous produtivity boost.

How is model agnostic analysis achieved?

This can be achieved by using packages that enable this: - cf_xarray for generalised coordinate naming - xgcm to make grid operations generic across data - pint and pint-xarray for handling units easily and robustly

Example

Introduction

This example uses an example analysis, shows how the this might be done in a traditional, model specific, manner, and then implements the same analysis in a model agnostic way.

First step is to import necessary libaries.

[1]:
%matplotlib inline

import cosima_cookbook as cc
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import cf_xarray as cfxr
import pint_xarray
from pint import application_registry as ureg
import cf_xarray.units
import cmocean as cm
import cartopy.crs as ccrs
import cartopy.feature as cft

cf_xarray works best when xarray keeps attributes by default.

[2]:
xr.set_options(keep_attrs=True);

Load a dataset using COSIMA Cookbook, so first open a session to the default database

[3]:
session = cc.database.create_session()

Now load surface temperature data from a 0.25\(^\circ\) global MOM5 model

[4]:
experiment = '025deg_jra55v13_iaf_gmredi6'
variable = 'surface_temp'
SST = cc.querying.getvar(experiment, variable, session, frequency='1 monthly', n=12)

This is a 3D dataset in latitude, longitude and time:

[5]:
SST
[5]:
<xarray.DataArray 'surface_temp' (time: 288, yt_ocean: 1080, xt_ocean: 1440)>
dask.array<concatenate, shape=(288, 1080, 1440), dtype=float32, chunksize=(1, 540, 720), chunktype=numpy.ndarray>
Coordinates:
  * xt_ocean  (xt_ocean) float64 -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
  * yt_ocean  (yt_ocean) float64 -81.08 -80.97 -80.87 ... 89.74 89.84 89.95
  * time      (time) datetime64[ns] 1958-01-14T12:00:00 ... 1981-12-14T12:00:00
Attributes:
    long_name:      Conservative temperature
    units:          deg_C
    valid_range:    [-10. 500.]
    cell_methods:   time: mean
    time_avg_info:  average_T1,average_T2,average_DT
    coordinates:    geolon_t geolat_t
    ncfiles:        ['/g/data/hh5/tmp/cosima/access-om2-025/025deg_jra55v13_i...

Model specific

First do this as it might usually be done, in a model specific manner:

  1. Use the time coordinate name in the mean function

  2. Subtract a hard-coded value to convert the temperature degrees celcius from degrees Kelvin (the meta-data says the units are deg_C but this is clearly incorrect)

[6]:
SST_time_mean = SST.mean('time') - 273.15
SST_time_mean
[6]:
<xarray.DataArray 'surface_temp' (yt_ocean: 1080, xt_ocean: 1440)>
dask.array<sub, shape=(1080, 1440), dtype=float32, chunksize=(540, 720), chunktype=numpy.ndarray>
Coordinates:
  * xt_ocean  (xt_ocean) float64 -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
  * yt_ocean  (yt_ocean) float64 -81.08 -80.97 -80.87 ... 89.74 89.84 89.95
Attributes:
    long_name:      Conservative temperature
    units:          deg_C
    valid_range:    [-10. 500.]
    cell_methods:   time: mean
    time_avg_info:  average_T1,average_T2,average_DT
    coordinates:    geolon_t geolat_t
    ncfiles:        ['/g/data/hh5/tmp/cosima/access-om2-025/025deg_jra55v13_i...

Now plot the result

[7]:
plt.figure(figsize=(12, 6))
ax = plt.axes(projection=ccrs.Robinson())

SST_time_mean.plot(ax=ax,
                   x='xt_ocean', y='yt_ocean',
                   transform=ccrs.PlateCarree(),
                   vmin=-2, vmax=30, extend='both',
                   cmap=cm.cm.thermal)

ax.coastlines();
../_images/Tutorials_Model_Agnostic_Analysis_18_0.png

Note that the arctic is not correctly respresented due to the 1D lat/lon coordinates not being correct in the tripole area. See the Making Maps with cartopy Tutorial for more information.

Model agnostic

Now do the same calculation in a model agnostic manner

For this data it is necessary to correct the units attribute first. This shouldn’t be necessary if the metadata is correct

[8]:
SST.attrs['units'] = 'K'

Now use pint to ensure this is in degrees C. Note that if the data was originally in degrees celcius this would be fine, it would do nothing. So this is a way of catering for any temperature units that are understood by pint in a transparent way. Note the call to quantify which invokes pint’s machinery to parse the units and allow unit conversions.

[9]:
SST = SST.pint.quantify().pint.to('C')
[10]:
SST
[10]:
<xarray.DataArray 'surface_temp' (time: 288, yt_ocean: 1080, xt_ocean: 1440)>
<Quantity(dask.array<truediv, shape=(288, 1080, 1440), dtype=float32, chunksize=(1, 540, 720), chunktype=numpy.ndarray>, 'degree_Celsius')>
Coordinates:
  * xt_ocean  (xt_ocean) float64 -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
  * yt_ocean  (yt_ocean) float64 -81.08 -80.97 -80.87 ... 89.74 89.84 89.95
  * time      (time) datetime64[ns] 1958-01-14T12:00:00 ... 1981-12-14T12:00:00
Attributes:
    long_name:      Conservative temperature
    valid_range:    [-10. 500.]
    cell_methods:   time: mean
    time_avg_info:  average_T1,average_T2,average_DT
    coordinates:    geolon_t geolat_t
    ncfiles:        ['/g/data/hh5/tmp/cosima/access-om2-025/025deg_jra55v13_i...

Now take the time mean, but this time use the cf accessor to automatically determine the name of the time dimension. cf_xarray checks the names of variables and coordinates, and associated metadata to try and infer information about the data based on the CF conventions.

To see what cf_xarray information is available just evaluate the accessor:

[11]:
SST.cf
[11]:
Coordinates:
- CF Axes: * X: ['xt_ocean']
           * Y: ['yt_ocean']
           * T: ['time']
             Z: n/a

- CF Coordinates: * longitude: ['xt_ocean']
                  * latitude: ['yt_ocean']
                  * time: ['time']
                    vertical: n/a

- Cell Measures:   area, volume: n/a

- Standard Names:   n/a

- Bounds:   n/a

In this case it has found X, Y and T axes, and longitude, latitude and time axes. These are now accessible like a dict using the cf accessor. Note that this returns the actual coordinate, and many functions just want a simple string argument, which is the name of the coordinate.

cf_xarray also wraps many standard xarray functions allowing cf names to be used, which are automatically converted to the name in the data.

The upshot: just use the cf accessor and then append the required function and use the standard CF coordinate name (in this case they are the same, time, but that is not guaranteed)

[12]:
SST_time_mean = SST.cf.mean('time')
SST_time_mean
[12]:
<xarray.DataArray 'surface_temp' (yt_ocean: 1080, xt_ocean: 1440)>
<Quantity(dask.array<mean_agg-aggregate, shape=(1080, 1440), dtype=float32, chunksize=(540, 720), chunktype=numpy.ndarray>, 'degree_Celsius')>
Coordinates:
  * xt_ocean  (xt_ocean) float64 -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
  * yt_ocean  (yt_ocean) float64 -81.08 -80.97 -80.87 ... 89.74 89.84 89.95
Attributes:
    long_name:      Conservative temperature
    valid_range:    [-10. 500.]
    cell_methods:   time: mean
    time_avg_info:  average_T1,average_T2,average_DT
    coordinates:    geolon_t geolat_t
    ncfiles:        ['/g/data/hh5/tmp/cosima/access-om2-025/025deg_jra55v13_i...

Using the cf_xarray wrapped function makes the code more legible and easier to write, i.e.

SST.cf.mean('time')

compared to

SST.mean(SST.cf['time'].name)

In the same way the cf accessor can be used in the plot and the CF names for latitude and longitude used as x and y arguments

[13]:
plt.figure(figsize=(12, 6))
ax = plt.axes(projection=ccrs.Robinson())

SST_time_mean.cf.plot(ax=ax,
                      x='longitude', y='latitude',
                      transform=ccrs.PlateCarree(),
                      vmin=-2, vmax=30, extend='both',
                      cmap=cm.cm.thermal)

ax.coastlines();
../_images/Tutorials_Model_Agnostic_Analysis_33_0.png

Putting this into practice

Above a model agnostic version of some code was demonstrated, but that doesn’t utilise the full power of what it is capable of. The model agnostic code can now be easily turned into a function that accepts an xarray DataArray:

[14]:
def plot_global_temp_in_degrees_celcius(da):
    # Take the time mean of da and plot a global temperature field in a Robinson projection
    #
    # Input DataArray (da) should be a 3D array of latitude, longitude and time.

    da = da.pint.quantify().pint.to('C')
    da_time_mean = da.cf.mean('time')

    plt.figure(figsize=(12, 6))
    ax = plt.axes(projection=ccrs.Robinson())

    da_time_mean.cf.plot(ax=ax,
                         x='longitude', y='latitude',
                         transform=ccrs.PlateCarree(),
                         vmin=-2, vmax=30, extend='both',
                         cmap=cm.cm.thermal)

    ax.coastlines();

Try it out with the SST data used above

[15]:
plot_global_temp_in_degrees_celcius(SST)
../_images/Tutorials_Model_Agnostic_Analysis_38_0.png

Ok, so now try on the output from a different experiment and model (MOM6):

[16]:
experiment = 'OM4_025.JRA_RYF'
variable = 'tos'
SST_mom6 = cc.querying.getvar(experiment, variable, session, frequency='1 monthly', n=12)
[17]:
SST_mom6
[17]:
<xarray.DataArray 'tos' (time: 144, yh: 1080, xh: 1440)>
dask.array<concatenate, shape=(144, 1080, 1440), dtype=float32, chunksize=(1, 540, 720), chunktype=numpy.ndarray>
Coordinates:
  * xh       (xh) float64 -299.7 -299.5 -299.2 -299.0 ... 59.53 59.78 60.03
  * yh       (yh) float64 -80.39 -80.31 -80.23 -80.15 ... 89.73 89.84 89.95
  * time     (time) object 1900-01-16 12:00:00 ... 1911-12-16 12:00:00
Attributes:
    units:          degC
    long_name:      Sea Surface Temperature
    cell_methods:   area:mean yh:mean xh:mean time: mean
    cell_measures:  area: areacello
    time_avg_info:  average_T1,average_T2,average_DT
    standard_name:  sea_surface_temperature
    ncfiles:        ['/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/outpu...
    contact:        Andy Hogg
    email:          Andy.Hogg@anu.edu.au
    created:        2021-11-01
    description:    0.25 degree OM4 (MOM6+SIS2) global model configuration un...

Check to see it has correctly parsed the CF information. It is not necessary to print this out, but interesting, and note it has quite different index and coordinate names

[18]:
SST_mom6.cf
[18]:
Coordinates:
- CF Axes: * X: ['xh']
           * Y: ['yh']
           * T: ['time']
             Z: n/a

- CF Coordinates: * longitude: ['xh']
                  * latitude: ['yh']
                  * time: ['time']
                    vertical: n/a

- Cell Measures:   area, volume: n/a

- Standard Names:   n/a

- Bounds:   n/a

Use the function from above which also worked on MOM5 data with very different coordinates

[19]:
plot_global_temp_in_degrees_celcius(SST_mom6)
../_images/Tutorials_Model_Agnostic_Analysis_45_0.png

What to do when it goes wrong

The model agnostic function worked flawlessly with two different ocean data sets, after the units were corrected in the MOM5 data. What about some ice data?

Using the same experiment from which the first SST data was obtained, load the ice air temperature variable

[20]:
experiment = '025deg_jra55v13_iaf_gmredi6'
variable = 'Tair_m'
ice_air_temp = cc.querying.getvar(experiment, variable, session, frequency='1 monthly', n=12)
[21]:
ice_air_temp
[21]:
<xarray.DataArray 'Tair_m' (time: 12, nj: 1080, ni: 1440)>
dask.array<concatenate, shape=(12, 1080, 1440), dtype=float32, chunksize=(1, 1080, 1440), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 1958-02-01 1958-03-01 ... 1959-01-01
    TLON     (nj, ni) float32 dask.array<chunksize=(1080, 1440), meta=np.ndarray>
    TLAT     (nj, ni) float32 dask.array<chunksize=(1080, 1440), meta=np.ndarray>
    ULON     (nj, ni) float32 dask.array<chunksize=(1080, 1440), meta=np.ndarray>
    ULAT     (nj, ni) float32 dask.array<chunksize=(1080, 1440), meta=np.ndarray>
Dimensions without coordinates: nj, ni
Attributes:
    units:          C
    long_name:      air temperature
    cell_measures:  area: tarea
    cell_methods:   time: mean
    time_rep:       averaged
    ncfiles:        ['/g/data/hh5/tmp/cosima/access-om2-025/025deg_jra55v13_i...

And try the generic routine

[22]:
plot_global_temp_in_degrees_celcius(ice_air_temp)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[22], line 1
----> 1 plot_global_temp_in_degrees_celcius(ice_air_temp)

Cell In[14], line 12, in plot_global_temp_in_degrees_celcius(da)
      9 plt.figure(figsize=(12, 6))
     10 ax = plt.axes(projection=ccrs.Robinson())
---> 12 da_time_mean.cf.plot(ax=ax,
     13      x='longitude', y='latitude', 
     14      transform=ccrs.PlateCarree(),
     15      vmin=-2, vmax=30, extend='both',
     16      cmap=cm.cm.thermal)
     18 ax.coastlines()

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:907, in _CFWrappedPlotMethods.__call__(self, *args, **kwargs)
    898 """
    899 Allows .plot()
    900 """
    901 plot = _getattr(
    902     obj=self._obj,
    903     attr="plot",
    904     accessor=self.accessor,
    905     key_mappers=dict.fromkeys(self._keys, (_single(_get_all),)),
    906 )
--> 907 return self._plot_decorator(plot)(*args, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:890, in _CFWrappedPlotMethods._plot_decorator.<locals>._plot_wrapper(*args, **kwargs)
    887     kwargs = self._process_x_or_y(kwargs, "y", skip=kwargs.get("x"))
    889 # Now set some nice properties
--> 890 kwargs = self._set_axis_props(kwargs, "x")
    891 kwargs = self._set_axis_props(kwargs, "y")
    893 return func(*args, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:854, in _CFWrappedPlotMethods._set_axis_props(self, kwargs, key)
    852 if value:
    853     if value in self.accessor.keys():
--> 854         var = self.accessor[value]
    855     else:
    856         var = self._obj[value]

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:2516, in CFDataArrayAccessor.__getitem__(self, key)
   2511 if not isinstance(key, str):
   2512     raise KeyError(
   2513         f"Cannot use a list of keys with DataArrays. Expected a single string. Received {key!r} instead."
   2514     )
-> 2516 return _getitem(self, key)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:668, in _getitem(accessor, key, skip)
    666 names = _get_all(obj, k)
    667 names = drop_bounds(names)
--> 668 check_results(names, k)
    669 successful[k] = bool(names)
    670 coords.extend(names)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:647, in _getitem.<locals>.check_results(names, key)
    645 def check_results(names, key):
    646     if scalar_key and len(names) > 1:
--> 647         raise KeyError(
    648             f"Receive multiple variables for key {key!r}: {names}. "
    649             f"Expected only one. Please pass a list [{key!r}] "
    650             f"instead to get all variables matching {key!r}."
    651         )

KeyError: "Receive multiple variables for key 'longitude': ['ULON', 'TLON']. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'."
../_images/Tutorials_Model_Agnostic_Analysis_51_1.png

The error message is

"Receive multiple variables for key 'longitude': ['TLON', 'ULON']. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'."

This suggests that cf_xarray has found multiple longitude coordinates TLON and ULON and doesn’t know how to resolve this automatically.

Inspecting the cf information doesn’t show multiple axes like it does in the documentation:

[23]:
ice_air_temp.cf
[23]:
Coordinates:
- CF Axes:   X, Y, Z, T: n/a

- CF Coordinates:   longitude: ['TLON']
                    latitude: ['TLAT']
                    vertical, time: n/a

- Cell Measures:   area, volume: n/a

- Standard Names:   n/a

- Bounds:   n/a

This is a bug, taking the mean alters the coordinates:

[24]:
ice_air_temp.cf.mean('time').cf
[24]:
Coordinates:
- CF Axes:   X, Y, Z, T: n/a

- CF Coordinates:   longitude: ['TLON', 'ULON']
                    latitude: ['TLAT', 'ULAT']
                    vertical, time: n/a

- Cell Measures:   area, volume: n/a

- Standard Names:   n/a

- Bounds:   n/a

So the solution is to drop the redundant velocity grid:

[25]:
ice_air_temp = ice_air_temp.drop_vars(['ULON', 'ULAT'])

Now trying to plot again using the generic function and there is another error:

[26]:
plot_global_temp_in_degrees_celcius(ice_air_temp)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 1
----> 1 plot_global_temp_in_degrees_celcius(ice_air_temp)

Cell In[14], line 12, in plot_global_temp_in_degrees_celcius(da)
      9 plt.figure(figsize=(12, 6))
     10 ax = plt.axes(projection=ccrs.Robinson())
---> 12 da_time_mean.cf.plot(ax=ax,
     13      x='longitude', y='latitude', 
     14      transform=ccrs.PlateCarree(),
     15      vmin=-2, vmax=30, extend='both',
     16      cmap=cm.cm.thermal)
     18 ax.coastlines()

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:907, in _CFWrappedPlotMethods.__call__(self, *args, **kwargs)
    898 """
    899 Allows .plot()
    900 """
    901 plot = _getattr(
    902     obj=self._obj,
    903     attr="plot",
    904     accessor=self.accessor,
    905     key_mappers=dict.fromkeys(self._keys, (_single(_get_all),)),
    906 )
--> 907 return self._plot_decorator(plot)(*args, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:893, in _CFWrappedPlotMethods._plot_decorator.<locals>._plot_wrapper(*args, **kwargs)
    890 kwargs = self._set_axis_props(kwargs, "x")
    891 kwargs = self._set_axis_props(kwargs, "y")
--> 893 return func(*args, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cf_xarray/accessor.py:598, in _getattr.<locals>.wrapper(*args, **kwargs)
    594 posargs, arguments = accessor._process_signature(
    595     func, args, kwargs, key_mappers
    596 )
    597 final_func = extra_decorator(func) if extra_decorator else func
--> 598 result = final_func(*posargs, **arguments)
    599 if wrap_classes and isinstance(result, _WRAPPED_CLASSES):
    600     result = _CFWrappedClass(result, accessor)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/xarray/plot/accessor.py:46, in DataArrayPlotAccessor.__call__(self, **kwargs)
     44 @functools.wraps(dataarray_plot.plot, assigned=("__doc__", "__annotations__"))
     45 def __call__(self, **kwargs) -> Any:
---> 46     return dataarray_plot.plot(self._da, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/xarray/plot/dataarray_plot.py:312, in plot(darray, row, col, col_wrap, ax, hue, subplot_kws, **kwargs)
    308     plotfunc = hist
    310 kwargs["ax"] = ax
--> 312 return plotfunc(darray, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/xarray/plot/dataarray_plot.py:1613, in _plot2d.<locals>.newplotfunc(***failed resolving arguments***)
   1609     raise ValueError("plt.imshow's `aspect` kwarg is not available in xarray")
   1611 ax = get_axis(figsize, size, aspect, ax, **subplot_kws)
-> 1613 primitive = plotfunc(
   1614     xplt,
   1615     yplt,
   1616     zval,
   1617     ax=ax,
   1618     cmap=cmap_params["cmap"],
   1619     vmin=cmap_params["vmin"],
   1620     vmax=cmap_params["vmax"],
   1621     norm=cmap_params["norm"],
   1622     **kwargs,
   1623 )
   1625 # Label the plot with metadata
   1626 if add_labels:

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/xarray/plot/dataarray_plot.py:2340, in pcolormesh(x, y, z, ax, xscale, yscale, infer_intervals, **kwargs)
   2337         y = _infer_interval_breaks(y, axis=0, scale=yscale)
   2339 ax.grid(False)
-> 2340 primitive = ax.pcolormesh(x, y, z, **kwargs)
   2342 # by default, pcolormesh picks "round" values for bounds
   2343 # this results in ugly looking plots with lots of surrounding whitespace
   2344 if not hasattr(ax, "projection") and x.ndim == 1 and y.ndim == 1:
   2345     # not a cartopy geoaxis

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cartopy/mpl/geoaxes.py:318, in _add_transform.<locals>.wrapper(self, *args, **kwargs)
    313     raise ValueError(f'Invalid transform: Spherical {func.__name__} '
    314                      'is not supported - consider using '
    315                      'PlateCarree/RotatedPole.')
    317 kwargs['transform'] = transform
--> 318 return func(self, *args, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/cartopy/mpl/geoaxes.py:1797, in GeoAxes.pcolormesh(self, *args, **kwargs)
   1794 # Add in an argument checker to handle Matplotlib's potential
   1795 # interpolation when coordinate wraps are involved
   1796 args, kwargs = self._wrap_args(*args, **kwargs)
-> 1797 result = matplotlib.axes.Axes.pcolormesh(self, *args, **kwargs)
   1798 # Wrap the quadrilaterals if necessary
   1799 result = self._wrap_quadmesh(result, **kwargs)

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/matplotlib/__init__.py:1414, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
   1411 @functools.wraps(func)
   1412 def inner(ax, *args, data=None, **kwargs):
   1413     if data is None:
-> 1414         return func(ax, *map(sanitize_sequence, args), **kwargs)
   1416     bound = new_sig.bind(ax, *args, **kwargs)
   1417     auto_label = (bound.arguments.get(label_namer)
   1418                   or bound.kwargs.get(label_namer))

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/matplotlib/axes/_axes.py:6064, in Axes.pcolormesh(self, alpha, norm, cmap, vmin, vmax, shading, antialiased, *args, **kwargs)
   6061 shading = shading.lower()
   6062 kwargs.setdefault('edgecolors', 'none')
-> 6064 X, Y, C, shading = self._pcolorargs('pcolormesh', *args,
   6065                                     shading=shading, kwargs=kwargs)
   6066 coords = np.stack([X, Y], axis=-1)
   6067 # convert to one dimensional array

File /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/lib/python3.9/site-packages/matplotlib/axes/_axes.py:5539, in Axes._pcolorargs(self, funcname, shading, *args, **kwargs)
   5537 if funcname == 'pcolormesh':
   5538     if np.ma.is_masked(X) or np.ma.is_masked(Y):
-> 5539         raise ValueError(
   5540             'x and y arguments to pcolormesh cannot have '
   5541             'non-finite values or be of type '
   5542             'numpy.ma.core.MaskedArray with masked values')
   5543     # safe_masked_invalid() returns an ndarray for dtypes other
   5544     # than floating point.
   5545     if isinstance(X, np.ma.core.MaskedArray):

ValueError: x and y arguments to pcolormesh cannot have non-finite values or be of type numpy.ma.core.MaskedArray with masked values
../_images/Tutorials_Model_Agnostic_Analysis_59_1.png

This error:

ValueError: x and y arguments to pcolormesh cannot have non-finite values or be of type numpy.ma.core.MaskedArray with masked values

is because there are NaN’s in the coordinate variables, as explained in the plotting with cartopy tutorial.

By following the instructions in that tutorial and the Spatial selection with tripolar ACCESS-OM2 grid notebook the coordinates can be fixed by replacing them with coordinates from the ice grid input file. It requires some work, the dimensions must be renamed to match, and coordinates converted from radians to degrees.

[27]:
ice_grid = xr.open_dataset('/g/data/ik11/inputs/access-om2/input_eee21b65/cice_025deg/grid.nc').rename({'ny': 'nj', 'nx': 'ni'})
ice_grid = ice_grid.pint.quantify()

ice_air_temp = ice_air_temp.assign_coords({'TLON': ice_grid.tlon.pint.to('degrees_E'), 'TLAT': ice_grid.tlat.pint.to('degrees_N')})
ice_air_temp
[27]:
<xarray.DataArray 'Tair_m' (time: 12, nj: 1080, ni: 1440)>
dask.array<concatenate, shape=(12, 1080, 1440), dtype=float32, chunksize=(1, 1080, 1440), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 1958-02-01 1958-03-01 ... 1959-01-01
    TLON     (nj, ni) float64 [degrees_east] -279.9 -279.6 -279.4 ... 80.0 80.0
    TLAT     (nj, ni) float64 [degrees_north] -81.08 -81.08 ... 65.13 65.03
Dimensions without coordinates: nj, ni
Attributes:
    units:          C
    long_name:      air temperature
    cell_measures:  area: tarea
    cell_methods:   time: mean
    time_rep:       averaged
    ncfiles:        ['/g/data/hh5/tmp/cosima/access-om2-025/025deg_jra55v13_i...

Finally, the generic plotting routine works

[28]:
plot_global_temp_in_degrees_celcius(ice_air_temp)
../_images/Tutorials_Model_Agnostic_Analysis_63_0.png

One more step is to modify the original routine to take the vertical range as an argument, so it is more generally useful:

[29]:
def plot_global_temp_in_degrees_celcius(da, vmin=-2, vmax=30):
    # Take the time mean of da and plot a global temperature field in a Robinson projection
    #
    # Input DataArray (da) should be a 3D array of latitude, longitude and time.

    da = da.pint.quantify().pint.to('C')
    da_time_mean = da.cf.mean('time')

    plt.figure(figsize=(12, 6))
    ax = plt.axes(projection=ccrs.Robinson())

    da_time_mean.cf.plot(ax=ax,
                         x='longitude', y='latitude',
                         transform=ccrs.PlateCarree(),
                         vmin=vmin, vmax=vmax, extend='both',
                         cmap=cm.cm.thermal)

    ax.coastlines();

By specifying default values for the arguments it is completely backwards compatible, we have lost no functionality, but the ice air temperature can now be plotted with a range that better suits the range of the data

[30]:
plot_global_temp_in_degrees_celcius(ice_air_temp, vmin=-30)
../_images/Tutorials_Model_Agnostic_Analysis_67_0.png

Conclusion

Model specific code to take a time mean and plot the data was converted to a model agnostic function with no loss of functionality.

The same function can be used with a wide range CF compliant data.

In some cases the input data will need to be modified if it is not sufficiently compliant, or non-conforming in some way, such as the ice data above with NaN’s in the coordinate. It is better to modify the data to be more compliant and higher quality, and use generic tools, than have multiple code versions to account for the vagaries or problems with individual datasets.

Ideally those improvements can be incorporated into future versions of the data at source, in post-processing, or in some utility functions for transforming a class of non-conforming data.