Model Agnostic Analysis¶
Introduction¶
Model agnostic analysis means writing your notebook so that it can easily be used with any CF-compliant data source.
What are the CF Conventions?¶
From CF Metadata conventions:
The CF metadata conventions are designed to promote the processing and sharing of files created with the NetCDF API. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF convention includes a standard name table, which defines strings that identify physical quantities.
In most cases the model output data accessed through the ACCESS-NRI Intake Catalog complies with some version of the CF conventions; enough to be usable for model agnostic analysis.
Why bother?¶
Model agnostic means the same code can work for multiple models. This makes your code more usable by you and by others. You no longer need to have different versions of code for different models. It makes you and any one who uses your code more productive. It allows for common tasks to be abstracted into general methods that can be more easily reused, meaning less code needs to be written and maintained. This is an enormous productivity boost.
How is model agnostic analysis achieved?¶
This can be achieved by using packages that enable this:
cf_xarray for generalised coordinate naming
xgcm to make grid operations generic across data
pint and pint-xarray for handling units easily and robustly
Example¶
Introduction¶
This tutorial uses an example analysis, shows how the this might be done in a traditional, model specific, manner, and then implements the same analysis in a model agnostic way.
This tutorial is intended to be run using the conda/analysis3 or conda/access-med environment, available via the xp65 project: https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/how.html#using-the-catalog-on-the-are
The first step is to import necessary packages.
[1]:
import intake
import numpy as np
import xarray as xr
import cf_xarray as cfxr
import pint_xarray
from pint import application_registry as ureg
import cf_xarray.units
import matplotlib.pyplot as plt
import cmocean as cm
import cartopy.crs as ccrs
import cartopy.feature as cft
[2]:
from dask.distributed import Client
client = Client(threads_per_worker=1)
client
[2]:
Client
Client-0c70da2d-52f7-11f0-9c9d-00000395fe80
| Connection method: Cluster object | Cluster type: distributed.LocalCluster |
| Dashboard: /proxy/8787/status |
Cluster Info
LocalCluster
77e8a985
| Dashboard: /proxy/8787/status | Workers: 14 |
| Total threads: 14 | Total memory: 63.00 GiB |
| Status: running | Using processes: True |
Scheduler Info
Scheduler
Scheduler-8c3e5a06-d867-4f42-a75a-a43ce526fa5c
| Comm: tcp://127.0.0.1:35881 | Workers: 0 |
| Dashboard: /proxy/8787/status | Total threads: 0 |
| Started: Just now | Total memory: 0 B |
Workers
Worker: 0
| Comm: tcp://127.0.0.1:46123 | Total threads: 1 |
| Dashboard: /proxy/36765/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:35213 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-uggsb74w | |
Worker: 1
| Comm: tcp://127.0.0.1:46543 | Total threads: 1 |
| Dashboard: /proxy/34241/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:41065 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-se0d2d53 | |
Worker: 2
| Comm: tcp://127.0.0.1:36231 | Total threads: 1 |
| Dashboard: /proxy/40805/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:43637 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-l3oags2n | |
Worker: 3
| Comm: tcp://127.0.0.1:37323 | Total threads: 1 |
| Dashboard: /proxy/32941/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:36269 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-biqo6d2r | |
Worker: 4
| Comm: tcp://127.0.0.1:37607 | Total threads: 1 |
| Dashboard: /proxy/43491/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:39297 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-fp82cmtc | |
Worker: 5
| Comm: tcp://127.0.0.1:42559 | Total threads: 1 |
| Dashboard: /proxy/42343/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:36693 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-l__eapav | |
Worker: 6
| Comm: tcp://127.0.0.1:34123 | Total threads: 1 |
| Dashboard: /proxy/33051/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:36275 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-cxvtixw5 | |
Worker: 7
| Comm: tcp://127.0.0.1:44179 | Total threads: 1 |
| Dashboard: /proxy/39331/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:37111 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-mx0nox70 | |
Worker: 8
| Comm: tcp://127.0.0.1:37539 | Total threads: 1 |
| Dashboard: /proxy/37951/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:41951 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-7mh7z8br | |
Worker: 9
| Comm: tcp://127.0.0.1:46819 | Total threads: 1 |
| Dashboard: /proxy/35809/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:33215 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-yl2h9wna | |
Worker: 10
| Comm: tcp://127.0.0.1:36703 | Total threads: 1 |
| Dashboard: /proxy/42507/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:43025 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-5r7ln_bn | |
Worker: 11
| Comm: tcp://127.0.0.1:43375 | Total threads: 1 |
| Dashboard: /proxy/35405/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:45413 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-4pkx2s4d | |
Worker: 12
| Comm: tcp://127.0.0.1:38131 | Total threads: 1 |
| Dashboard: /proxy/34111/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:45563 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-fbs46wfo | |
Worker: 13
| Comm: tcp://127.0.0.1:33747 | Total threads: 1 |
| Dashboard: /proxy/39141/status | Memory: 4.50 GiB |
| Nanny: tcp://127.0.0.1:38761 | |
| Local directory: /jobfs/144171292.gadi-pbs/dask-scratch-space/worker-ufycrjks | |
cf_xarray works best when xarray keeps attributes by default:
[3]:
xr.set_options(keep_attrs=True)
[3]:
<xarray.core.options.set_options at 0x154ce6075fd0>
Load the ACCESS-NRI Intake Catalog:
[4]:
catalog = intake.cat.access_nri
Then, select one experiment from the catalog and load surface temperature data from a 0.25\(^\circ\) global MOM5 model.
[5]:
experiment = '025deg_jra55_iaf_omip2_cycle6'
cat_subset = catalog[experiment]
variable = 'sst'
var_search = cat_subset.search(variable=variable, frequency='1mon')
ds = var_search.to_dask(xarray_open_kwargs={
# This option suppresses a warning due to an upcoming default alteration in xarray
"decode_timedelta": False,
})
SST = ds[variable]
This is a 3D dataset in latitude, longitude and time:
[6]:
SST
[6]:
<xarray.DataArray 'sst' (time: 732, yt_ocean: 1080, xt_ocean: 1440)> Size: 5GB
dask.array<concatenate, shape=(732, 1080, 1440), dtype=float32, chunksize=(1, 216, 240), chunktype=numpy.ndarray>
Coordinates:
* xt_ocean (xt_ocean) float64 12kB -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
* yt_ocean (yt_ocean) float64 9kB -81.08 -80.97 -80.87 ... 89.74 89.84 89.95
* time (time) datetime64[ns] 6kB 1958-01-14T12:00:00 ... 2018-12-14T12...
Attributes:
long_name: Potential temperature
units: K
valid_range: [-10. 500.]
cell_methods: time: mean
time_avg_info: average_T1,average_T2,average_DT
standard_name: sea_surface_temperatureModel specific¶
First, do our analysis as it might usually be done, in a model-specific manner:
Convert the temperature units from kelvin (K) to Celsius (C);
Use the time coordinate name in the mean function.
We use pint to ensure this is in degrees C. Note that if the data were originally in degrees Celcius, this would do nothing. This is a way of catering for any temperature units that are understood by pint in a transparent way. Note the call to quantify, which invokes pint’s machinery to parse the units and allow unit conversions.
[7]:
SST.attrs['units'] = 'K'
SST = SST.pint.quantify().pint.to('C')
SST_time_mean = SST.mean('time')
SST_time_mean
[7]:
<xarray.DataArray 'sst' (yt_ocean: 1080, xt_ocean: 1440)> Size: 6MB
<Quantity(dask.array<mean_agg-aggregate, shape=(1080, 1440), dtype=float32, chunksize=(216, 240), chunktype=numpy.ndarray>, 'degree_Celsius')>
Coordinates:
* xt_ocean (xt_ocean) float64 12kB -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
* yt_ocean (yt_ocean) float64 9kB -81.08 -80.97 -80.87 ... 89.74 89.84 89.95
Attributes:
long_name: Potential temperature
valid_range: [-10. 500.]
cell_methods: time: mean
time_avg_info: average_T1,average_T2,average_DT
standard_name: sea_surface_temperatureNow, plot the result:
[9]:
plt.figure(figsize=(12, 6))
ax = plt.axes(projection=ccrs.Robinson())
SST_time_mean = SST_time_mean.pint.dequantify()
SST_time_mean.plot(ax=ax,
x='xt_ocean', y='yt_ocean',
transform=ccrs.PlateCarree(),
vmin=-2, vmax=30, extend='both',
cmap=cm.cm.thermal)
ax.coastlines()
[9]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x154c8ba45810>
(Note that the Arctic is not correctly respresented due to the 1D lat/lon coordinates not being correct in the tripole area. See the Making Maps with cartopy Tutorial for more information.)
Model agnostic¶
Now, do the same calculation, but in a model agnostic manner.
We can use the cf accessor to automatically determine the name of the time dimension without knowledge of the exact model being used. cf_xarray checks the names of variables and coordinates, and associated metadata, to try and infer information about the data based on the CF conventions.
To see what cf_xarray information is available just evaluate the accessor:
[10]:
SST.cf
[10]:
Coordinates:
CF Axes: * X: ['xt_ocean']
* Y: ['yt_ocean']
* T: ['time']
Z: n/a
CF Coordinates: * longitude: ['xt_ocean']
* latitude: ['yt_ocean']
* time: ['time']
vertical: n/a
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
In this case it has found X, Y and T axes, and longitude, latitude and time coordinates. These are now accessible like a dict using the cf accessor. Note that this returns the actual coordinate, and many functions just want a simple string argument, which is the name of the coordinate.
cf_xarray also wraps many standard xarray functions, allowing cf names to be used, which are automatically converted to the name in the data.
The upshot: just use the cf accessor, append the required function, and use the standard CF coordinate name (in this case they are the same, time, but that is not guaranteed):
[11]:
SST_time_mean = SST.cf.mean('time')
SST_time_mean
[11]:
<xarray.DataArray 'sst' (yt_ocean: 1080, xt_ocean: 1440)> Size: 6MB
<Quantity(dask.array<mean_agg-aggregate, shape=(1080, 1440), dtype=float32, chunksize=(216, 240), chunktype=numpy.ndarray>, 'degree_Celsius')>
Coordinates:
* xt_ocean (xt_ocean) float64 12kB -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
* yt_ocean (yt_ocean) float64 9kB -81.08 -80.97 -80.87 ... 89.74 89.84 89.95
Attributes:
long_name: Potential temperature
valid_range: [-10. 500.]
cell_methods: time: mean
time_avg_info: average_T1,average_T2,average_DT
standard_name: sea_surface_temperatureUsing the cf_xarray-wrapped function makes the code more legible and easier to write, i.e.
SST.cf.mean('time')
compared to
SST.mean(SST.cf['time'].name)
The cf accessor can be used in the same way in the plot, with the CF names for latitude and longitude used as x and y arguments:
[12]:
plt.figure(figsize=(12, 6))
ax = plt.axes(projection=ccrs.Robinson())
SST_time_mean = SST_time_mean.pint.dequantify()
SST_time_mean.cf.plot(ax=ax,
x='longitude', y='latitude',
transform=ccrs.PlateCarree(),
vmin=-2, vmax=30, extend='both',
cmap=cm.cm.thermal)
ax.coastlines()
[12]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x154c83ee9c10>
Putting this into practice¶
Above a model agnostic version of some code was demonstrated, but that doesn’t utilise the full power of what it is capable of. The model agnostic code can now be easily turned into a function that accepts an xarray DataArray:
[15]:
def map_mean_temp_in_degrees_celsius(da):
# Take the time mean of da and plot a global temperature field in a Robinson projection
#
# Input DataArray (da) should be a 3D array of latitude, longitude and time.
da = da.pint.quantify().pint.to('C')
da_time_mean = da.cf.mean('time')
plt.figure(figsize=(12, 6))
ax = plt.axes(projection=ccrs.Robinson())
da_time_mean = da_time_mean.pint.dequantify()
da_time_mean.cf.plot(ax=ax,
x='longitude', y='latitude',
transform=ccrs.PlateCarree(),
vmin=-2, vmax=30, extend='both',
cmap=cm.cm.thermal)
ax.coastlines()
Try this out with the SST data used above:
[16]:
map_mean_temp_in_degrees_celsius(SST)
Now try on the output from a different experiment and model (MOM6):
[17]:
experiment = 'OM4_025.JRA_RYF'
variable = 'tos'
cat_subset = catalog[experiment]
var_search = cat_subset.search(variable=variable, frequency='1mon')
ds = var_search.to_dask(xarray_open_kwargs={"decode_timedelta": False})
SST_mom6 = ds[variable]
Check to see it has correctly parsed the CF information. It is not necessary to print this out, but interesting, and note it has quite different index and coordinate names
[18]:
SST_mom6.cf
[18]:
Coordinates:
CF Axes: * X: ['xh']
* Y: ['yh']
* T: ['time']
Z: n/a
CF Coordinates: * longitude: ['xh']
* latitude: ['yh']
* time: ['time']
vertical: n/a
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
Use the function from above which also worked on MOM5 data with very different coordinates
[19]:
map_mean_temp_in_degrees_celsius(SST_mom6)
What to do when it goes wrong¶
The model agnostic function worked flawlessly with two different ocean data sets, after the units were corrected in the MOM5 data. What about some ice data?
Using the same experiment from which the first SST data was obtained, load the ice air temperature variable:
[20]:
catalog.search(name=".*025deg", realm="seaIce", variable="Tair_m")
Intake dataframe catalog with 11 source(s) across 11 rows:
| model | description | realm | frequency | variable | |
|---|---|---|---|---|---|
| name | |||||
| 025deg_era5_iaf | {ACCESS-OM2} | {0.25 degree ACCESS-OM2 global model configuration with ERA5 interannual\nforcing (1980-2021)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_era5_ryf | {ACCESS-OM2} | {0.25 degree ACCESS-OM2 global model configuration with ERA5 RYF9091 repeat\nyear forcing (May 1990 to Apr 1991)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_iaf_era5comparison | {ACCESS-OM2} | {0.25 degree ACCESS-OM2 global model configuration with JRA55-do v1.5.0\ninterannual forcing (1980-2019)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_iaf_omip2_cycle1 | {ACCESS-OM2} | {Cycle 1/6 of 0.25 degree ACCESS-OM2 physics-only global configuration with JRA55-do v1.4 OMIP2 interannual forcing (1958-2019)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_iaf_omip2_cycle2 | {ACCESS-OM2} | {Cycle 1/6 of 0.25 degree ACCESS-OM2 physics-only global configuration with JRA55-do v1.4 OMIP2 interannual forcing (1958-2019)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_iaf_omip2_cycle3 | {ACCESS-OM2} | {Cycle 3/6 of 0.25 degree ACCESS-OM2 physics-only global configuration with JRA55-do v1.4 OMIP2 interannual forcing (1958-2019)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_iaf_omip2_cycle4 | {ACCESS-OM2} | {Cycle 4/6 of 0.25 degree ACCESS-OM2 physics-only global configuration with JRA55-do v1.4 OMIP2 interannual forcing (1958-2019)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_iaf_omip2_cycle5 | {ACCESS-OM2} | {Cycle 5/6 of 0.25 degree ACCESS-OM2 physics-only global configuration with JRA55-do v1.4 OMIP2 interannual forcing (1958-2019)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_iaf_omip2_cycle6 | {ACCESS-OM2} | {Cycle 6/6 of 0.25 degree ACCESS-OM2 physics-only global configuration with JRA55-do v1.4 OMIP2 interannual forcing (1958-2019)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_ryf9091_gadi | {ACCESS-OM2} | {0.25 degree ACCESS-OM2 physics-only global configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991)} | {seaIce} | {1mon} | {Tair_m} |
| 025deg_jra55_ryf_era5comparison | {ACCESS-OM2} | {0.25 degree ACCESS-OM2 global model configuration with JRA55-do v1.4.0\nRYF9091 repeat year forcing (May 1990 to Apr 1991)} | {seaIce} | {1mon} | {Tair_m} |
[21]:
experiment = "025deg_jra55_iaf_omip2_cycle6"
variable = 'Tair_m'
cat_subset = catalog[experiment]
var_search = cat_subset.search(variable=variable, frequency='1mon')
ds = var_search.to_dask(
xarray_open_kwargs={"decode_timedelta": False},
# xarray_combine_by_coords_kwargs is required to get the
# data from this experiment loaded in a reasonable timeframe
xarray_combine_by_coords_kwargs = dict(
compat="override",
data_vars="minimal",
coords="minimal",
),
)
ice_air_temp = ds[variable]
ice_air_temp
[21]:
<xarray.DataArray 'Tair_m' (time: 732, nj: 1080, ni: 1440)> Size: 5GB
dask.array<concatenate, shape=(732, 1080, 1440), dtype=float32, chunksize=(1, 540, 720), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 6kB 1958-02-01 1958-03-01 ... 2019-01-01
TLON (nj, ni) float32 6MB dask.array<chunksize=(540, 720), meta=np.ndarray>
TLAT (nj, ni) float32 6MB dask.array<chunksize=(540, 720), meta=np.ndarray>
ULON (nj, ni) float32 6MB dask.array<chunksize=(540, 720), meta=np.ndarray>
ULAT (nj, ni) float32 6MB dask.array<chunksize=(540, 720), meta=np.ndarray>
Dimensions without coordinates: nj, ni
Attributes:
units: C
long_name: air temperature
cell_measures: area: tarea
cell_methods: time: mean
time_rep: averagedWhen we try the generic routine:
[22]:
map_mean_temp_in_degrees_celsius(ice_air_temp)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[22], line 1
----> 1 map_mean_temp_in_degrees_celsius(ice_air_temp)
Cell In[15], line 13, in map_mean_temp_in_degrees_celsius(da)
10 ax = plt.axes(projection=ccrs.Robinson())
12 da_time_mean = da_time_mean.pint.dequantify()
---> 13 da_time_mean.cf.plot(ax=ax,
14 x='longitude', y='latitude',
15 transform=ccrs.PlateCarree(),
16 vmin=-2, vmax=30, extend='both',
17 cmap=cm.cm.thermal)
19 ax.coastlines()
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:1123, in _CFWrappedPlotMethods.__call__(self, *args, **kwargs)
1114 """
1115 Allows .plot()
1116 """
1117 plot = _getattr(
1118 obj=self._obj,
1119 attr="plot",
1120 accessor=self.accessor,
1121 key_mappers=dict.fromkeys(self._keys, (_single(_get_all),)),
1122 )
-> 1123 return self._plot_decorator(plot)(*args, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:1106, in _CFWrappedPlotMethods._plot_decorator.<locals>._plot_wrapper(*args, **kwargs)
1103 kwargs = self._process_x_or_y(kwargs, "y", skip=kwargs.get("x"))
1105 # Now set some nice properties
-> 1106 kwargs = self._set_axis_props(kwargs, "x")
1107 kwargs = self._set_axis_props(kwargs, "y")
1109 return func(*args, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:1070, in _CFWrappedPlotMethods._set_axis_props(self, kwargs, key)
1068 if value:
1069 if value in self.accessor.keys():
-> 1070 var = self.accessor[value]
1071 else:
1072 var = self._obj[value]
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:2977, in CFDataArrayAccessor.__getitem__(self, key)
2972 if not isinstance(key, Hashable):
2973 raise KeyError(
2974 f"Cannot use an Iterable of keys with DataArrays. Expected a single string. Received {key!r} instead."
2975 )
-> 2977 return _getitem(self, key)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:856, in _getitem(accessor, key, skip)
854 names = _get_all(obj, k)
855 names = drop_bounds(names)
--> 856 check_results(names, k)
857 successful[k] = bool(names)
858 coords.extend(names)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:826, in _getitem.<locals>.check_results(names, key)
824 def check_results(names, key):
825 if scalar_key and len(names) > 1:
--> 826 raise KeyError(
827 f"Receive multiple variables for key {key!r}: {names}. "
828 f"Expected only one. Please pass a list [{key!r}] "
829 f"instead to get all variables matching {key!r}."
830 )
KeyError: "Receive multiple variables for key 'longitude': ['ULON', 'TLON']. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'."
The error message is
"Receive multiple variables for key 'longitude': ['TLON', 'ULON']. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'."
This suggests that cf_xarray has found multiple longitude coordinates TLON and ULON and doesn’t know how to resolve this automatically.
Inspecting the cf information doesn’t show multiple axes like it does in the documentation:
[23]:
ice_air_temp.cf
[23]:
Coordinates:
CF Axes: X, Y, Z, T: n/a
CF Coordinates: longitude: ['TLON']
latitude: ['TLAT']
vertical, time: n/a
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
This is a bug, taking the mean alters the coordinates:
[24]:
ice_air_temp.cf.mean('time').cf
[24]:
Coordinates:
CF Axes: X, Y, Z, T: n/a
CF Coordinates: longitude: ['TLON', 'ULON']
latitude: ['TLAT', 'ULAT']
vertical, time: n/a
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
So the solution is to drop the redundant velocity grid:
[25]:
ice_air_temp = ice_air_temp.drop_vars(['ULON', 'ULAT'])
Now trying to plot again using the generic function and there is another error:
[26]:
map_mean_temp_in_degrees_celsius(ice_air_temp)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[26], line 1
----> 1 map_mean_temp_in_degrees_celsius(ice_air_temp)
Cell In[15], line 13, in map_mean_temp_in_degrees_celsius(da)
10 ax = plt.axes(projection=ccrs.Robinson())
12 da_time_mean = da_time_mean.pint.dequantify()
---> 13 da_time_mean.cf.plot(ax=ax,
14 x='longitude', y='latitude',
15 transform=ccrs.PlateCarree(),
16 vmin=-2, vmax=30, extend='both',
17 cmap=cm.cm.thermal)
19 ax.coastlines()
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:1123, in _CFWrappedPlotMethods.__call__(self, *args, **kwargs)
1114 """
1115 Allows .plot()
1116 """
1117 plot = _getattr(
1118 obj=self._obj,
1119 attr="plot",
1120 accessor=self.accessor,
1121 key_mappers=dict.fromkeys(self._keys, (_single(_get_all),)),
1122 )
-> 1123 return self._plot_decorator(plot)(*args, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:1109, in _CFWrappedPlotMethods._plot_decorator.<locals>._plot_wrapper(*args, **kwargs)
1106 kwargs = self._set_axis_props(kwargs, "x")
1107 kwargs = self._set_axis_props(kwargs, "y")
-> 1109 return func(*args, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cf_xarray/accessor.py:753, in _getattr.<locals>.wrapper(*args, **kwargs)
749 posargs, arguments = accessor._process_signature(
750 func, args, kwargs, key_mappers
751 )
752 final_func = extra_decorator(func) if extra_decorator else func
--> 753 result = final_func(*posargs, **arguments)
754 if wrap_classes and isinstance(result, _WRAPPED_CLASSES):
755 result = _CFWrappedClass(result, accessor)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/plot/accessor.py:48, in DataArrayPlotAccessor.__call__(self, **kwargs)
46 @functools.wraps(dataarray_plot.plot, assigned=("__doc__", "__annotations__"))
47 def __call__(self, **kwargs) -> Any:
---> 48 return dataarray_plot.plot(self._da, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/plot/dataarray_plot.py:310, in plot(darray, row, col, col_wrap, ax, hue, subplot_kws, **kwargs)
306 plotfunc = hist
308 kwargs["ax"] = ax
--> 310 return plotfunc(darray, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/plot/dataarray_plot.py:1606, in _plot2d.<locals>.newplotfunc(***failed resolving arguments***)
1602 raise ValueError("plt.imshow's `aspect` kwarg is not available in xarray")
1604 ax = get_axis(figsize, size, aspect, ax, **subplot_kws)
-> 1606 primitive = plotfunc(
1607 xplt,
1608 yplt,
1609 zval,
1610 ax=ax,
1611 cmap=cmap_params["cmap"],
1612 vmin=cmap_params["vmin"],
1613 vmax=cmap_params["vmax"],
1614 norm=cmap_params["norm"],
1615 **kwargs,
1616 )
1618 # Label the plot with metadata
1619 if add_labels:
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/plot/dataarray_plot.py:2315, in pcolormesh(x, y, z, ax, xscale, yscale, infer_intervals, **kwargs)
2312 y = _infer_interval_breaks(y, axis=0, scale=yscale)
2314 ax.grid(False)
-> 2315 primitive = ax.pcolormesh(x, y, z, **kwargs)
2317 # by default, pcolormesh picks "round" values for bounds
2318 # this results in ugly looking plots with lots of surrounding whitespace
2319 if not hasattr(ax, "projection") and x.ndim == 1 and y.ndim == 1:
2320 # not a cartopy geoaxis
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cartopy/mpl/geoaxes.py:306, in _add_transform.<locals>.wrapper(self, *args, **kwargs)
301 raise ValueError(f'Invalid transform: Spherical {func.__name__} '
302 'is not supported - consider using '
303 'PlateCarree/RotatedPole.')
305 kwargs['transform'] = transform
--> 306 return func(self, *args, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/cartopy/mpl/geoaxes.py:1777, in GeoAxes.pcolormesh(self, *args, **kwargs)
1774 # Add in an argument checker to handle Matplotlib's potential
1775 # interpolation when coordinate wraps are involved
1776 args, kwargs = self._wrap_args(*args, **kwargs)
-> 1777 result = super().pcolormesh(*args, **kwargs)
1778 # Wrap the quadrilaterals if necessary
1779 result = self._wrap_quadmesh(result, **kwargs)
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/matplotlib/__init__.py:1521, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
1518 @functools.wraps(func)
1519 def inner(ax, *args, data=None, **kwargs):
1520 if data is None:
-> 1521 return func(
1522 ax,
1523 *map(cbook.sanitize_sequence, args),
1524 **{k: cbook.sanitize_sequence(v) for k, v in kwargs.items()})
1526 bound = new_sig.bind(ax, *args, **kwargs)
1527 auto_label = (bound.arguments.get(label_namer)
1528 or bound.kwargs.get(label_namer))
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/matplotlib/axes/_axes.py:6525, in Axes.pcolormesh(self, alpha, norm, cmap, vmin, vmax, colorizer, shading, antialiased, *args, **kwargs)
6522 shading = shading.lower()
6523 kwargs.setdefault('edgecolors', 'none')
-> 6525 X, Y, C, shading = self._pcolorargs('pcolormesh', *args,
6526 shading=shading, kwargs=kwargs)
6527 coords = np.stack([X, Y], axis=-1)
6529 kwargs.setdefault('snap', mpl.rcParams['pcolormesh.snap'])
File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/matplotlib/axes/_axes.py:6029, in Axes._pcolorargs(self, funcname, shading, *args, **kwargs)
6027 if funcname == 'pcolormesh':
6028 if np.ma.is_masked(X) or np.ma.is_masked(Y):
-> 6029 raise ValueError(
6030 'x and y arguments to pcolormesh cannot have '
6031 'non-finite values or be of type '
6032 'numpy.ma.MaskedArray with masked values')
6033 nrows, ncols = C.shape[:2]
6034 else:
ValueError: x and y arguments to pcolormesh cannot have non-finite values or be of type numpy.ma.MaskedArray with masked values
This error:
ValueError: x and y arguments to pcolormesh cannot have non-finite values or be of type numpy.ma.core.MaskedArray with masked values
is because there are NaN values in the coordinate variables, as explained in the plotting with cartopy tutorial.
By following the instructions in that tutorial and the Spatial selection with tripolar ACCESS-OM2 grid notebook the coordinates can be fixed by replacing them with coordinates from the ice grid input file. It requires some work, the dimensions must be renamed to match, and coordinates converted from radians to degrees.
[27]:
ice_grid = xr.open_dataset('/g/data/ik11/inputs/access-om2/input_eee21b65/cice_025deg/grid.nc').rename({'ny': 'nj', 'nx': 'ni'})
ice_grid = ice_grid.pint.quantify()
ice_air_temp = ice_air_temp.assign_coords({'TLON': ice_grid.tlon.pint.to('degrees_E'),
'TLAT': ice_grid.tlat.pint.to('degrees_N')})
ice_air_temp
WARNING:pint.util:Redefining 'degrees_north' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degrees_N' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degreesN' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degree_north' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degree_N' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degreeN' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degrees_east' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degrees_E' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degreesE' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degree_east' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degree_E' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'degreeE' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
WARNING:pint.util:Redefining 'gpm' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
[27]:
<xarray.DataArray 'Tair_m' (time: 732, nj: 1080, ni: 1440)> Size: 5GB
dask.array<concatenate, shape=(732, 1080, 1440), dtype=float32, chunksize=(1, 540, 720), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 6kB 1958-02-01 1958-03-01 ... 2019-01-01
TLON (nj, ni) float64 12MB [degrees_E] -279.9 -279.6 ... 80.0 80.0
TLAT (nj, ni) float64 12MB [degrees_N] -81.08 -81.08 ... 65.13 65.03
Dimensions without coordinates: nj, ni
Attributes:
units: C
long_name: air temperature
cell_measures: area: tarea
cell_methods: time: mean
time_rep: averagedFinally, the generic plotting routine works:
[28]:
map_mean_temp_in_degrees_celsius(ice_air_temp)
One more step is to modify the original routine to take the vertical range as an argument, so it is more generally useful:
[30]:
def map_mean_temp_in_degrees_celsius(da, vmin=-2, vmax=30):
# Take the time mean of da and plot a global temperature field in a Robinson projection
#
# Input DataArray (da) should be a 3D array of latitude, longitude and time.
da = da.pint.quantify().pint.to('C')
da_time_mean = da.cf.mean('time')
plt.figure(figsize=(12, 6))
ax = plt.axes(projection=ccrs.Robinson())
da_time_mean = da_time_mean.pint.dequantify()
da_time_mean.cf.plot(ax=ax,
x='longitude', y='latitude',
transform=ccrs.PlateCarree(),
vmin=vmin, vmax=vmax, extend='both',
cmap=cm.cm.thermal)
ax.coastlines()
By specifying default values for the arguments it is completely backwards compatible, we have lost no functionality, but the ice air temperature can now be plotted with a range that better suits the range of the data
[31]:
map_mean_temp_in_degrees_celsius(ice_air_temp, vmin=-30)
Conclusion¶
Model-specific code to take a time mean and plot the data was converted to a model agnostic function with no loss of functionality.
The same function can be used with a wide range CF-compliant data.
In some cases the input data will need to be modified if it is not sufficiently compliant, or non-conforming in some way (such as the ice data above with NaN’s in the coordinate). It is better to modify the data to be more compliant and higher quality, and use generic tools, than have multiple code versions to account for the vagaries or problems of individual datasets.
Ideally, those improvements can be incorporated into future versions of the data at source, in post-processing, or in some utility functions for transforming a class of non-conforming data.
[ ]:
client.close()