Sea Ice Comparisons to Observations¶

This script shows how to load and plot sea ice concentration from CICE output and compare it to the NSIDC CDR (National Snow and Ice Data Centre, Climate Data Record) dataset

This notebook uses the ACCESS-NRI Intake Catalog following the examples in Tutorials/Using the Intake Catalog.

Requirements: The runs analysed here are only in access-nri-intake-catalog version 0.1.0 or newer. This is included in conda/analysis3-23.10 or newer modules from /g/data/hh5/public/modules.

OM2 Experiments:

These are the ACCESS-OM2 runs we are going to use, we can compare results from prototype runs forced with ERA5 against normal runs using JRA55do, as described on the ACCESS_HIVE. To compare against the observational datasets, we use IAF (Inter-Annual Forcing). N.B. The JRA55do runs used here a slightly different to the typical (e.g. 025deg_jra55_iaf_omip2_cycle6) in the model version used and the timeframes evaluated.

[1]:

RUNS = {
    "025deg_era5": ["025deg_era5_iaf"],  # (our name: run name(s))
    "025deg_jra55": ["025deg_jra55_iaf_era5comparison"],
    "1deg_era5": ["1deg_era5_iaf"],
    "1deg_jra55": ["1deg_jra55_iaf_era5comparison"],
}

We are going to look at Sea Ice Concentration and Sea Ice Volume

[2]:

VARS = ["aice_m", "hi_m" ]  # ice area fraction or sea ice concentration, ice thickness averaged by grid cell area
VARS_2D = ["area_t", "geolat_t", "geolon_t"]

Observational Data:

Sea Ice concentration is measured through passive microwave remote sensing. We are going to use the NSIDC CDR Dataset (described at nsidc.org)

[3]:

OBS_TIME_SLICE = slice("1979", "2022")
sh_obs_url = "https://polarwatch.noaa.gov/erddap/griddap/nsidcG02202v4shmday"
nh_obs_url = "https://polarwatch.noaa.gov/erddap/griddap/nsidcG02202v4nhmday"

Load depenencies:

[4]:

from intake import cat
from datatree import DataTree, map_over_subtree

from dask.distributed import Client

import xarray as xr
import numpy as np
from datetime import timedelta
import cf_xarray as cfxr
import xesmf

# plotting
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cmocean.cm as cmo
import matplotlib.lines as mlines

A standard way to calculate climatologies. (We start in 1991 as earlier decades are influenced by model spin-up for 0.25deg runs which only start in 1980.)

[5]:

CLIMAT_TIME_SLICE = slice("1991", "2020")


def climatology(ds):
    return ds.sel(time=CLIMAT_TIME_SLICE).groupby("time.month").mean("time")

Start a dask client

[6]:

client = Client(threads_per_worker=1)

[7]:

client

[7]:

Client

Client-82739af4-af4f-11ee-a8ab-00000087fe80

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: /proxy/8787/status

Cluster Info

LocalCluster

368ce898

Dashboard: /proxy/8787/status	Workers: 12
Total threads: 12	Total memory: 46.00 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-5e767eb4-9bb1-4909-9836-f71918916db3

Comm: tcp://127.0.0.1:36437	Workers: 12
Dashboard: /proxy/8787/status	Total threads: 12
Started: Just now	Total memory: 46.00 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:34807	Total threads: 1
Dashboard: /proxy/45181/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:35097
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-hss2qp4d

Worker: 1

Comm: tcp://127.0.0.1:35411	Total threads: 1
Dashboard: /proxy/44133/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:35507
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-jfi7ibr0

Worker: 2

Comm: tcp://127.0.0.1:45509	Total threads: 1
Dashboard: /proxy/44183/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:32881
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-wteqer_y

Worker: 3

Comm: tcp://127.0.0.1:45305	Total threads: 1
Dashboard: /proxy/43281/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:42787
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-qmj97pzg

Worker: 4

Comm: tcp://127.0.0.1:32899	Total threads: 1
Dashboard: /proxy/35169/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:37179
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-9_dlevac

Worker: 5

Comm: tcp://127.0.0.1:46269	Total threads: 1
Dashboard: /proxy/43181/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:46061
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-pz5meqx9

Worker: 6

Comm: tcp://127.0.0.1:37447	Total threads: 1
Dashboard: /proxy/39947/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:44547
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-9xxdg58l

Worker: 7

Comm: tcp://127.0.0.1:40031	Total threads: 1
Dashboard: /proxy/35517/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:43751
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-1zozkmca

Worker: 8

Comm: tcp://127.0.0.1:45655	Total threads: 1
Dashboard: /proxy/36103/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:40749
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-9jire7j6

Worker: 9

Comm: tcp://127.0.0.1:35597	Total threads: 1
Dashboard: /proxy/33815/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:41195
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-u_kzqqfh

Worker: 10

Comm: tcp://127.0.0.1:43801	Total threads: 1
Dashboard: /proxy/41293/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:45225
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-qa5n7tm3

Worker: 11

Comm: tcp://127.0.0.1:45909	Total threads: 1
Dashboard: /proxy/34123/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:43379
Local directory: /jobfs/105828095.gadi-pbs/dask-scratch-space/worker-nxj030bj

Open the ACCESS-NRI Intake Catalog

[8]:

catalog = cat.access_nri

Load the ACCESS-OM2 results¶

For CICE data in OM2, we need to do some wrangling to make it easier to deal with. This is described in more detail in DocumentedExamples/SeaIce_Plot_Example. Its included in this function:

[9]:

def open_by_name(name, vars):
    """Return a dataset for the requested name and vars"""
    return (
        catalog[name]
        .search(variable=vars)
        .to_dask(
            xarray_open_kwargs={
                "chunks": {"time": "auto", "ni": -1, "nj": -1},
                "decode_coords": False,
            },
            xarray_combine_by_coords_kwargs={
                "compat": "override",
                "data_vars": "minimal",
                "coords": "minimal",
            },
        )
    )

[10]:

def open_by_experiment(exp_name, vars):
    """Concatenate any datasets provided for this experiment into one ds, and add area and geo coordinates"""

    # get the data for each run of this config
    cice_ds = xr.concat(
        [open_by_name(iName, vars) for iName in RUNS[exp_name]], dim="time"
    )

    # We also want the area/lat/lon fields, but these are not time dependent.
    area_ds = xr.merge(
        [
            xr.open_dataset(
                catalog[RUNS[exp_name][0]]
                .search(variable=iVar)
                .df.path[0]
                # path of the first file with the area field, the geolon field and the geolat field
            ).drop_vars("time")
            for iVar in VARS_2D
        ]
    )

    # Label the lats and lons
    cice_ds.coords["ni"] = area_ds["xt_ocean"].values
    cice_ds.coords["nj"] = area_ds["yt_ocean"].values

    # Copy attributes for cf compliance
    cice_ds.ni.attrs = area_ds.xt_ocean.attrs
    cice_ds.nj.attrs = area_ds.yt_ocean.attrs

    cice_ds = cice_ds.rename(({"ni": "xt_ocean", "nj": "yt_ocean"}))

    # Add the geolon, geolat, and area as extra co-ordinates fields from area_t

    cice_ds = cice_ds.assign_coords(
        {
            "geolat_t": area_ds.geolat_t,
            "geolon_t": area_ds.geolon_t,
            "area_t": area_ds.area_t,
        }
    )

    # cice timestamps are also misleading:
    cice_ds["time"] = cice_ds.time.to_pandas() - timedelta(minutes=1)

    return cice_ds

Because the dimensions are different for different experiments, they would not fit in a Dataset, a DataTree is required. The DataTree has a group for each experiment, which contains a xarray dataset with the data for that experiment.

[11]:

%%time

si_name_ds_pairs = [(iRun, open_by_experiment(iRun, VARS)) for iRun in RUNS.keys()]

si_dt = DataTree.from_dict(dict(si_name_ds_pairs))

0.3.0
CPU times: user 35.8 s, sys: 3.29 s, total: 39 s
Wall time: 1min 27s

[12]:

@map_over_subtree
def match_timestamps_to_CDR(ds):
    cice_ds = ds.copy()

    # we are going to use the same timestamps as NSIDC
    cice_ds["time"] = [
        np.datetime64(str(i)[0:7] + "-01T00:00:00.000000000")
        for i in cice_ds.time.values
    ]

    return cice_ds

[13]:

si_dt = match_timestamps_to_CDR(si_dt)

The result is a datatree, with a dataset for each experiment and timestamps which align with the observational timestamps

[14]:

si_dt

[14]:

<xarray.DatasetView>
Dimensions:  ()
Data variables:
    *empty*

Load the observational dataset¶

The CDR dataset has the area of each grid cell provided as a seperate file, which we need to load

[15]:

def open_cdr_dataset(path, area_file):
    ds = xr.open_dataset(path).rename(
        {'cdr_seaice_conc_monthly': 'cdr_conc', 'xgrid':'x','ygrid':'y'}
    )

    # # we also need the area of each gridcell
    areasNd = np.fromfile(area_file, dtype=np.int32).reshape(
        ds.cdr_conc.isel(time=0).shape
    )
    # # Divide by 1000 to get km2 (https://web.archive.org/web/20170817210544/http://nsidc.org/data/polar-stereo/tools_geo_pixel.html#pixel_area)
    areasKmNd_sh = areasNd / 1000

    ds["area"] = xr.DataArray(areasKmNd_sh, dims=["y", "x"])
    ds = ds.set_coords("area")

    ds = ds.cdr_conc
    ds = ds.where(ds<=1)  # convert error codes to Nan

    return ds

We can pull the datasets direct from the url, however the cell area needs to be downloaded seperately:

[16]:

!wget --ftp-user=anonymous -nc ftp://sidads.colorado.edu/DATASETS/seaice/polar-stereo/tools/pss25area_v3.dat ftp://sidads.colorado.edu/DATASETS/seaice/polar-stereo/tools/psn25area_v3.dat

File ‘pss25area_v3.dat’ already there; not retrieving.
File ‘psn25area_v3.dat’ already there; not retrieving.

We are interested in the Antarctic, but the lines for the Arctic are below and commented out.

[17]:

sh_cdr_xr = open_cdr_dataset(sh_obs_url, "pss25area_v3.dat")

# nh_cdr_xr = open_cdr_dataset(
#     nh_obs_url,
#     'psn25area_v3.dat'
# )

cdr_dt = DataTree.from_dict(
    {
        "cdr_sh": sh_cdr_xr,
        # 'cdr_nh':nh_cdr_xr
    }
)

The result is a datatree, with a datasets for the relevant hemisphere

Calculate Sea Ice Area¶

Sea ice area is the circumpolar sum of sea ice concentration multiplied by the area of each grid cell. By convention, and because lower concentrations are not accurate when measured through remote sensing, concentrations below 0.15 are not included

[18]:

def sea_ice_area(sic, area, range=[0.15, 1]):
    return (sic * area).where((sic >= range[0]) * (sic <= range[1])).cf.sum(["X", "Y"])

Calculate for observational data, and remove gaps with missing data

[19]:

@map_over_subtree
def sea_ice_area_obs(ds):
    sic = ds.cdr_conc
    result = sea_ice_area(sic, sic.area).to_dataset(name="cdr_area")

    # Theres a couple of data gaps which should be nan
    result.loc[{"time": "1988-01-01"}] = np.nan
    result.loc[{"time": "1987-12"}] = np.nan

    return result.sel(time=OBS_TIME_SLICE)

[20]:

obs_area_dt = sea_ice_area_obs(cdr_dt)

[21]:

# Theres another gap which should be nan in the arctic only
# obs_area_dt['cdr_nh'].to_dataset().loc[{'time':'1984-07'}]=np.nan

Calculate for model data, limit to southern hemisphere / Antarctica

[22]:

@map_over_subtree
def sea_ice_area_model_sh(ds):
    sic = ds.aice_m.cf.sel(Y=slice(-90, 0))
    area_km2 = ds.area_t / 1e6

    return sea_ice_area(sic, area_km2).to_dataset(name="si_area").load()

[23]:

model_area_dt = sea_ice_area_model_sh(si_dt)

Sea Ice Area Trends¶

We are going to compare the trends in the minima and maxima over time, and the monthly climatology

[24]:

@map_over_subtree
def min_and_max(ds):
    def min_and_max_year(da):
        result = xr.Dataset()
        result["min"] = da.min()
        result["max"] = da.max()
        return result

    annual_min_max_ds = ds.si_area.groupby("time.year").apply(min_and_max_year)
    return annual_min_max_ds


model_min_max_dt = min_and_max(model_area_dt)

[25]:

for iGroup in model_min_max_dt.groups[1:]:
    ds = model_min_max_dt[iGroup].ds
    ds["min"].plot(label=iGroup[1:])

obs_area_dt["cdr_sh"].ds.cdr_area.groupby("time.year").min().plot(label="CDR")

plt.title("Trends in Sea-Ice Minima")
plt.ylabel("Sea-Ice Area ($km^2$)")

_ = plt.legend()

../_images/DocumentedExamples_SeaIce_Obs_Model_Compare_49_0.png

We see that all models have sea ice area which is too low in summer. Model runs forced by JRA have more variability than those forced by ERA5 and are slightly closer to the measured sea ice area from observations.

[26]:

for iGroup in model_min_max_dt.groups[1:]:
    ds = model_min_max_dt[iGroup].ds.sel(
        year=slice(1950, 2022)
    )  # drop 2023 because its incomplete
    ds["max"].plot(label=iGroup[1:])

obs_area_dt["cdr_sh"].ds.cdr_area.groupby("time.year").max().plot(label="CDR")

plt.title("Trends in Sea-Ice Maxima")
plt.ylabel("Sea-Ice Area ($km^2$)")

_ = plt.legend()

../_images/DocumentedExamples_SeaIce_Obs_Model_Compare_51_0.png

We see that the closest runs are those forced by ERA5, and the runs forced by JRA55 are under representing Winter sea ice.

[27]:

for iGroup in model_area_dt.groups[1:]:
    climatology(model_area_dt[iGroup].ds.si_area).plot(label=iGroup[1:])

climatology(obs_area_dt["cdr_sh"].ds.cdr_area).plot(label="CDR")

plt.title("Climatology of Sea-Ice Area")
plt.ylabel("Sea-Ice Area ($km^2$)")

plt.legend()

[27]:

<matplotlib.legend.Legend at 0x14e242887850>

../_images/DocumentedExamples_SeaIce_Obs_Model_Compare_53_1.png

We see all model runs have too low sea ice in summer, and grow faster than observations in autumn and earlier that the observed maximum.

Sea Ice Concentration Anomalies¶

To examine the differences between the model results and observations, we calculate difference in each grid cell between observations and each experiment

As that data are on different grids, we need to regrid to compare the datasets

Lets simplify a little to only look at 0.25 degree results

[28]:

groups = ("/025deg_era5", "/025deg_jra55")

The lat/lon of of each cell in the observational dataset are in a different file:

[29]:

! wget -nc https://noaadata.apps.nsidc.org/NOAA/G02202_V4/ancillary/G02202-cdr-ancillary-sh.nc

File ‘G02202-cdr-ancillary-sh.nc’ already there; not retrieving.

[30]:

cdr_sps_ds = xr.open_dataset("G02202-cdr-ancillary-sh.nc")

We can now build the re-gridder. This is described in detail in DocumentedExamples/Regridding

[31]:

regridder_ACCESSOM2_025deg_sh = xesmf.Regridder(
    si_dt["025deg_era5"].ds.isel(time=0).drop_vars(["xt_ocean", "yt_ocean"]),
    cdr_sps_ds,
    "bilinear",
    periodic=True,
    unmapped_to_nan=True,
)

[32]:

aice_sh_3976_ds = xr.Dataset()
aice_sh_anom_ds = xr.Dataset()

for iG in groups:
    aice_sh_3976_ds[iG] = regridder_ACCESSOM2_025deg_sh(
        si_dt[iG].ds.aice_m.copy()
    )
    aice_sh_anom_ds[iG] = aice_sh_3976_ds[iG] - cdr_dt["cdr_sh"].ds.cdr_conc

We can now plot the difference between modelled and observed sea ice concentration

[33]:

months = [2, 9]  # february, september
month_names = ["Feb", "Sep"]
n_months = len(months)

plt.figure(figsize=(9, n_months * 3))

cdr = climatology(cdr_dt["cdr_sh"].ds.cdr_conc)

for j, iGroup in enumerate(aice_sh_anom_ds.data_vars):
    anoms = climatology(aice_sh_anom_ds[iGroup])
    aice = climatology(aice_sh_3976_ds[iGroup])

    for i, iMonth in enumerate(months):
        plt.subplot(
            n_months,
            3,
            j + 1 + i * 3,
            projection=ccrs.SouthPolarStereo(true_scale_latitude=-70),
        )

        # Filled contours with concentration anomalies in this month
        ds = anoms.sel(month=iMonth).compute()
        plt.contourf(
            ds.x, ds.y, ds, levels=np.arange(-0.85, 0.86, 0.1), cmap=cmo.balance_r
        )

        # Lines at 15% concentration (approx ice edge)
        cs_cdr = cdr.sel(month=iMonth).plot.contour(levels=[0.15])
        cs_mod = aice.sel(month=iMonth).plot.contour(levels=[0.15], colors=["black"])

        plt.title(month_names[i] + "_" + iGroup[1:])

# Messy legend creation
color_cdr = cs_cdr.collections[0].get_edgecolor()
line_cdr = mlines.Line2D([], [], color=color_cdr, label="Observed Extent")

color_mod = cs_mod.collections[0].get_edgecolor()
line_mod = mlines.Line2D([], [], color=color_mod, label="Modelled Extent")

plt.legend(handles=[line_cdr, line_mod], loc="center left", bbox_to_anchor=(1.2, 0.5))

# And colorbar
cax = plt.axes([0.7, 0.6, 0.05, 0.2])
_ = plt.colorbar(cax=cax, label="Difference in \nSea Ice Concentration")

/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 217.46 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 217.46 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 217.46 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 217.46 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(

../_images/DocumentedExamples_SeaIce_Obs_Model_Compare_65_1.png

We see that OM2 under-represents sea ice in Summer, particularly in the Weddell Sea. In Winter, trends are less clear, although ERA5 forced sea ice concentration is too high at the northern boundary.

Sea Ice Volume¶

We can calculate volume in much the same way as area, except vicen is volume (per unit area) in each of 5 thickness classes, so we can sum this for the volume irrespective of thickness.

[34]:

@map_over_subtree
def sea_ice_vol_model_sh(ds):
    vice_m = ds.hi_m.sel(yt_ocean=slice(-90, 0))
    siv_total_da = (
        (vice_m * ds.area_t)
        .where(~np.isnan(ds.area_t))
        .cf.sum(["longitude", "latitude"])
    )

    return siv_total_da.to_dataset(name="si_vol").load()

[35]:

si_vol_dt = sea_ice_vol_model_sh(si_dt)

[36]:

@map_over_subtree
def min_and_max(ds):
    def min_and_max_year(da):
        result = xr.Dataset()
        result["min"] = da.min()
        result["max"] = da.max()
        return result

    annual_min_max_ds = ds.si_vol.groupby("time.year").apply(min_and_max_year)
    return annual_min_max_ds


model_min_max_dt = min_and_max(si_vol_dt)

[37]:

for iGroup in model_min_max_dt.groups[1:]:
    ds = model_min_max_dt[iGroup].ds
    ds["min"].plot(label=iGroup[1:])

plt.title("Trends in Sea-Ice Minimum Volume")
plt.ylabel("Sea-Ice Volume ($m^3$)")

_ = plt.legend()

../_images/DocumentedExamples_SeaIce_Obs_Model_Compare_72_0.png

[38]:

for iGroup in model_min_max_dt.groups[1:]:
    ds = model_min_max_dt[iGroup].ds.sel(
        year=slice(1950, 2022)
    )  # drop the last year because its incomplete
    ds["max"].plot(label=iGroup[1:])

plt.title("Trends in Sea-Ice Maximum Volume")
plt.ylabel("Sea-Ice Volume ($m^3$)")

_ = plt.legend()

../_images/DocumentedExamples_SeaIce_Obs_Model_Compare_73_0.png

[39]:

for iGroup in si_vol_dt.groups[1:]:
    climatology(si_vol_dt[iGroup].ds.si_vol).plot(label=iGroup[1:])

plt.title("Climatology of Sea-Ice Volume")
plt.ylabel("Sea-Ice Volume ($m^3$)")

_ = plt.legend()

../_images/DocumentedExamples_SeaIce_Obs_Model_Compare_74_0.png

[40]:

client.close()

0.3.0
0.3.0
0.3.0
0.3.0
0.3.0
0.3.0
0.3.0
0.3.0
0.3.0
0.3.0
0.3.0
0.3.0

[ ]: