This tutorial demonstrates best practice to vectorise functions that want to be applied across all grid points.
When you need to compute something separately at many gridpoints, especially if it is fast at a single gridpoint, putting this computation into a for loop can be very slow. Instead, it is prefereable to vectorise a function, so that the numpy and/or dask backend can distribute the work across multiple cores.
That is, to find the mean at each location, with an xarray dataset with dimensions lon, lat, and time instead of
which xarray inteprets as to take the mean in the time dimension for every gridpoint.
Some functions, such as scipy.stats.linregress, do not have in-build vectorisation, but you might want to apply a function like this to every gridpoint, and for loops would be slow.
This tutorial provides a few examples of how to apply functions which do not natively vectorise many times to an xarray dataset, vectorised so that a dask client can speed up the calculation. We answer here a dummy question “What is sea-surface temperature trend at each gridpoint of an ocean model, and is it significant?”. Scientifically, this question mostly applies to the forcing dataset and not the ocean model, but it’s as good an example as any.
To achieve this goal, we use xarray.apply_ufunc, which is very versatile, but therefore takes many arguments that can be difficult to interpret at first glance. The aim of the example below is to give something that will work on a problem similar to what COSIMA users may encounter.
bottom pressure on T cells [Boussinesq (volume conserving) model],applied pressure on T cells,t-cell rho*thickness,t-cell thickness,effective sea level (eta_t + patm/(rho0*g)) on T cells,square of effective sea level (eta_t + patm/(rho0*g)) on T cells,Potential temperature,Conservative temperature,Potential temperature,squared Potential temperature,Conservative temperature,Practical Salinity,Practical Salinity,squared Practical Salinity,Practical Salinity,Age (global),mixed layer depth determined by density criteria,mixed layer depth determined by density criteria,mixed layer depth determined by density criteria,squared mixed layer depth determined by density criteria,quasi-barotropic strmfcn psiu (compatible with tx_trans),quasi-barotropic strmfcn psiv (compatible with ty_trans),buoy freq at T-cell centre for use in neutral physics,Squared buoyancy frequency at T-cell bottom,T-cell boundary layer depth from KPP,potential density referenced to 0 dbar,potential density referenced to 2000 dbar,in situ density,surface height on T cells [Boussinesq (volume conserving) model],i-current,j-current,dia-surface velocity T-points,T-cell i-mass transport,T-cell j-mass transport,T-cell k-mass transport,T-cell mass i-transport from GM,T-cell mass j-transport from GM,T-cell mass i-transport from submesoscale param,T-cell mass j-transport from submesoscale param,T-cell i-mass transport on pot_rho,T-cell j-mass transport on pot_rho,T-cell i-mass transport from GM on pot_rho,T-cell j-mass transport from GM on pot_rho,T-cell i-mass transport from submesoscale param on neutral rho,T-cell j-mass transport from submesoscale param on neutral rho,T-cell i-mass transport vertically summed,T-cell j-mass transport vertically summed,z-integral of cp*rho*dyt*u*temp,z-integral of cp*rho*dxt*v*temp,z-integral cp*gm_yflux*dyt*rho_dzt*temp,z-integral cp*gm_xflux*dyt*rho_dzt*temp,z-integral cp*ndiffuse_xflux*dyt*rho_dzt*temp,z-integral cp*ndiffuse_yflux*dxt*rho_dzt*temp,z-integral cp*submeso_yflux*dxt*rho_dzt*temp,z-integral cp*submeso_xflux*dyt*rho_dzt*temp,liquid precip (including ice melt/form) into ocean (>0 enters ocean),snow falling onto ocean (>0 enters ocean),mass flux from evaporation/condensation (>0 enters ocean),mass flux of liquid river runoff entering ocean,water flux transferred with sea ice form/melt (>0 enters ocean),mass flux of precip-evap+river via sbc (liquid, frozen, evaporation),water into ocean due to ice melt (>0 enters ocean),water out of ocean due to ice form (>0 enters ocean),precip-evap into ocean (total w/ restore + normalize),sfc_salt_flux_ice,sfc_salt_flux_runoff,sfc_salt_flux_restore: flux from restoring term,sfc_salt_flux_coupler: flux from the coupler,heat flux from precip transfer of water across ocean surface,heat flux from evap transfer of water across ocean surface,heat flux (relative to 0C) from liquid river runoff,heat flux to melt frozen precip (<0 cools ocean),Vertical sum of ocn frazil heat flux over time step,longwave flux into ocean (<0 cools ocean),latent heat flux into ocean (<0 cools ocean),sensible heat into ocean (<0 cools ocean),shortwave flux into ocean (>0 heats ocean),penetrative shortwave heating,heat into ocean due to melting ice (>0 heats ocean),heat into ocean due to land ice discharge-melt (>0 heats ocean),surface ocean heat flux coming through coupler and mass transfer,cp*rivermix*rho_dzt*temp,surface heat flux coming through coupler,heat flux (relative to 0C) from pme transfer of water across ocean surface,i-directed wind stress forcing u-velocity,j-directed wind stress forcing v-velocity,Bottom u-stress via bottom drag,Bottom v-stress via bottom drag,vertical piece of Ertel PV: (f+zeta)*N^2,i-current,j-current,Thickness and rho wghtd horz bih frict on u-zonal,Thickness and rho wghtd horz bih frict on v-merid,3d velocity dot product with 3d gradient of vertical piece of Ertel PV: u.grad((f+zeta)*N^2),Ekman vertical velocity averaged to wt-point,surface height including steric contribution,Potential temperature,Potential temperature,Start time for average period,End time for average period,Length of average period,time axis boundaries
dbar,Pa,(kg/m^3)*m,m,meter,m^2,K,K,K,squared K,deg_C,psu,psu,squared psu,psu,yr,m,m,m,m^2,kg/s,kg/s,1/s,1/s^2,m,kg/m^3,kg/m^3,kg/m^3,meter,m/sec,m/sec,m/sec,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,Watts,Watts,Watt,Watt,Watt,Watt,Watt,Watt,(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),kg/(m^2*sec),kg/(m^2*sec),kg/(m^2*sec),kg/(m^2*sec),Watts/m^2,Watts/m^2,Watts/m^2,W/m^2,W/m^2,W/m^2,W/m^2,W/m^2,W/m^2,W/m^2,(W/m^2),(W/m^2),Watts/m^2,Watt/m^2,Watts/m^2,Watts/m^2,N/m^2,N/m^2,N/m^2,N/m^2,1/sec^3,m/sec,m/sec,(kg/m^3)*(m^2/s^2),(kg/m^3)*(m^2/s^2),1/sec^4,m/s,meter,K,K,days since 0001-01-01 00:00:00,days since 0001-01-01 00:00:00,days,days
intake_esm_attrs:filename :
ocean_month.nc
intake_esm_attrs:file_id :
ocean_month
intake_esm_attrs:_data_format_ :
netcdf
intake_esm_dataset_key :
ocean_month.1mon
Rechunk so that there is only one chunk in the time dimension that will be used the linear regression.
bottom pressure on T cells [Boussinesq (volume conserving) model],applied pressure on T cells,t-cell rho*thickness,t-cell thickness,effective sea level (eta_t + patm/(rho0*g)) on T cells,square of effective sea level (eta_t + patm/(rho0*g)) on T cells,Potential temperature,Conservative temperature,Potential temperature,squared Potential temperature,Conservative temperature,Practical Salinity,Practical Salinity,squared Practical Salinity,Practical Salinity,Age (global),mixed layer depth determined by density criteria,mixed layer depth determined by density criteria,mixed layer depth determined by density criteria,squared mixed layer depth determined by density criteria,quasi-barotropic strmfcn psiu (compatible with tx_trans),quasi-barotropic strmfcn psiv (compatible with ty_trans),buoy freq at T-cell centre for use in neutral physics,Squared buoyancy frequency at T-cell bottom,T-cell boundary layer depth from KPP,potential density referenced to 0 dbar,potential density referenced to 2000 dbar,in situ density,surface height on T cells [Boussinesq (volume conserving) model],i-current,j-current,dia-surface velocity T-points,T-cell i-mass transport,T-cell j-mass transport,T-cell k-mass transport,T-cell mass i-transport from GM,T-cell mass j-transport from GM,T-cell mass i-transport from submesoscale param,T-cell mass j-transport from submesoscale param,T-cell i-mass transport on pot_rho,T-cell j-mass transport on pot_rho,T-cell i-mass transport from GM on pot_rho,T-cell j-mass transport from GM on pot_rho,T-cell i-mass transport from submesoscale param on neutral rho,T-cell j-mass transport from submesoscale param on neutral rho,T-cell i-mass transport vertically summed,T-cell j-mass transport vertically summed,z-integral of cp*rho*dyt*u*temp,z-integral of cp*rho*dxt*v*temp,z-integral cp*gm_yflux*dyt*rho_dzt*temp,z-integral cp*gm_xflux*dyt*rho_dzt*temp,z-integral cp*ndiffuse_xflux*dyt*rho_dzt*temp,z-integral cp*ndiffuse_yflux*dxt*rho_dzt*temp,z-integral cp*submeso_yflux*dxt*rho_dzt*temp,z-integral cp*submeso_xflux*dyt*rho_dzt*temp,liquid precip (including ice melt/form) into ocean (>0 enters ocean),snow falling onto ocean (>0 enters ocean),mass flux from evaporation/condensation (>0 enters ocean),mass flux of liquid river runoff entering ocean,water flux transferred with sea ice form/melt (>0 enters ocean),mass flux of precip-evap+river via sbc (liquid, frozen, evaporation),water into ocean due to ice melt (>0 enters ocean),water out of ocean due to ice form (>0 enters ocean),precip-evap into ocean (total w/ restore + normalize),sfc_salt_flux_ice,sfc_salt_flux_runoff,sfc_salt_flux_restore: flux from restoring term,sfc_salt_flux_coupler: flux from the coupler,heat flux from precip transfer of water across ocean surface,heat flux from evap transfer of water across ocean surface,heat flux (relative to 0C) from liquid river runoff,heat flux to melt frozen precip (<0 cools ocean),Vertical sum of ocn frazil heat flux over time step,longwave flux into ocean (<0 cools ocean),latent heat flux into ocean (<0 cools ocean),sensible heat into ocean (<0 cools ocean),shortwave flux into ocean (>0 heats ocean),penetrative shortwave heating,heat into ocean due to melting ice (>0 heats ocean),heat into ocean due to land ice discharge-melt (>0 heats ocean),surface ocean heat flux coming through coupler and mass transfer,cp*rivermix*rho_dzt*temp,surface heat flux coming through coupler,heat flux (relative to 0C) from pme transfer of water across ocean surface,i-directed wind stress forcing u-velocity,j-directed wind stress forcing v-velocity,Bottom u-stress via bottom drag,Bottom v-stress via bottom drag,vertical piece of Ertel PV: (f+zeta)*N^2,i-current,j-current,Thickness and rho wghtd horz bih frict on u-zonal,Thickness and rho wghtd horz bih frict on v-merid,3d velocity dot product with 3d gradient of vertical piece of Ertel PV: u.grad((f+zeta)*N^2),Ekman vertical velocity averaged to wt-point,surface height including steric contribution,Potential temperature,Potential temperature,Start time for average period,End time for average period,Length of average period,time axis boundaries
dbar,Pa,(kg/m^3)*m,m,meter,m^2,K,K,K,squared K,deg_C,psu,psu,squared psu,psu,yr,m,m,m,m^2,kg/s,kg/s,1/s,1/s^2,m,kg/m^3,kg/m^3,kg/m^3,meter,m/sec,m/sec,m/sec,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,kg/s,Watts,Watts,Watt,Watt,Watt,Watt,Watt,Watt,(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),(kg/m^3)*(m/sec),kg/(m^2*sec),kg/(m^2*sec),kg/(m^2*sec),kg/(m^2*sec),Watts/m^2,Watts/m^2,Watts/m^2,W/m^2,W/m^2,W/m^2,W/m^2,W/m^2,W/m^2,W/m^2,(W/m^2),(W/m^2),Watts/m^2,Watt/m^2,Watts/m^2,Watts/m^2,N/m^2,N/m^2,N/m^2,N/m^2,1/sec^3,m/sec,m/sec,(kg/m^3)*(m^2/s^2),(kg/m^3)*(m^2/s^2),1/sec^4,m/s,meter,K,K,days since 0001-01-01 00:00:00,days since 0001-01-01 00:00:00,days,days
intake_esm_attrs:filename :
ocean_month.nc
intake_esm_attrs:file_id :
ocean_month
intake_esm_attrs:_data_format_ :
netcdf
intake_esm_dataset_key :
ocean_month.1mon
Define a function that takes the data we have and returns what we want.
[5]:
defget_trend(time,timeseries):'''Calculate the trend through a timeseries using scipy.stats.linregress, and return just the slope and the p value as an array, for the purposes of demonstrating xarray.apply_ufunc Inputs: time: np.ndarray the times or x values of whatever the slope will go through timeseries: np.ndarray the data to calculate the slope of Outputs: stats: np.ndarray 1st element is the trend in timeseries 2nd element is the p value of this trend, indicating the significance They're lumped together into one variable, a) so .load() can be called on both at once, and b) to demonstrate some of the nuance in xr.apply_ufunc when using it for more complicated applications '''slope,intercept,r,p,se=scipy.stats.linregress(time,timeseries)returnnp.array((slope,p))# Combine into one array because it's easier to load in one go
Define a timeseries for the linear regression to work (because scipy doesn’t like datetimes).
Note: This line is specific to the function being applied - in this instance, we want to apply scipy.stats.linregress, and it needs a timeseries of x values so we make it one.
[6]:
years_since_start=xr.DataArray(np.arange(sst.time.shape[0])/12,dims=('time',),coords={'time':sst.time})# Pass data through to the `xarray.apply_ufunc`stats=xr.apply_ufunc(get_trend,# function being usedyears_since_start,# Argument 1 for functionsst["sst"],# Argument 2 for functioninput_core_dims=(('time',),('time',)),# Dimensions the function needs for each argumentoutput_core_dims=(('stat_type',),),# Dimensions of each output from the functiondask_gufunc_kwargs={"output_sizes":{'stat_type':2},# The new dimension will have size 2},vectorize=True,# The function needs to only have one lat and lon at a timedask='parallelized',# Dask is fine, but the function can't handle it so apply_ufunc needs to)stats
Plot the calculated slope, stippling all regions that are significant at \(p<0.05\). Before we plot we need to load the unmasked coordinates of geographic latitude and longitude and attach them to the dataset. If we don’t use these cooridings the regions near the poles are distorted (see the Making_Maps_with_Cartopy tutorial).