Make Your Own Database
The cosima-cookbook uses a database to access information about experiments and to help with loading model output. We maintain a default database for ACCESS-OM2 experiments, but there are occasions when you might want to make your own. This tutorial outlines the process of building your own private database.
Requirements: We recommend that you use the conda/analysis3-21.04 (or later) kernel on NCI (or your own up-to-date cosima-cookbook installation).
NOTE: For projects using ACCESS data, we recommend that you create your own intake-esm datastore instead of using the cookbook. The cookbook will be deprecated soon and this tutorial removed.
[1]:
import cosima_cookbook as cc
First, create a database session using the inbuilt create_session function. To do this, you need to specify a path for the database; choose a location where you have write permission (that is, not necessarily the example given here):
[2]:
db = 'local_cc_test.db'
session = cc.database.create_session(db)
Note that you need to create the database session every time you start up your notebook; you can then update this database however many times you like.
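For example, at the start of a later notebook session you might reconnect to the same database (a minimal sketch; use the path you chose above):
[ ]:
import cosima_cookbook as cc

# Re-open a session on the existing database file; the index it holds
# persists between notebook sessions
session = cc.database.create_session('local_cc_test.db')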
Now you are ready to build a database. First, select which experiments you want to include in your database. For these purposes, an experiment is a directory containing output from a single simulation. (If you use a higher level directory you won’t be able to distinguish between experiments.)
The example below constructs a list of two experiment directories; we have chosen two cases with different resolutions. The database will be built to index all netCDF files in each directory.
[3]:
directory_list=['/g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle6',
'/g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6'
]
cc.database.build_index(directory_list, session)
Indexing experiment: 1deg_jra55_iaf_omip2_cycle6
100%|██████████| 4376/4376 [05:15<00:00, 13.85it/s]
Indexing experiment: 025deg_jra55_iaf_omip2_cycle6
1%|▏ | 28/2174 [00:09<11:07, 3.21it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart354/ice/kmt.nc: 'nEdits'
10%|█ | 226/2174 [01:12<08:36, 3.77it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart329/ice/kmt.nc: 'nEdits'
18%|█▊ | 394/2174 [02:11<12:56, 2.29it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart309/ice/kmt.nc: 'nEdits'
38%|███▊ | 835/2174 [04:32<06:39, 3.35it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart314/ice/kmt.nc: 'nEdits'
49%|████▊ | 1056/2174 [05:50<07:46, 2.40it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart344/ice/kmt.nc: 'nEdits'
49%|████▉ | 1064/2174 [05:52<05:05, 3.63it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart304/ice/kmt.nc: 'nEdits'
51%|█████ | 1114/2174 [06:08<05:18, 3.33it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart319/ice/kmt.nc: 'nEdits'
53%|█████▎ | 1151/2174 [06:21<05:02, 3.38it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart339/ice/kmt.nc: 'nEdits'
61%|██████ | 1331/2174 [07:18<04:36, 3.05it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart334/ice/kmt.nc: 'nEdits'
62%|██████▏ | 1346/2174 [07:21<03:24, 4.05it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart359/ice/kmt.nc: 'nEdits'
63%|██████▎ | 1375/2174 [07:31<04:19, 3.07it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart324/ice/kmt.nc: 'nEdits'
80%|████████ | 1749/2174 [09:34<01:54, 3.70it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart365/ice/kmt.nc: 'nEdits'
96%|█████████▌| 2081/2174 [11:22<00:22, 4.17it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart349/ice/kmt.nc: 'nEdits'
97%|█████████▋| 2110/2174 [11:32<00:28, 2.28it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart364/ice/kmt.nc: 'nEdits'
100%|██████████| 2174/2174 [11:54<00:00, 3.04it/s]
[3]:
6550
Some warnings may come up. If they look worrisome, seek advice. 🙂
Note that when you index your database for the first time it may take a while, as it has to go through all the output files of each experiment. However, it is relatively painless to update the database later to include new output from some of the experiments.
By default, the next time you call cc.database.build_index using the same session, the database won't be re-indexed from scratch; only new files will be added. If you instead want to force the whole database to be re-indexed, pass the force=True keyword argument to cc.database.build_index. For more details on its usage, have a look at help(cc.database.build_index).
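As a minimal sketch of both modes (directory_list and session are those defined above):
[ ]:
# Incremental update: only files not already in the database are indexed
cc.database.build_index(directory_list, session)

# Full rebuild: force=True re-indexes every file, ignoring existing entries
cc.database.build_index(directory_list, session, force=True)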
You now have your own database! 🎉
Just remember to specify your own database via cc.database.create_session() when you load model output for your analyses. Otherwise, the cosima-cookbook will look for the experiment's output in the default database.
Using the database
To learn how to use this database effectively, please see the companion tutorial: `COSIMA_CookBook_Tutorial.ipynb <https://cosima-recipes.readthedocs.io/en/latest/tutorials/COSIMA_CookBook_Tutorial.html#gallery-tutorials-cosima-cookbook-tutorial-ipynb>`__. Alternatively, here is a sample that shows how you might load a variable from an experiment in your database.
[5]:
experiment = '025deg_jra55_iaf_omip2_cycle6'
variable = 'ke_tot'

# Load the kinetic-energy scalar diagnostic from files matching 'ocean_scalar.nc'
dataarray = cc.querying.getvar(experiment, variable, session, ncfile='ocean_scalar.nc')

# Resample to annual means and plot the resulting time series
annual_average = dataarray.resample(time='A').mean(dim='time')
annual_average.plot();

To find out more about the built-in functions used above, you can use the help function. For example:
[6]:
help(cc.database.create_session)
Help on function create_session in module cosima_cookbook.database:

create_session(db=None, debug=False)
    Create a session for the specified database file.

    If debug=True, the session will output raw SQL whenever it is executed on the database.
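For instance, to see the raw SQL as it is issued, you could open a session with debug enabled (a minimal sketch reusing the database file from above):
[ ]:
# debug=True echoes the raw SQL executed against the database
debug_session = cc.database.create_session('local_cc_test.db', debug=True)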
[8]:
help(cc.querying.getvar)
Help on function getvar in module cosima_cookbook.querying:

getvar(expt, variable, session, ncfile=None, start_time=None, end_time=None, n=None, frequency=None, attrs={}, **kwargs)
    For a given experiment, return an xarray DataArray containing the
    specified variable.

    expt - text string indicating the name of the experiment
    variable - text string indicating the name of the variable to load
    session - a database session created by cc.database.create_session()
    ncfile - an optional text string indicating the pattern for filenames
             to load. All filenames containing this string will match, so
             be specific. '/' can be used to match the start of the
             filename, and '%' is a wildcard character.
    start_time - only load data after this date. specify as a text string,
                 e.g. '1900-01-01'
    end_time - only load data before this date. specify as a text string,
               e.g. '1900-01-01'
    n - after all other queries, restrict the total number of files to the
        first n. pass a negative value to restrict to the last n
    frequency - specify frequency to disambiguate identical variables saved
                at different temporal resolution
    attrs - a dictionary of attribute names and their values that must be
            present on the returned variables

    Note that if start_time and/or end_time are used, the time range
    of the resulting dataset may not be bounded exactly on those
    values, depending on where the underlying files start/end. Use
    dataset.sel() to exactly select times from the dataset.

    Other kwargs are passed through to xarray.open_mfdataset, including:

    chunks - Override any chunking by passing a chunks dictionary.
    decode_times - Time decoding can be disabled by passing decode_times=False
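As a hedged example of some of these options, you might restrict the query to a time window and then trim the result exactly with sel() (the dates below are illustrative only):
[ ]:
# Query only files overlapping the requested window, then select the exact range
dataarray = cc.querying.getvar(experiment, variable, session,
                               ncfile='ocean_scalar.nc',
                               start_time='1960-01-01', end_time='1969-12-31')
dataarray = dataarray.sel(time=slice('1960-01-01', '1969-12-31'))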