Make Your Own Database

The cosima-cookbook uses a database to access information about experiments and to help with loading model output. We maintain a default database for ACCESS-OM2 experiments, but there are occasions when you might want to make your own database. This tutorial outlines the process of making your own private database.

Requirements: We recommend that you use the conda/analysis3-21.04 (or later) kernel on NCI (or your own up-to-date cosima-cookbook installation).

[1]:
%matplotlib inline
%config InlineBackend.figure_format='retina'

import cosima_cookbook as cc

First, create a database session using the built-in create_session function. To do this, you need to specify a path for the database. Choose a location where you have write permission (that is, replace the example path given here with one of your own):

[2]:
db = 'local_cc_test.db'
session = cc.database.create_session(db)

Note that you need to create the database session every time you start up your notebook; you can then update this database however many times you like.
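Before creating the session, it can be worth confirming that the chosen location is actually writable. The following is a minimal sketch using only the Python standard library; the helper name database_location_is_writable is hypothetical and is not part of the cosima-cookbook.

```python
import os
from pathlib import Path


def database_location_is_writable(db_path):
    """Return True if a database file could be created or updated at db_path."""
    path = Path(db_path).expanduser().resolve()
    if path.exists():
        # An existing database file must itself be writable.
        return os.access(path, os.W_OK)
    # A new file needs a parent directory that exists and is writable.
    parent = path.parent
    return parent.is_dir() and os.access(parent, os.W_OK | os.X_OK)


db = 'local_cc_test.db'
print(database_location_is_writable(db))
```

If this prints False, pick a different path before calling cc.database.create_session(db).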

Now you are ready to build a database. First, select which experiments you want to include in your database. For these purposes, an experiment is a directory containing output from a single simulation. (If you use a higher level directory you won’t be able to distinguish between experiments.)

The example below constructs a list of two experiment directories, chosen to cover two different resolutions. The database will be built to index all netCDF files in each directory.

[3]:
directory_list=['/g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle6',
                '/g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6'
               ]

cc.database.build_index(directory_list, session)
Indexing experiment: 1deg_jra55_iaf_omip2_cycle6
100%|██████████| 4376/4376 [05:15<00:00, 13.85it/s]
Indexing experiment: 025deg_jra55_iaf_omip2_cycle6
  1%|▏         | 28/2174 [00:09<11:07,  3.21it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart354/ice/kmt.nc: 'nEdits'
 10%|█         | 226/2174 [01:12<08:36,  3.77it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart329/ice/kmt.nc: 'nEdits'
 18%|█▊        | 394/2174 [02:11<12:56,  2.29it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart309/ice/kmt.nc: 'nEdits'
 38%|███▊      | 835/2174 [04:32<06:39,  3.35it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart314/ice/kmt.nc: 'nEdits'
 49%|████▊     | 1056/2174 [05:50<07:46,  2.40it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart344/ice/kmt.nc: 'nEdits'
 49%|████▉     | 1064/2174 [05:52<05:05,  3.63it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart304/ice/kmt.nc: 'nEdits'
 51%|█████     | 1114/2174 [06:08<05:18,  3.33it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart319/ice/kmt.nc: 'nEdits'
 53%|█████▎    | 1151/2174 [06:21<05:02,  3.38it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart339/ice/kmt.nc: 'nEdits'
 61%|██████    | 1331/2174 [07:18<04:36,  3.05it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart334/ice/kmt.nc: 'nEdits'
 62%|██████▏   | 1346/2174 [07:21<03:24,  4.05it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart359/ice/kmt.nc: 'nEdits'
 63%|██████▎   | 1375/2174 [07:31<04:19,  3.07it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart324/ice/kmt.nc: 'nEdits'
 80%|████████  | 1749/2174 [09:34<01:54,  3.70it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart365/ice/kmt.nc: 'nEdits'
 96%|█████████▌| 2081/2174 [11:22<00:22,  4.17it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart349/ice/kmt.nc: 'nEdits'
 97%|█████████▋| 2110/2174 [11:32<00:28,  2.28it/s]ERROR:root:Error indexing /g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6/restart364/ice/kmt.nc: 'nEdits'
100%|██████████| 2174/2174 [11:54<00:00,  3.04it/s]
[3]:
6550

Some errors or warnings may be logged during indexing, as for the kmt.nc restart files above. If they look worrisome, seek advice. 🙂

Note that indexing your database for the first time may take a while, because it has to go through all the output files of the experiments. Updating the database later to include more output from those experiments is relatively painless.
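To get a rough idea of how much work the first indexing pass involves, you could count the netCDF files under each experiment directory beforehand. This is only a sketch: count_netcdf_files is a hypothetical helper, and it assumes that files are identified by a .nc suffix, which may not match the cookbook's own file-matching rules exactly.

```python
from pathlib import Path


def count_netcdf_files(directory):
    """Count files ending in .nc anywhere below directory (hypothetical helper)."""
    return sum(1 for p in Path(directory).rglob('*.nc') if p.is_file())


directory_list = [
    '/g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle6',
    '/g/data/ik11/outputs/access-om2-025/025deg_jra55_iaf_omip2_cycle6',
]

for directory in directory_list:
    if Path(directory).is_dir():
        print(directory, count_netcdf_files(directory))
    else:
        # The paths above only exist on NCI.
        print(directory, 'not found')
```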

By default, the next time you call cc.database.build_index with the same session, files that are already indexed are skipped and only new files are added. If you instead want to force the whole database to be re-indexed, pass the force=True keyword argument to cc.database.build_index. For more details on its usage, see help(cc.database.build_index).
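The two modes can be wrapped in a small helper, sketched below. The name update_database and its reindex_everything flag are hypothetical; the only cookbook behaviour relied on is the force=True keyword described above. The indexing function is passed in as an argument so that the sketch does not depend on a cosima-cookbook installation.

```python
def update_database(build_index, directories, session, reindex_everything=False):
    """Index new files, or re-index everything when reindex_everything is True.

    build_index is expected to be cc.database.build_index.
    """
    if reindex_everything:
        # force=True asks build_index to re-index files it has already seen.
        return build_index(directories, session, force=True)
    # Default behaviour: only files not yet in the database are indexed.
    return build_index(directories, session)
```

In the notebook this would be called as update_database(cc.database.build_index, directory_list, session), adding reindex_everything=True only when a full rebuild is really needed, since that repeats the slow first pass.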

You now have your own database! 🎉

Just remember to pass a session for your own database, created with cc.database.create_session(), when you load model output for your analyses. Otherwise, the cosima-cookbook will look for the experiment in the default database.
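One way to avoid forgetting the session is to bind it once to the querying function used later in this tutorial (cc.querying.getvar). This is a sketch only; bind_session is a hypothetical helper built on functools.partial and is not part of the cookbook.

```python
from functools import partial


def bind_session(getvar, session):
    """Return a version of getvar that always queries the given session."""
    # The session is bound by keyword, so callers pass only the experiment,
    # the variable and any optional keyword arguments such as ncfile.
    return partial(getvar, session=session)
```

In the notebook you might then write getvar_private = bind_session(cc.querying.getvar, session) and call getvar_private(experiment, variable, ncfile='ocean_scalar.nc').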

Using the database

To learn how to use this database effectively, please see the companion tutorial: `COSIMA_CookBook_Tutorial.ipynb <https://cosima-recipes.readthedocs.io/en/latest/tutorials/COSIMA_CookBook_Tutorial.html#gallery-tutorials-cosima-cookbook-tutorial-ipynb>`__. Alternatively, here is a sample that shows how you might load a variable from an experiment in your database.

[5]:
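# Experiment name as reported during indexing (the directory name)
experiment = '025deg_jra55_iaf_omip2_cycle6'
variable = 'ke_tot'
# Pass the private session so that the query uses your own database
dataarray = cc.querying.getvar(experiment, variable, session, ncfile='ocean_scalar.nc')
# Annual means; 'A' is the year-end frequency alias (recent pandas versions prefer 'YE')
annual_average = dataarray.resample(time='A').mean(dim='time')
annual_average.plot();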
[Figure: annual-mean ke_tot for 025deg_jra55_iaf_omip2_cycle6]

To find out more about the built-in functions used above, you can use the help function. For example:

[6]:
help(cc.database.create_session)
Help on function create_session in module cosima_cookbook.database:

create_session(db=None, debug=False)
    Create a session for the specified database file.

    If debug=True, the session will output raw SQL whenever it is executed on the database.

[8]:
help(cc.querying.getvar)
Help on function getvar in module cosima_cookbook.querying:

getvar(expt, variable, session, ncfile=None, start_time=None, end_time=None, n=None, frequency=None, attrs={}, **kwargs)
    For a given experiment, return an xarray DataArray containing the
    specified variable.

    expt - text string indicating the name of the experiment
    variable - text string indicating the name of the variable to load
    session - a database session created by cc.database.create_session()
    ncfile -  an optional text string indicating the pattern for filenames
              to load. All filenames containing this string will match, so
              be specific. '/' can be used to match the start of the
              filename, and '%' is a wildcard character.
    start_time - only load data after this date. specify as a text string,
                 e.g. '1900-01-01'
    end_time - only load data before this date. specify as a text string,
               e.g. '1900-01-01'
    n - after all other queries, restrict the total number of files to the
        first n. pass a negative value to restrict to the last n
    frequency - specify frequency to disambiguate identical variables saved
                at different temporal resolution
    attrs - a dictionary of attribute names and their values that must be
            present on the returned variables

    Note that if start_time and/or end_time are used, the time range
    of the resulting dataset may not be bounded exactly on those
    values, depending on where the underlying files start/end. Use
    dataset.sel() to exactly select times from the dataset.

    Other kwargs are passed through to xarray.open_mfdataset, including:

    chunks - Override any chunking by passing a chunks dictionary.
    decode_times - Time decoding can be disabled by passing decode_times=False
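As the help text notes, start_time and end_time only narrow down which files are opened, so the returned data may extend beyond the requested period. A minimal sketch of trimming the result with .sel() follows; select_period is a hypothetical helper, and it assumes the time dimension is named time, as in the example above.

```python
def select_period(dataarray, start_time, end_time):
    """Trim an xarray object to the requested period along its time axis."""
    # Label-based slicing in xarray includes both end points.
    return dataarray.sel(time=slice(start_time, end_time))
```

For example, select_period(dataarray, '1958-01-01', '1967-12-31') would keep only the data within those ten years, whatever the boundaries of the underlying files.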
