{ "cells": [ { "cell_type": "markdown", "id": "beee3fe0-8aee-43fd-bcd8-52f1039639d9", "metadata": {}, "source": [ "# Submitting analysis jobs to gadi\n", "\n", "This notebook goes through an example of how to convert your jupyter notebook code into a PBS job you can submit to Gadi, NCI.\n", "\n", "Sometimes, once you have your analysis working in a jupyter notebook and you are using a lot of data, it can be helpful to submit the job directly to Gadi. This uses the PBS queue (see NCI website for more details, https://opus.nci.org.au/display/Help/4.+PBS+Jobs). It means the job is not interactive, but it also means you can submit many jobs at the same time if say you want to break up your analysis and run one year/month at a time.\n", "\n", "One way to set up this analysis is to convert your `.ipynb` notebooks into `.py` files, and then run on gadi with a wrapping `.sh` bash script. Functions and packages that work in your `.ipynb` notebooks, such as `cosima_cookbook` should also work on gadi. `dask`, however, requires some special treatment; see the CLEX CMS team's article on this for more information (https://coecms-training.github.io/parallel/dask-batch.html).\n", "\n", "This notebook shows how to \n", "1. set up your `.py` script to use `dask` stably\n", "2. set up your PBS script\n", "3. Optional: add an environment variable for easy submission of different months/years/variables" ] }, { "cell_type": "markdown", "id": "743d04d5-e4ca-4235-8be5-94bc6e59e8cf", "metadata": {}, "source": [ "## Step 1: `.py` script\n", "\n", "```python \n", "\"\"\"\n", "Title - e.g. Computing EKE for one year\n", "\"\"\"\n", "\n", "# Load modules as normal\n", "\n", "import matplotlib.pyplot as plt\n", "import xarray as xr\n", "from xgcm import Grid\n", "import numpy as np\n", "import pandas as pd\n", "import cftime\n", "from xhistogram.xarray import histogram\n", "import cosima_cookbook as cc\n", "\n", "import IPython.display\n", "import cmocean as cm\n", "import cartopy.crs as ccrs\n", "import cartopy.feature as cft\n", "\n", "import sys, os\n", "import warnings\n", "warnings.simplefilter(\"ignore\")\n", "from dask.distributed import Client\n", "\n", "import climtas.nci\n", "\n", "if __name__ == '__main__':\n", "\n", " climtas.nci.GadiClient()\n", " session = cc.database.create_session()\n", "\n", " ## Your jupyter notebook code goes here inside the `main` loop\n", "```\n", "\n", "Add your python code from jupyter inside the main loop. **Be very careful of the indentation!**. Also note that at the end of the PBS script you need to explicitly save things, otherwise all the results of your analyses will get deleted. So, be sure to save any figures, data you need! For example, you could save out the variable EKE as a `netcdf` using the following code.\n", "\n", "```python\n", " EKE.load() ## Loading first speeds up saving\n", " \n", " ##saving\n", " save_dir = '/path/...'\n", " ds = xr.Dataset({'EKE': EKE})\n", " ds.to_netcdf(save_dir+'EKE_expt0.nc', \n", " encoding={'EKE': {'shuffle': True, 'zlib': True, 'complevel': 5}}) \n", "\n", "```\n", "The encoding line helps with compression (see https://climate-cms.org/posts/2018-10-12-create-netcdf.html). You can add extra attributes to the DataSet e.g. `long_name`, `units` etc. too." ] }, { "cell_type": "markdown", "id": "085b9787-5f58-4faf-983c-9c1b7013115f", "metadata": {}, "source": [ "## Step 2: Make a PBS script\n", "The NCI website has information about all the different options you have for a PBS script. 
] }, { "cell_type": "markdown", "id": "085b9787-5f58-4faf-983c-9c1b7013115f", "metadata": {}, "source": [ "## Step 2: Make a PBS script\n", "The NCI website has information about all the different options you have for a PBS script. However, the following script should provide a starting point; modify the walltime and memory for your needs. If you are using dask, make sure you request some `jobfs`: the dask settings in the python script above make dask spill its intermediate computation to the node-local `jobfs` disk, which is much more efficient than spilling to `/scratch` or `/g/data`.\n", "\n", "```\n", "#!/bin/bash\n", "## Replace a01 below with your NCI project, e.g. x77, e14, ...\n", "#PBS -P a01\n", "#PBS -q normalbw\n", "#PBS -l mem=120gb\n", "#PBS -l ncpus=8\n", "#PBS -l walltime=0:15:00\n", "## Add all the storage flags you need on the next line\n", "#PBS -l storage=gdata/ik11+gdata/hh5+gdata/a01\n", "#PBS -l jobfs=100gb\n", "#PBS -N save_EKE\n", "#PBS -j oe\n", "\n", "module use /g/data3/hh5/public/modules\n", "module load conda/analysis3\n", "\n", "cd /g/data/XXX/uu1234/analysis/scripts/ # replace with the directory your .py script lives in\n", "\n", "# call python; the redirection collects any python output (e.g. dask messages,\n", "# print statements) into a .txt file for easy debugging\n", "python3 save_EKE.py &>> output_save_EKE.txt\n", "\n", "exit\n", "\n", "```\n", "\n", "(Comments are kept on their own lines above, rather than trailing the `#PBS` directives, as trailing comments can confuse the PBS directive parser.) Once you are satisfied with your code, you can remove the `&>> output_save_EKE.txt` redirection to avoid making unnecessary files; these `.txt` files can get very large with `dask` messages if it's a big job. You will still always have a PBS log file with the walltime used and any errors in your PBS script (such as folders/files not being called correctly).\n", "\n", "#### *To submit this script (saved as `run_EKE.sh`) on Gadi, use `qsub run_EKE.sh`.*" ] }, { "cell_type": "markdown", "id": "d8794fc5-e8c2-46e0-bcfb-38e173e44241", "metadata": {}, "source": [ "## (Optional) Step 3: Loop\n", "\n", "Sometimes, you just want to run the same script for many months, variables or years. This is where `qsub`'s `-v` environment variable option is useful. For example, let's say our script calculates the EKE for one year, and we want to run it for 10 years. We make some small modifications to the PBS script:\n", "\n", "\n", "```\n", "#!/bin/bash\n", "## Write your project on the next line, e.g. x77, e14\n", "#PBS -P ###\n", "#PBS -q normalbw\n", "#PBS -l mem=120gb\n", "#PBS -l ncpus=8\n", "#PBS -l walltime=0:15:00\n", "#PBS -l storage=gdata/ik11+gdata/hh5+gdata/x77+gdata/e14+scratch/e14\n", "#PBS -l jobfs=100gb\n", "#PBS -N save_EKE\n", "#PBS -j oe\n", "#PBS -v year\n", "\n", "module use /g/data3/hh5/public/modules\n", "module load conda/analysis3\n", "\n", "cd /g/data/XXX/uu1234/analysis/scripts/ ## point to where your .py script is saved\n", "\n", "# call python, passing the year through and logging to a per-year file\n", "python3 save_EKE.py $year &>> output_save_EKE_${year}.txt \n", "\n", "exit\n", "\n", "```\n", "\n", "Now, in order to run the script you need to specify a year, say 2000, using `qsub -v year=2000 run_EKE.sh`. \n", "\n", "You also need to tell python how to use this number, which you can do using `sys` (the year is passed to the script as a command-line argument). Inside the main block, write\n", "\n", "```python \n", "    #### get the year argument that was passed to the python script ####\n", "    import sys\n", "    year = int(sys.argv[1])\n", "\n", "    start_time = str(year)+'-01-01'\n", "    end_time = str(year)+'-12-31'\n", "```\n", "\n", "And then you can go along with the python code/cosima recipes, selecting that time period using the cosima cookbook or `.sel(time=slice(start_time, end_time))`, as in the sketch below.\n",
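"\n", "For example, a minimal sketch using the cookbook (the experiment and variable names are hypothetical; replace them with your own):\n", "\n", "```python\n", "    ## Hypothetical experiment/variable names -- replace with your own\n", "    u = cc.querying.getvar('01deg_jra55v13_ryf9091', 'u', session,\n", "                           start_time=start_time, end_time=end_time)\n", "    ## getvar selects whole files, so trim exactly to the requested period\n", "    u = u.sel(time=slice(start_time, end_time))\n", "```\n",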
\n", "\n", "Then, you can save the file out at the end of the code with a file name of a particular year\n", "```python\n", " EKE.load() ## Loading first speeds up saving\n", " \n", " ##saving\n", " save_dir = '/path/...'\n", " ds = xr.Dataset({'EKE': EKE})\n", " ds.to_netcdf(save_dir+'EKE_year_'+str(year)+'.nc', \n", " encoding={'EKE': {'shuffle': True, 'zlib': True, 'complevel': 5}}) \n", "\n", "```\n", "You can add more environment variables in the same way.\n", "\n", "Finally, for even faster command lines, you can write a small bash script to call the `qsub` command for each year. This will submit 10 of the same job with year argument 2000 to 2009 simulataneously (so be careful -- try not to submit hundreds of jobs at the same time which blocks up the NCI queues!)\n", "\n", "```\n", "#!/bin/bash\n", "\n", "## loop over count, submit job to gadi with count that gets communicated to python\n", "\n", "for i in {2000..2009}\n", "do\n", " echo \"creating job for year $i\"\n", " qsub -v year=i run_EKE.sh\n", "done\n", "\n", "```\n", "\n", "\n", "## A different looping strategy\n", "\n", "Rather than submitting 10 years at the same time, you could also add a counter in your PBS script so that when it gets to the end of the script it will resubmit but update the environment variable to be say the next year. \n", "\n", "```\n", "#!/bin/bash\n", "#PBS -P a01 #replace this with your NCI project, e.g., x77, e14, ...\n", "#PBS -q normalbw\n", "#PBS -l mem=120gb\n", "#PBS -l ncpus=8\n", "#PBS -l walltime=0:15:00\n", "#PBS -l storage=gdata/ik11+gdata/hh5+gdata/a01 # add all the storage flags you need\n", "#PBS -l jobfs=100gb\n", "#PBS -N save_EKE\n", "#PBS -j oe\n", "#PBS -v year\n", "\n", "module use /g/data3/hh5/public/modules\n", "module load conda/analysis3\n", "\n", "cd /g/data/XXX/uu1234/analysis/scripts/ # replace here with directory that your .py script lives in\n", "\n", "# set max number of time loops to run:\n", "n_loops = 10\n", "\n", "# call python\n", "python3 save_EKE.py year &>> output_save_EKE_{year}.txt \n", "\n", "# increment count and resubmit:\n", "year = $((year+1))\n", "if [ $year -lt $n_loops ]; then\n", "cd $script_dir\n", "qsub -v year=$year run_EKE.sh\n", "fi\n", "\n", "exit\n", "```\n", "\n", "Then running `qsub -v year=2000 run_EKE.sh` will run the code for year 2000, then once that has finished submit a job for year 2001 and so on for `n_loops = 10` iterations." ] }, { "cell_type": "markdown", "id": "86d3d252-4c31-4b74-b66a-6f4a507b6eb5", "metadata": {}, "source": [ "## Other links\n", "\n", "The CLEX CMS blog https://climate-cms.org/ and wiki http://climate-cms.wikis.unsw.edu.au/Home are great resources with lots of information!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "thumbnail_figure": "assets/gadi.png" }, "nbformat": 4, "nbformat_minor": 5 }