1.3.4. Creating Surface Datasets
1.3.4.1. mksurfdata_esmf purpose
This tool is intended to generate fsurdat files (surface datasets) and landuse files for the CTSM. It can generate global, regional, and single-point fsurdat files, as long as a mesh file is available for the grid.
The subset_data tool allows users to make fsurdat files from existing fsurdat files when a mesh file is unavailable. Generally, users are encouraged to use the subset_data tool for generating regional and single-point fsurdat files.
1.3.4.2. Build Requirements
mksurfdata_esmf is a distributed-memory parallel program (using the Message Passing Interface, MPI) that uses ESMF (Earth System Modeling Framework) for regridding and PIO (Parallel I/O) with NetCDF for output. As such, the following libraries must be built:
MPI
NetCDF
PIO
ESMF
In addition, the build requires Python, a bash shell, CMake, and GNU Make.
These libraries need to be built such that they can all work together in the same executable. Hence, the above order may be required in building them.
CTSM submodules cime and ccs_config are required, and we will show how these come in. A python environment that includes particular packages is also required. We demonstrate how to use the ctsm_pylib environment that we support in CTSM.
Note, PNETCDF is an optional library that can be used, but is NOT required.
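Before attempting a build, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch, not part of CTSM: the function name `missing_tools` is hypothetical, and the executable names (`python3`, `cmake`, `make`) are assumptions about how these tools are installed on your machine. It does not check the MPI, NetCDF, PIO, or ESMF libraries.

```shell
# Print the name of each requested command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of CTSM.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names are assumptions; adjust them for your system.
missing=$(missing_tools python3 bash cmake make)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All listed build tools were found."
fi
```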
Use cime to manage the build requirements
Important
CURRENTLY WORKS ONLY ON DERECHO IN CTSM (not CESM) CHECKOUTS
On machines that cime already supports, you can use the build script to build the tool. On other machines, you will first need to port cime to the machine and describe how to build there, as covered in the cime documentation. You will also have to make some modifications to the build script.
https://github.com/ESMCI/cime/wiki/Porting-Overview
Machines that already run CTSM or CESM have been ported to cime. So if you can run the model on your machine, you will be able to build the tool there.
To get a list of the machines that have been ported to cime:
# Assuming pwd is your CTSM or CESM checkout
cd cime/scripts
./query_config --machines
Note
In addition to having a port to cime, the machine also needs PIO built and referenced through the environment variable PIO, which must be set in the porting instructions for the machine. An independent PIO library is available on supported CESM machines.
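A quick way to see whether the PIO environment variable is usable in your current shell is sketched below. The helper name `check_pio` is hypothetical, and treating the value of PIO as a directory path is an assumption; check your machine's porting instructions for what it should actually contain.

```shell
# Report whether the PIO environment variable is set and names an existing directory.
# check_pio is a hypothetical helper; treating PIO as a directory path is an assumption.
check_pio() {
    if [ -z "${PIO:-}" ]; then
        echo "PIO is not set"
    elif [ ! -d "$PIO" ]; then
        echo "PIO does not name a directory: $PIO"
    else
        echo "PIO found: $PIO"
    fi
}

check_pio
```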
Important
Currently we have run and tested mksurfdata_esmf on Derecho. Please see this github issue about mksurfdata_esmf on other CESM machines:
1.3.4.3. The complete process
If you have read the previous section, you are ready to proceed. The $CTSMROOT/tools/README.md goes through the complete process for creating input files needed to run CLM. The $CTSMROOT/tools/mksurfdata_esmf/README.md specifically goes through the complete process of generating surface and landuse datasets. We repeat those files here:
# CTSM Tools for Preprocessing of Input Datasets or Postprocessing of History Output
#### $CTSMROOT/tools/README.md
CTSM tools for analysis of CTSM history files -- or for creation or
modification of CTSM input files.
I. General directory structure:
`$CTSMROOT/tools`
mksurfdata_esmf -- Create surface datasets.
crop_calendars --- Regrid and process GGCMI sowing and harvest date files for use in CTSM.
site_and_regional Scripts for handling input datasets for site and regional cases.
These scripts both help with creation of datasets using the
standard process as well as subsetting existing datasets and overwriting
some aspects for a specific case.
modify_input_files Scripts to modify CTSM input files. Specifically modifying the surface
datasets and mesh files.
contrib ---------- Miscellaneous tools for pre or post processing of CTSM.
Typically these are contributed by anyone who has something
they think might be helpful to the community. They may not
be as well tested or supported as other tools.
II. Notes on building/running for each of the above tools:
mksurfdata_esmf has a cime configure and CMake based build using the following files:
gen_mksurfdata_build ---- Build mksurfdata_esmf
src/CMakeLists.txt ------ Tells CMake how to build the source code
Makefile ---------------- GNU makefile to link the program together
cmake ------------------- CMake macros for finding libraries
mkmapgrids and site_and_regional only contain scripts that do not need build files.
Some tools have copies of files from other directories -- see the README.filecopies.md
file for more information on this.
Tools may also have files with the directory name followed by namelist to provide sample namelists.
<directory>.namelist ------ Namelist to create a global file.
These files are also used by the test scripts to test the tools (see the
README.testing.md file).
> [!NOTE]
> Be sure to change the path of the datasets referenced by these namelists to
> point to where you have exported your CESM inputdata datasets.
III. Process sequence to create input datasets needed to run CTSM
1. Create ESMF MESH grid files (if needed)
a. For standard resolutions these files will already be created. (done)
b. Run `tools/site_and_regional/subset_data point` to create single-point datasets
This creates just the fsurdat file as MESH files are NOT needed for single-point cases.
c. Run `tools/site_and_regional/subset_data region` to create regional datasets subset from a global dataset
This creates both the fsurdat file and MESH file needed to run.
d. General custom grid
You'll need to convert or create MESH grid files on your own (using scripts
or other tools) for the general case where you have an unstructured grid, or
a grid that is not regular in latitude and longitude, and that grid is custom
and not merely subset from one of the global grids.
2. Create surface datasets with mksurfdata_esmf on Derecho
(See mksurfdata_esmf/README.md for more help on doing this)
- gen_mksurfdata_build to build
- gen_mksurfdata_namelist to build the namelist
- gen_mksurfdata_jobscript_single to build a batch script to run on Derecho
- Submit the batch script just created above
- This step uses the results of step (1) entered into the XML database.
- If datasets were NOT entered into the XML database, set the resolution
by entering the mesh file using the options: --model-mesh --model-mesh-nx --model-mesh-ny
Example: for 0.9x1.25 resolution for 1850
``` shell
# On Derecho
cd mksurfdata_esmf
./gen_mksurfdata_build
./gen_mksurfdata_namelist --res 0.9x1.25 --start-year 1850 --end-year 1850
./gen_mksurfdata_jobscript_single --number-of-nodes 2 --tasks-per-node 128 --namelist-file target.namelist
qsub mksurfdata_jobscript_single.sh
```
3. Add new files to XML data or using user_nl_clm (optional)
See notes on doing this in step (1) above.
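When the grid has not been entered into the XML database, the namelist step in (2) takes the mesh options listed there in place of a known resolution. The sketch below only assembles and prints such a command so it can be reviewed first. The function name, mesh path, and grid dimensions are hypothetical placeholders, and `./gen_mksurfdata_namelist --help` remains the authority on which other options (such as `--res`) are still needed.

```shell
# Assemble a gen_mksurfdata_namelist command for a custom mesh (dry run: print only).
# mesh_namelist_cmd is a hypothetical helper; the option names come from step (2).
mesh_namelist_cmd() {
    mesh_file=$1; nx=$2; ny=$3; year=$4
    echo "./gen_mksurfdata_namelist --model-mesh $mesh_file" \
         "--model-mesh-nx $nx --model-mesh-ny $ny" \
         "--start-year $year --end-year $year"
}

# Hypothetical mesh file and dimensions, for illustration only.
mesh_namelist_cmd /path/to/my_mesh.nc 288 192 1850
```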
IV. Notes on which input datasets are needed for CTSM
global or regional grids
- need fsurdat
- need mesh files in env_run.xml ATM_DOMAIN_MESH and LND_DOMAIN_MESH
single-point grids
- Just need fsurdat
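To point a case at newly created files, the mesh variables named above can be set with CIME's `xmlchange`, and the surface dataset can be given through the `fsurdat` namelist variable in `user_nl_clm`. This is a minimal sketch, assuming it is adapted and run for a real case directory. The helper name and all file paths are hypothetical, and the `xmlchange` calls are printed rather than executed.

```shell
# Append an fsurdat setting to a user_nl_clm file and print the matching
# xmlchange commands. use_new_datasets is a hypothetical helper.
use_new_datasets() {
    case_dir=$1; fsurdat=$2; mesh=$3
    echo "fsurdat = '$fsurdat'" >> "$case_dir/user_nl_clm"
    # Needed for global or regional grids only; single-point grids just need fsurdat.
    echo "./xmlchange ATM_DOMAIN_MESH=$mesh"
    echo "./xmlchange LND_DOMAIN_MESH=$mesh"
}

# Demonstration in a scratch directory with hypothetical paths.
demo_case=$(mktemp -d)
use_new_datasets "$demo_case" /path/to/surfdata_new.nc /path/to/mesh_new.nc
cat "$demo_case/user_nl_clm"
```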
# Instructions for Using mksurfdata_esmf to Create Surface Datasets
#### $CTSMROOT/tools/mksurfdata_esmf/README.md
## Table of contents
1. Purpose NOW IN THE USER'S GUIDE https://escomp.github.io/CTSM/users_guide/using-clm-tools/creating-surface-datasets.html#mksurfdata-esmf-purpose
2. Build Requirements NOW IN THE USER'S GUIDE https://escomp.github.io/CTSM/users_guide/using-clm-tools/creating-surface-datasets.html#build-requirements
3. [Building the executable](#building-the-executable)
4. [Running a Single Submission](#running-for-a-single-submission)
5. [Running for Multiple Datasets](#running-for-the-generation-of-multiple-datasets)
6. [Notes](#notes)
<!-- ================== -->
### Building the executable
<!-- ================== -->
Before starting, be sure that you have run
``` shell
# Assuming pwd is the top level of the CTSM/CESM checkout
./bin/git-fleximod update
```
This will bring in CIME and ccs_config which are required for building.
``` shell
# Assuming pwd is the tools/mksurfdata_esmf directory
setenv DEBUG TRUE # only if debugging and your shell is tcsh (in bash use: export DEBUG=TRUE)
./gen_mksurfdata_build # For machines with a cime build
```
Note: The pio_iotype value gets set and written to a simple .txt file
by this build script. The value depends on your machine. If not running
on derecho, casper, or izumi, you may need to update this, though
a default value does get set for other machines.
<!-- ========================= -->
## Running for a single submission
<!-- ========================= -->
### Setup ctsm_pylib
Work in the ctsm_pylib environment, which requires the following steps when
running on Derecho. On other machines the steps will be similar, but may differ
in how you get conda in your path and activate the ctsm_pylib environment.
``` shell
# Assuming pwd is the tools/mksurfdata_esmf directory
module load conda
cd ../.. # or ../../../.. for a CESM checkout
./py_env_create # Assuming at the top level of the CTSM/CESM checkout
conda activate ctsm_pylib
```
To generate your target namelist, first check the available options:
``` shell
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_namelist --help
```
For example, try --res 1.9x2.5 --start-year 1850 --end-year 1850:
``` shell
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_namelist --res <resolution> --start-year <year1> --end-year <year2>
```
> [!TIP]
> **IF FILES ARE MISSING FROM** /inputdata, a target namelist will be generated
> but with a generic name and with a warning to run `./download_input_data` next.
> **IF A SMALLER SET OF FILES IS STILL MISSING AFTER RUNNING** `./download_input_data`
> and rerunning `./gen_mksurfdata_namelist`, then keep rerunning
> `./download_input_data` and `./gen_mksurfdata_namelist` (with the options you need)
> until `./gen_mksurfdata_namelist` finds all files.
Example, to generate your target jobscript (again use --help for instructions):
``` shell
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_jobscript_single --number-of-nodes 2 --tasks-per-node 128 --namelist-file target.namelist
qsub mksurfdata_jobscript_single.sh
```
Read the note about regional grids at the end.
<!-- ========================================= -->
## Running for the generation of multiple datasets
<!-- ========================================= -->
Work in the ctsm_pylib environment, as explained in the earlier section.
gen_mksurfdata_jobscript_multi runs `./gen_mksurfdata_namelist` for you:
``` shell
# Assuming pwd is the tools/mksurfdata_esmf directory
./gen_mksurfdata_jobscript_multi --number-of-nodes 2 --scenario global-present
qsub mksurfdata_jobscript_multi.sh
```
If you are looking to generate all (or a large number) of the datasets or the
single-point (1x1) datasets, you are best off using the Makefile. For example:
``` shell
# Assuming pwd is the tools/mksurfdata_esmf directory
make all # ...or
make all-subset
```
As of 2024/9/12 one needs to generate NEON and PLUMBER2 fsurdat files by
running ./neon_surf_wrapper and ./plumber2_surf_wrapper manually in the
/tools/site_and_regional directory.
<!-- = -->
## NOTES
<!-- = -->
# Guidelines for input datasets to mksurfdata_esmf
> [!TIP]
> ALL raw datasets \*.nc **FILES MUST NOT BE NetCDF4**.
Example to convert to CDF5:
``` shell
nccopy -k cdf5 oldfile newfile
```
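To find which raw files need converting, `ncdump -k` prints a file's format kind. The sketch below converts a file only when that kind begins with `netCDF-4`. The helper names and the `.cdf5.nc` output suffix are hypothetical, and it assumes `ncdump` and `nccopy` from the NetCDF utilities are on your PATH.

```shell
# Succeed if a format kind string (as printed by `ncdump -k`) denotes NetCDF4.
# is_netcdf4_kind is a hypothetical helper.
is_netcdf4_kind() {
    case "$1" in
        netCDF-4*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Convert one file to CDF5 only if it is NetCDF4.
# The output file name is a hypothetical convention: <name>.cdf5.nc
convert_if_netcdf4() {
    infile=$1
    if is_netcdf4_kind "$(ncdump -k "$infile")"; then
        nccopy -k cdf5 "$infile" "${infile%.nc}.cdf5.nc"
    fi
}

# Usage (hypothetical file name):
#   convert_if_netcdf4 raw_dataset.nc
```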
> [!TIP]
> The LAI raw dataset \*.nc **FILE MUST HAVE** an "unlimited" time dimension
Example to change time to an unlimited dimension using the NCO operator ncks:
``` shell
ncks --mk_rec_dmn time file_with_time_equals_12.nc -o file_with_time_unlimited.nc
```
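Whether a file already has an unlimited time dimension can be read from its header, where `ncdump -h` lists such a dimension as `time = UNLIMITED`. The following sketch checks header text supplied on standard input; the helper name and the file name in the usage comment are hypothetical.

```shell
# Read `ncdump -h` output on stdin; succeed if the time dimension is unlimited.
# has_unlimited_time is a hypothetical helper.
has_unlimited_time() {
    grep -Eq '^[[:space:]]*time = UNLIMITED'
}

# Usage (hypothetical file name):
#   ncdump -h lai_file.nc | has_unlimited_time && echo "time is unlimited"
```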
### IMPORTANT THERE HAVE BEEN PROBLEMS with REGIONAL grids!!
> [!CAUTION]
> See
>
> https://github.com/ESCOMP/CTSM/issues/2430
In general we recommend using subset_data and/or fsurdat_modifier
for regional grids.