1. Introduction
1.1. Overview
The Community Data Models for Earth Predictive Systems (CDEPS) contains a set of NUOPC-compliant data components along with ESMF-based “stream” code that enables new capabilities in selectively removing feedbacks in coupled model systems. The CDEPS data models perform the basic function of reading external data files, modifying those data, and then sending the data back to the CMEPS mediator. The fields sent to the mediator are the same as those that would be sent by an active component. This takes advantage of the fact that the mediator and other CMEPS-compliant model components have no fundamental knowledge of whether another component is fully active or just a data component.
CDEPS is completely ESMF based. As a result, regridding between the stream resolution and the model resolution can occur at run time for any regridding option that ESMF supports. In addition, the PIO library is used so that all of the stream data is read in parallel by the CDEPS stream code.
The CDEPS data models typically read gridded data from a variety of sources including observations, reanalysis products or output data from a previous model simulation. Out of the box, they often provide a few possible data sources and/or time periods that you can choose from when setting up a case.
In some cases, data models have prognostic functionality, that is, they also receive and use data sent by the mediator. However, in most cases, the data models are not running prognostically and have no need to receive any data from the mediator.
The CIME data models have parallel capability and share significant amounts of source code. Methods for reading and interpolating data have been established and can easily be reused. The data model calls strdata (“stream data”) methods, which in turn call stream methods. The stream methods are responsible for managing lists of input data files and their time axes. That information is then passed up to the strdata methods, where the data is read and interpolated in space and time. The interpolated data is passed up to the data model, where final fields are derived, packed, and returned to the mediator.
1.2. Code Tree
CDEPS contains the following code tree:
| Directory | Function |
|---|---|
| cime_config | CIME Case Control System |
| cmake | Build (can be used with or without CIME) |
| datm | Data atmosphere component |
| dice | Data sea-ice component |
| dlnd | Data land component |
| docn | Data ocean component |
| drof | Data river component |
| dwav | Data wave component |
| dshr | Shared NUOPC cap code |
| share | Shared utility code |
| streams | Code to handle streams |
| doc | Sphinx documentation source |
1.3. Design
Data models function by reading in different streams of input data. A stream is defined as a set of data files containing a set of fields, where all the fields are on the same stream mesh and have the same time coordinates. Data model input falls into two categories: stream-independent and stream-dependent data.
- stream-dependent-data
Stream-dependent input is contained in the input XML file `d{model_name}.streams.xml`, where `model_name` can be `atm`, `ice`, `lnd`, `ocn`, `rof` or `wav`. Multiple streams can be specified in this XML file (see streams). In turn, each stream in the XML file can be associated with multiple stream input files. The data across all the stream input files must be on the same stream mesh and share the same time coordinates. The input XML file is parsed by CDEPS using the third-party FoX library. In addition to the XML format, it is also possible to use the ESMF config format to define stream-dependent namelist options (see streams). This option is mainly used by NOAA’s UFS Weather Model, while the XML format is used by NCAR’s CESM.
- stream-independent-data
Stream-independent input is contained in the input namelist file `d{model_name}_in`. This file specifies a data model mesh file and a data model mask file, along with other stream-independent, data-model-specific configuration variables. In addition, each `d{model_name}_in` namelist file contains a namelist variable `datamode`, which specifies the additional operations that need to be performed on the input streams to create the data model export state.
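To make the stream-dependent input more concrete, the sketch below parses a minimal stream definition with Python's standard library. It is an illustration only: the `mapalgo` and `tintalgo` node names come from this document, but the `stream_info`, `datafiles`, `file`, `datavars`, and `var` element names, the option values, the file names, and the field names are assumptions and may not match the actual CDEPS schema. CDEPS itself parses this file with the FoX library, not with Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical stream definition. Only <mapalgo> and <tintalgo> are node
# names taken from the text; every other element name and value here is
# an assumption made for illustration.
STREAMS_XML = """
<file>
  <stream_info name="example_stream">
    <mapalgo>bilinear</mapalgo>
    <tintalgo>linear</tintalgo>
    <datafiles>
      <file>forcing_2000.nc</file>
      <file>forcing_2001.nc</file>
    </datafiles>
    <datavars>
      <var>u10</var>
      <var>v10</var>
    </datavars>
  </stream_info>
</file>
"""


def parse_streams(xml_text):
    """Return one dict per stream with its options, input files, and fields."""
    root = ET.fromstring(xml_text)
    streams = []
    for info in root.findall("stream_info"):
        streams.append({
            "name": info.get("name"),
            "mapalgo": info.findtext("mapalgo"),
            "tintalgo": info.findtext("tintalgo"),
            "files": [f.text for f in info.findall("datafiles/file")],
            "fields": [v.text for v in info.findall("datavars/var")],
        })
    return streams


streams = parse_streams(STREAMS_XML)
```

Each stream carries its own list of input files and fields, which reflects the rule that all files within one stream share a mesh and time coordinates.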
Data models leverage the CDEPS stream code to spatially interpolate the stream data to the model resolution and temporally interpolate the data to the model time. The CDEPS stream code carries this out as follows:
- The two timestamps of input data that bracket the present model time are read first. These are called the lower and upper bounds of data and will change as the model advances.
- The lower and upper bound data are then spatially mapped to the model grid based on the `mapalgo` node in `d{model_name}.streams[.xml]`. Spatial interpolation only occurs if the input data grid and model grid are not identical, and this is determined in the strdata module automatically.
- Time interpolation is the final step and is done using the time interpolation method specified in the `tintalgo` node in `d{model_name}.streams[.xml]`.
- A final set of fields is then available to the data model on the model grid and for the current model time.
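The bracketing and time-interpolation steps can be sketched in a few lines of Python. This is a minimal illustration, not the CDEPS implementation: it assumes the fields are already on the model grid (the spatial mapping step is skipped), that times are plain numbers, and that linear interpolation is the selected method. The function names `bracket` and `interp_linear` are hypothetical.

```python
from bisect import bisect_right


def bracket(times, t):
    """Return indices (lo, hi) of the stream times that bracket model time t.

    `times` must be sorted ascending and satisfy times[0] <= t <= times[-1].
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("model time lies outside the stream time axis")
    hi = bisect_right(times, t)
    if hi == len(times):  # t equals the last stream time
        hi -= 1
    lo = max(hi - 1, 0)
    return lo, hi


def interp_linear(times, fields, t):
    """Linearly interpolate the lower and upper bound fields to time t."""
    lo, hi = bracket(times, t)
    t_lo, t_hi = times[lo], times[hi]
    if t_hi == t_lo:  # single time sample: nothing to interpolate
        return list(fields[lo])
    w = (t - t_lo) / (t_hi - t_lo)  # weight given to the upper bound
    return [(1.0 - w) * a + w * b for a, b in zip(fields[lo], fields[hi])]


# Three time samples (in hours) of a two-point field.
times = [0.0, 6.0, 12.0]
fields = [[280.0, 290.0], [286.0, 296.0], [283.0, 293.0]]
midway = interp_linear(times, fields, 3.0)
```

As the model time advances past 6.0, `bracket` returns the next pair of indices, which corresponds to the lower and upper bounds changing as the model advances.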
Each data model component communicates with the NUOPC mediator and exchanges fields on only the data model mesh.
1.4. CDEPS and CIME Case Control System (CCS)
If the CDEPS data models are used in conjunction with the CIME Case Control System (CCS) then the following will also hold:
- Each data model has an xml variable in `env_run.xml` that specifies the data model mode. These are: `DATM_MODE`, `DICE_MODE`, `DLND_MODE`, `DOCN_MODE`, `DROF_MODE`, `DWAV_MODE`.
- Each data model mode specifies the streams that are associated with that data model.
More details of the data model design are covered in design details.
1.5. NUOPC Cap
1.5.1. Initialization phases
The CDEPS data component has two initialization phases for each data model: (1) advertise and (2) realize. In the advertise phase, the data component queries the namelist files, which specify the data model mesh and mask files along with other stream-independent, data-model-specific configuration variables. The advertise phase then initializes PIO for reading and writing netCDF files under CDEPS. As a last step, the top-level advertise phase calls the stream-specific one, since the advertised fields depend on the data mode in use. This allows CDEPS to specialize based on the selected data mode and list of exported fields. In the realize phase, the data model reads the stream definition file and runs the data component to prepare initial data for other components.
1.5.2. Run phase
The CDEPS data component is designed to have a different run phase for each data mode, which is controlled by a top-level, data-component-specific NUOPC “cap”. In the first advance step, the data-model-specific run phase initializes the export fields that have a corresponding stream field. It then initializes the data-mode-specific stream and export field pointers. If required, the data model also reads the restart files in this initial step. Spatial interpolation is performed internally using ESMF-provided interpolation types when the data model and stream meshes are not identical, and temporal interpolation uses custom routines. After interpolating (or transferring) the stream data to the data model mesh, the top-level advance routine calls data-mode-specific routines, which are responsible for calculating added-value fields (e.g., wind speed from wind components) and converting the units of the stream data to the convention used in CDEPS.
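As a small illustration of what a data-mode-specific routine does after interpolation, the sketch below derives wind speed from the two wind components and applies a unit conversion. The function names are hypothetical, and the Celsius-to-Kelvin conversion is only an assumed example of a unit change; this document does not state which conversions CDEPS applies.

```python
import math


def wind_speed(u, v):
    """Wind speed from eastward (u) and northward (v) components, point by point."""
    return [math.hypot(ui, vi) for ui, vi in zip(u, v)]


def celsius_to_kelvin(temps_c):
    """Assumed example of a unit conversion applied to a stream field."""
    return [t + 273.15 for t in temps_c]


# Stream fields already interpolated to the data model mesh (two points).
u10 = [3.0, 0.0]
v10 = [4.0, 2.0]
speed = wind_speed(u10, v10)
```

The derived field has the same length as its inputs because it is computed point by point on the data model mesh.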
1.5.3. Finalization phase
The data model just returns a message that indicates the end of the main integration loop.
1.5.4. Integration clock
The CDEPS data component run time is set through the shared `dshr_set_runclock` routine.
The driver configuration dictates the model start and stop times (through the ESMF config
file `nuopc.runconfig`) and the coupling interval at which the data component is called
(through the ESMF/NUOPC run sequence, `nuopc.runseq`). The `dshr_set_runclock` call also
sets up ESMF alarms for restart and stop times that are used internally in the model, and
creates internal clock representations.
1.5.5. Grid type, decomposition, mapping to internal grid
As mentioned previously, CDEPS includes two programming layers to support flexible
data components: (1) data model and (2) streams. In this design, the data model sits
on top and interacts with the other active model components or the mediator. Unlike the
data models, streams do not interact directly with other components but are used by
the data models to create export states. The streams can have different meshes, but
they are spatially mapped to the data model mesh before being passed to the other
components. This step also includes temporal interpolation to calculate the data
at a given time, and a different temporal interpolation type can be used for each
variable. One example is `coszen`, which scales the data according to the
cosine of the solar zenith angle and can be used to represent the diurnal
cycle for solar radiation. The decomposition of the data is handled by ESMF.