Base Target

The base target is the typical representation of gridded weather or climate data. It is designed to mirror the structure of the chosen data source (e.g. HRRR, GFS). This is the target to choose if you are looking for a user-friendly and familiar data structure that preserves the layout seen in model outputs. All chosen dates and variables are loaded into the standard multidimensional array structure (time, latitude, longitude, level). The resulting dataset will usually be compatible with typical scientific Python workflows.

This target is best suited for general-purpose use, exploratory data analysis, and any workflow that expects conventional multidimensional arrays.

Below is a minimal example for GFS data.

target:
  name: base
  rename:
    level: pressure
  chunks:
    t0: 1
    fhr: 1
    pressure: -1
    latitude: -1
    longitude: -1

Because this target layout more or less mirrors the original source, there is very little to specify here. The main change is rename, which in this case renames the vertical level dimension from “level” to “pressure”.

More importantly, the user needs to specify the chunking scheme, which determines the size of each chunk file stored on disk. Note the shorthand: -1 means that the entire dimension is used for a single chunk. In this example, there is a single file for each initial condition (t0) and forecast hour (fhr), which contains all points in the vertical (pressure), latitude, and longitude dimensions.
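To make the -1 shorthand concrete, the sketch below expands a chunk specification against a set of dimension lengths and counts the resulting chunks per variable. It is an illustration only, not part of ufs2arco: the helper names resolve_chunks and count_chunks and the dimension sizes are hypothetical, chosen to mirror the GFS example above.

```python
import math


def resolve_chunks(dim_sizes, chunks):
    """Expand the -1 shorthand to the full length of each dimension."""
    return {
        dim: (dim_sizes[dim] if size == -1 else size)
        for dim, size in chunks.items()
    }


def count_chunks(dim_sizes, chunks):
    """Count the chunks needed to cover one variable."""
    resolved = resolve_chunks(dim_sizes, chunks)
    return math.prod(
        math.ceil(dim_sizes[dim] / resolved[dim]) for dim in resolved
    )


# Hypothetical dimension lengths, for illustration only
dim_sizes = {"t0": 4, "fhr": 3, "pressure": 13, "latitude": 721, "longitude": 1440}

# Chunk specification copied from the example configuration
chunks = {"t0": 1, "fhr": 1, "pressure": -1, "latitude": -1, "longitude": -1}

resolved = resolve_chunks(dim_sizes, chunks)
n_chunks = count_chunks(dim_sizes, chunks)

print(resolved)
print(n_chunks)
```

With these assumed sizes, each chunk spans the full pressure, latitude, and longitude extent, so the number of chunks per variable equals the number of (t0, fhr) pairs.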

Note

It is currently not possible to use chunk sizes larger than 1 for the data source’s sample_dims. The sample_dims are ufs2arco’s internally recognized dimensions that define a single “sample” of data. For most datasets, these are the time dimension(s), so for ERA5 data this is simply “time”. For the GFS archives, these are initial condition (t0) and forecast hour (fhr). For ensemble datasets, e.g. GEFS, the sample_dims also include the ensemble member dimension.
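A configuration can be checked against this restriction before any data is processed. The following is a minimal sketch under stated assumptions: the function find_invalid_sample_chunks is hypothetical and not part of ufs2arco, and the sample_dims tuple is written out by hand from the GFS description above rather than read from the library.

```python
def find_invalid_sample_chunks(chunks, sample_dims):
    """Return the sample dims whose configured chunk size is not 1.

    Dimensions missing from `chunks` are ignored here; how ufs2arco treats
    them is not covered by this sketch.
    """
    return [
        dim for dim in sample_dims
        if dim in chunks and chunks[dim] != 1
    ]


# Sample dims for the GFS archives, as described in the note above
gfs_sample_dims = ("t0", "fhr")

valid = {"t0": 1, "fhr": 1, "pressure": -1, "latitude": -1, "longitude": -1}
invalid = {"t0": 2, "fhr": 1, "pressure": -1, "latitude": -1, "longitude": -1}

ok_result = find_invalid_sample_chunks(valid, gfs_sample_dims)
bad_result = find_invalid_sample_chunks(invalid, gfs_sample_dims)

print(ok_result)
print(bad_result)
```

An empty list means the chunking scheme respects the restriction; any names returned are the dimensions whose chunk size would need to be set back to 1.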