ufs2arco
ufs2arco is a Python package designed to make NOAA forecast, reanalysis, and reforecast datasets more accessible for scientific analysis and machine learning model development. The name reflects its original purpose: transforming output from the Unified Forecast System (UFS) into Analysis Ready, Cloud Optimized (ARCO; Abernathey et al., 2021) format. The package now also pulls data from a number of non-UFS sources, including GFS/GEFS output that predates UFS, and even ECMWF’s ERA5 dataset.
Warning
The documentation is very much a work in progress. If you have any questions, feel free to raise an issue on the GitHub repo (see “Getting Support” in the Table of Contents).
Capability Overview
The package is designed to let users run an Extract, Transform, Load (ETL) pipeline that converts NOAA datasets into a format suited to their needs. Three general concepts are important for the workflow:
Data Source: the original source of data, for example the archives from NOAA’s Global Ensemble Forecast System (GEFS)
Transforms: any operations that the user needs to perform on the data, for example regridding via xesmf
Target: the resulting dataset is stored in zarr format, and the “target” defines its layout. Right now there are two target layouts:
base: puts the dataset in a familiar form where all variables and dimensions are easily exposed for analysis. See NOAA’s UFS Replay on Google Cloud Storage for an example of this output format.
anemoi: makes the dataset ready for machine learning model development using the anemoi framework. Documentation for this layout is coming; in the meantime, see the anemoi-datasets documentation for more information.
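To make the three concepts concrete, here is a minimal, purely illustrative sketch of how a source, a list of transforms, and a target fit together in an ETL loop. It uses only the Python standard library. Every name in it (ToySource, kelvin_to_celsius, ToyTarget, run_pipeline) is hypothetical and is not part of the ufs2arco API. The "target" simply collects records in memory rather than writing zarr.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

# NOTE: all names below are hypothetical and only illustrate the
# source / transforms / target concepts. They are NOT the ufs2arco API.

Sample = Dict[str, float]


@dataclass
class ToySource:
    """Stands in for a data source, e.g. a forecast archive."""
    n_samples: int

    def extract(self) -> Iterator[Sample]:
        # Yield one fake sample per time step, with temperature in Kelvin.
        for i in range(self.n_samples):
            yield {"time": float(i), "t2m_kelvin": 273.15 + i}


def kelvin_to_celsius(sample: Sample) -> Sample:
    """A toy transform: replace the Kelvin field with a Celsius field."""
    out = dict(sample)  # copy so the input sample is not mutated
    out["t2m_celsius"] = out.pop("t2m_kelvin") - 273.15
    return out


@dataclass
class ToyTarget:
    """Stands in for a target layout; stores records in memory, not zarr."""
    records: List[Sample] = field(default_factory=list)

    def load(self, sample: Sample) -> None:
        self.records.append(sample)


def run_pipeline(
    source: ToySource,
    transforms: List[Callable[[Sample], Sample]],
    target: ToyTarget,
) -> ToyTarget:
    """Extract each sample, apply the transforms in order, load the result."""
    for sample in source.extract():
        for transform in transforms:
            sample = transform(sample)
        target.load(sample)
    return target


if __name__ == "__main__":
    result = run_pipeline(ToySource(n_samples=3), [kelvin_to_celsius], ToyTarget())
    print(f"loaded {len(result.records)} samples")
```

In this sketch, swapping the target changes how the same transformed samples are laid out, which mirrors the role the base and anemoi layouts play above. How ufs2arco itself configures and runs the pipeline is covered in the rest of the documentation.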
The capabilities and datasets available are illustrated by the schematic below:
Table of Contents
Getting Started
Data Sources
Targets
Examples
Community
References