Quickstart

Important

This page documents the uncoupled model configuration. For information on using the coupled atmosphere-ocean model, see Coupled Emulation.

Install

To install the latest release directly from PyPI, use:

pip install fme

If desired, see the installation page for more information on installing from source or using conda.

Commands

The following commands are available, and can be run with --help for more information:

python3 -m fme.ace.validate_config - Validate a configuration file
python3 -m fme.ace.train - Train a model
python3 -m fme.ace.inference - Run a saved model checkpoint
python3 -m fme.ace.evaluator - Run a saved model checkpoint and compare to target data
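If you prefer to drive these commands from a Python script, a minimal sketch might look like the following. It only builds argument lists for the module names listed above and runs them with subprocess. The helper names (COMMANDS, build_command, build_validate_command, validate_then_run) are hypothetical, and the --config_type values mirror the validation examples on this page.

```python
import subprocess
import sys

# Module paths for the commands listed above (hypothetical mapping name).
COMMANDS = {
    "train": "fme.ace.train",
    "inference": "fme.ace.inference",
    "evaluator": "fme.ace.evaluator",
}


def build_command(task: str, config_path: str) -> list[str]:
    """Return the argv for running `task` with the given YAML config."""
    return [sys.executable, "-m", COMMANDS[task], config_path]


def build_validate_command(task: str, config_path: str) -> list[str]:
    """Return the argv for validating a config of the given type."""
    return [
        sys.executable, "-m", "fme.ace.validate_config",
        config_path, "--config_type", task,
    ]


def validate_then_run(task: str, config_path: str) -> None:
    # check=True raises CalledProcessError if validation fails,
    # so a bad config stops the script before the main command starts.
    subprocess.run(build_validate_command(task, config_path), check=True)
    subprocess.run(build_command(task, config_path), check=True)
```

For example, validate_then_run("inference", "config-inference.yaml") validates the inference config and then runs inference with the same interpreter that runs the script.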

Accessing ACE checkpoints and datasets

We have made multiple versions of ACE publicly available and citable via its Hugging Face collection. This is the recommended way to download ACE checkpoints and datasets, and the collection is updated as new checkpoints become available. For each model version, we generally provide the checkpoint along with the initial conditions, forcing, and training/evaluation data appropriate for that version (as described below).

Checkpoints and datasets can be downloaded from Hugging Face either via the web interface or using the huggingface_hub Python package. Installing the package lets you download checkpoints and datasets from the command line or programmatically, which can be helpful for large data files.
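As a sketch of the programmatic route, the snippet below uses snapshot_download from huggingface_hub to fetch only part of a repository. The repository ID allenai/ACE2-ERA5 is an assumption about where the ACE2-ERA5 files live (confirm the exact ID on the collection page), the training_validation_data/normalization/* pattern assumes the folder layout mentioned in the training section, and the wrapper functions are hypothetical.

```python
from pathlib import Path

# Assumed repository ID; confirm it on the ACE Hugging Face collection page.
ACE2_ERA5_REPO = "allenai/ACE2-ERA5"


def download_kwargs(repo_id: str, patterns: list[str], local_dir: str) -> dict:
    """Collect the keyword arguments passed to huggingface_hub.snapshot_download."""
    return {
        "repo_id": repo_id,
        "allow_patterns": patterns,  # only files matching these globs are fetched
        "local_dir": str(Path(local_dir)),
    }


def download(repo_id: str, patterns: list[str], local_dir: str) -> str:
    """Download matching files and return the local folder path."""
    # Imported lazily so download_kwargs works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(repo_id, patterns, local_dir))
```

For example, download(ACE2_ERA5_REPO, ["training_validation_data/normalization/*"], "ace2-era5") would fetch only the normalization files into ./ace2-era5. Restricting allow_patterns avoids pulling every large data file in the repository; if the files you need live in a dataset repository instead, snapshot_download also accepts repo_type="dataset".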

In addition to Hugging Face, ACE checkpoints and datasets may be accessed through the following channels, though these may not include every checkpoint or dataset:

Zenodo: Selected checkpoints and data subsets are archived and citable via Zenodo. For example, see the ACE-climSST Zenodo repository.
Google Cloud Storage: Some checkpoints and datasets are hosted in a public requester-pays GCS bucket; see Accessing data via Google Cloud Storage for more information.
Globus guest collection: Some datasets are available via this method; see Hugging Face collection for more information.

Running a Checkpoint (Inference)

The minimum requirements for running inference with ACE are:

a model checkpoint
an initial conditions file containing all prognostic variables
a forcing dataset containing all input-only variables

The initial conditions and forcing files may include more variables than the minimum required, but only the required variables will be used. The code will run an ensemble of predictions starting from each time specified in the initial conditions file, or a subset of these times can be specified in the configuration file. The forcing dataset must contain data for the times specified in the initial conditions file, as well as all timesteps required for the prediction period.
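To make the forcing requirement concrete: for an initial time t0 and a run of n forward steps of length Δt, this sketch assumes the forcing must cover t0, t0 + Δt, ..., t0 + nΔt. The code below computes those times for every initial condition and reports any that are missing from the forcing data. The 6-hour step is an assumption for illustration (use your checkpoint's timestep), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def required_forcing_times(initial_times, n_forward_steps, timestep):
    """Times the forcing must cover, assuming each run needs forcing at its
    initial time and at every one of its n forward steps."""
    needed = set()
    for t0 in initial_times:
        for step in range(n_forward_steps + 1):
            needed.add(t0 + step * timestep)
    return sorted(needed)


def missing_forcing_times(initial_times, n_forward_steps, timestep, forcing_times):
    """Required times that do not appear in the forcing dataset."""
    available = set(forcing_times)
    required = required_forcing_times(initial_times, n_forward_steps, timestep)
    return [t for t in required if t not in available]


# Example: two initial conditions, 4 forward steps of an assumed 6 hours each.
step = timedelta(hours=6)
ics = [datetime(1940, 1, 1, 0), datetime(1940, 1, 2, 0)]
forcing = [datetime(1940, 1, 1) + i * step for i in range(8)]  # Jan 1 00:00 to Jan 2 18:00
print(missing_forcing_times(ics, 4, step, forcing))  # → [datetime.datetime(1940, 1, 3, 0, 0)]
```

Here the second run needs one step past the end of the forcing data, which is exactly the kind of gap the requirement above rules out.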

For example, for the ACE2-ERA5 model, the initial conditions and forcing files can be downloaded via the ACE2-ERA5 Hugging Face page.

Save a config-inference.yaml file based on the example config with updated initial conditions and forcing paths for the downloaded data. Specifically, initial_condition.path should be the local initial conditions file, and forcing_loader.dataset.data_path should be the local directory containing the forcing data files.
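If you prefer to set these two paths programmatically, one option is to load the example config into a dictionary (for example with PyYAML's yaml.safe_load) and update the nested keys named above. The sketch below sticks to the standard library. The helper set_nested, the stand-in config, and the local paths are hypothetical, and it assumes the example config already contains the initial_condition and forcing_loader.dataset sections.

```python
import copy


def set_nested(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of `config` with the value at `dotted_key` replaced.

    Raises KeyError if an intermediate section is missing, so a typo in the
    key is not silently turned into a new, unused section.
    """
    updated = copy.deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = updated
    for key in parents:
        node = node[key]
    node[leaf] = value
    return updated


# Hypothetical stand-in for the loaded example config.
example = {
    "initial_condition": {"path": "/example/ic.nc"},
    "forcing_loader": {"dataset": {"data_path": "/example/forcing"}},
}
config = set_nested(example, "initial_condition.path", "/data/ace2-era5/initial_conditions.nc")
config = set_nested(config, "forcing_loader.dataset.data_path", "/data/ace2-era5/forcing_data")
```

The updated dictionary can then be written out as config-inference.yaml, for example with yaml.safe_dump(config, f).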

Then in the fme conda environment, run inference with:

python -m fme.ace.inference config-inference.yaml

See the Inference Config section for more information on the configuration.

If you run into configuration issues, you can validate your configuration with

python -m fme.ace.validate_config config-inference.yaml --config_type inference

Tip

While inference can be performed without a GPU, it may be very slow. If running on a Mac, set the environment variable FME_USE_MPS=1 (for example, with export FME_USE_MPS=1) to use the Metal Performance Shaders (MPS) framework for GPU acceleration. Note that this backend is not fully featured: it may not work with all inference features, and it may not work for training. Using the latest version of torch is recommended when using MPS.
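The variable can also be set for a single run from Python. The sketch below copies the current environment, adds FME_USE_MPS=1, and checks torch.backends.mps.is_available() when torch is importable. The helper names are hypothetical, and the launch line is left as a comment because it requires fme to be installed.

```python
import os


def mps_env(base=None) -> dict:
    """Return a copy of the environment (or `base`) with FME_USE_MPS enabled."""
    env = dict(os.environ if base is None else base)
    env["FME_USE_MPS"] = "1"
    return env


def mps_available() -> bool:
    """True if torch is installed and reports a usable MPS device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.backends.mps.is_available()


# To launch inference with this environment (requires fme):
# subprocess.run([sys.executable, "-m", "fme.ace.inference", "config-inference.yaml"],
#                env=mps_env(), check=True)
```

Passing env= to the child process keeps the setting scoped to that run instead of changing your shell session.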

Evaluating a Checkpoint

When target data is available, it is possible to evaluate the model using the fme.ace.evaluator module. This requires a dataset, referred to as target data or alternatively training and validation data, that includes all input and output variables for the prediction period.
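Before a long evaluation run, it can help to confirm that the target dataset contains every variable the model reads or writes. A minimal sketch with plain sets follows; the variable names are hypothetical placeholders, and in practice you would take the available names from the dataset itself (for example, from the data variables of an opened netCDF file).

```python
def missing_variables(available, in_names, out_names):
    """Variables the model uses (as inputs or outputs) that the dataset lacks."""
    required = set(in_names) | set(out_names)
    return sorted(required - set(available))


# Hypothetical variable lists for illustration only.
in_names = ["surface_pressure", "air_temperature", "solar_flux"]
out_names = ["surface_pressure", "air_temperature", "precipitation"]
dataset_vars = ["surface_pressure", "air_temperature", "solar_flux"]

print(missing_variables(dataset_vars, in_names, out_names))  # → ['precipitation']
```

Taking the union of inputs and outputs matters because target data must include output-only variables too, which a forcing dataset for inference would not need.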

For example, for the ACE2-ERA5 model, a 1-year (1940) subsample of the target data is available via the ACE2-ERA5 Hugging Face page.

Alternatively, the entire 1940-2022 dataset is available via the public requester-pays Google Cloud Storage bucket; see Accessing data via Google Cloud Storage for more information. Note that the dataset is large, so downloading it may take a long time and may incur significant transfer costs.

Save a config-evaluator.yaml file based on the example config with updated paths for the downloaded data. Then in the fme conda environment, run evaluation with:

python -m fme.ace.evaluator config-evaluator.yaml

If you run into configuration issues, you can validate your configuration with

python -m fme.ace.validate_config config-evaluator.yaml --config_type evaluator

Training a Model

Like evaluation, training a model requires datasets with all input and output variables.

For the ACE2-ERA5 model, a 1-year (1940) subsample of the target dataset is available via the ACE2-ERA5 Hugging Face page.

You will also need scaling files (centering.nc, scaling-full-field.nc, and scaling-residual.nc in the example training config) containing scalar values for the mean and standard deviation of each input and output variable. These files are available on the ACE2-ERA5 Hugging Face page under training_validation_data/normalization. They can also be generated using the script at scripts/data_process/get_stats.py.
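To illustrate what these three files hold, the sketch below computes, for a single variable, a scalar mean (centering), a standard deviation of the full field, and a standard deviation of the one-step change along time (residual scaling). This is an assumption about how the statistics are defined, not a reimplementation of scripts/data_process/get_stats.py: it is unweighted and works on an in-memory array, whereas the real script may, for example, apply area weighting or reduce over dimensions differently.

```python
import numpy as np


def scalar_stats(field: np.ndarray, time_axis: int = 0) -> dict:
    """Scalar statistics for one variable shaped e.g. (time, lat, lon).

    centering:          mean over all points
    scaling_full_field: standard deviation over all points
    scaling_residual:   standard deviation of the one-step change along time
    """
    increments = np.diff(field, axis=time_axis)
    return {
        "centering": float(field.mean()),
        "scaling_full_field": float(field.std()),
        "scaling_residual": float(increments.std()),
    }


# Toy field: a linear trend in time plus a fixed 2x2 spatial pattern.
t = np.arange(4, dtype=float).reshape(4, 1, 1)
pattern = np.array([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 2, 2)
field = 10.0 + 2.0 * t + pattern
stats = scalar_stats(field)
```

In this toy case the field changes by exactly 2.0 at every step, so the residual scale is zero while the full-field scale is not. Separating the two reflects the idea that step-to-step changes are usually much smaller than the field's overall spread, so they benefit from their own scaling.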

Save a config-train.yaml file based on the example config with updated paths for the downloaded data. Then in the fme conda environment, run training with:

torchrun --nproc_per_node RANK_COUNT -m fme.ace.train config-train.yaml

where RANK_COUNT is the number of processes to launch, which is typically the number of available GPUs. If running on a single GPU, you can omit torchrun and run python -m fme.ace.train config-train.yaml instead.
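The choice between torchrun and python -m can be scripted. The sketch below builds the training command from a GPU count using only the torchrun flags shown above; the helper name is hypothetical. When torch is installed, torch.cuda.device_count() is one way to obtain the GPU count.

```python
import sys


def train_command(config_path: str, n_gpus: int) -> list[str]:
    """Build the training command for the given number of GPUs."""
    if n_gpus > 1:
        # One process per GPU, launched by torchrun.
        return ["torchrun", "--nproc_per_node", str(n_gpus),
                "-m", "fme.ace.train", config_path]
    # Single GPU (or CPU): plain python -m is enough.
    return [sys.executable, "-m", "fme.ace.train", config_path]
```

For example, train_command("config-train.yaml", 4) yields the torchrun form with RANK_COUNT set to 4, and the result can be passed to subprocess.run.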

If you run into configuration issues, you can validate your configuration with

python -m fme.ace.validate_config config-train.yaml --config_type train

Wandb Integration

For the optional Weights and Biases (wandb) integration, you will need to set the API key:

export WANDB_API_KEY=wandb-api-key

where wandb-api-key is created and retrieved from the “API Keys” section of the Wandb settings page. See also fme.ace.LoggingConfig for configuration of logging to wandb.
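A small pre-flight check can catch a missing key before a long training run starts. The sketch below uses only the standard library; the function name is hypothetical, and whether logging to wandb is enabled at all is still controlled by the logging configuration, not by this check.

```python
import os


def require_wandb_key(env=None) -> str:
    """Return the wandb API key, or raise with a helpful message if it is unset."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set; create a key under 'API Keys' in your "
            "wandb settings and export it before training."
        )
    return key
```

Calling require_wandb_key() at the top of a launch script fails fast with a clear message instead of partway through a run.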