API Reference¶
fme¶
- class fme.Packer(names)[source]¶
Responsible for packing tensors into a single tensor.
- class fme.StandardNormalizer(means, stds, fill_nans_on_normalize=False, fill_nans_on_denormalize=False)[source]¶
Responsible for normalizing tensors.
- Parameters:
- fme.get_device()[source]¶
If CUDA is available, return a CUDA device. Otherwise, return a CPU device unless FME_USE_MPS is set, in which case return an MPS device if available.
- Return type:
device
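The selection order described above can be sketched without importing torch. This is only an illustration, not the library's implementation: the function name `select_device_name`, its boolean arguments, and the choice to treat any presence of `FME_USE_MPS` in the environment as "set" are assumptions made for the example.

```python
import os


def select_device_name(cuda_available, mps_available, environ=None):
    """Return "cuda", "mps" or "cpu" following the documented priority.

    Hypothetical helper: availability is passed in as booleans so the
    sketch runs without torch installed.
    """
    environ = os.environ if environ is None else environ
    if cuda_available:
        return "cuda"
    # Assumption: FME_USE_MPS counts as "set" whenever the variable exists.
    if "FME_USE_MPS" in environ and mps_available:
        return "mps"
    return "cpu"


print(select_device_name(False, True, {"FME_USE_MPS": "1"}))
```

CUDA always wins when present, and MPS is strictly opt-in, so a machine with an MPS backend still runs on CPU unless the variable is set.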
- fme.gradient_magnitude(tensor, dim=())[source]¶
Compute the magnitude of gradient across the specified dimensions.
- fme.gradient_magnitude_percent_diff(truth, predicted, weights=None, dim=())[source]¶
Compute the percent difference of the weighted mean gradient magnitude across the specified dimensions.
- fme.rmse_of_time_mean(truth, predicted, weights=None, time_dim=0, spatial_dims=(-2, -1))[source]¶
Compute the RMSE of the time-average given truth and predicted.
- Parameters:
- Return type:
Tensor
- Returns:
The RMSE between the time-mean of the two input tensors. The time and spatial dims are reduced.
- fme.root_mean_squared_error(truth, predicted, weights=None, dim=())[source]¶
Compute a weighted root mean square error between truth and predicted.
Namely:
sqrt((weights * ((xhat - x) ** 2)).mean(dim))
where x is truth and xhat is predicted.
- Parameters:
truth (Tensor) – torch.Tensor whose last dimensions are to be weighted.
predicted (Tensor) – torch.Tensor whose last dimensions are to be weighted.
weights (Optional[Tensor], default: None) – torch.Tensor to apply to the squared bias.
dim (int | Iterable[int], default: ()) – Dimensions to average over.
- Return type:
Tensor
- Returns:
A tensor of weighted RMSEs.
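The formula above can be checked by hand on flat Python lists. The sketch below is not the library function: `weighted_rmse` is a hypothetical stand-in that reduces over a single dimension and defaults to unit weights.

```python
import math


def weighted_rmse(truth, predicted, weights=None):
    """sqrt(mean(w * (xhat - x) ** 2)) over flat sequences."""
    if weights is None:
        weights = [1.0] * len(truth)
    squared = [w * (p - t) ** 2 for t, p, w in zip(truth, predicted, weights)]
    return math.sqrt(sum(squared) / len(squared))


truth = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 5.0, 4.0]
print(weighted_rmse(truth, predicted))
print(weighted_rmse(truth, predicted, [0.0, 0.0, 2.0, 2.0]))
```

Note that the weights multiply the squared error before a plain mean, so weights that do not average to one rescale the result.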
- fme.spherical_area_weights(lats, num_lon)[source]¶
Computes area weights given the latitudes of a regular lat-lon grid.
- Parameters:
lats (ndarray | Tensor) – tensor of shape (…, num_lat,) with the latitudes of the cell centers.
num_lon (int) – Number of longitude points.
- Return type:
Tensor
- Returns:
a torch.tensor of shape (num_lat, num_lon).
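On a regular lat-lon grid, cell area is proportional to the cosine of latitude. The sketch below builds a (num_lat, num_lon) weight grid on that basis using nested lists. The helper name `area_weights` and the normalization (weights summing to one) are assumptions for illustration; the library's normalization may differ.

```python
import math


def area_weights(lats_deg, num_lon):
    """Cosine-latitude weights, one row per latitude, normalized to sum to 1."""
    cos_lat = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(cos_lat) * num_lon  # assumption: normalize over the whole grid
    return [[c / total for _ in range(num_lon)] for c in cos_lat]


weights = area_weights([-60.0, 0.0, 60.0], num_lon=4)
print(len(weights), len(weights[0]))
```

Every cell in a latitude row gets the same weight, and equatorial rows are weighted more heavily than polar ones.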
- fme.time_and_global_mean_bias(truth, predicted, weights=None, time_dim=0, spatial_dims=(-2, -1))[source]¶
Compute the global- and time-mean bias given truth and predicted.
- Parameters:
- Return type:
Tensor
- Returns:
The global- and time-mean bias between the predicted and truth tensors. The time and spatial dims are reduced.
- fme.weighted_mean(tensor, weights=None, dim=(), keepdim=False)[source]¶
Computes the weighted mean across the specified list of dimensions.
- Parameters:
- Return type:
Tensor
- Returns:
a tensor of the weighted mean averaged over the specified dimensions dim.
- fme.weighted_mean_bias(truth, predicted, weights=None, dim=())[source]¶
Computes the mean bias across the specified list of dimensions assuming that the weights are applied to the last dimensions, e.g. the spatial dimensions.
- Parameters:
- Return type:
Tensor
- Returns:
a tensor of the mean biases averaged over the specified dimensions dim.
- fme.weighted_mean_gradient_magnitude(tensor, weights=None, dim=())[source]¶
Compute weighted mean of gradient magnitude across the specified dimensions.
- fme.weighted_std(tensor, weights=None, dim=())[source]¶
Computes the weighted standard deviation across the specified list of dimensions.
Computed by first computing the weighted variance, then taking the square root.
weighted_std = weighted_mean((tensor - weighted_mean(tensor)) ** 2) ** 0.5
- Parameters:
- Return type:
Tensor
- Returns:
a tensor of the weighted standard deviation over the specified dimensions dim.
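The two-step computation (weighted variance, then square root) can be sketched on flat lists. The helpers below are hypothetical stand-ins, and they assume the weighted mean divides by the sum of the weights.

```python
import math


def weighted_mean(values, weights):
    """Assumed convention: sum(w * v) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_std(values, weights):
    """Square root of the weighted mean of squared deviations."""
    mean = weighted_mean(values, weights)
    variance = weighted_mean([(v - mean) ** 2 for v in values], weights)
    return math.sqrt(variance)


print(weighted_std([1.0, 3.0], [1.0, 1.0]))
print(weighted_std([1.0, 3.0], [3.0, 1.0]))
```

With equal weights this reduces to the population standard deviation. Shifting weight toward one value pulls the mean toward it and shrinks the spread.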
fme.ace¶
- class fme.ace.AtmosphereCorrectorConfig(conserve_dry_air=False, zero_global_mean_moisture_advection=False, moisture_budget_correction=None, force_positive_names=<factory>, total_energy_budget_correction=None)[source]¶
Configuration for the post-step state corrector.
conserve_dry_air enforces the constraint that:
\[global\_dry\_air = global\_mean(ps - sum_k((ak\_diff + bk\_diff \* ps) \* wat_k))\]
in the generated data is equal to its value in the input data. This is done by adding a globally-constant correction to the surface pressure in each column. As per-mass values such as mixing ratios of water are unchanged, this can cause changes in total water or energy. Note all global means here are area-weighted.
zero_global_mean_moisture_advection enforces the constraint that:
\[global\_mean(tendency\_of\_total\_water\_path\_due\_to\_advection) = 0\]
in the generated data. This is done by adding a globally-constant correction to the moisture advection tendency in each column.
moisture_budget_correction enforces closure of the moisture budget equation:
\[\begin{split}tendency\_of\_total\_water\_path = (evaporation\_rate - precipitation\_rate \\ + tendency\_of\_total\_water\_path\_due\_to\_advection)\end{split}\]
in the generated data, where tendency_of_total_water_path is the difference between the total water path at the current timestep and the previous timestep divided by the time difference. This is done by modifying the precipitation, evaporation, and/or moisture advection tendency fields as described in the moisture_budget_correction attribute. When advection tendency is modified, this budget equation is enforced in each column, while when only precipitation or evaporation are modified, only the global mean of the budget equation is enforced.
When enforcing moisture budget closure, we assume the global mean moisture advection is zero. Therefore zero_global_mean_moisture_advection must be True if using a moisture_budget_correction option other than None.
- Parameters:
conserve_dry_air (bool, default: False) – If True, force the generated data to conserve dry air by subtracting a constant offset from the surface pressure of each column. This can cause changes in per-mass values such as total water or energy.
zero_global_mean_moisture_advection (bool, default: False) – If True, force the generated data to have zero global mean moisture advection by subtracting a constant offset from the moisture advection tendency of each column.
moisture_budget_correction (Optional[Literal['precipitation', 'evaporation', 'advection_and_precipitation', 'advection_and_evaporation']], default: None) – If not “None”, force the generated data to conserve global or column-local moisture by modifying budget fields. Options are:
- precipitation: multiply precipitation by a scale factor to close the global moisture budget.
- evaporation: multiply evaporation by a scale factor to close the global moisture budget.
- advection_and_precipitation: after applying the “precipitation” global-mean correction above, recompute the column-integrated advective tendency as the budget residual, ensuring column budget closure.
- advection_and_evaporation: after applying the “evaporation” global-mean correction above, recompute the column-integrated advective tendency as the budget residual, ensuring column budget closure.
force_positive_names (list[str], default: <factory>) – Names of fields that should be forced to be greater than or equal to zero. This is useful for fields like precipitation.
total_energy_budget_correction (Optional[EnergyBudgetConfig], default: None) – If not None, force the generated data to conserve an idealized version of total energy using the provided configuration.
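For the “precipitation” option, the scale factor follows from the global mean of the budget equation once the global-mean advection is zero: global_mean(tendency) = global_mean(evaporation) - s * global_mean(precipitation). The sketch below solves for s on flat lists of columns. It is an illustration derived from that equation rather than the library's code; the function names and the sample values are hypothetical.

```python
def area_weighted_mean(values, area):
    """Area-weighted global mean over a flat list of columns."""
    return sum(a * v for v, a in zip(values, area)) / sum(area)


def precipitation_scale_factor(twp_tendency, evaporation, precipitation, area):
    """Scale factor s that closes the global-mean moisture budget.

    Assumes the global-mean advective tendency has already been set to zero.
    """
    mean_tendency = area_weighted_mean(twp_tendency, area)
    mean_evap = area_weighted_mean(evaporation, area)
    mean_precip = area_weighted_mean(precipitation, area)
    return (mean_evap - mean_tendency) / mean_precip


area = [1.0, 3.0]
tendency = [0.2, -0.2]
evaporation = [1.0, 2.0]
precipitation = [2.0, 1.0]
s = precipitation_scale_factor(tendency, evaporation, precipitation, area)
scaled = [s * p for p in precipitation]
residual = (
    area_weighted_mean(evaporation, area)
    - area_weighted_mean(scaled, area)
    - area_weighted_mean(tendency, area)
)
print(s, residual)
```

Only the global mean closes here. Individual columns keep a residual, which is what the advection_and_* options remove by recomputing the advective tendency per column.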
- class fme.ace.AugmentationConfig(rotate_probability=0.0, additional_directional_names=<factory>)[source]¶
Configuration for data augmentation.
- rotate_probability¶
The probability of rotating the sphere by 180 degrees, as a value between 0.0 and 1.0.
- additional_directional_names¶
Names of variables whose sign is flipped when the poles are reversed. By default this includes known directional names as stored in RotateModifier.FLIP_NAMES.
- class fme.ace.CappedGELUConfig(cap_value=10, enable_nhwc=False, enable_healpixpad=False)[source]¶
Configuration for the CappedGELU activation function.
- Parameters:
- class fme.ace.ConcatDatasetConfig(concat, strict=True)[source]¶
Configuration for concatenating multiple datasets.
- Parameters:
concat (Sequence[XarrayDataConfig]) – List of XarrayDataConfig objects to concatenate.
strict (bool, default: True) – Whether to enforce that the datasets to be concatenated have the same dimensions and coordinates.
- class fme.ace.ConstantConfig(amplitude=1.0)[source]¶
Configuration for a constant perturbation.
- Parameters:
amplitude (float) –
- class fme.ace.ConvBlockConfig(in_channels=3, out_channels=1, kernel_size=3, dilation=1, n_layers=1, stride=2, upscale_factor=4, latent_channels=None, upsampling=None, activation=None, enable_nhwc=False, enable_healpixpad=False, block_type='BasicConvBlock')[source]¶
Configuration for the convolutional block.
- Parameters:
in_channels (int, default: 3) – Number of input channels, default is 3.
out_channels (int, default: 1) – Number of output channels, default is 1.
kernel_size (int, default: 3) – Size of the kernel, default is 3.
dilation (int, default: 1) – Dilation rate, default is 1.
n_layers (int, default: 1) – Number of layers, default is 1.
upsampling (Optional[UpsamplingBlockConfig], default: None) – Upsampling configuration for TransposedConvUpsample, default is None.
upscale_factor (int, default: 4) – Upscale factor for ConvNeXtBlock and SymmetricConvNeXtBlock, default is 4.
latent_channels (Optional[int], default: None) – Number of latent channels, default is None.
activation (Optional[CappedGELUConfig], default: None) – Activation configuration, default is None.
enable_nhwc (bool, default: False) – Flag to enable NHWC data format, default is False.
enable_healpixpad (bool, default: False) – Flag to enable HEALPix padding, default is False.
block_type (Literal['BasicConvBlock', 'ConvNeXtBlock', 'SymmetricConvNeXtBlock', 'ConvThenUpsample', 'TransposedConvUpsample'], default: 'BasicConvBlock') – Type of block, default is “BasicConvBlock”.
stride (int) –
- class fme.ace.CopyWeightsConfig(include=<factory>, exclude=<factory>)[source]¶
Configuration for copying weights from a base model to a target model.
Used during training to overwrite weights after every batch of data, to have the effect of “freezing” the overwritten weights. When the target parameters have longer dimensions than the base model, only the initial slice is overwritten.
This makes it possible to freeze only the subset of each weight that comes from a smaller base weight. It is less efficient than true parameter freezing, but true parameter freezing is all-or-nothing for each parameter.
All parameters must be covered by either the include or exclude list, but not both.
- Parameters:
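The "initial slice" behaviour described above can be pictured with one-dimensional weights. The sketch below uses plain lists and a hypothetical helper, `overwrite_initial_slice`; real parameters are multi-dimensional tensors, and the simulated optimizer step is made up for the example.

```python
def overwrite_initial_slice(target, base):
    """Copy base into the leading entries of target, leaving the rest untouched."""
    if len(base) > len(target):
        raise ValueError("base must not be longer than target")
    target[: len(base)] = base
    return target


base_weights = [1.0, 2.0, 3.0]
target_weights = [0.0, 0.0, 0.0, 0.5, 0.5]

for _ in range(2):
    # Stand-in for an optimizer step that changes every entry.
    target_weights = [w + 0.1 for w in target_weights]
    # Re-applying the copy after each batch undoes the update on the slice.
    overwrite_initial_slice(target_weights, base_weights)

print(target_weights)
```

The leading entries stay pinned to the base values while the trailing entries continue to train, which is the partial freezing that whole-parameter freezing cannot express.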
- class fme.ace.CorrectorSelector(type, config)[source]¶
A dataclass containing all the information needed to build a CorrectorConfigProtocol, including the type of the CorrectorConfigProtocol and the data needed to build it.
This is helpful as CorrectorSelector can be serialized and deserialized without any additional information, whereas to load a CorrectorConfigProtocol you would need to know the type of the CorrectorConfigProtocol being loaded.
It is also convenient because CorrectorSelector is a single class that can be used to represent any CorrectorConfigProtocol, whereas CorrectorConfigProtocol is a protocol that can be implemented by many different classes.
- Parameters:
- class fme.ace.DataLoaderConfig(dataset, batch_size, num_data_workers=0, prefetch_factor=None, augmentation=<factory>, sample_with_replacement=None, time_buffer=0)[source]¶
Configuration for the data loader.
Note: Setting time_buffer to a value greater than 0 results in pre-loading samples of length time_buffer + n_timesteps_required, where n_timesteps_required is the number of timesteps required for training the model (initial condition(s) plus forward step(s)). These pre-loaded samples become a window from which samples of the required length are drawn without replacement. The windows overlap by an amount such that no samples are skipped, with the exception of the last window, which is dropped if incomplete. This is useful for improving data loading throughput and reducing the number of reads. There must be enough pre-loaded samples in the dataset to produce at least one batch at the configured batch size. Independent data will be seen every time_buffer + 1 batches, i.e., this is the number of samples in each pre-loaded window.
- Parameters:
dataset (ConcatDatasetConfig | MergeDatasetConfig | XarrayDataConfig | Sequence[XarrayDataConfig]) – Could be a single dataset configuration, or a sequence of datasets to be concatenated using the keyword concat, or datasets from different sources to be merged using the keyword merge. For backwards compatibility, it can also be a sequence of datasets, which will be concatenated. During merge, if multiple datasets contain the same data variable, the version from the first source is loaded and other sources are ignored.
batch_size (int) – Number of samples per batch.
num_data_workers (int, default: 0) – Number of parallel workers to use for data loading.
prefetch_factor (Optional[int], default: None) – How many batches a single data worker will attempt to hold in host memory at a given time.
augmentation (AugmentationConfig, default: <factory>) – Configuration for data augmentation.
sample_with_replacement (Optional[int], default: None) – If provided, the dataset will be sampled randomly with replacement to the given size each period, instead of retrieving each sample once (either shuffled or not).
time_buffer (int, default: 0) – How many more continuous timesteps to load in memory than the required number of timesteps for a single batch. Setting this to greater than 0 should improve data loading performance, however, it also decreases the independence of subsequent batches if shuffled batches are desired.
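The window arithmetic in the note on time_buffer can be worked through with integers. The sketch below is one reading of that note, not the loader's code: `preloaded_windows` is a hypothetical helper that lists each window's timestep range and the sample start indices drawn from it.

```python
def preloaded_windows(n_timesteps, n_timesteps_required, time_buffer):
    """Return (start, stop, sample_starts) for each complete pre-loaded window."""
    window_length = time_buffer + n_timesteps_required
    samples_per_window = time_buffer + 1
    windows = []
    start = 0
    # An incomplete final window is dropped, as described in the note.
    while start + window_length <= n_timesteps:
        sample_starts = list(range(start, start + samples_per_window))
        windows.append((start, start + window_length, sample_starts))
        # Advancing by the number of samples makes consecutive windows
        # overlap by n_timesteps_required - 1 timesteps, so none are skipped.
        start += samples_per_window
    return windows


for window in preloaded_windows(n_timesteps=20, n_timesteps_required=3, time_buffer=4):
    print(window)
```

With 20 timesteps, 3 required timesteps and a buffer of 4, each window holds 7 timesteps and yields 5 samples. The trailing samples that would need a fourth, incomplete window are not produced.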
- class fme.ace.DataWriterConfig(log_extended_video_netcdfs=False, save_prediction_files=True, save_monthly_files=True, names=None, save_histogram_files=False, time_coarsen=None)[source]¶
Configuration for inference data writers.
- Parameters:
log_extended_video_netcdfs (bool, default: False) – Whether to enable writing of netCDF files containing video metrics.
save_prediction_files (bool, default: True) – Whether to enable writing of netCDF files containing the predictions and target values.
save_monthly_files (bool, default: True) – Whether to enable writing of netCDF files containing the monthly predictions and target values.
names (Optional[Sequence[str]], default: None) – Names of variables to save in the prediction, histogram, and monthly netCDF files.
save_histogram_files (bool, default: False) – Enable writing of netCDF files containing histograms.
time_coarsen (Optional[TimeCoarsenConfig], default: None) – Configuration for time coarsening of written outputs.
- class fme.ace.DownsamplingBlockConfig(block_type, pooling=2, enable_nhwc=False, enable_healpixpad=False)[source]¶
Configuration for the downsampling block. Generally, either a pooling block or a striding conv block.
- Parameters:
block_type (Literal['MaxPool', 'AvgPool']) – Type of pooling block, either “MaxPool” or “AvgPool”.
pooling (int, default: 2) – Pooling size.
enable_nhwc (bool, default: False) – Flag to enable NHWC data format, default is False.
enable_healpixpad (bool, default: False) – Flag to enable HEALPix padding, default is False.
- class fme.ace.EMAConfig(decay=0.9999)[source]¶
Configuration for exponential moving average of model weights.
- Parameters:
decay (float, default: 0.9999) – Decay rate for the moving average.
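As a reminder of what the decay rate controls, the textbook exponential moving average update is ema = decay * ema + (1 - decay) * weight. The sketch below applies it to a list of scalars. Whether the library uses exactly this form (for example, without a warm-up schedule) is an assumption, and `ema_update` is a hypothetical name.

```python
def ema_update(ema_weights, model_weights, decay=0.9999):
    """One textbook EMA step applied element-wise."""
    return [
        decay * ema + (1.0 - decay) * weight
        for ema, weight in zip(ema_weights, model_weights)
    ]


ema = [0.0, 0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 2.0], decay=0.9)
print(ema)
```

A decay close to 1, such as the default 0.9999, makes the averaged weights respond very slowly to new model weights.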
- class fme.ace.ExistingStepperConfig(checkpoint_path)[source]¶
Configuration for an existing stepper. This allows loading a serialized stepper from a checkpoint without loading its configuration of the training and optimization schedule, i.e., this allows for specifying a new schedule in fine-tuning. Not used for training resumption.
- Parameters:
checkpoint_path (str) – The path to the serialized checkpoint; should be different than the experiment output directory.
- class fme.ace.FillNaNsConfig(method='constant', value=0.0)[source]¶
Configuration for filling NaNs with a constant value or by another method.
- class fme.ace.ForcingDataLoaderConfig(dataset, num_data_workers=0, perturbations=None, persistence_names=None)[source]¶
Configuration for the forcing data.
- Parameters:
dataset (XarrayDataConfig | MergeNoConcatDatasetConfig) – Configuration to define the dataset.
num_data_workers (int, default: 0) – Number of parallel workers to use for data loading.
perturbations (Optional[SSTPerturbation], default: None) – Configuration for SST perturbations used in forcing data.
persistence_names (Optional[Sequence[str]], default: None) – Names of variables for which all returned values will be the same as the initial condition. When evaluating initial condition predictability, set this to forcing variables that should not be updated during inference (e.g. surface temperature).
- class fme.ace.FrozenParameterConfig(include=<factory>, exclude=<factory>)[source]¶
Configuration for freezing parameters in a model.
Parameter names can include wildcards, e.g. “encoder.*” will select all parameters in the encoder, while “encoder.*.bias” will select all bias parameters in the encoder. All parameters must be specified in either the include or exclude list, or an exception will be raised.
An exception is raised if a parameter is included by both lists.
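The include/exclude rules above can be exercised with the standard library's shell-style matching. The sketch below is not the library's matcher: it assumes `fnmatchcase` semantics for the wildcards and assumes that the include list names the parameters to freeze. The function and parameter names are hypothetical.

```python
from fnmatch import fnmatchcase


def split_parameters(names, include, exclude):
    """Partition names into (frozen, trainable), enforcing the coverage rules."""
    frozen, trainable = [], []
    for name in names:
        in_include = any(fnmatchcase(name, pattern) for pattern in include)
        in_exclude = any(fnmatchcase(name, pattern) for pattern in exclude)
        if in_include and in_exclude:
            raise ValueError(f"{name} is matched by both include and exclude")
        if not (in_include or in_exclude):
            raise ValueError(f"{name} is matched by neither include nor exclude")
        # Assumption: "include" means "freeze this parameter".
        (frozen if in_include else trainable).append(name)
    return frozen, trainable


names = ["encoder.conv.weight", "encoder.conv.bias", "decoder.out.weight"]
frozen, trainable = split_parameters(
    names,
    include=["encoder.*.bias"],
    exclude=["encoder.*.weight", "decoder.*"],
)
print(frozen, trainable)
```

Requiring every parameter to be matched exactly once means a renamed or newly added layer fails loudly instead of silently training or freezing.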
- class fme.ace.GreensFunctionConfig(amplitude=1.0, lat_center=0.0, lon_center=0.0, lat_width=10.0, lon_width=10.0)[source]¶
Configuration for a single sinusoidal patch of a Green’s function perturbation. See equation 1 in Bloch‐Johnson, J., et al. (2024).
- Parameters:
amplitude (float, default: 1.0) – The amplitude of the perturbation, maximum is reached at (lat_center, lon_center).
lat_center (float, default: 0.0) – The latitude at the center of the patch in degrees.
lon_center (float, default: 0.0) – The longitude at the center of the patch in degrees.
lat_width (float, default: 10.0) – Latitudinal width of the patch in degrees.
lon_width (float, default: 10.0) – Longitudinal width of the patch in degrees.
- class fme.ace.GriddedOperations[source]¶
- classmethod from_state(state)[source]¶
Given a dictionary with a “type” key and a “state” key, return the GriddedOperations it describes.
The “type” key should be the name of a subclass of GriddedOperations, and the “state” key should be a dictionary specific to that subclass.
- class fme.ace.HEALPixRecUNetBuilder(encoder, decoder, presteps=1, input_time_size=0, output_time_size=0, delta_time='6h', reset_cycle='24h', n_constants=2, decoder_input_channels=1, prognostic_variables=7, enable_nhwc=False, enable_healpixpad=False)[source]¶
Configuration for the HEALPixRecUNet architecture used in DLWP.
- Parameters:
presteps (int, default: 1) – Number of pre-steps, by default 1.
input_time_size (int, default: 0) – Input time dimension, by default 0.
output_time_size (int, default: 0) – Output time dimension, by default 0.
delta_time (str, default: '6h') – Delta time interval, by default “6h”.
reset_cycle (str, default: '24h') – Reset cycle interval, by default “24h”.
input_channels – Number of input channels, by default 8.
output_channels – Number of output channels, by default 8.
n_constants (int, default: 2) – Number of constant input channels, by default 2.
decoder_input_channels (int, default: 1) – Number of input channels for the decoder, by default 1.
enable_nhwc (bool, default: False) – Flag to enable NHWC data format, by default False.
enable_healpixpad (bool, default: False) – Flag to enable HEALPix padding, by default False.
encoder (UNetEncoderConfig) –
decoder (UNetDecoderConfig) –
prognostic_variables (int) –
- class fme.ace.InferenceAggregatorConfig(time_mean_reference_data=None, log_global_mean_time_series=True)[source]¶
Configuration for inference aggregator.
- class fme.ace.InferenceConfig(experiment_dir, n_forward_steps, checkpoint_path, logging, initial_condition, forcing_loader, forward_steps_in_memory=10, data_writer=<factory>, aggregator=<factory>, stepper_override=None, allow_incompatible_dataset=False)[source]¶
Configuration for running inference.
- Parameters:
experiment_dir (str) – Directory to save results to.
n_forward_steps (int) – Number of steps to run the model forward for.
checkpoint_path (str) – Path to stepper checkpoint to load.
logging (LoggingConfig) – Configuration for logging.
initial_condition (InitialConditionConfig) – Configuration for initial condition data.
forcing_loader (ForcingDataLoaderConfig) – Configuration for forcing data.
forward_steps_in_memory (int, default: 10) – Number of forward steps to complete in memory at a time.
data_writer (DataWriterConfig, default: <factory>) – Configuration for data writers.
aggregator (InferenceAggregatorConfig, default: <factory>) – Configuration for inference aggregator.
stepper_override (Optional[StepperOverrideConfig], default: None) – Configuration for overriding select stepper configuration options at inference time (optional).
allow_incompatible_dataset (bool, default: False) – If True, allow the dataset used for inference to be incompatible with the dataset used for stepper training. This should be used with caution, as it may allow the stepper to make scientifically invalid predictions, but it can allow running inference with incorrectly formatted or missing grid information.
- class fme.ace.InferenceDataLoaderConfig(dataset, start_indices, num_data_workers=0, perturbations=None, persistence_names=None)[source]¶
Configuration for inference data.
This is like the DataLoaderConfig class, but with some additional constraints. During inference, we have only one batch, so the number of samples directly determines the size of that batch.
- Parameters:
dataset (XarrayDataConfig | MergeNoConcatDatasetConfig) – Configuration to define the dataset.
start_indices (InferenceInitialConditionIndices | ExplicitIndices | TimestampList) – Configuration of the indices for initial conditions during inference. This can be a list of timestamps, a list of integer indices, or a slice configuration of the integer indices. Values following the initial condition will still come from the full dataset.
num_data_workers (int, default: 0) – Number of parallel workers to use for data loading.
perturbations (Optional[SSTPerturbation], default: None) – Configuration for SST perturbations.
persistence_names (Optional[Sequence[str]], default: None) – Names of variables for which all returned values will be the same as the initial condition. When evaluating initial condition predictability, set this to forcing variables that should not be updated during inference (e.g. surface temperature).
- class fme.ace.InferenceEvaluatorAggregatorConfig(log_histograms=False, log_video=False, log_extended_video=False, log_zonal_mean_images=True, log_seasonal_means=False, log_global_mean_time_series=True, log_global_mean_norm_time_series=True, monthly_reference_data=None, time_mean_reference_data=None)[source]¶
Configuration for inference evaluator aggregator.
- Parameters:
log_histograms (bool, default: False) – Whether to log histograms of the targets and predictions.
log_video (bool, default: False) – Whether to log videos of the state evolution.
log_extended_video (bool, default: False) – Whether to log wandb videos of the predictions with statistical metrics, only done if log_video is True.
log_zonal_mean_images (bool, default: True) – Whether to log zonal-mean images (hovmollers) with a time dimension.
log_seasonal_means (bool, default: False) – Whether to log seasonal mean metrics and images.
log_global_mean_time_series (bool, default: True) – Whether to log global mean time series metrics.
log_global_mean_norm_time_series (bool, default: True) – Whether to log the normalized global mean time series metrics.
monthly_reference_data (Optional[str], default: None) – Path to monthly reference data to compare against.
time_mean_reference_data (Optional[str], default: None) – Path to reference time means to compare against.
- class fme.ace.InferenceEvaluatorConfig(experiment_dir, n_forward_steps, checkpoint_path, logging, loader, prediction_loader=None, forward_steps_in_memory=1, data_writer=<factory>, aggregator=<factory>, stepper_override=None, allow_incompatible_dataset=False)[source]¶
Configuration for running inference including comparison to reference data.
- Parameters:
experiment_dir (str) – Directory to save results to.
n_forward_steps (int) – Number of steps to run the model forward for.
checkpoint_path (str) – Path to stepper checkpoint to load.
logging (LoggingConfig) – Configuration for logging.
loader (InferenceDataLoaderConfig) – Configuration for data to be used as initial conditions, forcing, and target in inference.
prediction_loader (Optional[InferenceDataLoaderConfig], default: None) – Configuration for prediction data to evaluate. If given, model evaluation will not run, and instead predictions will be evaluated. Model checkpoint will still be used to determine inputs and outputs.
forward_steps_in_memory (int, default: 1) – Number of forward steps to complete in memory at a time, will load one more step for initial condition.
data_writer (DataWriterConfig, default: <factory>) – Configuration for data writers.
aggregator (InferenceEvaluatorAggregatorConfig, default: <factory>) – Configuration for inference evaluator aggregator.
stepper_override (Optional[StepperOverrideConfig], default: None) – Configuration for overriding select stepper configuration options at inference time (optional).
allow_incompatible_dataset (bool, default: False) – If True, allow the forcing dataset used for inference to be incompatible with the dataset used for stepper training. This should be used with caution, as it may allow the stepper to make scientifically invalid predictions, but it can allow running inference with incorrectly formatted or missing grid information.
- class fme.ace.InferenceInitialConditionIndices(n_initial_conditions, first=0, interval=1)[source]¶
Configuration of the indices for initial conditions during inference.
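Judging only by the argument names, the natural reading is a regularly spaced set of integer indices: n_initial_conditions values starting at first and separated by interval. The sketch below spells out that reading. It is an assumption about the semantics, and `initial_condition_indices` is a hypothetical helper, not a method of the class.

```python
def initial_condition_indices(n_initial_conditions, first=0, interval=1):
    """Regularly spaced indices: first, first + interval, ..."""
    return [first + i * interval for i in range(n_initial_conditions)]


print(initial_condition_indices(3, first=2, interval=4))
```

Under this reading, the last index used is first + (n_initial_conditions - 1) * interval, which must fall inside the dataset.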
- class fme.ace.InitialConditionConfig(path, engine='netcdf4', start_indices=None)[source]¶
Configuration for initial conditions.
Note
The data specified under path should contain a time dimension of at least length 1. If multiple times are present in the dataset specified by path, the inference will start an ensemble simulation using each IC along a leading sample dimension. Specific times can be selected from the dataset by using start_indices.
- Parameters:
path (str) – The path to the initial conditions dataset.
engine (Literal['netcdf4', 'h5netcdf', 'zarr'], default: 'netcdf4') – The engine used to open the dataset.
start_indices (UnionType[InferenceInitialConditionIndices, ExplicitIndices, TimestampList, None], default: None) – Optional specification of the subset of initial conditions to use.
- class fme.ace.InlineInferenceConfig(loader, n_forward_steps=2, forward_steps_in_memory=2, epochs=<factory>, aggregator=<factory>)[source]¶
- Parameters:
loader (InferenceDataLoaderConfig) – Configuration for the data loader used during inference.
n_forward_steps (int, default: 2) – Number of forward steps to take.
forward_steps_in_memory (int, default: 2) – Number of forward steps to take before re-reading data from disk.
epochs (Slice, default: <factory>) – Epochs on which to run inference. By default runs inference every epoch.
aggregator (InferenceEvaluatorAggregatorConfig, default: <factory>) – Configuration of inline inference aggregator.
- class fme.ace.LoggingConfig(project='ace', entity='ai2cm', log_to_screen=True, log_to_file=True, log_to_wandb=True, log_format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=20)[source]¶
Configuration for logging.
- Parameters:
project (str, default: 'ace') – Name of the project in Weights & Biases.
entity (str, default: 'ai2cm') – Name of the entity in Weights & Biases.
log_to_screen (bool, default: True) – Whether to log to the screen.
log_to_file (bool, default: True) – Whether to log to a file.
log_to_wandb (bool, default: True) – Whether to log to Weights & Biases.
log_format (str, default: '%(asctime)s - %(name)s - %(levelname)s - %(message)s') – Format of the log messages.
- class fme.ace.MergeDatasetConfig(merge)[source]¶
Configuration for merging multiple datasets.
- Parameters:
merge (Sequence[ConcatDatasetConfig | XarrayDataConfig]) – List of ConcatDatasetConfig or XarrayDataConfig to merge.
- class fme.ace.MergeNoConcatDatasetConfig(merge)[source]¶
Configuration for merging multiple datasets. No concatenation is allowed.
- Parameters:
merge (Sequence[XarrayDataConfig]) – List of XarrayDataConfig to merge.
- class fme.ace.ModuleSelector(type, config)[source]¶
A dataclass containing all the information needed to build a ModuleConfig, including the type of the ModuleConfig and the data needed to build it.
This is helpful as ModuleSelector can be serialized and deserialized without any additional information, whereas to load a ModuleConfig you would need to know the type of the ModuleConfig being loaded.
It is also convenient because ModuleSelector is a single class that can be used to represent any ModuleConfig, whereas ModuleConfig is a protocol that can be implemented by many different classes.
- Parameters:
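The selector pattern described above, a type name plus a plain configuration mapping, can be sketched with a small registry. Everything in the example is hypothetical: `REGISTRY`, `LinearConfig` and `Selector` are illustrative names, not part of the package.

```python
import dataclasses
import json
from typing import Any, Dict, Mapping

# Hypothetical registry mapping a type name to a config class.
REGISTRY: Dict[str, type] = {}


@dataclasses.dataclass
class LinearConfig:
    in_features: int
    out_features: int


REGISTRY["linear"] = LinearConfig


@dataclasses.dataclass
class Selector:
    type: str
    config: Mapping[str, Any]

    def build_config(self):
        # The type name alone tells us which config class to construct.
        return REGISTRY[self.type](**self.config)


selector = Selector(type="linear", config={"in_features": 4, "out_features": 2})
serialized = json.dumps(dataclasses.asdict(selector))
restored = Selector(**json.loads(serialized))
print(restored.build_config())
```

Because the serialized form carries its own type name, the loader never needs to know in advance which concrete configuration class was saved.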
- class fme.ace.MultiCallConfig(forcing_name, forcing_multipliers, output_names)[source]¶
Configuration for doing ‘multi-call’ predictions where an input variable (e.g. CO2) is varied by multiplying by floats and then certain output variables (e.g. radiative heating or fluxes) are predicted.
- Parameters:
forcing_name (str) – Name of the variable to perturb in the forcing data, e.g. “co2”.
forcing_multipliers (dict[str, float]) – Mapping from a label suffix to a multiplier that is applied to the ‘forcing_name’ variable. For example, could be {“_quadrupled_co2”: 4, “_halved_co2”: 0.5}. The suffixes will be appended to the output_names below.
output_names (list[str]) – Names of the variables to predict given perturbed forcing. For example, [“ULWRFtoa”, “USWRFsfc”].
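The naming rule, each suffix appended to each output name, can be shown with the example values from the parameter descriptions. The helper `multi_call_output_names` is a hypothetical illustration, not a method of the class.

```python
def multi_call_output_names(forcing_multipliers, output_names):
    """Map each suffix to the output names it produces."""
    return {
        suffix: [name + suffix for name in output_names]
        for suffix in forcing_multipliers
    }


names = multi_call_output_names(
    {"_quadrupled_co2": 4, "_halved_co2": 0.5},
    ["ULWRFtoa", "USWRFsfc"],
)
for suffix, generated in names.items():
    print(suffix, generated)
```

Two multipliers and two output names therefore yield four additional predicted variables.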
- class fme.ace.NormalizationConfig(global_means_path=None, global_stds_path=None, means=<factory>, stds=<factory>, fill_nans_on_normalize=False, fill_nans_on_denormalize=False)[source]¶
Configuration for normalizing data.
Either global_means_path and global_stds_path or explicit means and stds must be provided.
- Parameters:
global_means_path (UnionType[str, Path, None], default: None) – Path to a netCDF file containing global means.
global_stds_path (UnionType[str, Path, None], default: None) – Path to a netCDF file containing global stds.
means (Mapping[str, float], default: <factory>) – Mapping from variable names to means.
stds (Mapping[str, float], default: <factory>) – Mapping from variable names to stds.
fill_nans_on_normalize (bool, default: False) – Whether to fill NaNs during normalization. If true, on normalization NaNs in the denormalized input become zeros in the normalized output.
fill_nans_on_denormalize (bool, default: False) – Whether to fill NaNs during denormalization. If true, on denormalization NaNs in the normalized input become global means in the denormalized output.
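The two NaN-filling flags are symmetric: a normalized value of zero corresponds to the mean in physical units. The scalar sketch below illustrates that relationship. The function names and the sample mean and standard deviation are hypothetical, and the real normalizer operates on tensors.

```python
import math


def normalize(value, mean, std, fill_nans=False):
    """(value - mean) / std, optionally mapping NaN to 0."""
    normalized = (value - mean) / std
    if fill_nans and math.isnan(normalized):
        return 0.0
    return normalized


def denormalize(value, mean, std, fill_nans=False):
    """value * std + mean, optionally mapping NaN to the mean."""
    if fill_nans and math.isnan(value):
        return mean
    return value * std + mean


nan = float("nan")
print(normalize(nan, mean=280.0, std=10.0, fill_nans=True))
print(denormalize(nan, mean=280.0, std=10.0, fill_nans=True))
```

Filling with zero on normalization and with the mean on denormalization are the same choice expressed in the two spaces.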
- class fme.ace.OceanConfig(surface_temperature_name, ocean_fraction_name, interpolate=False, slab=None)[source]¶
Configuration for determining sea surface temperature from an ocean model.
- Parameters:
surface_temperature_name (str) – Name of the sea surface temperature field.
ocean_fraction_name (str) – Name of the ocean fraction field.
interpolate (bool, default: False) – If True, interpolate between ML-predicted surface temperature and ocean-predicted surface temperature according to ocean_fraction. If False, only use ocean-predicted surface temperature where ocean_fraction>=0.5.
slab (Optional[SlabOceanConfig], default: None) – If provided, use a slab ocean model to predict surface temperature.
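The two behaviours of interpolate can be compared for a single grid cell. The sketch below assumes linear interpolation weighted by ocean fraction, which is the natural reading of the description but is not confirmed by it. `blend_surface_temperature` is a hypothetical helper.

```python
def blend_surface_temperature(ml_ts, ocean_ts, ocean_fraction, interpolate=False):
    """Combine ML-predicted and ocean-predicted surface temperature for one cell."""
    if interpolate:
        # Assumed linear blend weighted by the ocean fraction.
        return ocean_fraction * ocean_ts + (1.0 - ocean_fraction) * ml_ts
    # Without interpolation the ocean value is used only where ocean dominates.
    return ocean_ts if ocean_fraction >= 0.5 else ml_ts


print(blend_surface_temperature(300.0, 290.0, 0.25, interpolate=True))
print(blend_surface_temperature(300.0, 290.0, 0.25, interpolate=False))
```

Interpolation gives a smooth transition across coastlines, while the threshold form keeps each cell entirely ML-predicted or entirely ocean-predicted.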
- class fme.ace.OceanCorrectorConfig(force_positive_names=<factory>, sea_ice_fraction_correction=None, masking=None, ocean_heat_content_correction=False)[source]¶
- class fme.ace.OneStepAggregatorConfig(log_snapshots=True, log_mean_maps=True)[source]¶
Configuration for the validation OneStepAggregator.
- class fme.ace.OptimizationConfig(optimizer_type='Adam', lr=0.001, kwargs=<factory>, enable_automatic_mixed_precision=False, scheduler=<factory>, use_gradient_accumulation=False, checkpoint=<factory>)[source]¶
Configuration for optimization.
- Parameters:
optimizer_type (Literal['Adam','FusedAdam'], default:'Adam') – The type of optimizer to use.
lr (float, default:0.001) – The learning rate.
kwargs (Mapping[str,Any], default:<factory>) – Additional keyword arguments to pass to the optimizer.
enable_automatic_mixed_precision (bool, default:False) – Whether to use automatic mixed precision.
scheduler (SchedulerConfig, default:<factory>) – The type of scheduler to use. If none is given, no scheduler will be used.
use_gradient_accumulation (bool, default:False) – Whether to use gradient accumulation. This must be supported by the stepper being optimized, which may accumulate gradients from separate losses to reduce memory consumption. The stepper may choose to accumulate gradients differently when this is enabled, such as by detaching the computational graph between steps. See the documentation of your stepper (e.g. Stepper) for more details.
checkpoint (CheckpointConfig) –
- class fme.ace.OverwriteConfig(constant=<factory>, multiply_scalar=<factory>)[source]¶
Configuration to overwrite field values in XarrayDataset.
- class fme.ace.ParameterInitializationConfig(weights_path=None, parameters=<factory>, alpha=0.0, beta=0.0, exclude_parameters=None, frozen_parameters=None)[source]¶
A class which applies custom initialization to module parameters.
Assumes the module weights have already been randomly initialized.
Supports overwriting the weights of the built model with weights from a pre-trained model. If the built model has larger weights than the pre-trained model, only the initial slice of the weights is overwritten.
- Parameters:
weights_path (str | None) – path to a Stepper checkpoint containing weights to load
parameters (list[ParameterClassification], default:<factory>) – list of ParameterClassification objects, each specifying whether parameters are excluded from initialization or frozen. By default modules are unfrozen and all parameters are included. Must be provided in the same order as provided by the stepper’s .modules attribute.
alpha (float, default:0.0) – L2 regularization coefficient keeping initialized weights close to their initial values
beta (float, default:0.0) – L2 regularization coefficient keeping uninitialized weights close to zero
exclude_parameters (Optional[list[str]], default:None) – deprecated, kept for backwards compatibility
frozen_parameters (Optional[FrozenParameterConfig], default:None) – deprecated, kept for backwards compatibility
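The "initial slice" overwrite and the alpha penalty can be illustrated on flat lists of floats. This is a rough sketch under simplifying assumptions: real weights are multi-dimensional tensors, both helper names are hypothetical, and the pre-trained weights are assumed to be no larger than the built ones.

```python
def overwrite_initial_slice(built, pretrained):
    """Copy pre-trained values over the leading entries of the built weights."""
    n = len(pretrained)
    return list(pretrained) + list(built[n:])


def l2_penalty(weights, reference, coefficient):
    """Squared-distance penalty pulling weights towards reference values."""
    return coefficient * sum((w - r) ** 2 for w, r in zip(weights, reference))


built = [0.1, 0.2, 0.3, 0.4]   # randomly initialized weights of the new model
pretrained = [1.0, 2.0]        # smaller pre-trained weights
initialized = overwrite_initial_slice(built, pretrained)

# After some training the initialized entries may drift; a coefficient in
# the role of alpha penalizes drift away from the loaded values.
drifted = [1.5, 2.0]
penalty = l2_penalty(drifted, pretrained, coefficient=0.5)
```

A coefficient in the role of beta would use the same penalty with a reference of zeros for the entries that were not overwritten.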
- class fme.ace.RecurrentBlockConfig(in_channels=3, kernel_size=1, enable_nhwc=False, enable_healpixpad=False, block_type='ConvGRUBlock')[source]¶
Configuration for the recurrent block.
- Parameters:
in_channels (int, default:3) – Number of input channels, default is 3.
kernel_size (int, default:1) – Size of the kernel, default is 1.
enable_nhwc (bool, default:False) – Flag to enable NHWC data format, default is False.
enable_healpixpad (bool, default:False) – Flag to enable HEALPix padding, default is False.
block_type (Literal['ConvGRUBlock','ConvLSTMBlock'], default:'ConvGRUBlock') – Type of recurrent block, either “ConvGRUBlock” or “ConvLSTMBlock”, default is “ConvGRUBlock”.
- class fme.ace.RepeatedInterval(interval_length, start, block_length)[source]¶
Configuration for a repeated interval within a block. This configuration is used to generate a boolean mask for a dataset that will return values within the interval and repeat that throughout the dataset.
- Parameters:
Note
The interval_length, start, and block_length can be provided as either all integers or all strings representing timedeltas of the block. If provided as strings, the timestep must be provided when calling get_boolean_mask.
Examples
To return values from the first 3 items of every 6 items, use:
>>> fme.ace.RepeatedInterval(interval_length=3, block_length=6, start=0)
To return a day's worth of values starting after 2 days from every 7-day block, use:
>>> fme.ace.RepeatedInterval(interval_length="1d", block_length="7d", start="2d")
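The resulting boolean mask can be sketched without the library. The function repeated_interval_mask below is hypothetical and only mirrors the semantics described above: an item is kept when its position within its block falls inside the interval. For the timedelta form, the sketch assumes the lengths are converted to item counts by dividing by the timestep.

```python
from datetime import timedelta


def repeated_interval_mask(n_items, interval_length, start, block_length):
    """True where an item's position within its block lies in the interval."""
    return [
        start <= (i % block_length) < start + interval_length
        for i in range(n_items)
    ]


# First 3 items of every 6 items.
mask = repeated_interval_mask(12, interval_length=3, start=0, block_length=6)

# One day of values starting after 2 days of every 7-day block,
# assuming a 6-hourly timestep.
timestep = timedelta(hours=6)
mask_daily = repeated_interval_mask(
    56,
    interval_length=timedelta(days=1) // timestep,
    start=timedelta(days=2) // timestep,
    block_length=timedelta(days=7) // timestep,
)
```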
- class fme.ace.SFNO_V0_1_0(spectral_transform='sht', filter_type='linear', operator_type='dhconv', scale_factor=16, embed_dim=256, num_layers=12, repeat_layers=1, hard_thresholding_fraction=1.0, normalization_layer='instance_norm', use_mlp=True, activation_function='gelu', encoder_layers=1, pos_embed='direct', big_skip=True, rank=1.0, factorization=None, separable=False, complex_activation='real', spectral_layers=1, checkpointing=0, data_grid='legendre-gauss')[source]¶
Configuration for the SFNO architecture in modulus-makani version 0.1.0.
- Parameters:
spectral_transform (str) –
filter_type (Literal['linear']) –
operator_type (str) –
scale_factor (int) –
embed_dim (int) –
num_layers (int) –
repeat_layers (int) –
hard_thresholding_fraction (float) –
normalization_layer (str) –
use_mlp (bool) –
activation_function (str) –
encoder_layers (int) –
pos_embed (Literal['none', 'direct', 'frequency']) –
big_skip (bool) –
rank (float) –
factorization (str | None) –
separable (bool) –
complex_activation (str) –
spectral_layers (int) –
checkpointing (int) –
data_grid (Literal['legendre-gauss', 'equiangular', 'healpix']) –
- class fme.ace.SSTPerturbation(sst)[source]¶
Configuration for sea surface temperature perturbations applied to initial condition and forcing data. Currently, this is strictly applied to both.
- Parameters:
sst (list[PerturbationSelector]) – List of perturbation selectors for SST perturbations.
- class fme.ace.SchedulerConfig(type=None, kwargs=<factory>)[source]¶
Configuration for a scheduler to use during training.
- Parameters:
- class fme.ace.SingleModuleStepperConfig(builder, in_names, out_names, normalization, parameter_init=<factory>, ocean=None, loss=<factory>, corrector=<factory>, next_step_forcing_names=<factory>, loss_normalization=None, residual_normalization=None, multi_call=None, include_multi_call_in_loss=False, crps_training=False, residual_prediction=False)[source]¶
Configuration for a single module stepper.
- Parameters:
builder (ModuleSelector) – The module builder.
normalization (NormalizationConfig) – The normalization configuration.
parameter_init (ParameterInitializationConfig, default:<factory>) – The parameter initialization configuration.
ocean (Optional[OceanConfig], default:None) – The ocean configuration.
loss (WeightedMappingLossConfig, default:<factory>) – The loss configuration.
corrector (AtmosphereCorrectorConfig|CorrectorSelector, default:<factory>) – The corrector configuration.
next_step_forcing_names (list[str], default:<factory>) – Names of forcing variables for the next timestep.
loss_normalization (Optional[NormalizationConfig], default:None) – The normalization configuration for the loss.
residual_normalization (Optional[NormalizationConfig], default:None) – Optional alternative to configure loss normalization. If provided, it will be used for all prognostic variables in loss scaling.
multi_call (Optional[MultiCallConfig], default:None) – The configuration of multi-called diagnostics.
include_multi_call_in_loss (bool, default:False) – Whether to include multi-call diagnostics in the loss. The same loss configuration as specified in ‘loss’ is used.
crps_training (bool, default:False) – Whether to use CRPS training for stochastic models.
residual_prediction (bool, default:False) – Whether to have ML module predict tendencies for prognostic variables.
- property all_names¶
Names of all variables required, including auxiliary ones.
- get_parameter_initializer()[source]¶
Get the parameter initializer for this stepper configuration.
- Return type:
ParameterInitializer
- get_stepper(dataset_info, apply_parameter_init=True)[source]¶
- Parameters:
dataset_info (DatasetInfo) – Information about the training dataset.
apply_parameter_init (bool, default:True) – Whether to apply parameter initialization.
- Return type:
Stepper
- property normalize_names¶
Names of variables which require normalization. I.e. inputs/outputs.
- to_stepper_config(normalizer, loss_normalizer)[source]¶
Convert the current config to a stepper config.
Overwriting normalization configuration is needed to avoid a checkpoint trying to load normalization data from netCDF files which are no longer present when running inference.
- Parameters:
normalizer (StandardNormalizer) – overwrite the normalization config with data from this normalizer
loss_normalizer (StandardNormalizer) – overwrite the loss normalization config with data from this normalizer
- Return type:
- Returns:
A stepper config.
- class fme.ace.SlabOceanConfig(mixed_layer_depth_name, q_flux_name)[source]¶
Configuration for a slab ocean model.
- class fme.ace.Slice(start=None, stop=None, step=None)[source]¶
Configuration of a python slice built-in.
Required because slice cannot be initialized directly by dacite.
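The idea of wrapping slice in a config class can be sketched with a plain dataclass, whose fields a dictionary-driven loader such as dacite can populate. SliceSketch and its to_slice method are hypothetical names for illustration; the actual attribute used by fme.ace.Slice to expose the built-in slice is not shown in this reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SliceSketch:
    """Dataclass stand-in for the built-in slice, constructible from a dict."""

    start: Optional[int] = None
    stop: Optional[int] = None
    step: Optional[int] = None

    def to_slice(self) -> slice:
        # Convert the plain fields into a real slice object.
        return slice(self.start, self.stop, self.step)


# A loader would typically receive these fields from YAML or JSON.
config = SliceSketch(**{"start": 1, "stop": 7, "step": 2})
selected = list(range(10))[config.to_slice()]
```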
- class fme.ace.SphericalFourierNeuralOperatorBuilder(spectral_transform='sht', filter_type='linear', operator_type='diagonal', scale_factor=1, residual_filter_factor=1, embed_dim=256, num_layers=12, hard_thresholding_fraction=1.0, normalization_layer='instance_norm', use_mlp=True, activation_function='gelu', encoder_layers=1, pos_embed=True, big_skip=True, rank=1.0, factorization=None, separable=False, complex_network=True, complex_activation='real', spectral_layers=1, checkpointing=0, data_grid='legendre-gauss')[source]¶
Configuration for the SFNO architecture used in FourCastNet-SFNO.
- Parameters:
spectral_transform (str) –
filter_type (str) –
operator_type (str) –
scale_factor (int) –
residual_filter_factor (int) –
embed_dim (int) –
num_layers (int) –
hard_thresholding_fraction (float) –
normalization_layer (str) –
use_mlp (bool) –
activation_function (str) –
encoder_layers (int) –
pos_embed (bool) –
big_skip (bool) –
rank (float) –
factorization (str | None) –
separable (bool) –
complex_network (bool) –
complex_activation (str) –
spectral_layers (int) –
checkpointing (int) –
data_grid (Literal['legendre-gauss', 'equiangular']) –
- class fme.ace.StepperConfig(step, loss=<factory>, n_ensemble=-1, crps_training=False, parameter_init=<factory>, input_masking=None)[source]¶
Configuration for a stepper.
- Parameters:
step (StepSelector) – The step configuration.
loss (WeightedMappingLossConfig, default:<factory>) – The loss configuration.
n_ensemble (int, default:-1) – The number of ensemble members evaluated for each training batch member. Default is 2 if the loss type is EnsembleLoss, otherwise the default is 1. Must be 2 for EnsembleLoss to be valid.
crps_training (bool, default:False) – Deprecated, kept for backwards compatibility. Use n_ensemble=2 with a CRPS loss instead.
parameter_init (ParameterInitializationConfig, default:<factory>) – The parameter initialization configuration.
input_masking (Optional[StaticMaskingConfig], default:None) – Config for masking step inputs.
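The n_ensemble defaulting rule can be written out as a small function. This sketch assumes that the signature default of -1 acts as a "not set" sentinel and that an invalid combination raises ValueError; resolve_n_ensemble is a hypothetical helper, not part of the library.

```python
def resolve_n_ensemble(n_ensemble, loss_type):
    """Resolve the effective ensemble size from the configured value."""
    if n_ensemble == -1:
        # Assumed sentinel: fall back to the documented defaults.
        return 2 if loss_type == "EnsembleLoss" else 1
    if loss_type == "EnsembleLoss" and n_ensemble != 2:
        raise ValueError("EnsembleLoss requires n_ensemble == 2")
    return n_ensemble


default_for_ensemble_loss = resolve_n_ensemble(-1, "EnsembleLoss")
default_for_mse = resolve_n_ensemble(-1, "MSE")
```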
- classmethod from_stepper_state(state)[source]¶
Initialize a StepperConfig from a stepper state.
This is required for backwards compatibility with older steppers, whose configuration did not provide normalization constants, but rather pointed to files on disk. Newer stepper configurations load these constants into the configuration before checkpoints are saved.
- Parameters:
state – The state of the stepper.
- Return type:
- Returns:
The stepper config.
- get_parameter_initializer()[source]¶
Get the parameter initializer for this stepper configuration.
- Return type:
ParameterInitializer
- property loss_names¶
Names of variables to include in loss.
- property next_step_forcing_names: list[str]¶
Names of variables which are given as inputs but taken from the output timestep.
An example might be solar insolation taken during the output window period.
- replace_multi_call(multi_call, state)[source]¶
Replace the multi-call configuration of self.step and ensure the associated state can be loaded as a multi-call step.
A value of None for multi_call will remove the multi-call configuration.
If the selected type supports it, the multi-call configuration will be updated in place. Otherwise, it will be wrapped in the multi_call step configuration with the given multi_call config or None.
Note this updates self.step in place, but returns a new state dictionary.
- Parameters:
multi_call (Optional[MultiCallConfig]) – MultiCallConfig for the resulting self.step.
state (dict[str,Any]) – state dictionary associated with the loaded step.
- Return type:
- Returns:
The state dictionary updated to ensure consistency with that of a serialized multi-call step.
- class fme.ace.StepperOverrideConfig(ocean='keep', multi_call='keep')[source]¶
Configuration for overriding stepper configuration options.
The default value for each parameter is "keep", which denotes that the serialized stepper’s configuration will not be modified when loaded. Passing other values will override the configuration of the loaded stepper.
- Parameters:
ocean (Union[Literal['keep'],OceanConfig,None], default:'keep') – Ocean configuration to override that used in producing a serialized stepper.
multi_call (Union[Literal['keep'],MultiCallConfig,None], default:'keep') – MultiCall configuration to override that used in producing a serialized stepper.
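The role of the "keep" sentinel is to distinguish "leave unchanged" from an explicit None, which removes the setting. A minimal sketch of that resolution logic, using a hypothetical apply_override helper and plain dictionaries in place of real config objects:

```python
def apply_override(serialized_value, override="keep"):
    """Return the value to use after applying an override setting."""
    if isinstance(override, str) and override == "keep":
        return serialized_value  # leave the serialized configuration as is
    return override  # may be a new config, or None to remove it


serialized_ocean = {"surface_temperature_name": "sst"}  # placeholder config
kept = apply_override(serialized_ocean)
removed = apply_override(serialized_ocean, None)
replaced = apply_override(serialized_ocean, {"surface_temperature_name": "ts"})
```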
- class fme.ace.TimeCoarsenConfig(coarsen_factor)[source]¶
Config for inference data time coarsening.
- Parameters:
coarsen_factor (int) – Factor by which to coarsen in time, an integer 1 or greater. The resulting time labels will be coarsened to the mean of the original labels.
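Time coarsening with mean time labels can be sketched on plain lists. Two things here are assumptions rather than documented behaviour: that data values are block-averaged, and that a trailing incomplete block is dropped. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def coarsen_values(values, factor):
    """Average consecutive blocks of `factor` values (assumed block mean)."""
    if factor < 1:
        raise ValueError("coarsen_factor must be an integer 1 or greater")
    n_blocks = len(values) // factor  # assumption: drop an incomplete tail
    return [
        sum(values[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def coarsen_times(times, factor):
    """Label each block with the mean of its original time labels."""
    labels = []
    for i in range(len(times) // factor):
        block = times[i * factor:(i + 1) * factor]
        offsets = sum((t - block[0] for t in block), timedelta())
        labels.append(block[0] + offsets / factor)
    return labels


times = [datetime(2020, 1, 1) + timedelta(hours=6 * i) for i in range(4)]
coarse_values = coarsen_values([1.0, 3.0, 5.0, 7.0], factor=2)
coarse_times = coarsen_times(times, factor=2)
```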
- class fme.ace.TimeSlice(start_time=None, stop_time=None, step=None)[source]¶
Configuration of a slice of times. Step is an integer-valued index step.
Note: start_time and stop_time may be provided as partial time strings and the stop_time will be included in the slice. See more details in Xarray docs.
- class fme.ace.TimestampList(times, timestamp_format='%Y-%m-%dT%H:%M:%S')[source]¶
Configuration for a list of timestamps.
- class fme.ace.TrainConfig(train_loader, validation_loader, stepper, optimization, logging, max_epochs, save_checkpoint, experiment_dir, inference, n_forward_steps, copy_weights_after_batch=<factory>, ema=<factory>, weather_evaluation=None, validate_using_ema=False, checkpoint_save_epochs=None, ema_checkpoint_save_epochs=None, log_train_every_n_batches=100, segment_epochs=None, save_per_epoch_diagnostics=False, validation_aggregator=<factory>, evaluate_before_training=False)[source]¶
Configuration for training a model.
- Parameters:
train_loader (DataLoaderConfig) – Configuration for the training data loader.
validation_loader (DataLoaderConfig) – Configuration for the validation data loader.
stepper (SingleModuleStepperConfig|ExistingStepperConfig|StepperConfig) – Configuration for the stepper. SingleModuleStepperConfig is deprecated and will be removed in a future version. Use StepperConfig instead.
optimization (OptimizationConfig) – Configuration for the optimization.
logging (LoggingConfig) – Configuration for logging.
max_epochs (int) – Total number of epochs to train for.
save_checkpoint (bool) – Whether to save checkpoints.
experiment_dir (str) – Directory where checkpoints and logs are saved.
inference (Optional[InlineInferenceConfig]) – Configuration for inline inference. If None, no inline inference is run, and no “best_inline_inference” checkpoint will be saved.
weather_evaluation (Optional[WeatherEvaluationConfig], default:None) – Configuration for weather evaluation. If None, no weather evaluation is run. Weather evaluation is not used to select checkpoints, but is used to provide metrics.
n_forward_steps (int) – Number of forward steps to take gradient over.
copy_weights_after_batch (list[CopyWeightsConfig], default:<factory>) – Configuration for copying weights from the base model to the training model after each batch.
ema (EMAConfig, default:<factory>) – Configuration for exponential moving average of model weights.
validate_using_ema (bool, default:False) – Whether to validate and perform inference using the EMA model.
checkpoint_save_epochs (Optional[Slice], default:None) – How often to save epoch-based checkpoints, if save_checkpoint is True. If None, checkpoints are only saved for the most recent epoch (and the best epochs if validate_using_ema == False).
ema_checkpoint_save_epochs (Optional[Slice], default:None) – How often to save epoch-based EMA checkpoints, if save_checkpoint is True. If None, EMA checkpoints are only saved for the most recent epoch (and the best epochs if validate_using_ema == True).
log_train_every_n_batches (int, default:100) – How often to log batch_loss during training.
segment_epochs (Optional[int], default:None) – Exit after training for at most this many epochs in current job, without exceeding max_epochs. Use this if training must be run in segments, e.g. due to wall clock limit.
save_per_epoch_diagnostics (bool, default:False) – Whether to save per-epoch diagnostics from training, validation and inline inference aggregators.
validation_aggregator (OneStepAggregatorConfig, default:<factory>) – Configuration for the validation aggregator.
evaluate_before_training (bool, default:False) – Whether to run validation and inline inference before any training is done.
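The interaction between max_epochs and segment_epochs can be sketched as a range computation. The helper epochs_for_this_job is hypothetical, and the sketch assumes epochs are counted from zero and that a resumed job knows how many epochs were already completed.

```python
def epochs_for_this_job(completed_epochs, max_epochs, segment_epochs=None):
    """Epoch indices the current job would train, given prior progress."""
    if segment_epochs is None:
        stop = max_epochs
    else:
        # Train at most segment_epochs more, never past max_epochs.
        stop = min(completed_epochs + segment_epochs, max_epochs)
    return range(completed_epochs, stop)


first_job = list(epochs_for_this_job(0, max_epochs=10, segment_epochs=4))
last_job = list(epochs_for_this_job(8, max_epochs=10, segment_epochs=4))
unsegmented = list(epochs_for_this_job(0, max_epochs=3))
```

Under these assumptions, a 10-epoch run with segment_epochs=4 completes over three jobs covering 4, 4, and 2 epochs.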
- class fme.ace.UNetDecoderConfig(conv_block, up_sampling_block, output_layer, recurrent_block=None, n_channels=<factory>, n_layers=<factory>, output_channels=1, dilations=None, enable_nhwc=False, enable_healpixpad=False)[source]¶
Configuration for the UNet Decoder.
- Parameters:
conv_block (ConvBlockConfig) – Configuration for the convolutional block.
up_sampling_block (ConvBlockConfig) – Configuration for the up-sampling block.
output_layer (ConvBlockConfig) – Configuration for the output layer block.
recurrent_block (Optional[RecurrentBlockConfig], default:None) – Configuration for the recurrent block, by default None.
n_channels (List[int], default:<factory>) – Number of channels for each layer, by default (34, 68, 136).
n_layers (List[int], default:<factory>) – Number of layers in each block, by default (1, 2, 2).
output_channels (int, default:1) – Number of output channels, by default 1.
dilations (Optional[list], default:None) – List of dilation rates for the layers, by default None.
enable_nhwc (bool, default:False) – Flag to enable NHWC data format, by default False.
enable_healpixpad (bool, default:False) – Flag to enable HEALPix padding, by default False.
- class fme.ace.UNetEncoderConfig(conv_block, down_sampling_block, input_channels=3, n_channels=<factory>, n_layers=<factory>, dilations=None, enable_nhwc=False, enable_healpixpad=False)[source]¶
Configuration for the UNet Encoder.
- Parameters:
conv_block (ConvBlockConfig) – Configuration for the convolutional block.
down_sampling_block (DownsamplingBlockConfig) – Configuration for the down-sampling block.
input_channels (int, default:3) – Number of input channels, by default 3.
n_channels (List[int], default:<factory>) – Number of channels for each layer, by default (136, 68, 34).
n_layers (List[int], default:<factory>) – Number of layers in each block, by default (2, 2, 1).
dilations (Optional[list], default:None) – List of dilation rates for the layers, by default None.
enable_nhwc (bool, default:False) – Flag to enable NHWC data format, by default False.
enable_healpixpad (bool, default:False) – Flag to enable HEALPix padding, by default False.
- class fme.ace.WeightedMappingLossConfig(type='MSE', kwargs=<factory>, global_mean_type=None, global_mean_kwargs=<factory>, global_mean_weight=1.0, weights=<factory>)[source]¶
Loss configuration class that has the same fields as LossConfig but also has an additional weights field. The build method will apply the weights to the inputs of the loss function. The loss returned by build will be a MappingLoss, which takes Dict[str, tensor] as inputs instead of packed tensors.
- Parameters:
type (Literal['LpLoss','MSE','AreaWeightedMSE','EnsembleLoss'], default:'MSE') – the type of the loss function
kwargs (Mapping[str,Any], default:<factory>) – data for a loss function instance of the indicated type
global_mean_type (Optional[Literal['LpLoss']], default:None) – the type of the loss function to apply to the global mean of each sample, by default no loss is applied
global_mean_kwargs (Mapping[str,Any], default:<factory>) – data for a loss function instance of the indicated type to apply to the global mean of each sample
global_mean_weight (float, default:1.0) – the weight to apply to the global mean loss relative to the main loss
weights (dict[str,float], default:<factory>) – A dictionary of variable names with individual weights to apply to their normalized losses
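A mapping-style loss with per-variable weights can be sketched in plain Python. This is only an illustration of weights being applied to the loss inputs: the function name is hypothetical, variables without an entry are assumed to have weight 1.0, and averaging the per-variable MSE values is an assumed reduction.

```python
def weighted_mapping_mse(predicted, target, weights=None):
    """MSE over name-keyed inputs, scaling each variable by its weight."""
    weights = weights or {}
    per_variable = []
    for name, pred_values in predicted.items():
        w = weights.get(name, 1.0)  # assumed default weight
        squared_errors = [
            (w * p - w * t) ** 2 for p, t in zip(pred_values, target[name])
        ]
        per_variable.append(sum(squared_errors) / len(squared_errors))
    # Assumed reduction: mean over variables.
    return sum(per_variable) / len(per_variable)


predicted = {"T": [1.0, 2.0], "q": [0.0, 0.0]}
target = {"T": [0.0, 0.0], "q": [1.0, 1.0]}
loss = weighted_mapping_mse(predicted, target, weights={"T": 2.0})
```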
- class fme.ace.XarrayDataConfig(data_path, file_pattern='*.nc', n_repeats=1, engine='netcdf4', spatial_dimensions='latlon', subset=<factory>, infer_timestep=True, dtype='float32', overwrite=<factory>, fill_nans=None, isel=<factory>)[source]¶
- Parameters:
data_path (str) – Path to the data.
file_pattern (str, default:'*.nc') – Glob pattern to match files in the data_path.
n_repeats (int, default:1) – Number of times to repeat the dataset (in time). It is up to the user to ensure that the input dataset to repeat results in data that is reasonably continuous across repetitions.
engine (Literal['netcdf4','h5netcdf','zarr'], default:'netcdf4') – Backend used in xarray.open_dataset call.
spatial_dimensions (Literal['healpix','latlon'], default:'latlon') – Specifies the spatial dimensions for the grid, default is lat/lon. If ‘latlon’, it is assumed that the last two dimensions are latitude and longitude, respectively. If ‘healpix’, it is assumed that the last three dimensions are face, height, and width, respectively.
subset (Slice|TimeSlice|RepeatedInterval, default:<factory>) – Slice defining a subset of the XarrayDataset to load. This can either be a Slice of integer indices or a TimeSlice of timestamps. This feature is applied directly to the dataset samples. For example, if the file(s) have the time coordinate (t0, t1, t2, t3) and requirements.n_timesteps=2, then subset=Slice(stop=2) will provide two samples: (t0, t1), (t1, t2).
infer_timestep (bool, default:True) – Whether to infer the timestep from the provided data. This should be set to True (the default) for ACE training. It may be useful to toggle this to False for applications like downscaling, which do not depend on the timestep of the data and therefore lack the additional requirement that the data be ordered and evenly spaced in time. It must be set to True if n_repeats > 1 in order to be able to infer the full time coordinate.
dtype (Optional[str], default:'float32') – Data type to cast the data to. If None, no casting is done. It is required that ‘torch.{dtype}’ is a valid dtype.
overwrite (OverwriteConfig, default:<factory>) – Optional OverwriteConfig to overwrite loaded field values.
fill_nans (Optional[FillNaNsConfig], default:None) – Optional FillNaNsConfig to fill NaNs with a constant value.
isel (Mapping[str,Slice|int], default:<factory>) – Optional xarray isel arguments to be passed to the dataset. Will raise ValueError if time is included here, since the subset argument is used specifically for selecting times. Horizontal dimensions are also not currently supported.
Examples
If data is stored in a directory with multiple netCDF files which can be concatenated along the time dimension, use:
>>> fme.ace.XarrayDataConfig(data_path="/some/directory", file_pattern="*.nc")
If data is stored in a single zarr store at /some/directory/dataset.zarr, use:
>>> fme.ace.XarrayDataConfig(
...     data_path="/some/directory",
...     file_pattern="dataset.zarr",
...     engine="zarr"
... )
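The statement that subset is applied to dataset samples, rather than to raw time steps, can be checked with a small sketch that reproduces the (t0, t1, t2, t3) example from the subset parameter description. The helper sample_windows is hypothetical and assumes samples are overlapping windows of n_timesteps consecutive times.

```python
def sample_windows(times, n_timesteps):
    """All windows of n_timesteps consecutive time labels, in order."""
    return [
        tuple(times[i:i + n_timesteps])
        for i in range(len(times) - n_timesteps + 1)
    ]


times = ["t0", "t1", "t2", "t3"]
samples = sample_windows(times, n_timesteps=2)

# Applying the equivalent of Slice(stop=2) to the samples, not the times.
subset = samples[slice(None, 2)]
```

Slicing the time axis first with stop=2 would instead leave only (t0, t1), a single sample.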