Spherical Fourier Neural Operator (SFNO) is a deep learning model that learns operators on spherical coordinates. Because it is aware of the spherical geometry, it can produce stable autoregressive rollouts over longer periods than its Fourier neural operator counterparts.
In the paper, the authors demonstrate the improved stability with two trained models that learn the time-advance operator: a smaller one for the shallow water equations, and a larger one for atmospheric dynamics, trained on 40 years of ERA5 data sampled every 6 hours.
At NCI, we made the NCI-ai-ml environment compatible with this model and prepared three notebooks that demonstrate both training and inference. We use the released code to train the time-advance operator for the shallow water equations, but modified the SFNO model so that it takes full advantage of the resolution-invariance property of neural operators.
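The resolution-invariance property mentioned above can be illustrated with a minimal 1D sketch. It is a Fourier analogue, not the SFNO implementation: SFNO works with spherical harmonics, while this toy layer uses a plain discrete Fourier transform on a periodic interval. The function `spectral_layer`, the test signal and the weights are all hypothetical. Because the weights are attached to spectral modes rather than to grid points, the same weights give the same result on a 16-point and a 32-point grid, provided the input is band-limited.

```python
import cmath
import math


def spectral_layer(samples, weights):
    """Scale the Fourier modes of a periodic 1D signal by per-mode weights.

    weights[k] scales modes +k and -k; modes above len(weights) - 1 are dropped.
    Hypothetical toy layer: a 1D analogue, not the SFNO code.
    """
    n = len(samples)
    k_max = len(weights) - 1
    assert 2 * k_max < n, "grid too coarse for the retained modes"
    xs = [2.0 * math.pi * j / n for j in range(n)]
    out = [0.0] * n
    for k in range(-k_max, k_max + 1):
        # Forward transform: coefficient of mode k on this grid.
        coeff = sum(f * cmath.exp(-1j * k * x) for f, x in zip(samples, xs)) / n
        scaled = weights[abs(k)] * coeff
        # Inverse transform: add the scaled mode back on the same grid.
        for j, x in enumerate(xs):
            out[j] += (scaled * cmath.exp(1j * k * x)).real
    return out


def signal(x):
    # Band-limited test input (highest wavenumber is 3).
    return math.sin(x) + 0.5 * math.cos(3.0 * x)


weights = [1.0, 2.0, 0.0, -1.0]  # hypothetical "learned" weights for modes 0..3

coarse = spectral_layer([signal(2.0 * math.pi * j / 16) for j in range(16)], weights)
fine = spectral_layer([signal(2.0 * math.pi * j / 32) for j in range(32)], weights)

# Largest disagreement at the grid points the two resolutions share.
max_diff = max(abs(coarse[j] - fine[2 * j]) for j in range(16))
print(f"max difference at shared points: {max_diff:.2e}")
```

The two outputs agree to floating-point precision at the shared points, which is the behaviour that lets a model trained at one resolution be evaluated at another.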
When training the weather forecasting model, we add a few variations. We borrow the idea of prognostic, forcing and diagnostic variables from the ACE paper, in which the authors achieved stable autoregressive prediction for 10 years at 100 km resolution using 100 years of simulation data. We also applied training strategies that account for differences between variables, such as tendency variances and dynamic timescales, so that no variable overfits while others have yet to converge.
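The split into prognostic, forcing and diagnostic variables can be sketched as an autoregressive loop. This is a minimal illustration under assumptions, not the notebook's code: prognostic variables are taken to be both inputs and outputs, forcing variables inputs only (prescribed externally at every step), and diagnostic variables outputs only. The names `rollout` and `toy_step`, and the variable names, are hypothetical.

```python
def rollout(step, prognostic, forcings):
    """Run an autoregressive rollout.

    step(prognostic, forcing) -> (next_prognostic, diagnostic)
    forcings: one externally prescribed forcing dict per time step.
    Only the prognostic state is fed back into the next step.
    """
    trajectory = []
    for forcing in forcings:
        prognostic, diagnostic = step(prognostic, forcing)
        trajectory.append({**prognostic, **diagnostic})
    return trajectory


def toy_step(prognostic, forcing):
    # Stand-in for a trained network (hypothetical dynamics).
    next_temp = 0.9 * prognostic["temperature"] + 0.1 * forcing["insolation"]
    # Diagnostic output: predicted, but never used as an input.
    return {"temperature": next_temp}, {"precipitation": 0.01 * next_temp}


initial_state = {"temperature": 10.0}
forcings = [{"insolation": 20.0}] * 3
for t, state in enumerate(rollout(toy_step, initial_state, forcings), start=1):
    print(t, state)
```

Keeping diagnostic variables out of the feedback path means errors in them cannot accumulate across steps, while forcing variables stay pinned to their prescribed values.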
Use the Environment
$ module use /g/data/dk92/apps/Modules/modulefiles/
$ module load NCI-ai-ml/24.08
$ python3
>>> import torch_harmonics
>>> torch_harmonics.__version__
'0.6.5'
>>> import torch
>>> torch.__version__
'2.3.1'
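The same check can be scripted, for example at the top of a notebook. The sketch below is only a convenience wrapper around the interactive session above: the helper names `module_version` and `check_environment` are hypothetical, and the expected versions are the ones reported for NCI-ai-ml/24.08. It compares by prefix, on the assumption that a build may append a local suffix to the version string.

```python
import importlib

# Versions reported by the interactive session for NCI-ai-ml/24.08.
EXPECTED = {"torch_harmonics": "0.6.5", "torch": "2.3.1"}


def module_version(name):
    """Return a module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return str(getattr(module, "__version__", "unknown"))


def check_environment(expected=EXPECTED):
    """Map each module name to (found_version, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        found = module_version(name)
        # Prefix match, so a suffixed build of the wanted version still passes.
        report[name] = (found, found is not None and found.startswith(wanted))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: found {found}, expected {EXPECTED[name]} -> {status}")
```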
Run the tested notebook
- Go to the ARE site: are.nci.org.au
- Fill out the JupyterLab request form
  - Walltime (hours): 1
  - Queue: dgxa100 or gpuvolta, see notes below.
  - Compute Size: 1gpu
  - Project: <xy01>
  - Storage: gdata/dk92+scratch/<xy01>
- Click Advanced options and fill in the following fields
  - Module directories: /g/data/dk92/apps/Modules/modulefiles/
  - Modules: NCI-ai-ml/24.08 or NCI-ai-ml/23.05, see notes below
  - Jobfs size: 10GB
- Launch the session to run the tested notebook
Note on the tested notebook
- Copy any or all of the tested notebooks from the following paths to your own working directory. If your working directory is not "/scratch/<xy01>", remember to change the Storage field in the JupyterLab request form.
  - /g/data/dk92/notebooks/examples-aiml/sfno/shallow_water_model.ipynb
  - /g/data/dk92/notebooks/examples-aiml/sfno/res_invar.shallow_water_model.ipynb
  - /g/data/dk92/notebooks/examples-aiml/sfno/weather_era5.ipynb
- All notebooks ship with their own dataset: shallow_water_model.ipynb and res_invar.shallow_water_model.ipynb run the online dataset generator, while weather_era5.ipynb uses a derived dataset with timestamps between 1 Jan. and 31 Mar. 2020, plus a separate test dataset from Dec. 2020 for autoregressive prediction.
- To run weather_era5.ipynb, the JupyterLab session must request an A100 GPU (job submitted to the dgxa100 queue), because the minimum memory usage for training on the full-resolution ERA5 data is about 65 GiB.
- To run either of the shallow water model notebooks, the JupyterLab session must use the older environment module `NCI-ai-ml/23.05`.
Resolution Invariance
At NCI, we trained an SFNO model for the shallow water equations at the resolution (nlat, nlon) = (256, 512) and ran inference at six resolutions. The area-weighted RMSE does not shift systematically as the inference resolution moves away from the training resolution, at any of the autoregressive prediction steps examined.
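For readers unfamiliar with the metric, the following is a minimal sketch of an area-weighted RMSE on a latitude-longitude grid. It assumes an equiangular grid with rows at cell-centre latitudes and uses cos(latitude) as the area weight. The notebooks may use a different grid or quadrature weights, so treat the function `area_weighted_rmse` as illustrative only.

```python
import math


def area_weighted_rmse(pred, target):
    """RMSE over a (nlat, nlon) field, weighted by cos(latitude).

    Assumes an equiangular grid whose rows sit at cell-centre latitudes
    from south to north (an assumption, not the notebook's definition).
    """
    nlat = len(pred)
    weighted_sq_err = 0.0
    total_weight = 0.0
    for i in range(nlat):
        lat = math.pi * (i + 0.5) / nlat - math.pi / 2.0
        weight = math.cos(lat)  # cell area shrinks towards the poles
        for p, t in zip(pred[i], target[i]):
            weighted_sq_err += weight * (p - t) ** 2
            total_weight += weight
    return math.sqrt(weighted_sq_err / total_weight)


# Toy 4 x 2 field with errors only in the two polar rows.
target = [[0.0, 0.0] for _ in range(4)]
pred = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
print(area_weighted_rmse(pred, target))
```

Without the weighting, grid points near the poles would dominate the score on a regular latitude-longitude grid even though they cover very little area, which matters when comparing results across resolutions.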
Improved Stability
At NCI, we replicated the training for the shallow water equations and clearly see the stability improvement around the polar regions.