Introduction

This is the NCI implementation of the Artificial Intelligence Forecasting System (AIFS) model from the European Centre for Medium-Range Weather Forecasts (ECMWF).
It is a new AI-based data-driven weather forecasting model that uses a graph neural network and a sliding window transformer for prediction.
The model is first trained on the ERA5 reanalysis data, then fine-tuned with ECMWF’s operational numerical weather prediction (NWP) data.
ECMWF has evaluated the results against its NWP and direct observational data and concluded that AIFS forecasts are of high quality for both pressure-level and surface-level variables.

Model and Dataset

The AIFS is based on a graph neural network (GNN) that works in three stages: encoding, processing, and decoding. A sliding window transformer serves as the processor. The encoder stage aggregates information from the input data grid and sends it to the processor. To reduce the computational overhead, a cutoff radius limits the number of adjacent grid points from which the data is collected. In the processing stage, the sliding window transformer learns the similarities between the grid points; the processor operates on an O96 octahedral reduced Gaussian grid and has 16 layers. The decoder stage then maps the processor output back to the output grid.
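The cutoff-radius idea can be illustrated with a minimal sketch. It assumes great-circle (haversine) distance on a spherical Earth; the function names, the sample coordinates, and the 500 km radius are hypothetical and are not taken from the AIFS code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbours_within_cutoff(node, grid_points, cutoff_km):
    """Return indices of grid points within cutoff_km of a processor node."""
    return [
        i
        for i, (lat, lon) in enumerate(grid_points)
        if great_circle_km(node[0], node[1], lat, lon) <= cutoff_km
    ]


# Hypothetical input grid points (lat, lon) and one processor-mesh node.
grid_points = [(-35.0, 149.0), (-35.25, 149.25), (-33.9, 151.2), (51.5, 0.0)]
node = (-35.3, 149.1)
print(neighbours_within_cutoff(node, grid_points, cutoff_km=500.0))
```

Only the points inside the radius contribute edges to the node, so a smaller cutoff means fewer encoder edges and less computation.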

The input and output data use the ERA5 spatial resolution; internally, the model represents them on an N320 reduced Gaussian grid. A reduced Gaussian grid has a more uniform resolution across the globe than a regular longitude-latitude grid. The N320 grid also has fewer points than a longitude-latitude grid of similar resolution, which reduces the number of grid points and edges handled by the encoder and decoder and, in turn, the computational cost. The N320 reduced Gaussian grid consists of 542,080 points, compared to 1,038,240 points for a 0.25-degree longitude-latitude grid.
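The grid-point comparison can be checked with a few lines of arithmetic. The sketch assumes the 0.25-degree grid includes both poles as rows; the N320 count is the figure quoted above, not computed here.

```python
# Regular 0.25-degree longitude-latitude grid, poles included.
n_lon = int(360 / 0.25)        # 1440 longitudes
n_lat = int(180 / 0.25) + 1    # 721 latitudes (both poles are grid rows)
latlon_points = n_lon * n_lat

n320_points = 542_080          # N320 reduced Gaussian grid, as quoted above

print(latlon_points)                           # 1038240
print(round(n320_points / latlon_points, 3))   # 0.522
```

The N320 grid therefore carries roughly half as many points as the regular grid.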

More details of the model can be found in the following 2024 paper: https://arxiv.org/pdf/2406.01465.

Use of A100 GPU

To reduce its GPU memory footprint, the AIFS uses flash attention, whose memory requirement grows linearly with sequence length. Flash attention requires A100 or newer GPUs.
The instructions required to run the flash attention mechanism are not present in earlier GPU architectures such as the V100. Therefore, an A100 GPU is required to run the inference server.
It is possible to replace flash attention with a simpler (dot-product-based) attention mechanism; however, in this case, the memory consumption grows quadratically.
Given that the model can already consume upward of 55 GB of GPU memory with flash attention, simpler attention implementations may cause out-of-memory errors and other complications.
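To see why the quadratic growth matters, the sketch below estimates the size of the full attention score matrix that a plain dot-product attention layer would materialise. It assumes, purely for illustration, dense attention over every point of the O96 processor grid, 16 attention heads, and 2-byte values; none of these settings is taken from the AIFS configuration, and the function names are hypothetical.

```python
def attention_scores_bytes(n_tokens, n_heads=16, bytes_per_value=2):
    """Memory for the full (n_tokens x n_tokens) score matrix of one layer.

    n_heads=16 and 2-byte (half-precision) values are illustrative
    assumptions, not AIFS settings.
    """
    return n_heads * n_tokens * n_tokens * bytes_per_value


def octahedral_points(n):
    """Total number of points of an octahedral reduced Gaussian grid O<n>."""
    # Rows hold 20, 24, 28, ... points counting from each pole,
    # which sums to 4 * n * (n + 9) over both hemispheres.
    return 4 * n * (n + 9)


n = octahedral_points(96)  # processor grid size
gib = attention_scores_bytes(n) / 2**30
print(n, round(gib, 1))

# Doubling the number of tokens quadruples the score-matrix memory.
print(attention_scores_bytes(2 * n) / attention_scores_bytes(n))
```

Flash attention avoids storing this matrix, which is why its memory use stays linear in the number of tokens.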

Training

The AIFS has been trained on ERA5 data at 6-hour intervals, so a single model step forecasts 6 hours ahead of the input data point. The model is pre-trained on ERA5 data from 1979 to 2020 and then fine-tuned on IFS operational analysis data from 2019 to 2020. The area-weighted mean squared error (MSE) is used as the objective function. The full list of input and output variables is shown in the table below.

Field	Level type	Input/Output
Geopotential, horizontal and vertical wind components, specific humidity, temperature	Pressure level (hPa): 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000	Both
Surface pressure, mean sea-level pressure, skin temperature, 2 m temperature, 2 m dew point temperature, 10 m horizontal wind components, total column water	Surface	Both
Soil moisture and soil temperature (layers 1 & 2)	Surface	Both
100 m horizontal wind components, solar radiation (surface short-wave (solar) radiation downwards and surface long-wave (thermal) radiation downwards), cloud variables (tcc, hcc, mcc, lcc), runoff and snowfall	Surface	Output
Total precipitation, convective precipitation	Surface	Output
Land-sea mask, orography, standard deviation of sub-grid orography, slope of sub-scale orography, insolation, latitude/longitude, time of day/day of year	Surface	Input
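The area-weighted MSE objective mentioned above can be sketched as follows. This is a minimal illustration, not the AIFS training code: the function name is hypothetical, and the weights here are taken as proportional to cos(latitude), which holds for a regular longitude-latitude grid. The weights used on the reduced Gaussian grid, and any per-variable scaling, are not described in this page.

```python
import math


def area_weighted_mse(pred, target, weights):
    """Mean squared error with each grid point weighted by its cell area."""
    total = sum(weights)
    return sum(w * (p - t) ** 2 for p, t, w in zip(pred, target, weights)) / total


# Hypothetical example: three points on a longitude-latitude grid, where the
# cell area is proportional to cos(latitude).
lats = [0.0, 60.0, 85.0]
weights = [math.cos(math.radians(lat)) for lat in lats]
pred = [1.0, 2.0, 3.0]
target = [1.0, 1.0, 1.0]
print(area_weighted_mse(pred, target, weights))
```

The weighting keeps the many small cells near the poles from dominating the loss.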

Training takes about one week on 64 A100 GPUs. Due to the scarcity of resources, training cannot currently be done at NCI.

Web Inference Server (using Voila)

The NCI implementation of the AIFS Single v1.0 model runs as a Voila web app.
It is a complete app that requires no typed input from the user: all inference-related actions are performed by selecting items and clicking buttons.

To launch the app, perform the following two steps.

Log in to the ARE and go to the URL: https://are.nci.org.au/pun/sys/dashboard/batch_connect/sys/voila/
Then, fill out the web form with the information below (do not leave any spaces) and launch the app.

Path to notebook: /g/data/dk92/data/aifs-single/NCI_AIFS_v1.ipynb
Walltime (hours): 1
Queue: dgxa100
Compute Size: 1gpu
Project: Your project code
Storage: gdata/dk92+gdata/rt52
Module directories: /g/data/dk92/apps/Modules/modulefiles/
Modules: aifs-single-v1/2025.05.16

Jobfs size: 380GB
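Before submitting, the form values can be sanity-checked with a short script. The sketch below reads the "do not leave any spaces" instruction as "no whitespace inside the values"; the dictionary keys mirror the form labels, `ab12` is a placeholder project code, and the helper name is hypothetical.

```python
# Form values from the list above; "ab12" is a placeholder project code.
form = {
    "Path to notebook": "/g/data/dk92/data/aifs-single/NCI_AIFS_v1.ipynb",
    "Walltime (hours)": "1",
    "Queue": "dgxa100",
    "Compute Size": "1gpu",
    "Project": "ab12",
    "Storage": "gdata/dk92+gdata/rt52",
    "Module directories": "/g/data/dk92/apps/Modules/modulefiles/",
    "Modules": "aifs-single-v1/2025.05.16",
    "Jobfs size": "380GB",
}


def fields_with_spaces(values):
    """Return the names of fields whose value contains any whitespace."""
    return [
        name
        for name, value in values.items()
        if any(ch.isspace() for ch in value)
    ]


print(fields_with_spaces(form))  # []
```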

That's it. Now, wait for the app to start on Gadi (as shown below), and then click the blue button to open the Voila session in a web browser.
Afterwards, follow the instructions in the section below.

Interface

Once the blue button is clicked, you will see the following interface. You can select a date and time to run the inference; no manual input is required from the user.
The interface is divided into steps, each containing a short description and a button. Each step performs a different function. Read the descriptions and follow the steps.
There is an output window below the panes that will show the results of each step.

Output Visualisation

Once you have run the first four steps, you are ready for output visualisation, like the examples given below.
Select a datetime, variable, or label from the drop-down list, and the corresponding graph will be rendered.
There are three visualisation modes available, which can be selected with the radio buttons at the bottom-left corner of the above interface.

Gaussian grid

The first visualisation shows the Gaussian grid data points, which are the direct output of the model.

Example 1: Prediction, ground truth, and the difference between the two in Gaussian grid format.

Latitude-Longitude spatial grid

It is also possible to visualise the data in latitude-longitude spatial grid format. For this, select the second radio button in the visualisation mode pane of the interface. An example is shown below.

Example 2: Lat/Lon format visualisation of prediction, ground truth, and the difference between the two.

RMSE Metric

The Root Mean Squared Error (RMSE) is a well-known performance indicator for prediction models. It can also be selected from the visualisation mode pane.

Example 3: Statistical information such as the RMSE between the predictions and ground truths can be viewed by selecting the RMSE radio button.
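For reference, the RMSE between a prediction and the ground truth can be computed as in the minimal sketch below. The function name and sample values are hypothetical, and the sketch is unweighted; whether the app applies any area weighting is not stated here.

```python
import math


def rmse(pred, truth):
    """Root mean squared error between two equally long sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


# Hypothetical 2 m temperature values (K) at four grid points.
pred = [290.0, 291.5, 289.0, 288.0]
truth = [290.0, 290.5, 290.0, 288.0]
print(rmse(pred, truth))
```

A value of zero means the prediction matches the ground truth exactly; larger values indicate larger typical errors, in the same units as the variable.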

ECMWF ecCharts

ecCharts is the web-based visualisation system developed by ECMWF and is a widely used tool for exploring forecast products. Its front end is a jQuery- and AJAX-based web application, while the back end uses the Django web framework. At NCI, the functionality of ecCharts is replicated in JupyterLab. A visualisation is rendered by selecting a prediction date and area. All visualisations are rendered on demand, meaning each one is recreated whenever the prediction date or area changes. Viewing the ECMWF ecCharts requires a registered account and login; at NCI, however, the charts can be accessed freely without restriction.

Example 4: The ecCharts visualisation of 2 m dew point temperature in the Australasia region.

Contact

For further information, please get in touch with the NCI Data and Software Modernization Team (email: maruf.ahmed@anu.edu.au).
