Artificial intelligence weather models can now rival traditional forecasts on many everyday tests. But a new study finds that when the weather breaks records, a classic physics-based system still gives more trustworthy predictions.

Researchers examined global weather data from recent decades to isolate hundreds of thousands of local record-breaking heat, cold, and wind events.



The team compared how several AI systems and a European high-resolution forecast model handled those rare extremes up to ten days ahead.

Why extreme weather is hard to predict

The work was led by Zhongwei Zhang, a statistician at the University of Geneva who focuses on rare climate events. His team uses advanced statistics to understand risk and uncertainty in weather and climate records.

Scientists call these events record-breaking extremes: weather events that set a new all-time high or low at a given place and season. They matter because communities often plan around past records, so events far beyond those levels can cause outsized damage.

Modern forecasts still rely on numerical weather prediction, computer models that simulate how air, moisture, and energy move through the atmosphere using physical equations. 

These models demand large computing power but return detailed maps of temperature, wind, pressure, and rainfall for several days into the future.

Most tests simply check whether a model can reproduce conditions similar to those it has already seen in its training data.

The harder task, called extrapolation, is making reliable predictions when the atmosphere shifts into combinations of heat, cold, or wind outside that past experience.

Putting AI to the test

To study extrapolation, the team built a benchmark using reanalysis, a long-running data set that blends observations with one consistent model.

They searched data from 1979 to 2020 for each time local temperature or wind speed jumped past the previous record for a grid cell.

For 2020 alone, they found 162,751 heat records, 32,991 cold records, and 53,345 wind records across land worldwide. Because records were counted at each grid point and season, they could test models on rare extremes instead of relying on famous events.
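The record search described above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the function name `find_heat_records` and the sample temperatures are hypothetical, and it scans one series for one grid cell and season, flagging each value that exceeds everything before it.

```python
def find_heat_records(series):
    """Return the indices where a value exceeds every earlier value."""
    records = []
    running_max = series[0]  # the first value sets the baseline, not a record
    for i, value in enumerate(series[1:], start=1):
        if value > running_max:
            records.append(i)
            running_max = value  # the new record becomes the level to beat
    return records


# Hypothetical seasonal maxima (deg C) for a single grid cell, one per year
temps = [31.2, 30.8, 32.5, 32.1, 33.0, 32.9]
record_years = find_heat_records(temps)
print(record_years)
```

Cold records would follow the same pattern with a running minimum. Repeating the scan over every land grid cell and season is what yields counts in the hundreds of thousands.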

The comparison lined up several AI systems against the European Centre for Medium-Range Weather Forecasts (ECMWF) High-Resolution Forecast (HRES). That global model represents the atmosphere on a fine grid and moves forward in time using physical laws.

For each record, the team checked forecasts from lead times between half a day and ten days and calculated errors for temperature and wind.

Forecast accuracy was summarized with root mean square error, a statistic that grows as predictions stray farther from observed values.
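For readers who want to see the statistic concretely, here is a minimal sketch of root mean square error. The function name `rmse` and the sample numbers are illustrative assumptions, not values from the study.

```python
import math


def rmse(forecasts, observations):
    """Root mean square error between paired forecasts and observations."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical forecast and observed temperatures (deg C) at three record events
forecast_temps = [30.0, 32.0, 35.0]
observed_temps = [31.0, 32.0, 37.0]
print(round(rmse(forecast_temps, observed_temps), 3))
```

Because the errors are squared before averaging, a single large miss raises the score more than several small ones, which makes the measure sensitive to badly forecast extremes.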

How AI handles the extremes

When Zhang’s team looked at record-breaking heat, cold, and wind, the traditional forecast model beat every examined AI system at most lead times.

That advantage was largest for short lead times of a few days and shrank somewhat for forecasts that reached a week or more ahead.

For heat records, the AI systems tended to predict temperatures that were too low compared with what actually happened.

For cold records, they often made the chill sharper than it was, and these errors grew the further new records went beyond the previous ones.
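Root mean square error does not show the direction of a miss, so a signed average error is one common way to express "too low" or "too cold". The sketch below is an assumption for illustration, with a hypothetical function name `mean_bias` and made-up temperatures; it is not necessarily the statistic the study used.

```python
def mean_bias(forecasts, observations):
    """Average signed error: negative means forecasts ran below observations."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)


# Hypothetical heat records (deg C): forecasts fall short of the observed peaks
heat_bias = mean_bias([38.0, 39.5, 41.0], [39.0, 41.0, 42.5])

# Hypothetical cold records (deg C): forecasts are colder than what occurred
cold_bias = mean_bias([-22.0, -25.0], [-20.0, -24.0])

print(heat_bias, cold_bias)
```

In this toy example both biases come out negative: the heat forecasts are too cool, and the cold forecasts are colder than reality, matching the two error patterns described above.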

Forecasts of record-breaking weather

The AI systems also underestimated how often record-breaking events occurred, missing many records that were present in the observational data set.

In contrast, the numerical model captured about the right number of records for heat, cold, and wind, and better matched where records actually occurred.

These patterns suggest an out-of-distribution issue: an AI model performs reliably under familiar conditions, but its accuracy drops when inputs fall outside the range it was trained on.

Because these systems learn from past maps instead of explicit physical laws, they may place a limit on how intense rare events appear.

Why classic models still matter

One broad overview notes that machine learning has already reshaped modern weather forecasting and delivers strong results for many routine tasks.

At the same time, many researchers caution that rare extremes often remain a weak spot for purely data-driven systems.

In Zhang’s work, the physics-based model kept its accuracy even as records grew larger, showing the value of equations at a forecast system’s core.

Those equations help enforce balances of mass, energy, and momentum, letting the model explore unusual combinations of conditions beyond what has already happened.

This difference matters most for high-impact decisions. For heat planning, power grid reliability, or storm surge preparation, relying only on AI forecasts when records might fall risks underestimating dangerous extremes.

The future of AI and extreme weather

Other evaluations of recent disasters find that AI and traditional models trade wins depending on the event, including some heat waves and winter storms.

One assessment examined several such cases and still saw important limitations for AI when impacts were measured directly.

There are several clear paths to make AI weather tools safer for extremes. One option is to train them on simulated data from physical climate models, so they see more examples of rare events than history provides.

Another approach blends AI with traditional forecasting by replacing parts of a model with learned components but keeping the system tied to physical laws. 

Such hybrid designs aim to keep the speed and flexibility of machine learning while preserving the ability to handle events outside the training range.

For now, one strategy is to run AI forecasts and numerical models side by side when record-breaking heat, cold, or wind is possible.

This gives forecasters global guidance from AI and physically grounded checks from traditional models before issuing warnings that affect millions of people.

The study is published on arXiv.
