I've never studied weather forecasting, but I can't say I'm surprised. All of these models, AFAICT, are based on the "state" of the weather, but "state" deserves massive scare quotes: it's a bunch of 2D fields (wind speed, pressure, etc) -- note the <i>2D</i>. Actual weather dynamics happen in three dimensions, and three dimensional land features, buildings, etc as well as gnarly 2D surface phenomena (ocean surface temperature, ground surface temperature, etc) surely have strong effects.<p>On top of this, surely the actual observations that feed into the model are terrible -- they come from weather stations, sounding rockets, balloons, radar, etc, none of which seem likely to be especially accurate in all locations. Except that, where a weather station exists, the output of that station <i>is</i> the observation that people care about -- unless you're in an airplane, you don't personally care about the geopotential, but you do care about how windy it is, what the temperature and humidity are, and how much precipitation there is.<p>ISTM these dynamics ought to be better captured by learning them from actual observations than from trying to map physics both ways onto the rather limited datasets that are available. And a trained model could also learn about the idiosyncrasies of the observation and the extra bits of forcing (buildings, etc) that simply are not captured by the inputs.<p>(Heck, my personal in-my-head neural network can learn a mapping from NWS forecasts to NWS observations later in the same day that seems better than what the NWS itself produces. Surely someone could train a very simple model that takes NWS forecasts as inputs and produces its estimates of NWS observations during the forecast period as outputs, thus handling things like "the NWS consistently underestimates the daily high temperature at such-and-such location during a summer heat wave.")