On the Stochasticity of Reanalysis Outputs of 4D-Var
This work is motivated by the ECMWF CAMS reanalysis data, a valuable resource for researchers in environment-related fields because they contain the most up-to-date atmospheric composition information on a global scale. Unlike observational data obtained from monitoring equipment, reanalysis data are produced computationally via a 4D-Var data assimilation mechanism, so their stochastic properties remain largely unclear. This lack of knowledge limits their scope of use and hinders wider and more flexible statistical usage, especially in spatio-temporal modelling beyond uncertainty quantification and data fusion. This paper therefore studies the stochastic properties of these reanalysis outputs. Using measure theory, we prove the existence of spatial and temporal stochasticity associated with these reanalysis data and show that they are essentially realisations of digitised versions of hidden real-world spatial and/or temporal stochastic processes. Consequently, reanalysis outputs can be treated in the same way as observational data in practice, which allows more flexible spatio-temporal stochastic methodologies to be applied to them. We also analyse the different types of errors in the reanalysis data and determine their mutual dependence or independence, which together give clear guidance on modelling the error terms. The results provide a solid foundation for spatio-temporal modellers and environmental AI researchers to work directly with these reanalysis outputs using stochastic models.