SIAM News Blog

Deep Learning-Augmented Data Assimilation for Next-Generation Predictive Models

Direct observations of real-world systems (e.g., from sensors or satellites) are often noisy and sparse, and the dynamical models that represent these systems are often imperfect (e.g., due to missing physics). Data assimilation (DA) provides a framework that combines observations and model predictions to produce better estimates of the system's state. It is thus an essential component of predictive modeling across engineering and the physical sciences, from weather forecasting to cardiovascular simulation [1, 5]. However, DA for high-dimensional, nonlinear systems is challenging; for instance, the widely used ensemble Kalman filter (EnKF) requires many forward integrations of a computationally expensive dynamical model to produce an ensemble of background forecasts for estimating covariance matrices [4]. Here we leverage deep learning-based, data-driven surrogates that efficiently and accurately produce covariance matrices to enhance ensemble-based DA methods—such as EnKF—for the next generation of predictive models.

Challenges of DA in High-Dimensional Systems 

A popular choice for DA in high-dimensional, nonlinear systems is ensemble-based methods like the EnKF [4]. Unlike variational DA algorithms, ensemble-based methods do not require the dynamical model’s adjoint. Moreover, because the EnKF estimates the background covariance matrix from an ensemble of forward integrations, the covariance can evolve in time with the system.

However, challenges arise when the number of ensemble members is much smaller than the system’s dimension. This is the case in many applications—including weather forecasting—in which the affordable ensemble size may be \(O(10)\) while the dimension may be \(O(10^7)\). Consequently, the estimated covariance matrix is often highly rank-deficient and may exhibit spurious long-range correlations that arise from sampling error. This outcome necessitates remedies such as localization, in which one explicitly removes long-range correlations [1]. But since localization can also suppress real long-range correlations [6], alternative approaches are needed. We introduce one such approach in the next section.
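A small numerical experiment illustrates this sampling problem. The sketch below is a toy example with hypothetical sizes (not the article's setup): it builds an ensemble of smoothed random fields whose true correlations vanish beyond a few grid points, estimates the covariance from only ten members, and then tapers the result with a hard distance cutoff as a simple stand-in for the smooth Gaspari-Cohn localization function.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_small = 200, 10             # state dimension >> ensemble size
window = np.ones(11) / 11          # local smoother: true correlations vanish beyond ~10 grid points

# Ensemble of smoothed white-noise fields (stand-ins for background forecasts)
ens = np.array([np.convolve(rng.standard_normal(dim), window, mode="same")
                for _ in range(n_small)])

anom = ens - ens.mean(axis=0)
P = anom.T @ anom / (n_small - 1)  # sample covariance; rank <= n_small - 1

# Spurious correlation between grid points 0 and 100, far beyond the true correlation length
corr = P[0, 100] / np.sqrt(P[0, 0] * P[100, 100])

# Localization: zero out entries beyond a cutoff distance (hard cutoff as a
# simple stand-in for the smooth Gaspari-Cohn taper)
i, j = np.meshgrid(np.arange(dim), np.arange(dim), indexing="ij")
taper = (np.abs(i - j) <= 20).astype(float)
P_loc = taper * P
```

With only ten members, `P` is rank-deficient and `corr` is nonzero even though the true correlation at that distance is exactly zero; localization removes it, but would equally remove a real long-range correlation.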

<strong>Figure 1.</strong> The hybrid ensemble Kalman filter (H-EnKF) framework. <strong>1a.</strong> Combining the data-driven surrogate generating \(n_D\) ensemble members and the numerical model generating \(n_N\) ensemble members enhances the accuracy of EnKF. <strong>1b.</strong> Data-driven U-NET is an example surrogate model for short-term (one-day) forecasting. Here, \(\alpha=200\Delta t \approx 1\) day, where \(\Delta t\) is the time step of the numerical solver. Figure courtesy of Ashesh Chattopadhyay.

Deep Learning-Augmented Hybrid Ensemble Kalman Filter

Fully data-driven forecasting models that are based on deep learning techniques have shown promising results at short timescales [2, 7]. In weather forecasting, for example, such predictions are accurate up to a few days, though they are not yet competitive with numerical dynamical models at longer timescales [8]. We propose to leverage the short-term accuracy of data-driven, deep learning-based surrogates, which can efficiently generate a very large number of ensemble members [3], to estimate an evolving, accurate covariance matrix. In this hybrid ensemble Kalman filter (H-EnKF) approach, we train a data-driven surrogate for one-day predictions with a deep convolutional network (U-NET); we then utilize the surrogate to produce a large ensemble with \(n_D\) members (see Figure 1). This large ensemble yields the covariance matrix \(\boldsymbol{P}_D\) and the Kalman gain \(\boldsymbol{K}_D\). A smaller number of ensemble members (\(n_N \ll n_D\))—generated from the expensive, numerically integrated dynamical model—estimates the background forecast \(X^k_N\). This hybrid use of deep learning-based surrogates to approximate the Kalman gain and of the dynamical model’s numerical solution to produce the background forecast leads to accurate, stable predictions and DA cycles. Note that the conventional EnKF computes the covariance matrix and Kalman gain from the small number of ensemble members obtained via numerical integration of the dynamical model; conversely, using the surrogate to also compute the background forecast leads to inaccurate predictions in the long term.
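The H-EnKF analysis step can be sketched as follows. This is a minimal toy illustration, not the article's implementation: the random arrays stand in for the U-NET surrogate's large ensemble and the numerical model's small ensemble, and the identity observation operator, tiny observation-error covariance, and small dimensions are assumptions made here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_D, n_N = 8, 500, 5           # toy sizes; the article uses O(10^3) surrogate vs O(10) numerical members

# Stand-ins: in practice these come from the trained U-NET surrogate and from
# numerical integration of the dynamical model, respectively.
surrogate_ens = rng.standard_normal((n_D, dim))   # large, cheap ensemble -> covariance
numerical_ens = rng.standard_normal((n_N, dim))   # small, expensive ensemble -> background

H = np.eye(dim)                     # observation operator (full-state observations assumed)
R = 1e-6 * np.eye(dim)              # observation-error covariance (assumed small)
y = rng.standard_normal(dim)        # synthetic observation vector

# Background covariance P_D from the data-driven ensemble
A = surrogate_ens - surrogate_ens.mean(axis=0)
P_D = A.T @ A / (n_D - 1)

# Kalman gain K_D from the surrogate-based covariance
K_D = P_D @ H.T @ np.linalg.inv(H @ P_D @ H.T + R)

# Analysis: update the numerically integrated background mean with K_D
x_b = numerical_ens.mean(axis=0)
x_a = x_b + K_D @ (y - H @ x_b)
```

Because the large ensemble makes `P_D` full-rank here, no localization is applied; with the nearly noise-free observations assumed above, the analysis is pulled almost entirely toward the observations.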

We tested H-EnKF on a two-layer quasi-geostrophic (QG) test case: a set of nonlinear partial differential equations that represent turbulence in the atmosphere and ocean. Here we use \(n_D \approx O(10^3)\)—comparable to the system’s dimension—and \(n_N \approx O(10)\). The large \(n_D\) is affordable because of the low computational cost of the trained data-driven surrogate; one numerically integrated ensemble member costs as much as 200 surrogate members. From this ensemble of forecasts, we estimate the background covariance matrix \(\boldsymbol{P}_D\) (equation (1) in Figure 1) and the Kalman gain \(\boldsymbol{K}_D\) (equation (2) in Figure 1). Because of the large ensemble, this covariance matrix is less susceptible to spurious correlations and does not require localization or any other modification. Next, we use the \(n_N\) ensemble members from numerical integrations of the QG equations to obtain the background forecast. Finally, we compute an analysis state \(\boldsymbol{\overline{X}}_a\) that serves as the improved initial condition for forecasting by the dynamical model. For the same computational cost, H-EnKF is accurate and stable while EnKF suffers from filter divergence in the absence of localization (see Figure 2).
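The accuracy metrics reported in Figure 2 can be computed along the following lines. These are hypothetical helper functions written here for illustration; the article does not spell out its exact definitions (e.g., the choice of climatology for the ACC).

```python
import numpy as np

def rmse(pred, truth):
    """Root-mean-square error between a forecast and the truth over all grid points."""
    return np.sqrt(np.mean((pred - truth) ** 2))

def acc(pred, truth, clim):
    """Anomaly correlation coefficient: correlation of forecast and truth
    anomalies relative to a climatological mean state."""
    a, b = pred - clim, truth - clim
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
```

A perfect forecast gives an RMSE of 0 and an ACC of 1; in practice both are tracked over successive DA cycles, and a growing RMSE signals filter divergence.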

<strong>Figure 2.</strong> Performance of the hybrid ensemble Kalman filter (H-EnKF) over 60 data assimilation (DA) cycles. <strong>2a.</strong> Root-mean-square error (RMSE) over 60 DA cycles for H-EnKF \((n_D=2000,\) \(n_N=10)\), H-EnKF \((n_D=1000,\) \(n_N=10)\), and EnKF \((n_D=0,\) \(n_N=20)\). <strong>2b.</strong> Same as 2a but for the anomaly correlation coefficient (ACC). <strong>2c.</strong> Magnified view of 2a between \(0 \le \textit{RMSE} \le 0.15\). <strong>2d.</strong> Magnified view of 2b between \(0.9 \le \textit{ACC} \le 1.0\). Shading shows standard deviation over 30 initial conditions. H-EnKF is accurate and stable while EnKF is not. The computational costs of H-EnKF \((n_D=2000,\) \(n_N=10)\) and EnKF \((n_D=0,\) \(n_N=20)\) are the same, while that of H-EnKF \((n_D=1000,\) \(n_N=10)\) is 1/10<sup>th</sup> of EnKF \((n_D=0,\) \(n_N=20)\). Figure courtesy of Ashesh Chattopadhyay.

Although we have limited our discussion of H-EnKF to the context of weather forecasting, this hybrid approach can be useful in a broad range of applications in engineering and natural systems, thus making the implementation of ensemble-based DA affordable and straightforward. The method is nonintrusive in that no modification of the numerical model is necessary; the main component to build is a data-driven surrogate model (an emulator, also known as a digital twin) that ultimately performs short-term forecasts with reasonable accuracy.

Ashesh Chattopadhyay presented this research during a minisymposium at the 2021 SIAM Annual Meeting, which took place virtually in July 2021.

References

[1] Carrassi, A., Bocquet, M., Bertino, L., & Evensen, G. (2018). Data assimilation in the geosciences: An overview of methods, issues, and perspectives. WIREs Clim. Change, 9(5), e535.
[2] Chattopadhyay, A., Hassanzadeh, P., & Subramanian, D. (2020). Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: Reservoir computing, artificial neural network, and long short-term memory network. Nonlin. Proc. Geophys., 27(3), 373-389.
[3] Chattopadhyay, A., Mustafa, M., Hassanzadeh, P., Bach, E., & Kashinath, K. (2021). Towards physically consistent data-driven weather forecasting: Integrating data assimilation with equivariance-preserving spatial transformers in a case study with ERA5. Geosci. Model Dev. Under review.
[4] Evensen, G. (1994). Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res.: Oceans, 99(C5), 10143-10162.
[5] Habibi, M., D’Souza, R.M., Dawson, S.T., & Arzani, A. (2021). Integrating multi-fidelity blood flow data with reduced-order data assimilation. Comput. Biol. Med., 135, 104566.
[6] Miyoshi, T., Kondo, K., & Imamura, T. (2014). The 10,240-member ensemble Kalman filtering with an intermediate AGCM. Geophys. Res. Lett., 41(14), 5264-5271.
[7] Pathak, J., Hunt, B., Girvan, M., Lu, Z., & Ott, E. (2018). Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Phys. Rev. Lett., 120(2), 024102.
[8] Weyn, J.A., Durran, D.R., & Caruana, R. (2020). Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst., 12(9), e2020MS002109.

About the Authors