A Sequential Monte Carlo Method for Bayesian Analysis of Massive Datasets

Greg Ridgeway

Abstract. We present a method for making Bayesian analysis of massive datasets computationally feasible. The algorithm simulates from a posterior distribution that conditions on a smaller, more manageable portion of the dataset. The remainder of the dataset may be incorporated by reweighting the initial draws using importance sampling. Computation of the importance weights requires a single scan of the remaining observations. While importance sampling increases efficiency in data access, it comes at the expense of estimation efficiency. A simple modification, based on the rejuvenation step used in particle filters for dynamic systems models, sidesteps the loss of efficiency with only a slight increase in the number of data accesses. Examples demonstrating proof-of-concept show more than a 98% reduction in the number of data accesses over the standard implementation of the Metropolis-Hastings algorithm.
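The core idea of the abstract can be sketched in a few lines: draw from the posterior given a manageable subset, then fold in the remaining data via importance weights computed in a single scan. The sketch below is a minimal, hypothetical illustration using a normal-mean model with known unit variance and a flat prior (the model, sample sizes, and variable names are assumptions for illustration, not the paper's examples); for this model the weights depend on the remainder only through its sufficient statistics, so one pass over the leftover observations suffices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "massive" dataset: N(true_mu, 1) observations.
true_mu = 2.0
data = rng.normal(true_mu, 1.0, size=100_000)
subset, remainder = data[:1_000], data[1_000:]

# With a flat prior and known unit variance, the posterior for mu given
# only the subset is N(mean(subset), 1/n1); draw from it directly.
n1 = len(subset)
draws = rng.normal(subset.mean(), np.sqrt(1.0 / n1), size=5_000)

# Importance weight for each draw: the likelihood of the remaining
# observations under that draw. A single scan of the remainder yields
# the sufficient statistics (sum and sum of squares) that determine it.
n2 = len(remainder)
s, ss = remainder.sum(), (remainder ** 2).sum()
log_w = -0.5 * (ss - 2.0 * draws * s + n2 * draws ** 2)  # up to a constant
log_w -= log_w.max()                                     # stabilize exp()
w = np.exp(log_w)
w /= w.sum()

# Weighted draws now approximate the posterior given ALL the data.
post_mean = np.sum(w * draws)
```

As the abstract notes, the reweighting can concentrate mass on a few draws (low effective sample size), which is the estimation-efficiency loss that the rejuvenation step is designed to repair.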