Course description

Bayesian hierarchical models have been widely deployed for analyzing spatial and spatio-temporal datasets commonly encountered in forestry, ecology, agriculture, and climate sciences. However, with the rapid development of remote sensing and environmental monitoring systems, statisticians and data analysts frequently encounter massive spatial and spatio-temporal data that cannot be analyzed using traditional approaches due to their heavy computing demands. In this course, we will present scalable Bayesian models and related estimation methods that provide fast analysis of big spatial and spatio-temporal data using modest computing resources and standard statistical software environments such as R.

We will begin with an introduction to the common types of geo-referenced spatial data, then survey software packages for exploratory and subsequent statistical analysis. We will briefly cover exploratory data analysis techniques such as variogram fitting, the basics of geostatistical approaches such as kriging, and Gaussian Processes. We will then highlight key computational issues that Gaussian Process models face when confronted with large datasets. In this context, we will introduce scalable Bayesian models that can deliver fully model-based inference for massive spatial data. This discussion will focus on the Nearest Neighbor Gaussian Process (NNGP), which yields computational gains while providing rich Bayesian inference for analyzing large univariate and multivariate spatial data. We will also present a comparative assessment of other related methods and strategies for large spatial data, including low-rank models.

We will demonstrate practical implementation of these models using the newly developed spNNGP and spOccupancy R packages. All topics will be motivated using real data, and participants will be encouraged to follow along with the analyses on their own laptops. Motivating data will come from forestry, agriculture, and wildlife monitoring applications. The workshop will close with a short, focused session on occupancy modeling to assess wildlife species distributions while explicitly accounting for measurement errors common in detection-nondetection data.

We will not assume any significant previous exposure to spatial or spatio-temporal methods or Bayesian inference, although participants with basic knowledge of these areas will experience a gentler learning curve.


Before the course starts

This course offers lectures, discussion, and hands-on exercises on efficient computing for spatial data models. We encourage you to work along with us on the exercises. To participate fully in the exercises, you’ll need a recent version of R (version 4.2 or later).
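
The version requirement can be checked from within R before the course. The snippet below is a minimal sketch using base R only; the variable names min.version and r.ok are illustrative and not part of the course materials.

```r
# Minimum R version stated for the course exercises.
min.version <- "4.2.0"

# getRversion() returns the running R version in a form that can be
# compared directly against a version string.
r.ok <- getRversion() >= min.version

if (r.ok) {
  message("R ", as.character(getRversion()), " meets the course requirement.")
} else {
  message("R ", as.character(getRversion()), " is older than ", min.version,
          "; please update R before the course.")
}
```

If the second message appears, installing a current release of R from CRAN before the course should resolve it.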

Installing spBayes, spNNGP, and spOccupancy

We will use the spBayes, spNNGP, and spOccupancy packages to fit spatial models. All three can be installed from CRAN by running the following in R:

install.packages(c('spBayes', 'spNNGP', 'spOccupancy'))
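
After installation, it can help to confirm that R can find each of the three packages. The following is a minimal sketch using base R's requireNamespace(); the names course.packages and available are illustrative assumptions, not part of the course materials.

```r
# The three modeling packages used in the course.
course.packages <- c('spBayes', 'spNNGP', 'spOccupancy')

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise, without attaching it or stopping with an error.
available <- vapply(course.packages, requireNamespace, logical(1),
                    quietly = TRUE)
print(available)

if (!all(available)) {
  message("Not yet available: ",
          paste(course.packages[!available], collapse = ", "))
}
```

Any package reported as not yet available can be installed again with install.packages().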

Installing additional R packages

In the exercises, we will use additional R packages for exploratory data analysis and visualization. To participate fully in the exercises, we encourage you to install the packages below if you do not already have them. Running the code below in R installs only those packages that are not already on your system.

# Additional packages used in the exercises.
required.packages <- c('coda', 'MCMCvis', 'ggplot2', 'pals', 'sf', 'maps', 'stars',
                       'MBA', 'geoR', 'raster', 'leaflet', 'sp', 'fields', 'classInt')
# Keep only the packages that are not already installed.
new.packages <- required.packages[!(required.packages %in% installed.packages()[, 'Package'])]
if (length(new.packages) > 0) {
  install.packages(new.packages)
}

Course schedule (download the full zip from the course GitHub page)

Full PDF of all course slides