Colorado spring temperature data

These data were originally part of the fields package’s COmonthlyMet dataset.

rm(list = ls())
load("CO-temp-data.RData")
ls()
[1] "coords"      "elev"        "pred.coords" "pred.elev"   "temp"       

Our goal is to create a complete prediction surface of minimum spring temperature with associated estimates of uncertainty.

We begin by loading the necessary packages.

library(spBayes)
library(MBA)
library(geoR)
library(raster)
library(leaflet)
library(sp)

Next, set up a leaflet basemap to help visualize the data and model output. We’ll make heavy use of the pipe operator %>% in the leaflet code to reduce clutter.

blue.red <- c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c")

base.map <- leaflet(width = "100%") %>%
    addProviderTiles("Stamen.Terrain", group = "Terrain") %>%
    addProviderTiles("Esri.WorldImagery", group = "Satellite") %>%
    addLayersControl(baseGroup = c("Terrain", "Satellite"), options = layersControlOptions(collapsed = FALSE))

Take a look at station locations and mean minimum spring temperatures across Colorado. This code below produces a clickable dynamic map.

pal <- colorNumeric(blue.red, domain = temp)

base.map %>%
    addCircleMarkers(lng = coords[, 1], lat = coords[, 2], col = pal(temp), stroke = FALSE,
        radius = 5, fillOpacity = 0.9, popup = paste("Mean min temp:", round(temp,
            1))) %>%
    addLegend("bottomright", pal = pal, values = temp, opacity = 0.9, title = "Temperature C")

Fit a non-spatial regression

Consider the non-spatial regression and a little exploratory data analysis (EDA).

lm.obj <- lm(temp ~ lon + lat, data = coords)
summary(lm.obj)

Call:
lm(formula = temp ~ lon + lat, data = coords)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.1462 -2.1591  0.6497  1.9584  8.1021 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 79.46995   12.28571   6.468 6.84e-10 ***
lon          0.63528    0.09277   6.848 8.13e-11 ***
lat         -0.35593    0.15576  -2.285   0.0233 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.191 on 210 degrees of freedom
Multiple R-squared:  0.1873,    Adjusted R-squared:  0.1796 
F-statistic:  24.2 on 2 and 210 DF,  p-value: 3.478e-10

We’ll reproject the geographic coordinates (i.e., longitude and latitude) to a coordinate system that provides more familiar distance units and reduces spatial distortion. Here we selected Universal Transverse Mercator (UTM) coordinate system (with km distance units).

## Convert coords to spatial points and reproject.
coordinates(coords) <- ~lon + lat
proj4string(coords) <- "+proj=longlat +datum=WGS84"

coords.utm <- spTransform(coords, CRS("+proj=utm +zone=13 +ellps=WGS84 +datum=WGS84 +units=km +no_defs"))
coords.utm <- coordinates(coords.utm)

## Convert spatial points to matrix for later mapping.
coords <- coordinates(coords)

Next let’s take a look at the regression model residuals, assess their spatial independence, and start thinking about variogram and covariance parameters.

d.max <- max(iDist(coords.utm))
d.max
[1] 841.6322
v.temp <- variog(coords = coords.utm, data = temp, uvec = (seq(0, 0.75 * d.max, length = 20)))
variog: computing omnidirectional variogram
v.resid <- variog(coords = coords.utm, data = resid(lm.obj), uvec = (seq(0, 0.75 *
    d.max, length = 20)))
variog: computing omnidirectional variogram
par(mfrow = c(1, 2))
plot(v.temp, xlab = "Distance (km)")
plot(v.resid, xlab = "Distance (km)")

It is also very helpful to create an interpolated surface of the model residuals to further assess spatial structure and potentially identify missing covariates.

resid.surf <- mba.surf(cbind(coords, resid(lm.obj)), no.X = 200, no.Y = 200, extend = TRUE,
    sp = TRUE)$xyz.est

proj4string(resid.surf) <- "+proj=longlat +datum=WGS84"

resid.surf <- raster(resid.surf)

pal <- colorNumeric(blue.red, values(resid.surf), na.color = "transparent")

base.map %>%
    addRasterImage(resid.surf, colors = pal, opacity = 0.75, group = "Regression residuals") %>%
    addLegend("bottomright", pal = pal, values = values(resid.surf), opacity = 0.75,
        title = "<center>Regression<br> residuals</center>") %>%
    addLayersControl(baseGroup = c("Terrain", "Satellite"), overlayGroups = c("Regression residuals"),
        options = layersControlOptions(collapsed = FALSE))

Fit some spatial regression models

There’s substantial evidence of residual spatial structure, so let’s fit some models with spatially-structured random effects. First we’ll fit the bayesGeostatExact() model then the spLM() model both from the spBayes package.

n.samples <- 5000

n <- length(temp)
p <- 3  ##Intercept and spatial locations

phi <- 3/200
alpha <- 1/12  #tau^2/sigma^2

beta.prior.mean <- as.matrix(rep(0, times = p))
beta.prior.precision <- matrix(0, nrow = p, ncol = p)

## Assuming an inverse Gamma prior on sigma^2
sigma.sq.prior.shape <- 2
sigma.sq.prior.rate <- 1/12  ##1/scale

m.exact <- bayesGeostatExact(temp ~ coords, coords = coords.utm, beta.prior.mean = beta.prior.mean,
    beta.prior.precision = beta.prior.precision, phi = phi, alpha = alpha, sigma.sq.prior.shape = sigma.sq.prior.shape,
    sigma.sq.prior.rate = sigma.sq.prior.rate, n.samples = n.samples)
-------------------------------------------------
    General model description
-------------------------------------------------
Model fit with 213 observations.
Number of covariates 3 (including intercept if specified).
Using the exponential spatial correlation model.

-------------------------------------------------
        Sampling
-------------------------------------------------
Sampled: 5000 of 5000, 100%
-------------------------------------------------
    Recovering spatial effects
-------------------------------------------------
names(m.exact)
 [1] "X"                    "n"                    "p"                   
 [4] "Y"                    "n.samples"            "beta.prior.mean"     
 [7] "beta.prior.precision" "coords"               "cov.model"           
[10] "phi"                  "alpha"                "sigma.sq.prior.shape"
[13] "sigma.sq.prior.rate"  "verbose"              "p.samples"           
[16] "sp.effects"          
plot(m.exact$p.samples, density = FALSE)

Now let’s fit the full Bayesian model using spLM().

cov.model <- "exponential"

starting <- list(phi = 3/(0.5 * d.max), sigma.sq = 5, tau.sq = 5)

tuning <- list(phi = 0.5, sigma.sq = 0.05, tau.sq = 0.05)

priors <- list("beta.Flat", phi.Unif = c(3/d.max, 3/(0.01 * d.max)), sigma.sq.IG = c(2,
    5), tau.sq.IG = c(2, 5))

m.1 <- spLM(temp ~ coords, coords = coords.utm, starting = starting, tuning = tuning,
    priors = priors, cov.model = cov.model, n.samples = n.samples, n.report = 2000)
----------------------------------------
    General model description
----------------------------------------
Model fit with 213 observations.

Number of covariates 3 (including intercept if specified).

Using the exponential spatial correlation model.

Number of MCMC samples 5000.

Priors and hyperpriors:
    beta flat.
    sigma.sq IG hyperpriors shape=2.00000 and scale=5.00000
    tau.sq IG hyperpriors shape=2.00000 and scale=5.00000
    phi Unif hyperpriors a=0.00356 and b=0.35645
-------------------------------------------------
        Sampling
-------------------------------------------------
Sampled: 2000 of 5000, 40.00%
Report interval Metrop. Acceptance rate: 36.60%
Overall Metrop. Acceptance rate: 36.60%
-------------------------------------------------
Sampled: 4000 of 5000, 80.00%
Report interval Metrop. Acceptance rate: 35.20%
Overall Metrop. Acceptance rate: 35.90%
-------------------------------------------------
plot(m.1$p.theta.samples)

The spLM() function only updates the covariance parameters. The regression coefficients \(\beta\)’s and spatial random effects \(w\) are recovered via the spRecover() function that performs composition sampling using post burn-in samples.

burn.in <- floor(0.75 * n.samples)

m.1 <- spRecover(m.1, start = burn.in, thin = 2)
-------------------------------------------------
        Recovering beta and w
-------------------------------------------------
Sampled: 99 of 626, 15.81%
Sampled: 199 of 626, 31.79%
Sampled: 299 of 626, 47.76%
Sampled: 399 of 626, 63.74%
Sampled: 499 of 626, 79.71%
Sampled: 599 of 626, 95.69%
round(summary(m.1$p.theta.recover.samples)$quantiles[, c(3, 1, 5)], 3)
           50%  2.5%  97.5%
sigma.sq 9.719 5.545 15.584
tau.sq   1.101 0.682  1.775
phi      0.005 0.004  0.010
round(summary(m.1$p.beta.recover.samples)$quantiles[, c(3, 1, 5)], 3)
               50%    2.5%   97.5%
(Intercept) 45.498 -43.370 128.072
coordslon    0.249  -0.478   0.956
coordslat   -0.527  -1.669   0.643

Now let’s take a look at the estimates of the random effects compared with the non-spatial regression residuals.

quants <- function(x) {
    quantile(x, prob = c(0.5, 0.025, 0.975))
}

## Random effects from bayesGeostatExact()
w.exact.summary <- apply(m.exact$sp.effects, 1, quants)

w.exact.surf <- mba.surf(cbind(coords, w.exact.summary[1, ]), no.X = 200, no.Y = 200,
    extend = TRUE, sp = TRUE)$xyz.est
proj4string(w.exact.surf) <- "+proj=longlat +datum=WGS84"
w.exact.surf <- raster(w.exact.surf)

## Random effects from spLM()
w.m1.summary <- apply(m.1$p.w.recover.samples, 1, quants)

w.m1.surf <- mba.surf(cbind(coords, w.m1.summary[1, ]), no.X = 200, no.Y = 200, extend = TRUE,
    sp = TRUE)$xyz.est
proj4string(w.m1.surf) <- "+proj=longlat +datum=WGS84"
w.m1.surf <- raster(w.m1.surf)

## Make the color ramp
all.values <- c(values(w.exact.surf), values(w.m1.surf), values(resid.surf))
pal <- colorNumeric(blue.red, all.values, na.color = "transparent")

base.map %>%
    addRasterImage(w.exact.surf, colors = pal, opacity = 0.75, group = "bayesGeostatExact") %>%
    addRasterImage(w.m1.surf, colors = pal, opacity = 0.75, group = "spLM m.1") %>%
    addRasterImage(resid.surf, colors = pal, opacity = 0.75, group = "Regression residuals") %>%
    addLegend("bottomright", pal = pal, values = all.values, opacity = 0.75, title = "<center>Random effects &<br>Regression<br>residuals</center>") %>%
    addLayersControl(baseGroup = c("Terrain", "Satellite"), overlayGroups = c("bayesGeostatExact",
        "spLM m.1", "Regression residuals"), options = layersControlOptions(collapsed = FALSE)) %>%
    hideGroup(c("spLM m.1", "Regression residuals"))

Looks like the spatial random effects are picking up the effect of elevation so let’s just put it in the mean and refit the non-spatial and spLM() models.

lm.obj <- lm(temp ~ elev + lon + lat, data = data.frame(coords))
summary(lm.obj)

Call:
lm(formula = temp ~ elev + lon + lat, data = data.frame(coords))

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6706 -0.8541 -0.1308  0.9088  3.5183 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.9372343  5.9584509   1.332    0.184    
elev        -0.0059590  0.0002029 -29.374  < 2e-16 ***
lon         -0.1982295  0.0499149  -3.971 9.83e-05 ***
lat         -0.4951096  0.0691068  -7.164 1.32e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.412 on 209 degrees of freedom
Multiple R-squared:  0.8415,    Adjusted R-squared:  0.8393 
F-statistic:   370 on 3 and 209 DF,  p-value: < 2.2e-16
v <- variog(coords = coords.utm, data = resid(lm.obj), uvec = (seq(0, 0.75 * d.max,
    length = 20)))
variog: computing omnidirectional variogram
plot(v, xlab = "Distance (km)")

Now call spLM() with elevation as a covariate after adjusting the variance priors then recover the parameters via the spRecover() function.

priors$sigma.sq.IG <- c(2, 1)
priors$tau.sq.IG <- c(2, 1)

m.2 <- spLM(temp ~ elev + coords, coords = coords.utm, starting = starting, tuning = tuning,
    priors = priors, cov.model = cov.model, n.samples = n.samples, n.report = 2000)
----------------------------------------
    General model description
----------------------------------------
Model fit with 213 observations.

Number of covariates 4 (including intercept if specified).

Using the exponential spatial correlation model.

Number of MCMC samples 5000.

Priors and hyperpriors:
    beta flat.
    sigma.sq IG hyperpriors shape=2.00000 and scale=1.00000
    tau.sq IG hyperpriors shape=2.00000 and scale=1.00000
    phi Unif hyperpriors a=0.00356 and b=0.35645
-------------------------------------------------
        Sampling
-------------------------------------------------
Sampled: 2000 of 5000, 40.00%
Report interval Metrop. Acceptance rate: 38.45%
Overall Metrop. Acceptance rate: 38.45%
-------------------------------------------------
Sampled: 4000 of 5000, 80.00%
Report interval Metrop. Acceptance rate: 35.70%
Overall Metrop. Acceptance rate: 37.08%
-------------------------------------------------
m.2 <- spRecover(m.2, start = burn.in, thin = 2)
-------------------------------------------------
        Recovering beta and w
-------------------------------------------------
Sampled: 99 of 626, 15.81%
Sampled: 199 of 626, 31.79%
Sampled: 299 of 626, 47.76%
Sampled: 399 of 626, 63.74%
Sampled: 499 of 626, 79.71%
Sampled: 599 of 626, 95.69%
round(summary(m.2$p.theta.recover.samples)$quantiles[, c(3, 1, 5)], 2)
          50% 2.5% 97.5%
sigma.sq 1.18 0.69  2.50
tau.sq   0.99 0.60  1.31
phi      0.01 0.00  0.03
round(summary(m.2$p.beta.recover.samples)$quantiles[, c(3, 1, 5)], 2)
              50%   2.5% 97.5%
(Intercept) 10.04 -22.25 34.82
elev        -0.01  -0.01 -0.01
coordslon   -0.19  -0.48  0.05
coordslat   -0.54  -0.90 -0.14
w.2.summary <- apply(m.2$p.w.recover.samples, 1, quants)

Now let’s take a look at these new random effects compared with those from m.1.

w.m2.surf <- mba.surf(cbind(coords, w.2.summary[1, ]), no.X = 200, no.Y = 200, extend = TRUE,
    sp = TRUE)$xyz.est
proj4string(w.m2.surf) <- "+proj=longlat +datum=WGS84"
w.m2.surf <- raster(w.m2.surf)

pal.2 <- colorNumeric(blue.red, values(w.m2.surf), na.color = "transparent")

base.map %>%
    addRasterImage(w.m2.surf, colors = pal.2, opacity = 0.75, group = "spLM m.2") %>%
    addRasterImage(w.m1.surf, colors = pal, opacity = 0.75, group = "spLM m.1") %>%
    addRasterImage(resid.surf, colors = pal, opacity = 0.75, group = "Regression residuals") %>%
    addLegend("bottomleft", pal = pal.2, values = values(w.m2.surf), opacity = 0.75,
        title = "<center>Random effects (m.2)</center>") %>%
    addLegend("bottomright", pal = pal, values = all.values, opacity = 0.75, title = "<center>Random effects (m.1) &<br>Regression<br>residuals</center>") %>%
    addLayersControl(baseGroup = c("Terrain", "Satellite"), overlayGroups = c("spLM m.1",
        "spLM m.2", "Regression residuals"), options = layersControlOptions(collapsed = FALSE)) %>%
    hideGroup(c("spLM m.1", "Regression residuals"))

###Prediction

Recall, our interest is in creating a temperature surface with associated uncertainty. This is done by sampling from posterior predictive distributions at 17,545 new locations across Colorado. These prediction locations are held in pred.coords with spatially coinciding elevation in pred.elev. Sampling is done using the spPredict() function. Ideally, we would draw more posterior predictive samples, but in the interest of time we take only a small sample to generate the summary statistics and maps. Using more CPU would greatly increase prediction speed.

pred.X <- cbind(1, pred.elev, pred.coords)

m.2.pred <- spPredict(m.2, pred.covars = pred.X, pred.coords = pred.coords, start = burn.in,
    thin = 20, n.report = 10)
-------------------------------------------------
        Recovering beta
-------------------------------------------------
----------------------------------------
    General model description
----------------------------------------
Model fit with 213 observations.

Prediction at 17545 locations.

Number of covariates 4 (including intercept if specified).

Using the exponential spatial correlation model.

-------------------------------------------------
        Sampling
-------------------------------------------------
Sampled: 10 of 63, 14.29%
Sampled: 20 of 63, 30.16%
Sampled: 30 of 63, 46.03%
Sampled: 40 of 63, 61.90%
Sampled: 50 of 63, 77.78%
Sampled: 60 of 63, 93.65%
y.p.summary <- apply(m.2.pred$p.y.predictive.samples, 1, quants)

y.p.width <- y.p.summary[3, ] - y.p.summary[2, ]

The summaries of the posterior predictive distributions can be mapped or exported for use in a GIS system.

y.median.surf <- rasterFromXYZ(cbind(pred.coords, y.p.summary[1, ]))
proj4string(y.median.surf) <- "+proj=longlat +datum=WGS84"
pal.1 <- colorNumeric(blue.red, c(values(y.median.surf), temp), na.color = "transparent")

y.width.surf <- rasterFromXYZ(cbind(pred.coords, y.p.width))
proj4string(y.width.surf) <- "+proj=longlat +datum=WGS84"
pal.2 <- colorNumeric(blue.red, values(y.width.surf), na.color = "transparent")

base.map %>%
    addRasterImage(y.median.surf, colors = pal.1, opacity = 0.9, group = "Predicted temperature") %>%
    addRasterImage(y.width.surf, colors = pal.2, opacity = 0.9, group = "Prediction 95% CI width") %>%
    addCircleMarkers(lng = coords[, 1], lat = coords[, 2], col = pal.1(temp), fillOpacity = 0.9,
        stroke = FALSE, radius = 3, popup = paste("Mean min temp:", round(temp, 1)),
        group = "Stations") %>%
    addLegend("bottomleft", pal = pal.1, values = values(y.median.surf), opacity = 0.9,
        title = "<center>Predicted<br>temperature C</center>") %>%
    addLegend("bottomright", pal = pal.2, values = values(y.width.surf), opacity = 0.9,
        title = "<center>Prediction<br>95% CI width</center>") %>%
    addLayersControl(baseGroup = c("Terrain", "Satellite"), overlayGroups = c("Predicted temperature",
        "Prediction 95% CI width", "Stations"), options = layersControlOptions(collapsed = FALSE)) %>%
    hideGroup(c("Prediction 95% CI width", "Stations"))

Model comparison

Here we take a very brief look at some of the model assessment tools in spBayes. Specifically, we compare the non-spatial and spatial regression models using the deviance information criterion (DIC), where lower values of DIC suggest improved fit.

lm.ref <- bayesLMRef(lm.obj, 500)

spDiag(lm.ref)$DIC
-------------------------------------------------
        Calculating scores
-------------------------------------------------
                value
bar.D       361.13150
D.bar.Omega 356.09917
pD            5.03233
DIC         366.16383
spDiag(m.2)$DIC
-------------------------------------------------
        Calculating scores
-------------------------------------------------
                value
bar.D       204.98381
D.bar.Omega 130.20929
pD           74.77452
DIC         279.75833