These data were originally part of the fields package’s COmonthlyMet dataset.
rm(list = ls())
load("CO-temp-data.RData")
ls()
[1] "coords" "elev" "pred.coords" "pred.elev" "temp"
Our goal is to create a complete prediction surface of minimum spring temperature with associated estimates of uncertainty.
We begin by loading the necessary packages.
library(spBayes)
library(MBA)
library(geoR)
library(raster)
library(leaflet)
library(sp)
Next, set up a leaflet basemap to help visualize the data and model output. We’ll make heavy use of the pipe operator %>% in the leaflet code to reduce clutter.
blue.red <- c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c")
base.map <- leaflet(width = "100%") %>%
addProviderTiles("Stamen.Terrain", group = "Terrain") %>%
addProviderTiles("Esri.WorldImagery", group = "Satellite") %>%
addLayersControl(baseGroups = c("Terrain", "Satellite"), options = layersControlOptions(collapsed = FALSE))
Take a look at station locations and mean minimum spring temperatures across Colorado. The code below produces a clickable dynamic map.
pal <- colorNumeric(blue.red, domain = temp)
base.map %>%
addCircleMarkers(lng = coords[, 1], lat = coords[, 2], col = pal(temp), stroke = FALSE,
radius = 5, fillOpacity = 0.9, popup = paste("Mean min temp:", round(temp,
1))) %>%
addLegend("bottomright", pal = pal, values = temp, opacity = 0.9, title = "Temperature C")
Consider the non-spatial regression and a little exploratory data analysis (EDA).
lm.obj <- lm(temp ~ lon + lat, data = coords)
summary(lm.obj)
Call:
lm(formula = temp ~ lon + lat, data = coords)
Residuals:
Min 1Q Median 3Q Max
-9.1462 -2.1591 0.6497 1.9584 8.1021
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.46995 12.28571 6.468 6.84e-10 ***
lon 0.63528 0.09277 6.848 8.13e-11 ***
lat -0.35593 0.15576 -2.285 0.0233 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.191 on 210 degrees of freedom
Multiple R-squared: 0.1873, Adjusted R-squared: 0.1796
F-statistic: 24.2 on 2 and 210 DF, p-value: 3.478e-10
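A little more EDA can be useful before adding spatial structure. As a rough sketch (not part of the original output; station elevations are assumed to be in meters), we might look at the distribution of temperature and its relationship with elevation, which we return to later:
par(mfrow = c(1, 2))
hist(temp, main = "", xlab = "Mean min temp (C)")
plot(elev, temp, xlab = "Elevation (m)", ylab = "Mean min temp (C)")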
We’ll reproject the geographic coordinates (i.e., longitude and latitude) to a coordinate system that provides more familiar distance units and reduces spatial distortion. Here we select the Universal Transverse Mercator (UTM) coordinate system, zone 13, with distances in km.
## Convert coords to spatial points and reproject.
coordinates(coords) <- ~lon + lat
proj4string(coords) <- "+proj=longlat +datum=WGS84"
coords.utm <- spTransform(coords, CRS("+proj=utm +zone=13 +ellps=WGS84 +datum=WGS84 +units=km +no_defs"))
coords.utm <- coordinates(coords.utm)
## Convert spatial points to matrix for later mapping.
coords <- coordinates(coords)
Next let’s take a look at the regression model residuals, assess their spatial independence, and start thinking about variogram and covariance parameters.
d.max <- max(iDist(coords.utm))
d.max
[1] 841.6322
v.temp <- variog(coords = coords.utm, data = temp, uvec = (seq(0, 0.75 * d.max, length = 20)))
variog: computing omnidirectional variogram
v.resid <- variog(coords = coords.utm, data = resid(lm.obj), uvec = (seq(0, 0.75 *
d.max, length = 20)))
variog: computing omnidirectional variogram
par(mfrow = c(1, 2))
plot(v.temp, xlab = "Distance (km)")
plot(v.resid, xlab = "Distance (km)")
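The empirical variograms suggest rough starting values for the covariance parameters. As a sketch, geoR’s variofit() could be used to obtain least-squares estimates of the partial sill, range, and nugget from the residual variogram (the initial values below are eyeballed guesses, not part of the original analysis):
## Eyeballed starting values: partial sill ~8, range parameter ~100 km, nugget ~2.
vf.resid <- variofit(v.resid, ini.cov.pars = c(8, 100), cov.model = "exponential", nugget = 2)
vf.resid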
It is also very helpful to create an interpolated surface of the model residuals to further assess spatial structure and potentially identify missing covariates.
resid.surf <- mba.surf(cbind(coords, resid(lm.obj)), no.X = 200, no.Y = 200, extend = TRUE,
sp = TRUE)$xyz.est
proj4string(resid.surf) <- "+proj=longlat +datum=WGS84"
resid.surf <- raster(resid.surf)
pal <- colorNumeric(blue.red, values(resid.surf), na.color = "transparent")
base.map %>%
addRasterImage(resid.surf, colors = pal, opacity = 0.75, group = "Regression residuals") %>%
addLegend("bottomright", pal = pal, values = values(resid.surf), opacity = 0.75,
title = "<center>Regression<br> residuals</center>") %>%
addLayersControl(baseGroups = c("Terrain", "Satellite"), overlayGroups = c("Regression residuals"),
options = layersControlOptions(collapsed = FALSE))
There’s substantial evidence of residual spatial structure, so let’s fit some models with spatially structured random effects. First we’ll fit the bayesGeostatExact() model and then the spLM() model, both from the spBayes package.
n.samples <- 5000
n <- length(temp)
p <- 3 ##Intercept, longitude, and latitude
phi <- 3/200 ##Implies an effective spatial range of ~200 km under the exponential model
alpha <- 1/12 ##tau^2/sigma^2 (nugget to partial sill ratio)
beta.prior.mean <- as.matrix(rep(0, times = p))
beta.prior.precision <- matrix(0, nrow = p, ncol = p)
## Assuming an inverse Gamma prior on sigma^2
sigma.sq.prior.shape <- 2
sigma.sq.prior.rate <- 1/12 ##1/scale
m.exact <- bayesGeostatExact(temp ~ coords, coords = coords.utm, beta.prior.mean = beta.prior.mean,
beta.prior.precision = beta.prior.precision, phi = phi, alpha = alpha, sigma.sq.prior.shape = sigma.sq.prior.shape,
sigma.sq.prior.rate = sigma.sq.prior.rate, n.samples = n.samples)
-------------------------------------------------
General model description
-------------------------------------------------
Model fit with 213 observations.
Number of covariates 3 (including intercept if specified).
Using the exponential spatial correlation model.
-------------------------------------------------
Sampling
-------------------------------------------------
Sampled: 5000 of 5000, 100%
-------------------------------------------------
Recovering spatial effects
-------------------------------------------------
names(m.exact)
[1] "X" "n" "p"
[4] "Y" "n.samples" "beta.prior.mean"
[7] "beta.prior.precision" "coords" "cov.model"
[10] "phi" "alpha" "sigma.sq.prior.shape"
[13] "sigma.sq.prior.rate" "verbose" "p.samples"
[16] "sp.effects"
plot(m.exact$p.samples, density = FALSE)
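Posterior quantiles for the bayesGeostatExact() parameters can also be summarized numerically. A minimal sketch using coda (assuming p.samples can be coerced to an mcmc object):
library(coda)  ## attach coda explicitly for as.mcmc() and its summary method
round(summary(as.mcmc(m.exact$p.samples))$quantiles[, c(3, 1, 5)], 3)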
Now let’s fit the full Bayesian model using spLM().
cov.model <- "exponential"
starting <- list(phi = 3/(0.5 * d.max), sigma.sq = 5, tau.sq = 5)
tuning <- list(phi = 0.5, sigma.sq = 0.05, tau.sq = 0.05)
priors <- list("beta.Flat", phi.Unif = c(3/d.max, 3/(0.01 * d.max)), sigma.sq.IG = c(2,
5), tau.sq.IG = c(2, 5))
m.1 <- spLM(temp ~ coords, coords = coords.utm, starting = starting, tuning = tuning,
priors = priors, cov.model = cov.model, n.samples = n.samples, n.report = 2000)
----------------------------------------
General model description
----------------------------------------
Model fit with 213 observations.
Number of covariates 3 (including intercept if specified).
Using the exponential spatial correlation model.
Number of MCMC samples 5000.
Priors and hyperpriors:
beta flat.
sigma.sq IG hyperpriors shape=2.00000 and scale=5.00000
tau.sq IG hyperpriors shape=2.00000 and scale=5.00000
phi Unif hyperpriors a=0.00356 and b=0.35645
-------------------------------------------------
Sampling
-------------------------------------------------
Sampled: 2000 of 5000, 40.00%
Report interval Metrop. Acceptance rate: 36.60%
Overall Metrop. Acceptance rate: 36.60%
-------------------------------------------------
Sampled: 4000 of 5000, 80.00%
Report interval Metrop. Acceptance rate: 35.20%
Overall Metrop. Acceptance rate: 35.90%
-------------------------------------------------
plot(m.1$p.theta.samples)
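Beyond visual inspection of the trace plots, coda’s numeric diagnostics give a quick convergence check. A minimal sketch, assuming p.theta.samples is returned as a coda mcmc object (as spLM() typically does):
effectiveSize(m.1$p.theta.samples)
geweke.diag(m.1$p.theta.samples)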
The spLM() function only updates the covariance parameters. The regression coefficients \(\beta\) and spatial random effects \(w\) are recovered via the spRecover() function, which performs composition sampling using post-burn-in samples.
burn.in <- floor(0.75 * n.samples)
m.1 <- spRecover(m.1, start = burn.in, thin = 2)
-------------------------------------------------
Recovering beta and w
-------------------------------------------------
Sampled: 99 of 626, 15.81%
Sampled: 199 of 626, 31.79%
Sampled: 299 of 626, 47.76%
Sampled: 399 of 626, 63.74%
Sampled: 499 of 626, 79.71%
Sampled: 599 of 626, 95.69%
round(summary(m.1$p.theta.recover.samples)$quantiles[, c(3, 1, 5)], 3)
50% 2.5% 97.5%
sigma.sq 9.719 5.545 15.584
tau.sq 1.101 0.682 1.775
phi 0.005 0.004 0.010
round(summary(m.1$p.beta.recover.samples)$quantiles[, c(3, 1, 5)], 3)
50% 2.5% 97.5%
(Intercept) 45.498 -43.370 128.072
coordslon 0.249 -0.478 0.956
coordslat -0.527 -1.669 0.643
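For the exponential covariance, the effective spatial range (the distance at which correlation drops to about 0.05) is roughly 3/phi. A quick sketch using the recovered samples:
eff.range <- 3/as.matrix(m.1$p.theta.recover.samples)[, "phi"]
round(quantile(eff.range, probs = c(0.5, 0.025, 0.975)), 1)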
Now let’s take a look at the estimates of the random effects compared with the non-spatial regression residuals.
quants <- function(x) {
quantile(x, prob = c(0.5, 0.025, 0.975))
}
## Random effects from bayesGeostatExact()
w.exact.summary <- apply(m.exact$sp.effects, 1, quants)
w.exact.surf <- mba.surf(cbind(coords, w.exact.summary[1, ]), no.X = 200, no.Y = 200,
extend = TRUE, sp = TRUE)$xyz.est
proj4string(w.exact.surf) <- "+proj=longlat +datum=WGS84"
w.exact.surf <- raster(w.exact.surf)
## Random effects from spLM()
w.m1.summary <- apply(m.1$p.w.recover.samples, 1, quants)
w.m1.surf <- mba.surf(cbind(coords, w.m1.summary[1, ]), no.X = 200, no.Y = 200, extend = TRUE,
sp = TRUE)$xyz.est
proj4string(w.m1.surf) <- "+proj=longlat +datum=WGS84"
w.m1.surf <- raster(w.m1.surf)
## Make the color ramp
all.values <- c(values(w.exact.surf), values(w.m1.surf), values(resid.surf))
pal <- colorNumeric(blue.red, all.values, na.color = "transparent")
base.map %>%
addRasterImage(w.exact.surf, colors = pal, opacity = 0.75, group = "bayesGeostatExact") %>%
addRasterImage(w.m1.surf, colors = pal, opacity = 0.75, group = "spLM m.1") %>%
addRasterImage(resid.surf, colors = pal, opacity = 0.75, group = "Regression residuals") %>%
addLegend("bottomright", pal = pal, values = all.values, opacity = 0.75, title = "<center>Random effects &<br>Regression<br>residuals</center>") %>%
addLayersControl(baseGroups = c("Terrain", "Satellite"), overlayGroups = c("bayesGeostatExact",
"spLM m.1", "Regression residuals"), options = layersControlOptions(collapsed = FALSE)) %>%
hideGroup(c("spLM m.1", "Regression residuals"))
It looks like the spatial random effects are picking up the effect of elevation, so let’s add elevation to the mean and refit the non-spatial and spLM() models.
lm.obj <- lm(temp ~ elev + lon + lat, data = data.frame(coords))
summary(lm.obj)
Call:
lm(formula = temp ~ elev + lon + lat, data = data.frame(coords))
Residuals:
Min 1Q Median 3Q Max
-4.6706 -0.8541 -0.1308 0.9088 3.5183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.9372343 5.9584509 1.332 0.184
elev -0.0059590 0.0002029 -29.374 < 2e-16 ***
lon -0.1982295 0.0499149 -3.971 9.83e-05 ***
lat -0.4951096 0.0691068 -7.164 1.32e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.412 on 209 degrees of freedom
Multiple R-squared: 0.8415, Adjusted R-squared: 0.8393
F-statistic: 370 on 3 and 209 DF, p-value: < 2.2e-16
v <- variog(coords = coords.utm, data = resid(lm.obj), uvec = (seq(0, 0.75 * d.max,
length = 20)))
variog: computing omnidirectional variogram
plot(v, xlab = "Distance (km)")
Now call spLM() with elevation as a covariate, after adjusting the variance priors, then recover the parameters via the spRecover() function.
priors$sigma.sq.IG <- c(2, 1)
priors$tau.sq.IG <- c(2, 1)
m.2 <- spLM(temp ~ elev + coords, coords = coords.utm, starting = starting, tuning = tuning,
priors = priors, cov.model = cov.model, n.samples = n.samples, n.report = 2000)
----------------------------------------
General model description
----------------------------------------
Model fit with 213 observations.
Number of covariates 4 (including intercept if specified).
Using the exponential spatial correlation model.
Number of MCMC samples 5000.
Priors and hyperpriors:
beta flat.
sigma.sq IG hyperpriors shape=2.00000 and scale=1.00000
tau.sq IG hyperpriors shape=2.00000 and scale=1.00000
phi Unif hyperpriors a=0.00356 and b=0.35645
-------------------------------------------------
Sampling
-------------------------------------------------
Sampled: 2000 of 5000, 40.00%
Report interval Metrop. Acceptance rate: 38.45%
Overall Metrop. Acceptance rate: 38.45%
-------------------------------------------------
Sampled: 4000 of 5000, 80.00%
Report interval Metrop. Acceptance rate: 35.70%
Overall Metrop. Acceptance rate: 37.08%
-------------------------------------------------
m.2 <- spRecover(m.2, start = burn.in, thin = 2)
-------------------------------------------------
Recovering beta and w
-------------------------------------------------
Sampled: 99 of 626, 15.81%
Sampled: 199 of 626, 31.79%
Sampled: 299 of 626, 47.76%
Sampled: 399 of 626, 63.74%
Sampled: 499 of 626, 79.71%
Sampled: 599 of 626, 95.69%
round(summary(m.2$p.theta.recover.samples)$quantiles[, c(3, 1, 5)], 2)
50% 2.5% 97.5%
sigma.sq 1.18 0.69 2.50
tau.sq 0.99 0.60 1.31
phi 0.01 0.00 0.03
round(summary(m.2$p.beta.recover.samples)$quantiles[, c(3, 1, 5)], 2)
50% 2.5% 97.5%
(Intercept) 10.04 -22.25 34.82
elev -0.01 -0.01 -0.01
coordslon -0.19 -0.48 0.05
coordslat -0.54 -0.90 -0.14
w.2.summary <- apply(m.2$p.w.recover.samples, 1, quants)
Now let’s take a look at these new random effects compared with those from m.1.
w.m2.surf <- mba.surf(cbind(coords, w.2.summary[1, ]), no.X = 200, no.Y = 200, extend = TRUE,
sp = TRUE)$xyz.est
proj4string(w.m2.surf) <- "+proj=longlat +datum=WGS84"
w.m2.surf <- raster(w.m2.surf)
pal.2 <- colorNumeric(blue.red, values(w.m2.surf), na.color = "transparent")
base.map %>%
addRasterImage(w.m2.surf, colors = pal.2, opacity = 0.75, group = "spLM m.2") %>%
addRasterImage(w.m1.surf, colors = pal, opacity = 0.75, group = "spLM m.1") %>%
addRasterImage(resid.surf, colors = pal, opacity = 0.75, group = "Regression residuals") %>%
addLegend("bottomleft", pal = pal.2, values = values(w.m2.surf), opacity = 0.75,
title = "<center>Random effects (m.2)</center>") %>%
addLegend("bottomright", pal = pal, values = all.values, opacity = 0.75, title = "<center>Random effects (m.1) &<br>Regression<br>residuals</center>") %>%
addLayersControl(baseGroups = c("Terrain", "Satellite"), overlayGroups = c("spLM m.1",
"spLM m.2", "Regression residuals"), options = layersControlOptions(collapsed = FALSE)) %>%
hideGroup(c("spLM m.1", "Regression residuals"))
### Prediction
Recall, our interest is in creating a temperature surface with associated uncertainty. This is done by sampling from the posterior predictive distributions at 17,545 new locations across Colorado. These prediction locations are held in pred.coords, with spatially coinciding elevation values in pred.elev. Sampling is done using the spPredict() function. Ideally, we would draw more posterior predictive samples, but in the interest of time we take only a small sample to generate the summary statistics and maps. Using more CPU cores would greatly reduce the prediction time.
pred.X <- cbind(1, pred.elev, pred.coords)
m.2.pred <- spPredict(m.2, pred.covars = pred.X, pred.coords = pred.coords, start = burn.in,
thin = 20, n.report = 10)
-------------------------------------------------
Recovering beta
-------------------------------------------------
----------------------------------------
General model description
----------------------------------------
Model fit with 213 observations.
Prediction at 17545 locations.
Number of covariates 4 (including intercept if specified).
Using the exponential spatial correlation model.
-------------------------------------------------
Sampling
-------------------------------------------------
Sampled: 10 of 63, 14.29%
Sampled: 20 of 63, 30.16%
Sampled: 30 of 63, 46.03%
Sampled: 40 of 63, 61.90%
Sampled: 50 of 63, 77.78%
Sampled: 60 of 63, 93.65%
y.p.summary <- apply(m.2.pred$p.y.predictive.samples, 1, quants)
y.p.width <- y.p.summary[3, ] - y.p.summary[2, ]
The summaries of the posterior predictive distributions can be mapped or exported for use in a GIS (a sketch of the export step follows the map below).
y.median.surf <- rasterFromXYZ(cbind(pred.coords, y.p.summary[1, ]))
proj4string(y.median.surf) <- "+proj=longlat +datum=WGS84"
pal.1 <- colorNumeric(blue.red, c(values(y.median.surf), temp), na.color = "transparent")
y.width.surf <- rasterFromXYZ(cbind(pred.coords, y.p.width))
proj4string(y.width.surf) <- "+proj=longlat +datum=WGS84"
pal.2 <- colorNumeric(blue.red, values(y.width.surf), na.color = "transparent")
base.map %>%
addRasterImage(y.median.surf, colors = pal.1, opacity = 0.9, group = "Predicted temperature") %>%
addRasterImage(y.width.surf, colors = pal.2, opacity = 0.9, group = "Prediction 95% CI width") %>%
addCircleMarkers(lng = coords[, 1], lat = coords[, 2], col = pal.1(temp), fillOpacity = 0.9,
stroke = FALSE, radius = 3, popup = paste("Mean min temp:", round(temp, 1)),
group = "Stations") %>%
addLegend("bottomleft", pal = pal.1, values = values(y.median.surf), opacity = 0.9,
title = "<center>Predicted<br>temperature C</center>") %>%
addLegend("bottomright", pal = pal.2, values = values(y.width.surf), opacity = 0.9,
title = "<center>Prediction<br>95% CI width</center>") %>%
addLayersControl(baseGroups = c("Terrain", "Satellite"), overlayGroups = c("Predicted temperature",
"Prediction 95% CI width", "Stations"), options = layersControlOptions(collapsed = FALSE)) %>%
hideGroup(c("Prediction 95% CI width", "Stations"))
Here we take a very brief look at some of the model assessment tools in spBayes. Specifically, we compare the non-spatial and spatial regression models using the deviance information criterion (DIC), where lower values of DIC suggest improved fit.
lm.ref <- bayesLMRef(lm.obj, 500)
spDiag(lm.ref)$DIC
-------------------------------------------------
Calculating scores
-------------------------------------------------
value
bar.D 361.13150
D.bar.Omega 356.09917
pD 5.03233
DIC 366.16383
spDiag(m.2)$DIC
-------------------------------------------------
Calculating scores
-------------------------------------------------
value
bar.D 204.98381
D.bar.Omega 130.20929
pD 74.77452
DIC 279.75833
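The spatial model’s DIC is well below that of the non-spatial regression, so even after accounting for elevation the spatially structured random effects substantially improve model fit.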