Assessing spatial confounding in Bayesian shared component disease mapping models via SPOCK

layout: true
  
<div class="my-footer"><span>
douglasrm.azevedo@gmail.com   
</span></div>

---
class: title-page

<div>
  <h3 style="text-align: justify; padding-top:50px">Assessing spatial confounding in Bayesian shared component disease mapping models via SPOCK</h3>
  <h4 style="text-align: right;">with applications to SEER cancer data</h4>
<div>

<div style="padding-top:130px; color:#858585">
  <div style="float:left">
    <h5>
      Douglas R. Mesquita Azevedo </br> 
      Dipankar Bandyopadhyay </br> 
      Marcos Oliveira Prates
    </h5>
  </div>
  <div style="float:right">
    <img src="img/logo_ufmg.png" alt="ufmg" height="100"/>
    <img src="img/logo_vcu.png" alt="vcu" height="100"/>
  </div>
</div>

<div style="align:center; padding-top:135px; color:#858585">
  <h5 style="text-align:center"> 
    April 2019
    <br>
    Richmond, USA
  </h5>
</div>

---

## <i class="fas  fa-newspaper "></i> Summary

+ Motivation
+ Disease mapping
    + ICAR
    + Shared component model
+ Spatial confounding
    + General idea
    + SPOCK
+ INLA
    + INLA method
    + Shared Component model in INLA
+ Simulation study
+ SEER data analisys

---

## <i class="fas  fa-lightbulb "></i> Motivation

Public health has a real and lasting positive effect on people

<h5 class="source">Source: <a href="http://graphics.wsj.com/infectious-diseases-and-vaccines/">http://graphics.wsj.com/infectious-diseases-and-vaccines/</a></h5>

---
class: slide-page

## <i class="fas  fa-lightbulb "></i> Motivation

Since we have spatial structure why not exploit it?

.pull-left[
  <center>
    <img src="img/john_snow.jpg" alt="graph" height="250"/>
    <h4 style="margin: 5px 0 0 0;">John Snow - 1854</h4>
    <h5 style="margin: 0 0 0 0;">Cholera cases in London</h5>
  </center>
]

<h5 class="source">Source: <a href="https://www.arcgis.com/apps/PublicInformation/index.html?appid=d7deb67f810d46dfacb80ff80ac224e9">https://www.arcgis.com/apps/PublicInformation/index.html?appid=d7deb67f810d46dfacb80ff80ac224e9</a></h5>

---
class: slide-page

## <i class="fas  fa-lightbulb "></i> Motivation

We are always interested in the effect of covariates. However, why not include spatial structure to model unobserved covariates?

.mathbox[
  `$$\small{g(\mu_{\text{income}}) = \beta_0 + \beta_1\times \text{Yers of study} + \beta_2\text{Experience} + \ldots + \text{County}}$$`
]

How to deal with .enfase[County] information?

+ Capital vs Country town
+ Urban vs Rural area
+ `$\beta_{31}\text{County}_1 + \beta_{32}\text{County}_2 + \ldots + \beta_{3q}\text{County}_q$`
+ `$\text{County} \sim \text{Spatial model}(\theta), ~ \dim(\theta) < q$`
    + Fewer parameters and <span class="enfase">more general</enfase>
    + <span class="enfase">Spatial structure</span> is considered
    
---
class: slide-page

## <i class="fas  fa-lightbulb "></i> Motivation

Lung cancer mortality (2010 to 2014)

.pull-right[
  <h5 class="source">Source: <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/lung/</a></h5>
]

---
class: slide-page

## <i class="fas  fa-lightbulb "></i> Motivation

What if an important variable and the spatial effect (non observed variable) bring similar information?

<center><h3 style="margin:0 0 0 0;">.enfase[Spatial Confounding]</h3></center>

<div style=""max-width:75%>
.pull-left[
  <center>
    <h5>Lung cancer mortality (2010 to 2014)</h5>
    <img src="img/gis_lung_mortality_usa.png" alt="graph" height="250"/>
  </center>
]

.pull-right[
  <center>
    <h5>Current smoking (2015)</h5>
    <img src="img/gis_smoking_usa.png" alt="graph" height="250"/>
  </center>
]
</div>

<h5 class="source">Source: <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/lung/</a></h5>

---
class: slide-page

## <i class="fas  fa-lightbulb "></i> Motivation

What if an important variable and the spatial effect (non observed variable) bring similar information?

<center><h3 style="margin:0 0 0 0;">.enfase[Spatial Confounding]</h3></center>

.pull-left[
  <center>
    <h5>Votes to PT 2014</h5>
    <img src="img/voronoys_pt.png" alt="graph" height="250"/>
  </center>
]
  
.pull-right[
  <center>
    <h5>Income</h5>
    <img src="img/voronoys_income.png" alt="graph" height="250"/>
  </center>
]

<h5 class="source">Source: <a href="https://voronoys.shinyapps.io/voronoys/">https://voronoys.shinyapps.io/voronoys/</a></h5>

---
class: slide-page

## <i class="fas  fa-lightbulb "></i> Motivation

Sometimes we have more than one outcome and the joint behavior is desired

<center><h3 style="margin:0 0 0 0;">.enfase[Multivariate Models]</h3></center>

.pull-left[
  <center>
    <h5>Lung cancer mortality (2010 to 2014)</h5>
    <img src="img/gis_lung_mortality_usa.png" alt="graph" height="275"/>
  </center>
]

.pull-right[
  <center>
    <h5>Breast cancer mortality (2010 to 2014)</h5>
    <img src="img/gis_breast_mortality_usa.png" alt="graph" height="275"/>
  </center>
]

<h5 class="source">Source: <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/lung/</a>; <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/breast/</a></h5>

---
class: slide-page

## <i class="fas  fa-lightbulb "></i> Motivation

What is our interest?

+ <span style="text-decoration:line-through">Understand the direct realtionship of two outcomes (.enfase[MCAR])</span>
    + Specific spatial pattern of each outcome
    + Dependence level of outcomes
+ Understand the common spatial pattern of different outcomes (.enfase[Shared Component model])
    + <span class="enfase">Specific spatial pattern</span> of each outcome
    + <span class="enfase">Overall spatial pattern</span>

---

<div class="my-section">
  <h2>
    <i class="fas  fa-globe-americas "></i> Disease mapping
  </h2>
</div>

---
class: slide-page

## <i class="fas  fa-hashtag "></i> Poisson model

<br>

+ Most common model used for univariate outcomes

.mathbox[
  `$$Y_i \sim \text{Poisson}(E_i\theta_i) ~ \forall ~ i \in \{1, \ldots, n\}$$`
  `$$\log(\theta_i) = \beta_0 + \beta_1X_{1i} + \ldots + \beta_pX_{pi}$$`
]

+ `$E_i = P_i\times r$` is the .enfase[expected] number of new cases of the disease `$Y$` in region `$i$`. 
+ `$P_i$` is the .enfase[population] in region `$i$`. 
+ `$r = \frac{\sum_iY_i}{\sum_iP_i}$` is the .enfase[overall incidence rate]. 
+ `$\theta_i$` is the .enfase[incidence rate] in region `$i$`.

---
class: slide-page

## <i class="fas  fa-project-diagram "></i> ICAR Model

+ .enfase[I]ntrinsic .enfase[C]onditional .enfase[A]uto.enfase[r]egressive model  (<a href="https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1974.tb00999.x">Besag, J. (1974)</a>)

.mathbox[
  `$$\small{\psi_i|\psi_{-i} \sim \text{Normal}\Bigg(\sum_{j \sim i}b_{ij}\psi_j, \sigma^2_{i}\Bigg) ~ \forall ~ i \in \{1, \ldots, n\}}$$`
]

+ `$j \sim i$` means that the region `$j$` is .enfase[neighbor] of region `$i$`. 
+ `$b_{ij}$` is a .enfase[weight] that measure the importance of region `$j$` in the conditional mean of `$\psi_i$`. 
+ `$\sigma^2_{i}$` is the .enfase[variance] parameter in the region `$i$`.

.mathbox[
  `$$\small{\text{Brook's lemma} \implies \pi(\psi) \propto \exp\Bigg\{-\frac{1}{2}\psi'\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B})\psi\Bigg\}}$$`
]

---
class: slide-page

## <i class="fas  fa-project-diagram "></i> ICAR Model

+ It looks like a `$\text{Normal}_n(\mathbb{0}, \mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B}))$`
+ It means that
    + `$\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B})$` (precision matrix) must be symetric
    + `$\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B})$` (precision matrix) must be positive definite
+ ICAR model:
    + `$\displaystyle \small{\psi_i|\psi_{-i} \sim \text{Normal}\Bigg(\sum_{j \sim i}\frac{\psi_j}{w_{i+}}, \frac{\sigma^2}{w_{i+}}\Bigg) ~ \forall ~ i \in \{1, \ldots, n\}}$`
    
<center><img src="img/adjacency_matrix.png" alt="graph" height="175"/></center>

---
class: slide-page

## <i class="fas  fa-project-diagram "></i> ICAR Model

<br>

+ In this case: `$\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B}) = \tau(\mathbf{D} - W)$`; `$\tau = \frac{1}{\sigma^{2}}$`

where `$D = \text{diag}(w_{1+}, \ldots, w_{n+})$` and `$W$` is the adjacency matrix defined before

+ It means that
    + `$\tau(\mathbf{D} - W)$` is symetric
    + `$\tau(\mathbf{D} - W)$` is not positive definite
+ In general it is possible to show that under a bayesian framework the .enfase[posterior distribution is proper]
+ It is widely .enfase[used as prior] for latent effects in spatial models

---
class: slide-page

## <i class="fas  fa-globe-americas "></i> Disease mapping

+ We can now add a latent effect in the Poisson model and assing an ICAR structure to it

.mathbox[
  `$$Y_i \sim \text{Poisson}(E_i\theta_i) ~ \forall ~ i \in \{1, \ldots, n\}$$`
  `$$\log(\theta_i) = \beta_0 + \beta_1X_{1i} + \ldots + \beta_pX_{pi} + \psi_i$$`
  `$$\psi \sim ICAR(\tau_{\psi})$$`
]

+ Our main focus is still on the .enfase[interpretation of covariates], but now we also want to check whether we have spatial patterns or not
    + It is possible to guess which <span class="enfase">variable is missing</span> 
    + It is possible to create polices in the <span class="enfase">hotspot places</span>
    
---
class: slide-page

## <i class="fas  fa-share-alt-square "></i> Shared Component model

+ Shared Component Model (<a href="https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-985X.00187">Knorr‐Held, L., & Best, N. G. (2001)</a>) 
+ Sometimes we have interest in .enfase[more than one outcome]
+ Univariate models may .enfase[not] be .enfase[realistics]

.mathbox[
  `$$Y_{id} \sim \text{Poisson}(E_{id}\theta_{id}) ~ \forall ~ i \in \{1, \ldots, n\}, ~ d \in \{1, 2\}$$`
  `$$\log(\theta_{i1}) = X_{i1}\beta_1 + \delta\psi_i + \phi_{i1}$$`
  `$$\log(\theta_{i2}) = X_{i2}\beta_2 + \frac{1}{\delta}\psi_i + \phi_{i2}$$`
  `$$\psi_i \sim ICAR(\tau_{\psi}); ~ \phi_1 \sim ICAR(\tau_{\phi_1}); ~ \phi_2 \sim ICAR(\tau_{\phi_2})$$`
]

where `$\delta$` is a scale parameter to allow  <span class="enfase">different levels of dependence</span> on the shared component

---

<div class="my-section">
  <h2>
    <i class="fas  fa-random "></i> Spatial Confounding
  </h2>
</div>

---
class: slide-page

## <i class="fas  fa-map-pin "></i> Multicollinearity

+ When two or more fixed effects bring the same information to model there is an .enfase[inflation on the coefficients variance]
+ A reasonable way to check it is using the .enfase[VIF]

.mathbox[
  `$$\displaystyle \widehat{var}({\hat{\beta }}_{j})={\frac{s^{2}}{(n-1)\widehat {{\rm {var}}}(X_{j})}}\cdot{\frac  {1}{1-R_{j}^{2}}},$$`
  `$$\displaystyle \mathrm {VIF_{j}} ={\frac {1}{1-R_{j}^{2}}}, ~~~~~~~~j=0,1,...,p,$$`
]

where,

+ `$R^2_{j}$`: Coefficient of determination of `$X_j \sim \gamma X_{-j}$`
+ `$s^2$`: Is the residual variance

---
class: slide-page

## <i class="fas  fa-random "></i> Spatial Confounding

+ It happens when .enfase[fixed and latent effects] bring the .enfase[same information] to the model
+ Let's consider a .enfase[Gaussian spatial regression]

.mathbox[
  	`$$\displaystyle y|\beta, b, \tau_{\epsilon} \sim N(X\beta + Zb, \tau_{\epsilon}I_n),$$`
		`$$Zb|\tau_s, \sim N(0, \tau_{\psi}Q) \implies b|\tau_{\psi}, \sim N(0, \tau_sD),$$`
]

where
+ `$b = Z'\psi \implies Zb = \psi$` 
+ `$\psi \sim ICAR(\tau_{\psi})$`
+ `$Q = ZDZ'$` is the spectral decomposition of `$Q$`

---
class: slide-page

## <i class="fas  fa-random "></i> Spatial Confounding

+ Reich et al. (2006) showed that

.mathbox[
  	`$$\small{E(\beta|\tau_{\epsilon}, \tau_{\psi}, y) = \displaystyle (X'X)^{-1}X'(y - Z\hat{b}) = \beta_{ols} - (X'X)^{-1}Z\hat{b}}$$`
  	`$$\small{Var^{-1}(\beta|\tau_{\epsilon}, \tau_{\psi}, y) = \displaystyle \tau_{\epsilon}(X'X) - X' Var(Zb|\beta, \tau_{\epsilon}, \tau_{\psi}, y)X =\tau_{\epsilon}X'Z\tilde{D}Z'X,}$$`
]

where

+ `$\hat{b} = E(b|\tau_{\epsilon}, \tau_{\psi}, y) = (Z'P^cZ + rD)^{-1}$`
+ `$P^c = I - X(X'X)^{-1}X'$`
+ `$r = \frac{\tau_{\psi}}{\tau_{\epsilon}}$`
+ `$\tilde{D} = I - (I+rD)^{-1}$`

---
class: slide-page

## <i class="fas  fa-wrench "></i> Restricted Spatial Regression

+ Original model

.mathbox[
  	`$$\displaystyle y|\beta, b, \tau_{\epsilon} \sim N(X\beta + Zb, \tau_{\epsilon}I_n),$$`
]

+ Restricted model

.mathbox[
  	`$$\displaystyle y|\beta, b, \tau_{\epsilon} \sim N(X\beta + P^cZb, \tau_{\epsilon}I_n),$$`
]

where 
+ `$P = X(X'X)^{-1}X'$` is the .enfase[projection matrix] of `$X$`
+ `$PZb + P^cZb = PZb + (I - P)Zb = Zb$`

---
class: slide-page

## <i class="fas  fa-map-pin "></i> Projection matrix

+ Linear regression model

.mathbox[
+ `$\hat{\beta}_{ols} = (X'X)^{-1}X'y,$` 
+ `$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Py,$` 
+ `$y - \hat{y} = \epsilon = y - Py = (I-P)y = P^cy,$` 
+ `$y = X\beta + \epsilon = Py + P^cy$`
]

+ `$X\hat{\beta}_{ols}$` is the .enfase[projection] of `$y$` in the space of `$X$` and it is our best guess about `$y$` given our set of covariates `$X$`
+ `$\epsilon$` is the information about `$y$` that is not in `$X$`. In other words it is the projection of `$y$` in the orthogonal space of `$X$`

---
class: slide-page

## <i class="fas  fa-map-pin "></i> Comments

<br>

+ It is .enfase[not possible] (or really difficult) to calculate the expectation value and variance under .enfase[non-Gaussian] models
    + At least this approach seems to works fine even to GLMM models
+ It is a .enfase[constraint] `$\implies$` .enfase[computational limitations]
+ The .enfase[variance is underestimated]
+ For more than one spatial effect
    + The <span class="enfase">computation</span> becomes more <span class="enfase">difficult</span>
    + It is <span class="enfase">not possible to use standard softwares</span> (which means that we need to code)
    + More than one sources of Spatial Confounding `$P\epsilon_1, \ldots, P\epsilon_q$`
    
---
class: slide-page

## <i class="fas  fa-hand-spock "></i> SPOCK

+ .enfase[SP]atial .enfase[O]rthogonal .enfase[C]entroid .enfase["K"]orrection (<a href="https://projecteuclid.org/euclid.ba/1537258137">Prates, M. et. al (2018)</a>)

+ Alleviating Spatial Confounding for areal data problems by displacing the geographical centroids

---
class: slide-page

## <i class="fas  fa-hand-spock "></i> SPOCK

<br>

+ Lets call `$W$` the adjacency matrix for the original set of centroids.
+ Let be `$c_i = \{c_{1i}, c_{2i}\} ~ \forall i \in [1, ..., n]$`

<br>

.pull-right[
  `$W = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}$`
]

---
class: slide-page

## <i class="fas  fa-hand-spock "></i> SPOCK

+ We want a .enfase[new set of centroids] `$c^{\ast} = P^c\times[c_1, c_2]$`
+ We want to find `$W^{\ast}$`: an adjancency matrix .enfase["free" of spatial confounding]

To create `$W^{\ast}$` we can use, for example, the .enfase[k-nearest neighbors algorithm]

---

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA
+ .enfase[I]ntegrated .enfase[N]ested .enfase[L]aplace .enfase[A]pproximation (<a href="https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2008.00700.x">Rue, H., Martino, S., & Chopin, N. (2009)</a>)

---
class: slide-page

## <i class="fas  fa-chart-bar "></i> Bayesian statistics

+ We want to obtain the _a posteriori_ distribution

`$$\pi(\theta|y) = \frac{\pi(y|\theta)\pi(\theta)}{\int\pi(y|\theta)\pi(\theta)d\theta} \propto \pi(y|\theta)\pi(\theta)$$`

+ For .enfase[simple models] it is possible to identify the distribution of `$\theta|y$`
+ In real problems the <span class="enfase">dimension of `$\theta = \{\theta_1, \ldots, \theta_q\}$` is big</span>
+ To have the _a posteriori_ .enfase[marginal distribution]:

`$$\pi(\theta_i|y) = \int_{\theta_{-i}}\pi(\theta|y)d\theta_{-i}$$`

+ It is .enfase[not easy] to calculate these integrals
+ Instead, in general, we use the .enfase[full conditional distributions] and a MCMC method to have a sample from these distributions

---
class: slide-page

## <i class="fas  fa-chart-bar "></i> MCMC methods

+ In general .enfase[MCMC] methods are .enfase[computationally intensive]
+ General public don't have a "free" access to this metodology
    + It is necessary to code
    + Few tools to use it
    + Lots of tunning parameters
    + It takes time and people are impatient
    
<center>
  <img src="img/mcmc_mixing.png" alt="graph" height="200px"/>
</center>

<h5 class="source">Source: <a href="https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_mcmc_sect055.htm">https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_mcmc_sect055.htm</a></h5>

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA

+ .enfase[INLA] is a .enfase[fast] way to approximate the desired marginal distributions
+ It is possible to use INLA if the model fits in the next equation

`$$g(\mu_i) = \eta_i = \alpha + \sum_{j = 1}^{n_f}f^{(j)}(u_{ji}) + \sum_{k = 1}^{n_{\beta}}\beta_kz_{ki} + \epsilon_i,$$`

+ `$g(.)$` is a .enfase[link function]
+ `$\alpha$` is the .enfase[intercept] and `$\beta_k$` is a set of .enfase[coefficients] associated to fixed effects `$z_{ki}$`
+ `$f^{(j)}(u_{ji})$` are unknown .enfase[functions] of the covariates `$u_{ij}$`
+ `$\epsilon_i$` are .enfase[unstructured] terms.

INLA assumes Gaussian priors to the vector `$\{\alpha, f^{(j)}(.), \beta, \epsilon \}$` given raise to a .enfase[Gaussian Markov Random Field].

---
class: slide-page

## <i class="fas  fa-box-open "></i> R-INLA example

+ Spatial Poisson model
    + `$g(.) = \log(.)$`
    + `$\alpha$` is the intercept and `$\beta_k$` is a set of coefficients
    + `$f^{(1)}(u_{1i})$` is an ICAR( `$\tau_u$` ) model for the areas `$u_{1i}$`
    
<br>

```r
library(INLA)

f_inla <- y ~ X1 + X2 + 
  f(region_inla, model = 'besag', graph = W)
mod_inla <- inla(data = data_inla, 
                 formula = f_inla, 
                 family = "poisson") # zeroinflatedpoisson1
```

---
class: slide-page

## <i class="fas  fa-box-open "></i> R-INLA example

+ Spatio-temporal Gaussian model
    + `$g(.) = ~ .$`
    + `$\alpha$` is the intercept and `$\beta_k$` is a set of coefficients
    + `$f^{(1)}(u_{1i})$` is an ICAR( `$\tau_u$` ) model for the areas `$u_{i1}$`
    + `$f^{(2)}(u_{2i})$` is an AR(p) model for the times `$u_{2i}$`

```r
library(INLA)

f_inla <- y ~ X1 + X2 + 
  f(region_inla, model = 'besag', graph = W) +
  f(times_inla, model = 'ar', order = p)
mod_inla <- inla(data = data_inla, 
                 formula = f_inla, 
                 family = "gaussian")
```

---
class: slide-page

## <i class="fas  fa-cogs "></i> How does it work?

<br>

+ We assume that .enfase[all elements] in vector `$x$` are .enfase[Gaussian]

`$$x = [\alpha, \beta, f^{(1)}(\cdot), \ldots, f^{(n_j)}(\cdot)] \sim \mathcal{N}(0, \mathbf{\Sigma})$$`
+ `$\mathbf{\Sigma}^{-1} = Q$` is the precision matrix
+ `$Q$` is .enfase[sparse] `$\implies$` it is easier to find the .enfase[inverse matrix] by factorization
+ `$x$` may depend on `$\theta$`: parameters that are .enfase[not normally distributed]
    + Variances
    + Correlation
+ In general <span class="enfase"> `$\text{dim}(x) \gg \text{dim}(\theta)$` </span>
+ We can write `$\pi(x, \theta|y) = \pi(x|\theta, y)\pi(\theta| y)$`

---
class: slide-page

## <i class="fas  fa-cogs "></i> Laplace approximation

+ Taylor series expansion around the mode `$\hat{x}$`

.mathbox[
`$$\log f(x) \approx \log f(\hat{x}) + \frac{\partial \log f(\hat{x})} {\partial x} (x-\hat{x}) +  \frac{\partial^2 \log f(\hat{x})} {2 \partial x^2} (x-\hat{x})^2$$`
]

+ The second term is zero when `$x = \hat{x}$` (maximum point)
+ Lets call `$\sigma = \frac{\partial^2 \log f(\hat{x})} {\partial x^2}$`

---
class: slide-page

## <i class="fas  fa-cogs "></i> Laplace approximation

+ Taking the exponential in both sides

.mathbox[

`$$f(x)dx = \exp\{\log f(x) dx\} \approx constant\times\exp\Bigg\{-\frac{(x-\hat{x})^2}{2\hat{\sigma}^2}\Bigg\}dx$$`
]

+ Example for a `$\chi^2_k$`

.mathbox[
`$$f(x;\,k) = \frac{x^{(k/2-1)} e^{-x/2}}{2^{k/2} \Gamma\left(\frac{k}{2}\right)}, ~~ x \geq 0$$`
]

---
class: slide-page

## <i class="fas  fa-cogs "></i> Laplace approximation

.mathbox[
+ `$\log f(x) = (k/2 - 1) \log x - x/2 + c$`
+ `$\log f'(x) = (k/2 -1)/x - 1/2 = 0$`
+ `$\log f''(x) = -(k/2 - 1)/x^2$`
+ `$\implies \chi^2_k \overset{LA}{\sim}  N(\hat{x} = k-2, ~\hat{\sigma}^2 = 2(k-2))$`
]

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA

<br>

We want the posterior marginal distributions

<br>

.mathbox[
`$$\pi(x_j|y) = \int \pi(x_j, \theta|y)d\theta  = \int \pi(x_j|\theta, y)\pi(\theta|y)d\theta$$`
`$$\pi(\theta_k|y) = \int \pi(\theta|y) d\theta_{-k}$$`
]

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA

Approximation for `$\pi(\theta_k|y)$`

.mathbox[
`$$\pi(\theta|y) \approx \frac{\pi(y|x,\theta) \pi(x|\theta)\pi(\theta)}{\tilde{\pi}(\mathbf{x|\theta,y})} \bigg|_{x={\hat{x}}(\theta)}  = \tilde{\pi}(\theta|y)$$`
]

+ `$\tilde{\pi}(\mathbf{x|\theta,y})$` is the .enfase[Laplace approximation] of `$\pi(\mathbf{x|\theta,y})$`
+ `$\hat{x}(\theta)$` Is the mode of `$x$` given values of `$\theta$`

<h5 class="source">Source: <a href="https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1467-9868.2008.00700.x">https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1467-9868.2008.00700.x</a></h5>

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA

Approximation for `$\pi(x_j|y)$`

.mathbox[
`$$\tilde{\pi}(x_j|y) \approx \sum_{h=1}^H \tilde{\pi}(x_j | \theta^*_h, y) \tilde{\pi}(\theta^*_h|y) \Delta_h$$`
]

+ `$\tilde{\pi}(x_j|y)$` is the numerical integration of `$\tilde{\pi}(x_j, \theta| y)$` given a grid of `$\theta$`
+ `$\tilde{\pi}(x_j | \theta^*_h, y)$` can be
   + <span class="enfase">Gaussian approximation</span>: just take the desired dimension of `$\small{\tilde{\pi}(x | \theta, y)}$`
   + <span class="enfase">Laplace approximation</span>: use the Laplace approximation again `$\small{\pi(x_j | \theta, y) \propto \frac{\pi(x,\theta |y)}{\tilde{\pi}(x_{-j} | x_j, \theta, y)}}$`
   + <span class="enfase">Simplified Laplace approximation</span>: a computational alternative

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA

+ .enfase[Integrated]: Numerical integration to obtain the marginals
+ .enfase[Nested]: We need `$\pi(\theta|y)$` to get `$\pi(x_j|y)$` and we use the Laplace approximation in both of them
+ .enfase[Laplace approximation]: We used all the time to represent unknow distributions of `$x$` and `$x_j$`

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA and Shared Component

+ Shared Component model

.mathbox[
  `$$\scriptsize{Y_{id} \sim \text{Poisson}(E_{id}\theta_{id}) ~ \forall ~ i \in \{1, \ldots, n\}, ~ d \in \{1, 2\}}$$`
  `$$\scriptsize{\log(\theta_{i1}) = X_{i1}\beta_1 + \delta\psi_i + \phi_{i1}}$$`
  `$$\scriptsize{\log(\theta_{i2}) = X_{i2}\beta_2 + \frac{1}{\delta}\psi_i + \phi_{i2}}$$`
  `$$\scriptsize{\psi_i \sim ICAR(\tau_{\psi}); ~ \phi_1 \sim ICAR(\tau_{\phi_1}); ~ \phi_2 \sim ICAR(\tau_{\phi_2})}$$`
]

+ INLA

.mathbox[
`$$\scriptsize{g(\mu_i) = \eta_i = \alpha + \sum_{j = 1}^{n_f}f^{(j)}(u_{ji}) + \sum_{k = 1}^{n_{\beta}}\beta_kz_{ki} + \epsilon_i,}$$`
]

---
class: slide-page

## <i class="fas  fa-box-open "></i> R-INLA: _copy_ feature

+ We can include weights for the latent effects. However they are fixed.

.mathbox[
`$$g(\mu_i) = \eta_i = \alpha + \sum_{j = 1}^{n_f}{\color{red} {w_j}}f^{(j)}(u_{ji}) + \sum_{k = 1}^{n_{\beta}}\beta_kz_{ki} + \epsilon_i,$$`
]

+ .enfase[_copy_] allows the use of the .enfase[same element more than once]

```r
n <- 1000
i <- j <- 1:n
z <- rnorm(n); w <- runif(n)

y <- z + 2*z*w + rnorm(n)

f_inla <- y ~ f(i, model = "iid", fixed = TRUE) + # Var(z) = 1
  f(j, w, copy = "i", fixed = FALSE)              # w = weights
mod_inla <- inla(f_inla, data = data.frame(i, j, w, y))
```

---
class: slide-page

## <i class="fas  fa-box-open "></i> R-INLA: _copy_ feature

+ Shared component model:

.mathbox[
  `$$\scriptsize{\log(\theta_{i1}) = X_{i1}\beta_1 + {\color{red} {\delta}}\psi_i + \phi_{i1}}$$`
  `$$\scriptsize{\log(\theta_{i2}) = X_{i2}\beta_2 + {\color{red} {\frac{1}{\delta}}}\psi_i + \phi_{i2}}$$`
]

+ INLA:

.mathbox[
  `$$\scriptsize{\log(\theta_{i1}) = X_{i1}\beta_1 + {\color{red} {\gamma}}\psi^{\ast}_i + \phi_{i1}}$$`
  `$$\scriptsize{\log(\theta_{i2}) = X_{i2}\beta_2 + \psi^{\ast}_i + \phi_{i2}}$$`
]

---
class: slide-page

## <i class="fas  fa-box-open "></i> R-INLA: _copy_ feature

Equalizing (Vargas, F. R. (2013)):

.mathbox[
  `$${\color{red} {\delta}}\psi_i = {\color{red} {\gamma}}\psi^{\ast}_i,$$`
  `$${\color{red} {\frac{1}{\delta}}}\psi_i = \psi^{\ast}_i,$$`
]

then

We can do it _a posteriori_ using the .enfase[_inla.tmarginal_] function

---

<div class="my-section">
  <h2>
    <i class="fas  fa-table "></i> Simulation study
  </h2>
</div>

---
class: slide-page

## <i class="fas  fa-table "></i> Simulation study

+ Three scenarios: `$\delta = \{1, 1.5, 1.75\}$`
+ California shapefile was used (58 counties)
+ Data generation
    + Without spatial confounding
    + With spatial confounding ( `$X_2$` = latitude)
+ Fit methods
    + Shared component model 
    + Shared component model + SPOCK
+ For each combination .enfase[1000 datasets] `$\implies$` 12.000 fitted models
+ Total time `$\approx$` .enfase[5 hours]

---
class: slide-page

## <i class="fas  fa-table "></i> Simulation study - posterior means

---
class: slide-page

## <i class="fas  fa-table "></i> Simulation study - posterior SD

---
class: slide-page

## <i class="fas  fa-table "></i> Simulation study - Spatial effects

---

---
class: slide-page

## <i class="fas  fa-database "></i> SEER

<br>

+ .enfase[S]urveillance, .enfase[E]pidemiology, and .enfase[E]nd .enfase[R]esults Program
+ Data on cancer cases from several locations in the USA
+ Comparing men and women
    + lung and bronchus cancer
+ Comparing two cancers
    + Stomach and esophagus cancer
    + Lip, oral cavity and pharynx cancer
    
---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Models

`$$\small{Y_{id} \sim P(E_{id}\theta_{id})}$$`

<hr>

M1 - .enfase[Univariate] non-spatial model:
.mathbox[
`$$\small{\log(\theta_{id}) = \alpha_d + X_{id}\beta},$$`
]

M2 - .enfase[Univariate] spatial model:
.mathbox[
`$$\small{\log(\theta_{id}) = \alpha_d + X_{id}\beta + \psi_{id}},$$`
`$$\small{\psi_1 \sim ICAR(\tau_{\psi_1}); ~~ \psi_2 \sim ICAR(\tau_{\psi_2})},$$`
]
    
---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Models

M3 - .enfase[Shared Component] model without specific spatial term:
.mathbox[
`$$\small{\log(\theta_{i1}) = \alpha_1 + X_{i1}\beta + \delta\psi_{i}; ~~ \log(\theta_{i2}) = \alpha_2 + X_{i2}\beta + \frac{1}{\delta}\psi_{i}},$$`
`$$\small{\psi \sim ICAR(\tau_{\psi})},$$`
]

M4 - .enfase[Shared Component] model with specific spatial term:
.mathbox[
`$$\small{\log(\theta_{i1}) = \alpha_1 + X_{i1}\beta + \delta\psi_{i} + \phi_{1i}; ~~ \log(\theta_{i2}) = \alpha_2 + X_{i2}\beta + \frac{1}{\delta}\psi_{i} + \phi_{2i}},$$`
`$$\small{\psi \sim ICAR(\tau_{\psi}); ~~ \phi_1 \sim ICAR(\tau_{\phi_1}); ~~ \phi \sim ICAR(\tau_{\phi_2})}$$`
]

---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Covariates

---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Results

<br>

---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Results

---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Results

---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Results

<br>

---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Results

---
class: slide-page

## <i class="fas  fa-database "></i> SEER: Results

---
class: slide-page

## <i class="fas  fa-box-open "></i> INLA codes

<br>

```r
## inla_list is a list with: 
##     alpha1, alpha2, X, 
##     phi, phi_beta, psi1, psi2
## E is the expected number of new cases

f_inla <- Y ~ -1 + alpha1 + alpha2 + X +
  f(phi, model = "besag", graph = W) +
  f(phi_beta, copy = "phi",	
    hyper = list(theta = list(fixed = FALSE)), 
    range = c(0, Inf)) +
  f(psi1, model = "besag", graph = W) +
  f(psi2, model = "besag", graph = W)

inla_mod <- inla(formula = f_inla, 
                 data = inla_list, 
                 family = c("poisson", "poisson"), 
                 E = E) 
```

---

## <i class="fas  fa-book-open "></i> Conclusion

+ The .enfase[Shared Component model] is an alternative model for .enfase[multiple outcomes]
+ .enfase[Spatial confounding can mess up the results]
+ .enfase[SPOCK] appears as a good .enfase[solution] for the Spatial Confouding problem
    + The correction seems to be coherent
+ We showed how to fit the Shared Component model using INLA
+ Using the SEER database we showed the potential problems of the Spatial Confounding problem

---

## <i class="fas  fa-book-open "></i> Main references

<p style="text-align:justify">
.enfase[Prates, M. O., Assunção, R. M., & Rodrigues, E. C. (2018)]. **Alleviating Spatial Confounding for Areal Data Problems by Displacing the Geographical Centroids**. _Bayesian Analysis_.

.enfase[Knorr‐Held, L., & Best, N. G. (2001)]. **A shared component model for detecting joint and selective clustering of two diseases**. _Journal of the Royal Statistical Society: Series A (Statistics in Society), 164(1), 73-85_.

.enfase[Rue, H., Martino, S., & Chopin, N. (2009)]. **Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations**. _Journal of the royal statistical society: Series b (statistical methodology), 71(2), 319-392_.

.enfase[Besag, J. (1974)]. **Spatial interaction and the statistical analysis of lattice systems**. _Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 192-225_.

.enfase[Reich, B. J., Hodges, J. S., & Zadnik, V. (2006)]. **Effects of residual smoothing on the posterior of the fixed effects in disease‐mapping models**. _Biometrics, 62(4), 1197-1206_.
</span>

---

<h1 style="color:#378fbb;">
  THANK YOU!
</h1>

.mathbox[
  <img src="img/richmond/richmond1.jpg" alt="graph" height="210"/>
  <img src="img/richmond/richmond2.jpg" alt="graph" height="210"/>
  <img src="img/richmond/richmond3.jpg" alt="graph" height="210"/>
]

<br>
<h1 style="color:#378fbb;">
  <i class="fas  fa-question "></i> QUESTIONS
</h1>

<h5>
  <span style="color:#ff6868;"><i class="fas  fa-globe "></i></span> <span style="color:#00000;">require-r.com</span>
  <span style="color:#ff6868;"><i class="fab  fa-linkedin "></i></span> <span style="color:#00000;">douglas-mesquita</span>
  <span style="color:#ff6868;"><i class="fab  fa-github "></i></span> <span style="color:#00000;">DouglasMesquita</span>
  <span style="color:#ff6868;"><i class="fab  fa-stack-overflow "></i></span> <span style="color:#00000;">douglas-mesquita</span>
  <span style="color:#ff6868;"><i class="fab  fa-twitter "></i></span> <span style="color:#00000;">DouglasMesqita</span>
  <br>
  <span style="color:#ff6868;"><i class="fab  fa-instagram "></i></span> <span style="color:#00000;">douglasmesqita</span>
  <span style="color:#ff6868;"><i class="fab  fa-facebook "></i></span> <span style="color:#00000;">douglas.mesquita.94</span>
</h5>