layout: true <div class="my-footer"><span> douglasrm.azevedo@gmail.com </span></div> --- class: title-page <div> <h3 style="text-align: justify; padding-top:50px">Assessing spatial confounding in Bayesian shared component disease mapping models via SPOCK</h3> <h4 style="text-align: right;">with applications to SEER cancer data</h4> <div> <div style="padding-top:130px; color:#858585"> <div style="float:left"> <h5> Douglas R. Mesquita Azevedo </br> Dipankar Bandyopadhyay </br> Marcos Oliveira Prates </h5> </div> <div style="float:right"> <img src="img/logo_ufmg.png" alt="ufmg" height="100"/> <img src="img/logo_vcu.png" alt="vcu" height="100"/> </div> </div> <div style="align:center; padding-top:135px; color:#858585"> <h5 style="text-align:center"> April 2019 <br> Richmond, USA </h5> </div> --- class: toc-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Summary + Motivation + Disease mapping + ICAR + Shared component model + Spatial confounding + General idea + SPOCK + INLA + INLA method + Shared Component model in INLA + Simulation study + SEER data analisys --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Motivation Public health has a real and lasting positive effect on people <center><img src="img/polio_map.png" alt="graph" height="400"/></center> <h5 class="source">Source: <a href="http://graphics.wsj.com/infectious-diseases-and-vaccines/">http://graphics.wsj.com/infectious-diseases-and-vaccines/</a></h5> --- class: slide-page ##
Motivation Since we have spatial structure why not exploit it? .pull-left[ <center> <img src="img/john_snow.jpg" alt="graph" height="250"/> <h4 style="margin: 5px 0 0 0;">John Snow - 1854</h4> <h5 style="margin: 0 0 0 0;">Cholera cases in London</h5> </center> ] .pull-right[ <center><img src="img/cholera_map.png" alt="graph" height="400"/></center> ] <h5 class="source">Source: <a href="https://www.arcgis.com/apps/PublicInformation/index.html?appid=d7deb67f810d46dfacb80ff80ac224e9">https://www.arcgis.com/apps/PublicInformation/index.html?appid=d7deb67f810d46dfacb80ff80ac224e9</a></h5> --- class: slide-page ##
Motivation We are always interested in the effect of covariates. However, why not include spatial structure to model unobserved covariates? .mathbox[ `$$\small{g(\mu_{\text{income}}) = \beta_0 + \beta_1\times \text{Yers of study} + \beta_2\text{Experience} + \ldots + \text{County}}$$` ] How to deal with .enfase[County] information? + Capital vs Country town + Urban vs Rural area + `\(\beta_{31}\text{County}_1 + \beta_{32}\text{County}_2 + \ldots + \beta_{3q}\text{County}_q\)` + `\(\text{County} \sim \text{Spatial model}(\theta), ~ \dim(\theta) < q\)` + Fewer parameters and <span class="enfase">more general</enfase> + <span class="enfase">Spatial structure</span> is considered --- class: slide-page ##
Motivation Lung cancer mortality (2010 to 2014) <center><img src="img/gis_lung_mortality_usa.png" alt="graph" height="400"/></center> .pull-right[ <h5 class="source">Source: <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/lung/</a></h5> ] --- class: slide-page ##
Motivation What if an important variable and the spatial effect (non observed variable) bring similar information? <center><h3 style="margin:0 0 0 0;">.enfase[Spatial Confounding]</h3></center> <div style=""max-width:75%> .pull-left[ <center> <h5>Lung cancer mortality (2010 to 2014)</h5> <img src="img/gis_lung_mortality_usa.png" alt="graph" height="250"/> </center> ] .pull-right[ <center> <h5>Current smoking (2015)</h5> <img src="img/gis_smoking_usa.png" alt="graph" height="250"/> </center> ] </div> <h5 class="source">Source: <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/lung/</a></h5> --- class: slide-page ##
Motivation What if an important variable and the spatial effect (non observed variable) bring similar information? <center><h3 style="margin:0 0 0 0;">.enfase[Spatial Confounding]</h3></center> .pull-left[ <center> <h5>Votes to PT 2014</h5> <img src="img/voronoys_pt.png" alt="graph" height="250"/> </center> ] .pull-right[ <center> <h5>Income</h5> <img src="img/voronoys_income.png" alt="graph" height="250"/> </center> ] <h5 class="source">Source: <a href="https://voronoys.shinyapps.io/voronoys/">https://voronoys.shinyapps.io/voronoys/</a></h5> --- class: slide-page ##
Motivation Sometimes we have more than one outcome and the joint behavior is desired <center><h3 style="margin:0 0 0 0;">.enfase[Multivariate Models]</h3></center> .pull-left[ <center> <h5>Lung cancer mortality (2010 to 2014)</h5> <img src="img/gis_lung_mortality_usa.png" alt="graph" height="275"/> </center> ] .pull-right[ <center> <h5>Breast cancer mortality (2010 to 2014)</h5> <img src="img/gis_breast_mortality_usa.png" alt="graph" height="275"/> </center> ] <h5 class="source">Source: <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/lung/</a>; <a href="https://gis.cancer.gov/mapstory/lung/">https://gis.cancer.gov/mapstory/breast/</a></h5> --- class: slide-page ##
Motivation <br><br> What is our interest? + <span style="text-decoration:line-through">Understand the direct realtionship of two outcomes (.enfase[MCAR])</span> + Specific spatial pattern of each outcome + Dependence level of outcomes + Understand the common spatial pattern of different outcomes (.enfase[Shared Component model]) + <span class="enfase">Specific spatial pattern</span> of each outcome + <span class="enfase">Overall spatial pattern</span> --- class: section-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> <div class="my-section"> <h2>
Disease mapping </h2> </div> --- class: slide-page ##
Poisson model <br> + Most common model used for univariate outcomes .mathbox[ `$$Y_i \sim \text{Poisson}(E_i\theta_i) ~ \forall ~ i \in \{1, \ldots, n\}$$` `$$\log(\theta_i) = \beta_0 + \beta_1X_{1i} + \ldots + \beta_pX_{pi}$$` ] + `\(E_i = P_i\times r\)` is the .enfase[expected] number of new cases of the disease `\(Y\)` in region `\(i\)`. + `\(P_i\)` is the .enfase[population] in region `\(i\)`. + `\(r = \frac{\sum_iY_i}{\sum_iP_i}\)` is the .enfase[overall incidence rate]. + `\(\theta_i\)` is the .enfase[incidence rate] in region `\(i\)`. --- class: slide-page ##
ICAR Model + .enfase[I]ntrinsic .enfase[C]onditional .enfase[A]uto.enfase[r]egressive model (<a href="https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1974.tb00999.x">Besag, J. (1974)</a>) .mathbox[ `$$\small{\psi_i|\psi_{-i} \sim \text{Normal}\Bigg(\sum_{j \sim i}b_{ij}\psi_j, \sigma^2_{i}\Bigg) ~ \forall ~ i \in \{1, \ldots, n\}}$$` ] + `\(j \sim i\)` means that the region `\(j\)` is .enfase[neighbor] of region `\(i\)`. + `\(b_{ij}\)` is a .enfase[weight] that measure the importance of region `\(j\)` in the conditional mean of `\(\psi_i\)`. + `\(\sigma^2_{i}\)` is the .enfase[variance] parameter in the region `\(i\)`. .mathbox[ `$$\small{\text{Brook's lemma} \implies \pi(\psi) \propto \exp\Bigg\{-\frac{1}{2}\psi'\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B})\psi\Bigg\}}$$` ] --- class: slide-page ##
ICAR Model + It looks like a `\(\text{Normal}_n(\mathbb{0}, \mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B}))\)` + It means that + `\(\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B})\)` (precision matrix) must be symetric + `\(\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B})\)` (precision matrix) must be positive definite + ICAR model: + `\(\displaystyle \small{\psi_i|\psi_{-i} \sim \text{Normal}\Bigg(\sum_{j \sim i}\frac{\psi_j}{w_{i+}}, \frac{\sigma^2}{w_{i+}}\Bigg) ~ \forall ~ i \in \{1, \ldots, n\}}\)` <center><img src="img/adjacency_matrix.png" alt="graph" height="175"/></center> --- class: slide-page ##
ICAR Model <br> + In this case: `\(\mathbf{T}^{-1}(\mathbf{I_n}-\mathbf{B}) = \tau(\mathbf{D} - W)\)`; `\(\tau = \frac{1}{\sigma^{2}}\)` where `\(D = \text{diag}(w_{1+}, \ldots, w_{n+})\)` and `\(W\)` is the adjacency matrix defined before + It means that + `\(\tau(\mathbf{D} - W)\)` is symetric + `\(\tau(\mathbf{D} - W)\)` is not positive definite + In general it is possible to show that under a bayesian framework the .enfase[posterior distribution is proper] + It is widely .enfase[used as prior] for latent effects in spatial models --- class: slide-page ##
Disease mapping + We can now add a latent effect in the Poisson model and assing an ICAR structure to it .mathbox[ `$$Y_i \sim \text{Poisson}(E_i\theta_i) ~ \forall ~ i \in \{1, \ldots, n\}$$` `$$\log(\theta_i) = \beta_0 + \beta_1X_{1i} + \ldots + \beta_pX_{pi} + \psi_i$$` `$$\psi \sim ICAR(\tau_{\psi})$$` ] + Our main focus is still on the .enfase[interpretation of covariates], but now we also want to check whether we have spatial patterns or not + It is possible to guess which <span class="enfase">variable is missing</span> + It is possible to create polices in the <span class="enfase">hotspot places</span> --- class: slide-page ##
Shared Component model + Shared Component Model (<a href="https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-985X.00187">Knorr‐Held, L., & Best, N. G. (2001)</a>) + Sometimes we have interest in .enfase[more than one outcome] + Univariate models may .enfase[not] be .enfase[realistics] .mathbox[ `$$Y_{id} \sim \text{Poisson}(E_{id}\theta_{id}) ~ \forall ~ i \in \{1, \ldots, n\}, ~ d \in \{1, 2\}$$` `$$\log(\theta_{i1}) = X_{i1}\beta_1 + \delta\psi_i + \phi_{i1}$$` `$$\log(\theta_{i2}) = X_{i2}\beta_2 + \frac{1}{\delta}\psi_i + \phi_{i2}$$` `$$\psi_i \sim ICAR(\tau_{\psi}); ~ \phi_1 \sim ICAR(\tau_{\phi_1}); ~ \phi_2 \sim ICAR(\tau_{\phi_2})$$` ] where `\(\delta\)` is a scale parameter to allow <span class="enfase">different levels of dependence</span> on the shared component --- class: section-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> <div class="my-section"> <h2>
Spatial Confounding </h2> </div> --- class: slide-page ##
Multicollinearity + When two or more fixed effects bring the same information to model there is an .enfase[inflation on the coefficients variance] + A reasonable way to check it is using the .enfase[VIF] .mathbox[ `$$\displaystyle \widehat{var}({\hat{\beta }}_{j})={\frac{s^{2}}{(n-1)\widehat {{\rm {var}}}(X_{j})}}\cdot{\frac {1}{1-R_{j}^{2}}},$$` `$$\displaystyle \mathrm {VIF_{j}} ={\frac {1}{1-R_{j}^{2}}}, ~~~~~~~~j=0,1,...,p,$$` ] where, + `\(R^2_{j}\)`: Coefficient of determination of `\(X_j \sim \gamma X_{-j}\)` + `\(s^2\)`: Is the residual variance --- class: slide-page ##
Spatial Confounding + It happens when .enfase[fixed and latent effects] bring the .enfase[same information] to the model + Let's consider a .enfase[Gaussian spatial regression] .mathbox[ `$$\displaystyle y|\beta, b, \tau_{\epsilon} \sim N(X\beta + Zb, \tau_{\epsilon}I_n),$$` `$$Zb|\tau_s, \sim N(0, \tau_{\psi}Q) \implies b|\tau_{\psi}, \sim N(0, \tau_sD),$$` ] where + `\(b = Z'\psi \implies Zb = \psi\)` + `\(\psi \sim ICAR(\tau_{\psi})\)` + `\(Q = ZDZ'\)` is the spectral decomposition of `\(Q\)` --- class: slide-page ##
Spatial Confounding + Reich et al. (2006) showed that .mathbox[ `$$\small{E(\beta|\tau_{\epsilon}, \tau_{\psi}, y) = \displaystyle (X'X)^{-1}X'(y - Z\hat{b}) = \beta_{ols} - (X'X)^{-1}Z\hat{b}}$$` `$$\small{Var^{-1}(\beta|\tau_{\epsilon}, \tau_{\psi}, y) = \displaystyle \tau_{\epsilon}(X'X) - X' Var(Zb|\beta, \tau_{\epsilon}, \tau_{\psi}, y)X =\tau_{\epsilon}X'Z\tilde{D}Z'X,}$$` ] where + `\(\hat{b} = E(b|\tau_{\epsilon}, \tau_{\psi}, y) = (Z'P^cZ + rD)^{-1}\)` + `\(P^c = I - X(X'X)^{-1}X'\)` + `\(r = \frac{\tau_{\psi}}{\tau_{\epsilon}}\)` + `\(\tilde{D} = I - (I+rD)^{-1}\)` --- class: slide-page ##
Restricted Spatial Regression + Original model .mathbox[ `$$\displaystyle y|\beta, b, \tau_{\epsilon} \sim N(X\beta + Zb, \tau_{\epsilon}I_n),$$` ] + Restricted model .mathbox[ `$$\displaystyle y|\beta, b, \tau_{\epsilon} \sim N(X\beta + P^cZb, \tau_{\epsilon}I_n),$$` ] where + `\(P = X(X'X)^{-1}X'\)` is the .enfase[projection matrix] of `\(X\)` + `\(PZb + P^cZb = PZb + (I - P)Zb = Zb\)` --- class: slide-page ##
Projection matrix + Linear regression model .mathbox[ + `\(\hat{\beta}_{ols} = (X'X)^{-1}X'y,\)` + `\(\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Py,\)` + `\(y - \hat{y} = \epsilon = y - Py = (I-P)y = P^cy,\)` + `\(y = X\beta + \epsilon = Py + P^cy\)` ] + `\(X\hat{\beta}_{ols}\)` is the .enfase[projection] of `\(y\)` in the space of `\(X\)` and it is our best guess about `\(y\)` given our set of covariates `\(X\)` + `\(\epsilon\)` is the information about `\(y\)` that is not in `\(X\)`. In other words it is the projection of `\(y\)` in the orthogonal space of `\(X\)` --- class: slide-page ##
Comments <br> + It is .enfase[not possible] (or really difficult) to calculate the expectation value and variance under .enfase[non-Gaussian] models + At least this approach seems to works fine even to GLMM models + It is a .enfase[constraint] `\(\implies\)` .enfase[computational limitations] + The .enfase[variance is underestimated] + For more than one spatial effect + The <span class="enfase">computation</span> becomes more <span class="enfase">difficult</span> + It is <span class="enfase">not possible to use standard softwares</span> (which means that we need to code) + More than one sources of Spatial Confounding `\(P\epsilon_1, \ldots, P\epsilon_q\)` --- class: slide-page ##
SPOCK + .enfase[SP]atial .enfase[O]rthogonal .enfase[C]entroid .enfase["K"]orrection (<a href="https://projecteuclid.org/euclid.ba/1537258137">Prates, M. et. al (2018)</a>) <center> <img src="img/SPOCK.jpg" alt="graph" height="125px"/> </center> + Alleviating Spatial Confounding for areal data problems by displacing the geographical centroids <center> <img src="img/spock_ilustration.png" alt="graph" height="150px"/> </center> --- class: slide-page ##
SPOCK <br> + Lets call `\(W\)` the adjacency matrix for the original set of centroids. + Let be `\(c_i = \{c_{1i}, c_{2i}\} ~ \forall i \in [1, ..., n]\)` <br> .pull-left[ <center> <img src="img/centroids.png" alt="graph" height="200px"/> </center> ] .pull-right[ `\(W = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}\)` ] --- class: slide-page ##
SPOCK + We want a .enfase[new set of centroids] `\(c^{\ast} = P^c\times[c_1, c_2]\)` + We want to find `\(W^{\ast}\)`: an adjancency matrix .enfase["free" of spatial confounding] <center> <img src="img/centroid_projection.png" alt="graph" height="150px"/> </center> To create `\(W^{\ast}\)` we can use, for example, the .enfase[k-nearest neighbors algorithm] <center> <img src="img/knn.gif" alt="graph" height="125px"/> </center> --- class: section-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> <div class="my-section"> <h2>
INLA </h2> </div> --- class: slide-page ##
INLA + .enfase[I]ntegrated .enfase[N]ested .enfase[L]aplace .enfase[A]pproximation (<a href="https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2008.00700.x">Rue, H., Martino, S., & Chopin, N. (2009)</a>) <center> <img src="img/inla_paper.png" alt="graph" height="400px"/> </center> --- class: slide-page ##
Bayesian statistics + We want to obtain the _a posteriori_ distribution `$$\pi(\theta|y) = \frac{\pi(y|\theta)\pi(\theta)}{\int\pi(y|\theta)\pi(\theta)d\theta} \propto \pi(y|\theta)\pi(\theta)$$` + For .enfase[simple models] it is possible to identify the distribution of `\(\theta|y\)` + In real problems the <span class="enfase">dimension of `\(\theta = \{\theta_1, \ldots, \theta_q\}\)` is big</span> + To have the _a posteriori_ .enfase[marginal distribution]: `$$\pi(\theta_i|y) = \int_{\theta_{-i}}\pi(\theta|y)d\theta_{-i}$$` + It is .enfase[not easy] to calculate these integrals + Instead, in general, we use the .enfase[full conditional distributions] and a MCMC method to have a sample from these distributions --- class: slide-page ##
MCMC methods + In general .enfase[MCMC] methods are .enfase[computationally intensive] + General public don't have a "free" access to this metodology + It is necessary to code + Few tools to use it + Lots of tunning parameters + It takes time and people are impatient <center> <img src="img/mcmc_mixing.png" alt="graph" height="200px"/> </center> <h5 class="source">Source: <a href="https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_mcmc_sect055.htm">https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_mcmc_sect055.htm</a></h5> --- class: slide-page ##
INLA + .enfase[INLA] is a .enfase[fast] way to approximate the desired marginal distributions + It is possible to use INLA if the model fits in the next equation `$$g(\mu_i) = \eta_i = \alpha + \sum_{j = 1}^{n_f}f^{(j)}(u_{ji}) + \sum_{k = 1}^{n_{\beta}}\beta_kz_{ki} + \epsilon_i,$$` + `\(g(.)\)` is a .enfase[link function] + `\(\alpha\)` is the .enfase[intercept] and `\(\beta_k\)` is a set of .enfase[coefficients] associated to fixed effects `\(z_{ki}\)` + `\(f^{(j)}(u_{ji})\)` are unknown .enfase[functions] of the covariates `\(u_{ij}\)` + `\(\epsilon_i\)` are .enfase[unstructured] terms. INLA assumes Gaussian priors to the vector `\(\{\alpha, f^{(j)}(.), \beta, \epsilon \}\)` given raise to a .enfase[Gaussian Markov Random Field]. --- class: slide-page ##
R-INLA example + Spatial Poisson model + `\(g(.) = \log(.)\)` + `\(\alpha\)` is the intercept and `\(\beta_k\)` is a set of coefficients + `\(f^{(1)}(u_{1i})\)` is an ICAR( `\(\tau_u\)` ) model for the areas `\(u_{1i}\)` <br> ```r library(INLA) f_inla <- y ~ X1 + X2 + f(region_inla, model = 'besag', graph = W) mod_inla <- inla(data = data_inla, formula = f_inla, family = "poisson") # zeroinflatedpoisson1 ``` --- class: slide-page ##
R-INLA example + Spatio-temporal Gaussian model + `\(g(.) = ~ .\)` + `\(\alpha\)` is the intercept and `\(\beta_k\)` is a set of coefficients + `\(f^{(1)}(u_{1i})\)` is an ICAR( `\(\tau_u\)` ) model for the areas `\(u_{i1}\)` + `\(f^{(2)}(u_{2i})\)` is an AR(p) model for the times `\(u_{2i}\)` ```r library(INLA) f_inla <- y ~ X1 + X2 + f(region_inla, model = 'besag', graph = W) + f(times_inla, model = 'ar', order = p) mod_inla <- inla(data = data_inla, formula = f_inla, family = "gaussian") ``` --- class: slide-page ##
How does it work? <br> + We assume that .enfase[all elements] in vector `\(x\)` are .enfase[Gaussian] `$$x = [\alpha, \beta, f^{(1)}(\cdot), \ldots, f^{(n_j)}(\cdot)] \sim \mathcal{N}(0, \mathbf{\Sigma})$$` + `\(\mathbf{\Sigma}^{-1} = Q\)` is the precision matrix + `\(Q\)` is .enfase[sparse] `\(\implies\)` it is easier to find the .enfase[inverse matrix] by factorization + `\(x\)` may depend on `\(\theta\)`: parameters that are .enfase[not normally distributed] + Variances + Correlation + In general <span class="enfase"> `\(\text{dim}(x) \gg \text{dim}(\theta)\)` </span> + We can write `\(\pi(x, \theta|y) = \pi(x|\theta, y)\pi(\theta| y)\)` --- class: slide-page ##
Laplace approximation + Taylor series expansion around the mode `\(\hat{x}\)` .mathbox[ `$$\log f(x) \approx \log f(\hat{x}) + \frac{\partial \log f(\hat{x})} {\partial x} (x-\hat{x}) + \frac{\partial^2 \log f(\hat{x})} {2 \partial x^2} (x-\hat{x})^2$$` ] + The second term is zero when `\(x = \hat{x}\)` (maximum point) + Lets call `\(\sigma = \frac{\partial^2 \log f(\hat{x})} {\partial x^2}\)` .mathbox[ $$\log f(x) \approx \log f(\hat{x}) - \frac{1}{2\sigma^2} (x-\hat{x})^2 $$ ] --- class: slide-page ##
Laplace approximation + Taking the exponential in both sides .mathbox[ <!-- `$$f(x) \approx f(\hat{x})\times \exp\Bigg\{-\frac{1}{2\sigma^2} (x-\hat{x})^2\Bigg\} = c\times \exp\Bigg\{-\frac{1}{2\sigma^2} (x-\hat{x})^2\Bigg\}$$` --> `$$f(x)dx = \exp\{\log f(x) dx\} \approx constant\times\exp\Bigg\{-\frac{(x-\hat{x})^2}{2\hat{\sigma}^2}\Bigg\}dx$$` ] + Example for a `\(\chi^2_k\)` .mathbox[ `$$f(x;\,k) = \frac{x^{(k/2-1)} e^{-x/2}}{2^{k/2} \Gamma\left(\frac{k}{2}\right)}, ~~ x \geq 0$$` ] --- class: slide-page ##
Laplace approximation .mathbox[ + `\(\log f(x) = (k/2 - 1) \log x - x/2 + c\)` + `\(\log f'(x) = (k/2 -1)/x - 1/2 = 0\)` + `\(\log f''(x) = -(k/2 - 1)/x^2\)` + `\(\implies \chi^2_k \overset{LA}{\sim} N(\hat{x} = k-2, ~\hat{\sigma}^2 = 2(k-2))\)` ] <center style="padding-top:20px"> <img src="img/laplace_approximation.png" alt="graph" height="225px"/> </center> --- class: slide-page ##
INLA <br> We want the posterior marginal distributions <br> .mathbox[ `$$\pi(x_j|y) = \int \pi(x_j, \theta|y)d\theta = \int \pi(x_j|\theta, y)\pi(\theta|y)d\theta$$` `$$\pi(\theta_k|y) = \int \pi(\theta|y) d\theta_{-k}$$` ] --- class: slide-page ##
INLA Approximation for `\(\pi(\theta_k|y)\)` .mathbox[ `$$\pi(\theta|y) \approx \frac{\pi(y|x,\theta) \pi(x|\theta)\pi(\theta)}{\tilde{\pi}(\mathbf{x|\theta,y})} \bigg|_{x={\hat{x}}(\theta)} = \tilde{\pi}(\theta|y)$$` ] + `\(\tilde{\pi}(\mathbf{x|\theta,y})\)` is the .enfase[Laplace approximation] of `\(\pi(\mathbf{x|\theta,y})\)` + `\(\hat{x}(\theta)\)` Is the mode of `\(x\)` given values of `\(\theta\)` <center> <img src="img/inla_theta.png" alt="graph" height="150px"/> </center> <h5 class="source">Source: <a href="https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1467-9868.2008.00700.x">https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1467-9868.2008.00700.x</a></h5> --- class: slide-page ##
INLA Approximation for `\(\pi(x_j|y)\)` .mathbox[ `$$\tilde{\pi}(x_j|y) \approx \sum_{h=1}^H \tilde{\pi}(x_j | \theta^*_h, y) \tilde{\pi}(\theta^*_h|y) \Delta_h$$` ] + `\(\tilde{\pi}(x_j|y)\)` is the numerical integration of `\(\tilde{\pi}(x_j, \theta| y)\)` given a grid of `\(\theta\)` + `\(\tilde{\pi}(x_j | \theta^*_h, y)\)` can be + <span class="enfase">Gaussian approximation</span>: just take the desired dimension of `\(\small{\tilde{\pi}(x | \theta, y)}\)` + <span class="enfase">Laplace approximation</span>: use the Laplace approximation again `\(\small{\pi(x_j | \theta, y) \propto \frac{\pi(x,\theta |y)}{\tilde{\pi}(x_{-j} | x_j, \theta, y)}}\)` + <span class="enfase">Simplified Laplace approximation</span>: a computational alternative --- class: slide-page ##
INLA <br><br><br><br> + .enfase[Integrated]: Numerical integration to obtain the marginals + .enfase[Nested]: We need `\(\pi(\theta|y)\)` to get `\(\pi(x_j|y)\)` and we use the Laplace approximation in both of them + .enfase[Laplace approximation]: We used all the time to represent unknow distributions of `\(x\)` and `\(x_j\)` --- class: slide-page ##
INLA and Shared Component + Shared Component model .mathbox[ `$$\scriptsize{Y_{id} \sim \text{Poisson}(E_{id}\theta_{id}) ~ \forall ~ i \in \{1, \ldots, n\}, ~ d \in \{1, 2\}}$$` `$$\scriptsize{\log(\theta_{i1}) = X_{i1}\beta_1 + \delta\psi_i + \phi_{i1}}$$` `$$\scriptsize{\log(\theta_{i2}) = X_{i2}\beta_2 + \frac{1}{\delta}\psi_i + \phi_{i2}}$$` `$$\scriptsize{\psi_i \sim ICAR(\tau_{\psi}); ~ \phi_1 \sim ICAR(\tau_{\phi_1}); ~ \phi_2 \sim ICAR(\tau_{\phi_2})}$$` ] + INLA .mathbox[ `$$\scriptsize{g(\mu_i) = \eta_i = \alpha + \sum_{j = 1}^{n_f}f^{(j)}(u_{ji}) + \sum_{k = 1}^{n_{\beta}}\beta_kz_{ki} + \epsilon_i,}$$` ] --- class: slide-page ##
R-INLA: _copy_ feature + We can include weights for the latent effects. However they are fixed. .mathbox[ `$$g(\mu_i) = \eta_i = \alpha + \sum_{j = 1}^{n_f}{\color{red} {w_j}}f^{(j)}(u_{ji}) + \sum_{k = 1}^{n_{\beta}}\beta_kz_{ki} + \epsilon_i,$$` ] + .enfase[_copy_] allows the use of the .enfase[same element more than once] ```r n <- 1000 i <- j <- 1:n z <- rnorm(n); w <- runif(n) y <- z + 2*z*w + rnorm(n) f_inla <- y ~ f(i, model = "iid", fixed = TRUE) + # Var(z) = 1 f(j, w, copy = "i", fixed = FALSE) # w = weights mod_inla <- inla(f_inla, data = data.frame(i, j, w, y)) ``` --- class: slide-page ##
R-INLA: _copy_ feature + Shared component model: .mathbox[ `$$\scriptsize{\log(\theta_{i1}) = X_{i1}\beta_1 + {\color{red} {\delta}}\psi_i + \phi_{i1}}$$` `$$\scriptsize{\log(\theta_{i2}) = X_{i2}\beta_2 + {\color{red} {\frac{1}{\delta}}}\psi_i + \phi_{i2}}$$` ] + INLA: .mathbox[ `$$\scriptsize{\log(\theta_{i1}) = X_{i1}\beta_1 + {\color{red} {\gamma}}\psi^{\ast}_i + \phi_{i1}}$$` `$$\scriptsize{\log(\theta_{i2}) = X_{i2}\beta_2 + \psi^{\ast}_i + \phi_{i2}}$$` ] --- class: slide-page ##
R-INLA: _copy_ feature Equalizing (Vargas, F. R. (2013)): .mathbox[ `$${\color{red} {\delta}}\psi_i = {\color{red} {\gamma}}\psi^{\ast}_i,$$` `$${\color{red} {\frac{1}{\delta}}}\psi_i = \psi^{\ast}_i,$$` ] then .mathbox[ `$$\delta = \sqrt{\gamma}$$` ] We can do it _a posteriori_ using the .enfase[_inla.tmarginal_] function --- class: section-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> <div class="my-section"> <h2>
Simulation study </h2> </div> --- class: slide-page ##
Simulation study + Three scenarios: `\(\delta = \{1, 1.5, 1.75\}\)` + California shapefile was used (58 counties) + Data generation + Without spatial confounding + With spatial confounding ( `\(X_2\)` = latitude) + Fit methods + Shared component model + Shared component model + SPOCK + For each combination .enfase[1000 datasets] `\(\implies\)` 12.000 fitted models + Total time `\(\approx\)` .enfase[5 hours] --- class: slide-page ##
Simulation study - posterior means <center><img src="img/boxplot_mean_simulation.png" alt="graph" height="450"/></center> --- class: slide-page ##
Simulation study - posterior SD <center><img src="img/boxplot_sd_simulation.png" alt="graph" height="450"/></center> --- class: slide-page ##
Simulation study - Spatial effects <center><img src="img/simulation_effects.png" alt="graph" height="430"/></center> --- class: section-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> <div class="my-section"> <h2>
SEER </h2> </div> --- class: slide-page ##
SEER <br> + .enfase[S]urveillance, .enfase[E]pidemiology, and .enfase[E]nd .enfase[R]esults Program + Data on cancer cases from several locations in the USA + Comparing men and women + lung and bronchus cancer + Comparing two cancers + Stomach and esophagus cancer + Lip, oral cavity and pharynx cancer --- class: slide-page ##
SEER: Models .enfase[For all models we have:] `$$\small{Y_{id} \sim P(E_{id}\theta_{id})}$$` <hr> M1 - .enfase[Univariate] non-spatial model: .mathbox[ `$$\small{\log(\theta_{id}) = \alpha_d + X_{id}\beta},$$` ] M2 - .enfase[Univariate] spatial model: .mathbox[ `$$\small{\log(\theta_{id}) = \alpha_d + X_{id}\beta + \psi_{id}},$$` `$$\small{\psi_1 \sim ICAR(\tau_{\psi_1}); ~~ \psi_2 \sim ICAR(\tau_{\psi_2})},$$` ] --- class: slide-page ##
SEER: Models M3 - .enfase[Shared Component] model without specific spatial term: .mathbox[ `$$\small{\log(\theta_{i1}) = \alpha_1 + X_{i1}\beta + \delta\psi_{i}; ~~ \log(\theta_{i2}) = \alpha_2 + X_{i2}\beta + \frac{1}{\delta}\psi_{i}},$$` `$$\small{\psi \sim ICAR(\tau_{\psi})},$$` ] M4 - .enfase[Shared Component] model with specific spatial term: .mathbox[ `$$\small{\log(\theta_{i1}) = \alpha_1 + X_{i1}\beta + \delta\psi_{i} + \phi_{1i}; ~~ \log(\theta_{i2}) = \alpha_2 + X_{i2}\beta + \frac{1}{\delta}\psi_{i} + \phi_{2i}},$$` `$$\small{\psi \sim ICAR(\tau_{\psi}); ~~ \phi_1 \sim ICAR(\tau_{\phi_1}); ~~ \phi \sim ICAR(\tau_{\phi_2})}$$` ] --- class: slide-page ##
SEER: Covariates <br><br><br> <center><img src="img/covariates.png" alt="graph" height="200"/></center> --- class: slide-page ##
SEER: Results .enfase[Lung and bronchus cancer]: Men vs Women <br> <center><img src="img/tab1.png" alt="graph" height="300"/></center> --- class: slide-page ##
SEER: Results .enfase[Lung and bronchus cancer]: Men vs Women <center><img src="img/effects_ap1.png" alt="graph" height="400"/></center> --- class: slide-page ##
SEER: Results .enfase[Lung and bronchus cancer]: Men vs Women <center><img src="img/effect_indiv1.png" alt="graph" height="400"/></center> --- class: slide-page ##
SEER: Results .enfase[Stomach and esophagus] cancer vs .enfase[Lip, oral cavity and pharynx] cancer <br> <center><img src="img/tab2.png" alt="graph" height="250"/></center> --- class: slide-page ##
SEER: Results .enfase[Stomach and esophagus] cancer vs .enfase[Lip, oral cavity and pharynx] cancer <center><img src="img/effects_ap2.png" alt="graph" height="400"/></center> --- class: slide-page ##
SEER: Results .enfase[Stomach and esophagus] cancer vs .enfase[Lip, oral cavity and pharynx] cancer <center><img src="img/effect_indiv2.png" alt="graph" height="400"/></center> --- class: slide-page ##
INLA codes <br> ```r ## inla_list is a list with: ## alpha1, alpha2, X, ## phi, phi_beta, psi1, psi2 ## E is the expected number of new cases f_inla <- Y ~ -1 + alpha1 + alpha2 + X + f(phi, model = "besag", graph = W) + f(phi_beta, copy = "phi", hyper = list(theta = list(fixed = FALSE)), range = c(0, Inf)) + f(psi1, model = "besag", graph = W) + f(psi2, model = "besag", graph = W) inla_mod <- inla(formula = f_inla, data = inla_list, family = c("poisson", "poisson"), E = E) ``` --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Conclusion <br><br> + The .enfase[Shared Component model] is an alternative model for .enfase[multiple outcomes] + .enfase[Spatial confounding can mess up the results] + .enfase[SPOCK] appears as a good .enfase[solution] for the Spatial Confouding problem + The correction seems to be coherent + We showed how to fit the Shared Component model using INLA + Using the SEER database we showed the potential problems of the Spatial Confounding problem --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Main references <p style="text-align:justify"> .enfase[Prates, M. O., Assunção, R. M., & Rodrigues, E. C. (2018)]. **Alleviating Spatial Confounding for Areal Data Problems by Displacing the Geographical Centroids**. _Bayesian Analysis_. .enfase[Knorr‐Held, L., & Best, N. G. (2001)]. **A shared component model for detecting joint and selective clustering of two diseases**. _Journal of the Royal Statistical Society: Series A (Statistics in Society), 164(1), 73-85_. .enfase[Rue, H., Martino, S., & Chopin, N. (2009)]. **Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations**. _Journal of the royal statistical society: Series b (statistical methodology), 71(2), 319-392_. .enfase[Besag, J. (1974)]. **Spatial interaction and the statistical analysis of lattice systems**. _Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 192-225_. .enfase[Reich, B. J., Hodges, J. S., & Zadnik, V. (2006)]. **Effects of residual smoothing on the posterior of the fixed effects in disease‐mapping models**. _Biometrics, 62(4), 1197-1206_. </span> --- class: center, inverse <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> <h1 style="color:#378fbb;"> THANK YOU! </h1> <!-- <img src="img/richmond/richmond3.jpg" alt="graph" height="100"/> --> <!-- <img src="img/richmond/richmond4.jpg" alt="graph" height="100"/> --> <!-- <img src="img/richmond/richmond5.jpg" alt="graph" height="100"/> --> <!-- <img src="img/richmond/richmond6.jpg" alt="graph" height="100"/> --> <!-- <img src="img/richmond/richmond7.jpg" alt="graph" height="100"/> --> <!-- <img src="img/richmond/richmond8.jpg" alt="graph" height="100"/> --> .mathbox[ <img src="img/richmond/richmond1.jpg" alt="graph" height="210"/> <img src="img/richmond/richmond2.jpg" alt="graph" height="210"/> <img src="img/richmond/richmond3.jpg" alt="graph" height="210"/> ] <br> <h1 style="color:#378fbb;">
QUESTIONS </h1> <h5> <span style="color:#ff6868;">
</span> <span style="color:#00000;">require-r.com</span> <span style="color:#ff6868;">
</span> <span style="color:#00000;">douglas-mesquita</span> <span style="color:#ff6868;">
</span> <span style="color:#00000;">DouglasMesquita</span> <span style="color:#ff6868;">
</span> <span style="color:#00000;">douglas-mesquita</span> <span style="color:#ff6868;">
</span> <span style="color:#00000;">DouglasMesqita</span> <br> <span style="color:#ff6868;">
</span> <span style="color:#00000;">douglasmesqita</span> <span style="color:#ff6868;">
</span> <span style="color:#00000;">douglas.mesquita.94</span> </h5>