This last module is intended to give you an intuition of Bayesian estimation of LMMs, as well as of the generalization of what we learned about LMMs to non-normal target distributions: generalized LMMs (GLMMs)
No!
n_rep = 100 #100 tests repeated
matrix_ci_values = NULL
for (i in 1:n_rep){
  grp1 = rnorm(10, 0, 1)
  grp2 = rnorm(10, -1, 1)#H1 is true: the true difference in means (grp1 - grp2) is 1
  ci = t.test(grp1, grp2)$conf.int#95% CI for the difference in means
  matrix_ci_values <- rbind(matrix_ci_values, matrix(ci, ncol=2, nrow=1))
}
ci_corr = matrix_ci_values[(1 >= matrix_ci_values[,1]) & (1 <= matrix_ci_values[,2]),]#true included in CI
ci_incorr = matrix_ci_values[(1 >= matrix_ci_values[,2]) | (1 <= matrix_ci_values[,1]),]#true not included in CI
#Graphical parameters
plot.new()
plot.window(xlim=c(-5,5), ylim=c(0.5, 100+.5))
axis(1, at=seq(-3, 3, by=1))
axis(2, at=seq(0, 100, by=20))
#Plotting correct CI vs incorrect
arrows(ci_corr[,1], 1:nrow(ci_corr), ci_corr[,2], 1:nrow(ci_corr), code=3, length=0.05, angle=90)
arrows(ci_incorr[,1], (nrow(ci_corr)+1):(nrow(ci_corr)+nrow(ci_incorr)), ci_incorr[,2],
       (nrow(ci_corr)+1):(nrow(ci_corr)+nrow(ci_incorr)), code=3, length=0.05, angle=90, col="red")
#Plotting true effect as a line
abline(v=1)
The correct definition: across repeated experiments, 95% of such confidence intervals contain the true effect
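As a quick sanity check on the simulation above, we can compute the empirical coverage from the objects already defined (ci_corr and n_rep):
#Proportion of simulated CIs that contain the true effect of 1
nrow(ci_corr) / n_rep#should be close to 0.95 in the long run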
Bayesian inference: $p(\theta|\text{Data, Model, Prior})$ vs. frequentist: $p(\text{Data}|\theta)$ (e.g. the likelihood in module 2)
How do we go from $p(\text{Data}|\theta)$ to $p(\theta|\text{Data})$?
Bayes' rule:
\begin{equation} \label{eq:bayes} p(\theta|Data) = \frac{p(\theta)\, p(Data|\theta)}{p(Data)} \end{equation}
Let's go back to our coin-flip example of module 2, where $Data = 28$ heads out of 40 flips:
# Code adapted from R. McElreath's book (see resources at the end of the module)
grid = 50
p_grid = seq(from=0, to=1, length.out = grid) #Discretizing the parameter space, aka grid approximation
prior = rep(1,grid)#Flat uninformative prior
likelihood = dbinom(28, size=40, prob=p_grid)
posterior = (likelihood*prior)/sum(likelihood*prior)
plot(p_grid, prior, type="b")
plot(p_grid, likelihood, type="b")
plot(p_grid, posterior, type="b")
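Following McElreath, a minimal sketch of how to summarize this grid posterior: draw samples from p_grid in proportion to the posterior probabilities and compute a credible interval (samples is a name introduced here for illustration):
samples = sample(p_grid, size=1e4, replace=TRUE, prob=posterior)#sampling parameter values proportionally to their posterior probability
hist(samples, breaks=50)
quantile(samples, c(.025, .975))#a 95% credible interval for the probability of heads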
Now we change the prior to illustrate how observing data updates prior beliefs
# Code adapted from R. McElreath's book (see resources at the end of the module)
grid = 50
p_grid = seq(from=0, to=1, length.out = grid)
prior = ifelse(p_grid <= .5, 0, 1)#informative prior: we know the coin is biased towards heads, but are agnostic about how much
likelihood = dbinom(28, size=40, prob=p_grid)
posterior = (likelihood*prior)/sum(likelihood*prior)
plot(p_grid, prior, type="b")
plot(p_grid, likelihood, type="b")
plot(p_grid, posterior, type="b")
# Code taken from R. McElreath's book (see resources at the end of the module)
grid = 50
p_grid = seq(from=0, to=1, length.out = grid) #Discretizing the parameter space, aka grid approximation
prior = ifelse(p_grid > .5, 0, 1)#a misspecified prior that excludes the values supported by the data
likelihood = dbinom(28, size=40, prob=p_grid)
posterior = (likelihood*prior)/sum(likelihood*prior)
plot(p_grid, prior, type="b")
plot(p_grid, likelihood, type="b")
plot(p_grid, posterior, type="b")
But this grid approximation scales badly with the number of parameters, and $p(Data)$ is hard to compute when fitting LMMs: enter Markov Chain Monte Carlo (MCMC)
MCMC boils down to generating new proposals for the parameter values ($\theta_{n+1}$) and comparing them to the current one ($\theta_n$):
$current = \frac{p(\theta_n ) p(Data |\theta_n)}{p(Data)}$
$new = \frac{p(\theta_{n+1} ) p(Data |\theta_{n+1})}{p(Data)}$
Acceptance probability $p = \min\left(1, \frac{p(\theta_{n+1})\, p(Data|\theta_{n+1})}{p(\theta_{n})\, p(Data|\theta_{n})}\right)$, in which $p(Data)$ cancels out
Using these comparisons we explore the parameter space even though we never compute $p(Data)$ (see the blog post by T. Wiecki for an intuitive approach to the math behind MCMC sampling)
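To make the intuition concrete, here is a minimal Metropolis sampler (a sketch for illustration only, not what brms uses internally; brms relies on Stan's NUTS sampler) applied to the coin-flip example, with a flat prior so only the likelihood ratio matters:
set.seed(123)
n_iter = 5000
theta = numeric(n_iter)
theta[1] = 0.5#starting value
for (i in 2:n_iter){
  proposal = theta[i-1] + rnorm(1, 0, 0.1)#new proposition theta_{n+1}
  if (proposal < 0 | proposal > 1){#outside the parameter space: reject
    theta[i] = theta[i-1]
    next
  }
  #Ratio of prior x likelihood; p(Data) cancels out, and so does the flat prior
  ratio = dbinom(28, 40, proposal) / dbinom(28, 40, theta[i-1])
  theta[i] = ifelse(runif(1) < min(1, ratio), proposal, theta[i-1])#accept or keep the current value
}
hist(theta, breaks=50)#approximates the posterior obtained above by grid approximation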
Let's illustrate MCMC and the different algorithms behind it: http://chi-feng.github.io/mcmc-demo/app.html?algorithm=RandomWalkMH&target=normal
To illustrate this in the R environment and to introduce the package we are going to use (brms), we run an MCMC procedure to estimate the mean and SD of the known IQ distribution
set.seed(234)
options(mc.cores = parallel::detectCores())#Detects and stores the number of cores on your computer
library(brms)#Library for Bayesian estimation of LMMs (amongst other models)
library(shinystan)#Library we are going to use to inspect our MCMC procedures
Loading required package: Rcpp Loading 'brms' package (version 2.14.4). Useful instructions can be found by typing help('brms'). A more detailed introduction to the package is available through vignette('brms_overview'). Attaching package: ‘brms’ The following object is masked from ‘package:stats’: ar Loading required package: shiny This is shinystan version 2.5.0
dummy_data = data.frame(y=rnorm(100, 100,15))#e.g. Normal with known mean of 100 and sd of 15
Setting very wide priors
priors = c(prior(normal(0, 1000), class = Intercept),#The prior we have on the mean
prior(cauchy(0, 1000), class = sigma))#The prior we have on the SD: a fat-tailed, truncated distribution
values = seq(from=-5000, to=5000, length.out = 50)
plot(values,dnorm(values,0,1000),type="b",ylab="density")
lines(values[26:50],dcauchy(values[26:50],0,1000),type="b",col="red")
Estimating the parameters
a_mean = brm(y ~ 1, data=dummy_data,
family="normal", prior = priors, chain = 2, warmup=10, iter=100)#A shotgun to a knife fight
Compiling Stan program... recompiling to avoid crashing R session Start sampling Warning message: “There were 2 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded” Warning message: “There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low” Warning message: “Examine the pairs() plot to diagnose sampling problems ” Warning message: “The largest R-hat is 1.93, indicating chains have not mixed. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#r-hat” Warning message: “Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#bulk-ess” Warning message: “Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess”
print(stancode(a_mean))
// generated with brms 2.14.4 functions { } data { int<lower=1> N; // total number of observations vector[N] Y; // response variable int prior_only; // should the likelihood be ignored? } transformed data { } parameters { real Intercept; // temporary intercept for centered predictors real<lower=0> sigma; // residual SD } transformed parameters { } model { // likelihood including all constants if (!prior_only) { // initialize linear predictor term vector[N] mu = Intercept + rep_vector(0.0, N); target += normal_lpdf(Y | mu, sigma); } // priors including all constants target += normal_lpdf(Intercept | 0, 1000); target += cauchy_lpdf(sigma | 0, 1000) - 1 * cauchy_lccdf(0 | 0, 1000); } generated quantities { // actual population-level intercept real b_Intercept = Intercept; }
Let's inspect the MCMC chains
mcmc_plot(a_mean, type="trace")
No divergences to plot.
summary(a_mean)
Warning message: “Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors.”
Family: gaussian Links: mu = identity; sigma = identity Formula: y ~ 1 Data: dummy_data (Number of observations: 100) Samples: 2 chains, each with iter = 100; warmup = 10; thin = 1; total post-warmup samples = 180 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 62.14 42.41 -0.17 101.85 1.93 3 12 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 41.31 35.60 12.72 106.16 1.75 3 49 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
Note that there are numerous ways to check convergence and model adequacy, e.g. shinystan
options(browser = "firefox")
launch_shinystan(a_mean)
Launching ShinyStan interface... for large models this may take some time. Listening on http://127.0.0.1:3383
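Besides shinystan, a few quick numerical checks, using functions that should be available through brms/bayesplot (rhat(), neff_ratio(), and mcmc_plot() with an autocorrelation type):
rhat(a_mean)#potential scale reduction factors, should all be close to 1
neff_ratio(a_mean)#effective sample size relative to the total number of draws
mcmc_plot(a_mean, type = "acf")#autocorrelation of the chains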
How can you improve the estimation? Try it yourself
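One possible improvement, sketched here with an illustrative name (a_mean_better): simply give the sampler a realistic number of warmup and sampling iterations.
a_mean_better = brm(y ~ 1, data=dummy_data,
                    family="normal", prior = priors, chain = 4, cores=4, warmup=1000, iter=2000)
summary(a_mean_better)#Rhat should now be close to 1 and the ESS much larger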
Prior predictive checks:
a_mean_prior_only = brm(y ~ 1, data=dummy_data, sample_prior = "only",
family="normal", prior = priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean_prior_only)
Compiling Stan program... Start sampling
These priors are inconsistent with domain knowledge about IQ. Let's use more informative priors
priors = c(prior(normal(100, 10), class = Intercept),#The prior we have on the mean
prior(cauchy(15, 5), class = sigma))#The prior we have on the SD: a fat-tailed, truncated distribution
a_mean_prior_only = brm(y ~ 1, data=dummy_data, sample_prior = "only",
family="normal", prior = priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 10, a_mean_prior_only)
Compiling Stan program... Start sampling Warning message: “Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess”
Posterior predictive checks:
a_mean = brm(y ~ 1, data=dummy_data,
family="normal", prior = priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean)
Compiling Stan program... Start sampling
If we instead use badly misspecified priors:
improper_priors = c(prior(normal(-150, 1), class = Intercept),#The (wrong) prior we have on the mean
prior(normal(1, 1), class = sigma))#Bad prior on the SD
a_mean_improper = brm(y ~ 1, data=dummy_data, #Note that the model is slower to fit
family="normal", prior = improper_priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean_improper)
Compiling Stan program... Start sampling
But the summaries (e.g. Rhat) do not explicitly show the problem
summary(a_mean_improper)
Family: gaussian Links: mu = identity; sigma = identity Formula: y ~ 1 Data: dummy_data (Number of observations: 100) Samples: 4 chains, each with iter = 1000; warmup = 500; thin = 1; total post-warmup samples = 2000 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept -139.93 0.97 -141.81 -138.07 1.00 1390 1226 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 48.77 0.51 47.78 49.75 1.00 1894 1495 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
However, if the priors do not exclude the true values, even badly misspecified priors will be overruled by the likelihood of the data (given a reasonable amount of it):
slightly_improper_priors = c(prior(normal(-150, 100), class = Intercept),#We increased the variance around our prior
prior(cauchy(1, 1), class = sigma))#From normal to Cauchy
a_mean_slightly_improper = brm(y ~ 1, data=dummy_data, #Note that the model is slower to fit
family="normal", prior = slightly_improper_priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean_slightly_improper)
Compiling Stan program... Start sampling
We now move to linear models, using the dataset we generated in module 2:
library(MASS)
mean_wm = 6
sd_wm = 1
n_participants = 15
varcovmat = matrix(c(1, .33, .33, .33, .33, 1, .33, .33, .33, .33,1,.33, .33, .33, .33,1), nrow=4)
data = data.frame()
for(i in 1:n_participants){
re = mvrnorm(n=1, mu=c(0, 0, 0, 0), Sigma=varcovmat)
mu_i = re[1] * sd_wm + mean_wm
b_age = re[2] *(sd_wm*2) - (sd_wm*4) #b_age_j
b_task = re[3] * sd_wm + -(sd_wm*2)#b_task_j
b_age_task = re[4] * (sd_wm/2) - (sd_wm*2)#b_agetask_j
age_tested = runif(runif(1, 5,15), 1, 99)#a random number (between 5 and 15, truncated to an integer) of tested ages between 1 and 99
for (age in age_tested){#nested for loops are dirty but more transparent
age = age/100#We recode age for scale purpose
for (task in c(0,1)){
ntrials = runif(1, 50, 150)#random number of trials per cell
subdata = data.frame(trial=1:ntrials)#1:ntrials truncates ntrials to an integer
mu = mu_i+b_age*age+b_task*task+b_age_task*age*task
subdata$wm = rnorm(ntrials, mu, sd=sd_wm)
subdata$participant = as.character(i)
subdata$age = age
subdata$task = task
data = rbind(data, subdata)
}
}
}
means_ = aggregate(data$wm, FUN=mean,
by=list(age=data$age, task=data$task, participant = data$participant))#pre-averaging at first
#We let brms use its default priors for convenience
blm0 = brm(x ~ age * task, data=means_,
family="normal", chain = 4, cores=4, warmup=500, iter=1000)
Compiling Stan program... Start sampling
print(stancode(blm0))
// generated with brms 2.14.4 functions { } data { int<lower=1> N; // total number of observations vector[N] Y; // response variable int<lower=1> K; // number of population-level effects matrix[N, K] X; // population-level design matrix int prior_only; // should the likelihood be ignored? } transformed data { int Kc = K - 1; matrix[N, Kc] Xc; // centered version of X without an intercept vector[Kc] means_X; // column means of X before centering for (i in 2:K) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { vector[Kc] b; // population-level effects real Intercept; // temporary intercept for centered predictors real<lower=0> sigma; // residual SD } transformed parameters { } model { // likelihood including all constants if (!prior_only) { target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma); } // priors including all constants target += student_t_lpdf(Intercept | 3, 2.5, 3.2); target += student_t_lpdf(sigma | 3, 0, 3.2) - 1 * student_t_lccdf(0 | 3, 0, 3.2); } generated quantities { // actual population-level intercept real b_Intercept = Intercept - dot_product(means_X, b); }
blm0
Family: gaussian Links: mu = identity; sigma = identity Formula: x ~ age * task Data: means_ (Number of observations: 272) Samples: 4 chains, each with iter = 1000; warmup = 500; thin = 1; total post-warmup samples = 2000 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 5.59 0.42 4.74 6.37 1.00 897 1163 age -3.56 0.71 -4.85 -2.12 1.00 951 1124 task -1.84 0.57 -2.97 -0.70 1.00 845 979 age:task -2.23 0.98 -4.11 -0.33 1.00 839 1015 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 2.34 0.10 2.14 2.56 1.00 1569 1223 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
options(repr.plot.width=8, repr.plot.height=8, repr.plot.res = 200)#hidden code for display size
plot(blm0)
options(repr.plot.width=8, repr.plot.height=5, repr.plot.res = 200)#hidden code for display size
pp_check(nsamples = 100, blm0)
Every point raised so far in the course still applies, e.g. we can check for normality and homoscedasticity of the residuals
library(ggplot2)
fitted <- fitted(blm0)[, 1]
resid <- residuals(blm0)[, 1]
qplot(fitted, resid)
We see some heteroscedasticity: we applied the wrong model, as also reflected in the estimate of the residual SD (but see the full LMM below)
hist(resid)
We can also look at the correlations between parameter estimates
pairs(blm0)
Bayesian estimation makes it easy to use the uncertainty associated with the parameter estimates. E.g. we can draw 50 regression lines from the posterior distribution
#install.packages('tidybayes') see http://mjskay.github.io/tidybayes/articles/tidy-brms.html
library(dplyr)
library(ggplot2)
library(tidybayes)
library(modelr)
means_ %>%
group_by(task) %>%
data_grid(age = seq_range(age, n = 100)) %>%
add_fitted_draws(blm0, n = 50) %>%
ggplot(aes(x = age, y = x, color = ordered(task))) +
geom_line(aes(y = .value, group = paste(task, .draw)), alpha = .1) +
geom_point(data = means_)
Attaching package: ‘dplyr’ The following object is masked from ‘package:MASS’: select The following objects are masked from ‘package:stats’: filter, lag The following objects are masked from ‘package:base’: intersect, setdiff, setequal, union Attaching package: ‘tidybayes’ The following objects are masked from ‘package:brms’: dstudent_t, pstudent_t, qstudent_t, rstudent_t
Additionally, the MCMC process gives you access to the parameter estimates at each iteration
get_variables(blm0)
Imagine you want to compute the predicted wm at age 70 (coded as .70, since age was divided by 100) while keeping the uncertainty associated with the parameters
intercept = blm0 %>% spread_draws(b_Intercept)
slope_age = blm0 %>% spread_draws(b_age)
wm_at_age_70 = intercept$b_Intercept + slope_age$b_age*.70#prediction at task = 0 (the reference level), for the average participant
hist(wm_at_age_70,breaks=100)
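The same logic extends to the other coefficients. For instance, a sketch (wm_at_age_70_task1 and draws are illustrative names) of the prediction at age .70 in the task = 1 condition, using the draw names listed by get_variables(blm0):
draws = blm0 %>% spread_draws(b_Intercept, b_age, b_task, `b_age:task`)
wm_at_age_70_task1 = draws$b_Intercept + draws$b_age*.70 + draws$b_task + draws$`b_age:task`*.70
hist(wm_at_age_70_task1, breaks=100)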
Now let's move on to the full linear mixed model
priors = c(prior(normal(6, 2), class = Intercept),#The prior we have on the intercept
prior(cauchy(1, 5), class = sigma),#The prior we have on the SD a fat-tailed truncated distribution
prior(normal(0,10), class = b, coef=age),#prior on the age effect
prior(normal(0,10), class = b, coef=task),#prior on the task effect
prior(normal(0,10), class = b, coef=age:task),#prior on the interaction between age and task
prior(lkj(2), class = cor))#prior on the correlation between RE
#The following is too slow to be run in class so I just load it from a previous fit (via the file argument)
blm1 = brm(wm ~ age * task + (age * task | participant), data=data, prior = priors,
family="normal", chain = 8, cores=8, warmup=1000, iter=1200, file="blm1")
Plotting the population parameters
options(repr.plot.width=8, repr.plot.height=8, repr.plot.res = 200)#hidden code for display size
plot(blm1)
options(repr.plot.width=8, repr.plot.height=5, repr.plot.res = 200)#hidden code for display size
Looking at the summary of the population parameters
summary(blm1)
Warning message: “There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup”
Family: gaussian Links: mu = identity; sigma = identity Formula: wm ~ age * task + (age * task | participant) Data: data (Number of observations: 25006) Samples: 8 chains, each with iter = 1200; warmup = 1000; thin = 1; total post-warmup samples = 1600 Group-Level Effects: ~participant (Number of levels: 15) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS sd(Intercept) 0.93 0.21 0.61 1.42 1.00 1066 sd(age) 2.09 0.46 1.41 3.20 1.00 995 sd(task) 0.98 0.21 0.65 1.46 1.00 1240 sd(age:task) 0.76 0.18 0.49 1.19 1.01 964 cor(Intercept,age) 0.11 0.23 -0.35 0.54 1.01 1037 cor(Intercept,task) 0.07 0.24 -0.38 0.51 1.01 1761 cor(age,task) 0.19 0.23 -0.29 0.60 1.01 917 cor(Intercept,age:task) -0.08 0.24 -0.53 0.37 1.01 1570 cor(age,age:task) 0.34 0.22 -0.14 0.72 1.01 1302 cor(task,age:task) 0.03 0.26 -0.48 0.52 1.00 1630 Tail_ESS sd(Intercept) 1067 sd(age) 981 sd(task) 1235 sd(age:task) 1193 cor(Intercept,age) 1025 cor(Intercept,task) 1294 cor(age,task) 1008 cor(Intercept,age:task) 1259 cor(age,age:task) 1319 cor(task,age:task) 1130 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 6.05 0.26 5.56 6.58 1.00 510 769 age -4.49 0.58 -5.61 -3.28 1.01 640 854 task -1.64 0.27 -2.15 -1.09 1.01 801 1129 age:task -1.98 0.22 -2.40 -1.55 1.00 1254 1313 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 1.00 0.00 0.99 1.00 1.01 2032 1085 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(nsamples = 100, blm1)
Every point raised so far in the course still applies, e.g. we can check for normality and homoscedasticity of the residuals
fitted <- fitted(blm1)[, 1]
resid <- residuals(blm1)[, 1]
qplot(fitted, resid)
hist(resid)
pairs(blm1, pars = "^sd_")#looking at the scatterplots for the SD parameters
get_variables(blm1)
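The group-level (random-effect) draws are accessible in the same way; a sketch using tidybayes' indexing syntax (r_participant[participant,term]) to get each participant's own intercept:
blm1 %>%
  spread_draws(b_Intercept, r_participant[participant,term]) %>%
  filter(term == "Intercept") %>%
  mutate(participant_intercept = b_Intercept + r_participant) %>%#population intercept + participant deviation
  group_by(participant) %>%
  median_qi(participant_intercept)#posterior median and 95% interval for each participant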
This section cannot be covered in the time it deserves; its purpose is therefore to show you that GLMMs are a direct extension of LMMs, assuming a different underlying distribution for the DV.
Note that we are going to skip some checks in order to have enough time for the lecture, but every point raised during this module (and the two previous ones) on MCMC convergence and model-assumption diagnostics also applies to GLMMs (except, of course, normality of the residuals)
First an example with Reaction Times (RTs).
For this illustration we are going to use a synthetic dataset created in another lecture on mental chronometry (if interested, see the GitHub repository)
data = read.csv("data.csv", sep='')
data$rt = data$rt * 1000 #from s to ms
data$encoding = as.character(data$encoding)
data$participant = as.character(data$participant)
data$ease = data$ease-1#recode ease so that it starts at 0
data$encoding = ifelse(data$encoding == "standard", 0, 1)#dummy-code encoding: standard = 0, otherwise 1
data$response = ifelse(data$response == "upper",1, 0)#dummy-code response: "upper" = 1, otherwise 0
require(lattice)#needed for histogram plot
#Histogram for first five participants
histogram( ~ rt | participant ,data = data[data$participant %in% unique(data$participant)[1:5],], breaks=20)
Loading required package: lattice
means = aggregate(data$rt, FUN=mean,
by=list(participant=data$participant, ease=data$ease, encoding=data$encoding)) #getting the mean for each experimental cell X participant
grand_means = aggregate(means$x, FUN=mean,
by=list(ease=means$ease, encoding=means$encoding)) #getting the mean for each experimental cell
options(repr.plot.width=12, repr.plot.height=5, repr.plot.res = 400)
ggplot(grand_means, aes(x=ease, y=x, color=ordered(encoding))) +
geom_line()+
geom_point(size=4) +
scale_size_area() +
xlab("easiness")+
ylab("RT")
options(repr.plot.width=8, repr.plot.height=5, repr.plot.res = 200)#hidden code for display size
First we are going to apply an LMM to the raw single-trial RTs, then we are going to model these RTs using a lognormal distribution
priors = c(prior(normal(700, 200), class = Intercept),#The prior we have on the intercept
prior(cauchy(100, 50), class = sigma),#The prior we have on the SD: a fat-tailed, truncated distribution
prior(normal(0,100), class = b, coef=ease),#prior on the ease effect
prior(normal(0,100), class = b, coef=encoding),#prior on the encoding effect
prior(normal(0,100), class = b, coef=ease:encoding),#prior on the ease:encoding interaction
prior(lkj(2), class = cor))#prior on the correlation between RE
#The following is too slow to be run in class so I just load it from a previous fit (via the file argument)
blm2_rt = brm(rt ~ ease * encoding + (ease*encoding|participant), data=data, prior = priors,
family="normal", chain = 4, cores=4, warmup=1000, iter=2000, file="blm2_rt")
blm2_rt
Family: gaussian Links: mu = identity; sigma = identity Formula: rt ~ ease * encoding + (ease * encoding | participant) Data: data (Number of observations: 100000) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~participant (Number of levels: 50) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS sd(Intercept) 124.77 12.35 103.22 151.55 1.00 981 sd(ease) 25.69 2.58 21.05 31.23 1.00 1161 sd(encoding) 65.65 7.23 53.47 81.21 1.00 2307 sd(ease:encoding) 18.03 2.28 13.85 22.85 1.00 1350 cor(Intercept,ease) -0.74 0.07 -0.85 -0.59 1.00 1280 cor(Intercept,encoding) 0.49 0.11 0.25 0.69 1.00 1953 cor(ease,encoding) -0.51 0.11 -0.71 -0.27 1.00 1645 cor(Intercept,ease:encoding) -0.15 0.14 -0.43 0.13 1.00 2181 cor(ease,ease:encoding) 0.07 0.15 -0.23 0.36 1.00 1774 cor(encoding,ease:encoding) -0.47 0.12 -0.68 -0.21 1.00 2015 Tail_ESS sd(Intercept) 1638 sd(ease) 1815 sd(encoding) 2568 sd(ease:encoding) 2434 cor(Intercept,ease) 2169 cor(Intercept,encoding) 2722 cor(ease,encoding) 2204 cor(Intercept,ease:encoding) 2512 cor(ease,ease:encoding) 2311 cor(encoding,ease:encoding) 2260 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 700.30 18.06 664.08 734.05 1.00 426 797 ease -72.65 3.90 -80.30 -64.77 1.00 578 867 encoding 192.50 9.50 173.73 210.82 1.00 1161 1979 ease:encoding 112.17 2.87 106.48 117.80 1.00 1957 2438 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 263.32 0.59 262.15 264.48 1.00 5389 2317 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
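A posterior predictive check (as used earlier in this module) makes the problem visible: a normal likelihood predicts symmetric RTs and misses the right skew of the observed data:
pp_check(nsamples = 100, blm2_rt)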
Now modeling RT as:
$$RT_{ij} \sim \mathrm{Lognormal}(\mu_j, \sigma)$$
priors = c(prior(normal(6, 1), class = Intercept),#The prior we have on the intercept (on the log scale)
prior(cauchy(.5, 1), class = sigma),#The prior we have on the SD: a fat-tailed, truncated distribution
prior(normal(0,1), class = b, coef=ease),#prior on the ease effect
prior(normal(0,1), class = b, coef=encoding),#prior on the encoding effect
prior(normal(0,1), class = b, coef=ease:encoding),#prior on the ease:encoding interaction
prior(lkj(2), class = cor))#prior on the correlation between RE
#The following is too slow to be run in class so I just load it from a previous fit (via the file argument)
blm2_lognorm_rt = brm(rt ~ ease * encoding + (ease*encoding|participant), data=data, prior = priors,
family="lognormal", chain = 4, cores=4, warmup=1000, iter=2000, file="blm2_lognorm_rt")
# Equivalent to : brm(log(rt) ~ ease * encoding + (ease*encoding|participant),family="normal",...)
blm2_lognorm_rt
Family: lognormal Links: mu = identity; sigma = identity Formula: rt ~ ease * encoding + (ease * encoding | participant) Data: data (Number of observations: 100000) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~participant (Number of levels: 50) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS sd(Intercept) 0.18 0.02 0.15 0.22 1.01 534 sd(ease) 0.03 0.00 0.03 0.04 1.00 924 sd(encoding) 0.06 0.01 0.05 0.08 1.00 1966 sd(ease:encoding) 0.03 0.00 0.02 0.04 1.00 1162 cor(Intercept,ease) -0.30 0.13 -0.54 -0.02 1.00 856 cor(Intercept,encoding) -0.15 0.14 -0.41 0.13 1.00 1901 cor(ease,encoding) -0.10 0.15 -0.38 0.18 1.01 1256 cor(Intercept,ease:encoding) -0.52 0.11 -0.72 -0.28 1.00 976 cor(ease,ease:encoding) -0.44 0.12 -0.65 -0.19 1.00 1399 cor(encoding,ease:encoding) -0.10 0.14 -0.36 0.18 1.00 1647 Tail_ESS sd(Intercept) 1102 sd(ease) 1540 sd(encoding) 2614 sd(ease:encoding) 1769 cor(Intercept,ease) 1404 cor(Intercept,encoding) 2544 cor(ease,encoding) 1880 cor(Intercept,ease:encoding) 1460 cor(ease,ease:encoding) 2135 cor(encoding,ease:encoding) 2640 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 6.47 0.03 6.41 6.52 1.01 343 618 ease -0.11 0.00 -0.12 -0.10 1.01 649 1005 encoding 0.24 0.01 0.22 0.26 1.01 1122 1744 ease:encoding 0.17 0.00 0.16 0.18 1.01 695 1144 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 0.28 0.00 0.28 0.28 1.00 4657 2462 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
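Since the lognormal parameters live on the log scale, we can back-transform draws to milliseconds. A sketch for the baseline cell (ease = 0, encoding = 0, average participant), using the facts that the median of a lognormal is exp(mu) and its mean is exp(mu + sigma^2/2) (draws_lognorm is an illustrative name):
draws_lognorm = blm2_lognorm_rt %>% spread_draws(b_Intercept, sigma)
median_rt_baseline = exp(draws_lognorm$b_Intercept)#posterior of the median RT in ms
mean_rt_baseline = exp(draws_lognorm$b_Intercept + draws_lognorm$sigma^2/2)#posterior of the mean RT in ms
hist(median_rt_baseline, breaks=100)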
For RTs we could fit even more complex distributions (e.g. ex-Gaussian, gamma, ...), but the link between the parameters and the predictors becomes less clear.
An alternative would be to use process models (e.g. the Drift Diffusion Model) and build regressions on top of their parameters, which are intended to be interpretable.
An example with a Bernoulli process: response accuracy (correct vs. incorrect)
Usually this kind of data is pre-averaged and fed into a repeated-measures ANOVA (often with assumption violations, e.g. ceiling performance)
GLMMs allow us to model single-trial accuracy directly:
$$response_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \quad \mathrm{logit}(p_{ij}) = \mu_j$$
where $\mu_j$ takes the same definition as previously in module 2
priors = c(prior(normal(1, 1), class = Intercept),#The prior we have on the intercept
prior(normal(0,.5), class = b, coef=ease),#prior on the ease effect
prior(normal(0,.5), class = b, coef=encoding),#prior on the encoding effect
prior(normal(0,.5), class = b, coef=ease:encoding),#prior on the ease:encoding interaction
prior(lkj(2), class = cor))#prior on the correlation between RE
blm2_correct = brm(response ~ ease * encoding + (ease*encoding|participant), data=data, prior = priors,
family = bernoulli(), chain = 4, cores=4, warmup=1000, iter=2000, file="blm1_acc")
summary(blm2_correct)
Warning message: “There were 29 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup”
Family: bernoulli Links: mu = logit Formula: response ~ ease * encoding + (ease * encoding | participant) Data: data (Number of observations: 100000) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~participant (Number of levels: 50) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS sd(Intercept) 0.41 0.05 0.32 0.51 1.00 1154 sd(ease) 0.18 0.03 0.12 0.26 1.00 968 sd(encoding) 0.20 0.04 0.12 0.29 1.00 1724 sd(ease:encoding) 0.07 0.04 0.00 0.17 1.00 466 cor(Intercept,ease) 0.58 0.16 0.20 0.84 1.00 1467 cor(Intercept,encoding) -0.61 0.15 -0.83 -0.27 1.00 2065 cor(ease,encoding) -0.58 0.18 -0.87 -0.15 1.00 1171 cor(Intercept,ease:encoding) 0.03 0.34 -0.64 0.65 1.00 4072 cor(ease,ease:encoding) -0.34 0.36 -0.84 0.52 1.00 1579 cor(encoding,ease:encoding) 0.01 0.34 -0.64 0.67 1.00 2706 Tail_ESS sd(Intercept) 2127 sd(ease) 557 sd(encoding) 2561 sd(ease:encoding) 418 cor(Intercept,ease) 1489 cor(Intercept,encoding) 2350 cor(ease,encoding) 1682 cor(Intercept,ease:encoding) 2621 cor(ease,ease:encoding) 2375 cor(encoding,ease:encoding) 2877 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 1.60 0.06 1.48 1.73 1.00 953 1536 ease 1.37 0.04 1.30 1.46 1.00 918 324 encoding -0.68 0.04 -0.77 -0.60 1.00 2079 2429 ease:encoding -0.62 0.04 -0.70 -0.55 1.01 887 313 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
All these estimates are on the logit scale, which is not easy to interpret. Since we can back-transform at each iteration of the MCMC process, we can represent the estimates on the natural (probability) scale:
inv.logit = function(x){#inverse logit; equivalent to plogis() in base R
inv = exp(x)/(1+exp(x))
return(inv)
}
intercept = blm2_correct %>% spread_draws(b_Intercept)
hist(inv.logit(intercept$b_Intercept),breaks=100)
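The same back-transformation applies to any combination of coefficients, e.g. (a sketch, with draws_acc as an illustrative name) the predicted accuracy at ease = 0 for the encoding = 1 condition:
draws_acc = blm2_correct %>% spread_draws(b_Intercept, b_encoding)
hist(inv.logit(draws_acc$b_Intercept + draws_acc$b_encoding), breaks=100)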
Books:
R. McElreath, Statistical Rethinking (the book the grid-approximation code above is adapted from)
Articles:
See also the series of lectures by R. McElreath on YouTube