This last module is intended to give you an intuition of the Bayesian estimation of LMMs, as well as to generalize what we learned about LMMs to non-normal target distributions: generalized LMMs (GLMMs).
No! Let's see why with a simulation:
n_rep = 100 #100 tests repeated
matrix_ci_values = NULL
for (i in 1:n_rep){
grp1 = rnorm(10, 0, 1)
grp2 = rnorm(10, -1, 1)#H1 is true
ci = t.test(grp1, grp2)$conf.int#95% CI for the difference grp1 - grp2 (true value = 1)
matrix_ci_values <- rbind(matrix_ci_values, matrix(ci, ncol=2, nrow=1))
}
ci_corr = matrix_ci_values[(1 >= matrix_ci_values[,1]) & (1 <= matrix_ci_values[,2]), , drop=FALSE]#true effect (1) included in the CI
ci_incorr = matrix_ci_values[(1 >= matrix_ci_values[,2]) | (1 <= matrix_ci_values[,1]), , drop=FALSE]#true effect (1) not included in the CI
#Graphical parameters
plot.new()
plot.window(xlim=c(-5,5), ylim=c(0.5, 100+.5))
axis(1, at=seq(-3, 3, by=1))
axis(2, at=seq(0, 100, by=20))
#Plotting correct CI vs incorrect
arrows(ci_corr[,1], 1:nrow(ci_corr), ci_corr[,2], 1:nrow(ci_corr), code=3, length=0.05, angle=90)
arrows(ci_incorr[,1], (nrow(ci_corr)+1):(nrow(ci_corr)+nrow(ci_incorr)), ci_incorr[,2],
(nrow(ci_corr)+1):(nrow(ci_corr)+nrow(ci_incorr)), code=3, length=0.05, angle=90, col="red")
#Plotting true effect as a line
abline(v=1)
The correct definition: across repeated experiments, 95% of such confidence intervals contain the true effect.
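As a quick numerical check of this definition, we can compute the empirical coverage over the 100 simulated experiments above (the true difference grp1 - grp2 is 1):
coverage = mean((matrix_ci_values[,1] <= 1) & (matrix_ci_values[,2] >= 1))#proportion of CIs containing the true difference of 1
coverage#should be close to .95, up to simulation noise with only 100 repetitions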
Bayesian inference $= p(\theta|\text{Data, Model, Prior})$ vs. frequentist $= p(\text{Data}|\theta)$ (e.g. the likelihood in module 2)
How do we go from $p(\text{Data}|\theta)$ to $p(\theta|\text{Data})$?
Bayes' rule:
\begin{equation} \label{eq:bayes} p(\theta|\text{Data}) = \frac{p(\theta)\, p(\text{Data}|\theta)}{p(\text{Data})} \end{equation}
Let's go back to our coin-flip example of module 2, where $Data = 28$ heads out of 40 flips:
# Code adapted from R. McElreath's book (see resources at the end of the module)
grid = 50
p_grid = seq(from=0, to=1, length.out = grid) #Discretizing the parameter space, aka grid approximation
prior = rep(1,grid)#Flat uninformative prior
likelihood = dbinom(28, size=40, prob=p_grid)
posterior = (likelihood*prior)/sum(likelihood*prior)
plot(p_grid, prior, type="b")
plot(p_grid, likelihood, type="b")
plot(p_grid, posterior, type="b")
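The grid posterior is not only for plotting: following McElreath's approach, we can sample from it to obtain summaries such as a posterior mean and a 95% credible interval (a small sketch using the objects just created):
samples = sample(p_grid, size=1e4, replace=TRUE, prob=posterior)#draw parameter values in proportion to their posterior probability
mean(samples)#posterior mean, close to 28/40 under the flat prior
quantile(samples, c(.025, .975))#95% credible interval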
Now we change the prior to illustrate how observing data updates prior beliefs.
# Code adapted from R. McElreath's book (see resources at the end of the module)
grid = 50
p_grid = seq(from=0, to=1, length.out = grid)
prior = ifelse(p_grid <= .5, 0, 1)#Informative prior: we know the coin is biased towards heads, but we are agnostic about how much
likelihood = dbinom(28, size=40, prob=p_grid)
posterior = (likelihood*prior)/sum(likelihood*prior)
plot(p_grid, prior, type="b")
plot(p_grid, likelihood, type="b")
plot(p_grid, posterior, type="b")
# Code taken from R. McElreath's book (see resources at the end of the module)
grid = 50
p_grid = seq(from=0, to=1, length.out = grid) #Discretizing the parameter space, aka grid approximation
prior = ifelse(p_grid > .5, 0, 1)#A mis-specified prior: all mass on p <= .5, i.e. the coin is assumed biased towards tails
likelihood = dbinom(28, size=40, prob=p_grid)
posterior = (likelihood*prior)/sum(likelihood*prior)
plot(p_grid, prior, type="b")
plot(p_grid, likelihood, type="b")
plot(p_grid, posterior, type="b")
But grid approximation scales badly with the number of parameters (50 grid points per parameter and 10 parameters already mean $50^{10}$ evaluations), and $p(\text{Data})$ is hard to compute when fitting an LMM. The solution: Markov Chain Monte Carlo (MCMC).
MCMC boils down to generating new proposals of parameter values $\theta_{n+1}$ and comparing them to the current value ($\theta_n$):
$current = \frac{p(\theta_n)\, p(\text{Data}|\theta_n)}{p(\text{Data})}$
$new = \frac{p(\theta_{n+1})\, p(\text{Data}|\theta_{n+1})}{p(\text{Data})}$
Acceptance probability: $p = \min\left(1, \frac{p(\theta_{n+1})\, p(\text{Data}|\theta_{n+1})}{p(\theta_{n})\, p(\text{Data}|\theta_{n})}\right)$ ; note that $p(\text{Data})$ cancels out of the ratio.
Using these comparisons we explore the parameter space even though we never compute $p(\text{Data})$ (see the blog post by T. Wiecki for an intuitive approach to the math behind MCMC sampling).
Let's illustrate MCMC and the different algorithms behind it: http://chi-feng.github.io/mcmc-demo/app.html?algorithm=RandomWalkMH&target=normal
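Before switching to brms, here is a minimal random-walk Metropolis sampler for the coin-flip posterior above, just to make the acceptance rule concrete (a rough sketch with arbitrary tuning values; Stan, used below, relies on the more sophisticated NUTS algorithm):
set.seed(123)
n_iter = 5000
theta = rep(NA, n_iter)
theta[1] = .5#starting value
for (i in 2:n_iter){
  proposal = theta[i-1] + rnorm(1, 0, .1)#random-walk proposal around the current value
  if (proposal <= 0 || proposal >= 1){#outside the parameter space: reject
    theta[i] = theta[i-1]
  } else {
    ratio = dbinom(28, 40, proposal)/dbinom(28, 40, theta[i-1])#flat prior, so only the likelihood ratio remains
    theta[i] = ifelse(runif(1) < ratio, proposal, theta[i-1])#accept with probability min(1, ratio)
  }
}
hist(theta[-(1:500)], breaks=50)#discard the first iterations (warmup) and compare with the grid posterior above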
To illustrate this in R and to introduce the package we are going to use (brms), let's use an MCMC procedure to estimate the mean and sd of the (known) IQ distribution.
set.seed(234)
options(mc.cores = parallel::detectCores())#Detects and stores the number of cores on your computer
library(brms)#Library for Bayesian estimation of LMMs (amongst other models)
library(shinystan)#Library we are going to use to inspect our MCMC procedures
Loading required package: Rcpp
Loading 'brms' package (version 2.14.4). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').
Attaching package: ‘brms’
The following object is masked from ‘package:stats’:
ar
Loading required package: shiny
This is shinystan version 2.5.0
dummy_data = data.frame(y=rnorm(100, 100,15))#e.g. Normal with known mean of 100 and sd of 15
Setting very wide priors:
priors = c(prior(normal(0, 1000), class = Intercept),#The prior we have on the mean
prior(cauchy(0, 1000), class = sigma))#The prior we have on the SD a fat-tailed truncated distribution
values = seq(from=-5000, to=5000, length.out = 50)
plot(values,dnorm(values,0,1000),type="b",ylab="density")
lines(values[26:50],dcauchy(values[26:50],0,1000),type="b",col="red")
Estimating the parameters
a_mean = brm(y ~ 1, data=dummy_data,
family="normal", prior = priors, chain = 2, warmup=10, iter=100)#A shotgun to a knife fight
Compiling Stan program... recompiling to avoid crashing R session
Start sampling
Warning message: “There were 2 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded”
Warning message: “There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low”
Warning message: “Examine the pairs() plot to diagnose sampling problems”
Warning message: “The largest R-hat is 1.93, indicating chains have not mixed. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#r-hat”
Warning message: “Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message: “Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess”
print(stancode(a_mean))
// generated with brms 2.14.4
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // residual SD
}
transformed parameters {
}
model {
// likelihood including all constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = Intercept + rep_vector(0.0, N);
target += normal_lpdf(Y | mu, sigma);
}
// priors including all constants
target += normal_lpdf(Intercept | 0, 1000);
target += cauchy_lpdf(sigma | 0, 1000)
- 1 * cauchy_lccdf(0 | 0, 1000);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;
}
Let's inspect the MCMC algorithm
mcmc_plot(a_mean, type="trace")
No divergences to plot.
summary(a_mean)
Warning message: “Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors.”
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ 1
Data: dummy_data (Number of observations: 100)
Samples: 2 chains, each with iter = 100; warmup = 10; thin = 1;
total post-warmup samples = 180
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 62.14 42.41 -0.17 101.85 1.93 3 12
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 41.31 35.60 12.72 106.16 1.75 3 49
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Note that there are numerous ways to check convergence and model adequacy, e.g. shinystan
options(browser = "firefox")
launch_shinystan(a_mean)
Launching ShinyStan interface... for large models this may take some time. Listening on http://127.0.0.1:3383
How can you improve the estimation? Try it yourself.
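One possible answer, if you want to compare it with yours (a sketch: same model and priors, simply more chains, more warmup and more iterations):
a_mean_better = brm(y ~ 1, data=dummy_data,
                    family="normal", prior = priors, chain = 4, warmup=1000, iter=2000)
summary(a_mean_better)#Rhat should now be close to 1 and the ESS much larger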
Prior predictive checks:
a_mean_prior_only = brm(y ~ 1, data=dummy_data, sample_prior = "only",
family="normal", prior = priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean_prior_only)
Compiling Stan program... Start sampling
These priors are inconsistent with domain knowledge about IQ. Let's use more informative priors:
priors = c(prior(normal(100, 10), class = Intercept),#The prior we have on the mean
prior(cauchy(15, 5), class = sigma))#The prior we have on the SD a fat-tailed truncated distribution
a_mean_prior_only = brm(y ~ 1, data=dummy_data, sample_prior = "only",
family="normal", prior = priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 10, a_mean_prior_only)
Compiling Stan program... Start sampling
Warning message: “Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess”
Posterior predictive checks:
a_mean = brm(y ~ 1, data=dummy_data,
family="normal", prior = priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean)
Compiling Stan program... Start sampling
If instead we use badly mis-specified ("improper") priors:
improper_priors = c(prior(normal(-150, 1), class = Intercept),#The (wrong) prior we have on the mean
prior(normal(1, 1), class = sigma))#Bad prior on variance
a_mean_improper = brm(y ~ 1, data=dummy_data, #Note that the model is slower to fit
family="normal", prior = improper_priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean_improper)
Compiling Stan program... Start sampling
But the summaries (e.g. Rhat) do not explicitly show the problem:
summary(a_mean_improper)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ 1
Data: dummy_data (Number of observations: 100)
Samples: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup samples = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -139.93 0.97 -141.81 -138.07 1.00 1390 1226
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 48.77 0.51 47.78 49.75 1.00 1894 1495
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
However, if the priors do not exclude the true values, even poorly chosen priors will be overruled by the likelihood of the data (given a reasonable amount of it):
slightly_improper_priors = c(prior(normal(-150, 100), class = Intercept),#We increased the variance around our prior
prior(cauchy(1, 1), class = sigma))#From normal to Cauchy
a_mean_slightly_improper = brm(y ~ 1, data=dummy_data, #Note that the model is slower to fit
family="normal", prior = slightly_improper_priors, chain = 4, cores=5, warmup=500, iter=1000)
pp_check(nsamples = 100, a_mean_slightly_improper)
Compiling Stan program... Start sampling
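We can verify that the data overruled this prior: despite an intercept prior centered at -150, the posterior for the intercept should now sit close to the true mean of 100.
summary(a_mean_slightly_improper)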
We now move to linear models, using the dataset we generated in module 2:
library(MASS)
mean_wm = 6
sd_wm = 1
n_participants = 15
varcovmat = matrix(c(1, .33, .33, .33, .33, 1, .33, .33, .33, .33,1,.33, .33, .33, .33,1), nrow=4)
data = data.frame()
for(i in 1:n_participants){
re = mvrnorm(n=1, mu=c(0, 0, 0, 0), Sigma=varcovmat)
mu_i = re[1] * sd_wm + mean_wm
b_age = re[2] *(sd_wm*2) - (sd_wm*4) #b_age_j
b_task = re[3] * sd_wm + -(sd_wm*2)#b_task_j
b_age_task = re[4] * (sd_wm/2) - (sd_wm*2)#b_agetask_j
age_tested = runif(runif(1, 5,15), 1, 99)#a random number (between 5 and 15) of tested ages, each drawn between 1 and 99
for (age in age_tested){# nested for loops are dirty but more transparent
age = age/100#We rescale age for numerical stability
for (task in c(0,1)){
ntrials = runif(1, 50, 150)
subdata = data.frame(trial=1:ntrials)
mu = mu_i+b_age*age+b_task*task+b_age_task*age*task
subdata$wm = rnorm(ntrials, mu, sd=sd_wm)
subdata$participant = as.character(i)
subdata$age = age
subdata$task = task
data = rbind(data, subdata)
}
}
}
means_ = aggregate(data$wm, FUN=mean,
by=list(age=data$age, task=data$task, participant = data$participant))#pre-averaging at first
#We let brms use the default priors for convenience
blm0 = brm(x ~ age * task, data=means_,
family="normal", chain = 4, cores=4, warmup=500, iter=1000)
Compiling Stan program... Start sampling
print(stancode(blm0))
// generated with brms 2.14.4
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // residual SD
}
transformed parameters {
}
model {
// likelihood including all constants
if (!prior_only) {
target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
}
// priors including all constants
target += student_t_lpdf(Intercept | 3, 2.5, 3.2);
target += student_t_lpdf(sigma | 3, 0, 3.2)
- 1 * student_t_lccdf(0 | 3, 0, 3.2);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
blm0
Family: gaussian
Links: mu = identity; sigma = identity
Formula: x ~ age * task
Data: means_ (Number of observations: 272)
Samples: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup samples = 2000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.59 0.42 4.74 6.37 1.00 897 1163
age -3.56 0.71 -4.85 -2.12 1.00 951 1124
task -1.84 0.57 -2.97 -0.70 1.00 845 979
age:task -2.23 0.98 -4.11 -0.33 1.00 839 1015
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 2.34 0.10 2.14 2.56 1.00 1569 1223
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
options(repr.plot.width=8, repr.plot.height=8, repr.plot.res = 200)#hidden code for display size
plot(blm0)
options(repr.plot.width=8, repr.plot.height=5, repr.plot.res = 200)#hidden code for display size
pp_check(nsamples = 100, blm0)
Every point raised so far in the course applies; e.g. we can check the normality and homoscedasticity of the residuals:
library(ggplot2)
fitted <- fitted(blm0)[, 1]
resid <- residuals(blm0)[, 1]
qplot(fitted, resid)
We see some heteroscedasticity: we applied the wrong model, as also visible in the estimate of the residual SD (but see the full LMM below).
hist(resid)
We can also look at the correlations between parameter estimates:
pairs(blm0)
Bayesian estimation lets you easily propagate the uncertainty associated with the parameter estimates. E.g. we can draw 50 regression lines from the posterior distribution:
#install.packages('tidybayes') see http://mjskay.github.io/tidybayes/articles/tidy-brms.html
library(dplyr)
library(ggplot2)
library(tidybayes)
library(modelr)
means_ %>%
group_by(task) %>%
data_grid(age = seq_range(age, n = 100)) %>%
add_fitted_draws(blm0, n = 50) %>%
ggplot(aes(x = age, y = x, color = ordered(task))) +
geom_line(aes(y = .value, group = paste(task, .draw)), alpha = .1) +
geom_point(data = means_)
Attaching package: ‘dplyr’
The following object is masked from ‘package:MASS’:
select
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Attaching package: ‘tidybayes’
The following objects are masked from ‘package:brms’:
dstudent_t, pstudent_t, qstudent_t, rstudent_t
Additionally, the MCMC process gives you access to the draws from each iteration:
get_variables(blm0)
Imagine you want to compute the predicted wm at age 70 (coded .70 given our rescaling), while keeping the uncertainty associated with the parameters:
intercept = blm0 %>% spread_draws(b_Intercept)
slope_age = blm0 %>% spread_draws(b_age)
wm_at_age_70 = intercept$b_Intercept + slope_age$b_age*.70#one predicted value per posterior draw (task = 0)
hist(wm_at_age_70,breaks=100)
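Beyond the histogram, the same draws give a point estimate and a 95% credible interval for this prediction (note that, since we only used the intercept and the age slope, this is the prediction for task = 0):
mean(wm_at_age_70)#posterior mean of the expected wm at age .70, task = 0
quantile(wm_at_age_70, c(.025, .975))#95% credible interval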
Now, on to the full linear mixed model.
priors = c(prior(normal(6, 2), class = Intercept),#The prior we have on the intercept
prior(cauchy(1, 5), class = sigma),#The prior we have on the SD a fat-tailed truncated distribution
prior(normal(0,10), class = b, coef=age),#prior on the age effect
prior(normal(0,10), class = b, coef=task),#prior on the task effect
prior(normal(0,10), class = b, coef=age:task),#prior on the interaction between age and task
prior(lkj(2), class = cor))#prior on the correlation between RE
#The following is too slow to be run in class so I just recover it from a previous fit
blm1 = brm(wm ~ age * task + (age * task | participant), data=data, prior = priors,
family="normal", chain = 8, cores=8, warmup=1000, iter=1200, file="blm1")
Plotting the population parameters
options(repr.plot.width=8, repr.plot.height=8, repr.plot.res = 200)#hidden code for display size
plot(blm1)
options(repr.plot.width=8, repr.plot.height=5, repr.plot.res = 200)#hidden code for display size
Looking at the summary of the population parameters
summary(blm1)
Warning message: “There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup”
Family: gaussian
Links: mu = identity; sigma = identity
Formula: wm ~ age * task + (age * task | participant)
Data: data (Number of observations: 25006)
Samples: 8 chains, each with iter = 1200; warmup = 1000; thin = 1;
total post-warmup samples = 1600
Group-Level Effects:
~participant (Number of levels: 15)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.93 0.21 0.61 1.42 1.00 1066
sd(age) 2.09 0.46 1.41 3.20 1.00 995
sd(task) 0.98 0.21 0.65 1.46 1.00 1240
sd(age:task) 0.76 0.18 0.49 1.19 1.01 964
cor(Intercept,age) 0.11 0.23 -0.35 0.54 1.01 1037
cor(Intercept,task) 0.07 0.24 -0.38 0.51 1.01 1761
cor(age,task) 0.19 0.23 -0.29 0.60 1.01 917
cor(Intercept,age:task) -0.08 0.24 -0.53 0.37 1.01 1570
cor(age,age:task) 0.34 0.22 -0.14 0.72 1.01 1302
cor(task,age:task) 0.03 0.26 -0.48 0.52 1.00 1630
Tail_ESS
sd(Intercept) 1067
sd(age) 981
sd(task) 1235
sd(age:task) 1193
cor(Intercept,age) 1025
cor(Intercept,task) 1294
cor(age,task) 1008
cor(Intercept,age:task) 1259
cor(age,age:task) 1319
cor(task,age:task) 1130
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.05 0.26 5.56 6.58 1.00 510 769
age -4.49 0.58 -5.61 -3.28 1.01 640 854
task -1.64 0.27 -2.15 -1.09 1.01 801 1129
age:task -1.98 0.22 -2.40 -1.55 1.00 1254 1313
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.00 0.00 0.99 1.00 1.01 2032 1085
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(nsamples = 100, blm1)
Every point raised so far in the course applies; e.g. we can check the normality and homoscedasticity of the residuals:
fitted <- fitted(blm1)[, 1]
resid <- residuals(blm1)[, 1]
qplot(fitted, resid)
hist(resid)
pairs(blm1, pars = "^sd_")#looking at the scatterplots for the SD parameters
get_variables(blm1)
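The participant-level deviations (random effects) can also be summarized directly with ranef(); a brief illustration:
re = ranef(blm1)#posterior summaries (estimate, error, CI) of the by-participant deviations
head(re$participant[, , "age"])#e.g. the by-participant deviations of the age slope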
This section cannot be covered in the time it deserves; its purpose is to show you that GLMMs are a direct extension of LMMs, assuming a different underlying distribution for the DV.
Note that we are going to skip some checks in order to have enough time for the lecture, but every point raised during this module (and the two previous ones) about MCMC convergence and model-assumption diagnostics also applies to GLMMs (except, of course, normality of the residuals).
First, an example with Reaction Times (RTs).
For this illustration we are going to use a synthetic dataset created in another lecture on mental chronometry (if interested, see the GitHub repository).
data = read.csv("data.csv", sep='')
data$rt = data$rt * 1000 #from s to ms
data$encoding = as.character(data$encoding)
data$participant = as.character(data$participant)
data$ease = data$ease-1
data$encoding = ifelse(data$encoding == "standard", 0, 1)
data$response = ifelse(data$response == "upper",1, 0)
require(lattice)#needed for histogram plot
#Histogram for first five participants
histogram( ~ rt | participant ,data = data[data$participant %in% unique(data$participant)[1:5],], breaks=20)
Loading required package: lattice
means = aggregate(data$rt, FUN=mean,
by=list(participant=data$participant, ease=data$ease, encoding=data$encoding)) #getting the mean for each experimental cell X participant
grand_means = aggregate(means$x, FUN=mean,
by=list(ease=means$ease, encoding=means$encoding)) #getting the mean for each experimental cell
options(repr.plot.width=12, repr.plot.height=5, repr.plot.res = 400)
ggplot(grand_means, aes(x=ease, y=x, color=ordered(encoding))) +
geom_line()+
geom_point(size=4) +
scale_size_area() +
xlab("easiness")+
ylab("RT")
options(repr.plot.width=8, repr.plot.height=5, repr.plot.res = 200)#hidden code for display size
First we are going to apply an LMM to the raw single-trial RTs; then we are going to model these RTs using a lognormal distribution.
priors = c(prior(normal(700, 200), class = Intercept),#The prior we have on the intercept
prior(cauchy(100, 50), class = sigma),#The prior we have on the SD a fat-tailed truncated distribution
prior(normal(0,100), class = b, coef=ease),#prior on the ease effect
prior(normal(0,100), class = b, coef=encoding),#prior on the encoding effect
prior(normal(0,100), class = b, coef=ease:encoding),#prior on the ease by encoding interaction
prior(lkj(2), class = cor))#prior on the correlation between RE
#The following is too slow to be run in class so I just recover it from a previous fit
blm2_rt = brm(rt ~ ease * encoding + (ease*encoding|participant), data=data, prior = priors,
family="normal", chain = 4, cores=4, warmup=1000, iter=2000, file="blm2_rt")
blm2_rt
Family: gaussian
Links: mu = identity; sigma = identity
Formula: rt ~ ease * encoding + (ease * encoding | participant)
Data: data (Number of observations: 100000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~participant (Number of levels: 50)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 124.77 12.35 103.22 151.55 1.00 981
sd(ease) 25.69 2.58 21.05 31.23 1.00 1161
sd(encoding) 65.65 7.23 53.47 81.21 1.00 2307
sd(ease:encoding) 18.03 2.28 13.85 22.85 1.00 1350
cor(Intercept,ease) -0.74 0.07 -0.85 -0.59 1.00 1280
cor(Intercept,encoding) 0.49 0.11 0.25 0.69 1.00 1953
cor(ease,encoding) -0.51 0.11 -0.71 -0.27 1.00 1645
cor(Intercept,ease:encoding) -0.15 0.14 -0.43 0.13 1.00 2181
cor(ease,ease:encoding) 0.07 0.15 -0.23 0.36 1.00 1774
cor(encoding,ease:encoding) -0.47 0.12 -0.68 -0.21 1.00 2015
Tail_ESS
sd(Intercept) 1638
sd(ease) 1815
sd(encoding) 2568
sd(ease:encoding) 2434
cor(Intercept,ease) 2169
cor(Intercept,encoding) 2722
cor(ease,encoding) 2204
cor(Intercept,ease:encoding) 2512
cor(ease,ease:encoding) 2311
cor(encoding,ease:encoding) 2260
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 700.30 18.06 664.08 734.05 1.00 426 797
ease -72.65 3.90 -80.30 -64.77 1.00 578 867
encoding 192.50 9.50 173.73 210.82 1.00 1161 1979
ease:encoding 112.17 2.87 106.48 117.80 1.00 1957 2438
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 263.32 0.59 262.15 264.48 1.00 5389 2317
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Now modeling RT as:
$$ RT_{ij} \sim \text{lognormal}(\mu_j, \sigma)$$
priors = c(prior(normal(6, 1), class = Intercept),#The prior we have on the intercept (on the log scale)
prior(cauchy(.5, 1), class = sigma),#The prior we have on the SD a fat-tailed truncated distribution
prior(normal(0,1), class = b, coef=ease),#prior on the ease effect
prior(normal(0,1), class = b, coef=encoding),#prior on the encoding effect
prior(normal(0,1), class = b, coef=ease:encoding),#prior on the ease by encoding interaction
prior(lkj(2), class = cor))#prior on the correlation between RE
#The following is too slow to be run in class so I just recover it from a previous fit
blm2_lognorm_rt = brm(rt ~ ease * encoding + (ease*encoding|participant), data=data, prior = priors,
family="lognormal", chain = 4, cores=4, warmup=1000, iter=2000, file="blm2_lognorm_rt")
# Equivalent to: brm(log(rt) ~ ease * encoding + (ease*encoding|participant),family="normal",...)
blm2_lognorm_rt
Family: lognormal
Links: mu = identity; sigma = identity
Formula: rt ~ ease * encoding + (ease * encoding | participant)
Data: data (Number of observations: 100000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~participant (Number of levels: 50)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.18 0.02 0.15 0.22 1.01 534
sd(ease) 0.03 0.00 0.03 0.04 1.00 924
sd(encoding) 0.06 0.01 0.05 0.08 1.00 1966
sd(ease:encoding) 0.03 0.00 0.02 0.04 1.00 1162
cor(Intercept,ease) -0.30 0.13 -0.54 -0.02 1.00 856
cor(Intercept,encoding) -0.15 0.14 -0.41 0.13 1.00 1901
cor(ease,encoding) -0.10 0.15 -0.38 0.18 1.01 1256
cor(Intercept,ease:encoding) -0.52 0.11 -0.72 -0.28 1.00 976
cor(ease,ease:encoding) -0.44 0.12 -0.65 -0.19 1.00 1399
cor(encoding,ease:encoding) -0.10 0.14 -0.36 0.18 1.00 1647
Tail_ESS
sd(Intercept) 1102
sd(ease) 1540
sd(encoding) 2614
sd(ease:encoding) 1769
cor(Intercept,ease) 1404
cor(Intercept,encoding) 2544
cor(ease,encoding) 1880
cor(Intercept,ease:encoding) 1460
cor(ease,ease:encoding) 2135
cor(encoding,ease:encoding) 2640
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.47 0.03 6.41 6.52 1.01 343 618
ease -0.11 0.00 -0.12 -0.10 1.01 649 1005
encoding 0.24 0.01 0.22 0.26 1.01 1122 1744
ease:encoding 0.17 0.00 0.16 0.18 1.01 695 1144
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.28 0.00 0.28 0.28 1.00 4657 2462
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
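Because this model is on the log scale, the estimates are not directly in ms. One convenient back-transformation uses the fact that the median of a lognormal is exp(mu); a sketch using the posterior draws (with ease held at 0, so this only illustrates the principle):
draws = blm2_lognorm_rt %>% spread_draws(b_Intercept, b_encoding)#population-level draws on the log scale
median_rt_encoding0 = exp(draws$b_Intercept)#median RT in ms when encoding = 0 ("standard") and ease = 0
median_rt_encoding1 = exp(draws$b_Intercept + draws$b_encoding)#median RT in ms when encoding = 1 and ease = 0
hist(median_rt_encoding1 - median_rt_encoding0, breaks=100)#posterior of the encoding effect expressed in ms (at ease = 0)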
For RTs we could fit even more complex distributions (e.g. ex-Gaussian, gamma...), but the link between parameters and predictors then becomes less clear.
An alternative would be to use process models (e.g. the Drift Diffusion Model) and build regressions on top of their parameters, which are intended to be interpretable.
An example with a Bernoulli process: response accuracy (correct vs. incorrect).
Usually this kind of data is pre-averaged and fed into a repeated-measures ANOVA (often with assumption violations, e.g. ceiling performance).
GLMMs allow us to model single-trial accuracy directly:
$$ response_{ij} \sim \text{Bernoulli}\big(\text{logit}^{-1}(\mu_j)\big)$$
where $\mu_j$ takes the same definition as previously (module 2).
priors = c(prior(normal(1, 1), class = Intercept),#The prior we have on the intercept
prior(normal(0,.5), class = b, coef=ease),#prior on the ease effect
prior(normal(0,.5), class = b, coef=encoding),#prior on the encoding effect
prior(normal(0,.5), class = b, coef=ease:encoding),#prior on the ease by encoding interaction
prior(lkj(2), class = cor))#prior on the correlation between RE
blm2_correct = brm(response ~ ease * encoding + (ease*encoding|participant), data=data, prior = priors,
family = bernoulli(), chain = 4, cores=4, warmup=1000, iter=2000, file="blm1_acc")
summary(blm2_correct)
Warning message: “There were 29 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup”
Family: bernoulli
Links: mu = logit
Formula: response ~ ease * encoding + (ease * encoding | participant)
Data: data (Number of observations: 100000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~participant (Number of levels: 50)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.41 0.05 0.32 0.51 1.00 1154
sd(ease) 0.18 0.03 0.12 0.26 1.00 968
sd(encoding) 0.20 0.04 0.12 0.29 1.00 1724
sd(ease:encoding) 0.07 0.04 0.00 0.17 1.00 466
cor(Intercept,ease) 0.58 0.16 0.20 0.84 1.00 1467
cor(Intercept,encoding) -0.61 0.15 -0.83 -0.27 1.00 2065
cor(ease,encoding) -0.58 0.18 -0.87 -0.15 1.00 1171
cor(Intercept,ease:encoding) 0.03 0.34 -0.64 0.65 1.00 4072
cor(ease,ease:encoding) -0.34 0.36 -0.84 0.52 1.00 1579
cor(encoding,ease:encoding) 0.01 0.34 -0.64 0.67 1.00 2706
Tail_ESS
sd(Intercept) 2127
sd(ease) 557
sd(encoding) 2561
sd(ease:encoding) 418
cor(Intercept,ease) 1489
cor(Intercept,encoding) 2350
cor(ease,encoding) 1682
cor(Intercept,ease:encoding) 2621
cor(ease,ease:encoding) 2375
cor(encoding,ease:encoding) 2877
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.60 0.06 1.48 1.73 1.00 953 1536
ease 1.37 0.04 1.30 1.46 1.00 918 324
encoding -0.68 0.04 -0.77 -0.60 1.00 2079 2429
ease:encoding -0.62 0.04 -0.70 -0.55 1.01 887 313
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
All these estimates are on the logit scale, which is not easy to interpret. Since we can back-transform at each iteration of the MCMC process, we can represent the estimates on the natural (probability) scale:
inv.logit = function(x){#inverse logit (equivalent to plogis)
inv = exp(x)/(1+exp(x))
return(inv)
}
intercept = blm2_correct %>% spread_draws(b_Intercept)
hist(inv.logit(intercept$b_Intercept),breaks=100)
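The same logic applies to any combination of parameters. For instance, a brief sketch of the predicted response probability when encoding = 1 (ease and the interaction held at 0):
draws = blm2_correct %>% spread_draws(b_Intercept, b_encoding)
p_encoding1 = inv.logit(draws$b_Intercept + draws$b_encoding)#back-transformed prediction for encoding = 1, ease = 0
hist(p_encoding1, breaks=100)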
Books:
Articles:
See also the series of lectures by R. McElreath on YouTube.