4 Model Evaluation and Comparison
You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: modelevaluation.R
4.1 Loading R Packages
Load the required packages for this lab into your R environment:
library(rio)
library(lavaan)
library(semTools)
library(semhelpinghands)
4.2 Loading Data
Load the data into your environment. We will use the same data we’ve used in the previous Lab:
meece <- import(file = "data/meece.csv")
4.3 Model 1
Model Specification
Next, we will specify a bare-bones model that aligns with the central hypotheses of the authors behind these data. We will take a model-building approach. The central hypotheses are:
- Science Attitudes affect Active Cognitive Engagement through the intermediate variable Ego-Social Goals
- Intrinsic Motivation affects Active Cognitive Engagement through the intermediate variable Task Mastery Goals
mod1 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc
TaskMast ~ a1*IntMot
EgoSoc ~ a2*SciAtt
SciAtt ~~ IntMot
# indirect effect of IntMot on ActiveEn
im.tm.ind := a1*b1
# indirect effect of SciAtt on ActiveEn
sa.es.ind := a2*b2
'
What are some additional theory-informed hypotheses we can list?
Model Estimation
Some of the local fit options are not available when bootstrapping is used during estimation, so for now, we will estimate the model without bootstrapping. Defined parameters (such as indirect effects) are computed after estimation and do not affect the fit of the model to the data, so this decision will not affect our model evaluation choices. Once the model evaluation process is complete, we can re-estimate the final model with bootstrapping.
fit1 <- sem(model = mod1, data = meece)
Model Evaluation
Here, we will follow the steps described in the textbook.
Global Fit
First, we will test the exact fit of the model to the data using the Chi-square model test. We can find this statistic by including fit.measures = T in the summary() function and finding the section labeled Model Test User Model. Note that I also set estimates = F, so that the parameter estimates are not included in the output (mostly to reduce the length of the output):
summary(fit1, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 18 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 10
Number of observations 256
Model Test User Model:
Test statistic 56.448
Degrees of freedom 5
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 398.422
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.868
Tucker-Lewis Index (TLI) 0.735
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -828.169
Loglikelihood unrestricted model (H1) -799.945
Akaike (AIC) 1676.338
Bayesian (BIC) 1711.790
Sample-size adjusted Bayesian (SABIC) 1680.087
Root Mean Square Error of Approximation:
RMSEA 0.200
90 Percent confidence interval - lower 0.155
90 Percent confidence interval - upper 0.249
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.094
What does the Chi-square test tell us about the exact fit of this model?
Other fit indices that are reported here:
- Model Test Baseline Model: This is the Chi-square of the baseline (worst-fitting) model, which is used to compute the CFI/TLI. This Chi-square should always be larger than the Model Test User Model.
- CFI/TLI: Goodness-of-fit indices; values closer to 1 indicate better fit
- Loglikelihood and Information Criteria: This information is used to compute the AIC and BIC (and SABIC), which help you compare non-nested models
- Root Mean Square Error of Approximation: Badness-of-fit index; values closer to 0 indicate better fit. This section also reports the 90% confidence interval and the p-values for two hypothesis tests: H0: RMSEA <= .05 (if p > .05, this indicates good fit) and H0: RMSEA >= .08 (if p > .05, we cannot rule out poor fit).
- SRMR: Badness-of-fit index, values closer to 0 indicate better fit.
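If you prefer not to scan the full summary() output, you can request specific fit indices directly with lavaan's fitMeasures() function. A minimal sketch (the selection of measures shown here is just an example):
# Request a subset of global fit indices by name
fitMeasures(fit1, c("chisq", "df", "pvalue",
                    "cfi", "tli", "rmsea", "srmr",
                    "aic", "bic"))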
Local Fit
To identify where the friction between the data and the model might be, we can look at local fit using the covariance residuals. It can be helpful to compare results across different methods:
residuals(fit1, type = "standardized")
$type
[1] "standardized"
$cov
ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn 2.905
TaskMast 2.905 0.000
EgoSoc 2.905 2.905 0.000
IntMot 1.658 0.000 -2.346 0.000
SciAtt 3.026 4.630 0.000 0.000 0.000
residuals(fit1, type = "normalized")
$type
[1] "normalized"
$cov
ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn 0.311
TaskMast 0.263 0.000
EgoSoc 1.779 2.635 0.000
IntMot 0.971 0.000 -1.908 0.000
SciAtt 2.303 3.208 0.000 0.000 0.000
residuals(fit1, type = "cor.bollen")
$type
[1] "cor.bollen"
$cov
ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn 0.000
TaskMast 0.011 0.000
EgoSoc 0.112 0.166 0.000
IntMot 0.060 0.000 -0.122 0.000
SciAtt 0.148 0.222 0.000 0.000 0.000
What patterns can you see in the covariance residuals? What path could we add to the model?
4.4 Model 2
Model Respecification
Let's add the hypothesis that Task Mastery Goals and Ego-Social Goals are correlated/associated:
mod2 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc
TaskMast ~ a1*IntMot
EgoSoc ~ a2*SciAtt
TaskMast ~~ EgoSoc
SciAtt ~~ IntMot
# indirect effect of IntMot on ActiveEn
im.tm.ind := a1*b1
# indirect effect of SciAtt on ActiveEn
sa.es.ind := a2*b2
'
We can estimate the model and save it in fit2:
fit2 <- sem(model = mod2, data = meece)
Model Evaluation
We will use the same sources of information to evaluate the fit of this modified model.
Global Fit
summary(fit2, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 24 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 11
Number of observations 256
Model Test User Model:
Test statistic 32.567
Degrees of freedom 4
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 398.422
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.926
Tucker-Lewis Index (TLI) 0.816
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -816.229
Loglikelihood unrestricted model (H1) -799.945
Akaike (AIC) 1654.457
Bayesian (BIC) 1693.454
Sample-size adjusted Bayesian (SABIC) 1658.581
Root Mean Square Error of Approximation:
RMSEA 0.167
90 Percent confidence interval - lower 0.117
90 Percent confidence interval - upper 0.222
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 0.997
Standardized Root Mean Square Residual:
SRMR 0.080
What does the Chi-square test tell us about the exact fit of this model?
Local Fit
We can examine to what extent the added covariance between Task Mastery and Ego-Social Goals has resolved the local fit issues (only showing one method here):
residuals(fit2, type = "cor.bollen")
$type
[1] "cor.bollen"
$cov
ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn 0.000
TaskMast -0.009 0.000
EgoSoc -0.054 -0.078 0.000
IntMot 0.058 -0.017 -0.083 0.000
SciAtt 0.153 0.213 0.069 0.000 0.000
What patterns can you see in the covariance residuals? What path could we add to the model?
4.5 Model Comparison of Model 1 and Model 2
We can also use the Chi-square difference test to compare the original and modified model to each other. Here, the original model is more constrained and the modified model is less constrained. The Chi-square difference test tests the null hypothesis: Chi-square constrained == Chi-square free.
- If the Chi-square difference test is significant, it means that the constrained model fits the data worse than the free model, which means that adding the additional path helped us improve the global fit.
- If the Chi-square difference test is not significant, it means that the constrained model fits the data equally well as the free model, which means that adding the additional path did not improve model fit.
To ensure that our models are nested, we use the net() function from the semTools package. This function checks whether two models are nested and whether two models are equivalent:
net(fit1, fit2)
If cell [R, C] is TRUE, the model in row R is nested within column C.
If the models also have the same degrees of freedom, they are equivalent.
NA indicates the model in column C did not converge when fit to the
implied means and covariance matrix from the model in row R.
The hidden diagonal is TRUE because any model is equivalent to itself.
The upper triangle is hidden because for models with the same degrees
of freedom, cell [C, R] == cell [R, C]. For all models with different
degrees of freedom, the upper diagonal is all FALSE because models with
fewer degrees of freedom (i.e., more parameters) cannot be nested
within models with more degrees of freedom (i.e., fewer parameters).
fit2 fit1
fit2 (df = 4)
fit1 (df = 5) TRUE
Next, we can use the compareFit() function from the semTools package to compare the models using the Chi-square difference test:
comp12 <- compareFit(fit1, fit2)
summary(comp12)
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit2 4 1654.5 1693.5 32.568
fit1 5 1676.3 1711.8 56.448 23.881 0.29896 1 1.025e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
fit2 32.567† 4 .000 .167† .926† .816† .080† 1654.457† 1693.454†
fit1 56.448 5 .000 .200 .868 .735 .094 1676.338 1711.790
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
fit1 - fit2 1 0.033 -0.059 -0.081 0.014 21.881 18.336
Does the modified model result in an improvement in global fit?
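As an aside: if you only need the Chi-square difference test itself, without the accompanying table of fit indices, lavaan's built-in lavTestLRT() function is a lighter-weight alternative to compareFit(). A minimal sketch:
# Chi-square (likelihood ratio) difference test for two nested models
lavTestLRT(fit1, fit2)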
4.6 Model 3
Model Respecification
We can keep building our model by including a path from Science Attitudes to Task Mastery Goals, which means that we hypothesize that Science Attitudes indirectly affect Active Cognitive Engagement through both Ego-Social and Task Mastery Goals.
mod3 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc
TaskMast ~ a1*IntMot + a3*SciAtt
EgoSoc ~ a2*SciAtt
TaskMast ~~ EgoSoc
SciAtt ~~ IntMot
# indirect effect of IntMot on ActiveEn
im.tm.ind := a1*b1
# indirect and total indirect effects of SciAtt on ActiveEn
sa.es.ind := a2*b2
sa.tm.ind := a3*b1
sa.total.ind := (a2*b2) + (a3*b1)
'
We can estimate the model and save it in fit3:
fit3 <- sem(model = mod3, data = meece)
Model Evaluation
We will use the same sources of information to evaluate the fit of this modified model.
Global Fit
summary(fit3, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 23 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 12
Number of observations 256
Model Test User Model:
Test statistic 11.194
Degrees of freedom 3
P-value (Chi-square) 0.011
Model Test Baseline Model:
Test statistic 398.422
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.979
Tucker-Lewis Index (TLI) 0.930
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -805.542
Loglikelihood unrestricted model (H1) -799.945
Akaike (AIC) 1635.084
Bayesian (BIC) 1677.626
Sample-size adjusted Bayesian (SABIC) 1639.583
Root Mean Square Error of Approximation:
RMSEA 0.103
90 Percent confidence interval - lower 0.044
90 Percent confidence interval - upper 0.171
P-value H_0: RMSEA <= 0.050 0.067
P-value H_0: RMSEA >= 0.080 0.775
Standardized Root Mean Square Residual:
SRMR 0.037
What does the Chi-square test tell us about the exact fit of this model?
Local Fit
residuals(fit3, type = "cor.bollen")
$type
[1] "cor.bollen"
$cov
ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn 0.000
TaskMast -0.006 0.000
EgoSoc -0.025 -0.037 0.000
IntMot 0.047 -0.025 -0.122 0.000
SciAtt 0.001 0.004 0.000 0.000 0.000
What patterns can you see in the covariance residuals? What path could we add to the model?
4.7 Model Comparison of Model 2 and Model 3
comp23 <- compareFit(fit3, fit2)
summary(comp23)
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit3 3 1635.1 1677.6 11.194
fit2 4 1654.5 1693.5 32.568 21.373 0.2821 1 3.78e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
fit3 11.194† 3 .011 .103† .979† .930† .037† 1635.084† 1677.626†
fit2 32.567 4 .000 .167 .926 .816 .080 1654.457 1693.454
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
fit2 - fit3 1 0.064 -0.052 -0.114 0.043 19.373 15.828
Does the modified model result in an improvement in global fit?
4.8 Model 4
Model Respecification
Let's add the path from Intrinsic Motivation to Ego-Social Goals, which matches the hypothesis that Intrinsic Motivation indirectly affects Active Cognitive Engagement through both Task Mastery and Ego-Social Goals:
mod4 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc
TaskMast ~ a1*IntMot + a3*SciAtt
EgoSoc ~ a2*SciAtt + a4*IntMot
TaskMast ~~ EgoSoc
SciAtt ~~ IntMot
# indirect and total indirect effects of IntMot on ActiveEn
im.tm.ind := a1*b1
im.es.ind := a4*b2
im.total.ind := (a1*b1) + (a4*b2)
# indirect and total indirect effects of SciAtt on ActiveEn
sa.es.ind := a2*b2
sa.tm.ind := a3*b1
sa.total.ind := (a2*b2) + (a3*b1)
'
We can estimate the model and save it in fit4:
fit4 <- sem(model = mod4, data = meece)
Model Evaluation
We will use the same sources of information to evaluate the fit of this modified model.
Global Fit
summary(fit4, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 23 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 13
Number of observations 256
Model Test User Model:
Test statistic 5.506
Degrees of freedom 2
P-value (Chi-square) 0.064
Model Test Baseline Model:
Test statistic 398.422
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.991
Tucker-Lewis Index (TLI) 0.955
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -802.698
Loglikelihood unrestricted model (H1) -799.945
Akaike (AIC) 1631.396
Bayesian (BIC) 1677.483
Sample-size adjusted Bayesian (SABIC) 1636.270
Root Mean Square Error of Approximation:
RMSEA 0.083
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.169
P-value H_0: RMSEA <= 0.050 0.186
P-value H_0: RMSEA >= 0.080 0.612
Standardized Root Mean Square Residual:
SRMR 0.021
What does the Chi-square test tell us about the exact fit of this model?
Local Fit
residuals(fit4, type = "cor.bollen")
$type
[1] "cor.bollen"
$cov
ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn 0.000
TaskMast 0.000 0.000
EgoSoc 0.000 0.000 0.000
IntMot 0.079 0.000 0.000 0.000
SciAtt -0.002 0.000 0.000 0.000 0.000
What patterns can you see in the covariance residuals? What path could we add to the model?
4.9 Model Comparison of Model 3 and Model 4
comp34 <- compareFit(fit3, fit4)
summary(comp34)
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit4 2 1631.4 1677.5 5.5063
fit3 3 1635.1 1677.6 11.1941 5.6878 0.13532 1 0.01708 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
fit4 5.506† 2 .064 .083† .991† .955† .021† 1631.396† 1677.483†
fit3 11.194 3 .011 .103 .979 .930 .037 1635.084 1677.626
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
fit3 - fit4 1 0.021 -0.012 -0.025 0.017 3.688 0.143
Does the modified model result in an improvement in global fit?
4.10 Parameter Estimate Interpretation
Typically, you would look at the estimates of newly added paths at each step of the model building process, to ensure that the estimates align with theoretical expectations. Here, we kind of skipped that, but we will take a quick look now. Please refer back to the Indirect Effects R Lab to read more about interpreting indirect effects.
fit4 <- sem(model = mod4, data = meece,
            se = "bootstrap", bootstrap = 1000,
            iseed = 8789)
First, we can look at the (unstandardized) parameter estimates:
parameterEstimates(fit4, boot.ci.type = "bca.simple",
                   ci = TRUE, se = TRUE,
                   zstat = FALSE, pvalue = FALSE,
                   output = "text")
Regressions:
Estimate Std.Err ci.lower ci.upper
ActiveEn ~
TaskMast (b1) 0.499 0.035 0.430 0.572
EgoSoc (b2) 0.056 0.022 0.012 0.099
TaskMast ~
IntMot (a1) 0.248 0.064 0.128 0.372
SciAtt (a3) 0.255 0.051 0.153 0.358
EgoSoc ~
SciAtt (a2) -0.050 0.086 -0.214 0.126
IntMot (a4) -0.246 0.104 -0.454 -0.036
Covariances:
Estimate Std.Err ci.lower ci.upper
.TaskMast ~~
.EgoSoc 0.083 0.017 0.049 0.120
IntMot ~~
SciAtt 0.184 0.024 0.138 0.238
Variances:
Estimate Std.Err ci.lower ci.upper
.ActiveEn 0.060 0.005 0.052 0.073
.TaskMast 0.164 0.014 0.137 0.192
.EgoSoc 0.537 0.046 0.458 0.641
IntMot 0.290 0.024 0.249 0.343
SciAtt 0.371 0.032 0.312 0.441
Defined Parameters:
Estimate Std.Err ci.lower ci.upper
im.tm.ind 0.124 0.035 0.058 0.195
im.es.ind -0.123 0.052 -0.229 -0.021
im.total.ind 0.001 0.068 -0.134 0.133
sa.es.ind -0.003 0.005 -0.016 0.006
sa.tm.ind 0.127 0.025 0.078 0.178
sa.total.ind 0.124 0.027 0.071 0.178
Even though our model fits the data well, we can see that the path from Science Attitudes to Ego-Social Goals (which was part of our theoretical model at the start of all this) is not significant. We might want to explore whether model trimming (i.e., removing this path) affects the overall model fit. However, since this path was so central to our theory, we would ideally want to collect another sample and test whether the trimmed model can be replicated.
Another piece of output that we can examine are the R-squared estimates for each endogenous variable in the model. These represent the proportion of variance in the endogenous variables that can be explained by the predictors in the model. Remember, a well-fitting model can have endogenous variables with low R-squareds. This is because the model-implied variances include both the explained (R-squared) and unexplained/error (1 - R-squared) variance. Thus, a model can fit well as long as the model closely predicts the total variance in the endogenous variables, even if a large part of that variance is unexplained/error.
We can use lavInspect() to get the R-squared estimates:
lavInspect(fit4, what = "rsquare")
ActiveEn TaskMast EgoSoc
0.504 0.284 0.041
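Alternatively, the same R-squared estimates can be appended to the regular summary output via the rsquare argument; a quick sketch:
# Print R-squared values at the bottom of the summary output
summary(fit4, rsquare = TRUE)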
For which endogenous variable does the model result in the largest unexplained/error variance?
4.11 Comparing groups
Next, we will look at a special use of model evaluation and comparison: the multiple-group path model. We will learn how to compare the same model across different groups. For that purpose, let's pretend that the meece study was repeated for a sample of high school students (the original sample consists of middle school students). These data are saved in data/meece2.csv.
You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/meece2.csv. Make sure to save it in the folder you are using for this class.
We will add a sample variable to each data frame and then combine the two data frames together:
meece2 <- import("data/meece2.csv")

# Create group variable for each dataset
meece$sample <- 1
meece2$sample <- 2

# Combine the two groups into one data frame
meece_all <- rbind(meece, meece2)
For this example, we will estimate a simplified version of the model we've been working on so far, by only including Intrinsic Motivation as an exogenous variable. Compared to the single-group model, we need to make some adjustments for the multiple-group scenario, specifically in how we label parameters.
We initially start with a model in which all paths are estimated separately for each group. Now, each path needs a label for each group, and we use c() to list these labels for each path. For example, for the path from Task Mastery to Active Engagement, we use c(b1.g1, b1.g2). We also need to define indirect (and, if present, total) effects for each group separately.
mod_g1 <- '
ActiveEn ~ c(b1.g1, b1.g2)*TaskMast + c(b2.g1, b2.g2)*EgoSoc
TaskMast ~ c(a1.g1, a1.g2)*IntMot
EgoSoc ~ c(a2.g1, a2.g2)*IntMot
TaskMast ~~ EgoSoc
# Group = 1
# indirect and total effects of IntMot on ActiveEn
im.tm.ind.g1 := a1.g1*b1.g1
im.es.ind.g1 := a2.g1*b2.g1
im.total.ind.g1 := (a1.g1*b1.g1) + (a2.g1*b2.g1)
# Group = 2
# indirect and total effects of IntMot on ActiveEn
im.tm.ind.g2 := a1.g2*b1.g2
im.es.ind.g2 := a2.g2*b2.g2
im.total.ind.g2 := (a1.g2*b1.g2) + (a2.g2*b2.g2)
'
When we estimate this model, we need to tell lavaan which variable contains the group assignment for each participant by adding group = "sample" (in your data, the group variable might have a different name). Again, we're going to skip bootstrapping because I am focused on demonstrating how you can compare estimates across groups using the Chi-square difference test:
fit_g1 <- sem(model = mod_g1, data = meece_all, group = "sample")
If you use the summary() function, you will get parameter estimates per group, printed sequentially. For complex models, this can result in very long output:
summary(fit_g1, fit.measures = T)
lavaan 0.6-19 ended normally after 42 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 22
Number of observations per group:
1 256
2 256
Model Test User Model:
Test statistic 15.190
Degrees of freedom 2
P-value (Chi-square) 0.001
Test statistic for each group:
1 4.571
2 10.620
Model Test Baseline Model:
Test statistic 493.713
Degrees of freedom 12
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.973
Tucker-Lewis Index (TLI) 0.836
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -851.940
Loglikelihood unrestricted model (H1) -844.345
Akaike (AIC) 1747.881
Bayesian (BIC) 1841.124
Sample-size adjusted Bayesian (SABIC) 1771.292
Root Mean Square Error of Approximation:
RMSEA 0.161
90 Percent confidence interval - lower 0.092
90 Percent confidence interval - upper 0.240
P-value H_0: RMSEA <= 0.050 0.006
P-value H_0: RMSEA >= 0.080 0.971
Standardized Root Mean Square Residual:
SRMR 0.029
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Group 1 [1]:
Regressions:
Estimate Std.Err z-value P(>|z|)
ActiveEn ~
TaskMst (b1.1) 0.499 0.032 15.420 0.000
EgoSoc (b2.1) 0.056 0.021 2.728 0.006
TaskMast ~
IntMot (a1.1) 0.409 0.049 8.289 0.000
EgoSoc ~
IntMot (a2.1) -0.278 0.085 -3.266 0.001
Covariances:
Estimate Std.Err z-value P(>|z|)
.TaskMast ~~
.EgoSoc 0.080 0.020 3.956 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
.ActiveEn 0.523 0.111 4.735 0.000
.TaskMast 2.035 0.143 14.221 0.000
.EgoSoc 3.272 0.247 13.262 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.ActiveEn 0.060 0.005 11.314 0.000
.TaskMast 0.181 0.016 11.314 0.000
.EgoSoc 0.538 0.048 11.314 0.000
Group 2 [2]:
Regressions:
Estimate Std.Err z-value P(>|z|)
ActiveEn ~
TaskMst (b1.2) 0.269 0.031 8.632 0.000
EgoSoc (b2.2) 0.195 0.020 9.621 0.000
TaskMast ~
IntMot (a1.2) 0.373 0.053 7.095 0.000
EgoSoc ~
IntMot (a2.2) -0.268 0.087 -3.090 0.002
Covariances:
Estimate Std.Err z-value P(>|z|)
.TaskMast ~~
.EgoSoc 0.071 0.021 3.450 0.001
Intercepts:
Estimate Std.Err z-value P(>|z|)
.ActiveEn 0.538 0.107 5.043 0.000
.TaskMast 2.137 0.151 14.132 0.000
.EgoSoc 3.201 0.250 12.823 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.ActiveEn 0.057 0.005 11.314 0.000
.TaskMast 0.196 0.017 11.314 0.000
.EgoSoc 0.533 0.047 11.314 0.000
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
im.tm.ind.g1 0.204 0.028 7.301 0.000
im.es.ind.g1 -0.016 0.007 -2.094 0.036
im.total.nd.g1 0.188 0.030 6.219 0.000
im.tm.ind.g2 0.100 0.018 5.481 0.000
im.es.ind.g2 -0.052 0.018 -2.942 0.003
im.total.nd.g2 0.048 0.028 1.728 0.084
You can also use the group_by_groups() function from the semhelpinghands package to get a quick overview of just the parameter estimates, to see if there are any that appear particularly different:
group_by_groups(fit_g1)
lhs op rhs est_1 est_2
1 ActiveEn ~ EgoSoc 0.056 0.195
2 ActiveEn ~ TaskMast 0.499 0.269
3 EgoSoc ~ IntMot -0.278 -0.268
4 TaskMast ~ IntMot 0.409 0.373
5 ActiveEn ~~ ActiveEn 0.060 0.057
6 EgoSoc ~~ EgoSoc 0.538 0.533
7 IntMot ~~ IntMot 0.290 0.276
8 TaskMast ~~ EgoSoc 0.080 0.071
9 TaskMast ~~ TaskMast 0.181 0.196
10 ActiveEn ~1 0.523 0.538
11 EgoSoc ~1 3.272 3.201
12 IntMot ~1 2.850 2.828
13 TaskMast ~1 2.035 2.137
Are there any estimates that look like they might differ across groups?
Note that multiple group analyses include the variable means/intercepts as well, which are denoted with, e.g., ActiveEn ~ 1. This means we can also test whether the groups have equivalent average levels of the variables (for endogenous variables, this is actually the intercept, or the expected value when all predictors equal 0).
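For example, to test the equality of the observed-variable intercepts across groups, you could use lavaan's group.equal argument instead of labeling each intercept by hand. A minimal sketch, assuming the objects defined above (the name fit_g1_int is just illustrative):
# Re-estimate the model with all intercepts constrained equal across groups
fit_g1_int <- sem(model = mod_g1, data = meece_all, group = "sample",
                  group.equal = "intercepts")
# Compare the constrained model against the unconstrained model
lavTestLRT(fit_g1, fit_g1_int)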
Specifying a More Restrictive Model
To restrict certain parameters to be equivalent across groups, we need to assign the same label to the parameter in both groups. For example, the label of the path from Task Mastery to Active Engagement is now c(b1, b1). We might do this if we want to test the null hypothesis of no group difference in the strength of an association between two variables. In this case, we might want to see if our new sample replicates the findings from the original sample.
mod_g2 <- '
ActiveEn ~ c(b1, b1)*TaskMast + c(b2, b2)*EgoSoc
TaskMast ~ c(a1, a1)*IntMot
EgoSoc ~ c(a2, a2)*IntMot
TaskMast ~~ EgoSoc
# Groups 1 and 2 share the same labels, so no separate effects
# indirect and total effects of IntMot on ActiveEn
im.tm.ind.g1 := a1*b1
im.es.ind.g1 := a2*b2
im.total.ind.g1 := (a1*b1) + (a2*b2)
'
When estimating this model, you still need to include group = "sample":
fit_g2 <- sem(model = mod_g2, data = meece_all, group = "sample")
When we look at the group_by_groups() output, all the regression estimates should be equivalent across groups: they are estimated as though they are one parameter (instead of two, one per group). You can see the difference reflected in the degrees of freedom, which are much higher for this (more constrained) model:
group_by_groups(fit_g2)
lhs op rhs est_1 est_2
1 ActiveEn ~ EgoSoc 0.128 0.128
2 ActiveEn ~ TaskMast 0.379 0.379
3 EgoSoc ~ IntMot -0.274 -0.274
4 TaskMast ~ IntMot 0.392 0.392
5 ActiveEn ~~ ActiveEn 0.066 0.062
6 EgoSoc ~~ EgoSoc 0.538 0.533
7 IntMot ~~ IntMot 0.290 0.276
8 TaskMast ~~ EgoSoc 0.080 0.071
9 TaskMast ~~ TaskMast 0.181 0.196
10 ActiveEn ~1 0.730 0.351
11 EgoSoc ~1 3.259 3.216
12 IntMot ~1 2.850 2.828
13 TaskMast ~1 2.082 2.083
Are there parameters that are still estimated separately for each group?
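To verify the difference in degrees of freedom mentioned above, you can extract just that value from each fitted model; a quick sketch:
# Degrees of freedom for the unconstrained and constrained models
fitMeasures(fit_g1, "df")
fitMeasures(fit_g2, "df")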
Model Comparison using the Chi-square difference test
We can use the Chi-square difference test to compare the two multiple group models. In this case, a significant Chi-square difference test would indicate that the restrictive model (where all regression paths are specified to be equivalent across groups) fits the data worse than the free model, implying that there are likely some differences between the groups in terms of the associations between variables.
comp_g12 <- compareFit(fit_g1, fit_g2)
summary(comp_g12)
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit_g1 2 1747.9 1841.1 15.190
fit_g2 6 1782.0 1858.3 57.309 42.118 0.19294 4 1.577e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
fit_g1 15.190† 2 .001 .161† .973† .836† .029† 1747.881† 1841.124†
fit_g2 57.309 6 .000 .183 .893 .787 .071 1781.999 1858.289
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
fit_g2 - fit_g1 4 0.022 -0.079 -0.049 0.042 34.118 17.165
What can we conclude based on the chi-square difference test?
Looking at local fit per group
We can use information about the local fit for each group to identify variable pairs for which the model is fitting differently across groups. Here, we will only look at correlation residuals (in the interest of space):
residuals(fit_g2, type = "cor")
$`1`
$`1`$type
[1] "cor.bollen"
$`1`$cov
ActvEn TskMst EgoSoc IntMot
ActiveEn 0.000
TaskMast 0.123 0.000
EgoSoc -0.149 -0.006 0.000
IntMot 0.187 0.015 -0.003 0.000
$`1`$mean
ActiveEn TaskMast EgoSoc IntMot
0 0 0 0
$`2`
$`2`$type
[1] "cor.bollen"
$`2`$cov
ActvEn TskMst EgoSoc IntMot
ActiveEn 0.000
TaskMast -0.125 0.000
EgoSoc 0.157 0.007 0.000
IntMot 0.034 -0.017 0.004 0.000
$`2`$mean
ActiveEn TaskMast EgoSoc IntMot
0 0 0 0
Do you see any inconsistencies in the pattern across groups?
4.12 Summary
In this R lab, you were introduced to the steps involved in model evaluation and comparison, including using model comparison methods to compare multiple groups. Below, you’ll find two Bonus sections that demonstrate how to test whether specific direct or indirect effects are equivalent across groups and how to request and interpret modification indices. In the next R Lab, you will learn all about latent variables and confirmatory factor analysis, which will also require you to apply your knowledge of model evaluation and comparison.
4.13 Bonus 1: Testing the equivalence of specific direct and indirect effects
Sometimes, you may want to test the equivalence of specific paths, instead of groups of parameters as we did above. To do so, we can use the Wald test. For this comparison, we start with the unconstrained model (saved in fit_g1), in which all direct effects are estimated separately for each group. With the code below, we test if the direct effect from Task Mastery > Active Engagement is equivalent across the two age groups:
lavTestWald(fit_g1, constraints = "b1.g1 == b1.g2")
$stat
[1] 26.17973
$df
[1] 1
$p.value
[1] 3.110712e-07
$se
[1] "standard"
The Wald test evaluates the null hypothesis assigned to the constraints argument in the function. Here, it tests the null hypothesis that the two direct effects are equivalent across groups. Thus, a significant result (p < .05) indicates that we should reject the null and conclude that there is evidence that the two direct effects are different.
What would happen if we used the second model, in which all paths are constrained across groups, as input for the Wald test?
Next, I will describe using the Wald test to test whether the indirect effect is equivalent across groups. For this comparison, we again start with the unconstrained model (saved in fit_g1), in which all direct effects are estimated separately for each group. With the code below, we test if the indirect effect from Intrinsic Motivation > Task Mastery > Active Engagement is equivalent across the two age groups:
lavTestWald(fit_g1, constraints = "im.tm.ind.g1 == im.tm.ind.g2")
$stat
[1] 9.62194
$df
[1] 1
$p.value
[1] 0.001922665
$se
[1] "standard"
The Wald test evaluates the null hypothesis assigned to the constraints argument in the function. Here, it tests the null hypothesis that the two indirect effects are equivalent. Thus, a significant result (p < .05) indicates that we should reject the null and conclude that there is evidence that the two indirect effects are different.
Note that failing to reject the null in this Wald test only tells us that we have no evidence that the indirect effect differs across the two groups. It is still possible that there are differences in each of the (unconstrained) direct paths that make up the indirect effect. As a simple math example, the direct effect paths could be 4 and 5 for group 1 (indirect effect is 4x5 = 20), but 2 and 10 for group 2 (indirect effect is 2x10 = 20). Thus, it may also be important to test the equivalence of the direct effects, as shown in the sketch below.
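A minimal sketch of that follow-up test, assuming the labels from mod_g1: lavTestWald() accepts several constraints in one string, so we can jointly test whether both components of the indirect effect are equal across groups:
# Jointly test the equality of both direct paths that make up the
# IntMot > TaskMast > ActiveEn indirect effect
lavTestWald(fit_g1, constraints = "a1.g1 == a1.g2
                                   b1.g1 == b1.g2")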
4.14 Bonus 2: Modification Indices
Although I do not recommend relying purely on modification indices to guide your model building process, I do want you to know how to examine them. For this exercise, we will use our original model specification to see what kind of path the modification indices would have suggested. The modificationIndices() call below requests modification indices that are at least 10 (indicating that the Chi-square is expected to drop by at least 10 if the suggested parameter is added), by adding minimum.value = 10 (this value is arbitrary, but can help you filter out modifications that only improve the Chi-square by a smidge). By including sort. = TRUE, the results will be sorted from largest modification index (biggest impact on model fit) to smallest.
modificationIndices(fit1, sort. = TRUE, minimum.value = 10)
lhs op rhs mi epc sepc.lv sepc.all sepc.nox
35 SciAtt ~ TaskMast 23.398 0.358 0.358 0.282 0.282
19 TaskMast ~~ SciAtt 23.398 0.065 0.065 0.250 0.250
26 TaskMast ~ SciAtt 23.398 0.255 0.255 0.324 0.324
18 TaskMast ~~ IntMot 23.398 -0.103 -0.103 -0.447 -0.447
31 IntMot ~ TaskMast 23.398 -0.567 -0.567 -0.504 -0.504
17 TaskMast ~~ EgoSoc 21.223 0.091 0.091 0.288 0.288
25 TaskMast ~ EgoSoc 16.102 0.143 0.143 0.223 0.223
28 EgoSoc ~ TaskMast 10.876 0.330 0.330 0.211 0.211
27 EgoSoc ~ ActiveEn 10.622 0.647 0.647 0.298 0.298
Based on our restrictions, the algorithm identified 10 modifications. The first three columns of the output show you what modification is suggested, the fourth column (mi) gives you the modification index, and the final columns give you an idea of the expected parameter change if that parameter is included in the model (this can be useful if you want to know whether it would result in a large effect size). Let's look at them.
- The largest modification index suggests a path from Task Mastery Goals to Science Attitudes. This path is not in line with the theory that suggests that attitudes develop before goal orientations emerge. In addition, adding this path would turn Science Attitudes from exogenous to endogenous. In this new role, the covariance between Intrinsic Motivation and Science Attitudes would turn into a residual covariance, which violates one of the assumptions of path models.
- The second to fifth modification indices are of the same magnitude as the first, but suggest very different associations. The second and third suggest that Task Mastery and Science Attitudes should be allowed to covary, or that Science Attitudes should predict Task Mastery (this is a path we do end up including in a later modification). The fourth and fifth suggest that Task Mastery and Intrinsic Motivation should be associated through either a (residual) covariance or a directed path to Intrinsic Motivation. Each of these parameters would result in a non-recursive model with feedback loops.
- Finally, the sixth modification index suggests the modification we made (based on theory): Add the covariance between Task Mastery and Ego-Social Goals. Although this modification does not result in the largest expected decline in the Chi-square, it is the first one that makes theoretical sense.
I hope that this example helps illustrate how modification indices are entirely data driven and that many of the modifications will make no sense from a theoretical or statistical point of view.