4  Model Evaluation and Comparison

Note

You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: modelevaluation.R

4.1 Loading R Packages

Load the required packages for this lab into your R environment:

library(rio)
library(lavaan)
library(semTools)
library(semhelpinghands)

4.2 Loading Data

Load the data into your environment. We will use the same data we’ve used in the previous Lab:

meece <- import(file = "data/meece.csv")

4.3 Model 1

Model Specification

Next, we will specify a bare-bones model that aligns with the central hypotheses of the authors behind these data. We will take a model-building approach. The central hypotheses are:

  1. Science Attitudes affect Active Cognitive Engagement through the intermediate variable Ego-Social Goals
  2. Intrinsic Motivation affects Active Cognitive Engagement through the intermediate variable Task Mastery Goals

mod1 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc

TaskMast ~ a1*IntMot
EgoSoc ~ a2*SciAtt


SciAtt ~~ IntMot

# indirect effect between IntMot > ActiveEn
im.tm.ind := a1*b1

# indirect effect between SciAtt > ActiveEn
sa.es.ind := a2*b2
'

What are some additional theory-informed hypotheses we can list?

Model Estimation

Some of the local fit options are not available when bootstrapping is used during estimation, so for now, we will estimate the model without bootstrapping. Defined parameters (such as indirect effects) are computed from the estimated paths and do not affect the fit of a model to the data, so this decision will not affect our model evaluation choices. Once the model evaluation process is complete, we can re-estimate the final model with bootstrapping.

fit1 <- sem(model = mod1, data = meece)

Model Evaluation

Here, we will follow the steps described in the textbook.

Global Fit

First, we will test the exact fit of the model to the data using the Chi-square model test. We can find this statistic by including fit.measures = T in the summary() function and finding the section labeled Model Test User Model. Note that I also set estimates = F, so that the parameter estimates are not included in the output (mostly to reduce the length of the output):

summary(fit1, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 18 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        10

  Number of observations                           256

Model Test User Model:
                                                      
  Test statistic                                56.448
  Degrees of freedom                                 5
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               398.422
  Degrees of freedom                                10
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.868
  Tucker-Lewis Index (TLI)                       0.735

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -828.169
  Loglikelihood unrestricted model (H1)       -799.945
                                                      
  Akaike (AIC)                                1676.338
  Bayesian (BIC)                              1711.790
  Sample-size adjusted Bayesian (SABIC)       1680.087

Root Mean Square Error of Approximation:

  RMSEA                                          0.200
  90 Percent confidence interval - lower         0.155
  90 Percent confidence interval - upper         0.249
  P-value H_0: RMSEA <= 0.050                    0.000
  P-value H_0: RMSEA >= 0.080                    1.000

Standardized Root Mean Square Residual:

  SRMR                                           0.094

What does the Chi-square test tell us about the exact fit of this model?

Other fit indices that are reported here (a snippet for extracting them directly follows this list):

  • Model Test Baseline Model: This is the Chi-square of the worst-fitting (baseline) model, which is used to compute the CFI/TLI. This Chi-square should always be larger than the one from the Model Test User Model.
  • CFI/TLI: Goodness-of-fit indices, values closer to 1 indicate better fit
  • Loglikelihood and Information Criteria: This information is used to compute the AIC and BIC (and SABIC), which help you compare non-nested models
  • Root Mean Square Error of Approximation: Badness-of-fit index, values closer to 0 indicate better fit. This section reports the 90% confidence interval and the p-values for H0: RMSEA <= .05 (if p > .05, this indicates good fit) and H0: RMSEA >= .08 (if p > .05, poor fit cannot be ruled out).
  • SRMR: Badness-of-fit index, values closer to 0 indicate better fit.
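
If you only need a handful of these indices, you can also request them by name with lavaan's fitMeasures() function; for example:

# Request a subset of the global fit indices by name
fitMeasures(fit1, c("chisq", "df", "pvalue", "cfi", "tli",
                    "rmsea", "srmr", "aic", "bic"))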

Local Fit

To identify where the friction between the data and the model might be, we can look at local fit using the covariance residuals. It can be helpful to compare results across different methods:

residuals(fit1, type = "standardized")
$type
[1] "standardized"

$cov
         ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn  2.905                            
TaskMast  2.905  0.000                     
EgoSoc    2.905  2.905  0.000              
IntMot    1.658  0.000 -2.346  0.000       
SciAtt    3.026  4.630  0.000  0.000  0.000
residuals(fit1, type = "normalized")
$type
[1] "normalized"

$cov
         ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn  0.311                            
TaskMast  0.263  0.000                     
EgoSoc    1.779  2.635  0.000              
IntMot    0.971  0.000 -1.908  0.000       
SciAtt    2.303  3.208  0.000  0.000  0.000
residuals(fit1, type = "cor.bollen")
$type
[1] "cor.bollen"

$cov
         ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn  0.000                            
TaskMast  0.011  0.000                     
EgoSoc    0.112  0.166  0.000              
IntMot    0.060  0.000 -0.122  0.000       
SciAtt    0.148  0.222  0.000  0.000  0.000

What patterns can you see in the covariance residuals? What path could we add to the model?

4.4 Model 2

Model Respecification

Let's add the hypothesis that Task Mastery Goals and Ego-Social Goals are correlated/associated:

mod2 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc

TaskMast ~ a1*IntMot
EgoSoc ~ a2*SciAtt


TaskMast ~~ EgoSoc

SciAtt ~~ IntMot

# indirect effect between IntMot > ActiveEn
im.tm.ind := a1*b1

# indirect effect between SciAtt > ActiveEn
sa.es.ind := a2*b2
'

We can estimate the model and save it in fit2:

fit2 <- sem(model = mod2, data = meece)

Model Evaluation

We will use the same sources of information to evaluate the fit of this modified model.

Global Fit

summary(fit2, fit.measures = T,  estimates = F)
lavaan 0.6-19 ended normally after 24 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        11

  Number of observations                           256

Model Test User Model:
                                                      
  Test statistic                                32.567
  Degrees of freedom                                 4
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               398.422
  Degrees of freedom                                10
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.926
  Tucker-Lewis Index (TLI)                       0.816

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -816.229
  Loglikelihood unrestricted model (H1)       -799.945
                                                      
  Akaike (AIC)                                1654.457
  Bayesian (BIC)                              1693.454
  Sample-size adjusted Bayesian (SABIC)       1658.581

Root Mean Square Error of Approximation:

  RMSEA                                          0.167
  90 Percent confidence interval - lower         0.117
  90 Percent confidence interval - upper         0.222
  P-value H_0: RMSEA <= 0.050                    0.000
  P-value H_0: RMSEA >= 0.080                    0.997

Standardized Root Mean Square Residual:

  SRMR                                           0.080

What does the Chi-square test tell us about the exact fit of this model?

Local Fit

We can examine to what extent the added covariance between Task Mastery and Ego-Social Goals has resolved the local fit issues (only showing one method here):

residuals(fit2, type = "cor.bollen")
$type
[1] "cor.bollen"

$cov
         ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn  0.000                            
TaskMast -0.009  0.000                     
EgoSoc   -0.054 -0.078  0.000              
IntMot    0.058 -0.017 -0.083  0.000       
SciAtt    0.153  0.213  0.069  0.000  0.000

What patterns can you see in the covariance residuals? What path could we add to the model?

4.5 Model Comparison of Model 1 and Model 2

We can also use the Chi-square difference test to compare the original and modified model to each other. Here, the original model is more constrained and the modified model is less constrained. The Chi-square difference test tests the Null hypothesis: Chi-square constrained == Chi-square free (a direct way to run this test is sketched after this list).

  • If the Chi-square difference test is significant, it means that the constrained model fits the data worse than the free model, which means that adding the additional path improved the global fit.
  • If the Chi-square difference test is not significant, it means that the constrained model fits the data equally well as the free model, which means that adding the additional path did not improve model fit.
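
As a side note, the same Chi-square difference test can be obtained directly with lavaan's lavTestLRT() function; this is the test that the nested-model comparison below reports:

# Likelihood ratio (Chi-square difference) test for two nested models
lavTestLRT(fit1, fit2)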

To ensure that our models are nested, we use the net() function from the semTools package. This function checks whether two models are nested and whether two models are equivalent:

net(fit1, fit2)

        If cell [R, C] is TRUE, the model in row R is nested within column C.

        If the models also have the same degrees of freedom, they are equivalent.

        NA indicates the model in column C did not converge when fit to the
        implied means and covariance matrix from the model in row R.

        The hidden diagonal is TRUE because any model is equivalent to itself.
        The upper triangle is hidden because for models with the same degrees
        of freedom, cell [C, R] == cell [R, C].  For all models with different
        degrees of freedom, the upper diagonal is all FALSE because models with
        fewer degrees of freedom (i.e., more parameters) cannot be nested
        within models with more degrees of freedom (i.e., fewer parameters).
        
              fit2 fit1
fit2 (df = 4)          
fit1 (df = 5) TRUE     

Next, we can use the compareFit() function from the semTools package to compare the models using the Chi-square difference test:

comp12 <- compareFit(fit1, fit2)
summary(comp12)
################### Nested Model Comparison #########################

Chi-Squared Difference Test

     Df    AIC    BIC  Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
fit2  4 1654.5 1693.5 32.568                                          
fit1  5 1676.3 1711.8 56.448     23.881 0.29896       1  1.025e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

####################### Model Fit Indices ###########################
       chisq df pvalue rmsea   cfi   tli  srmr       aic       bic
fit2 32.567†  4   .000 .167† .926† .816† .080† 1654.457† 1693.454†
fit1 56.448   5   .000 .200  .868  .735  .094  1676.338  1711.790 

################## Differences in Fit Indices #######################
            df rmsea    cfi    tli  srmr    aic    bic
fit1 - fit2  1 0.033 -0.059 -0.081 0.014 21.881 18.336

Does the modified model result in an improvement in global fit?

4.6 Model 3

Model Respecification

We can keep building our model by including a path from Science Attitudes to Task Mastery Goals, which means that we hypothesize that Science Attitudes indirectly affect Active Cognitive Engagement through both Ego-Social and Task Mastery Goals.

mod3 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc

TaskMast ~ a1*IntMot + a3*SciAtt
EgoSoc ~ a2*SciAtt


TaskMast ~~ EgoSoc

SciAtt ~~ IntMot

# indirect effect between IntMot > ActiveEn
im.tm.ind := a1*b1

# indirect effects between SciAtt > ActiveEn
sa.es.ind := a2*b2
sa.tm.ind := a3*b1
sa.total.ind := (a2*b2) + (a3*b1)
'

We can estimate the model and save it in fit3:

fit3 <- sem(model = mod3, data = meece)

Model Evaluation

We will use the same sources of information to evaluate the fit of this modified model.

Global Fit

summary(fit3, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 23 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12

  Number of observations                           256

Model Test User Model:
                                                      
  Test statistic                                11.194
  Degrees of freedom                                 3
  P-value (Chi-square)                           0.011

Model Test Baseline Model:

  Test statistic                               398.422
  Degrees of freedom                                10
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.979
  Tucker-Lewis Index (TLI)                       0.930

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -805.542
  Loglikelihood unrestricted model (H1)       -799.945
                                                      
  Akaike (AIC)                                1635.084
  Bayesian (BIC)                              1677.626
  Sample-size adjusted Bayesian (SABIC)       1639.583

Root Mean Square Error of Approximation:

  RMSEA                                          0.103
  90 Percent confidence interval - lower         0.044
  90 Percent confidence interval - upper         0.171
  P-value H_0: RMSEA <= 0.050                    0.067
  P-value H_0: RMSEA >= 0.080                    0.775

Standardized Root Mean Square Residual:

  SRMR                                           0.037

What does the Chi-square test tell us about the exact fit of this model?

Local Fit

residuals(fit3, type = "cor.bollen")
$type
[1] "cor.bollen"

$cov
         ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn  0.000                            
TaskMast -0.006  0.000                     
EgoSoc   -0.025 -0.037  0.000              
IntMot    0.047 -0.025 -0.122  0.000       
SciAtt    0.001  0.004  0.000  0.000  0.000

What patterns can you see in the covariance residuals? What path could we add to the model?

4.7 Model Comparison of Model 2 and Model 3

comp23 <- compareFit(fit3, fit2)
summary(comp23)
################### Nested Model Comparison #########################

Chi-Squared Difference Test

     Df    AIC    BIC  Chisq Chisq diff  RMSEA Df diff Pr(>Chisq)    
fit3  3 1635.1 1677.6 11.194                                         
fit2  4 1654.5 1693.5 32.568     21.373 0.2821       1   3.78e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

####################### Model Fit Indices ###########################
       chisq df pvalue rmsea   cfi   tli  srmr       aic       bic
fit3 11.194†  3   .011 .103† .979† .930† .037† 1635.084† 1677.626†
fit2 32.567   4   .000 .167  .926  .816  .080  1654.457  1693.454 

################## Differences in Fit Indices #######################
            df rmsea    cfi    tli  srmr    aic    bic
fit2 - fit3  1 0.064 -0.052 -0.114 0.043 19.373 15.828

Does the modified model result in an improvement in global fit?

4.8 Model 4

Model Respecification

Let's add the path from Intrinsic Motivation to Ego-Social Goals, which matches the hypothesis that Intrinsic Motivation indirectly affects Active Cognitive Engagement through both Task Mastery and Ego-Social Goals:

mod4 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc

TaskMast ~ a1*IntMot + a3*SciAtt
EgoSoc ~ a2*SciAtt + a4*IntMot


TaskMast ~~ EgoSoc

SciAtt ~~ IntMot

# indirect and total effects between IntMot > ActiveEn
im.tm.ind := a1*b1
im.es.ind := a4*b2
im.total.ind := (a1*b1) + (a4*b2)

# indirect and total effects between SciAtt > ActiveEn
sa.es.ind := a2*b2
sa.tm.ind := a3*b1
sa.total.ind := (a2*b2) + (a3*b1)
'

We can estimate the model and save it in fit4:

fit4 <- sem(model = mod4, data = meece)

Model Evaluation

We will use the same sources of information to evaluate the fit of this modified model.

Global Fit

summary(fit4, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 23 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13

  Number of observations                           256

Model Test User Model:
                                                      
  Test statistic                                 5.506
  Degrees of freedom                                 2
  P-value (Chi-square)                           0.064

Model Test Baseline Model:

  Test statistic                               398.422
  Degrees of freedom                                10
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.991
  Tucker-Lewis Index (TLI)                       0.955

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -802.698
  Loglikelihood unrestricted model (H1)       -799.945
                                                      
  Akaike (AIC)                                1631.396
  Bayesian (BIC)                              1677.483
  Sample-size adjusted Bayesian (SABIC)       1636.270

Root Mean Square Error of Approximation:

  RMSEA                                          0.083
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.169
  P-value H_0: RMSEA <= 0.050                    0.186
  P-value H_0: RMSEA >= 0.080                    0.612

Standardized Root Mean Square Residual:

  SRMR                                           0.021

What does the Chi-square test tell us about the exact fit of this model?

Local Fit

residuals(fit4, type = "cor.bollen")
$type
[1] "cor.bollen"

$cov
         ActvEn TskMst EgoSoc IntMot SciAtt
ActiveEn  0.000                            
TaskMast  0.000  0.000                     
EgoSoc    0.000  0.000  0.000              
IntMot    0.079  0.000  0.000  0.000       
SciAtt   -0.002  0.000  0.000  0.000  0.000

What patterns can you see in the covariance residuals? What path could we add to the model?

4.9 Model Comparison of Model 3 and Model 4

comp34 <- compareFit(fit3, fit4)
summary(comp34)
################### Nested Model Comparison #########################

Chi-Squared Difference Test

     Df    AIC    BIC   Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)  
fit4  2 1631.4 1677.5  5.5063                                        
fit3  3 1635.1 1677.6 11.1941     5.6878 0.13532       1    0.01708 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

####################### Model Fit Indices ###########################
       chisq df pvalue rmsea   cfi   tli  srmr       aic       bic
fit4  5.506†  2   .064 .083† .991† .955† .021† 1631.396† 1677.483†
fit3 11.194   3   .011 .103  .979  .930  .037  1635.084  1677.626 

################## Differences in Fit Indices #######################
            df rmsea    cfi    tli  srmr   aic   bic
fit3 - fit4  1 0.021 -0.012 -0.025 0.017 3.688 0.143

Does the modified model result in an improvement in global fit?

4.10 Parameter Estimate Interpretation

Typically, you would look at the estimates of newly added paths at each step of the model-building process to ensure that the estimates align with theoretical expectations. Here, we skipped that, but we will take a quick look now. Please refer back to the Indirect Effects R Lab to read more about interpreting indirect effects. Now that the model evaluation process is complete, we first re-estimate the final model with bootstrapping:

fit4 <- sem(model = mod4, data = meece,
            se = "bootstrap", bootstrap = 1000,
            iseed = 8789)

First, we can look at the (unstandardized) parameter estimates:

parameterEstimates(fit4, boot.ci.type = "bca.simple", 
                   ci = TRUE, se = TRUE, 
                   zstat = FALSE, pvalue = FALSE,
                   output = "text")

Regressions:
                   Estimate  Std.Err ci.lower ci.upper
  ActiveEn ~                                          
    TaskMast  (b1)    0.499    0.035    0.430    0.572
    EgoSoc    (b2)    0.056    0.022    0.012    0.099
  TaskMast ~                                          
    IntMot    (a1)    0.248    0.064    0.128    0.372
    SciAtt    (a3)    0.255    0.051    0.153    0.358
  EgoSoc ~                                            
    SciAtt    (a2)   -0.050    0.086   -0.214    0.126
    IntMot    (a4)   -0.246    0.104   -0.454   -0.036

Covariances:
                   Estimate  Std.Err ci.lower ci.upper
 .TaskMast ~~                                         
   .EgoSoc            0.083    0.017    0.049    0.120
  IntMot ~~                                           
    SciAtt            0.184    0.024    0.138    0.238

Variances:
                   Estimate  Std.Err ci.lower ci.upper
   .ActiveEn          0.060    0.005    0.052    0.073
   .TaskMast          0.164    0.014    0.137    0.192
   .EgoSoc            0.537    0.046    0.458    0.641
    IntMot            0.290    0.024    0.249    0.343
    SciAtt            0.371    0.032    0.312    0.441

Defined Parameters:
                   Estimate  Std.Err ci.lower ci.upper
    im.tm.ind         0.124    0.035    0.058    0.195
    im.es.ind        -0.123    0.052   -0.229   -0.021
    im.total.ind      0.001    0.068   -0.134    0.133
    sa.es.ind        -0.003    0.005   -0.016    0.006
    sa.tm.ind         0.127    0.025    0.078    0.178
    sa.total.ind      0.124    0.027    0.071    0.178

Even though our model fits the data well, we can see that the path from Science Attitudes to Ego-Social Goals (which was part of our theoretical model at the start of all this) is not significant. We might want to explore whether model trimming (i.e., removing this path) affects the overall model fit; a sketch of that comparison follows below. However, since this path was so central to our theory, we would ideally want to collect another sample and test whether the trimmed model can be replicated.
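
As an illustration of what that trimming step could look like (mod5 and fit5 are hypothetical names, not part of the lab code), we could drop the a2 path and compare the trimmed model to Model 4 with the Chi-square difference test:

# Hypothetical trimmed model: the a2 path (EgoSoc ~ SciAtt) is removed
mod5 <- '
ActiveEn ~ b1*TaskMast + b2*EgoSoc

TaskMast ~ a1*IntMot + a3*SciAtt
EgoSoc ~ a4*IntMot

TaskMast ~~ EgoSoc

SciAtt ~~ IntMot
'

# Estimate without bootstrapping and test whether the trimmed
# (more constrained) model fits worse than Model 4
fit5 <- sem(model = mod5, data = meece)
summary(compareFit(fit5, fit4))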

Another piece of output that we can examine is the set of R-squared estimates for the endogenous variables in the model. These represent the proportion of variance in the endogenous variables that can be explained by the predictors in the model. Remember, a well-fitting model can have endogenous variables with low R-squareds. This is because the model-implied variances include both the explained (R-squared) and unexplained/error (1 - R-squared) variance. Thus, a model can fit well as long as it closely predicts the total variance in the endogenous variables, even if a large part of that variance is unexplained/error.

We can use lavInspect() to get the R-squared estimates:

lavInspect(fit4, what = "rsquare")
ActiveEn TaskMast   EgoSoc 
   0.504    0.284    0.041 

For which endogenous variable does the model result in the largest unexplained/error variance?
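
Though not part of this lab's steps, you can also request standardized parameter estimates, for example with lavaan's standardizedSolution() function:

# Standardized estimates (std.all) for all parameters
standardizedSolution(fit4)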

4.11 Comparing Groups

Next, we will look at a special use of model evaluation and comparison: multiple group path models. We will learn how to compare the same model across different groups. For that purpose, let's pretend that the meece study was repeated for a sample of high school students (the original sample consists of middle school students). These data are saved in meece2.csv.

You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/meece2.csv. Make sure to save it in the folder you are using for this class.

We will add a sample variable to each data frame and then combine the two data frames:

meece2 <- import("data/meece2.csv")

# Create group variable for each dataset
meece$sample <- 1
meece2$sample <- 2

# Combine two groups into one dataframe
meece_all <- rbind(meece, meece2)

For this example, we will estimate a simplified version of the model we've been working on so far, by only including Intrinsic Motivation as an exogenous variable. Compared to the single-group model, we need to make some adjustments for the multiple-group scenario, specifically in how we label parameters.

We initially start with a model in which all paths are estimated separately for each group. Now, each path needs a label for each group, and we use c() to list these labels for each path. For example, for the path from Task Mastery to Active Engagement, we use c(b1.g1, b1.g2). We also need to define the indirect (and, if present, total) effects for each group separately.

mod_g1 <- '
ActiveEn ~ c(b1.g1, b1.g2)*TaskMast + c(b2.g1, b2.g2)*EgoSoc

TaskMast ~ c(a1.g1, a1.g2)*IntMot
EgoSoc ~ c(a2.g1, a2.g2)*IntMot

TaskMast ~~ EgoSoc

# Group = 1
# indirect and total effects between IntMot > ActiveEn
im.tm.ind.g1 := a1.g1*b1.g1
im.es.ind.g1 := a2.g1*b2.g1
im.total.ind.g1 := (a1.g1*b1.g1) + (a2.g1*b2.g1)

# Group = 2
# indirect and total effects between IntMot > ActiveEn
im.tm.ind.g2 := a1.g2*b1.g2
im.es.ind.g2 := a2.g2*b2.g2
im.total.ind.g2 := (a1.g2*b1.g2) + (a2.g2*b2.g2)
'

When we estimate this model, we need to tell lavaan which variable contains the group assignment for each participant by adding the group argument; here, group = "sample". In your data, your group variable might have a different name. Again, we're going to skip bootstrapping because I am focused on demonstrating how you can compare estimates across groups using the Chi-square difference test:

fit_g1 <- sem(model = mod_g1, data = meece_all, group = "sample")

If you use the summary() function, you will get parameter estimates per group, printed out sequentially. For complex models, this can result in a very long output object:

summary(fit_g1, fit.measures = T)
lavaan 0.6-19 ended normally after 42 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        22

  Number of observations per group:                   
    1                                              256
    2                                              256

Model Test User Model:
                                                      
  Test statistic                                15.190
  Degrees of freedom                                 2
  P-value (Chi-square)                           0.001
  Test statistic for each group:
    1                                            4.571
    2                                           10.620

Model Test Baseline Model:

  Test statistic                               493.713
  Degrees of freedom                                12
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.973
  Tucker-Lewis Index (TLI)                       0.836

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -851.940
  Loglikelihood unrestricted model (H1)       -844.345
                                                      
  Akaike (AIC)                                1747.881
  Bayesian (BIC)                              1841.124
  Sample-size adjusted Bayesian (SABIC)       1771.292

Root Mean Square Error of Approximation:

  RMSEA                                          0.161
  90 Percent confidence interval - lower         0.092
  90 Percent confidence interval - upper         0.240
  P-value H_0: RMSEA <= 0.050                    0.006
  P-value H_0: RMSEA >= 0.080                    0.971

Standardized Root Mean Square Residual:

  SRMR                                           0.029

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured


Group 1 [1]:

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  ActiveEn ~                                          
    TaskMst (b1.1)    0.499    0.032   15.420    0.000
    EgoSoc  (b2.1)    0.056    0.021    2.728    0.006
  TaskMast ~                                          
    IntMot  (a1.1)    0.409    0.049    8.289    0.000
  EgoSoc ~                                            
    IntMot  (a2.1)   -0.278    0.085   -3.266    0.001

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .TaskMast ~~                                         
   .EgoSoc            0.080    0.020    3.956    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ActiveEn          0.523    0.111    4.735    0.000
   .TaskMast          2.035    0.143   14.221    0.000
   .EgoSoc            3.272    0.247   13.262    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ActiveEn          0.060    0.005   11.314    0.000
   .TaskMast          0.181    0.016   11.314    0.000
   .EgoSoc            0.538    0.048   11.314    0.000


Group 2 [2]:

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  ActiveEn ~                                          
    TaskMst (b1.2)    0.269    0.031    8.632    0.000
    EgoSoc  (b2.2)    0.195    0.020    9.621    0.000
  TaskMast ~                                          
    IntMot  (a1.2)    0.373    0.053    7.095    0.000
  EgoSoc ~                                            
    IntMot  (a2.2)   -0.268    0.087   -3.090    0.002

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .TaskMast ~~                                         
   .EgoSoc            0.071    0.021    3.450    0.001

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ActiveEn          0.538    0.107    5.043    0.000
   .TaskMast          2.137    0.151   14.132    0.000
   .EgoSoc            3.201    0.250   12.823    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ActiveEn          0.057    0.005   11.314    0.000
   .TaskMast          0.196    0.017   11.314    0.000
   .EgoSoc            0.533    0.047   11.314    0.000

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    im.tm.ind.g1      0.204    0.028    7.301    0.000
    im.es.ind.g1     -0.016    0.007   -2.094    0.036
    im.total.nd.g1    0.188    0.030    6.219    0.000
    im.tm.ind.g2      0.100    0.018    5.481    0.000
    im.es.ind.g2     -0.052    0.018   -2.942    0.003
    im.total.nd.g2    0.048    0.028    1.728    0.084

You can also use the group_by_groups() function from the semhelpinghands package to get a quick overview of just the parameter estimates to see if there are any that appear particularly different:

group_by_groups(fit_g1)
        lhs op      rhs  est_1  est_2
1  ActiveEn  ~   EgoSoc  0.056  0.195
2  ActiveEn  ~ TaskMast  0.499  0.269
3    EgoSoc  ~   IntMot -0.278 -0.268
4  TaskMast  ~   IntMot  0.409  0.373
5  ActiveEn ~~ ActiveEn  0.060  0.057
6    EgoSoc ~~   EgoSoc  0.538  0.533
7    IntMot ~~   IntMot  0.290  0.276
8  TaskMast ~~   EgoSoc  0.080  0.071
9  TaskMast ~~ TaskMast  0.181  0.196
10 ActiveEn ~1           0.523  0.538
11   EgoSoc ~1           3.272  3.201
12   IntMot ~1           2.850  2.828
13 TaskMast ~1           2.035  2.137

Are there any estimates that look like they might differ across groups?

Note that multiple group analyses include the variable means/intercepts as well, which are denoted with, e.g., ActiveEn ~1. This means we can also test whether the groups have equivalent average levels of the variables (where for endogenous variables, this is actually the intercept, or the expected value when all predictors equal 0); a sketch of how this could be done follows below.
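
To sketch how that could look (mod_int and fit_int are hypothetical names, not part of the lab models), intercepts can be labeled just like paths, so giving the Active Engagement intercept the same label in both groups constrains it to be equal:

# Hypothetical example: equal ActiveEn intercepts across groups
mod_int <- '
ActiveEn ~ c(b1.g1, b1.g2)*TaskMast + c(b2.g1, b2.g2)*EgoSoc
ActiveEn ~ c(i1, i1)*1

TaskMast ~ c(a1.g1, a1.g2)*IntMot
EgoSoc ~ c(a2.g1, a2.g2)*IntMot

TaskMast ~~ EgoSoc
'

fit_int <- sem(model = mod_int, data = meece_all, group = "sample")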

Specifying a More Restrictive Model

To restrict certain parameters to be equivalent across groups, we need to assign the same label to the parameter in both groups. For example, the label of the path from Task Mastery to Active Engagement is now c(b1, b1). We might do this if we want to test the Null hypothesis of no group difference in the strength of an association between two variables. In this case, we might want to see if our new sample replicates the findings from the original sample.

mod_g2 <- '
ActiveEn ~ c(b1, b1)*TaskMast + c(b2, b2)*EgoSoc

TaskMast ~ c(a1, a1)*IntMot
EgoSoc ~ c(a2, a2)*IntMot


TaskMast ~~ EgoSoc

# Group = 1 and 2 together, because there are no separate effects
# indirect and total effects between IntMot > ActiveEn
im.tm.ind.g1 := a1*b1
im.es.ind.g1 := a2*b2
im.total.ind.g1 := (a1*b1) + (a2*b2)
'

When estimating this model, you still need to include group = "sample":

fit_g2 <- sem(model = mod_g2, data = meece_all, group = "sample")

When we look at the group_by_groups output, all the regression estimates should be equivalent across groups. They are estimated as though they are one parameter (instead of two, one per group). You can see the difference reflected in the degrees of freedom, which are much higher for this (more constrained) model:

group_by_groups(fit_g2)
        lhs op      rhs  est_1  est_2
1  ActiveEn  ~   EgoSoc  0.128  0.128
2  ActiveEn  ~ TaskMast  0.379  0.379
3    EgoSoc  ~   IntMot -0.274 -0.274
4  TaskMast  ~   IntMot  0.392  0.392
5  ActiveEn ~~ ActiveEn  0.066  0.062
6    EgoSoc ~~   EgoSoc  0.538  0.533
7    IntMot ~~   IntMot  0.290  0.276
8  TaskMast ~~   EgoSoc  0.080  0.071
9  TaskMast ~~ TaskMast  0.181  0.196
10 ActiveEn ~1           0.730  0.351
11   EgoSoc ~1           3.259  3.216
12   IntMot ~1           2.850  2.828
13 TaskMast ~1           2.082  2.083

Are there parameters that are still estimated separately for each group?
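
Before moving on, note that lavaan also offers a shortcut for this kind of constrained model: the group.equal argument constrains whole classes of parameters across groups at once. A minimal sketch (mod_plain and fit_g2_alt are hypothetical names; the defined indirect effects are omitted here):

# Hypothetical shortcut: with an unlabeled model, group.equal =
# "regressions" constrains all regression paths across groups
mod_plain <- '
ActiveEn ~ TaskMast + EgoSoc
TaskMast ~ IntMot
EgoSoc ~ IntMot

TaskMast ~~ EgoSoc
'

fit_g2_alt <- sem(model = mod_plain, data = meece_all,
                  group = "sample", group.equal = "regressions")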

Model Comparison Using the Chi-square Difference Test

We can use the Chi-square difference test to compare the two multiple group models. In this case, a significant Chi-square difference test would indicate that the restrictive model (where all regression paths are specified to be equivalent across groups) fits the data worse than the free model, implying that there are likely some differences between the groups in terms of the associations between variables.

comp_g12 <- compareFit(fit_g1, fit_g2)
summary(comp_g12)
################### Nested Model Comparison #########################

Chi-Squared Difference Test

       Df    AIC    BIC  Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
fit_g1  2 1747.9 1841.1 15.190                                          
fit_g2  6 1782.0 1858.3 57.309     42.118 0.19294       4  1.577e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

####################### Model Fit Indices ###########################
         chisq df pvalue rmsea   cfi   tli  srmr       aic       bic
fit_g1 15.190†  2   .001 .161† .973† .836† .029† 1747.881† 1841.124†
fit_g2 57.309   6   .000 .183  .893  .787  .071  1781.999  1858.289 

################## Differences in Fit Indices #######################
                df rmsea    cfi    tli  srmr    aic    bic
fit_g2 - fit_g1  4 0.022 -0.079 -0.049 0.042 34.118 17.165

What can we conclude based on the chi-square difference test?

Looking at Local Fit per Group

We can use information about the local fit for each group to identify variable pairs for which the model is fitting differently across groups. Here, we will only look at correlation residuals (in the interest of space):

residuals(fit_g2, type = "cor")
$`1`
$`1`$type
[1] "cor.bollen"

$`1`$cov
         ActvEn TskMst EgoSoc IntMot
ActiveEn  0.000                     
TaskMast  0.123  0.000              
EgoSoc   -0.149 -0.006  0.000       
IntMot    0.187  0.015 -0.003  0.000

$`1`$mean
ActiveEn TaskMast   EgoSoc   IntMot 
       0        0        0        0 


$`2`
$`2`$type
[1] "cor.bollen"

$`2`$cov
         ActvEn TskMst EgoSoc IntMot
ActiveEn  0.000                     
TaskMast -0.125  0.000              
EgoSoc    0.157  0.007  0.000       
IntMot    0.034 -0.017  0.004  0.000

$`2`$mean
ActiveEn TaskMast   EgoSoc   IntMot 
       0        0        0        0 

Do you see any inconsistencies in the pattern across groups?

4.12 Summary

In this R lab, you were introduced to the steps involved in model evaluation and comparison, including using model comparison methods to compare multiple groups. Below, you’ll find two Bonus sections that demonstrate how to test whether specific direct or indirect effects are equivalent across groups and how to request and interpret modification indices. In the next R Lab, you will learn all about latent variables and confirmatory factor analysis, which will also require you to apply your knowledge of model evaluation and comparison.

4.13 Bonus 1: Testing the equivalence of specific direct and indirect effects

Sometimes, you may want to test the equivalence of specific paths, instead of groups of parameters as we did above. To do so, we can use the Wald test. For this comparison, we start with the unconstrained model (saved in fit_g1), in which all direct effects are estimated separately for each group. With the code below, we test if the direct effect from Task Mastery > Active Engagement is equivalent across the two age groups:

lavTestWald(fit_g1, constraints = "b1.g1 == b1.g2")
$stat
[1] 26.17973

$df
[1] 1

$p.value
[1] 3.110712e-07

$se
[1] "standard"

The Wald test evaluates the Null hypothesis assigned to the constraints argument in the function. Here, it tests the Null hypothesis that the two direct effects are equivalent across groups. Thus, a significant result (p < .05) indicates that we should reject the Null and conclude that there is evidence that the two direct effects are different.

What would happen if we used the second model, in which all paths are constrained across groups, as input for the Wald test?

Next, we will use the Wald test to examine whether an indirect effect is equivalent across groups. For this comparison, we again start with the unconstrained model (saved in fit_g1), in which all direct effects are estimated separately for each group. With the code below, we test if the indirect effect from Intrinsic Motivation > Task Mastery > Active Engagement is equivalent across the two age groups:

lavTestWald(fit_g1, constraints = "im.tm.ind.g1 == im.tm.ind.g2")
$stat
[1] 9.62194

$df
[1] 1

$p.value
[1] 0.001922665

$se
[1] "standard"

The Wald test evaluates the Null hypothesis assigned to the constraints argument in the function. Here, it tests the Null hypothesis that the two indirect effects are equivalent. Thus, a significant result (p < .05) indicates that we should reject the Null and conclude that there is evidence that the two indirect effects are different.

Note

Note that failing to reject the Null hypothesis of the Wald test only tells us that there is no evidence that the indirect effect differs across the two groups. It is still possible that there are differences in each of the (unconstrained) direct paths that make up the indirect effect. As a simple math example, the direct effect paths could be 4 and 5 for group 1 (indirect effect is 4 x 5 = 20), but 2 and 10 for group 2 (indirect effect is 2 x 10 = 20). Thus, it may also be important to test the equivalence of the direct effects, which can even be done jointly, as sketched below.
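
A minimal sketch of such a joint test (the constraints argument of lavTestWald() accepts multiple constraints in one string):

# Jointly test whether both direct paths in the
# IntMot > TaskMast > ActiveEn chain are equivalent across groups
lavTestWald(fit_g1, constraints = "a1.g1 == a1.g2
                                   b1.g1 == b1.g2")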

4.14 Bonus 2: Modification Indices

Although I do not recommend relying purely on modification indices to guide your model building process, I do want you to know how to examine them. For this exercise, we will use our original model specification to see what kind of paths the modification indices would have suggested. The modificationIndices() call below requests modification indices that are at least 10 (indicating that the Chi-square is expected to be lowered by at least 10 if the suggested parameter is added) by adding minimum.value = 10 (this cutoff is arbitrary, but can help you filter out modifications that only improve the Chi-square by a smidge). By including sort. = TRUE, the results will be sorted from largest modification index (biggest impact on model fit) to smallest.

modificationIndices(fit1, sort. = TRUE, minimum.value = 10)
        lhs op      rhs     mi    epc sepc.lv sepc.all sepc.nox
35   SciAtt  ~ TaskMast 23.398  0.358   0.358    0.282    0.282
19 TaskMast ~~   SciAtt 23.398  0.065   0.065    0.250    0.250
26 TaskMast  ~   SciAtt 23.398  0.255   0.255    0.324    0.324
18 TaskMast ~~   IntMot 23.398 -0.103  -0.103   -0.447   -0.447
31   IntMot  ~ TaskMast 23.398 -0.567  -0.567   -0.504   -0.504
17 TaskMast ~~   EgoSoc 21.223  0.091   0.091    0.288    0.288
25 TaskMast  ~   EgoSoc 16.102  0.143   0.143    0.223    0.223
28   EgoSoc  ~ TaskMast 10.876  0.330   0.330    0.211    0.211
27   EgoSoc  ~ ActiveEn 10.622  0.647   0.647    0.298    0.298

Based on our restrictions, the algorithm identified nine modifications. The first three columns of output show you what modification is suggested, the fourth column (mi) gives you the modification index, and the final columns give you an idea of the expected parameter change if that parameter were included in the model (which can be useful if you want to know whether it would result in a large effect size). Let's look at them.

  • The largest modification index suggests a path from Task Mastery Goals to Science Attitudes. This path is not in line with the theory that suggests that attitudes develop before goal orientations emerge. In addition, adding this path would turn Science Attitudes from exogenous to endogenous. In this new role, the covariance between Intrinsic Motivation and Science Attitudes would turn into a residual covariance, which violates one of the assumptions of path models.
  • The second to fifth modification indices are of the same magnitude as the first, but suggest very different associations. The second and third suggest that Task Mastery and Science Attitudes should be allowed to covary, or that Science Attitudes should predict Task Mastery (this is a path we do end up including in a later modification). The fourth and fifth suggest that Task Mastery and Intrinsic Motivation should be associated through either a (residual) covariance or a directed path from Task Mastery to Intrinsic Motivation. Each of these parameters would result in a non-recursive model with feedback loops.
  • Finally, the sixth modification index suggests the modification we made (based on theory): Add the covariance between Task Mastery and Ego-Social Goals. Although this modification does not result in the largest expected decline in the Chi-square, it is the first one that makes theoretical sense.

I hope that this example helps illustrate how modification indices are entirely data driven and that many of the modifications will make no sense from a theoretical or statistical point of view.