7  Measurement Invariance Testing with Continuous Indicators

Note

You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: measurementinvariance_cont.R

7.1 Loading R packages

library(rio)
library(lavaan)
library(semTools)
library(semhelpinghands)
library(tidyverse)

7.2 Loading Data

Load the data into your environment. For this lab we will use a dataset of N = 1000 individuals who completed the Depression-Anxiety-Stress Scales (DASS-21). For this lab, we will focus on the Anxiety and Stress subscales.

You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/DASS21.csv. Make sure to save it in the folder you are using for this class.

DASS21 <- import(file = "data/DASS21.csv")

DASS21$engnat <- factor(DASS21$engnat, 
                        levels = c("EnglishNative", "ELL"))

We will see if there is measurement invariance across participants whose native language is English (1) versus participants who have a different native language/are ELL (2). The table below shows the sample size in each group:

table(DASS21$engnat)

EnglishNative           ELL 
          500           500 

Here is an overview of the questions on the Anxiety subscale:

  • q2: I was aware of dryness in my mouth
  • q4: I experienced breathing difficulty (eg, excessively rapid breathing, breathlessness in the absence of physical exertion)
  • q7: I experienced trembling (e.g., in the hands)
  • q9: I was worried about situations in which I might panic and make a fool of myself.
  • q15: I felt I was close to panic.
  • q19: I was aware of the action of my heart in the absence of physical exertion (eg, sense of heart rate increase, heart missing a beat).
  • q20: I felt scared without any good reason.

And the questions on the Stress subscale:

  • q1: I found it hard to wind down.
  • q6: I tended to over-react to situations.
  • q8: I felt that I was using a lot of nervous energy.
  • q11: I found myself getting agitated.
  • q12: I found it difficult to relax.
  • q14: I was intolerant of anything that kept me from getting on with what I was doing.
  • q18: I felt that I was rather touchy.

7.3 Statistical Decision Criteria

For this lab, I will use the following statistical decision criteria for the model comparison tests between levels of invariance. These are based on the knowledge that our samples are relatively large (N = 500 per group), and are informed by the textbook and Chen (2007):

  • Look at Chi-square Difference test.
    • If not significant: retain next level of invariance
    • If significant, look at other fit information
  • Are there any patterns of problematic correlation residuals > |.10|? Reject next level of invariance, test partial invariance
  • Does CFI decrease ≤ 0.010 or does RMSEA increase by ≥ 0.015? Reject next level of invariance, test partial invariance

7.4 Configural Invariance

We will specify a two-factor CFA using lavaan syntax. Based on previous research, this two-factor model already includes residual covariances between two pairs of items. What these items have in common is that they focus on physical symptoms of anxiety, which causes them to share more commmon variance with each other (especially with q4) than with the other anxiety items.

cfa_config <- '
anxiety =~ q2 + q4 + q7 + q9 + q15 + q19 + q20
stress =~ q1 + q6 + q8 + q11 + q12 + q14 + q18

q4 ~~ q19
q4 ~~ q7
'

We are going to use a function from the semTools package that will help us set the scale for our latent factors and apply the equality constraint labels in later steps of the measurement invariance testing procedure. This saves us A LOT of typing, especially when moving towards partial invariance. As all indicators are responded to on the same response scale, will use the effect-coding method for scaling the latent factors. This will result in latent factors that have the same mean and variance as the average indicator.

fit.config <- measEq.syntax(configural.model = cfa_config, 
                            data = DASS21, 
                            group = "engnat", 
                            ID.fac = "effects",
                            meanstructure = TRUE, 
                            return.fit = TRUE,
                            estimator = "mlr")

# Print out the model syntax, so you can
# see what semTools is helping us do:
cat(as.character(fit.config@call$model))
## LOADINGS:

anxiety =~ c(NA, NA)*q2 + c(lambda.1_1.g1, lambda.1_1.g2)*q2
anxiety =~ c(NA, NA)*q4 + c(lambda.2_1.g1, lambda.2_1.g2)*q4
anxiety =~ c(NA, NA)*q7 + c(lambda.3_1.g1, lambda.3_1.g2)*q7
anxiety =~ c(NA, NA)*q9 + c(lambda.4_1.g1, lambda.4_1.g2)*q9
anxiety =~ c(NA, NA)*q15 + c(lambda.5_1.g1, lambda.5_1.g2)*q15
anxiety =~ c(NA, NA)*q19 + c(lambda.6_1.g1, lambda.6_1.g2)*q19
anxiety =~ c(NA, NA)*q20 + c(lambda.7_1.g1, lambda.7_1.g2)*q20
stress =~ c(NA, NA)*q1 + c(lambda.8_2.g1, lambda.8_2.g2)*q1
stress =~ c(NA, NA)*q6 + c(lambda.9_2.g1, lambda.9_2.g2)*q6
stress =~ c(NA, NA)*q8 + c(lambda.10_2.g1, lambda.10_2.g2)*q8
stress =~ c(NA, NA)*q11 + c(lambda.11_2.g1, lambda.11_2.g2)*q11
stress =~ c(NA, NA)*q12 + c(lambda.12_2.g1, lambda.12_2.g2)*q12
stress =~ c(NA, NA)*q14 + c(lambda.13_2.g1, lambda.13_2.g2)*q14
stress =~ c(NA, NA)*q18 + c(lambda.14_2.g1, lambda.14_2.g2)*q18

## INTERCEPTS:

q2 ~ c(NA, NA)*1 + c(nu.1.g1, nu.1.g2)*1
q4 ~ c(NA, NA)*1 + c(nu.2.g1, nu.2.g2)*1
q7 ~ c(NA, NA)*1 + c(nu.3.g1, nu.3.g2)*1
q9 ~ c(NA, NA)*1 + c(nu.4.g1, nu.4.g2)*1
q15 ~ c(NA, NA)*1 + c(nu.5.g1, nu.5.g2)*1
q19 ~ c(NA, NA)*1 + c(nu.6.g1, nu.6.g2)*1
q20 ~ c(NA, NA)*1 + c(nu.7.g1, nu.7.g2)*1
q1 ~ c(NA, NA)*1 + c(nu.8.g1, nu.8.g2)*1
q6 ~ c(NA, NA)*1 + c(nu.9.g1, nu.9.g2)*1
q8 ~ c(NA, NA)*1 + c(nu.10.g1, nu.10.g2)*1
q11 ~ c(NA, NA)*1 + c(nu.11.g1, nu.11.g2)*1
q12 ~ c(NA, NA)*1 + c(nu.12.g1, nu.12.g2)*1
q14 ~ c(NA, NA)*1 + c(nu.13.g1, nu.13.g2)*1
q18 ~ c(NA, NA)*1 + c(nu.14.g1, nu.14.g2)*1

## UNIQUE-FACTOR VARIANCES:

q2 ~~ c(NA, NA)*q2 + c(theta.1_1.g1, theta.1_1.g2)*q2
q4 ~~ c(NA, NA)*q4 + c(theta.2_2.g1, theta.2_2.g2)*q4
q7 ~~ c(NA, NA)*q7 + c(theta.3_3.g1, theta.3_3.g2)*q7
q9 ~~ c(NA, NA)*q9 + c(theta.4_4.g1, theta.4_4.g2)*q9
q15 ~~ c(NA, NA)*q15 + c(theta.5_5.g1, theta.5_5.g2)*q15
q19 ~~ c(NA, NA)*q19 + c(theta.6_6.g1, theta.6_6.g2)*q19
q20 ~~ c(NA, NA)*q20 + c(theta.7_7.g1, theta.7_7.g2)*q20
q1 ~~ c(NA, NA)*q1 + c(theta.8_8.g1, theta.8_8.g2)*q1
q6 ~~ c(NA, NA)*q6 + c(theta.9_9.g1, theta.9_9.g2)*q6
q8 ~~ c(NA, NA)*q8 + c(theta.10_10.g1, theta.10_10.g2)*q8
q11 ~~ c(NA, NA)*q11 + c(theta.11_11.g1, theta.11_11.g2)*q11
q12 ~~ c(NA, NA)*q12 + c(theta.12_12.g1, theta.12_12.g2)*q12
q14 ~~ c(NA, NA)*q14 + c(theta.13_13.g1, theta.13_13.g2)*q14
q18 ~~ c(NA, NA)*q18 + c(theta.14_14.g1, theta.14_14.g2)*q18

## UNIQUE-FACTOR COVARIANCES:

q4 ~~ c(NA, NA)*q7 + c(theta.3_2.g1, theta.3_2.g2)*q7
q4 ~~ c(NA, NA)*q19 + c(theta.6_2.g1, theta.6_2.g2)*q19

## LATENT MEANS/INTERCEPTS:

anxiety ~ c(NA, NA)*1 + c(alpha.1.g1, alpha.1.g2)*1
stress ~ c(NA, NA)*1 + c(alpha.2.g1, alpha.2.g2)*1

## COMMON-FACTOR VARIANCES:

anxiety ~~ c(NA, NA)*anxiety + c(psi.1_1.g1, psi.1_1.g2)*anxiety
stress ~~ c(NA, NA)*stress + c(psi.2_2.g1, psi.2_2.g2)*stress

## COMMON-FACTOR COVARIANCES:

anxiety ~~ c(NA, NA)*stress + c(psi.2_1.g1, psi.2_1.g2)*stress

## MODEL CONSTRAINTS:

lambda.1_1.g1 == 7 - lambda.2_1.g1 - lambda.3_1.g1 - lambda.4_1.g1 - lambda.5_1.g1 - lambda.6_1.g1 - lambda.7_1.g1
lambda.1_1.g2 == 7 - lambda.2_1.g2 - lambda.3_1.g2 - lambda.4_1.g2 - lambda.5_1.g2 - lambda.6_1.g2 - lambda.7_1.g2
nu.1.g1 == 7 - nu.2.g1 - nu.3.g1 - nu.4.g1 - nu.5.g1 - nu.6.g1 - nu.7.g1
nu.1.g2 == 7 - nu.2.g2 - nu.3.g2 - nu.4.g2 - nu.5.g2 - nu.6.g2 - nu.7.g2
lambda.8_2.g1 == 7 - lambda.9_2.g1 - lambda.10_2.g1 - lambda.11_2.g1 - lambda.12_2.g1 - lambda.13_2.g1 - lambda.14_2.g1
lambda.8_2.g2 == 7 - lambda.9_2.g2 - lambda.10_2.g2 - lambda.11_2.g2 - lambda.12_2.g2 - lambda.13_2.g2 - lambda.14_2.g2
nu.8.g1 == 7 - nu.9.g1 - nu.10.g1 - nu.11.g1 - nu.12.g1 - nu.13.g1 - nu.14.g1
nu.8.g2 == 7 - nu.9.g2 - nu.10.g2 - nu.11.g2 - nu.12.g2 - nu.13.g2 - nu.14.g2

To see if the configural model is tenable, we will first inspect indices of global (exact and approximate) fit:

summary(fit.config, fit.measures = T, estimates = F)
lavaan 0.6-19 ended normally after 55 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        98
  Number of equality constraints                     8

  Number of observations per group:                   
    EnglishNative                                  500
    ELL                                            500

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                               373.938     315.353
  Degrees of freedom                               148         148
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.186
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    EnglishNative                              189.027     189.027
    ELL                                        126.326     126.326

Model Test Baseline Model:

  Test statistic                              6539.400    5341.435
  Degrees of freedom                               182         182
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.224

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.964       0.968
  Tucker-Lewis Index (TLI)                       0.956       0.960
                                                                  
  Robust Comparative Fit Index (CFI)                         0.969
  Robust Tucker-Lewis Index (TLI)                            0.961

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -17763.167  -17763.167
  Scaling correction factor                                  0.893
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)     -17576.198  -17576.198
  Scaling correction factor                                  1.105
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                               35706.333   35706.333
  Bayesian (BIC)                             36148.031   36148.031
  Sample-size adjusted Bayesian (SABIC)      35862.186   35862.186

Root Mean Square Error of Approximation:

  RMSEA                                          0.055       0.048
  90 Percent confidence interval - lower         0.048       0.041
  90 Percent confidence interval - upper         0.062       0.054
  P-value H_0: RMSEA <= 0.050                    0.104       0.719
  P-value H_0: RMSEA >= 0.080                    0.000       0.000
                                                                  
  Robust RMSEA                                               0.052
  90 Percent confidence interval - lower                     0.044
  90 Percent confidence interval - upper                     0.060
  P-value H_0: Robust RMSEA <= 0.050                         0.345
  P-value H_0: Robust RMSEA >= 0.080                         0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.032       0.032

The Chi-square test is significant, but the approximate fit indices look OK. Our sample is relatively large, so we will look at the local fit indices to see if any remaining misfit is trivial, or if there are major issues:

residuals(fit.config, type = "cor.bollen")
$EnglishNative
$EnglishNative$type
[1] "cor.bollen"

$EnglishNative$cov
        q2     q4     q7     q9    q15    q19    q20     q1     q6     q8
q2   0.000                                                               
q4   0.081  0.000                                                        
q7   0.059  0.020  0.000                                                 
q9  -0.023 -0.056  0.015  0.000                                          
q15 -0.053  0.019 -0.011 -0.004  0.000                                   
q19  0.057  0.003  0.076 -0.037 -0.026  0.000                            
q20 -0.051  0.023 -0.031  0.039  0.008 -0.023  0.000                     
q1  -0.010 -0.050 -0.036 -0.018  0.003  0.019 -0.014  0.000              
q6   0.001 -0.044 -0.035  0.040  0.011 -0.008 -0.023 -0.033  0.000       
q8   0.032  0.019  0.103  0.060  0.092  0.045  0.076 -0.028 -0.024  0.000
q11  0.018 -0.051 -0.073  0.009 -0.024 -0.023 -0.005 -0.012  0.056 -0.034
q12 -0.023 -0.012 -0.050 -0.035  0.022 -0.024 -0.043  0.092 -0.013 -0.006
q14  0.065 -0.030 -0.001 -0.013 -0.031  0.033 -0.017  0.031  0.001 -0.089
q18  0.051  0.021  0.002  0.019 -0.048  0.016 -0.003 -0.037  0.054 -0.021
       q11    q12    q14    q18
q2                             
q4                             
q7                             
q9                             
q15                            
q19                            
q20                            
q1                             
q6                             
q8                             
q11  0.000                     
q12 -0.017  0.000              
q14  0.096 -0.027  0.000       
q18  0.006 -0.011  0.036  0.000

$EnglishNative$mean
 q2  q4  q7  q9 q15 q19 q20  q1  q6  q8 q11 q12 q14 q18 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0 


$ELL
$ELL$type
[1] "cor.bollen"

$ELL$cov
        q2     q4     q7     q9    q15    q19    q20     q1     q6     q8
q2   0.000                                                               
q4   0.076  0.000                                                        
q7   0.035  0.009  0.000                                                 
q9  -0.015 -0.032 -0.027  0.000                                          
q15 -0.040  0.017  0.011  0.035  0.000                                   
q19  0.030  0.007  0.041 -0.073  0.008  0.000                            
q20  0.002 -0.048 -0.017  0.017  0.009 -0.040  0.000                     
q1  -0.020 -0.011 -0.029 -0.055 -0.045  0.007 -0.023  0.000              
q6   0.003  0.001 -0.021 -0.005 -0.019 -0.013  0.004  0.038  0.000       
q8   0.037  0.038  0.078  0.077  0.066  0.069  0.037 -0.055 -0.047  0.000
q11  0.030 -0.021 -0.058 -0.008 -0.016  0.018  0.028  0.032 -0.024 -0.014
q12 -0.034 -0.001 -0.028 -0.023 -0.022  0.037  0.001  0.066 -0.012 -0.008
q14 -0.020 -0.002 -0.005  0.002 -0.053 -0.020 -0.009  0.027  0.032 -0.006
q18  0.015  0.006  0.019  0.023 -0.047  0.044  0.020 -0.036  0.086 -0.036
       q11    q12    q14    q18
q2                             
q4                             
q7                             
q9                             
q15                            
q19                            
q20                            
q1                             
q6                             
q8                             
q11  0.000                     
q12 -0.001  0.000              
q14 -0.002 -0.001  0.000       
q18  0.025 -0.034  0.004  0.000

$ELL$mean
 q2  q4  q7  q9 q15 q19 q20  q1  q6  q8 q11 q12 q14 q18 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0 

Overall, the correlation residuals look good. Only the correlation residual of q7 and q8 was > .1 (it was .103) in the English native speaker group. There is no a priori justification why these two items, which load onto different factors, share an omitted cause (that could be accommodated via a residual covariance). In addition, the correlation residual was relatively close to .1, indicating that adding the residual covariance would likely only have a small impact on global model fit. Based on these results, we will retain the configural invariance model, and examine weak (metric) invariance.

Parameter Estimate Interpretation

Using the function below, we can compare the estimates across groups and see that all the measurement-related parameters are still varying across groups (as we have not included any equality constraints yet).

group_by_groups(fit.config)
       lhs op     rhs est_EnglishNative est_ELL
1  anxiety =~     q15             1.183   1.207
2  anxiety =~     q19             0.902   0.964
3  anxiety =~      q2             0.781   0.817
4  anxiety =~     q20             1.102   1.162
5  anxiety =~      q4             1.015   0.904
6  anxiety =~      q7             0.935   0.906
7  anxiety =~      q9             1.082   1.041
8   stress =~      q1             1.063   0.996
9   stress =~     q11             0.997   1.011
10  stress =~     q12             1.097   1.090
11  stress =~     q14             0.834   0.927
12  stress =~     q18             0.893   0.819
13  stress =~      q6             1.037   1.052
14  stress =~      q8             1.081   1.105
15 anxiety ~~ anxiety             0.560   0.494
16 anxiety ~~  stress             0.508   0.454
17      q1 ~~      q1             0.503   0.434
18     q11 ~~     q11             0.541   0.501
19     q12 ~~     q12             0.483   0.499
20     q14 ~~     q14             0.695   0.568
21     q15 ~~     q15             0.400   0.403
22     q18 ~~     q18             0.804   0.749
23     q19 ~~     q19             0.817   0.755
24      q2 ~~      q2             1.036   0.886
25     q20 ~~     q20             0.616   0.486
26      q4 ~~     q19             0.210   0.182
27      q4 ~~      q4             0.636   0.672
28      q4 ~~      q7             0.028   0.136
29      q6 ~~      q6             0.538   0.562
30      q7 ~~      q7             0.649   0.684
31      q8 ~~      q8             0.571   0.476
32      q9 ~~      q9             0.660   0.604
33  stress ~~  stress             0.569   0.505
34 anxiety ~1                     1.308   1.180
35      q1 ~1                     0.880   0.886
36     q11 ~1                     1.135   0.922
37     q12 ~1                     0.937   0.905
38     q14 ~1                     1.043   1.029
39     q15 ~1                     0.879   0.662
40     q18 ~1                     1.094   1.439
41     q19 ~1                     1.117   1.047
42      q2 ~1                     1.121   1.312
43     q20 ~1                     0.925   0.900
44      q4 ~1                     0.764   0.836
45      q6 ~1                     1.038   1.005
46      q7 ~1                     0.869   0.851
47      q8 ~1                     0.874   0.814
48      q9 ~1                     1.324   1.392
49  stress ~1                     1.549   1.351

7.5 Weak (Metric) Invariance

This is where the semTools function really comes in handy. All we need to do to estimate the Weak Invariance model is change the arguments of the function a little bit:

fit.weak <- measEq.syntax(configural.model = cfa_config, 
                          data = DASS21, 
                          group = "engnat", 
                          ID.fac = "effects", 
                          meanstructure = TRUE, 
                          return.fit = TRUE,
                          estimator = "mlr",
                          group.equal = "loadings")

# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.weak@call$model))

In the model syntax above (if you remove the # and run the line of code), you can see that each factor loading now has the same label across the two groups, ensuring that they are constraint to be equal.

Model Comparison

To test if we can retain the Weak Invariance model, we compare its fit to that of the Configural Invariance model:

comp_12 <- compareFit(fit.config, fit.weak)
summary(comp_12)
################### Nested Model Comparison #########################

Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->unknown():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
            Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
fit.config 148 35706 36148 373.94                              
fit.weak   160 35691 36074 382.85     11.021      12     0.5271

####################### Model Fit Indices ###########################
           chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.config     315.353†       148          .000        .052       .969 
fit.weak       330.756        160          .000        .050†      .969†
           tli.robust  srmr        aic        bic
fit.config      .961  .032† 35706.333  36148.031 
fit.weak        .964† .036  35691.248† 36074.053†

################## Differences in Fit Indices #######################
                      df.scaled rmsea.robust cfi.robust tli.robust  srmr
fit.weak - fit.config        12       -0.002          0      0.003 0.004
                          aic     bic
fit.weak - fit.config -15.086 -73.979

What does the Chi-square difference test result tell us?

Parameter Estimate Interpretation

Using the function below, we can compare the estimates across groups and see that all the loadings are now exactly the same (because we constrained them to be). Other parameters are still varying across groups.

group_by_groups(fit.weak)
       lhs op     rhs est_EnglishNative est_ELL
1  anxiety =~     q15             1.192   1.192
2  anxiety =~     q19             0.930   0.930
3  anxiety =~      q2             0.798   0.798
4  anxiety =~     q20             1.130   1.130
5  anxiety =~      q4             0.964   0.964
6  anxiety =~      q7             0.925   0.925
7  anxiety =~      q9             1.061   1.061
8   stress =~      q1             1.028   1.028
9   stress =~     q11             1.004   1.004
10  stress =~     q12             1.093   1.093
11  stress =~     q14             0.882   0.882
12  stress =~     q18             0.857   0.857
13  stress =~      q6             1.044   1.044
14  stress =~      q8             1.093   1.093
15 anxiety ~~ anxiety             0.559   0.499
16 anxiety ~~  stress             0.508   0.456
17      q1 ~~      q1             0.508   0.430
18     q11 ~~     q11             0.539   0.502
19     q12 ~~     q12             0.486   0.498
20     q14 ~~     q14             0.692   0.574
21     q15 ~~     q15             0.400   0.403
22     q18 ~~     q18             0.808   0.747
23     q19 ~~     q19             0.814   0.758
24      q2 ~~      q2             1.036   0.886
25     q20 ~~     q20             0.613   0.491
26      q4 ~~     q19             0.211   0.179
27      q4 ~~      q4             0.645   0.666
28      q4 ~~      q7             0.033   0.132
29      q6 ~~      q6             0.537   0.562
30      q7 ~~      q7             0.651   0.681
31      q8 ~~      q8             0.570   0.479
32      q9 ~~      q9             0.663   0.601
33  stress ~~  stress             0.569   0.505
34 anxiety ~1                     1.308   1.180
35      q1 ~1                     0.934   0.842
36     q11 ~1                     1.123   0.931
37     q12 ~1                     0.944   0.901
38     q14 ~1                     0.968   1.090
39     q15 ~1                     0.867   0.679
40     q18 ~1                     1.149   1.388
41     q19 ~1                     1.080   1.087
42      q2 ~1                     1.099   1.335
43     q20 ~1                     0.888   0.937
44      q4 ~1                     0.831   0.765
45      q6 ~1                     1.027   1.015
46      q7 ~1                     0.882   0.828
47      q8 ~1                     0.856   0.831
48      q9 ~1                     1.353   1.369
49  stress ~1                     1.549   1.351

When we retain the Weak Invariance model, we can conclude that the measured concepts (Anxiety and Stress) have the same meaning across groups, because the item-factor associations can be considered equivalent (the same amount of common variance is extracted across groups).

7.6 Strong (Scalar) Invariance

Again, all we need to do to estimate the Weak Invariance model is change the arguments of the function a little bit:

fit.strong <- measEq.syntax(configural.model = cfa_config, 
                            data = DASS21, 
                            group = "engnat", 
                            ID.fac = "effects", 
                            meanstructure = TRUE, 
                            return.fit = TRUE,
                            estimator = "mlr",
                            group.equal = c("loadings", "intercepts"))

# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.strong@call$model))

Model Comparison

To test if we can retain the Strong Invariance model, we compare its fit to that of the Weak Invariance model:

comp_23 <- compareFit(fit.weak, fit.strong)
summary(comp_23)
################### Nested Model Comparison #########################

Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->unknown():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
            Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
fit.weak   160 35691 36074 382.85                                  
fit.strong 172 35748 36072 463.60     81.276      12  2.356e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

####################### Model Fit Indices ###########################
           chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.weak       330.756†       160          .000        .050†      .969†
fit.strong     404.514        172          .000        .056       .958 
           tli.robust  srmr        aic        bic
fit.weak        .964† .036† 35691.248† 36074.053 
fit.strong      .955  .042  35747.993  36071.905†

################## Differences in Fit Indices #######################
                      df.scaled rmsea.robust cfi.robust tli.robust  srmr    aic
fit.strong - fit.weak        12        0.006     -0.011     -0.009 0.006 56.745
                         bic
fit.strong - fit.weak -2.148

What does the Chi-square difference test result tell us?

Local Fit Comparison

Since the fit worsened significantly when testing Strong Invariance, we will focus on problematic residuals in the mean vectors, since constraining the intercepts will affect model fit by changing the model implied means.

residuals(fit.strong, type = "cor.bollen")$EnglishNative$mean
    q2     q4     q7     q9    q15    q19    q20     q1     q6     q8    q11 
-0.119  0.014  0.009 -0.023  0.067 -0.017 -0.042  0.032 -0.006 -0.002  0.080 
   q12    q14    q18 
 0.006 -0.074 -0.118 
residuals(fit.strong, type = "cor.bollen")$ELL$mean
    q2     q4     q7     q9    q15    q19    q20     q1     q6     q8    q11 
 0.108 -0.018 -0.012  0.022 -0.069  0.017  0.035 -0.030  0.007  0.002 -0.078 
   q12    q14    q18 
-0.007  0.064  0.116 

You will notice that mean residuals for a specific indicator are negative for one group and positive for the other. When we constrain intercepts to be equal across groups, the estimate tends to find a compromise between the two group intercepts. Thus, one group will always have an observed mean that is above that estimate while the other has an observed mean that is below that estimate.

Which intercepts appear to be different across groups?

7.7 Partial Strong Invariance (Round 1)

Since the model fit indicated some issues with retaining the Strong Invariance model, we will now examine Partial Strong Invariance. Remember what we already saw in the local fit evaluation of the mean residuals. Next, we will use the Backwards MI approach, using modification indices to evaluate which equality constraint would improve fit most if removed from the model. When working with equality constraints, we cannot use the modificationindices() function. Instead, we need to use the lavTestScore() function. The output provides the same information, but in a less accessible format. In the code below, I am excluding the first four rows of output (by asking for 5:32) to hide the effect-coding equality constraints, which take up a lot of space. You can omit the [5:32,] part.

lavTestScore(fit.strong)$uni[5:32,]

univariate score tests:

     lhs op   rhs     X2 df p.value
5   .p1. == .p50. 17.694  1   0.000
6   .p2. == .p51.  1.809  1   0.179
7   .p3. == .p52.  0.150  1   0.698
8   .p4. == .p53.  0.708  1   0.400
9   .p5. == .p54. 16.959  1   0.000
10  .p6. == .p55.  1.459  1   0.227
11  .p7. == .p56.  4.825  1   0.028
12  .p8. == .p57.  3.574  1   0.059
13  .p9. == .p58.  0.147  1   0.701
14 .p10. == .p59.  0.035  1   0.852
15 .p11. == .p60. 13.575  1   0.000
16 .p12. == .p61.  0.131  1   0.717
17 .p13. == .p62. 10.311  1   0.001
18 .p14. == .p63. 18.197  1   0.000
19 .p15. == .p64. 18.309  1   0.000
20 .p16. == .p65.  0.738  1   0.390
21 .p17. == .p66.  0.116  1   0.734
22 .p18. == .p67.  1.210  1   0.271
23 .p19. == .p68. 19.397  1   0.000
24 .p20. == .p69.  0.813  1   0.367
25 .p21. == .p70.  4.146  1   0.042
26 .p22. == .p71.  2.654  1   0.103
27 .p23. == .p72.  0.109  1   0.741
28 .p24. == .p73.  0.007  1   0.932
29 .p25. == .p74. 15.167  1   0.000
30 .p26. == .p75.  0.126  1   0.723
31 .p27. == .p76.  8.698  1   0.003
32 .p28. == .p77. 22.118  1   0.000

Releasing the equality constraint of .p28 and .p77 is associated with the largest expected change in the model Chi-square test. But what are these parameters? We can add some arguments to the lavTestScore() function to get a table with the expected parameter changes for released constraints, which has more information about the specific parameters. I will filter out only the intercepts so the output is not too long:

lavTestScore(fit.strong, epc = TRUE, standardized = FALSE)$epc %>% 
  filter(str_detect(label, "nu."))
Warning: lavaan->lavTestScore():  
   se is not `standard'; not implemented yet; falling back to ordinary score 
   test

expected parameter changes (epc) and expected parameter values (epv):

   lhs op rhs block group free label plabel    est    epc    epv
1   q2 ~1         1     1   15  nu.1  .p15.  0.474 -0.136  0.339
2   q4 ~1         1     1   16  nu.2  .p16. -0.180 -0.083 -0.263
3   q7 ~1         1     1   17  nu.3  .p17. -0.082  0.005 -0.077
4   q9 ~1         1     1   18  nu.4  .p18.  0.301 -0.074  0.226
5  q15 ~1         1     1   19  nu.5  .p19. -0.457  0.147 -0.310
6  q19 ~1         1     1   20  nu.6  .p20.  0.152  0.054  0.205
7  q20 ~1         1     1   21  nu.7  .p21. -0.207  0.021 -0.186
8   q1 ~1         1     1   22  nu.8  .p22. -0.171 -0.025 -0.196
9   q6 ~1         1     1   23  nu.9  .p23. -0.027  0.018 -0.009
10  q8 ~1         1     1   24 nu.10  .p24. -0.260  0.039 -0.221
11 q11 ~1         1     1   25 nu.11  .p25. -0.031  0.162  0.131
12 q12 ~1         1     1   26 nu.12  .p26. -0.187  0.018 -0.169
13 q14 ~1         1     1   27 nu.13  .p27.  0.189  0.017  0.205
14 q18 ~1         1     1   28 nu.14  .p28.  0.488 -0.292  0.196
15  q2 ~1         2     2   64  nu.1  .p64.  0.474  0.024  0.498
16  q4 ~1         2     2   65  nu.2  .p65. -0.180  0.116 -0.064
17  q7 ~1         2     2   66  nu.3  .p66. -0.082  0.031 -0.050
18  q9 ~1         2     2   67  nu.4  .p67.  0.301  0.055  0.356
19 q15 ~1         2     2   68  nu.5  .p68. -0.457 -0.081 -0.538
20 q19 ~1         2     2   69  nu.6  .p69.  0.152 -0.066  0.085
21 q20 ~1         2     2   70  nu.7  .p70. -0.207 -0.048 -0.255
22  q1 ~1         2     2   71  nu.8  .p71. -0.171  0.065 -0.107
23  q6 ~1         2     2   72  nu.9  .p72. -0.027 -0.019 -0.046
24  q8 ~1         2     2   73 nu.10  .p73. -0.260 -0.029 -0.289
25 q11 ~1         2     2   74 nu.11  .p74. -0.031 -0.053 -0.084
26 q12 ~1         2     2   75 nu.12  .p75. -0.187  0.000 -0.187
27 q14 ~1         2     2   76 nu.13  .p76.  0.189 -0.085  0.104
28 q18 ~1         2     2   77 nu.14  .p77.  0.488  0.139  0.627

From this table, we can see that .p28 and .p77 are the parameter labels for the intercept of q18, which also had a larger correlation residual. This question is worded as follows: ‘I felt that I was rather touchy’. The output above tells us something about what the intercept for this item would be if it were freely estimated across groups (in the epv [expected parameter value] column).

From this information, we can see that the intercept for English Native speakers would decrease from .488 to .196. This indicates that, for English Native speakers with a true value of 0 on Stress, their expected response to this item is close to 0 (or the average intercept across all indicators of Stress).

In contrast, for the ELL group, we can see that the intercept would increase from .488 to .627. This indicates that, for ELL speakers with a true value of 0 on Stress, their expected response to this item is further away from 0 (so higher that the average intercept of all indicators of Stress).

In other words, when English Native Speakers and ELL Speakers experience the same underlying levels of stress, ELL Speakers respond with a higher response option to ‘I felt that I was rather touchy’.

To test if releasing this constraint improves the fit of the Partial Strong Model sufficiently, we need to estimate the partial model. To do so, we can use the group.partial argument in the measEq.syntax() function:

fit.partial1 <- measEq.syntax(configural.model = cfa_config, 
                              data = DASS21, 
                              group = "engnat", 
                              ID.fac = "effects", 
                              meanstructure = TRUE, 
                              return.fit = TRUE,
                              estimator = "mlr",
                              group.equal = c("loadings", "intercepts"),
                              group.partial = "q18 ~ 1")

# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial1@call$model))

Model Comparison

Let’s compare the fit of this model to the Weak Invariance model

comp_23b <- compareFit(fit.weak, fit.partial1)
summary(comp_23b)
################### Nested Model Comparison #########################

Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->unknown():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
              Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
fit.weak     160 35691 36074 382.85                                  
fit.partial1 171 35728 36056 441.15     58.446      11    1.8e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

####################### Model Fit Indices ###########################
             chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.weak         330.756†       160          .000        .050†      .969†
fit.partial1     384.540        171          .000        .054       .961 
             tli.robust  srmr        aic        bic
fit.weak          .964† .036† 35691.248† 36074.053 
fit.partial1      .959  .041  35727.542  36056.361†

################## Differences in Fit Indices #######################
                        df.scaled rmsea.robust cfi.robust tli.robust  srmr
fit.partial1 - fit.weak        11        0.004     -0.007     -0.006 0.004
                           aic     bic
fit.partial1 - fit.weak 36.294 -17.692

What do the results indicate?

Local Fit Evaluation

Since the fit worsened significantly when testing Strong Invariance, we will focus on problematic residuals in the mean vectors.

residuals(fit.partial1, type = "cor.bollen")$EnglishNative$mean
    q2     q4     q7     q9    q15    q19    q20     q1     q6     q8    q11 
-0.119  0.014  0.009 -0.023  0.067 -0.017 -0.041  0.021 -0.017 -0.013  0.069 
   q12    q14    q18 
-0.004 -0.083  0.000 
residuals(fit.partial1, type = "cor.bollen")$ELL$mean
    q2     q4     q7     q9    q15    q19    q20     q1     q6     q8    q11 
 0.108 -0.018 -0.013  0.022 -0.069  0.017  0.035 -0.019  0.018  0.012 -0.067 
   q12    q14    q18 
 0.005  0.072  0.000 

What do you think the next indicator is that we need to focus on?

7.8 Partial Strong Invariance (Round 2)

Since the model fit indicated some remaining issues with retaining the Partial Strong Invariance model, we will do another round of partial invariance testing:

lavTestScore(fit.partial1)$uni[5:31,]
Warning: lavaan->lavTestScore():  
   se is not `standard'; not implemented yet; falling back to ordinary score 
   test

univariate score tests:

     lhs op   rhs     X2 df p.value
5   .p1. == .p50. 17.693  1   0.000
6   .p2. == .p51.  1.813  1   0.178
7   .p3. == .p52.  0.151  1   0.697
8   .p4. == .p53.  0.706  1   0.401
9   .p5. == .p54. 16.953  1   0.000
10  .p6. == .p55.  1.459  1   0.227
11  .p7. == .p56.  4.822  1   0.028
12  .p8. == .p57.  1.808  1   0.179
13  .p9. == .p58.  0.825  1   0.364
14 .p10. == .p59.  0.567  1   0.451
15 .p11. == .p60. 10.119  1   0.001
16 .p12. == .p61.  0.049  1   0.825
17 .p13. == .p62. 12.957  1   0.000
18 .p14. == .p63.  0.903  1   0.342
19 .p15. == .p64. 18.311  1   0.000
20 .p16. == .p65.  0.739  1   0.390
21 .p17. == .p66.  0.116  1   0.733
22 .p18. == .p67.  1.208  1   0.272
23 .p19. == .p68. 19.398  1   0.000
24 .p20. == .p69.  0.812  1   0.367
25 .p21. == .p70.  4.142  1   0.042
26 .p22. == .p71.  1.116  1   0.291
27 .p23. == .p72.  0.761  1   0.383
28 .p24. == .p73.  0.448  1   0.503
29 .p25. == .p74. 11.478  1   0.001
30 .p26. == .p75.  0.063  1   0.802
31 .p27. == .p76. 11.226  1   0.001

Releasing the equality constraint of .p19 and .p68 is associated with the largest expected change in the model Chi-square test (19.398), followed closely by the constraint of .p15 and .p64 (18.311). Let’s look at the alternative lavTestScore() output to find out what intercepts these are.

lavTestScore(fit.partial1, epc = TRUE, standardized = FALSE)$epc %>% 
  filter(str_detect(label, "nu."))
Warning: lavaan->lavTestScore():  
   se is not `standard'; not implemented yet; falling back to ordinary score 
   test

expected parameter changes (epc) and expected parameter values (epv):

   lhs op rhs block group free    label plabel    est    epc    epv
1   q2 ~1         1     1   15     nu.1  .p15.  0.474 -0.136  0.338
2   q4 ~1         1     1   16     nu.2  .p16. -0.181 -0.083 -0.263
3   q7 ~1         1     1   17     nu.3  .p17. -0.082  0.005 -0.077
4   q9 ~1         1     1   18     nu.4  .p18.  0.300 -0.074  0.226
5  q15 ~1         1     1   19     nu.5  .p19. -0.456  0.147 -0.309
6  q19 ~1         1     1   20     nu.6  .p20.  0.151  0.054  0.205
7  q20 ~1         1     1   21     nu.7  .p21. -0.207  0.022 -0.186
8   q1 ~1         1     1   22     nu.8  .p22. -0.089 -0.048 -0.137
9   q6 ~1         1     1   23     nu.9  .p23.  0.054 -0.008  0.046
10  q8 ~1         1     1   24    nu.10  .p24. -0.171  0.010 -0.161
11 q11 ~1         1     1   25    nu.11  .p25.  0.044  0.139  0.184
12 q12 ~1         1     1   26    nu.12  .p26. -0.099 -0.010 -0.109
13 q14 ~1         1     1   27    nu.13  .p27.  0.262 -0.012  0.249
14 q18 ~1         1     1   28 nu.14.g1  .p28.  0.327 -0.088  0.239
15  q2 ~1         2     2   64     nu.1  .p64.  0.474  0.024  0.498
16  q4 ~1         2     2   65     nu.2  .p65. -0.181  0.116 -0.064
17  q7 ~1         2     2   66     nu.3  .p66. -0.082  0.031 -0.050
18  q9 ~1         2     2   67     nu.4  .p67.  0.300  0.055  0.356
19 q15 ~1         2     2   68     nu.5  .p68. -0.456 -0.081 -0.537
20 q19 ~1         2     2   69     nu.6  .p69.  0.151 -0.066  0.085
21 q20 ~1         2     2   70     nu.7  .p70. -0.207 -0.048 -0.255
22  q1 ~1         2     2   71     nu.8  .p71. -0.089  0.068 -0.021
23  q6 ~1         2     2   72     nu.9  .p72.  0.054 -0.012  0.042
24  q8 ~1         2     2   73    nu.10  .p73. -0.171 -0.022 -0.194
25 q11 ~1         2     2   74    nu.11  .p74.  0.044 -0.044  0.000
26 q12 ~1         2     2   75    nu.12  .p75. -0.099  0.007 -0.093
27 q14 ~1         2     2   76    nu.13  .p76.  0.262 -0.078  0.183
28 q18 ~1         2     2   77 nu.14.g2  .p77.  0.607  0.085  0.692

From this table, we can see that .p19 and .p68 are the parameter labels for the intercept of q15. This question did not necessarily have a large mean residual (that was q2, which is associated with the second largest drop in Chi Square based on the table above). However, if we look at the epv column, we can see that the intercepts are expected to be more different across groups for q15 compared to q2.

Item q15 is worded as follows: ‘I felt I was close to panic’, and if released, the intercept would be lower for ELL speakers than for English Native speakers. This means that at a true level of 0 on Anxiety, ELL report less feelings of ‘panic’ compared to English Native speakers.

Item q2 is worded as follows: ‘I was aware of dryness in my mouth’, and if released, the intercept would be higher for ELL speakers than for English Native speakers. This means that at a true level of 0 on Anxiety, ELL report more experience of ‘dry mouth’ than English Native speakers.

I don’t have much theoretical justification for freeing either of these intercepts. Maybe they are both kind of strange statements for ELL speakers? We are definitely going into a more exploratory mode of invariance testing here. Given the fact that the expected change in fit between these two options is very similar, but the larger mean residual was associated with q2, I’ve decided to release the intercept of q2 next. Note that you might make a different decision.

To test if releasing this constraint improves the fit of the Partial Strong Model sufficiently, we need to estimate a second partial model. To do so, we can use the group.partial argument in the measEq.syntax() function and add this second intercept:

fit.partial2 <- measEq.syntax(configural.model = cfa_config, 
                              data = DASS21, 
                              group = "engnat", 
                              ID.fac = "effects", 
                              meanstructure = TRUE, 
                              return.fit = TRUE,
                              estimator = "mlr",
                              group.equal = c("loadings", "intercepts"),
                              group.partial = c("q18~1", "q2~1"))

# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial2@call$model))

Let’s compare the fit of this model to the Weak Invariance model

comp_23c <- compareFit(fit.weak, fit.partial2)
summary(comp_23c)
################### Nested Model Comparison #########################

Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->unknown():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
              Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
fit.weak     160 35691 36074 382.85                                  
fit.partial2 170 35711 36045 422.62     39.744      10   1.88e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

####################### Model Fit Indices ###########################
             chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.weak         330.756†       160          .000        .050†      .969†
fit.partial2     368.046        170          .000        .052       .964 
             tli.robust  srmr        aic        bic
fit.weak          .964† .036† 35691.248† 36074.053 
fit.partial2      .961  .039  35711.011  36044.738†

################## Differences in Fit Indices #######################
                        df.scaled rmsea.robust cfi.robust tli.robust  srmr
fit.partial2 - fit.weak        10        0.002     -0.005     -0.003 0.003
                           aic     bic
fit.partial2 - fit.weak 19.763 -29.314

What do the results indicate?

Local Fit Evaluation

Again, we will focus on the residuals of the mean vectors.

residuals(fit.partial2, type = "cor.bollen")$EnglishNative$mean
    q2     q4     q7     q9    q15    q19    q20     q1     q6     q8    q11 
 0.000  0.006  0.001 -0.032  0.056 -0.025 -0.052  0.021 -0.017 -0.013  0.069 
   q12    q14    q18 
-0.004 -0.083  0.000 
residuals(fit.partial2, type = "cor.bollen")$ELL$mean
    q2     q4     q7     q9    q15    q19    q20     q1     q6     q8    q11 
 0.000 -0.008 -0.003  0.032 -0.058  0.025  0.044 -0.019  0.018  0.012 -0.067 
   q12    q14    q18 
 0.005  0.072  0.000 

This time, we don’t see any problems with the mean vector in terms of local fit.

CFI and RMSEA Comparison

Next, we will examine if the CFI and RMSEA change to a lesser extent than our criteria (decrease in CFI ≤ 0.010, increase in RMSEA ≤ 0.015):

# Change in CFI
cat(paste0("Change in CFI: ", comp_23c@fit.diff$cfi.robust))
Change in CFI: -0.00471114055415356
# Change in RMSEA
cat(paste0("Change in RMSEA: ", comp_23c@fit.diff$rmsea.robust))
Change in RMSEA: 0.00201878813551178

These are much smaller than our cutoff criteria (see statistical decision criteria above). They are also smaller than other suggested criteria (e.g., decrease in CFI ≤ 0.002 or increase in RMSEA ≤ 0.010). Based on this information, I will retain this partial strong invariant model with two freely estimated intercepts.

7.9 Parameter Comparison Across Groups

Now that we have finalized the invariance testing procedure (note that you could test higher levels of invariance, but that these offer few practical benefits), we can continue to compare latent factor variances, covariances, and means across groups.

Factor Variances

First, we will test if the factor variances of the Anxiety and Stress factors are equivalent across groups. We will do so by adding lv.variances to the group.equal argument of the previous partial strong invariance model.

fit.partial2.lv <- measEq.syntax(configural.model = cfa_config, 
                              data = DASS21, 
                              group = "engnat", 
                              ID.fac = "effects", 
                              meanstructure = TRUE, 
                              return.fit = TRUE,
                              estimator = "mlr",
                              group.equal = c("loadings", "intercepts", 
                                          "lv.variances"),
                              group.partial = c("q18~1", "q2~1"))

# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial2.lv@call$model))

We can now compare the fit of this model to the previous partial strong invariance model:

comp_3c4 <- compareFit(fit.partial2, fit.partial2.lv)
summary(comp_3c4)
################### Nested Model Comparison #########################

Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->unknown():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
                 Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
fit.partial2    170 35711 36045 422.62                              
fit.partial2.lv 172 35709 36033 424.20     2.4853       2     0.2886

####################### Model Fit Indices ###########################
                chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.partial2        368.046†       170          .000        .052       .964†
fit.partial2.lv     371.348        172          .000        .051†      .964 
                tli.robust  srmr        aic        bic
fit.partial2         .961  .039† 35711.011  36044.738 
fit.partial2.lv      .962† .047  35708.592† 36032.504†

################## Differences in Fit Indices #######################
                               df.scaled rmsea.robust cfi.robust tli.robust
fit.partial2.lv - fit.partial2         2            0          0          0
                                srmr    aic     bic
fit.partial2.lv - fit.partial2 0.008 -2.419 -12.234

What does the Chi-square Difference test tell us?

Factor Covariances

Next, we will test if the factor covariance between Anxiety and Stress is equivalent across groups. We will do so by adding lv.covariances to the group.equal argument of the previous partial strong invariance model with equivalent factor variances:

fit.partial2.lcv <- measEq.syntax(configural.model = cfa_config, 
                              data = DASS21, 
                              group = "engnat", 
                              ID.fac = "effects", 
                              meanstructure = TRUE, 
                              return.fit = TRUE,
                              estimator = "mlr",
                              group.equal = c("loadings", "intercepts", 
                                          "lv.variances",
                                          "lv.covariances"),
                              group.partial = c("q18~1", "q2~1"))

# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial2.lcv@call$model))

We can now compare the fit of this model to the previous partial strong invariance model with equivalent factor variances:

comp_45 <- compareFit(fit.partial2.lv, fit.partial2.lcv)
summary(comp_45)
################### Nested Model Comparison #########################

Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->unknown():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
                  Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
fit.partial2.lv  172 35709 36033 424.20                              
fit.partial2.lcv 173 35707 36026 424.58    0.27108       1     0.6026

####################### Model Fit Indices ###########################
                 chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.partial2.lv      371.348        172          .000        .051       .964 
fit.partial2.lcv     371.157†       173          .000        .051†      .964†
                 tli.robust  srmr        aic        bic
fit.partial2.lv       .962  .047  35708.592  36032.504 
fit.partial2.lcv      .962† .046† 35706.978† 36025.982†

################## Differences in Fit Indices #######################
                                   df.scaled rmsea.robust cfi.robust tli.robust
fit.partial2.lcv - fit.partial2.lv         1            0          0          0
                                     srmr    aic    bic
fit.partial2.lcv - fit.partial2.lv -0.001 -1.614 -6.522

What does the Chi-square Difference test tell us?

Factor Means

Finally, we can compare the Anxiety and Stress latent factor means to each other. We can use the effect size formula from the book chapter to do so (which can be interpreted as a Cohen’s d effect size). For each of the latent factors, we need the latent mean estimates and an estimate of the factor variance (in our case, the factor variance is equivalent, but the book describes options that can be used if the factor variances are not equivalent). We will find these estimates using the group_by_groups function from the semhelpinghands package:

group_by_groups(fit.partial2.lcv) %>% 
  filter((op == "~1" & (lhs == "stress" | lhs == "anxiety")) | 
           op == "~~" & (lhs == "stress" | lhs == "anxiety"))
      lhs op     rhs est_EnglishNative est_ELL
1 anxiety ~~ anxiety             0.527   0.527
2 anxiety ~~  stress             0.481   0.481
3  stress ~~  stress             0.535   0.535
4 anxiety ~1                     2.269   2.088
5  stress ~1                     2.506   2.261

Here are the computations of the effect size for the mean difference in Anxiety:

# mean in English Native Speaker: 2.269
# mean in ELL Speaker: 2.088
# variance: 0.527

(2.269 - 2.088) / sqrt(0.527)
[1] 0.2493293

This indicates that the mean of the Anxiety factor of the English Native Speaker sample is about 0.25 standard deviations higher than that of the ELL Speaker sample.

Here are the computations of the effect size for the mean difference in Stress:

# mean in English Native Speaker: 2.506
# mean in ELL Speaker: 2.261
# variance: 0.535

(2.506 - 2.261) / sqrt(0.535)
[1] 0.3349571

This indicates that the mean of the Stress factor of the English Native Speaker sample is about 0.33 standard deviations higher than that of the ELL Speaker sample.

We can also test whether the mean differences are significant using a Wald test. This test is based on the difference in model chi-square that occurs when two parameters (in this case the latent factor means) are constrained to be equivalent.

# H0: Average anxiety is equivalent across 
# English Native and ELL speakers
lavTestWald(fit.partial2.lcv, constraints = "alpha.1.g1 == alpha.1.g2")
$stat
[1] 12.99259

$df
[1] 1

$p.value
[1] 0.0003127267

$se
[1] "robust.huber.white"
# H0: Average stress is equivalent across 
# English Native and ELL speakers
lavTestWald(fit.partial2.lcv, constraints = "alpha.2.g1 == alpha.2.g2")
$stat
[1] 23.89415

$df
[1] 1

$p.value
[1] 1.017804e-06

$se
[1] "robust.huber.white"

Based on the results above, both tests were significant (p < .05), so the mean differences are significant between groups.

7.10 Conclusion

In this example, we found evidence of configural and weak invariance of the Anxiety and Stress factors from the DASS-21 over English native and ELL speaker samples. There is somewhat less convincing evidence of strong invariance, in that most, but not all, intercepts may be equivalent across groups. This means that the English native and ELL speaker samples did not have the same relative standing on one anxiety and one stress item, given the same position on the underlying conceptual variable (i.e., Anxiety/Stress).

In a paper, you would report all of the above, adding all unstandardized estimates + SEs of the final retained model (in a table).

7.11 Summary

In this R lab, you were introduced to the steps involved in testing measurement invariance with continuous indicators. Below, you’ll find a Bonus section that demonstrates how group mean differences based on the observed item responses differ from what we found using the CFA approach. In the next R Lab, you will learn all about structural after measurement models, which separate the measurement from the structural model in the estimation steps.

7.12 Bonus: Compare Findings to Mean Difference Based on Observed Mean Scores

The code below shows you how you can compare the two groups using mean scores based on the observed items (what you might typically have done), using t-tests. In this case, both Cohen’s D estimates are smaller compared to the difference estimates based on the latent factors. Thus, if we had ignored the measurement model and measurement invariance question, we would have concluded that the differences between native English speakers and ELL speaker were smaller.

# Create mean score variables:
DASS21 <- DASS21 %>%
  rowwise() %>%
  mutate(anx_mean = mean(c(q2, q4, q7, q9, q15, q19, q20)),
         str_mean = mean(c(q1, q6, q8, q11, q12, q14, q18)))

# Test significance of mean difference (both significant)
t.test(anx_mean ~ engnat, data = DASS21)

    Welch Two Sample t-test

data:  anx_mean by engnat
t = 2.5351, df = 994.7, p-value = 0.01139
alternative hypothesis: true difference in means between group EnglishNative and group ELL is not equal to 0
95 percent confidence interval:
 0.02891907 0.22708093
sample estimates:
mean in group EnglishNative           mean in group ELL 
                   2.307714                    2.179714 
t.test(str_mean ~ engnat, data = DASS21)

    Welch Two Sample t-test

data:  str_mean by engnat
t = 3.965, df = 994.77, p-value = 7.864e-05
alternative hypothesis: true difference in means between group EnglishNative and group ELL is not equal to 0
95 percent confidence interval:
 0.09971799 0.29513916
sample estimates:
mean in group EnglishNative           mean in group ELL 
                   2.548857                    2.351429 
# Get cohen's D 
# (both under-estimated compared to latent mean comparison):
psych::cohen.d(DASS21$anx_mean, group = DASS21$engnat)
Call: psych::cohen.d(x = DASS21$anx_mean, group = DASS21$engnat)
Cohen d statistic of difference between two means
     lower effect upper
[1,] -0.28  -0.16 -0.04

Multivariate (Mahalanobis) distance between groups
[1] 0.16
r equivalent of difference between two means
 data 
-0.08 
psych::cohen.d(DASS21$str_mean, group = DASS21$engnat)
Call: psych::cohen.d(x = DASS21$str_mean, group = DASS21$engnat)
Cohen d statistic of difference between two means
     lower effect upper
[1,] -0.38  -0.25 -0.13

Multivariate (Mahalanobis) distance between groups
[1] 0.25
r equivalent of difference between two means
 data 
-0.12