7 Measurement Invariance Testing with Continuous Indicators
You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: measurementinvariance_cont.R
7.1 Loading R packages
library(rio)
library(lavaan)
library(semTools)
library(semhelpinghands)
library(tidyverse)
7.2 Loading Data
Load the data into your environment. For this lab, we will use a dataset of N = 1000 individuals who completed the Depression Anxiety Stress Scales (DASS-21), focusing on the Anxiety and Stress subscales.
You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/DASS21.csv. Make sure to save it in the folder you are using for this class.
DASS21 <- import(file = "data/DASS21.csv")
DASS21$engnat <- factor(DASS21$engnat,
                        levels = c("EnglishNative", "ELL"))
We will see if there is measurement invariance across participants whose native language is English (1) versus participants with a different native language, i.e., English-language learners (ELL; 2). The table below shows the sample size in each group:
table(DASS21$engnat)
EnglishNative ELL
500 500
Here is an overview of the questions on the Anxiety subscale:
- q2: I was aware of dryness in my mouth
- q4: I experienced breathing difficulty (eg, excessively rapid breathing, breathlessness in the absence of physical exertion)
- q7: I experienced trembling (e.g., in the hands)
- q9: I was worried about situations in which I might panic and make a fool of myself.
- q15: I felt I was close to panic.
- q19: I was aware of the action of my heart in the absence of physical exertion (eg, sense of heart rate increase, heart missing a beat).
- q20: I felt scared without any good reason.
And the questions on the Stress subscale:
- q1: I found it hard to wind down.
- q6: I tended to over-react to situations.
- q8: I felt that I was using a lot of nervous energy.
- q11: I found myself getting agitated.
- q12: I found it difficult to relax.
- q14: I was intolerant of anything that kept me from getting on with what I was doing.
- q18: I felt that I was rather touchy.
7.3 Statistical Decision Criteria
For this lab, I will use the following statistical decision criteria for the model comparison tests between levels of invariance. These are based on the knowledge that our samples are relatively large (n = 500 per group), and are informed by the textbook and Chen (2007):
- Look at Chi-square Difference test.
- If not significant: retain next level of invariance
- If significant, look at other fit information
- Are there any patterns of problematic correlation residuals (|residual| > .10)? If so, reject the next level of invariance and test partial invariance
- Does CFI decrease by ≥ 0.010 or does RMSEA increase by ≥ 0.015? If so, reject the next level of invariance and test partial invariance (a small helper for automating this check is sketched below)
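To make this check easy to repeat at each step, here is a small helper sketch (my own addition, not part of the lab script; it assumes only lavaan’s fitMeasures() function and the robust fit indices produced by MLR):
# Change in robust CFI/RMSEA when moving from a less constrained
# model (fit0) to a more constrained one (fit1):
fit_change <- function(fit0, fit1) {
  idx <- c("cfi.robust", "rmsea.robust")
  lavaan::fitMeasures(fit1, idx) - lavaan::fitMeasures(fit0, idx)
}
For example, fit_change(fit.config, fit.weak) would return both differences directly once those models are fitted below.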
7.4 Configural Invariance
We will specify a two-factor CFA using lavaan syntax. Based on previous research, this two-factor model already includes residual covariances between two pairs of items. What these items have in common is that they focus on physical symptoms of anxiety, which causes them to share more common variance with each other (especially with q4) than with the other anxiety items.
cfa_config <- '
anxiety =~ q2 + q4 + q7 + q9 + q15 + q19 + q20
stress =~ q1 + q6 + q8 + q11 + q12 + q14 + q18
q4 ~~ q19
q4 ~~ q7
'
We are going to use the measEq.syntax() function from the semTools package, which will help us set the scale for our latent factors and apply the equality-constraint labels in later steps of the measurement invariance testing procedure. This saves us A LOT of typing, especially when moving towards partial invariance. As all indicators are responded to on the same response scale, we will use the effect-coding method for scaling the latent factors. This will result in latent factors that have the same mean and variance as the average indicator.
fit.config <- measEq.syntax(configural.model = cfa_config,
                            data = DASS21,
                            group = "engnat",
                            ID.fac = "effects",
                            meanstructure = TRUE,
                            return.fit = TRUE,
                            estimator = "mlr")
# Print out the model syntax, so you can
# see what semTools is helping us do:
cat(as.character(fit.config@call$model))
## LOADINGS:
anxiety =~ c(NA, NA)*q2 + c(lambda.1_1.g1, lambda.1_1.g2)*q2
anxiety =~ c(NA, NA)*q4 + c(lambda.2_1.g1, lambda.2_1.g2)*q4
anxiety =~ c(NA, NA)*q7 + c(lambda.3_1.g1, lambda.3_1.g2)*q7
anxiety =~ c(NA, NA)*q9 + c(lambda.4_1.g1, lambda.4_1.g2)*q9
anxiety =~ c(NA, NA)*q15 + c(lambda.5_1.g1, lambda.5_1.g2)*q15
anxiety =~ c(NA, NA)*q19 + c(lambda.6_1.g1, lambda.6_1.g2)*q19
anxiety =~ c(NA, NA)*q20 + c(lambda.7_1.g1, lambda.7_1.g2)*q20
stress =~ c(NA, NA)*q1 + c(lambda.8_2.g1, lambda.8_2.g2)*q1
stress =~ c(NA, NA)*q6 + c(lambda.9_2.g1, lambda.9_2.g2)*q6
stress =~ c(NA, NA)*q8 + c(lambda.10_2.g1, lambda.10_2.g2)*q8
stress =~ c(NA, NA)*q11 + c(lambda.11_2.g1, lambda.11_2.g2)*q11
stress =~ c(NA, NA)*q12 + c(lambda.12_2.g1, lambda.12_2.g2)*q12
stress =~ c(NA, NA)*q14 + c(lambda.13_2.g1, lambda.13_2.g2)*q14
stress =~ c(NA, NA)*q18 + c(lambda.14_2.g1, lambda.14_2.g2)*q18
## INTERCEPTS:
q2 ~ c(NA, NA)*1 + c(nu.1.g1, nu.1.g2)*1
q4 ~ c(NA, NA)*1 + c(nu.2.g1, nu.2.g2)*1
q7 ~ c(NA, NA)*1 + c(nu.3.g1, nu.3.g2)*1
q9 ~ c(NA, NA)*1 + c(nu.4.g1, nu.4.g2)*1
q15 ~ c(NA, NA)*1 + c(nu.5.g1, nu.5.g2)*1
q19 ~ c(NA, NA)*1 + c(nu.6.g1, nu.6.g2)*1
q20 ~ c(NA, NA)*1 + c(nu.7.g1, nu.7.g2)*1
q1 ~ c(NA, NA)*1 + c(nu.8.g1, nu.8.g2)*1
q6 ~ c(NA, NA)*1 + c(nu.9.g1, nu.9.g2)*1
q8 ~ c(NA, NA)*1 + c(nu.10.g1, nu.10.g2)*1
q11 ~ c(NA, NA)*1 + c(nu.11.g1, nu.11.g2)*1
q12 ~ c(NA, NA)*1 + c(nu.12.g1, nu.12.g2)*1
q14 ~ c(NA, NA)*1 + c(nu.13.g1, nu.13.g2)*1
q18 ~ c(NA, NA)*1 + c(nu.14.g1, nu.14.g2)*1
## UNIQUE-FACTOR VARIANCES:
q2 ~~ c(NA, NA)*q2 + c(theta.1_1.g1, theta.1_1.g2)*q2
q4 ~~ c(NA, NA)*q4 + c(theta.2_2.g1, theta.2_2.g2)*q4
q7 ~~ c(NA, NA)*q7 + c(theta.3_3.g1, theta.3_3.g2)*q7
q9 ~~ c(NA, NA)*q9 + c(theta.4_4.g1, theta.4_4.g2)*q9
q15 ~~ c(NA, NA)*q15 + c(theta.5_5.g1, theta.5_5.g2)*q15
q19 ~~ c(NA, NA)*q19 + c(theta.6_6.g1, theta.6_6.g2)*q19
q20 ~~ c(NA, NA)*q20 + c(theta.7_7.g1, theta.7_7.g2)*q20
q1 ~~ c(NA, NA)*q1 + c(theta.8_8.g1, theta.8_8.g2)*q1
q6 ~~ c(NA, NA)*q6 + c(theta.9_9.g1, theta.9_9.g2)*q6
q8 ~~ c(NA, NA)*q8 + c(theta.10_10.g1, theta.10_10.g2)*q8
q11 ~~ c(NA, NA)*q11 + c(theta.11_11.g1, theta.11_11.g2)*q11
q12 ~~ c(NA, NA)*q12 + c(theta.12_12.g1, theta.12_12.g2)*q12
q14 ~~ c(NA, NA)*q14 + c(theta.13_13.g1, theta.13_13.g2)*q14
q18 ~~ c(NA, NA)*q18 + c(theta.14_14.g1, theta.14_14.g2)*q18
## UNIQUE-FACTOR COVARIANCES:
q4 ~~ c(NA, NA)*q7 + c(theta.3_2.g1, theta.3_2.g2)*q7
q4 ~~ c(NA, NA)*q19 + c(theta.6_2.g1, theta.6_2.g2)*q19
## LATENT MEANS/INTERCEPTS:
anxiety ~ c(NA, NA)*1 + c(alpha.1.g1, alpha.1.g2)*1
stress ~ c(NA, NA)*1 + c(alpha.2.g1, alpha.2.g2)*1
## COMMON-FACTOR VARIANCES:
anxiety ~~ c(NA, NA)*anxiety + c(psi.1_1.g1, psi.1_1.g2)*anxiety
stress ~~ c(NA, NA)*stress + c(psi.2_2.g1, psi.2_2.g2)*stress
## COMMON-FACTOR COVARIANCES:
anxiety ~~ c(NA, NA)*stress + c(psi.2_1.g1, psi.2_1.g2)*stress
## MODEL CONSTRAINTS:
lambda.1_1.g1 == 7 - lambda.2_1.g1 - lambda.3_1.g1 - lambda.4_1.g1 - lambda.5_1.g1 - lambda.6_1.g1 - lambda.7_1.g1
lambda.1_1.g2 == 7 - lambda.2_1.g2 - lambda.3_1.g2 - lambda.4_1.g2 - lambda.5_1.g2 - lambda.6_1.g2 - lambda.7_1.g2
nu.1.g1 == 7 - nu.2.g1 - nu.3.g1 - nu.4.g1 - nu.5.g1 - nu.6.g1 - nu.7.g1
nu.1.g2 == 7 - nu.2.g2 - nu.3.g2 - nu.4.g2 - nu.5.g2 - nu.6.g2 - nu.7.g2
lambda.8_2.g1 == 7 - lambda.9_2.g1 - lambda.10_2.g1 - lambda.11_2.g1 - lambda.12_2.g1 - lambda.13_2.g1 - lambda.14_2.g1
lambda.8_2.g2 == 7 - lambda.9_2.g2 - lambda.10_2.g2 - lambda.11_2.g2 - lambda.12_2.g2 - lambda.13_2.g2 - lambda.14_2.g2
nu.8.g1 == 7 - nu.9.g1 - nu.10.g1 - nu.11.g1 - nu.12.g1 - nu.13.g1 - nu.14.g1
nu.8.g2 == 7 - nu.9.g2 - nu.10.g2 - nu.11.g2 - nu.12.g2 - nu.13.g2 - nu.14.g2
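As a quick sanity check on what these constraints do, the sketch below (an illustration I am adding here, using lavaan’s parameterEstimates()) verifies that the seven loadings of each factor average 1 within each group, as implied by the MODEL CONSTRAINTS block above:
# Average anxiety loading per group; the effect-coding constraint
# forces the seven loadings to sum to 7 (i.e., a mean of 1):
pe <- lavaan::parameterEstimates(fit.config)
aggregate(est ~ group, data = subset(pe, op == "=~" & lhs == "anxiety"),
          FUN = mean)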
To see if the configural model is tenable, we will first inspect indices of global (exact and approximate) fit:
summary(fit.config, fit.measures = TRUE, estimates = FALSE)
lavaan 0.6-19 ended normally after 55 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 98
Number of equality constraints 8
Number of observations per group:
EnglishNative 500
ELL 500
Model Test User Model:
Standard Scaled
Test Statistic 373.938 315.353
Degrees of freedom 148 148
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.186
Yuan-Bentler correction (Mplus variant)
Test statistic for each group:
EnglishNative 189.027 189.027
ELL 126.326 126.326
Model Test Baseline Model:
Test statistic 6539.400 5341.435
Degrees of freedom 182 182
P-value 0.000 0.000
Scaling correction factor 1.224
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.964 0.968
Tucker-Lewis Index (TLI) 0.956 0.960
Robust Comparative Fit Index (CFI) 0.969
Robust Tucker-Lewis Index (TLI) 0.961
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -17763.167 -17763.167
Scaling correction factor 0.893
for the MLR correction
Loglikelihood unrestricted model (H1) -17576.198 -17576.198
Scaling correction factor 1.105
for the MLR correction
Akaike (AIC) 35706.333 35706.333
Bayesian (BIC) 36148.031 36148.031
Sample-size adjusted Bayesian (SABIC) 35862.186 35862.186
Root Mean Square Error of Approximation:
RMSEA 0.055 0.048
90 Percent confidence interval - lower 0.048 0.041
90 Percent confidence interval - upper 0.062 0.054
P-value H_0: RMSEA <= 0.050 0.104 0.719
P-value H_0: RMSEA >= 0.080 0.000 0.000
Robust RMSEA 0.052
90 Percent confidence interval - lower 0.044
90 Percent confidence interval - upper 0.060
P-value H_0: Robust RMSEA <= 0.050 0.345
P-value H_0: Robust RMSEA >= 0.080 0.000
Standardized Root Mean Square Residual:
SRMR 0.032 0.032
The Chi-square test is significant, but the approximate fit indices look OK. Our sample is relatively large, so we will look at the local fit indices to see if any remaining misfit is trivial, or if there are major issues:
residuals(fit.config, type = "cor.bollen")
$EnglishNative
$EnglishNative$type
[1] "cor.bollen"
$EnglishNative$cov
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8
q2 0.000
q4 0.081 0.000
q7 0.059 0.020 0.000
q9 -0.023 -0.056 0.015 0.000
q15 -0.053 0.019 -0.011 -0.004 0.000
q19 0.057 0.003 0.076 -0.037 -0.026 0.000
q20 -0.051 0.023 -0.031 0.039 0.008 -0.023 0.000
q1 -0.010 -0.050 -0.036 -0.018 0.003 0.019 -0.014 0.000
q6 0.001 -0.044 -0.035 0.040 0.011 -0.008 -0.023 -0.033 0.000
q8 0.032 0.019 0.103 0.060 0.092 0.045 0.076 -0.028 -0.024 0.000
q11 0.018 -0.051 -0.073 0.009 -0.024 -0.023 -0.005 -0.012 0.056 -0.034
q12 -0.023 -0.012 -0.050 -0.035 0.022 -0.024 -0.043 0.092 -0.013 -0.006
q14 0.065 -0.030 -0.001 -0.013 -0.031 0.033 -0.017 0.031 0.001 -0.089
q18 0.051 0.021 0.002 0.019 -0.048 0.016 -0.003 -0.037 0.054 -0.021
q11 q12 q14 q18
q2
q4
q7
q9
q15
q19
q20
q1
q6
q8
q11 0.000
q12 -0.017 0.000
q14 0.096 -0.027 0.000
q18 0.006 -0.011 0.036 0.000
$EnglishNative$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11 q12 q14 q18
0 0 0 0 0 0 0 0 0 0 0 0 0 0
$ELL
$ELL$type
[1] "cor.bollen"
$ELL$cov
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8
q2 0.000
q4 0.076 0.000
q7 0.035 0.009 0.000
q9 -0.015 -0.032 -0.027 0.000
q15 -0.040 0.017 0.011 0.035 0.000
q19 0.030 0.007 0.041 -0.073 0.008 0.000
q20 0.002 -0.048 -0.017 0.017 0.009 -0.040 0.000
q1 -0.020 -0.011 -0.029 -0.055 -0.045 0.007 -0.023 0.000
q6 0.003 0.001 -0.021 -0.005 -0.019 -0.013 0.004 0.038 0.000
q8 0.037 0.038 0.078 0.077 0.066 0.069 0.037 -0.055 -0.047 0.000
q11 0.030 -0.021 -0.058 -0.008 -0.016 0.018 0.028 0.032 -0.024 -0.014
q12 -0.034 -0.001 -0.028 -0.023 -0.022 0.037 0.001 0.066 -0.012 -0.008
q14 -0.020 -0.002 -0.005 0.002 -0.053 -0.020 -0.009 0.027 0.032 -0.006
q18 0.015 0.006 0.019 0.023 -0.047 0.044 0.020 -0.036 0.086 -0.036
q11 q12 q14 q18
q2
q4
q7
q9
q15
q19
q20
q1
q6
q8
q11 0.000
q12 -0.001 0.000
q14 -0.002 -0.001 0.000
q18 0.025 -0.034 0.004 0.000
$ELL$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11 q12 q14 q18
0 0 0 0 0 0 0 0 0 0 0 0 0 0
Overall, the correlation residuals look good. Only the correlation residual of q7 and q8 was > .10 (it was .103) in the English native speaker group. There is no a priori justification for why these two items, which load onto different factors, would share an omitted cause (that could be accommodated via a residual covariance). In addition, the correlation residual was relatively close to .10, indicating that adding the residual covariance would likely have only a small impact on global model fit. Based on these results, we will retain the configural invariance model and examine weak (metric) invariance.
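As an aside, if you prefer to scan these residual matrices programmatically rather than by eye, here is a short sketch (it assumes only the list structure of the residuals() output shown above; each flagged off-diagonal pair appears twice because the matrix is symmetric):
# Flag any |correlation residual| > .10, per group:
res <- residuals(fit.config, type = "cor.bollen")
lapply(res, function(g) which(abs(g$cov) > .10, arr.ind = TRUE))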
Parameter Estimate Interpretation
Using the function below, we can compare the estimates across groups and see that all the measurement-related parameters are still varying across groups (as we have not included any equality constraints yet).
group_by_groups(fit.config)
lhs op rhs est_EnglishNative est_ELL
1 anxiety =~ q15 1.183 1.207
2 anxiety =~ q19 0.902 0.964
3 anxiety =~ q2 0.781 0.817
4 anxiety =~ q20 1.102 1.162
5 anxiety =~ q4 1.015 0.904
6 anxiety =~ q7 0.935 0.906
7 anxiety =~ q9 1.082 1.041
8 stress =~ q1 1.063 0.996
9 stress =~ q11 0.997 1.011
10 stress =~ q12 1.097 1.090
11 stress =~ q14 0.834 0.927
12 stress =~ q18 0.893 0.819
13 stress =~ q6 1.037 1.052
14 stress =~ q8 1.081 1.105
15 anxiety ~~ anxiety 0.560 0.494
16 anxiety ~~ stress 0.508 0.454
17 q1 ~~ q1 0.503 0.434
18 q11 ~~ q11 0.541 0.501
19 q12 ~~ q12 0.483 0.499
20 q14 ~~ q14 0.695 0.568
21 q15 ~~ q15 0.400 0.403
22 q18 ~~ q18 0.804 0.749
23 q19 ~~ q19 0.817 0.755
24 q2 ~~ q2 1.036 0.886
25 q20 ~~ q20 0.616 0.486
26 q4 ~~ q19 0.210 0.182
27 q4 ~~ q4 0.636 0.672
28 q4 ~~ q7 0.028 0.136
29 q6 ~~ q6 0.538 0.562
30 q7 ~~ q7 0.649 0.684
31 q8 ~~ q8 0.571 0.476
32 q9 ~~ q9 0.660 0.604
33 stress ~~ stress 0.569 0.505
34 anxiety ~1 1.308 1.180
35 q1 ~1 0.880 0.886
36 q11 ~1 1.135 0.922
37 q12 ~1 0.937 0.905
38 q14 ~1 1.043 1.029
39 q15 ~1 0.879 0.662
40 q18 ~1 1.094 1.439
41 q19 ~1 1.117 1.047
42 q2 ~1 1.121 1.312
43 q20 ~1 0.925 0.900
44 q4 ~1 0.764 0.836
45 q6 ~1 1.038 1.005
46 q7 ~1 0.869 0.851
47 q8 ~1 0.874 0.814
48 q9 ~1 1.324 1.392
49 stress ~1 1.549 1.351
7.5 Weak (Metric) Invariance
This is where the semTools function really comes in handy. All we need to do to estimate the Weak Invariance model is change the arguments of the function a little bit:
fit.weak <- measEq.syntax(configural.model = cfa_config,
                          data = DASS21,
                          group = "engnat",
                          ID.fac = "effects",
                          meanstructure = TRUE,
                          return.fit = TRUE,
                          estimator = "mlr",
                          group.equal = "loadings")
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.weak@call$model))
In the model syntax above (if you remove the # and run the line of code), you can see that each factor loading now has the same label across the two groups, ensuring that they are constrained to be equal.
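You can also confirm this without printing the full syntax; a brief sketch using lavaan’s parTable():
# Each loading now carries the same label in group 1 and group 2,
# which is how lavaan enforces the equality constraints:
pt <- lavaan::parTable(fit.weak)
head(pt[pt$op == "=~", c("lhs", "rhs", "group", "label")], 8)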
Model Comparison
To test if we can retain the Weak Invariance model, we compare its fit to that of the Configural Invariance model:
comp_12 <- compareFit(fit.config, fit.weak)
summary(comp_12)
################### Nested Model Comparison #########################
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.config 148 35706 36148 373.94
fit.weak 160 35691 36074 382.85 11.021 12 0.5271
####################### Model Fit Indices ###########################
chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.config 315.353† 148 .000 .052 .969
fit.weak 330.756 160 .000 .050† .969†
tli.robust srmr aic bic
fit.config .961 .032† 35706.333 36148.031
fit.weak .964† .036 35691.248† 36074.053†
################## Differences in Fit Indices #######################
df.scaled rmsea.robust cfi.robust tli.robust srmr
fit.weak - fit.config 12 -0.002 0 0.003 0.004
aic bic
fit.weak - fit.config -15.086 -73.979
What does the Chi-square difference test result tell us?
Parameter Estimate Interpretation
Using the function below, we can compare the estimates across groups and see that all the loadings are now exactly the same (because we constrained them to be). Other parameters are still varying across groups.
group_by_groups(fit.weak)
lhs op rhs est_EnglishNative est_ELL
1 anxiety =~ q15 1.192 1.192
2 anxiety =~ q19 0.930 0.930
3 anxiety =~ q2 0.798 0.798
4 anxiety =~ q20 1.130 1.130
5 anxiety =~ q4 0.964 0.964
6 anxiety =~ q7 0.925 0.925
7 anxiety =~ q9 1.061 1.061
8 stress =~ q1 1.028 1.028
9 stress =~ q11 1.004 1.004
10 stress =~ q12 1.093 1.093
11 stress =~ q14 0.882 0.882
12 stress =~ q18 0.857 0.857
13 stress =~ q6 1.044 1.044
14 stress =~ q8 1.093 1.093
15 anxiety ~~ anxiety 0.559 0.499
16 anxiety ~~ stress 0.508 0.456
17 q1 ~~ q1 0.508 0.430
18 q11 ~~ q11 0.539 0.502
19 q12 ~~ q12 0.486 0.498
20 q14 ~~ q14 0.692 0.574
21 q15 ~~ q15 0.400 0.403
22 q18 ~~ q18 0.808 0.747
23 q19 ~~ q19 0.814 0.758
24 q2 ~~ q2 1.036 0.886
25 q20 ~~ q20 0.613 0.491
26 q4 ~~ q19 0.211 0.179
27 q4 ~~ q4 0.645 0.666
28 q4 ~~ q7 0.033 0.132
29 q6 ~~ q6 0.537 0.562
30 q7 ~~ q7 0.651 0.681
31 q8 ~~ q8 0.570 0.479
32 q9 ~~ q9 0.663 0.601
33 stress ~~ stress 0.569 0.505
34 anxiety ~1 1.308 1.180
35 q1 ~1 0.934 0.842
36 q11 ~1 1.123 0.931
37 q12 ~1 0.944 0.901
38 q14 ~1 0.968 1.090
39 q15 ~1 0.867 0.679
40 q18 ~1 1.149 1.388
41 q19 ~1 1.080 1.087
42 q2 ~1 1.099 1.335
43 q20 ~1 0.888 0.937
44 q4 ~1 0.831 0.765
45 q6 ~1 1.027 1.015
46 q7 ~1 0.882 0.828
47 q8 ~1 0.856 0.831
48 q9 ~1 1.353 1.369
49 stress ~1 1.549 1.351
When we retain the Weak Invariance model, we can conclude that the measured concepts (Anxiety and Stress) have the same meaning across groups, because the item-factor associations can be considered equivalent (the same amount of common variance is extracted across groups).
7.6 Strong (Scalar) Invariance
Again, all we need to do to estimate the Strong Invariance model is change the arguments of the function a little bit:
fit.strong <- measEq.syntax(configural.model = cfa_config,
                            data = DASS21,
                            group = "engnat",
                            ID.fac = "effects",
                            meanstructure = TRUE,
                            return.fit = TRUE,
                            estimator = "mlr",
                            group.equal = c("loadings", "intercepts"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.strong@call$model))
Model Comparison
To test if we can retain the Strong Invariance model, we compare its fit to that of the Weak Invariance model:
comp_23 <- compareFit(fit.weak, fit.strong)
summary(comp_23)
################### Nested Model Comparison #########################
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 160 35691 36074 382.85
fit.strong 172 35748 36072 463.60 81.276 12 2.356e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.weak 330.756† 160 .000 .050† .969†
fit.strong 404.514 172 .000 .056 .958
tli.robust srmr aic bic
fit.weak .964† .036† 35691.248† 36074.053
fit.strong .955 .042 35747.993 36071.905†
################## Differences in Fit Indices #######################
df.scaled rmsea.robust cfi.robust tli.robust srmr aic
fit.strong - fit.weak 12 0.006 -0.011 -0.009 0.006 56.745
bic
fit.strong - fit.weak -2.148
What does the Chi-square difference test result tell us?
Local Fit Comparison
Since the fit worsened significantly when testing Strong Invariance, we will focus on problematic residuals in the mean vectors, because constraining the intercepts affects model fit by changing the model-implied means.
residuals(fit.strong, type = "cor.bollen")$EnglishNative$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
-0.119 0.014 0.009 -0.023 0.067 -0.017 -0.042 0.032 -0.006 -0.002 0.080
q12 q14 q18
0.006 -0.074 -0.118
residuals(fit.strong, type = "cor.bollen")$ELL$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
0.108 -0.018 -0.012 0.022 -0.069 0.017 0.035 -0.030 0.007 0.002 -0.078
q12 q14 q18
-0.007 0.064 0.116
You will notice that the mean residual for a given indicator is negative in one group and positive in the other. When we constrain an intercept to be equal across groups, the estimate tends to land on a compromise between the two group-specific intercepts. Thus, one group will have an observed mean above the model-implied mean while the other has an observed mean below it.
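You can see this mirrored pattern directly by binding the two groups’ mean residuals together (a small convenience sketch):
# Side-by-side mean residuals; note the opposite signs per item:
res_strong <- residuals(fit.strong, type = "cor.bollen")
round(cbind(EnglishNative = res_strong$EnglishNative$mean,
            ELL = res_strong$ELL$mean), 3)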
Which intercepts appear to be different across groups?
7.7 Partial Strong Invariance (Round 1)
Since the model fit indicated some issues with retaining the Strong Invariance model, we will now examine Partial Strong Invariance. Remember what we already saw in the local fit evaluation of the mean residuals. Next, we will use the backwards MI approach, using modification indices to evaluate which equality constraint would improve fit most if removed from the model. When working with equality constraints, we cannot use the modificationindices() function. Instead, we need to use the lavTestScore() function. The output provides the same information, but in a less accessible format. In the code below, I am excluding the first four rows of output (by asking for 5:32) to hide the effect-coding equality constraints, which take up a lot of space. You can omit the [5:32,] part.
lavTestScore(fit.strong)$uni[5:32,]
univariate score tests:
lhs op rhs X2 df p.value
5 .p1. == .p50. 17.694 1 0.000
6 .p2. == .p51. 1.809 1 0.179
7 .p3. == .p52. 0.150 1 0.698
8 .p4. == .p53. 0.708 1 0.400
9 .p5. == .p54. 16.959 1 0.000
10 .p6. == .p55. 1.459 1 0.227
11 .p7. == .p56. 4.825 1 0.028
12 .p8. == .p57. 3.574 1 0.059
13 .p9. == .p58. 0.147 1 0.701
14 .p10. == .p59. 0.035 1 0.852
15 .p11. == .p60. 13.575 1 0.000
16 .p12. == .p61. 0.131 1 0.717
17 .p13. == .p62. 10.311 1 0.001
18 .p14. == .p63. 18.197 1 0.000
19 .p15. == .p64. 18.309 1 0.000
20 .p16. == .p65. 0.738 1 0.390
21 .p17. == .p66. 0.116 1 0.734
22 .p18. == .p67. 1.210 1 0.271
23 .p19. == .p68. 19.397 1 0.000
24 .p20. == .p69. 0.813 1 0.367
25 .p21. == .p70. 4.146 1 0.042
26 .p22. == .p71. 2.654 1 0.103
27 .p23. == .p72. 0.109 1 0.741
28 .p24. == .p73. 0.007 1 0.932
29 .p25. == .p74. 15.167 1 0.000
30 .p26. == .p75. 0.126 1 0.723
31 .p27. == .p76. 8.698 1 0.003
32 .p28. == .p77. 22.118 1 0.000
Releasing the equality constraint of .p28 and .p77 is associated with the largest expected change in the model Chi-square test. But what are these parameters? We can add some arguments to the lavTestScore() function to get a table with the expected parameter changes for released constraints, which has more information about the specific parameters. I will filter the output to the intercepts only, so it is not too long:
lavTestScore(fit.strong, epc = TRUE, standardized = FALSE)$epc %>%
filter(str_detect(label, "nu."))
Warning: lavaan->lavTestScore():
se is not `standard'; not implemented yet; falling back to ordinary score
test
expected parameter changes (epc) and expected parameter values (epv):
lhs op rhs block group free label plabel est epc epv
1 q2 ~1 1 1 15 nu.1 .p15. 0.474 -0.136 0.339
2 q4 ~1 1 1 16 nu.2 .p16. -0.180 -0.083 -0.263
3 q7 ~1 1 1 17 nu.3 .p17. -0.082 0.005 -0.077
4 q9 ~1 1 1 18 nu.4 .p18. 0.301 -0.074 0.226
5 q15 ~1 1 1 19 nu.5 .p19. -0.457 0.147 -0.310
6 q19 ~1 1 1 20 nu.6 .p20. 0.152 0.054 0.205
7 q20 ~1 1 1 21 nu.7 .p21. -0.207 0.021 -0.186
8 q1 ~1 1 1 22 nu.8 .p22. -0.171 -0.025 -0.196
9 q6 ~1 1 1 23 nu.9 .p23. -0.027 0.018 -0.009
10 q8 ~1 1 1 24 nu.10 .p24. -0.260 0.039 -0.221
11 q11 ~1 1 1 25 nu.11 .p25. -0.031 0.162 0.131
12 q12 ~1 1 1 26 nu.12 .p26. -0.187 0.018 -0.169
13 q14 ~1 1 1 27 nu.13 .p27. 0.189 0.017 0.205
14 q18 ~1 1 1 28 nu.14 .p28. 0.488 -0.292 0.196
15 q2 ~1 2 2 64 nu.1 .p64. 0.474 0.024 0.498
16 q4 ~1 2 2 65 nu.2 .p65. -0.180 0.116 -0.064
17 q7 ~1 2 2 66 nu.3 .p66. -0.082 0.031 -0.050
18 q9 ~1 2 2 67 nu.4 .p67. 0.301 0.055 0.356
19 q15 ~1 2 2 68 nu.5 .p68. -0.457 -0.081 -0.538
20 q19 ~1 2 2 69 nu.6 .p69. 0.152 -0.066 0.085
21 q20 ~1 2 2 70 nu.7 .p70. -0.207 -0.048 -0.255
22 q1 ~1 2 2 71 nu.8 .p71. -0.171 0.065 -0.107
23 q6 ~1 2 2 72 nu.9 .p72. -0.027 -0.019 -0.046
24 q8 ~1 2 2 73 nu.10 .p73. -0.260 -0.029 -0.289
25 q11 ~1 2 2 74 nu.11 .p74. -0.031 -0.053 -0.084
26 q12 ~1 2 2 75 nu.12 .p75. -0.187 0.000 -0.187
27 q14 ~1 2 2 76 nu.13 .p76. 0.189 -0.085 0.104
28 q18 ~1 2 2 77 nu.14 .p77. 0.488 0.139 0.627
From this table, we can see that .p28 and .p77 are the parameter labels for the intercept of q18, which also had a larger mean residual. This question is worded as follows: ‘I felt that I was rather touchy’. The output above tells us what the intercept for this item would be if it were freely estimated in each group (in the epv [expected parameter value] column).
From this information, we can see that the intercept for English Native speakers would decrease from .488 to .196. This indicates that, for English Native speakers with a true value of 0 on Stress, their expected response to this item is close to 0 (or the average intercept across all indicators of Stress).
In contrast, for the ELL group, we can see that the intercept would increase from .488 to .627. This indicates that, for ELL speakers with a true value of 0 on Stress, their expected response to this item is further away from 0 (so higher than the average intercept of all indicators of Stress).
In other words, when English Native speakers and ELL speakers experience the same underlying level of stress, ELL speakers choose a higher response option on ‘I felt that I was rather touchy’.
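If you only care about a single item, you can also pull its rows out of the EPC table directly (a sketch; the column names follow the lavTestScore() output printed above):
# Both groups' rows for the q18 intercept: current estimate (est),
# expected change if freed (epc), and expected value if freed (epv):
epc_tab <- lavTestScore(fit.strong, epc = TRUE, standardized = FALSE)$epc
subset(epc_tab, lhs == "q18" & op == "~1",
       select = c("group", "label", "est", "epc", "epv"))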
To test if releasing this constraint improves the fit of the Partial Strong Model sufficiently, we need to estimate the partial model. To do so, we can use the group.partial argument in the measEq.syntax() function:
fit.partial1 <- measEq.syntax(configural.model = cfa_config,
                              data = DASS21,
                              group = "engnat",
                              ID.fac = "effects",
                              meanstructure = TRUE,
                              return.fit = TRUE,
                              estimator = "mlr",
                              group.equal = c("loadings", "intercepts"),
                              group.partial = "q18 ~ 1")
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial1@call$model))
Model Comparison
Let’s compare the fit of this model to the Weak Invariance model:
comp_23b <- compareFit(fit.weak, fit.partial1)
summary(comp_23b)
################### Nested Model Comparison #########################
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 160 35691 36074 382.85
fit.partial1 171 35728 36056 441.15 58.446 11 1.8e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.weak 330.756† 160 .000 .050† .969†
fit.partial1 384.540 171 .000 .054 .961
tli.robust srmr aic bic
fit.weak .964† .036† 35691.248† 36074.053
fit.partial1 .959 .041 35727.542 36056.361†
################## Differences in Fit Indices #######################
df.scaled rmsea.robust cfi.robust tli.robust srmr
fit.partial1 - fit.weak 11 0.004 -0.007 -0.006 0.004
aic bic
fit.partial1 - fit.weak 36.294 -17.692
What do the results indicate?
Local Fit Evaluation
Since the fit still worsened significantly relative to the Weak Invariance model, we will again focus on problematic residuals in the mean vectors.
residuals(fit.partial1, type = "cor.bollen")$EnglishNative$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
-0.119 0.014 0.009 -0.023 0.067 -0.017 -0.041 0.021 -0.017 -0.013 0.069
q12 q14 q18
-0.004 -0.083 0.000
residuals(fit.partial1, type = "cor.bollen")$ELL$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
0.108 -0.018 -0.013 0.022 -0.069 0.017 0.035 -0.019 0.018 0.012 -0.067
q12 q14 q18
0.005 0.072 0.000
What do you think the next indicator is that we need to focus on?
7.8 Partial Strong Invariance (Round 2)
Since the model fit indicated some remaining issues with retaining the Partial Strong Invariance model, we will do another round of partial invariance testing:
lavTestScore(fit.partial1)$uni[5:31,]
Warning: lavaan->lavTestScore():
se is not `standard'; not implemented yet; falling back to ordinary score
test
univariate score tests:
lhs op rhs X2 df p.value
5 .p1. == .p50. 17.693 1 0.000
6 .p2. == .p51. 1.813 1 0.178
7 .p3. == .p52. 0.151 1 0.697
8 .p4. == .p53. 0.706 1 0.401
9 .p5. == .p54. 16.953 1 0.000
10 .p6. == .p55. 1.459 1 0.227
11 .p7. == .p56. 4.822 1 0.028
12 .p8. == .p57. 1.808 1 0.179
13 .p9. == .p58. 0.825 1 0.364
14 .p10. == .p59. 0.567 1 0.451
15 .p11. == .p60. 10.119 1 0.001
16 .p12. == .p61. 0.049 1 0.825
17 .p13. == .p62. 12.957 1 0.000
18 .p14. == .p63. 0.903 1 0.342
19 .p15. == .p64. 18.311 1 0.000
20 .p16. == .p65. 0.739 1 0.390
21 .p17. == .p66. 0.116 1 0.733
22 .p18. == .p67. 1.208 1 0.272
23 .p19. == .p68. 19.398 1 0.000
24 .p20. == .p69. 0.812 1 0.367
25 .p21. == .p70. 4.142 1 0.042
26 .p22. == .p71. 1.116 1 0.291
27 .p23. == .p72. 0.761 1 0.383
28 .p24. == .p73. 0.448 1 0.503
29 .p25. == .p74. 11.478 1 0.001
30 .p26. == .p75. 0.063 1 0.802
31 .p27. == .p76. 11.226 1 0.001
Releasing the equality constraint of .p19 and .p68 is associated with the largest expected change in the model Chi-square test (19.398), followed closely by the constraint of .p15 and .p64 (18.311). Let’s look at the alternative lavTestScore() output to find out which intercepts these are.
lavTestScore(fit.partial1, epc = TRUE, standardized = FALSE)$epc %>%
filter(str_detect(label, "nu."))
Warning: lavaan->lavTestScore():
se is not `standard'; not implemented yet; falling back to ordinary score
test
expected parameter changes (epc) and expected parameter values (epv):
lhs op rhs block group free label plabel est epc epv
1 q2 ~1 1 1 15 nu.1 .p15. 0.474 -0.136 0.338
2 q4 ~1 1 1 16 nu.2 .p16. -0.181 -0.083 -0.263
3 q7 ~1 1 1 17 nu.3 .p17. -0.082 0.005 -0.077
4 q9 ~1 1 1 18 nu.4 .p18. 0.300 -0.074 0.226
5 q15 ~1 1 1 19 nu.5 .p19. -0.456 0.147 -0.309
6 q19 ~1 1 1 20 nu.6 .p20. 0.151 0.054 0.205
7 q20 ~1 1 1 21 nu.7 .p21. -0.207 0.022 -0.186
8 q1 ~1 1 1 22 nu.8 .p22. -0.089 -0.048 -0.137
9 q6 ~1 1 1 23 nu.9 .p23. 0.054 -0.008 0.046
10 q8 ~1 1 1 24 nu.10 .p24. -0.171 0.010 -0.161
11 q11 ~1 1 1 25 nu.11 .p25. 0.044 0.139 0.184
12 q12 ~1 1 1 26 nu.12 .p26. -0.099 -0.010 -0.109
13 q14 ~1 1 1 27 nu.13 .p27. 0.262 -0.012 0.249
14 q18 ~1 1 1 28 nu.14.g1 .p28. 0.327 -0.088 0.239
15 q2 ~1 2 2 64 nu.1 .p64. 0.474 0.024 0.498
16 q4 ~1 2 2 65 nu.2 .p65. -0.181 0.116 -0.064
17 q7 ~1 2 2 66 nu.3 .p66. -0.082 0.031 -0.050
18 q9 ~1 2 2 67 nu.4 .p67. 0.300 0.055 0.356
19 q15 ~1 2 2 68 nu.5 .p68. -0.456 -0.081 -0.537
20 q19 ~1 2 2 69 nu.6 .p69. 0.151 -0.066 0.085
21 q20 ~1 2 2 70 nu.7 .p70. -0.207 -0.048 -0.255
22 q1 ~1 2 2 71 nu.8 .p71. -0.089 0.068 -0.021
23 q6 ~1 2 2 72 nu.9 .p72. 0.054 -0.012 0.042
24 q8 ~1 2 2 73 nu.10 .p73. -0.171 -0.022 -0.194
25 q11 ~1 2 2 74 nu.11 .p74. 0.044 -0.044 0.000
26 q12 ~1 2 2 75 nu.12 .p75. -0.099 0.007 -0.093
27 q14 ~1 2 2 76 nu.13 .p76. 0.262 -0.078 0.183
28 q18 ~1 2 2 77 nu.14.g2 .p77. 0.607 0.085 0.692
From this table, we can see that .p19 and .p68 are the parameter labels for the intercept of q15. This question did not necessarily have a large mean residual (that was q2, which is associated with the second-largest drop in Chi-square based on the table above). However, if we look at the epv column, we can see that the intercepts are expected to be more different across groups for q15 than for q2.
Item q15 is worded as follows: ‘I felt I was close to panic’, and if released, the intercept would be lower for ELL speakers than for English Native speakers. This means that at a true level of 0 on Anxiety, ELL speakers report fewer feelings of ‘panic’ than English Native speakers.
Item q2 is worded as follows: ‘I was aware of dryness in my mouth’, and if released, the intercept would be higher for ELL speakers than for English Native speakers. This means that at a true level of 0 on Anxiety, ELL speakers report more experiences of ‘dry mouth’ than English Native speakers.
I don’t have much theoretical justification for freeing either of these intercepts. Maybe they are both somewhat strange statements for ELL speakers? We are definitely moving into a more exploratory mode of invariance testing here. Given that the expected changes in fit for these two options are very similar, but the larger mean residual was associated with q2, I’ve decided to release the intercept of q2 next. Note that you might make a different decision.
To test if releasing this constraint improves the fit of the Partial Strong Model sufficiently, we need to estimate a second partial model. To do so, we can use the group.partial argument in the measEq.syntax() function and add this second intercept:
fit.partial2 <- measEq.syntax(configural.model = cfa_config,
                              data = DASS21,
                              group = "engnat",
                              ID.fac = "effects",
                              meanstructure = TRUE,
                              return.fit = TRUE,
                              estimator = "mlr",
                              group.equal = c("loadings", "intercepts"),
                              group.partial = c("q18~1", "q2~1"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial2@call$model))
Let’s compare the fit of this model to the Weak Invariance model:
comp_23c <- compareFit(fit.weak, fit.partial2)
summary(comp_23c)
################### Nested Model Comparison #########################
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 160 35691 36074 382.85
fit.partial2 170 35711 36045 422.62 39.744 10 1.88e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.weak 330.756† 160 .000 .050† .969†
fit.partial2 368.046 170 .000 .052 .964
tli.robust srmr aic bic
fit.weak .964† .036† 35691.248† 36074.053
fit.partial2 .961 .039 35711.011 36044.738†
################## Differences in Fit Indices #######################
df.scaled rmsea.robust cfi.robust tli.robust srmr
fit.partial2 - fit.weak 10 0.002 -0.005 -0.003 0.003
aic bic
fit.partial2 - fit.weak 19.763 -29.314
What do the results indicate?
Local Fit Evaluation
Again, we will focus on the residuals of the mean vectors.
residuals(fit.partial2, type = "cor.bollen")$EnglishNative$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
0.000 0.006 0.001 -0.032 0.056 -0.025 -0.052 0.021 -0.017 -0.013 0.069
q12 q14 q18
-0.004 -0.083 0.000
residuals(fit.partial2, type = "cor.bollen")$ELL$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
0.000 -0.008 -0.003 0.032 -0.058 0.025 0.044 -0.019 0.018 0.012 -0.067
q12 q14 q18
0.005 0.072 0.000
This time, we don’t see any problems with the mean vector in terms of local fit.
CFI and RMSEA Comparison
Next, we will examine whether the changes in CFI and RMSEA stay below our criteria (a decrease in CFI of ≥ 0.010 or an increase in RMSEA of ≥ 0.015 would lead to rejection):
# Change in CFI
cat(paste0("Change in CFI: ", comp_23c@fit.diff$cfi.robust))
Change in CFI: -0.00471114055415356
# Change in RMSEA
cat(paste0("Change in RMSEA: ", comp_23c@fit.diff$rmsea.robust))
Change in RMSEA: 0.00201878813551178
These are much smaller than our cutoff criteria (see the statistical decision criteria above). The RMSEA change is also smaller than stricter cutoffs that have been suggested (e.g., an increase in RMSEA of ≥ 0.010), although the CFI change does exceed the strictest suggested cutoff (a decrease of ≥ 0.002). Based on this information, I will retain this partial strong invariance model with two freely estimated intercepts.
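Using the fit_change() helper sketched in Section 7.3, the same comparison can be pulled straight from the fitted objects:
# Should reproduce the robust CFI/RMSEA changes reported above:
fit_change(fit.weak, fit.partial2)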
7.9 Parameter Comparison Across Groups
Now that we have finalized the invariance testing procedure (note that you could test higher levels of invariance, such as strict invariance of the residual variances, but these offer few practical benefits; a sketch is shown below), we can continue to compare latent factor variances, covariances, and means across groups.
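For completeness, here is a sketch of what that next step, strict (residual) invariance, would look like if you did want to test it; the pattern is the same, adding "residuals" to group.equal:
fit.strict <- measEq.syntax(configural.model = cfa_config,
                            data = DASS21,
                            group = "engnat",
                            ID.fac = "effects",
                            meanstructure = TRUE,
                            return.fit = TRUE,
                            estimator = "mlr",
                            group.equal = c("loadings", "intercepts",
                                            "residuals"),
                            group.partial = c("q18~1", "q2~1"))
# compareFit(fit.partial2, fit.strict) would then test it against
# the partial strong model, just as before.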
Factor Variances
First, we will test if the factor variances of the Anxiety and Stress factors are equivalent across groups. We will do so by adding lv.variances to the group.equal argument of the previous partial strong invariance model.
fit.partial2.lv <- measEq.syntax(configural.model = cfa_config,
                                 data = DASS21,
                                 group = "engnat",
                                 ID.fac = "effects",
                                 meanstructure = TRUE,
                                 return.fit = TRUE,
                                 estimator = "mlr",
                                 group.equal = c("loadings", "intercepts",
                                                 "lv.variances"),
                                 group.partial = c("q18~1", "q2~1"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial2.lv@call$model))
We can now compare the fit of this model to the previous partial strong invariance model:
comp_3c4 <- compareFit(fit.partial2, fit.partial2.lv)
summary(comp_3c4)
################### Nested Model Comparison #########################
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.partial2 170 35711 36045 422.62
fit.partial2.lv 172 35709 36033 424.20 2.4853 2 0.2886
####################### Model Fit Indices ###########################
chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.partial2 368.046† 170 .000 .052 .964†
fit.partial2.lv 371.348 172 .000 .051† .964
tli.robust srmr aic bic
fit.partial2 .961 .039† 35711.011 36044.738
fit.partial2.lv .962† .047 35708.592† 36032.504†
################## Differences in Fit Indices #######################
df.scaled rmsea.robust cfi.robust tli.robust
fit.partial2.lv - fit.partial2 2 0 0 0
srmr aic bic
fit.partial2.lv - fit.partial2 0.008 -2.419 -12.234
What does the Chi-square Difference test tell us?
Factor Covariances
Next, we will test if the factor covariance between Anxiety and Stress is equivalent across groups. We will do so by adding lv.covariances to the group.equal argument of the previous partial strong invariance model with equivalent factor variances:
fit.partial2.lcv <- measEq.syntax(configural.model = cfa_config,
                                  data = DASS21,
                                  group = "engnat",
                                  ID.fac = "effects",
                                  meanstructure = TRUE,
                                  return.fit = TRUE,
                                  estimator = "mlr",
                                  group.equal = c("loadings", "intercepts",
                                                  "lv.variances",
                                                  "lv.covariances"),
                                  group.partial = c("q18~1", "q2~1"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial2.lcv@call$model))
We can now compare the fit of this model to the previous partial strong invariance model with equivalent factor variances:
comp_45 <- compareFit(fit.partial2.lv, fit.partial2.lcv)
summary(comp_45)
################### Nested Model Comparison #########################
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.partial2.lv 172 35709 36033 424.20
fit.partial2.lcv 173 35707 36026 424.58 0.27108 1 0.6026
####################### Model Fit Indices ###########################
chisq.scaled df.scaled pvalue.scaled rmsea.robust cfi.robust
fit.partial2.lv 371.348 172 .000 .051 .964
fit.partial2.lcv 371.157† 173 .000 .051† .964†
tli.robust srmr aic bic
fit.partial2.lv .962 .047 35708.592 36032.504
fit.partial2.lcv .962† .046† 35706.978† 36025.982†
################## Differences in Fit Indices #######################
df.scaled rmsea.robust cfi.robust tli.robust
fit.partial2.lcv - fit.partial2.lv 1 0 0 0
srmr aic bic
fit.partial2.lcv - fit.partial2.lv -0.001 -1.614 -6.522
What does the Chi-square Difference test tell us?
Factor Means
Finally, we can compare the Anxiety and Stress latent factor means to each other. We can use the effect size formula from the book chapter to do so (which can be interpreted as a Cohen’s d effect size). For each of the latent factors, we need the latent mean estimates and an estimate of the factor variance (in our case, the factor variance is equivalent across groups, but the book describes options that can be used if the factor variances are not equivalent). We will find these estimates using the group_by_groups() function from the semhelpinghands package:
group_by_groups(fit.partial2.lcv) %>%
  filter((op == "~1" & (lhs == "stress" | lhs == "anxiety")) |
         (op == "~~" & (lhs == "stress" | lhs == "anxiety")))
lhs op rhs est_EnglishNative est_ELL
1 anxiety ~~ anxiety 0.527 0.527
2 anxiety ~~ stress 0.481 0.481
3 stress ~~ stress 0.535 0.535
4 anxiety ~1 2.269 2.088
5 stress ~1 2.506 2.261
Here are the computations of the effect size for the mean difference in Anxiety:
# mean in English Native Speaker: 2.269
# mean in ELL Speaker: 2.088
# variance: 0.527
(2.269 - 2.088) / sqrt(0.527)
[1] 0.2493293
This indicates that the mean of the Anxiety factor of the English Native Speaker sample is about 0.25 standard deviations higher than that of the ELL Speaker sample.
Here are the computations of the effect size for the mean difference in Stress:
# mean in English Native Speaker: 2.506
# mean in ELL Speaker: 2.261
# variance: 0.535
(2.506 - 2.261) / sqrt(0.535)
[1] 0.3349571
This indicates that the mean of the Stress factor of the English Native Speaker sample is about 0.33 standard deviations higher than that of the ELL Speaker sample.
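Rather than copying the numbers by hand, you can also compute both effect sizes straight from the fitted object. Here is a sketch using lavaan’s parameterEstimates(), assuming rows are ordered group 1 then group 2 (as in the output above):
# Latent Cohen's d: difference in latent means over the latent SD
# (the factor variance is constrained equal across groups here):
d_latent <- function(fit, fac) {
  pe <- lavaan::parameterEstimates(fit)
  means <- subset(pe, op == "~1" & lhs == fac)$est              # g1, g2
  v <- subset(pe, op == "~~" & lhs == fac & rhs == fac)$est[1]  # shared variance
  (means[1] - means[2]) / sqrt(v)
}
d_latent(fit.partial2.lcv, "anxiety")  # ~0.25
d_latent(fit.partial2.lcv, "stress")   # ~0.33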
We can also test whether the mean differences are significant using a Wald test. This test approximates the increase in the model chi-square that would occur if two parameters (in this case, the latent factor means) were constrained to be equivalent.
# H0: Average anxiety is equivalent across
# English Native and ELL speakers
lavTestWald(fit.partial2.lcv, constraints = "alpha.1.g1 == alpha.1.g2")
$stat
[1] 12.99259
$df
[1] 1
$p.value
[1] 0.0003127267
$se
[1] "robust.huber.white"
# H0: Average stress is equivalent across
# English Native and ELL speakers
lavTestWald(fit.partial2.lcv, constraints = "alpha.2.g1 == alpha.2.g2")
$stat
[1] 23.89415
$df
[1] 1
$p.value
[1] 1.017804e-06
$se
[1] "robust.huber.white"
Based on the results above, both tests were significant (p < .05), so the mean differences are significant between groups.
7.10 Conclusion
In this example, we found evidence of configural and weak invariance of the Anxiety and Stress factors from the DASS-21 across English native and ELL speaker samples. The evidence for strong invariance was somewhat less convincing, in that most, but not all, intercepts may be equivalent across groups. This means that the English native and ELL speaker samples did not have the same relative standing on one anxiety and one stress item, given the same position on the underlying conceptual variable (i.e., Anxiety/Stress).
In a paper, you would report all of the above, adding all unstandardized estimates + SEs of the final retained model (in a table).
7.11 Summary
In this R lab, you were introduced to the steps involved in testing measurement invariance with continuous indicators. Below, you’ll find a Bonus section that demonstrates how group mean differences based on the observed item responses differ from what we found using the CFA approach. In the next R lab, you will learn all about structural-after-measurement (SAM) models, which separate the measurement model from the structural model in the estimation steps.
7.12 Bonus: Compare Findings to Mean Difference Based on Observed Mean Scores
The code below shows how you can compare the two groups using mean scores based on the observed items (what you might typically have done), using t-tests. In this case, both Cohen’s d estimates are smaller than the difference estimates based on the latent factors. Thus, if we had ignored the measurement model and the measurement invariance question, we would have concluded that the differences between native English speakers and ELL speakers were smaller.
# Create mean score variables:
DASS21 <- DASS21 %>%
  rowwise() %>%
  mutate(anx_mean = mean(c(q2, q4, q7, q9, q15, q19, q20)),
         str_mean = mean(c(q1, q6, q8, q11, q12, q14, q18)))
# Test significance of mean difference (both significant)
t.test(anx_mean ~ engnat, data = DASS21)
Welch Two Sample t-test
data: anx_mean by engnat
t = 2.5351, df = 994.7, p-value = 0.01139
alternative hypothesis: true difference in means between group EnglishNative and group ELL is not equal to 0
95 percent confidence interval:
0.02891907 0.22708093
sample estimates:
mean in group EnglishNative mean in group ELL
2.307714 2.179714
t.test(str_mean ~ engnat, data = DASS21)
Welch Two Sample t-test
data: str_mean by engnat
t = 3.965, df = 994.77, p-value = 7.864e-05
alternative hypothesis: true difference in means between group EnglishNative and group ELL is not equal to 0
95 percent confidence interval:
0.09971799 0.29513916
sample estimates:
mean in group EnglishNative mean in group ELL
2.548857 2.351429
# Get cohen's D
# (both under-estimated compared to latent mean comparison):
psych::cohen.d(DASS21$anx_mean, group = DASS21$engnat)
Call: psych::cohen.d(x = DASS21$anx_mean, group = DASS21$engnat)
Cohen d statistic of difference between two means
lower effect upper
[1,] -0.28 -0.16 -0.04
Multivariate (Mahalanobis) distance between groups
[1] 0.16
r equivalent of difference between two means
data
-0.08
psych::cohen.d(DASS21$str_mean, group = DASS21$engnat)
Call: psych::cohen.d(x = DASS21$str_mean, group = DASS21$engnat)
Cohen d statistic of difference between two means
lower effect upper
[1,] -0.38 -0.25 -0.13
Multivariate (Mahalanobis) distance between groups
[1] 0.25
r equivalent of difference between two means
data
-0.12