10 Measurement Invariance Testing with Ordinal Indicators
You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: measurementinvariance_ord.R
10.1 Loading R packages
Load the required packages for this lab into your R environment.
library(rio)
library(lavaan)
library(semTools)
library(ggplot2)
library(dplyr)
library(tidyr)
library(semhelpinghands)
library(ggdist)
10.2 Loading Data
Load the data into your environment. For this lab, we will use a dataset of N = 1000 individuals who completed the Depression-Anxiety-Stress Scales (DASS-21), focusing on the Anxiety and Stress subscales. We will test whether there is measurement invariance across participants whose native language is English (coded 1) versus participants who have a different native language/are English language learners (ELL; coded 2).
You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/DASS21.csv. Make sure to save it in the folder you are using for this class.
DASS21 <- import(file = "data/DASS21.csv")
DASS21$engnat <- factor(DASS21$engnat,
                        levels = c("EnglishNative", "ELL"))
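A quick sanity check (optional) confirms that the grouping variable was recoded as intended and that each group has the expected n = 500:
# Optional sanity check: two groups of 500 each
table(DASS21$engnat)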
Here is an overview of the questions on the Anxiety subscale:
- q2: I was aware of dryness in my mouth
- q4: I experienced breathing difficulty (e.g., excessively rapid breathing, breathlessness in the absence of physical exertion)
- q7: I experienced trembling (e.g., in the hands)
- q9: I was worried about situations in which I might panic and make a fool of myself.
- q15: I felt I was close to panic.
- q19: I was aware of the action of my heart in the absence of physical exertion (e.g., sense of heart rate increase, heart missing a beat).
- q20: I felt scared without any good reason.
And the questions on the Stress subscale:
- q1: I found it hard to wind down.
- q6: I tended to over-react to situations.
- q8: I felt that I was using a lot of nervous energy.
- q11: I found myself getting agitated.
- q12: I found it difficult to relax.
- q14: I was intolerant of anything that kept me from getting on with what I was doing.
- q18: I felt that I was rather touchy.
You may remember that we used these data when we talked about measurement invariance with continuous indicators. In reality, the responses to these items are on a four-point Likert-type scale, ranging from Never to Almost Always. As I’ve mentioned before, with fewer than 5 response options, treating the data as though they are continuous is tenuous at best. In addition, the response distributions for some of these items are skewed (see plots below). Thus, in this lab, we will examine whether the levels of measurement invariance can be retained when we properly model the data as ordinal instead of continuous.
DASS21 %>%
  select(q2, q4, q7, q9, q15, q19, q20) %>%
  pivot_longer(cols = q2:q20) %>%
  ggplot(aes(x = value)) +
  geom_bar() +
  facet_wrap(vars(name)) +
  ggtitle("Anxiety Subscale")
DASS21 %>%
  select(q1, q6, q8, q11, q12, q14, q18) %>%
  pivot_longer(cols = q1:q18) %>%
  ggplot(aes(x = value)) +
  geom_bar() +
  facet_wrap(vars(name)) +
  ggtitle("Stress Subscale")
10.3 Statistical Decision Criteria
For this lab, I will use the following statistical decision criteria for the model comparison tests between levels of invariance. These are based on the knowledge that our samples are relatively large (N = 500 per group), and are informed by the textbook and Chen (2007):
- Look at Chi-square Difference test.
- If not significant: retain next level of invariance
- If significant, look at other fit information
- Are there any patterns of problematic correlation residuals > |.10|? Reject next level of invariance, test partial invariance
- Does CFI decrease by more than 0.010, or does RMSEA increase by more than 0.015? Reject next level of invariance, test partial invariance (a small helper sketch for checking these changes follows this list)
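These changes can be read off compareFit() output, or computed with a small helper like the sketch below (delta_fit is hypothetical, not from any package; it assumes the more constrained model is passed second):
# Hypothetical helper: change in scaled CFI and RMSEA between nested fits.
# Reject the added constraints if CFI drops by more than .010 or RMSEA
# rises by more than .015.
delta_fit <- function(fit_less, fit_more) {
  fm1 <- fitMeasures(fit_less, c("cfi.scaled", "rmsea.scaled"))
  fm2 <- fitMeasures(fit_more, c("cfi.scaled", "rmsea.scaled"))
  c(delta_cfi   = unname(fm2["cfi.scaled"]   - fm1["cfi.scaled"]),
    delta_rmsea = unname(fm2["rmsea.scaled"] - fm1["rmsea.scaled"]))
}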
10.4 Configural Invariance
We will specify a two-factor CFA using lavaan syntax. Based on previous research, this two-factor model already includes residual covariances between two pairs of items (q4 with q19, and q4 with q7). What these items have in common is that they focus on physical symptoms of anxiety, which causes them to share more common variance with each other than with the other anxiety items.
cfa_config <- '
anxiety =~ q2 + q4 + q7 + q9 + q15 + q19 + q20
stress =~ q1 + q6 + q8 + q11 + q12 + q14 + q18
q4 ~~ q19
q4 ~~ q7
'
We are going to use a function from the semTools package that will help us set the scale for our latent factors and apply the equality constraint labels. This saves us A LOT of typing. Given the ordinal nature of our indicators, we will use the Wu & Estabrook (2016) approach for model identification, which is typically combined with unit-variance scaling of the latent factors (so the variances of the factors are [initially] fixed to 1, factor means are [initially] fixed to zero, and all loadings are estimated in both groups). In addition, we use the Delta parameterization, which scales the latent response variables to follow a standard Normal distribution.
We also have to declare that the indicators are ordinal, which we can do by adding ordered = T. If you have a mix of ordinal and continuous items in your analysis, you need to specify which variables are ordinal: ordered = c("q1", "q2").
Finally, we switch the estimator to wlsmv, which stands for weighted least squares, mean- and variance-adjusted. This is the recommended estimator for analyses with ordinal endogenous variables.
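For orientation, here is how these settings would look in a plain lavaan cfa() call. This is only a sketch for comparison (fit_demo is hypothetical and not used below); it relies on lavaan's default categorical identification rather than the full Wu & Estabrook setup that measEq.syntax() generates:
# Sketch only: the same settings in a direct cfa() call
fit_demo <- cfa(cfa_config, data = DASS21, group = "engnat",
                std.lv = TRUE,               # unit-variance factor scaling
                ordered = TRUE,              # treat all indicators as ordinal
                parameterization = "delta",  # standard-normal latent responses
                estimator = "wlsmv")         # robust WLS for ordinal indicators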
fit.config <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                            group = "engnat", ordered = T,
                            ID.fac = "std.lv",
                            ID.cat = "Wu.Estabrook.2016",
                            parameterization = "delta",
                            meanstructure = TRUE,
                            return.fit = TRUE,
                            estimator = "wlsmv")
# Print out the model syntax, so you can
# see what semTools is helping us do:
cat(as.character(fit.config@call$model))
## LOADINGS:
anxiety =~ c(NA, NA)*q2 + c(lambda.1_1.g1, lambda.1_1.g2)*q2
anxiety =~ c(NA, NA)*q4 + c(lambda.2_1.g1, lambda.2_1.g2)*q4
anxiety =~ c(NA, NA)*q7 + c(lambda.3_1.g1, lambda.3_1.g2)*q7
anxiety =~ c(NA, NA)*q9 + c(lambda.4_1.g1, lambda.4_1.g2)*q9
anxiety =~ c(NA, NA)*q15 + c(lambda.5_1.g1, lambda.5_1.g2)*q15
anxiety =~ c(NA, NA)*q19 + c(lambda.6_1.g1, lambda.6_1.g2)*q19
anxiety =~ c(NA, NA)*q20 + c(lambda.7_1.g1, lambda.7_1.g2)*q20
stress =~ c(NA, NA)*q1 + c(lambda.8_2.g1, lambda.8_2.g2)*q1
stress =~ c(NA, NA)*q6 + c(lambda.9_2.g1, lambda.9_2.g2)*q6
stress =~ c(NA, NA)*q8 + c(lambda.10_2.g1, lambda.10_2.g2)*q8
stress =~ c(NA, NA)*q11 + c(lambda.11_2.g1, lambda.11_2.g2)*q11
stress =~ c(NA, NA)*q12 + c(lambda.12_2.g1, lambda.12_2.g2)*q12
stress =~ c(NA, NA)*q14 + c(lambda.13_2.g1, lambda.13_2.g2)*q14
stress =~ c(NA, NA)*q18 + c(lambda.14_2.g1, lambda.14_2.g2)*q18
## THRESHOLDS:
q2 | c(NA, NA)*t1 + c(q2.thr1.g1, q2.thr1.g2)*t1
q2 | c(NA, NA)*t2 + c(q2.thr2.g1, q2.thr2.g2)*t2
q2 | c(NA, NA)*t3 + c(q2.thr3.g1, q2.thr3.g2)*t3
q4 | c(NA, NA)*t1 + c(q4.thr1.g1, q4.thr1.g2)*t1
q4 | c(NA, NA)*t2 + c(q4.thr2.g1, q4.thr2.g2)*t2
q4 | c(NA, NA)*t3 + c(q4.thr3.g1, q4.thr3.g2)*t3
q7 | c(NA, NA)*t1 + c(q7.thr1.g1, q7.thr1.g2)*t1
q7 | c(NA, NA)*t2 + c(q7.thr2.g1, q7.thr2.g2)*t2
q7 | c(NA, NA)*t3 + c(q7.thr3.g1, q7.thr3.g2)*t3
q9 | c(NA, NA)*t1 + c(q9.thr1.g1, q9.thr1.g2)*t1
q9 | c(NA, NA)*t2 + c(q9.thr2.g1, q9.thr2.g2)*t2
q9 | c(NA, NA)*t3 + c(q9.thr3.g1, q9.thr3.g2)*t3
q15 | c(NA, NA)*t1 + c(q15.thr1.g1, q15.thr1.g2)*t1
q15 | c(NA, NA)*t2 + c(q15.thr2.g1, q15.thr2.g2)*t2
q15 | c(NA, NA)*t3 + c(q15.thr3.g1, q15.thr3.g2)*t3
q19 | c(NA, NA)*t1 + c(q19.thr1.g1, q19.thr1.g2)*t1
q19 | c(NA, NA)*t2 + c(q19.thr2.g1, q19.thr2.g2)*t2
q19 | c(NA, NA)*t3 + c(q19.thr3.g1, q19.thr3.g2)*t3
q20 | c(NA, NA)*t1 + c(q20.thr1.g1, q20.thr1.g2)*t1
q20 | c(NA, NA)*t2 + c(q20.thr2.g1, q20.thr2.g2)*t2
q20 | c(NA, NA)*t3 + c(q20.thr3.g1, q20.thr3.g2)*t3
q1 | c(NA, NA)*t1 + c(q1.thr1.g1, q1.thr1.g2)*t1
q1 | c(NA, NA)*t2 + c(q1.thr2.g1, q1.thr2.g2)*t2
q1 | c(NA, NA)*t3 + c(q1.thr3.g1, q1.thr3.g2)*t3
q6 | c(NA, NA)*t1 + c(q6.thr1.g1, q6.thr1.g2)*t1
q6 | c(NA, NA)*t2 + c(q6.thr2.g1, q6.thr2.g2)*t2
q6 | c(NA, NA)*t3 + c(q6.thr3.g1, q6.thr3.g2)*t3
q8 | c(NA, NA)*t1 + c(q8.thr1.g1, q8.thr1.g2)*t1
q8 | c(NA, NA)*t2 + c(q8.thr2.g1, q8.thr2.g2)*t2
q8 | c(NA, NA)*t3 + c(q8.thr3.g1, q8.thr3.g2)*t3
q11 | c(NA, NA)*t1 + c(q11.thr1.g1, q11.thr1.g2)*t1
q11 | c(NA, NA)*t2 + c(q11.thr2.g1, q11.thr2.g2)*t2
q11 | c(NA, NA)*t3 + c(q11.thr3.g1, q11.thr3.g2)*t3
q12 | c(NA, NA)*t1 + c(q12.thr1.g1, q12.thr1.g2)*t1
q12 | c(NA, NA)*t2 + c(q12.thr2.g1, q12.thr2.g2)*t2
q12 | c(NA, NA)*t3 + c(q12.thr3.g1, q12.thr3.g2)*t3
q14 | c(NA, NA)*t1 + c(q14.thr1.g1, q14.thr1.g2)*t1
q14 | c(NA, NA)*t2 + c(q14.thr2.g1, q14.thr2.g2)*t2
q14 | c(NA, NA)*t3 + c(q14.thr3.g1, q14.thr3.g2)*t3
q18 | c(NA, NA)*t1 + c(q18.thr1.g1, q18.thr1.g2)*t1
q18 | c(NA, NA)*t2 + c(q18.thr2.g1, q18.thr2.g2)*t2
q18 | c(NA, NA)*t3 + c(q18.thr3.g1, q18.thr3.g2)*t3
## INTERCEPTS:
q2 ~ c(0, 0)*1 + c(nu.1.g1, nu.1.g2)*1
q4 ~ c(0, 0)*1 + c(nu.2.g1, nu.2.g2)*1
q7 ~ c(0, 0)*1 + c(nu.3.g1, nu.3.g2)*1
q9 ~ c(0, 0)*1 + c(nu.4.g1, nu.4.g2)*1
q15 ~ c(0, 0)*1 + c(nu.5.g1, nu.5.g2)*1
q19 ~ c(0, 0)*1 + c(nu.6.g1, nu.6.g2)*1
q20 ~ c(0, 0)*1 + c(nu.7.g1, nu.7.g2)*1
q1 ~ c(0, 0)*1 + c(nu.8.g1, nu.8.g2)*1
q6 ~ c(0, 0)*1 + c(nu.9.g1, nu.9.g2)*1
q8 ~ c(0, 0)*1 + c(nu.10.g1, nu.10.g2)*1
q11 ~ c(0, 0)*1 + c(nu.11.g1, nu.11.g2)*1
q12 ~ c(0, 0)*1 + c(nu.12.g1, nu.12.g2)*1
q14 ~ c(0, 0)*1 + c(nu.13.g1, nu.13.g2)*1
q18 ~ c(0, 0)*1 + c(nu.14.g1, nu.14.g2)*1
## SCALING FACTORS:
q2 ~*~ c(1, 1)*q2
q4 ~*~ c(1, 1)*q4
q7 ~*~ c(1, 1)*q7
q9 ~*~ c(1, 1)*q9
q15 ~*~ c(1, 1)*q15
q19 ~*~ c(1, 1)*q19
q20 ~*~ c(1, 1)*q20
q1 ~*~ c(1, 1)*q1
q6 ~*~ c(1, 1)*q6
q8 ~*~ c(1, 1)*q8
q11 ~*~ c(1, 1)*q11
q12 ~*~ c(1, 1)*q12
q14 ~*~ c(1, 1)*q14
q18 ~*~ c(1, 1)*q18
## UNIQUE-FACTOR COVARIANCES:
q4 ~~ c(NA, NA)*q7 + c(theta.3_2.g1, theta.3_2.g2)*q7
q4 ~~ c(NA, NA)*q19 + c(theta.6_2.g1, theta.6_2.g2)*q19
## LATENT MEANS/INTERCEPTS:
anxiety ~ c(0, 0)*1 + c(alpha.1.g1, alpha.1.g2)*1
stress ~ c(0, 0)*1 + c(alpha.2.g1, alpha.2.g2)*1
## COMMON-FACTOR VARIANCES:
anxiety ~~ c(1, 1)*anxiety + c(psi.1_1.g1, psi.1_1.g2)*anxiety
stress ~~ c(1, 1)*stress + c(psi.2_2.g1, psi.2_2.g2)*stress
## COMMON-FACTOR COVARIANCES:
anxiety ~~ c(NA, NA)*stress + c(psi.2_1.g1, psi.2_1.g2)*stress
To see if the configural model is tenable, we will first inspect indices of global (exact and approximate) fit. With WLSMV, we focus on scaled (but NOT robust) fit indices. To save space, we will use the fitMeasures function, which prints out just the fit indices we ask for.
fitMeasures(fit.config,
fit.measures = c("chisq.scaled", "df.scaled",
"pvalue.scaled", "cfi.scaled",
"rmsea.scaled", "rmsea.ci.lower.scaled",
"rmsea.ci.upper.scaled", "srmr"))
chisq.scaled df.scaled pvalue.scaled
428.078 148.000 0.000
cfi.scaled rmsea.scaled rmsea.ci.lower.scaled
0.982 0.062 0.055
rmsea.ci.upper.scaled srmr
0.068 0.037
The Chi-square test is significant, but the approximate fit indices look good. Our sample is relatively large, so we will look at local fit information to see if any remaining misfit is trivial, or if there are major issues. With ordinal data, we get residuals for the covariances, means, and thresholds. Right now, our model only constrains the covariances, so we will focus on that output.
residuals(fit.config, type = "cor.bollen")$EnglishNative$cov
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8
q2 0.000
q4 0.081 0.000
q7 0.055 0.000 0.000
q9 -0.047 -0.051 0.020 0.000
q15 -0.086 0.026 -0.017 -0.014 0.000
q19 0.040 0.000 0.076 -0.047 -0.034 0.000
q20 -0.087 0.024 -0.040 0.032 0.007 -0.025 0.000
q1 -0.033 -0.051 -0.045 -0.018 -0.006 0.016 -0.010 0.000
q6 -0.008 -0.035 -0.036 0.040 0.009 -0.007 -0.020 -0.034 0.000
q8 0.001 0.014 0.093 0.040 0.065 0.029 0.064 -0.062 -0.061 0.000
q11 0.004 -0.048 -0.088 0.013 -0.028 -0.031 -0.004 -0.009 0.061 -0.060
q12 -0.040 0.007 -0.056 -0.037 0.018 -0.027 -0.038 0.095 -0.014 -0.037
q14 0.063 -0.036 -0.013 -0.021 -0.044 0.021 -0.024 0.026 -0.003 -0.121
q18 0.043 0.028 0.001 0.018 -0.060 0.002 -0.010 -0.043 0.057 -0.050
q11 q12 q14 q18
q2
q4
q7
q9
q15
q19
q20
q1
q6
q8
q11 0.000
q12 -0.016 0.000
q14 0.107 -0.037 0.000
q18 0.014 -0.015 0.038 0.000
residuals(fit.config, type = "cor.bollen")$ELL$cov
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8
q2 0.000
q4 0.080 0.000
q7 0.027 0.000 0.000
q9 -0.024 -0.030 -0.022 0.000
q15 -0.052 0.008 0.013 0.045 0.000
q19 0.024 0.000 0.039 -0.087 0.001 0.000
q20 0.007 -0.048 -0.017 0.020 0.006 -0.052 0.000
q1 -0.029 -0.016 -0.032 -0.061 -0.051 -0.014 -0.019 0.000
q6 -0.003 0.001 -0.032 -0.001 -0.020 -0.011 0.004 0.041 0.000
q8 0.012 0.023 0.063 0.062 0.048 0.049 0.015 -0.073 -0.070 0.000
q11 0.025 -0.012 -0.056 -0.017 -0.016 0.018 0.033 0.038 -0.028 -0.031
q12 -0.055 0.004 -0.028 -0.026 -0.012 0.024 0.005 0.077 -0.016 -0.025
q14 -0.020 0.005 0.003 0.005 -0.054 -0.031 -0.005 0.039 0.047 -0.029
q18 0.002 0.005 0.007 0.015 -0.049 0.033 0.009 -0.042 0.092 -0.068
q11 q12 q14 q18
q2
q4
q7
q9
q15
q19
q20
q1
q6
q8
q11 0.000
q12 0.005 0.000
q14 0.003 0.001 0.000
q18 0.026 -0.049 0.006 0.000
Overall, the correlation residuals look good. Only the correlation residuals of q14 with q8 and q11 were > |.10| in the English native speaker group. There is no a priori justification for why these three items would share an omitted cause (one that could be accommodated via a residual covariance). In addition, these correlation residuals were relatively close to |.10|, indicating that adding a residual covariance would likely have only a small impact on global model fit. Thus, we will retain the configural invariance model and examine threshold invariance.
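Instead of scanning these matrices by eye, you can flag large residuals programmatically; a small sketch:
# Sketch: flag absolute correlation residuals above .10 (each pair shows up
# twice because the residual matrix is symmetric)
res_en <- residuals(fit.config, type = "cor.bollen")$EnglishNative$cov
which(abs(res_en) > .10, arr.ind = TRUE)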
Parameter Estimate Interpretation
Using the function below, we can compare the estimates across groups and see that all the measurement-related parameters are still varying across groups (as we have not included any equality constraints yet).
group_by_groups(fit.config)
lhs op rhs est_EnglishNative est_ELL
1 anxiety =~ q15 0.869 0.854
2 anxiety =~ q19 0.660 0.683
3 anxiety =~ q2 0.574 0.566
4 anxiety =~ q20 0.782 0.809
5 anxiety =~ q4 0.742 0.686
6 anxiety =~ q7 0.720 0.677
7 anxiety =~ q9 0.772 0.735
8 stress =~ q1 0.799 0.776
9 stress =~ q11 0.760 0.763
10 stress =~ q12 0.820 0.791
11 stress =~ q14 0.656 0.692
12 stress =~ q18 0.655 0.608
13 stress =~ q6 0.779 0.742
14 stress =~ q8 0.824 0.829
15 anxiety ~~ anxiety 1.000 1.000
16 anxiety ~~ stress 0.902 0.917
17 q1 ~~ q1 0.361 0.397
18 q11 ~~ q11 0.423 0.418
19 q12 ~~ q12 0.327 0.375
20 q14 ~~ q14 0.570 0.521
21 q15 ~~ q15 0.246 0.270
22 q18 ~~ q18 0.571 0.631
23 q19 ~~ q19 0.565 0.534
24 q2 ~~ q2 0.671 0.680
25 q20 ~~ q20 0.389 0.346
26 q4 ~~ q19 0.183 0.174
27 q4 ~~ q4 0.449 0.529
28 q4 ~~ q7 0.047 0.144
29 q6 ~~ q6 0.393 0.449
30 q7 ~~ q7 0.481 0.541
31 q8 ~~ q8 0.321 0.313
32 q9 ~~ q9 0.405 0.460
33 stress ~~ stress 1.000 1.000
34 anxiety ~1 0.000 0.000
35 q1 ~1 0.000 0.000
36 q11 ~1 0.000 0.000
37 q12 ~1 0.000 0.000
38 q14 ~1 0.000 0.000
39 q15 ~1 0.000 0.000
40 q18 ~1 0.000 0.000
41 q19 ~1 0.000 0.000
42 q2 ~1 0.000 0.000
43 q20 ~1 0.000 0.000
44 q4 ~1 0.000 0.000
45 q6 ~1 0.000 0.000
46 q7 ~1 0.000 0.000
47 q8 ~1 0.000 0.000
48 q9 ~1 0.000 0.000
49 stress ~1 0.000 0.000
50 q1 | t1 -0.820 -0.713
51 q1 | t2 0.025 0.440
52 q1 | t3 0.700 1.080
53 q11 | t1 -1.054 -0.693
54 q11 | t2 -0.065 0.332
55 q11 | t3 0.530 0.986
56 q12 | t1 -0.915 -0.739
57 q12 | t2 -0.070 0.238
58 q12 | t3 0.559 0.834
59 q14 | t1 -0.693 -0.687
60 q14 | t2 0.285 0.321
61 q14 | t3 0.878 1.019
62 q15 | t1 -0.674 -0.305
63 q15 | t2 0.111 0.457
64 q15 | t3 0.772 1.071
65 q18 | t1 -0.662 -0.923
66 q18 | t2 0.050 0.045
67 q18 | t3 0.674 0.700
68 q19 | t1 -0.451 -0.353
69 q19 | t2 0.212 0.316
70 q19 | t3 0.820 0.954
71 q2 | t1 -0.217 -0.490
72 q2 | t2 0.407 0.274
73 q2 | t3 0.793 0.856
74 q20 | t1 -0.530 -0.542
75 q20 | t2 0.176 0.305
76 q20 | t3 0.726 0.900
77 q4 | t1 -0.238 -0.055
78 q4 | t2 0.418 0.631
79 q4 | t3 0.994 1.195
80 q6 | t1 -0.938 -0.779
81 q6 | t2 -0.080 0.192
82 q6 | t3 0.565 0.772
83 q7 | t1 -0.321 -0.100
84 q7 | t2 0.490 0.668
85 q7 | t3 1.019 1.136
86 q8 | t1 -0.745 -0.643
87 q8 | t2 -0.025 0.285
88 q8 | t3 0.625 0.915
89 q9 | t1 -0.849 -0.962
90 q9 | t2 -0.187 -0.010
91 q9 | t3 0.348 0.571
92 q1 ~*~ q1 1.000 1.000
93 q11 ~*~ q11 1.000 1.000
94 q12 ~*~ q12 1.000 1.000
95 q14 ~*~ q14 1.000 1.000
96 q15 ~*~ q15 1.000 1.000
97 q18 ~*~ q18 1.000 1.000
98 q19 ~*~ q19 1.000 1.000
99 q2 ~*~ q2 1.000 1.000
100 q20 ~*~ q20 1.000 1.000
101 q4 ~*~ q4 1.000 1.000
102 q6 ~*~ q6 1.000 1.000
103 q7 ~*~ q7 1.000 1.000
104 q8 ~*~ q8 1.000 1.000
105 q9 ~*~ q9 1.000 1.000
10.5 Threshold Invariance
All we need to do to estimate the Threshold Invariance model is change the arguments of the function a little bit (add group.equal = "thresholds"):
fit.thresh <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                            group = "engnat", ordered = T,
                            ID.fac = "std.lv",
                            ID.cat = "Wu.Estabrook.2016",
                            parameterization = "delta",
                            meanstructure = TRUE,
                            return.fit = TRUE,
                            estimator = "wlsmv",
                            group.equal = "thresholds")
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.thresh@call$model))
In the model syntax above (if you remove the # and run the line of code), you can see that each threshold now has the same label across the two groups, ensuring that they are constrained to be equal.
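You can also confirm the constraints without printing the whole syntax; a quick sketch using lavaan's parameter table (equal labels across groups imply equality constraints):
# Sketch: the thresholds of q2 should carry identical labels in both groups
pt <- parTable(fit.thresh)
subset(pt, op == "|" & lhs == "q2",
       select = c(lhs, op, rhs, group, label))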
Model Comparison
To test if we can retain the Threshold Invariance model, we compare its fit to that of the Configural Invariance model:
comp_12 <- compareFit(fit.config, fit.thresh)
comp_12@nested
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.config 148 227.63
fit.thresh 162 234.16 18.999 14 0.165
What does the Chi-square difference test result tell us?
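As an aside, the degrees-of-freedom difference of 14 can be verified with quick arithmetic. Under the Wu & Estabrook (2016) identification, equating the three thresholds of a four-category item across two groups adds three constraints per item, but in exchange the item's latent response intercept and scaling factor are freed in the second group, for a net gain of one df per item:
# Back-of-the-envelope df check for the threshold model
# (4 response categories per item, 2 groups):
n_items <- 14
n_constrained <- 3  # thresholds equated per item in group 2
n_freed <- 2        # intercept + scaling factor freed per item in group 2
n_items * (n_constrained - n_freed)  # = 14, matching Df diff above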
Parameter Estimate Interpretation
Using the function below, we can easily compare the estimates across groups and see that all the thresholds are now exactly the same (because we constrained them to be).
# Look at threshold parameter estimates across groups by using filter
# to include only results with the threshold operator "|"
group_by_groups(fit.thresh) %>% filter(op == "|")
lhs op rhs est_EnglishNative est_ELL
1 q1 | t1 -0.843 -0.843
2 q1 | t2 0.070 0.070
3 q1 | t3 0.667 0.667
4 q11 | t1 -1.050 -1.050
5 q11 | t2 -0.072 -0.072
6 q11 | t3 0.535 0.535
7 q12 | t1 -0.928 -0.928
8 q12 | t2 -0.047 -0.047
9 q12 | t3 0.544 0.544
10 q14 | t1 -0.686 -0.686
11 q14 | t2 0.269 0.269
12 q14 | t3 0.890 0.890
13 q15 | t1 -0.677 -0.677
14 q15 | t2 0.115 0.115
15 q15 | t3 0.769 0.769
16 q18 | t1 -0.678 -0.678
17 q18 | t2 0.082 0.082
18 q18 | t3 0.653 0.653
19 q19 | t1 -0.449 -0.449
20 q19 | t2 0.208 0.208
21 q19 | t3 0.823 0.823
22 q2 | t1 -0.209 -0.209
23 q2 | t2 0.386 0.386
24 q2 | t3 0.807 0.807
25 q20 | t1 -0.535 -0.535
26 q20 | t2 0.187 0.187
27 q20 | t3 0.718 0.718
28 q4 | t1 -0.241 -0.241
29 q4 | t2 0.424 0.424
30 q4 | t3 0.990 0.990
31 q6 | t1 -0.953 -0.953
32 q6 | t2 -0.054 -0.054
33 q6 | t3 0.548 0.548
34 q7 | t1 -0.323 -0.323
35 q7 | t2 0.496 0.496
36 q7 | t3 1.015 1.015
37 q8 | t1 -0.763 -0.763
38 q8 | t2 0.008 0.008
39 q8 | t3 0.603 0.603
40 q9 | t1 -0.865 -0.865
41 q9 | t2 -0.155 -0.155
42 q9 | t3 0.328 0.328
When we retain the Threshold Invariance model, we can conclude that participants in both groups switch from one response option to the next at equivalent locations on the latent response variable. Confirming this level of invariance allows us to move on to the more familiar weak (loadings) and strong (intercepts) levels of invariance.
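Because the Delta parameterization scales each latent response variable to a standard Normal distribution, invariant thresholds translate directly into response-category probabilities. Here is a small sketch using the q2 threshold estimates from the output above (values copied by hand, so treat them as illustrative):
# Sketch: implied probabilities of the four response options for q2
# (same in both groups now that thresholds are equated)
thr_q2 <- c(-0.209, 0.386, 0.807)    # t1-t3 from the output above
diff(pnorm(c(-Inf, thr_q2, Inf)))    # P(Never), ..., P(Almost Always)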
10.6 Weak (Metric) Invariance
Again, all we need to do to estimate the Loading Invariance model is change the arguments of the function a little bit:
fit.weak <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                          group = "engnat", ordered = T,
                          ID.fac = "std.lv",
                          ID.cat = "Wu.Estabrook.2016",
                          parameterization = "delta",
                          meanstructure = TRUE,
                          return.fit = TRUE,
                          estimator = "wlsmv",
                          group.equal = c("thresholds", "loadings"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.weak@call$model))
Model Comparison
To test if we can retain the Weak Invariance model, we compare its fit to that of the Threshold Invariance model:
comp_23 <- compareFit(fit.thresh, fit.weak)
comp_23@nested
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.thresh 162 234.16
fit.weak 174 252.28 18.577 12 0.09925 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What does the Chi-square difference test result tell us?
When we retain the Weak Invariance model, we can conclude that the measured concepts (Anxiety and Stress) have the same meaning across groups, because the item-factor associations can be considered equivalent (the same amount of common variance is extracted across groups).
10.7 Strong (Scalar) Invariance
fit.strong <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                            group = "engnat", ordered = T,
                            ID.fac = "std.lv",
                            ID.cat = "Wu.Estabrook.2016",
                            parameterization = "delta",
                            meanstructure = TRUE,
                            return.fit = TRUE,
                            estimator = "wlsmv",
                            group.equal = c("thresholds",
                                            "loadings",
                                            "intercepts"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.strong@call$model))
Model Comparison
To test if we can retain the Strong Invariance model, we compare its fit to that of the Weak Invariance model:
comp_34 <- compareFit(fit.weak, fit.strong)
comp_34@nested
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 174 252.28
fit.strong 186 347.39 88.083 12 1.159e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What does the Chi-square difference test result tell us?
Local Fit Comparison
Typically, we would look at mean residuals here. However, since the observed means depend on the latent response variable intercept and the thresholds, large residuals may be due to a combination of small misfit in each of these parameter types. So, large values may not necessarily point towards a problem with the latent response variable intercepts.
residuals(fit.strong, type = "cor.bollen")$EnglishNative$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11 q12 q14 q18
0 0 0 0 0 0 0 0 0 0 0 0 0 0
residuals(fit.strong, type = "cor.bollen")$ELL$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11 q12 q14
0.094 0.122 0.119 0.126 0.143 0.109 0.128 0.203 0.199 0.210 0.195 0.209 0.169
q18
0.164
You will notice that mean residuals for a specific indicator are 0 for the first group and a different value for the second group. When we constrain intercepts of the latent response variables to be equal across groups, with ordinal variables, the latent response variable intercept will map onto the observed mean (resulting in a mean residual of 0 for that group). In the analysis, this step is implemented by fixing the intercept to 0 in both groups.
To better understand how the equivalence of the latent response variable intercepts affected local fit, we can use the difference between the mean residuals of the Weak and Strong model. This shows which mean residuals changed most as a consequence of fixing the intercepts to 0:
residuals(fit.weak, type = "cor.bollen")$ELL$mean - residuals(fit.strong, type = "cor.bollen")$ELL$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
-0.239 0.066 0.047 -0.009 0.171 -0.029 -0.058 0.078 0.012 0.005 0.183
q12 q14 q18
0.038 -0.140 -0.234
Which intercepts appear to be most different across groups (in an absolute sense)?
10.8 Partial Strong Invariance (Round 1)
Since the fit worsened significantly when testing Strong Invariance, we will focus on identifying problematic latent response variable intercepts. In this case, we can use the modindices function. This is different from measurement invariance with continuous indicators. With continuous indicators, we use an equivalence constraint to force the intercepts to be equivalent, and we need to use lavTestScore to see what would happen to model fit if the equivalence constraint was removed. With ordinal indicators, latent response variable intercept invariance is accomplished by fixing the intercepts to 0 in both groups (instead of 0 in one group and freely estimated in the other). Thus, with ordinal variables, the intercept is not estimated, so we can use modification indices to examine to what extent model fit would improve if an intercept were freely estimated.
modificationIndices(fit.strong, sort. = TRUE, op = "~1",
minimum.value = 15)
Warning: lavaan->modificationIndices():
the modindices() function ignores equality constraints; use lavTestScore()
to assess the impact of releasing one or multiple constraints.
lhs op rhs block group level mi epc sepc.lv sepc.all sepc.nox
57 q2 ~1 1 1 1 28.618 0.266 0.266 0.266 0.266
162 q2 ~1 2 2 1 28.618 -0.266 -0.266 -0.289 -0.289
70 q18 ~1 1 1 1 27.358 0.261 0.261 0.261 0.261
175 q18 ~1 2 2 1 27.358 -0.261 -0.261 -0.280 -0.280
67 q11 ~1 1 1 1 17.439 -0.214 -0.214 -0.214 -0.214
172 q11 ~1 2 2 1 17.439 0.214 0.214 0.241 0.241
166 q15 ~1 2 2 1 15.841 0.212 0.212 0.231 0.231
61 q15 ~1 1 1 1 15.840 -0.212 -0.212 -0.212 -0.212
Which intercepts appear to be different across groups?
In our previous analyses of these data (treating the data as continuous), we found that item 18 and item 2 had non-invariant intercepts. Those same intercepts emerge as problematic in this analysis.
Model Comparison
To test if releasing the constraint for item 2 improves the fit of the Partial Strong Model sufficiently, we need to estimate the partial model. To do so, we can use the group.partial argument in the measEq.syntax() function:
fit.partial1 <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                              group = "engnat", ordered = T,
                              ID.fac = "std.lv",
                              ID.cat = "Wu.Estabrook.2016",
                              parameterization = "delta",
                              meanstructure = TRUE,
                              return.fit = TRUE,
                              estimator = "wlsmv",
                              group.equal = c("thresholds",
                                              "loadings",
                                              "intercepts"),
                              group.partial = c("q2 ~ 1"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial1@call$model))
Let’s compare the fit of this model to the Weak Invariance model:
comp_34b <- compareFit(fit.weak, fit.partial1)
comp_34b@nested
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 174 252.28
fit.partial1 185 318.72 64.682 11 1.236e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What do the results indicate?
10.9 Partial Strong Invariance (Round 2)
Since the model fit indicated some remaining issues with retaining the Partial Strong Invariance model, we will do another round of partial invariance testing:
modificationIndices(fit.partial1, sort. = TRUE, op = "~1",
minimum.value = 15)
Warning: lavaan->modificationIndices():
the modindices() function ignores equality constraints; use lavTestScore()
to assess the impact of releasing one or multiple constraints.
lhs op rhs block group level mi epc sepc.lv sepc.all sepc.nox
70 q18 ~1 1 1 1 27.356 0.262 0.262 0.262 0.262
175 q18 ~1 2 2 1 27.356 -0.262 -0.262 -0.280 -0.280
172 q11 ~1 2 2 1 17.437 0.214 0.214 0.241 0.241
67 q11 ~1 1 1 1 17.437 -0.214 -0.214 -0.214 -0.214
Model Comparison
The intercept of item 18 remains the most problematic. To test if releasing this constraint improves the fit of the Partial Strong Model sufficiently, we need to estimate a second partial model. To do so, we can use the group.partial argument in the measEq.syntax() function and add this second intercept:
fit.partial2 <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                              group = "engnat", ordered = T,
                              ID.fac = "std.lv",
                              ID.cat = "Wu.Estabrook.2016",
                              parameterization = "delta",
                              meanstructure = TRUE,
                              return.fit = TRUE,
                              estimator = "wlsmv",
                              group.equal = c("thresholds",
                                              "loadings",
                                              "intercepts"),
                              group.partial = c("q2 ~ 1", "q18 ~ 1"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial2@call$model))
Let’s compare the fit of this model to the Weak Invariance model:
comp_34c <- compareFit(fit.weak, fit.partial2)
print(comp_34c@nested)
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 174 252.28
fit.partial2 184 291.31 39.063 10 2.475e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What do the results indicate?
10.10 Partial Strong Invariance (Round 3)
Since the model fit indicated some remaining issues with retaining the Partial Strong Invariance model, we will do another round of partial invariance testing (note that I had to lower the minimum.value
to still see modifications; you may choose to stop here and argue that remaining modifications wouldn’t improve fit enough to be warranted):
modificationIndices(fit.partial2, sort. = TRUE, op = "~1",
minimum.value = 10)
Warning: lavaan->modificationIndices():
the modindices() function ignores equality constraints; use lavTestScore()
to assess the impact of releasing one or multiple constraints.
lhs op rhs block group level mi epc sepc.lv sepc.all sepc.nox
174 q14 ~1 2 2 1 14.616 -0.192 -0.192 -0.222 -0.222
69 q14 ~1 1 1 1 14.616 0.192 0.192 0.192 0.192
67 q11 ~1 1 1 1 12.147 -0.181 -0.181 -0.181 -0.181
172 q11 ~1 2 2 1 12.147 0.181 0.181 0.203 0.203
61 q15 ~1 1 1 1 10.118 -0.171 -0.171 -0.171 -0.171
166 q15 ~1 2 2 1 10.118 0.171 0.171 0.188 0.188
Model Comparison
The intercept of item 14 is now most problematic. To test if releasing this constraint improves the fit of the Partial Strong Model sufficiently, we need to estimate a third partial model. To do so, we can use the group.partial argument in the measEq.syntax() function and add this third intercept:
fit.partial3 <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                              group = "engnat", ordered = T,
                              ID.fac = "std.lv",
                              ID.cat = "Wu.Estabrook.2016",
                              parameterization = "delta",
                              meanstructure = TRUE,
                              return.fit = TRUE,
                              estimator = "wlsmv",
                              group.equal = c("thresholds",
                                              "loadings",
                                              "intercepts"),
                              group.partial = c("q2 ~ 1", "q18 ~ 1", "q14 ~ 1"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial3@call$model))
Let’s compare the fit of this model to the Weak Invariance model:
comp_34d <- compareFit(fit.weak, fit.partial3)
print(comp_34d@nested)
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 174 252.28
fit.partial3 183 276.66 24.761 9 0.003247 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What do the results indicate?
10.11 Partial Strong Invariance (Round 4)
Since the model fit indicated some remaining issues with retaining the third Partial Strong Invariance model, we will do another round of partial invariance testing:
modificationIndices(fit.partial3, sort. = TRUE, op = "~1",
minimum.value = 5)
Warning: lavaan->modificationIndices():
the modindices() function ignores equality constraints; use lavTestScore()
to assess the impact of releasing one or multiple constraints.
lhs op rhs block group level mi epc sepc.lv sepc.all sepc.nox
166 q15 ~1 2 2 1 10.116 0.171 0.171 0.188 0.188
61 q15 ~1 1 1 1 10.116 -0.171 -0.171 -0.171 -0.171
67 q11 ~1 1 1 1 8.401 -0.152 -0.152 -0.152 -0.152
172 q11 ~1 2 2 1 8.401 0.152 0.152 0.172 0.172
Model Comparison
The intercept of item 15 is now most problematic. To test if releasing this constraint improves the fit of the Partial Strong Model sufficiently, we need to estimate a fourth partial model. To do so, we can use the group.partial argument in the measEq.syntax() function and add this fourth intercept:
fit.partial4 <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                              group = "engnat", ordered = T,
                              ID.fac = "std.lv",
                              ID.cat = "Wu.Estabrook.2016",
                              parameterization = "delta",
                              meanstructure = TRUE,
                              return.fit = TRUE,
                              estimator = "wlsmv",
                              group.equal = c("thresholds",
                                              "loadings",
                                              "intercepts"),
                              group.partial = c("q2 ~ 1", "q18 ~ 1",
                                                "q14 ~ 1", "q15 ~ 1"))
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial4@call$model))
Let’s compare the fit of this model to the Weak Invariance model:
comp_34e <- compareFit(fit.weak, fit.partial4)
print(comp_34e@nested)
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.weak 174 252.28
fit.partial4 182 266.52 14.256 8 0.07534 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What do the results indicate?
Local Fit Evaluation
Again, we will focus on the difference in residuals of the mean vectors.
residuals(fit.weak, type = "cor.bollen")$ELL$mean - residuals(fit.partial4, type = "cor.bollen")$ELL$mean
q2 q4 q7 q9 q15 q19 q20 q1 q6 q8 q11
0.002 0.070 0.052 -0.004 0.004 -0.025 -0.054 0.017 -0.047 -0.058 0.124
q12 q14 q18
-0.025 0.000 0.000
This time, we don’t see any problems with the mean vector in terms of local fit.
CFI and RMSEA Comparison
Next, we will examine if the CFI and RMSEA change to a lesser extent than our criteria (decrease in CFI ≤ 0.010, increase in RMSEA ≤ 0.015):
# Change in CFI
cat(paste0("Change in CFI: ", round(comp_34e@fit.diff$cfi.scaled, 3)))
Change in CFI: 0
# Change in RMSEA
cat(paste0("Change in RMSEA: ", round(comp_34e@fit.diff$rmsea.scaled, 3)))
Change in RMSEA: -0.002
These changes are much smaller than our cutoff criteria. They are also smaller than other, stricter suggested criteria (e.g., decrease in CFI ≤ 0.002 or increase in RMSEA ≤ 0.010). Based on this information, I will retain this partial strong invariance model with four freely estimated intercepts.
10.12 What do differences in intercepts mean?
It can be difficult to understand what differences in the intercept mean, especially when the thresholds are equivalent across groups. The figure below aims to help improve understanding of this situation, using item 18 as an example.
# Get item 18 intercept value for ELL group
ell_int2 <- parameterEstimates(fit.partial4) %>%
  filter(op == "~1" & lhs == "q18" & group == 2) %>%
  select(est) %>% as.numeric()
# Get item 18 thresholds (invariant across groups)
ell_thresh2 <- parameterEstimates(fit.partial4) %>%
  filter(op == "|" & lhs == "q18" & group == 2) %>%
  select(est) %>% unlist()
# Set up the latent response variable normal distribution
# for each group
tribble(
  ~dist, ~args, ~group, ~int,
  "norm", list(0, 1), "1. English", 0,
  "norm", list(ell_int2, 1), "2. ELL", ell_int2
) %>% ggplot() +
  # Change the fill color of the distribution at each threshold
  # value by cutting up the distribution
  ggdist::stat_slab(aes(xdist = dist, args = args,
                        fill = after_stat(cut(x, c(-Inf, ell_thresh2, Inf)))),
                    show.legend = FALSE) +
# Colors of the different sections of the latent response variable
scale_fill_manual(values = c("#494B69","#9F5B72",
"#D8707C","#FD9B41")) +
# Add vertical lines at each threshold value
geom_vline(xintercept = ell_thresh2[1]) +
geom_vline(xintercept = ell_thresh2[2]) +
geom_vline(xintercept = ell_thresh2[3]) +
# Add a dashed vertical line at the latent response variable
# intercept value
geom_vline(aes(xintercept = int), linetype = "dashed") +
# Plot both groups separately
facet_grid(rows = vars(group)) +
# Add axis labels
labs(x = "Underlying Latent Response Variable", y = "Density") +
# Change plot theme/some formatting
theme_classic() +
theme(text = element_text(size = 20),
axis.title = element_text(face = "bold"),
plot.title = element_text(face = "bold", hjust = .5))
This figure shows that, when the underlying latent factor, Stress, is equal to zero (which in this case indicates the average Stress level, because we used unit-variance factor scaling), English native speakers are most likely to select the second response option (Sometimes), whereas ELL speakers are most likely to select the third response option (Often), in response to item 18 (I felt that I was rather touchy). Thus, at average levels of stress, ELL speakers report often feeling rather touchy whereas English native speakers report sometimes feeling rather touchy.
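As a numeric companion to the figure (adopting the same simplifying choice of unit variance for both curves), you could compute the shaded areas directly; probs_at is a hypothetical helper:
# Sketch: area between adjacent thresholds = probability of that option
probs_at <- function(m) diff(pnorm(c(-Inf, ell_thresh2, Inf), mean = m))
probs_at(0)         # English native speakers: option 2 (Sometimes) most likely
probs_at(ell_int2)  # ELL speakers: option 3 (Often) most likely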
10.13 Parameter Comparison Across Groups
Now that we have finalized the invariance testing procedure (note that you could test higher levels of invariance, but that these offer few practical benefits), we can continue to compare latent factor variances, covariances, and means across groups.
Factor Variances
First, we will test if the factor variances of the Anxiety and Stress factors are equivalent across groups. We will do so by adding lv.variances to the group.equal argument of the previous partial strong invariance model.
fit.partial4.lv <- measEq.syntax(cfa_config, data = DASS21,
                                 ID.fac = "std.lv",
                                 ID.cat = "Wu.Estabrook.2016",
                                 group = "engnat",
                                 group.equal = c("thresholds", "loadings",
                                                 "intercepts", "lv.variances"),
                                 group.partial = c("q2 ~ 1", "q18 ~ 1",
                                                   "q14 ~ 1", "q15 ~ 1"),
                                 ordered = T,
                                 parameterization = "delta",
                                 meanstructure = TRUE, return.fit = TRUE,
                                 estimator = "wlsmv")
# Print out the model syntax, so you can
# see what semTools is helping us do:
# cat(as.character(fit.partial4.lv@call$model))
We can now compare the fit of this model to the previous partial strong invariance model:
comp_4e5 <- compareFit(fit.partial4, fit.partial4.lv)
comp_4e5@nested
Scaled Chi-Squared Difference Test (method = "satorra.2000")
lavaan->unknown():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.partial4 182 266.52
fit.partial4.lv 184 292.03 6.9603 2 0.0308 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What does the Chi-square Difference test tell us?
Based on these results, we will continue with a model with freely estimated factor variances (and covariances). This still allows us to compare the means across groups. We just need to use a slightly more complicated formula to compute Cohen’s d.
Factor Means
We can use the effect size formula from the book chapter to do so (which can be interpreted as a Cohen’s d). For each of the latent factors, we need the latent mean estimates and an estimate of the factor variances. We will find these estimates using the group_by_groups function from the semhelpinghands package and filtering the results:
group_by_groups(fit.partial4) %>%
  filter((op == "~1" & (lhs == "stress" | lhs == "anxiety")) |
         (op == "~~" & (lhs == "stress" | lhs == "anxiety")))
lhs op rhs est_EnglishNative est_ELL
1 anxiety ~~ anxiety 1.000 0.795
2 anxiety ~~ stress 0.902 0.722
3 stress ~~ stress 1.000 0.778
4 anxiety ~1 0.000 -0.159
5 stress ~1 0.000 -0.332
Here are the computations of the effect size for the mean difference in Anxiety:
# mean in English Native Speaker: 0 (reference group)
# mean in ELL Speaker: -0.159
# variance: 1 (reference group), 0.795 (ELL)
(0 - (-.159)) / sqrt((499*1 + 499*.795)/(499 + 499))
[1] 0.167834
This indicates that the mean of the Anxiety factor of the English Native Speaker sample is about 0.17 standard deviations higher than that of the ELL Speaker sample.
Here are the computations of the effect size for the mean difference in Stress:
# mean in English Native Speaker: 0 (reference group)
# mean in ELL Speaker: -0.332
# variance: 1 (reference group), 0.778 (ELL)
(0 - (-.332)) / sqrt((499*1 + 499*.778)/(499 + 499))
[1] 0.3521172
This indicates that the mean of the Stress factor of the English Native Speaker sample is about 0.35 standard deviations higher than that of the ELL Speaker sample.
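For convenience, the same computation can be wrapped in a small helper; cohen_d_latent is a hypothetical function mirroring the pooled-variance formula used above (n = 500 per group):
# Hypothetical helper for the latent-mean effect size computed above
cohen_d_latent <- function(m1, m2, v1, v2, n1 = 500, n2 = 500) {
  (m1 - m2) / sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
}
cohen_d_latent(0, -0.159, 1, 0.795)  # Anxiety: ~0.17
cohen_d_latent(0, -0.332, 1, 0.778)  # Stress:  ~0.35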
We can also test whether the mean differences are significant using a Wald test. This test evaluates the null hypothesis that two free parameters (in this case, the latent factor means) are equal, using their estimates and standard errors, without re-estimating the model under that constraint.
# H0: Average anxiety is equivalent across
# English Native and ELL speakers
lavTestWald(fit.partial4, constraints = "alpha.1.g1 == alpha.1.g2")
$stat
[1] 5.286136
$df
[1] 1
$p.value
[1] 0.02149586
$se
[1] "robust.sem"
# H0: Average stress is equivalent across
# English Native and ELL speakers
lavTestWald(fit.partial4, constraints = "alpha.2.g1 == alpha.2.g2")
$stat
[1] 25.04656
$df
[1] 1
$p.value
[1] 5.596256e-07
$se
[1] "robust.sem"
Based on the results above, both tests were significant (p < .05), so we conclude that the latent means differ significantly between groups.
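An equivalent chi-square difference approach (a sketch, not run in this lab) would add "means" to group.equal, constraining both latent means at once, and compare the result to fit.partial4; note this is a joint 2-df test of both means rather than two separate 1-df tests:
# Sketch (not run): likelihood-ratio-style alternative to the Wald tests
fit.means <- measEq.syntax(configural.model = cfa_config, data = DASS21,
                           group = "engnat", ordered = T,
                           ID.fac = "std.lv",
                           ID.cat = "Wu.Estabrook.2016",
                           parameterization = "delta",
                           meanstructure = TRUE,
                           return.fit = TRUE,
                           estimator = "wlsmv",
                           group.equal = c("thresholds", "loadings",
                                           "intercepts", "means"),
                           group.partial = c("q2 ~ 1", "q18 ~ 1",
                                             "q14 ~ 1", "q15 ~ 1"))
compareFit(fit.partial4, fit.means)@nested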
10.14 Conclusion
In this example, we found evidence of configural, threshold, and weak invariance of the Anxiety and Stress factors from the DASS-21 over English native and ELL speaker samples. There is somewhat less convincing evidence of strong invariance, in that the majority, but not all, latent response variable intercepts may be equivalent across groups. This means that the English native and ELL speaker samples did not have the same relative standing (i.e., expected response) on four items, given the same position on the underlying conceptual variable (i.e., Anxiety/Stress).
In a paper, you would report all of the above, adding all unstandardized estimates + SEs of the final retained model (in a table).
10.15 Summary
In this R lab, you were introduced to the steps involved in testing measurement invariance with ordinal indicators. In the next R Lab, you will learn all about a very different kind of model, a latent profile analysis, which is used to detect unobserved subgroups that share a similar response pattern to a set of continuous items.