6 Measurement Invariance
You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: measurementinvariance.R
6.1 Loading R packages
In this lab, we will use the lavaan and semTools packages, which we have already installed in earlier labs. So, we can simply load those packages from the R package library:
library(lavaan)
library(semTools)
6.2 Loading data into our environment
For this lab, we will use a dataset that contains 3811 item responses to 10 items about financial well-being. You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/finance.csv. Make sure to save it in the folder you are using for this class.
finance <- read.csv("data/finance.csv")
We can look at the variables in the dataset using describe()
from the psych package. In addition to the 10 items, the dataset also includes a variable denoting whether a participant worked in the public or private sector.
psych::describe(finance, skew = FALSE)
vars n mean sd median min max range se
item1 1 3811 2.89 1.23 3 1 5 4 0.02
item2 2 3811 3.09 1.10 3 1 5 4 0.02
item3 3 3811 2.69 1.19 3 1 5 4 0.02
item4 4 3811 3.15 1.04 3 1 5 4 0.02
item5 5 3811 2.87 1.24 3 1 5 4 0.02
item6 6 3811 3.25 1.13 3 1 5 4 0.02
item7 7 3811 2.48 1.19 2 1 5 4 0.02
item8 8 3811 3.27 1.25 3 1 5 4 0.02
item9 9 3811 2.22 1.14 2 1 5 4 0.02
item10 10 3811 2.83 1.13 3 1 5 4 0.02
sector* 11 3811 1.55 0.50 2 1 2 1 0.01
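Since we will be fitting multi-group models, it can also be useful to check how many participants fall into each sector. A quick check (output omitted here; the group sizes also appear in the lavaan summary later on):
# Group sizes for the multi-group analyses
table(finance$sector)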
These items make up the Financial Well-Being scale, which was developed by the Consumer Financial Protection Bureau (CFPB):
- I could handle a major unexpected expense (P)
- I am securing my financial future (P)
- Because of my money situation, I will never have the things I want in life (N)
- I can enjoy life because of the way I’m managing my money (P)
- I am just getting by financially (N)
- I am concerned that the money I have or will save won’t last (N)
- Giving a gift would put a strain on my finances for the month (N)
- I have money left over at the end of the month (P)
- I am behind with my finances (N)
- My finances control my life (N)
The items measure positive (P) and negative (N) financial well-being.
6.3 The Theoretical Measurement Model
Next, we will set up the theoretical measurement model representing the hypothesized internal structure of this measure. In this case, the construct of financial well-being is represented by two correlated sub-constructs: financial stability and financial instability.
# Two-factor CFA model
financemodel <- "positive =~ item1 + item2 + item4 + item8
                 negative =~ item3 + item5 + item6 + item7 + item9 + item10"
6.4 Step 1: Configural Invariance
To test measurement invariance (MI), we will use the lavaan
package. First, we will assess configural invariance, which we can do using the code below. The main difference from previous CFAs is that we now use the group = "sector"
argument to specify which variable in our data indicates the group a participant belongs to. In addition, we will tell lavaan to scale the latent factors so they have a mean of 0 and a standard deviation of 1, using the std.lv = TRUE
argument (note that there are other methods for scaling the latent variable, but going into those technical details is beyond the scope of this class):
# Configural model
config <- cfa(model = financemodel, data = finance,
              std.lv = TRUE,
              group = "sector")
To evaluate whether configural invariance holds, we can look at the summary output. In this case, I’m mostly interested in looking at the fit of the model to the data, so we will use estimates = F
to omit the parameter estimates from the summary output.
summary(config, fit.measures = T, estimates = F)
lavaan 0.6.17 ended normally after 27 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 62
Number of observations per group:
public 2110
private 1701
Model Test User Model:
Test statistic 875.170
Degrees of freedom 68
P-value (Chi-square) 0.000
Test statistic for each group:
public 465.024
private 410.146
Model Test Baseline Model:
Test statistic 20856.369
Degrees of freedom 90
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.961
Tucker-Lewis Index (TLI) 0.949
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -49649.240
Loglikelihood unrestricted model (H1) -49211.655
Akaike (AIC) 99422.480
Bayesian (BIC) 99809.711
Sample-size adjusted Bayesian (SABIC) 99612.704
Root Mean Square Error of Approximation:
RMSEA 0.079
90 Percent confidence interval - lower 0.074
90 Percent confidence interval - upper 0.084
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 0.359
Standardized Root Mean Square Residual:
SRMR 0.031
Overall, the fit of the model to the data, allowing all parameters to be freely estimated across groups, is decent: CFI = .961, TLI = .949, RMSEA = .079, 90% CI [.074, .084], SRMR = .031. Not ideal in terms of RMSEA, but otherwise okay. More importantly, this output shows us what part of the overall Model Chi-square stems from each of the two groups (under Test statistic for each group
). If those two numbers are relatively equal, then the model fits about equally well to each group’s data. If you notice that the Chi-square contribution is much larger for one group than the other, it is an indication that you may not be able to conclude that there is configural invariance. In this case, both groups contribute about equally to the total Chi-square, indicating similar model-data fit across groups.
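If you would rather extract these per-group contributions directly instead of reading them from the summary output, a minimal sketch is below; it assumes the test statistics returned by lavInspect() are stored in a named list (in older lavaan versions, you may need [[1]] in place of $standard):
# Each group's contribution to the overall model Chi-square
# (the numbers printed under "Test statistic for each group")
lavInspect(config, "test")$standard$stat.group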
If configural invariance is not supported
If configural invariance is not supported, then you can start your investigation by looking at the estimates across each group to see if there are noticeable issues (e.g., a lot of low factor loadings in one group). You can follow this investigation up by running separate EFAs for each group, to examine what kind of factor structure emerges for each group. The code below shows you how to start this investigation for this example dataset, but I do not include the output to reduce the length of this (already lengthy) lab:
# Examine parameter estimates
summary(config)
# Split data in two
public <- subset(finance, sector == "public")
private <- subset(finance, sector == "private")
# Run parallel analysis for each group
# (can be followed up by full EFA examination)
library(psych)
fa.parallel(public[,1:10], fa = "fa")
fa.parallel(private[,1:10], fa = "fa")
6.5 Step 2: Metric Invariance
Next, we will estimate the metric invariance model. To do so, we again use the cfa()
function and specify our grouping variable and latent factor scale. In addition, we will add group.equal = "loadings"
:
# Metric model
metric <- cfa(model = financemodel, data = finance,
              group = "sector",
              std.lv = TRUE,
              group.equal = "loadings")
To see if the metric model fits the data significantly worse than the configural model, we will use the compareFit()
function from the semTools
package:
summary(compareFit(config, metric))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
config 68 99422 99810 875.17
metric 76 99417 99754 885.35 10.179 0.011957 8 0.2527
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
config 875.170† 68 .000 .079 .961† .949 .031† 99422.480 99809.711
metric 885.349 76 .000 .075† .961 .954† .033 99416.660† 99753.925†
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
metric - config 8 -0.004 0 0.005 0.002 -5.821 -55.786
The null hypothesis that we’re testing is that the fit of the metric and configural models is equivalent (i.e., adding equality constraints to the loadings does not make the model fit worse). Thus, if the p-value associated with the Chi-square difference test is > .05, we can retain that null hypothesis and conclude that metric invariance holds for these data. If the p-value associated with the Chi-square difference test is < .05, then we need to reject the null hypothesis and conclude that the metric invariance model fits the data significantly worse, and so metric invariance does not hold. Here, p = .253, so metric invariance holds for these data.
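As an aside, the same nested-model comparison can be run directly in lavaan with the lavTestLRT() function, without loading semTools; the Chi-square difference test it reports is equivalent:
# Chi-square (likelihood ratio) difference test in lavaan itself
lavTestLRT(config, metric)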
Note: If any loadings were found to be non-invariant (i.e., there is partial metric invariance), then the intercepts of those items also need to be estimated freely across groups in the next step. In other words, you start your investigation with a model that is already partially invariant at the scalar level (see below how to run partial invariance models).
6.6 Step 3: Scalar Invariance
Next, we will evaluate if scalar invariance holds for our data. To do so, we simply add the intercepts to the group.equal = c("loadings","intercepts")
argument:
# Scalar model
scalar <- cfa(model = financemodel, data = finance,
              group = "sector",
              std.lv = TRUE,
              group.equal = c("loadings","intercepts"))
Now, we compare the scalar model’s fit to the fit of the metric model, to see if adding the equality constraints on the intercepts results in significantly worse fit:
summary(compareFit(metric, scalar))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
metric 76 99417 99754 885.35
scalar 84 99453 99740 937.72 52.371 0.053951 8 1.427e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
metric 885.349† 76 .000 .075 .961† .954 .033† 99416.660† 99753.925
scalar 937.720 84 .000 .073† .959 .956† .034 99453.030 99740.330†
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
scalar - metric 8 -0.002 -0.002 0.002 0.001 36.371 -13.595
From the Chi-square difference test above, we can see that the scalar model fit the data significantly worse than the metric model. This means that, at least for some items, the expected response for those with an average (i.e., 0) score on the latent factor differs across groups.
Step 3B: Partial Scalar Invariance
To easily get an overview of what equality constraints should be released to result in the largest improvement in model fit, we can use the lavTestScore
function from the lavaan
package:
# Identify which equality constraints to release for partial invariance
lavTestScore(scalar)
$test
total score test:
test X2 df p.value
1 score 62.622 20 0
$uni
univariate score tests:
lhs op rhs X2 df p.value
1 .p1. == .p36. 0.125 1 0.724
2 .p2. == .p37. 0.594 1 0.441
3 .p3. == .p38. 0.323 1 0.570
4 .p4. == .p39. 0.327 1 0.567
5 .p5. == .p40. 1.733 1 0.188
6 .p6. == .p41. 1.909 1 0.167
7 .p7. == .p42. 1.219 1 0.270
8 .p8. == .p43. 2.986 1 0.084
9 .p9. == .p44. 2.129 1 0.144
10 .p10. == .p45. 0.011 1 0.916
11 .p24. == .p59. 15.296 1 0.000
12 .p25. == .p60. 0.215 1 0.643
13 .p26. == .p61. 16.963 1 0.000
14 .p27. == .p62. 0.257 1 0.612
15 .p28. == .p63. 9.145 1 0.002
16 .p29. == .p64. 0.445 1 0.505
17 .p30. == .p65. 0.101 1 0.751
18 .p31. == .p66. 21.472 1 0.000
19 .p32. == .p67. 1.433 1 0.231
20 .p33. == .p68. 4.348 1 0.037
To figure out which equality constraints map onto which item’s intercept, we can look at the parameter table of the scalar model. In the code below, I filter the output to only include intercept parameters (op == "~1"
), only show parameters from group 2 (group == 2
), and only include certain columns and rows of that output. This was mostly done to keep this PDF from becoming too large. You don’t need to do any of this filtering yourself.
# To view the entire parameter table, simply use this code
# (remove the hashtag in front of the next line):
# parTable(scalar)
# To filter the output (this literal code will only work for
# this example):
subset(parTable(scalar), op == "~1" & group == 2)[,c(1:4, 11:15)]
id lhs op rhs label plabel start est se
59 59 item1 ~1 .p24. .p59. 2.733 2.994 0.025
60 60 item2 ~1 .p25. .p60. 2.978 3.173 0.022
61 61 item4 ~1 .p26. .p61. 3.093 3.232 0.021
62 62 item8 ~1 .p27. .p62. 3.162 3.370 0.025
63 63 item3 ~1 .p28. .p63. 2.708 2.638 0.024
64 64 item5 ~1 .p29. .p64. 2.931 2.820 0.024
65 65 item6 ~1 .p30. .p65. 3.301 3.203 0.022
66 66 item7 ~1 .p31. .p66. 2.600 2.432 0.024
67 67 item9 ~1 .p32. .p67. 2.256 2.174 0.022
68 68 item10 ~1 .p33. .p68. 2.855 2.785 0.022
69 69 positive ~1 .p69. 0.000 -0.217 0.035
70 70 negative ~1 .p70. 0.000 0.121 0.035
In the table above, we can see that the intercept of Item 7 is associated with the largest potential improvement in model fit. So, we estimate a partial scalar invariance model. To release the equality constraint for the intercept of Item 7, we add the group.partial = c("item7 ~ 1")
argument. Once the model is estimated, we will compare its fit to the metric model, to see if we need to release additional intercept parameters:
scalar2 <- cfa(model = financemodel, data = finance,
               group = "sector",
               std.lv = TRUE,
               group.equal = c("loadings","intercepts"),
               group.partial = c("item7 ~ 1"))
# for a loading, group.partial would look like: "negative =~ item7"
summary(compareFit(metric, scalar2))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
metric 76 99417 99754 885.35
scalar2 83 99434 99727 916.21 30.863 0.042297 7 6.59e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
metric 885.349† 76 .000 .075 .961† .954 .033† 99416.660† 99753.925
scalar2 916.212 83 .000 .073† .960 .956† .034 99433.523 99727.068†
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
scalar2 - metric 7 -0.002 -0.001 0.003 0.001 16.863 -26.857
The fit of the first partial scalar model is still worse than the metric model. So, we need to release additional equality constraints. After examining the modification indices again, we release the equality constraint of Item 4’s intercept, estimate the model again, and compare this second partial model to the metric model:
# Identify which equality constraints to release for partial invariance
lavTestScore(scalar2)
$test
total score test:
test X2 df p.value
1 score 41.208 19 0.002
$uni
univariate score tests:
lhs op rhs X2 df p.value
1 .p1. == .p36. 0.125 1 0.723
2 .p2. == .p37. 0.594 1 0.441
3 .p3. == .p38. 0.323 1 0.570
4 .p4. == .p39. 0.327 1 0.567
5 .p5. == .p40. 2.131 1 0.144
6 .p6. == .p41. 1.823 1 0.177
7 .p7. == .p42. 1.128 1 0.288
8 .p8. == .p43. 1.924 1 0.165
9 .p9. == .p44. 1.885 1 0.170
10 .p10. == .p45. 0.045 1 0.832
11 .p24. == .p59. 15.297 1 0.000
12 .p25. == .p60. 0.215 1 0.643
13 .p26. == .p61. 16.964 1 0.000
14 .p27. == .p62. 0.257 1 0.612
15 .p28. == .p63. 2.901 1 0.089
16 .p29. == .p64. 2.900 1 0.089
17 .p30. == .p65. 1.991 1 0.158
18 .p32. == .p67. 0.000 1 0.989
19 .p33. == .p68. 0.840 1 0.359
scalar3 <- cfa(model = financemodel, data = finance,
               group = "sector",
               std.lv = TRUE,
               group.equal = c("loadings","intercepts"),
               group.partial = c("item7 ~ 1", "item4 ~ 1"))
summary(compareFit(metric, scalar3))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
metric 76 99417 99754 885.35
scalar3 82 99418 99718 899.16 13.812 0.02614 6 0.0318 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
metric 885.349† 76 .000 .075 .961† .954 .033† 99416.660† 99753.925
scalar3 899.161 82 .000 .072† .961 .957† .033 99418.472 99718.263†
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
scalar3 - metric 6 -0.002 0 0.003 0 1.812 -35.661
The fit of this partial model is still worse than the metric model. So, we need to release additional equality constraints. After examining the modification indices again, we release the equality constraint of Item 1’s intercept, estimate the model again, and compare this third partial model to the metric model:
# Identify which equality constraints to release for partial invariance
lavTestScore(scalar3)
$test
total score test:
test X2 df p.value
1 score 24.243 18 0.147
$uni
univariate score tests:
lhs op rhs X2 df p.value
1 .p1. == .p36. 0.276 1 0.599
2 .p2. == .p37. 0.946 1 0.331
3 .p3. == .p38. 1.352 1 0.245
4 .p4. == .p39. 0.127 1 0.722
5 .p5. == .p40. 2.132 1 0.144
6 .p6. == .p41. 1.826 1 0.177
7 .p7. == .p42. 1.128 1 0.288
8 .p8. == .p43. 1.919 1 0.166
9 .p9. == .p44. 1.883 1 0.170
10 .p10. == .p45. 0.046 1 0.830
11 .p24. == .p59. 6.551 1 0.010
12 .p25. == .p60. 0.744 1 0.388
13 .p27. == .p62. 3.488 1 0.062
14 .p28. == .p63. 2.901 1 0.089
15 .p29. == .p64. 2.900 1 0.089
16 .p30. == .p65. 1.991 1 0.158
17 .p32. == .p67. 0.000 1 0.989
18 .p33. == .p68. 0.840 1 0.359
scalar4 <- cfa(model = financemodel, data = finance,
               group = "sector",
               std.lv = TRUE,
               group.equal = c("loadings","intercepts"),
               group.partial = c("item7 ~ 1", "item4 ~ 1",
                                 "item1 ~ 1"))
summary(compareFit(metric, scalar4))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
metric 76 99417 99754 885.35
scalar4 81 99414 99720 892.61 7.2647 0.015418 5 0.2017
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
metric 885.349† 76 .000 .075 .961† .954 .033† 99416.660 99753.925
scalar4 892.614 81 .000 .073† .961 .957† .033 99413.925† 99719.961†
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
scalar4 - metric 5 -0.002 0 0.003 0 -2.735 -33.964
This time, the fit of the partial scalar model is not worse than that of the metric model (p > .05), so we can conclude that partial scalar invariance holds for these data.
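As an optional sanity check (output omitted, as with the EFA code earlier), you can rerun the score test on this final model to verify that no remaining intercept constraints stand out:
# Optional check on the final partial scalar model
lavTestScore(scalar4)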
6.7 Step 4: Strict Invariance (Optional)
Finally, we can examine strict invariance by constraining the residuals to be equal across groups. To do so, we simply add the residuals to the group.equal = c("loadings","intercepts","residuals")
argument. Note that we keep the partial intercepts from the previous step and need to add partial residuals for those items.
# Strict model
strict <- cfa(model = financemodel, data = finance,
              group = "sector",
              std.lv = TRUE,
              group.equal = c("loadings","intercepts","residuals"),
              group.partial = c("item7 ~ 1", "item4 ~ 1",
                                "item1 ~ 1",
                                "item7 ~~ item7",
                                "item4 ~~ item4",
                                "item1 ~~ item1"))
Similar to previous steps, we can compare the fit of this model to the previous (partial scalar) model:
summary(compareFit(scalar4, strict))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
scalar4 81 99414 99720 892.61
strict 88 99423 99686 915.96 23.351 0.035012 7 0.00148 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
scalar4 892.614† 81 .000 .073 .961† .957 .033† 99413.925† 99719.961
strict 915.965 88 .000 .070† .960 .959† .033 99423.276 99685.593†
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
strict - scalar4 7 -0.002 -0.001 0.003 0 9.351 -34.368
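Based on the Chi-square difference test above, the strict model fits significantly worse than the partial scalar model (p = .001), so strict invariance does not hold for these data. Because strict invariance is optional and is not required for comparing latent means, we continue with the partial scalar model in the next step.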
6.8 Step 5: Interpreting the Mean Difference between Public and Private Sector Groups (Optional)
As we’ve been able to establish partial scalar invariance, we can compare the latent factor means across groups. To do so, we use the final partial scalar model:
subset(parameterEstimates(scalar4),
       (op == "~1" & (lhs == "positive" | lhs == "negative")))
lhs op rhs block group label est se z pvalue ci.lower
34 positive ~1 1 1 0.000 0.000 NA NA 0.000
35 negative ~1 1 1 0.000 0.000 NA NA 0.000
69 positive ~1 2 2 -0.216 0.037 -5.823 0.000 -0.289
70 negative ~1 2 2 0.086 0.035 2.425 0.015 0.016
ci.upper
34 0.000
35 0.000
69 -0.143
70 0.155
To help with interpretation, the means of the latent factors in the first group (Public) are fixed to 0 and their variances are fixed to one. This constraint means that the freely estimated latent means in the second group (Private) can be interpreted as “relative to” the first group. In the output, we can see that the positive financial well-being mean is negative (and significant) whereas the negative financial well-being mean is positive (and significant). However, the factors’ variances in this group are not equal to 1 (they are slightly smaller) and so these means are not measured on the same scale as the means of the Public sector group. There are several ways to make the means comparable.
Step 5: Method 1 for making mean differences comparable
The first is to evaluate whether the latent factor variances can be constrained to equivalence without resulting in worse model fit. This would place both groups’ factors on the standardized scale (variance or sd = 1) and mean differences can be interpreted in terms of standard deviation units. To test this model, we can use the following code, adding "lv.variances"
to the group.equal
argument and then testing whether the resulting model fit significantly worse than the partial scalar model. Note that we do not have to meet the strict invariance level to test this latent variance level of invariance.
lvvar <- cfa(model = financemodel, data = finance,
             group = "sector",
             std.lv = TRUE,
             group.equal = c("loadings","intercepts",
                             "lv.variances"),
             group.partial = c("item7 ~ 1", "item4 ~ 1",
                               "item1 ~ 1"))
summary(compareFit(scalar4, lvvar))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
scalar4 81 99414 99720 892.61
lvvar 83 99410 99704 893.05 0.43956 0 2 0.8027
####################### Model Fit Indices ###########################
chisq df pvalue rmsea cfi tli srmr aic bic
scalar4 892.614† 81 .000 .073 .961 .957 .033† 99413.925 99719.961
lvvar 893.053 83 .000 .072† .961† .958† .034 99410.364† 99703.910†
################## Differences in Fit Indices #######################
df rmsea cfi tli srmr aic bic
lvvar - scalar4 2 -0.001 0 0.001 0.001 -3.56 -16.052
Constraining the latent factor variances to equivalence across groups does not result in significantly worse model fit (p = .803), so we can use the estimates from the lvvar
output to interpret mean differences between the Public and Private sector groups.
subset(parameterEstimates(lvvar),
       (op == "~1" & (lhs == "positive" | lhs == "negative")))
lhs op rhs block group label est se z pvalue ci.lower
34 positive ~1 1 1 0.000 0.000 NA NA 0.000
35 negative ~1 1 1 0.000 0.000 NA NA 0.000
69 positive ~1 2 2 -0.217 0.037 -5.829 0.000 -0.290
70 negative ~1 2 2 0.087 0.036 2.422 0.015 0.016
ci.upper
34 0.000
35 0.000
69 -0.144
70 0.157
Based on the results above, we can conclude that positive financial well-being is .22 standard deviations (SE = .04) lower for those working in the Private sector compared to those working in the Public sector (p < .001). In addition, negative financial well-being is .09 standard deviations (SE = .04) higher for those working in the Private sector compared to those working in the Public sector (p = .015).
Step 5: Method 2 for making mean differences comparable
If the previous method had shown that the factor variances are not comparable, then we could have used the formula for standardized mean differences (Cohen’s D) to compute mean differences.
\[\text{Cohen's } D = \frac{m_1 - m_2}{\sqrt{\sigma^2}}\]
To do so, we need the latent factor means and variances from the partial scalar model. We already know that the means and variances in the first group are 0 and 1 respectively, so we only need to know the means and variances for group 2 (i.e., Private sector):
subset(parameterEstimates(scalar4),
       (group == 2 &
        (op == "~1" | op == "~~") & (lhs == "positive" | lhs == "negative")))
lhs op rhs block group label est se z pvalue ci.lower
56 positive ~~ positive 2 2 0.973 0.051 19.042 0.000 0.873
57 negative ~~ negative 2 2 0.967 0.051 18.882 0.000 0.867
58 positive ~~ negative 2 2 -0.818 0.042 -19.583 0.000 -0.899
69 positive ~1 2 2 -0.216 0.037 -5.823 0.000 -0.289
70 negative ~1 2 2 0.086 0.035 2.425 0.015 0.016
ci.upper
56 1.074
57 1.068
58 -0.736
69 -0.143
70 0.155
When variances are not equal across groups, we can compute separate standardized mean differences using each group’s variance in the denominator, which will give us a range of plausible mean difference effect sizes (not to be confused with a confidence interval!). Here, we first compute the Cohen’s D using the first group’s variance (= 1):
# positive
((-.216) - 0) / sqrt(1)
[1] -0.216
# negative
((.086) - 0) / sqrt(1)
[1] 0.086
Next, we compute the Cohen’s D using the second group’s variance estimates:
# positive
((-.216) - 0) / sqrt(.973)
[1] -0.2189764
# negative
((.086) - 0) / sqrt(.967)
[1] 0.08745511
So, depending on the standardizer (which variance) used, the Cohen’s D for positive financial well-being is approximately -.216 to -.219, and the Cohen’s D for negative financial well-being is approximately .086 to .087. Note that the ranges here are really narrow because the variances are so similar; the range would be larger if the difference in the variances was larger.
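If you would rather not copy these estimates by hand, the same computation can be scripted from the parameter estimates table. A small sketch using the objects from this lab (shown for the positive factor only; the helper names pe, m2, and v2 are my own):
# Pull the group-2 latent mean and variance for the positive factor
pe <- parameterEstimates(scalar4)
m2 <- pe$est[pe$lhs == "positive" & pe$op == "~1" & pe$group == 2]
v2 <- pe$est[pe$lhs == "positive" & pe$op == "~~" &
             pe$rhs == "positive" & pe$group == 2]

# Cohen's D with each group's variance as the standardizer
# (group 1 has mean 0 and variance 1 by construction)
c(d_group1 = (m2 - 0) / sqrt(1),
  d_group2 = (m2 - 0) / sqrt(v2))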
6.9 Summary
In this R lab, you were introduced to the steps involved in measurement invariance testing, an important quantitative method that can help us collect evidence regarding the fairness of our measurement scale. You also learned how to compare means of latent variables, using several different approaches. This is the final R lab of this class!