8 Structural after Measurement Models

You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: sam.R

8.1 Loading R packages

Load the required packages for this lab into your R environment.

library(rio)
library(ggplot2)
library(tidyr)
library(dplyr)
library(lavaan)
library(semTools)
library(semhelpinghands)
8.2 Loading Data
Load the data into your environment. For this lab we will use a dataset based on N = 441 children whose caregivers completed a survey about family environment and child behavior. You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/projectkids.csv. Make sure to save it in the folder you are using for this class.
The full dataset and more information about this project can be found here: https://www.ldbase.org/datasets/72ab9852-8ebc-4ba0-bb1f-5f1c347e2572.
kids <- read.csv("data/projectkids.csv")

set.seed(9710)
kids100 <- kids %>% drop_na() %>% sample_n(100)
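As an optional sanity check (base R only), you can confirm that the subsample has the expected number of complete cases:

nrow(kids100)        # should be 100
sum(is.na(kids100))  # should be 0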
We also used this data for our lab on the Structural Equation Model. We previously used variables measuring three constructs: reading problems, attention, and homework problems. In addition to these constructs, the data set also includes three items about math problems (example: Is your child worse at math than at reading and spelling?; rated on a five-point scale from Not at all to A great deal), and six items about a chaotic home environment (example: You can’t hear yourself think in our home; rated on a five-point scale from Definitely Untrue to Definitely True).
8.3 Showing that a factor loading is equivalent to a regression in standard SEM
Let’s first check the idea that a factor loading from a factor to an indicator is mathematically equivalent to a regression path from a factor to that same indicator. We will use three items measuring Chaotic Home Environment (chaos) and the homework problems composite score (hpc_mean). We will fit three models: (0) a model that just includes the three chaos items as indicators of a chaos factor, (1) a model in which hpc_mean is added as a fourth indicator of the factor, and (2) a model in which hpc_mean is regressed on the latent factor.
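To see why models (1) and (2) should be equivalent, note that both specify a single linear relation between the factor and hpc_mean: as an indicator, hpc_mean = λ·chaos + ε; as an outcome, hpc_mean = β·chaos + ζ. Both parameterizations imply the same model for the covariance matrix, so the estimates of λ and β should be identical.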
mod0 <- '
chaos =~ chaos2 + chaos3 + chaos6
'

mod1 <- '
chaos =~ chaos2 + chaos3 + chaos6 + hpc_mean
'

mod2 <- '
chaos =~ chaos2 + chaos3 + chaos6
hpc_mean ~ chaos
'

fit0 <- sem(mod0, kids100)
fit1 <- sem(mod1, kids100)
fit2 <- sem(mod2, kids100)
To test the equivalence of the last two models, we can compare their model fit:
comp12 <- compareFit(fit1, fit2)
Warning: lavaan->unknown():
some models have the same degrees of freedom
comp12@nested
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit1 2 1078.4 1099.2 8.7898
fit2 2 1078.4 1099.2 8.7898 1.7764e-13 0 0
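Because the two models have the same degrees of freedom, the chi-squared difference is essentially zero. The same test is also available through lavaan’s anova() method, which produces the same warning:

anova(fit1, fit2)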
We can also use the net() function to get confirmation that the two models are equivalent:
net(fit1, fit2)
If cell [R, C] is TRUE, the model in row R is nested within column C.
If the models also have the same degrees of freedom, they are equivalent.
NA indicates the model in column C did not converge when fit to the
implied means and covariance matrix from the model in row R.
The hidden diagonal is TRUE because any model is equivalent to itself.
The upper triangle is hidden because for models with the same degrees
of freedom, cell [C, R] == cell [R, C]. For all models with different
degrees of freedom, the upper diagonal is all FALSE because models with
fewer degrees of freedom (i.e., more parameters) cannot be nested
within models with more degrees of freedom (i.e., fewer parameters).
fit1 fit2
fit1 (df = 2)
fit2 (df = 2) TRUE
We can also see that the estimates are identical across the two models, even though a factor loading and a regression coefficient seem to describe very different kinds of association with the factor:
group_by_models(list("cfa" = fit1, "sem" = fit2)) %>%
filter_by(op = c("=~", "~"))
lhs op rhs est_cfa est_sem
1 chaos =~ chaos2 1.000 1.000
2 chaos =~ chaos3 0.859 0.859
3 chaos =~ chaos6 0.439 0.439
4 chaos =~ hpc_mean 0.174 NA
5 hpc_mean ~ chaos NA 0.174
8.4 Showing that changing the outcome variable changes the loadings
The example above shows that treating a variable as a latent factor indicator or an outcome variable does not affect the other indicators’ factor loadings. But what happens if we replace the outcome variable with a different outcome variable? Below, we replace hpc_mean with the latent factor for Reading Problems and use that latent factor as a new outcome variable.
The measurement model of Chaos in the home should not change just because we have a different outcome variable, right? Let’s see what happens.
mod3 <- '
chaos =~ chaos2 + chaos3 + chaos6
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6
readprob ~ chaos
'

fit3 <- sem(mod3, kids100)
group_by_models(list( "outcome_hpc" = fit2, "outcome_read" = fit3)) %>%
filter_by(op = c("=~","~"), lhs ="chaos")
lhs op rhs est_outcome_hpc est_outcome_read
1 chaos =~ chaos2 1.000 1.000
2 chaos =~ chaos3 0.859 0.934
3 chaos =~ chaos6 0.439 0.460
Question: what do you notice about the factor loadings across the two models?
We can also focus on the standardized estimates to see how the first indicator’s loading is affected:
group_by_models(list( "sem_hpc" = fit2, "sem_read" = fit3),
use_standardizedSolution=TRUE) %>%
filter_by(op = c("=~","~"), lhs ="chaos")
lhs op rhs est.std_sem_hpc est.std_sem_read
1 chaos =~ chaos2 0.808 0.776
2 chaos =~ chaos3 0.764 0.799
3 chaos =~ chaos6 0.458 0.461
8.5 Re-estimating the model using SAM
To use SAM instead of traditional SEM, we can use the sam() function, which comes with the lavaan package. Below, we use the local SAM method (alternatives are global and fsr [factor scores]), which is the preferred (and default) method. You can also specify different estimators for the measurement and structural parts of the model and set other options (check out ?sam for more information).
fit3_sam <- sam(mod3, kids100, sam.method = "local")
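If you want to customize the two steps, sam() lets you pass separate settings to the measurement and structural parts. As a sketch (argument names taken from ?sam; the estimator choices here are only an illustration):

# Illustration only: robust ML for the measurement blocks, ML for the
# structural part; see ?sam for all available options
fit3_sam_alt <- sam(mod3, kids100, sam.method = "local",
                    mm.args = list(estimator = "MLR"),
                    struc.args = list(estimator = "ML"))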
group_by_models(list("cfa" = fit0, "sem_read" = fit3,"sam_read"=fit3_sam),
use_standardizedSolution=TRUE) %>%
filter_by(op = c("=~","~"))
lhs op rhs est.std_cfa est.std_sem_read est.std_sam_read
1 chaos =~ chaos2 0.719 0.776 0.719
2 chaos =~ chaos3 0.865 0.799 0.865
3 chaos =~ chaos6 0.451 0.461 0.451
4 readprob =~ cldq_1 NA 0.712 0.712
5 readprob =~ cldq_2 NA 0.677 0.675
6 readprob =~ cldq_4 NA 0.880 0.880
7 readprob =~ cldq_5 NA 0.940 0.940
8 readprob =~ cldq_6 NA 0.960 0.961
9 readprob ~ chaos NA 0.148 0.097
Question: What difference do you notice between the SEM and SAM estimates?
8.6 SEM vs SAM with larger models
To illustrate the difference between SEM and SAM with a larger model, we first estimate separate CFAs for each latent factor that we will include in the larger model. You would not normally need to do this, but we will do it here to show how SEM and SAM results compare.
cfa_read <- '
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6
cldq_5 ~~ cldq_6
'

cfa_attention <- '
attention =~ swan_1 + swan_2 + swan_3 + swan_4 + swan_5 + swan_6 +
             swan_7 + swan_8 + swan_9
swan_1 ~~ swan_2
'

cfa_chaos <- '
chaos =~ chaos2 + chaos3 + chaos6
'

cfa_math <- '
mathprob =~ cldq_18 + cldq_19 + cldq_20
'

fit_cfa_read <- cfa(cfa_read, kids100)
fit_cfa_math <- cfa(cfa_math, kids100)
fit_cfa_chaos <- cfa(cfa_chaos, kids100)
fit_cfa_attention <- cfa(cfa_attention, kids100)
Next, we will set up the full model, including the measurement model and structural model. The model below includes several indirect effects with the goal of explaining variance in reading and math problems.
big_sem <- '
# Measurement Model
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6
attention =~ swan_1 + swan_2 + swan_3 + swan_4 + swan_5 + swan_6 +
             swan_7 + swan_8 + swan_9
chaos =~ chaos2 + chaos3 + chaos6
mathprob =~ cldq_18 + cldq_19 + cldq_20
# Single-indicator factor for homework problems: the loading is fixed to 1
# and the residual variance of the indicator is fixed to a small constant
hwp =~ 1*hpc_mean
hpc_mean ~~ 0.01572*hpc_mean
swan_1 ~~ swan_2
cldq_5 ~~ cldq_6
# Structural Model
attention ~ a1*chaos
hwp ~ a2*chaos + b1*attention
readprob ~ b2*hwp
mathprob ~ b3*hwp
readprob ~~ mathprob
# Indirect Effects
cha.att.hwp.read := a1*b1*b2
cha.att.hwp.math := a1*b1*b3
cha.hwp.read := a2*b2
cha.hwp.math := a2*b3
'

# Estimate SEM and SAM
fit4_sem <- sem(big_sem, kids100)
fit4_sam <- sam(big_sem, kids100)
First, we will compare factor loading estimates from each of the CFAs with those from the full traditional SEM. The code below separates results for each latent factor to limit the size of the output tables.
group_by_models(list( "cfa_read" = fit_cfa_read,
"sem" = fit4_sem),
use_standardizedSolution=TRUE) %>%
filter_by(op = c("=~"), lhs = "readprob")
lhs op rhs est.std_cfa_read est.std_sem
17 readprob =~ cldq_1 0.744 0.750
18 readprob =~ cldq_2 0.719 0.716
19 readprob =~ cldq_4 0.927 0.918
20 readprob =~ cldq_5 0.875 0.880
21 readprob =~ cldq_6 0.901 0.910
group_by_models(list("cfa_math" = fit_cfa_math,
"sem" = fit4_sem),
use_standardizedSolution=TRUE) %>%
filter_by(op = c("=~"), lhs = "mathprob")
lhs op rhs est.std_cfa_math est.std_sem
14 mathprob =~ cldq_18 0.744 0.736
15 mathprob =~ cldq_19 0.809 0.840
16 mathprob =~ cldq_20 0.937 0.914
group_by_models(list("cfa_attention" = fit_cfa_attention,
"sem" = fit4_sem),
use_standardizedSolution=TRUE) %>%
filter_by(op = c("=~"), lhs = "attention")
lhs op rhs est.std_cfa_attention est.std_sem
1 attention =~ swan_1 0.806 0.809
2 attention =~ swan_2 0.832 0.831
3 attention =~ swan_3 0.742 0.743
4 attention =~ swan_4 0.929 0.931
5 attention =~ swan_5 0.874 0.869
6 attention =~ swan_6 0.811 0.807
7 attention =~ swan_7 0.815 0.817
8 attention =~ swan_8 0.814 0.817
9 attention =~ swan_9 0.865 0.865
group_by_models(list("cfa_chaos" = fit_cfa_chaos,
"sem" = fit4_sem),
use_standardizedSolution=TRUE) %>%
filter_by(op = c("=~"), lhs = "chaos")
lhs op rhs est.std_cfa_chaos est.std_sem
10 chaos =~ chaos2 0.719 0.837
11 chaos =~ chaos3 0.865 0.737
12 chaos =~ chaos6 0.451 0.448
Next, we will compare the SEM and SAM regression estimates (of direct and indirect effects). In addition to the parameter estimates, we will also look at p-values to see if any inferences about significance change.
group_by_models(list("sem" = fit4_sem,
"sam" = fit4_sam),
col_names = c("est", "pvalue")) %>%
filter_by(op = c("~", ":="))
lhs op rhs est_sem est_sam pvalue_sem pvalue_sam
22 attention ~ chaos -0.350 -0.279 0.012 0.071
23 hwp ~ attention -0.371 -0.375 0.000 0.000
24 hwp ~ chaos 0.046 0.041 0.393 0.502
25 mathprob ~ hwp 0.996 0.986 0.000 0.000
26 readprob ~ hwp 0.861 0.845 0.000 0.000
56 cha.att.hwp.math := a1*b1*b3 0.129 0.103 0.021 0.085
57 cha.att.hwp.read := a1*b1*b2 0.112 0.089 0.020 0.083
58 cha.hwp.math := a2*b3 0.046 0.040 0.397 0.504
59 cha.hwp.read := a2*b2 0.040 0.034 0.396 0.504
Let’s also take a look at the full summary results for the SAM analysis to see what information is included:
summary(fit4_sam, remove.step1 = FALSE)
This is lavaan 0.6-19 -- using the SAM approach to SEM
SAM method LOCAL
Mapping matrix M method ML
Number of measurement blocks 5
Estimator measurement part ML
Estimator structural part ML
Number of observations 100
Summary Information Measurement + Structural:
Block Latent Nind Chisq Df
1 readprob 5 8.913 4
2 attention 9 62.025 26
3 chaos 3 0.000 0
4 mathprob 3 0.000 0
5 hwp 1 0.000 0
Model-based reliability latent variables:
readprob attention chaos mathprob hwp
0.93 0.959 0.811 0.912 0.965
Summary Information Structural part:
chisq df cfi rmsea srmr
4.742 4 0.996 0.043 0.027
Parameter Estimates:
Standard errors Twostep
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Step Estimate Std.Err z-value P(>|z|)
readprob =~
cldq_1 1 1.000
cldq_2 1 0.798 0.110 7.265 0.000
cldq_4 1 1.491 0.157 9.490 0.000
cldq_5 1 1.407 0.157 8.947 0.000
cldq_6 1 1.454 0.157 9.252 0.000
attention =~
swan_1 1 1.000
swan_2 1 0.951 0.093 10.208 0.000
swan_3 1 0.816 0.097 8.381 0.000
swan_4 1 1.187 0.102 11.614 0.000
swan_5 1 1.228 0.116 10.559 0.000
swan_6 1 1.109 0.117 9.473 0.000
swan_7 1 1.027 0.108 9.540 0.000
swan_8 1 1.048 0.110 9.525 0.000
swan_9 1 1.021 0.098 10.402 0.000
chaos =~
chaos2 1 1.000
chaos3 1 1.092 0.273 4.003 0.000
chaos6 1 0.486 0.124 3.931 0.000
mathprob =~
cldq_18 1 1.000
cldq_19 1 0.920 0.113 8.116 0.000
cldq_20 1 1.062 0.128 8.323 0.000
hwp =~
hpc_mean 1 1.000
Regressions:
Step Estimate Std.Err z-value P(>|z|)
attention ~
chaos (a1) 2 -0.279 0.155 -1.807 0.071
hwp ~
chaos (a2) 2 0.041 0.061 0.672 0.502
attention (b1) 2 -0.375 0.050 -7.558 0.000
readprob ~
hwp (b2) 2 0.845 0.121 6.966 0.000
mathprob ~
hwp (b3) 2 0.986 0.158 6.224 0.000
Covariances:
Step Estimate Std.Err z-value P(>|z|)
.swan_1 ~~
.swan_2 1 0.050 0.080 0.632 0.527
.cldq_5 ~~
.cldq_6 1 0.202 0.070 2.892 0.004
.readprob ~~
.mathprob 2 -0.040 0.050 -0.788 0.431
Variances:
Step Estimate Std.Err z-value P(>|z|)
.hpc_mean 1 0.016
.cldq_1 1 0.508 0.080 6.382 0.000
.cldq_2 1 0.375 0.058 6.480 0.000
.cldq_4 1 0.230 0.069 3.312 0.001
.cldq_5 1 0.381 0.082 4.672 0.000
.cldq_6 1 0.311 0.075 4.151 0.000
.swan_1 1 0.821 0.127 6.453 0.000
.swan_2 1 0.613 0.097 6.334 0.000
.swan_3 1 0.830 0.124 6.700 0.000
.swan_4 1 0.340 0.066 5.132 0.000
.swan_5 1 0.713 0.117 6.083 0.000
.swan_6 1 0.974 0.150 6.485 0.000
.swan_7 1 0.811 0.125 6.467 0.000
.swan_8 1 0.851 0.132 6.471 0.000
.swan_9 1 0.535 0.087 6.162 0.000
.chaos2 1 0.792 0.225 3.526 0.000
.chaos3 1 0.341 0.237 1.435 0.151
.chaos6 1 0.782 0.120 6.529 0.000
.cldq_18 1 0.786 0.135 5.837 0.000
.cldq_19 1 0.434 0.089 4.880 0.000
.cldq_20 1 0.152 0.088 1.717 0.086
.readprob 2 0.320 0.078 4.086 0.000
.attention 2 1.460 0.304 4.810 0.000
chaos 2 0.847 0.281 3.012 0.003
.mathprob 2 0.549 0.136 4.042 0.000
.hwp 2 0.214 0.034 6.337 0.000
Defined Parameters:
Step Estimate Std.Err z-value P(>|z|)
cha.att.hwp.rd 2 0.089 0.051 1.736 0.083
cha.tt.hwp.mth 2 0.103 0.060 1.723 0.085
cha.hwp.read 2 0.034 0.051 0.669 0.504
cha.hwp.math 2 0.040 0.060 0.668 0.504
Question: What does the output tell you about the fit of the model? How is that different from traditional SEM?
8.7 Does this matter with larger samples?
Research has shown that SAM can be particularly helpful with small to medium samples. The full kids sample consists of 441 observations, which can be considered a medium-sized sample for this relatively simple structural equation model. To see if differences between SEM and SAM are less pronounced with larger samples, we can re-estimate the model with the full sample:
fit5_sem <- sem(big_sem, kids)
fit5_sam <- sam(big_sem, kids)
group_by_models(list("sem" = fit5_sem,
"sam" = fit5_sam),
col_names = c("est", "pvalue")) %>%
filter_by(op = c("~", ":="))
lhs op rhs est_sem est_sam pvalue_sem pvalue_sam
22 attention ~ chaos -0.288 -0.253 0.001 0.004
23 hwp ~ attention -0.361 -0.362 0.000 0.000
24 hwp ~ chaos 0.030 0.029 0.333 0.344
25 mathprob ~ hwp 0.906 0.928 0.000 0.000
26 readprob ~ hwp 0.688 0.665 0.000 0.000
56 cha.att.hwp.math := a1*b1*b3 0.094 0.085 0.002 0.006
57 cha.att.hwp.read := a1*b1*b2 0.072 0.061 0.002 0.006
58 cha.hwp.math := a2*b3 0.027 0.027 0.335 0.346
59 cha.hwp.read := a2*b2 0.020 0.020 0.336 0.346
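Beyond the point estimates and p-values, you could also compare the standard errors of the structural estimates across the two sample sizes. A sketch using the same group_by_models() helper (assuming "se" works as a column name, since it is a standard column in lavaan’s parameter table):

# Compare SEs for the structural paths at n = 100 vs. n = 441
group_by_models(list("sem_n100" = fit4_sem, "sem_n441" = fit5_sem),
                col_names = c("est", "se")) %>%
  filter_by(op = c("~", ":="))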
Question: What differences do you notice? What does that tell you about the role of the sample size?
8.8 Summary
In this R lab, you were introduced to the structural after measurement approach to SEM. Below, you’ll find a Bonus section that demonstrates how to use SAM to estimate latent variable interaction effects. In the next R Lab, you will learn about latent growth models, which allow you to study trajectories of development across multiple time points.
8.9 Bonus: Interaction Effects with SAM
By separating the measurement and structural models, it also becomes much easier to estimate interaction effects among latent variables. This is still a new feature of SAM, so the official version of the lavaan package does not let you compute standard errors or evaluate significance yet (you can download the development version of lavaan from GitHub and use it to compute bootstrapped SEs). The example below shows you how to specify the interaction effect (identical to the modsem package approach) and then estimate the model. You will notice that this approach is much faster than some of the other latent variable interaction approaches.
big_sem_int <- '
# Measurement Model
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6
attention =~ swan_1 + swan_2 + swan_3 + swan_4 + swan_5 + swan_6 +
             swan_7 + swan_8 + swan_9
mathprob =~ cldq_18 + cldq_19 + cldq_20
# Structural Model
mathprob ~ readprob + attention + readprob:attention
'

fit_int_sam <- sam(big_sem_int, data = kids, se = "none")
summary(fit_int_sam)
This is lavaan 0.6-19 -- using the SAM approach to SEM
SAM method LOCAL
Mapping matrix M method ML
Number of measurement blocks 3
Estimator measurement part ML
Estimator structural part ML
Used Total
Number of observations 373 441
Summary Information Measurement + Structural:
Block Latent Nind Chisq Df
1 readprob 5 40.605 5
2 attention 9 152.925 27
3 mathprob 3 0.000 0
Model-based reliability latent variables:
readprob attention mathprob
0.936 0.965 0.908
Summary Information Structural part:
chisq df cfi rmsea srmr
0 0 1 0 0
Parameter Estimates:
Regressions:
Estimate
mathprob ~
readprob 0.404
attention -0.230
readprob:ttntn 0.016
Covariances:
Estimate
readprob ~~
attention -0.450
readprob:ttntn 1.333
attention ~~
readprob:ttntn 1.492
Intercepts:
Estimate
readprob 2.016
attention 4.552
.mathprob 2.101
readprob:ttntn 8.728
Variances:
Estimate
readprob 0.589
attention 1.646
.mathprob 0.689
readprob:ttntn 8.701