8  Structural After Measurement Models

Note

You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: sam.R

8.1 Loading R packages

Load the required packages for this lab into your R environment.

library(rio)
library(ggplot2)
library(tidyr)
library(dplyr)
library(lavaan)
library(semTools)
library(semhelpinghands)
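
If any of these packages are not yet installed on your machine, install them from CRAN first (you only need to do this once):

# Install any missing packages (one-time setup)
install.packages(c("rio", "ggplot2", "tidyr", "dplyr",
                   "lavaan", "semTools", "semhelpinghands"))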

8.2 Loading Data

Load the data into your environment. For this lab we will use a dataset based on N = 441 children whose caregivers completed a survey about family environment and child behavior. You can download the data by right-clicking this link and selecting “Save Link As…” in the drop-down menu: data/projectkids.csv. Make sure to save it in the folder you are using for this class.

The full dataset and more information about this project can be found here: https://www.ldbase.org/datasets/72ab9852-8ebc-4ba0-bb1f-5f1c347e2572.

kids <- read.csv("data/projectkids.csv")

set.seed(9710)
kids100 <- kids %>% drop_na() %>% sample_n(100)

We also used these data in our lab on the Structural Equation Model, where we used variables measuring three constructs: reading problems, attention, and homework problems. In addition to these constructs, the data set includes three items about math problems (example: Is your child worse at math than at reading and spelling?; rated on a five-point scale from Not at all to A great deal) and six items about a chaotic home environment (example: You can’t hear yourself think in our home; rated on a five-point scale from Definitely Untrue to Definitely True).
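
Before fitting any models, it can help to take a quick look at the new items. The snippet below is a minimal sketch that summarizes the chaos and math items used in this lab (it assumes the column names chaos2, chaos3, chaos6, and cldq_18 through cldq_20, which are the names used in the models below):

# Descriptive summary of the chaos and math items used in this lab
kids100 %>%
  select(chaos2, chaos3, chaos6, cldq_18, cldq_19, cldq_20) %>%
  summary()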

8.3 Showing that a factor loading is equivalent to a regression in standard SEM

Let’s first check the claim that a factor loading linking a factor to an indicator is mathematically equivalent to a regression path from that factor to the same variable. We will use three items measuring Chaotic Home Environment (chaos) and the homework problems composite score (hpc_mean). We will fit three models: (0) a model that includes only the three chaos items as indicators of a chaos factor; (1) a model in which hpc_mean is added as a fourth indicator of the factor; and (2) a model in which hpc_mean is regressed on the latent factor.

mod0 <- '
chaos =~ chaos2 + chaos3 + chaos6
'

mod1 <- '
chaos =~ chaos2 + chaos3 + chaos6 + hpc_mean
'

mod2 <- '
chaos =~ chaos2 + chaos3 + chaos6

hpc_mean ~ chaos
'

fit0 <- sem(mod0, kids100)
fit1 <- sem(mod1, kids100)
fit2 <- sem(mod2, kids100)

To test the equivalence of the last two models, we can compare their model fit:

comp12 <- compareFit(fit1, fit2)
Warning: lavaan->unknown():  
   some models have the same degrees of freedom
comp12@nested

Chi-Squared Difference Test

     Df    AIC    BIC  Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit1  2 1078.4 1099.2 8.7898                                    
fit2  2 1078.4 1099.2 8.7898 1.7764e-13     0       0           
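
You can also confirm the identical fit directly by requesting the same fit measures from both models with lavaan’s fitMeasures() function:

# Both models should yield the same chi-square, df, CFI, and RMSEA
round(fitMeasures(fit1, c("chisq", "df", "cfi", "rmsea")), 3)
round(fitMeasures(fit2, c("chisq", "df", "cfi", "rmsea")), 3)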

We can also use the net() function from the semTools package to confirm that the two models are equivalent:

net(fit1, fit2)

        If cell [R, C] is TRUE, the model in row R is nested within column C.

        If the models also have the same degrees of freedom, they are equivalent.

        NA indicates the model in column C did not converge when fit to the
        implied means and covariance matrix from the model in row R.

        The hidden diagonal is TRUE because any model is equivalent to itself.
        The upper triangle is hidden because for models with the same degrees
        of freedom, cell [C, R] == cell [R, C].  For all models with different
        degrees of freedom, the upper diagonal is all FALSE because models with
        fewer degrees of freedom (i.e., more parameters) cannot be nested
        within models with more degrees of freedom (i.e., fewer parameters).
        
              fit1 fit2
fit1 (df = 2)          
fit2 (df = 2) TRUE     

We can also see that the estimates are identical across the two models, even though a factor loading implies a conceptually different relationship with the factor (the variable measures the factor) than a regression coefficient does (the factor predicts the variable):

group_by_models(list("cfa" = fit1, "sem" = fit2)) %>%
  filter_by(op = c("=~", "~"))
       lhs op      rhs est_cfa est_sem
1    chaos =~   chaos2   1.000   1.000
2    chaos =~   chaos3   0.859   0.859
3    chaos =~   chaos6   0.439   0.439
4    chaos =~ hpc_mean   0.174      NA
5 hpc_mean  ~    chaos      NA   0.174
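
To see this numeric equivalence at full precision, you can also pull both estimates straight from the fitted objects. This sketch assumes lavaan’s usual parameter naming, in which the loading is labeled "chaos=~hpc_mean" and the regression "hpc_mean~chaos":

# The loading in fit1 and the slope in fit2 are the same parameter
coef(fit1)["chaos=~hpc_mean"]
coef(fit2)["hpc_mean~chaos"]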

8.4 Showing that changing the outcome variable changes the loadings

The example above shows that treating a variable as a factor indicator or as an outcome variable does not affect the other indicators’ factor loadings. But what happens if we swap in a different outcome variable? Below, we replace hpc_mean with a latent factor for Reading Problems and use that latent factor as the new outcome variable.

The measurement model of Chaos in the home should not change just because we have a different outcome variable, right? Let’s see what happens.

mod3 <- '
chaos =~ chaos2 + chaos3 + chaos6
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6

readprob ~ chaos
'

fit3 <- sem(mod3, kids100)

group_by_models(list( "outcome_hpc" = fit2, "outcome_read" = fit3)) %>% 
  filter_by(op = c("=~","~"), lhs ="chaos")
    lhs op    rhs est_outcome_hpc est_outcome_read
1 chaos =~ chaos2           1.000            1.000
2 chaos =~ chaos3           0.859            0.934
3 chaos =~ chaos6           0.439            0.460

Question: What do you notice about the factor loadings across the two models?

We can also look at the standardized estimates to see how the first indicator’s loading is affected (its unstandardized loading is fixed to 1 in both models, so only the standardized solution reveals the change):

group_by_models(list( "sem_hpc" = fit2, "sem_read" = fit3),
                use_standardizedSolution=TRUE) %>% 
  filter_by(op = c("=~","~"), lhs ="chaos")
    lhs op    rhs est.std_sem_hpc est.std_sem_read
1 chaos =~ chaos2           0.808            0.776
2 chaos =~ chaos3           0.764            0.799
3 chaos =~ chaos6           0.458            0.461
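
Why does the measurement model shift at all? Because all free parameters, including the chaos factor’s variance, are re-estimated whenever the rest of the model changes. You can inspect the factor variance in each model directly (a sketch using lavaan’s parameterEstimates() together with dplyr’s filter()):

# Compare the estimated variance of the chaos factor across the two models
parameterEstimates(fit2) %>%
  filter(lhs == "chaos", op == "~~", rhs == "chaos")
parameterEstimates(fit3) %>%
  filter(lhs == "chaos", op == "~~", rhs == "chaos")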

8.5 Re-estimating the model using SAM

To use SAM instead of traditional SEM, we can use the sam() function that comes with the lavaan package. Below, we use the local SAM method, which is the preferred (and default) method; the alternatives are global and fsr (factor score regression). You can also specify different estimators for the measurement and structural parts of the model and set various other options (see ?sam for more information).
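
For illustration, the calls below sketch the two alternative methods; the sam.method values follow the description above, but treat these lines as examples rather than settings you need for this lab:

# Alternative SAM methods (not run)
# fit3_sam_global <- sam(mod3, kids100, sam.method = "global")
# fit3_sam_fsr    <- sam(mod3, kids100, sam.method = "fsr")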

fit3_sam <- sam(mod3, kids100, sam.method = "local")

group_by_models(list("cfa" = fit0, "sem_read" = fit3,"sam_read"=fit3_sam),
                                 use_standardizedSolution=TRUE) %>% 
  filter_by(op = c("=~","~"))
       lhs op    rhs est.std_cfa est.std_sem_read est.std_sam_read
1    chaos =~ chaos2       0.719            0.776            0.719
2    chaos =~ chaos3       0.865            0.799            0.865
3    chaos =~ chaos6       0.451            0.461            0.451
4 readprob =~ cldq_1          NA            0.712            0.712
5 readprob =~ cldq_2          NA            0.677            0.675
6 readprob =~ cldq_4          NA            0.880            0.880
7 readprob =~ cldq_5          NA            0.940            0.940
8 readprob =~ cldq_6          NA            0.960            0.961
9 readprob  ~  chaos          NA            0.148            0.097

Question: What difference do you notice between the SEM and SAM estimates?

8.6 SEM vs SAM with larger models

To illustrate the difference between SEM and SAM with a larger model, we first estimate separate CFAs for each latent factor that we will include in the larger model. You would not normally need to do this, but we will do it here to show how SEM and SAM results compare.

cfa_read <- '
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6
cldq_5 ~~ cldq_6
'

cfa_attention <- '
attention =~ swan_1 + swan_2 + swan_3 + swan_4 + swan_5 + swan_6 + 
             swan_7 + swan_8 + swan_9
swan_1 ~~ swan_2
'

cfa_chaos <- '
chaos =~ chaos2 + chaos3 + chaos6
'

cfa_math <- '
mathprob =~ cldq_18 + cldq_19 + cldq_20
'

fit_cfa_read <- cfa(cfa_read, kids100)
fit_cfa_math <- cfa(cfa_math, kids100)
fit_cfa_chaos <- cfa(cfa_chaos, kids100)
fit_cfa_attention <- cfa(cfa_attention, kids100)

Next, we will set up the full model, including the measurement model and structural model. The model below includes several indirect effects with the goal of explaining variance in reading and math problems.

big_sem <- '
# Measurement Model
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6
attention =~ swan_1 + swan_2 + swan_3 + swan_4 + swan_5 + swan_6 + 
             swan_7 + swan_8 + swan_9
             
chaos =~ chaos2 + chaos3 + chaos6
mathprob =~ cldq_18 + cldq_19 + cldq_20
             
# Single-indicator factor for the homework problems composite:
# the loading is fixed to 1 and the residual variance is fixed to a
# constant, which builds an assumed amount of measurement error into
# the composite score
hwp =~ 1*hpc_mean
hpc_mean ~~ 0.01572*hpc_mean

# Residual covariances carried over from the separate CFAs
swan_1 ~~ swan_2
cldq_5 ~~ cldq_6

# Structural Model
attention ~ a1*chaos
hwp ~ a2*chaos + b1*attention

readprob ~ b2*hwp
mathprob ~ b3*hwp

readprob ~~ mathprob

# Indirect Effects
cha.att.hwp.read := a1*b1*b2
cha.att.hwp.math := a1*b1*b3
cha.hwp.read := a2*b2
cha.hwp.math := a2*b3
'

# Estimate SEM and SAM
fit4_sem <- sem(big_sem, kids100)
fit4_sam <- sam(big_sem, kids100)
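
Before comparing individual estimates, you can request global fit measures for the traditional SEM (for the SAM fit, fit information for the structural part appears in the summary output shown later in this section):

# Global fit of the traditional SEM
round(fitMeasures(fit4_sem, c("chisq", "df", "cfi", "rmsea", "srmr")), 3)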

First, we will compare factor loading estimates from each of the CFAs with those from the full traditional SEM. The code below separates results for each latent factor to limit the size of the output tables.

group_by_models(list( "cfa_read" = fit_cfa_read,
                      "sem" = fit4_sem),
                use_standardizedSolution=TRUE) %>% 
  filter_by(op = c("=~"), lhs = "readprob")
        lhs op    rhs est.std_cfa_read est.std_sem
17 readprob =~ cldq_1            0.744       0.750
18 readprob =~ cldq_2            0.719       0.716
19 readprob =~ cldq_4            0.927       0.918
20 readprob =~ cldq_5            0.875       0.880
21 readprob =~ cldq_6            0.901       0.910
group_by_models(list("cfa_math" = fit_cfa_math,
                     "sem" = fit4_sem),
                use_standardizedSolution=TRUE) %>% 
  filter_by(op = c("=~"), lhs = "mathprob")
        lhs op     rhs est.std_cfa_math est.std_sem
14 mathprob =~ cldq_18            0.744       0.736
15 mathprob =~ cldq_19            0.809       0.840
16 mathprob =~ cldq_20            0.937       0.914
group_by_models(list("cfa_attention" = fit_cfa_attention,
                     "sem" = fit4_sem),
                use_standardizedSolution=TRUE) %>% 
  filter_by(op = c("=~"), lhs = "attention")
        lhs op    rhs est.std_cfa_attention est.std_sem
1 attention =~ swan_1                 0.806       0.809
2 attention =~ swan_2                 0.832       0.831
3 attention =~ swan_3                 0.742       0.743
4 attention =~ swan_4                 0.929       0.931
5 attention =~ swan_5                 0.874       0.869
6 attention =~ swan_6                 0.811       0.807
7 attention =~ swan_7                 0.815       0.817
8 attention =~ swan_8                 0.814       0.817
9 attention =~ swan_9                 0.865       0.865
group_by_models(list("cfa_chaos" = fit_cfa_chaos,
                     "sem" = fit4_sem),
                use_standardizedSolution=TRUE) %>% 
  filter_by(op = c("=~"), lhs = "chaos")
     lhs op    rhs est.std_cfa_chaos est.std_sem
10 chaos =~ chaos2             0.719       0.837
11 chaos =~ chaos3             0.865       0.737
12 chaos =~ chaos6             0.451       0.448

Next, we will compare the SEM and SAM regression estimates (of direct and indirect effects). In addition to the parameter estimates, we will also look at p-values to see whether any inferences about statistical significance change:

group_by_models(list("sem" = fit4_sem, 
                     "sam" = fit4_sam), 
                col_names = c("est", "pvalue")) %>% 
  filter_by(op = c("~", ":="))
                lhs op       rhs est_sem est_sam pvalue_sem pvalue_sam
22        attention  ~     chaos  -0.350  -0.279      0.012      0.071
23              hwp  ~ attention  -0.371  -0.375      0.000      0.000
24              hwp  ~     chaos   0.046   0.041      0.393      0.502
25         mathprob  ~       hwp   0.996   0.986      0.000      0.000
26         readprob  ~       hwp   0.861   0.845      0.000      0.000
56 cha.att.hwp.math :=  a1*b1*b3   0.129   0.103      0.021      0.085
57 cha.att.hwp.read :=  a1*b1*b2   0.112   0.089      0.020      0.083
58     cha.hwp.math :=     a2*b3   0.046   0.040      0.397      0.504
59     cha.hwp.read :=     a2*b2   0.040   0.034      0.396      0.504

Let’s also take a look at the full summary results for the SAM analysis to see what information is included:

summary(fit4_sam, remove.step1 = FALSE)
This is lavaan 0.6-19 -- using the SAM approach to SEM

  SAM method                                     LOCAL
  Mapping matrix M method                           ML
  Number of measurement blocks                       5
  Estimator measurement part                        ML
  Estimator  structural part                        ML

  Number of observations                           100

Summary Information Measurement + Structural:

  Block    Latent Nind  Chisq Df
      1  readprob    5  8.913  4
      2 attention    9 62.025 26
      3     chaos    3  0.000  0
      4  mathprob    3  0.000  0
      5       hwp    1  0.000  0

  Model-based reliability latent variables:

  readprob attention chaos mathprob   hwp
      0.93     0.959 0.811    0.912 0.965

  Summary Information Structural part:

  chisq df   cfi rmsea  srmr
  4.742  4 0.996 0.043 0.027

Parameter Estimates:

  Standard errors                              Twostep
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                       Step Estimate  Std.Err  z-value  P(>|z|)
  readprob =~                                                  
    cldq_1                1    1.000                           
    cldq_2                1    0.798    0.110    7.265    0.000
    cldq_4                1    1.491    0.157    9.490    0.000
    cldq_5                1    1.407    0.157    8.947    0.000
    cldq_6                1    1.454    0.157    9.252    0.000
  attention =~                                                 
    swan_1                1    1.000                           
    swan_2                1    0.951    0.093   10.208    0.000
    swan_3                1    0.816    0.097    8.381    0.000
    swan_4                1    1.187    0.102   11.614    0.000
    swan_5                1    1.228    0.116   10.559    0.000
    swan_6                1    1.109    0.117    9.473    0.000
    swan_7                1    1.027    0.108    9.540    0.000
    swan_8                1    1.048    0.110    9.525    0.000
    swan_9                1    1.021    0.098   10.402    0.000
  chaos =~                                                     
    chaos2                1    1.000                           
    chaos3                1    1.092    0.273    4.003    0.000
    chaos6                1    0.486    0.124    3.931    0.000
  mathprob =~                                                  
    cldq_18               1    1.000                           
    cldq_19               1    0.920    0.113    8.116    0.000
    cldq_20               1    1.062    0.128    8.323    0.000
  hwp =~                                                       
    hpc_mean              1    1.000                           

Regressions:
                       Step Estimate  Std.Err  z-value  P(>|z|)
  attention ~                                                  
    chaos     (a1)        2   -0.279    0.155   -1.807    0.071
  hwp ~                                                        
    chaos     (a2)        2    0.041    0.061    0.672    0.502
    attention (b1)        2   -0.375    0.050   -7.558    0.000
  readprob ~                                                   
    hwp       (b2)        2    0.845    0.121    6.966    0.000
  mathprob ~                                                   
    hwp       (b3)        2    0.986    0.158    6.224    0.000

Covariances:
                       Step Estimate  Std.Err  z-value  P(>|z|)
 .swan_1 ~~                                                    
   .swan_2                1    0.050    0.080    0.632    0.527
 .cldq_5 ~~                                                    
   .cldq_6                1    0.202    0.070    2.892    0.004
 .readprob ~~                                                  
   .mathprob              2   -0.040    0.050   -0.788    0.431

Variances:
                       Step Estimate  Std.Err  z-value  P(>|z|)
   .hpc_mean              1    0.016                           
   .cldq_1                1    0.508    0.080    6.382    0.000
   .cldq_2                1    0.375    0.058    6.480    0.000
   .cldq_4                1    0.230    0.069    3.312    0.001
   .cldq_5                1    0.381    0.082    4.672    0.000
   .cldq_6                1    0.311    0.075    4.151    0.000
   .swan_1                1    0.821    0.127    6.453    0.000
   .swan_2                1    0.613    0.097    6.334    0.000
   .swan_3                1    0.830    0.124    6.700    0.000
   .swan_4                1    0.340    0.066    5.132    0.000
   .swan_5                1    0.713    0.117    6.083    0.000
   .swan_6                1    0.974    0.150    6.485    0.000
   .swan_7                1    0.811    0.125    6.467    0.000
   .swan_8                1    0.851    0.132    6.471    0.000
   .swan_9                1    0.535    0.087    6.162    0.000
   .chaos2                1    0.792    0.225    3.526    0.000
   .chaos3                1    0.341    0.237    1.435    0.151
   .chaos6                1    0.782    0.120    6.529    0.000
   .cldq_18               1    0.786    0.135    5.837    0.000
   .cldq_19               1    0.434    0.089    4.880    0.000
   .cldq_20               1    0.152    0.088    1.717    0.086
   .readprob              2    0.320    0.078    4.086    0.000
   .attention             2    1.460    0.304    4.810    0.000
    chaos                 2    0.847    0.281    3.012    0.003
   .mathprob              2    0.549    0.136    4.042    0.000
   .hwp                   2    0.214    0.034    6.337    0.000

Defined Parameters:
                       Step Estimate  Std.Err  z-value  P(>|z|)
    cha.att.hwp.rd        2    0.089    0.051    1.736    0.083
    cha.tt.hwp.mth        2    0.103    0.060    1.723    0.085
    cha.hwp.read          2    0.034    0.051    0.669    0.504
    cha.hwp.math          2    0.040    0.060    0.668    0.504

Question: What does the output tell you about the fit of the model? How is that different from traditional SEM?

8.7 Does this matter with larger samples?

Research has shown that SAM can be particularly helpful with small to medium-sized samples. The full kids sample consists of 441 observations, which can be considered a medium-sized sample for this relatively simple structural equation model. To see whether the differences between SEM and SAM become less pronounced with a larger sample, we can re-estimate the model with the full sample:

fit5_sem <- sem(big_sem, kids)
fit5_sam <- sam(big_sem, kids)

group_by_models(list("sem" = fit5_sem, 
                     "sam" = fit5_sam), 
                col_names = c("est", "pvalue")) %>% 
  filter_by(op = c("~", ":="))
                lhs op       rhs est_sem est_sam pvalue_sem pvalue_sam
22        attention  ~     chaos  -0.288  -0.253      0.001      0.004
23              hwp  ~ attention  -0.361  -0.362      0.000      0.000
24              hwp  ~     chaos   0.030   0.029      0.333      0.344
25         mathprob  ~       hwp   0.906   0.928      0.000      0.000
26         readprob  ~       hwp   0.688   0.665      0.000      0.000
56 cha.att.hwp.math :=  a1*b1*b3   0.094   0.085      0.002      0.006
57 cha.att.hwp.read :=  a1*b1*b2   0.072   0.061      0.002      0.006
58     cha.hwp.math :=     a2*b3   0.027   0.027      0.335      0.346
59     cha.hwp.read :=     a2*b2   0.020   0.020      0.336      0.346

Question: What differences do you notice? What does that tell you about the role of the sample size?

8.8 Summary

In this R lab, you were introduced to the structural after measurement approach to SEM. Below, you’ll find a Bonus section that demonstrates how to use SAM to estimate latent variable interaction effects. In the next R Lab, you will learn about latent growth models, which allow you to study trajectories of development across multiple time points.

8.9 Bonus: Interaction Effects with SAM

By separating the measurement and structural models, it also becomes much easier to estimate interaction effects among latent variables. This is still a new feature of SAM, so the official (CRAN) version of the lavaan package does not yet let you compute standard errors or evaluate significance (you can download the development version of lavaan from GitHub and use it to compute bootstrapped standard errors). The example below shows how to specify the interaction effect (the syntax is identical to the approach used by the modsem package) and then estimate the model. You will notice that this approach is much faster than some of the other approaches to latent variable interaction effects.
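
As a sketch, one way to install the development version is with the remotes package (this assumes the development sources live in the yrosseel/lavaan repository on GitHub; restart R after installing):

# Install the development version of lavaan from GitHub (not run)
# install.packages("remotes")
# remotes::install_github("yrosseel/lavaan")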

big_sem_int <- '
# Measurement Model
readprob =~ cldq_1 + cldq_2 + cldq_4 + cldq_5 + cldq_6
attention =~ swan_1 + swan_2 + swan_3 + swan_4 + swan_5 + swan_6 + 
             swan_7 + swan_8 + swan_9

mathprob =~ cldq_18 + cldq_19 + cldq_20

# Structural Model
mathprob ~ readprob + attention + readprob:attention
'

fit_int_sam <- sam(big_sem_int, data = kids, se="none")

summary(fit_int_sam)
This is lavaan 0.6-19 -- using the SAM approach to SEM

  SAM method                                     LOCAL
  Mapping matrix M method                           ML
  Number of measurement blocks                       3
  Estimator measurement part                        ML
  Estimator  structural part                        ML

                                                  Used       Total
  Number of observations                           373         441

Summary Information Measurement + Structural:

  Block    Latent Nind   Chisq Df
      1  readprob    5  40.605  5
      2 attention    9 152.925 27
      3  mathprob    3   0.000  0

  Model-based reliability latent variables:

  readprob attention mathprob
     0.936     0.965    0.908

  Summary Information Structural part:

  chisq df cfi rmsea srmr
      0  0   1     0    0

Parameter Estimates:


Regressions:
                   Estimate
  mathprob ~               
    readprob          0.404
    attention        -0.230
    readprob:ttntn    0.016

Covariances:
                   Estimate
  readprob ~~              
    attention        -0.450
    readprob:ttntn    1.333
  attention ~~             
    readprob:ttntn    1.492

Intercepts:
                   Estimate
    readprob          2.016
    attention         4.552
   .mathprob          2.101
    readprob:ttntn    8.728

Variances:
                   Estimate
    readprob          0.589
    attention         1.646
   .mathprob          0.689
    readprob:ttntn    8.701