4 Exploratory Factor Analysis
You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: exploratoryfactoranalysis.R
4.1 Loading R Packages
Remember, you only need to install a package once. But if you want to use the functionality of a package, you will need to “load” the package into your environment. To do that for lavaan (and the psych and GPArotation packages, which we’ll also use in this lab), we use the library() function:
library(lavaan)
library(psych)
library(GPArotation)
4.2 Loading data into our environment
We’re using the same dataset as we used in the CFA R Lab, so we can use the data() function like we did before:
data("HolzingerSwineford1939")
Remember that for the CFA analysis, we did not have to remove any variables from the data frame, because lavaan extracted the relevant variables automatically. With EFA, using the psych package, we have to do that extraction ourselves. We can do that as follows:
# print the variable names of the full data frame
# and locate the relevant variables
colnames(HolzingerSwineford1939)
[1] "id" "sex" "ageyr" "agemo" "school" "grade" "x1" "x2"
[9] "x3" "x4" "x5" "x6" "x7" "x8" "x9"
# use the [,] operator to select only the relevant
# columns/variables (here in column 7 to 15)
HSdata <- HolzingerSwineford1939[, 7:15]
There are many other ways of selecting variables from a larger data frame, and if you have a different method that you like better, feel free to use it!
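For example, another option (shown here just as a sketch, relying on the fact that the nine relevant variables are named x1 through x9) is to select the columns by name instead of by position:
# alternative: select the same nine columns by their names
HSdata <- HolzingerSwineford1939[, paste0("x", 1:9)]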
4.3 EFA Step 1: How many factors should I extract?
In the first step of the EFA, we will use parallel analysis to see what the algorithm identifies as the optimal number of factors to extract from the data. This algorithm generates random correlation matrices, and when doing so, it may return an error message because something went wrong with those random matrices. If this happens, you can simply re-run the fa.parallel() function and the error should disappear.
The code below will return a parallel analysis for the Holzinger Swineford data using the factor analysis method and based on 50 random correlation matrices. You can increase that number to 100 or 1000 if you want to be more certain of the results, but note that that will take longer to run.
fa.parallel(HSdata, fa = "fa", n.iter = 50)
Parallel analysis suggests that the number of factors = 3 and the number of components = NA
Based on the plot, how many factors should we extract?
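Note: because the reference eigenvalues come from randomly generated data, the exact results of fa.parallel() can vary slightly from run to run. If you want the analysis to be reproducible, you can set a random seed before calling it (a minimal sketch; the seed value 2023 is arbitrary):
# fix the random number generator so the parallel analysis is reproducible
set.seed(2023)
fa.parallel(HSdata, fa = "fa", n.iter = 50)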
4.4 EFA Step 2: Factor Extraction and Rotation
In the first cycle of the EFA process, we will follow the parallel analysis results and estimate a 3-factor EFA. To estimate the parameters and rotate those results to be more interpretable, we just need to use one function:
efa_3f <- fa(HSdata, nfactors = 3,
             fm = "minres",
             rotate = "oblimin")
Technically, you don’t even need to include fm = "minres", rotate = "oblimin", but I wanted to show you what arguments to use if you want to change the default estimation method (here “minres”) or the default rotation method (here “oblimin”).
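For example, if you wanted an orthogonal solution instead of an oblique one, you could request a varimax rotation (shown only as a sketch; the object name efa_3f_varimax is just illustrative, and we will stick with the oblimin solution in this lab):
# same three-factor model, but with an orthogonal (varimax) rotation
efa_3f_varimax <- fa(HSdata, nfactors = 3,
                     fm = "minres",
                     rotate = "varimax")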
4.5 EFA Step 3: Interpreting the EFA estimates
Communalities
round(efa_3f$communalities,
digits = 3)
x1 x2 x3 x4 x5 x6 x7 x8 x9
0.477 0.255 0.453 0.728 0.754 0.691 0.519 0.520 0.460
Most communalities are between .4 and .6, and a few (x4, x5, and x6) are even above .6, indicating that the factors are able to account for a good chunk of the variability in the item responses. The one exception is x2, which has a communality of .255. Overall, these values look acceptable.
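If you are working with a larger set of items, it can help to sort the communalities so the low ones stand out (just a convenience, not a required step):
# sort the communalities from lowest to highest
sort(round(efa_3f$communalities, digits = 3))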
Factor Loadings
It can be helpful to hide low factor loadings from your output to see if the factor extraction and rotation have resulted in a simple structure. We can do that by including cutoff = .3 in the print() function:
print(efa_3f$loadings,
cutoff = .3)
Loadings:
MR1 MR3 MR2
x1 0.592
x2 0.509
x3 0.686
x4 0.846
x5 0.886
x6 0.805
x7 0.737
x8 0.686
x9 0.382 0.456
MR1 MR3 MR2
SS loadings 2.197 1.275 1.239
Proportion Var 0.244 0.142 0.138
Cumulative Var 0.244 0.386 0.523
The factor loadings appear to follow a pretty clear, simple structure. The exception is x9, which has a factor loading > .3 on two factors.
This output also includes information about the variance in the items that is explained by each factor. SS loadings refers to the sum of the squared loadings (i.e., the factor’s Eigenvalue). The columns (even in the loadings table) are sorted from the highest Eigenvalue to the lowest. That’s why the order here is MR1, MR3, and then MR2 (and MR refers to the estimation method, minres). The second row shows the proportion of variance that is accounted for by each factor, and the bottom row shows the cumulative variance accounted for by all factors. Here, the three factors explain 52.3% of the variance in the items.
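If you want to see where these numbers come from, you can reproduce them yourself from the loading matrix (a quick check, not something you need to run): the SS loadings are the column sums of the squared loadings, and dividing them by the number of items gives the proportion of variance.
# sum of squared loadings per factor (the factors' Eigenvalues)
colSums(efa_3f$loadings^2)
# proportion of variance accounted for by each factor (9 items in HSdata)
colSums(efa_3f$loadings^2) / ncol(HSdata)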
Note: if you want to see all the factor loading estimates, you need to set the cutoff at the lowest possible value for factor loadings (-1):
print(efa_3f$loadings,
cutoff = -1)
Loadings:
MR1 MR3 MR2
x1 0.196 0.592 0.031
x2 0.043 0.509 -0.122
x3 -0.062 0.686 0.019
x4 0.846 0.016 0.008
x5 0.886 -0.065 0.007
x6 0.805 0.080 -0.013
x7 0.044 -0.152 0.737
x8 -0.034 0.125 0.686
x9 0.032 0.382 0.456
MR1 MR3 MR2
SS loadings 2.197 1.275 1.239
Proportion Var 0.244 0.142 0.138
Cumulative Var 0.244 0.386 0.523
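One more note on oblique rotations: the loadings shown above are pattern loadings, which are not the same as the correlations between items and factors once the factors themselves are correlated. If you also want to inspect those item-factor correlations, the fitted object stores the structure matrix (the pattern loadings multiplied by the factor correlations); shown here only as an aside:
# structure matrix: item-factor correlations under the oblique rotation
print(efa_3f$Structure, cutoff = .3)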
Factor Correlations
Finally, we can look at the correlations between the factors:
round(efa_3f$Phi,
digits = 3)
MR1 MR3 MR2
MR1 1.000 0.323 0.213
MR3 0.323 1.000 0.261
MR2 0.213 0.261 1.000
Extremely large correlations between factors may be an indication of overextraction; in that case, the two factors could be combined into one. Here, the correlations between the factors are small to moderate, indicating that they are tapping into distinct but correlated subconstructs.
4.6 EFA Step 4: Comparing to other factor solutions
To understand if the three-factor model makes the most sense, it is typical to also estimate an EFA with one factor less and one factor more to see if those analyses result in more clearly interpretable results. Let’s start by estimating a two-factor EFA.
Two-Factor EFA
efa_2f <- fa(HSdata, nfactors = 2,
             fm = "minres",
             rotate = "oblimin")
Interpreting the results of the Two-Factor EFA
round(efa_2f$communalities,
digits = 3)
x1 x2 x3 x4 x5 x6 x7 x8 x9
0.341 0.100 0.223 0.728 0.708 0.705 0.179 0.381 0.545
Many of the communalities are low, indicating that this factor solution does not do a good job of accounting for variability in the items.
print(efa_2f$loadings,
cutoff = .3)
Loadings:
MR1 MR2
x1 0.430
x2
x3 0.449
x4 0.851
x5 0.854
x6 0.828
x7 0.434
x8 0.640
x9 0.736
MR1 MR2
SS loadings 2.244 1.588
Proportion Var 0.249 0.176
Cumulative Var 0.249 0.426
Although there are no cross-loadings, one item (x2) doesn’t have a loading > .3 on either of the factors! These two factors cannot capture the variance in x2 that is common with the other items.
round(efa_2f$Phi,
digits = 3)
MR1 MR2
MR1 1.00 0.34
MR2 0.34 1.00
The factor correlation does not indicate any issues.
Four-Factor EFA
efa_4f <- fa(HSdata, nfactors = 4,
             fm = "minres",
             rotate = "oblimin")
Interpreting the results of the Four-Factor EFA
round(efa_4f$communalities,
digits = 3)
x1 x2 x3 x4 x5 x6 x7 x8 x9
0.454 0.230 0.554 0.740 0.787 0.687 0.995 0.424 0.568
Compared to the three-factor EFA, the communalities have not changed a lot, except for the communality of x7, which is now a whopping .995. Such a high communality indicates that there is a factor (or combination of factors) that can account for almost all variability in x7. Although this may sound good, it may stand in the way of our goal of dimension reduction (as we’ll see next).
print(efa_4f$loadings,
cutoff = .3)
Loadings:
MR1 MR2 MR3 MR4
x1 0.453
x2 0.397
x3 0.735
x4 0.850
x5 0.887
x6 0.804
x7 0.986
x8 0.499
x9 0.674
MR1 MR2 MR3 MR4
SS loadings 2.216 1.089 0.954 0.787
Proportion Var 0.246 0.121 0.106 0.087
Cumulative Var 0.246 0.367 0.473 0.561
When asked to estimate four factors, the EFA algorithm resulted in a factor that only represents one item (x7). This one-to-one association gets in the way of our goal of dimension reduction, and is an indication that this factor solution is not appropriate.
round(efa_4f$Phi, digits = 3)
MR1 MR2 MR3 MR4
MR1 1.000 0.122 0.250 0.286
MR2 0.122 1.000 0.046 0.417
MR3 0.250 0.046 1.000 0.436
MR4 0.286 0.417 0.436 1.000
The factor correlations show that the correlation between the factor that represents x7 and the factor that represents x8 and x9 is not extremely strong (\(r = .417\)), even though these three items are hypothesized to measure one subconstruct: speed. This indicates that, although these three items share some common variance, they also tap into distinct sub-subconstructs that may need to be explored further.
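If you would like a more quantitative way to compare the three solutions, the objects returned by fa() also store several global fit indices (for example, the TLI, RMSEA, and BIC) that you can line up side by side. The sketch below assumes you have already estimated efa_2f, efa_3f, and efa_4f; the object name fit_table is just illustrative:
# collect a few fit indices from each solution for a side-by-side comparison
fit_table <- data.frame(
  factors = c(2, 3, 4),
  TLI     = c(efa_2f$TLI, efa_3f$TLI, efa_4f$TLI),
  RMSEA   = c(efa_2f$RMSEA[1], efa_3f$RMSEA[1], efa_4f$RMSEA[1]),
  BIC     = c(efa_2f$BIC, efa_3f$BIC, efa_4f$BIC)
)
round(fit_table, digits = 3)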
4.7 Some Final Conclusions
For this sample, a three-factor solution appeared to best balance dimension reduction and representing the associations among the observed variables. However, the results did indicate that there may be an issue with x2 (low communality) and x9 (cross-loading). In addition, the four-factor EFA seemed to indicate that the three items measuring speed are not as related as we’d hoped they’d be. A second sample could indicate whether these findings were due to sampling variability or whether they reflect true issues that need to be resolved.
4.8 Summary
In this R lab, you learned how to specify, estimate, evaluate, and interpret EFAs. You also learned how to evaluate different sources of information about the appropriateness of the EFA solutions.