5  Reliability

Note

You can download the R code used in this lab by right-clicking this link and selecting “Save Link As…” in the drop-down menu: reliability.R

5.1 Loading R packages

In this lab we will be using a new package: MBESS (Methods for the Behavioral, Educational, and Social Sciences), which includes a function that calculates coefficient omega and, more importantly, its confidence interval. Although other packages can also estimate coefficient omega, they often do not provide a confidence interval. We will also use the psych package. We can load both packages with the library() function:

library(psych)
library(MBESS)
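
If you have not installed these packages yet, you will first need to install them (you only need to do this once):

install.packages(c("psych", "MBESS"))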

5.2 Loading data into our environment

For this lab, we will use a dataset that is included in the psych package, so we can use the data() function like we did before:

data("attitude")

These data come from a survey of clerical employees of a large financial organization. Each variable represents a rating (on the percentage scale) of how well the company performs on that item’s topic (e.g., complaints).

describe(attitude)
           vars  n  mean    sd median trimmed   mad min max range  skew
rating        1 30 64.63 12.17   65.5   65.21 10.38  40  85    45 -0.36
complaints    2 30 66.60 13.31   65.0   67.08 14.83  37  90    53 -0.22
privileges    3 30 53.13 12.24   51.5   52.75 10.38  30  83    53  0.38
learning      4 30 56.37 11.74   56.5   56.58 14.83  34  75    41 -0.05
raises        5 30 64.63 10.40   63.5   64.50 11.12  43  88    45  0.20
critical      6 30 74.77  9.89   77.5   75.83  7.41  49  92    43 -0.87
advance       7 30 42.93 10.29   41.0   41.83  8.90  25  72    47  0.85
           kurtosis   se
rating        -0.77 2.22
complaints    -0.68 2.43
privileges    -0.41 2.23
learning      -1.22 2.14
raises        -0.60 1.90
critical       0.17 1.81
advance        0.47 1.88

5.3 Are the items tau-equivalent?

When determining which internal consistency coefficient may be most appropriate for our measurement instrument, we can look at whether the items are tau-equivalent (i.e., all items have equal factor loadings on a single factor). A simple way to do so is to run a unidimensional EFA and inspect the factor loadings:

fa(attitude)$loadings

Loadings:
           MR1  
rating     0.758
complaints 0.834
privileges 0.603
learning   0.789
raises     0.841
critical   0.284
advance    0.491

                 MR1
SS loadings    3.285
Proportion Var 0.469

Based on the loadings, do you think the items are tau equivalent? What does this mean for our choice of internal consistency coefficient?
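
As a quick numeric check on the spread of the loadings, you can compute their range directly (a minimal sketch; loads is just an illustrative object name):

# store the loadings and inspect their range;
# widely spread loadings argue against tau equivalence
loads <- fa(attitude)$loadings
range(loads)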

5.4 Coefficient Omega

We will use the MBESS package to compute coefficient omega:

ci.reliability(attitude, type = "omega", 
               conf.level = 0.95, 
               interval.type = "mlr")
$est
[1] 0.8563268

$se
[1] 0.04619404

$ci.lower
[1] 0.7657882

$ci.upper
[1] 0.9468655

$conf.level
[1] 0.95

$type
[1] "omega"

$interval.type
[1] "robust maximum likelihood (wald ci)"

How internally consistent are the scores on this measurement instrument with this sample? Here is how you’d report the reliability: Internal consistency of the Attitudes survey was good, \(\omega\) = .86 (SE = .05), 95% CI = [.77, .95].

We can use the estimated internal consistency to get an estimate of the overall standard error of measurement (sem):

# compute the SD of the attitude sum scores
sd_x <- sd(rowSums(attitude))

# compute sem: sd_x * sqrt(1 - reliability)
sem <- sd_x * sqrt(1 - 0.8563268)
sem
[1] 21.88914

So, the typical size of the error scores (i.e., their standard deviation on the sum-score metric) is 21.89.
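
Rather than copying the estimate by hand, you can store the output of ci.reliability() and reuse its est element (a small sketch; omega_fit is just an illustrative name):

# store the full omega output once
omega_fit <- ci.reliability(attitude, type = "omega",
                            conf.level = 0.95,
                            interval.type = "mlr")

# reuse the point estimate when computing the sem
sem <- sd(rowSums(attitude)) * sqrt(1 - omega_fit$est)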

5.5 Cronbach’s Alpha

We will use the psych package to compute Cronbach’s alpha.

attitude_alpha <- alpha(attitude)
Number of categories should be increased  in order to count frequencies. 

(You can safely ignore this warning; it simply means the items have too many distinct response values for psych to tabulate response frequencies.) The alpha() function returns a lot of output that we can look at, starting with some basic summary statistics:

summary(attitude_alpha)

Reliability analysis   
 raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
      0.84      0.84    0.88      0.43 5.2 0.042   60 8.2     0.45
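
If you just need the alpha point estimate programmatically (e.g., for later computations), the alpha object stores it in its total element (in current versions of psych):

# extract just the raw alpha point estimate
attitude_alpha$total$raw_alpha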

We can also look at how Cronbach’s alpha would change if specific items were removed from the instrument. This can help us identify items that contribute relatively more measurement error:

attitude_alpha$alpha.drop
           raw_alpha std.alpha   G6(smc) average_r      S/N   alpha se
rating     0.8097602 0.8081915 0.8317701 0.4125442 4.213534 0.05244967
complaints 0.7969175 0.7956468 0.8201749 0.3935404 3.893487 0.05653049
privileges 0.8278478 0.8230879 0.8661986 0.4367533 4.652525 0.04757251
learning   0.8030310 0.7983665 0.8367262 0.3975597 3.959493 0.05429802
raises     0.7953866 0.7847196 0.8261984 0.3779228 3.645105 0.05558872
critical   0.8638723 0.8634716 0.8900481 0.5131641 6.324481 0.03839722
advance    0.8404649 0.8346265 0.8563876 0.4568621 5.046918 0.04287642
                var.r     med.r
rating     0.03484463 0.4454779
complaints 0.03454755 0.4261169
privileges 0.05381799 0.5316198
learning   0.04457495 0.3768830
raises     0.04790794 0.3432934
critical   0.03015342 0.5582882
advance    0.04811380 0.4933310

For example, raw alpha would increase from .84 to .86 if critical were dropped, suggesting that this item carries relatively more measurement error. Finally, we can find the 95% confidence interval:

attitude_alpha$feldt

   95% confidence boundaries (Feldt)
 lower alpha upper
  0.74  0.84  0.92

Here is how you’d report the internal consistency using Cronbach’s alpha: Internal consistency of the Attitudes survey was good, \(\alpha\) = .84 (SE = .04), 95% CI = [.74, .92].

Note that you can also use the MBESS package to compute Cronbach’s alpha:

ci.reliability(attitude, type = "alpha", 
               conf.level = 0.95, 
               interval.type = "feldt")
$est
[1] 0.8431428

$se
[1] NA

$ci.lower
[1] 0.7393757

$ci.upper
[1] 0.9157731

$conf.level
[1] 0.95

$type
[1] "alpha"

$interval.type
[1] "feldt"

5.6 Split-Half Reliability

We can also use the psych package to estimate the split-half reliability. This function returns (among other things) the minimum, maximum, and average split-half reliability. Ideally, we want those numbers to be close together and close to 1. If the minimum and maximum are far apart, it indicates that only some specific splits can be considered essentially parallel.

splitHalf(attitude)
Split half reliabilities  
Call: splitHalf(r = attitude)

Maximum split half reliability (lambda 4) =  0.89
Guttman lambda 6                          =  0.88
Average split half reliability            =  0.82
Guttman lambda 3 (alpha)                  =  0.84
Guttman lambda 2                          =  0.85
Minimum split half reliability  (beta)    =  0.68
Average interitem r =  0.43  with median =  0.45
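
If you want to inspect the full distribution of split-half values rather than just the minimum, average, and maximum, splitHalf() can also return the individual splits; a sketch, assuming your version of psych supports the raw argument:

# return the individual split-half reliabilities and plot them
splits <- splitHalf(attitude, raw = TRUE)
hist(splits$raw, breaks = 20,
     main = "Distribution of split-half reliabilities",
     xlab = "Split-half reliability")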

5.7 Disattenuation of correlations

As discussed in class, when two measures are not perfectly reliable, the correlation between them will be biased, or attenuated (i.e., closer to zero than the true correlation). In this part of the lab, we’ll see this phenomenon in action.

First, we’ll split the attitude survey into two parts, so we can look at the correlation between the two parts. In Assignment 6, you will do something similar, but with two different surveys. You will still need to split the Assignment 6 data frame into two parts, however, so the code below remains relevant.

# Split attitude data into two parts 
# for demonstration
part1 <- attitude[,c(1:4)]
part2 <- attitude[,c(5:7)]
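
Equivalently, you can select the columns by name, which is a bit more readable and robust to changes in column order:

# same split, selecting columns by name
part1 <- attitude[, c("rating", "complaints", "privileges", "learning")]
part2 <- attitude[, c("raises", "critical", "advance")]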

To compute a correlation between the two parts, we need to compute each participant’s sum score across the items in each part. We can use the rowSums() function to do this:

# Compute summed scores of 
# each part using rowSums()
sumscore1 <- rowSums(part1)
sumscore2 <- rowSums(part2)

Now, we can compute the observed correlation of the summed scores:

# Compute correlation between summed scores
obscor <- cor(sumscore1,sumscore2)
obscor
[1] 0.5438004

But we already know from our assessment above that the full attitude scale is not perfectly reliable. We now need to estimate how reliable these two parts are. To decide between Cronbach’s alpha and coefficient omega, we again need to assess the (lack of) tau equivalence of the two parts:

#examine tau equivalence
fa(part1)$loadings

Loadings:
           MR1  
rating     0.855
complaints 0.920
privileges 0.590
learning   0.713

                 MR1
SS loadings    2.434
Proportion Var 0.608

fa(part2)$loadings

Loadings:
         MR1  
raises   0.874
critical 0.431
advance  0.657

                 MR1
SS loadings    1.381
Proportion Var 0.460

These loadings do not look tau equivalent, so we will use coefficient omega to quantify the internal consistency. This time, we’re using the $ operator to extract just the omega estimate (est) from the ci.reliability() output:

#record omega reliability estimates of both parts
omega1 <- ci.reliability(part1)$est
omega2 <- ci.reliability(part2)$est

omega1
[1] 0.8615586
omega2
[1] 0.7097988

The omegas above show that the two parts are not perfectly reliable, so it’s important to disattenuate the observed correlation by dividing it by the square root of the product of the two reliability estimates, \(r_{\text{disattenuated}} = r_{\text{observed}} / \sqrt{\omega_1 \omega_2}\):

# Disattenuated correlation between tests
discor <- obscor / sqrt(omega1 * omega2)
discor
[1] 0.6953916
Note

A nice feature of CFA (or structural equation modeling more generally) is that correlations between factors are automatically disattenuated for (lack of) reliability, because the factors represent only the true-score part of the items’ variability, while the error variance is separated out into the residual (error) variances of the indicators.
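
For illustration, here is a minimal sketch of such a model using the lavaan package (not otherwise used in this lab, so treat this as an optional aside; you would need to install lavaan separately). The estimated correlation between f1 and f2 is already corrected for the parts’ measurement error:

# two-factor CFA: one factor per part of the attitude survey
library(lavaan)

model <- '
  f1 =~ rating + complaints + privileges + learning
  f2 =~ raises + critical + advance
'

# std.lv = TRUE fixes the factor variances to 1, so the
# f1 ~~ f2 estimate is the (disattenuated) factor correlation
fit <- cfa(model, data = attitude, std.lv = TRUE)
standardizedSolution(fit)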

5.8 Summary

In this R lab, you learned how to determine whether a set of items is tau-equivalent and how to compute coefficient omega and Cronbach’s alpha to evaluate internal consistency reliability. You also learned how to estimate split-half reliability. Finally, you used the disattenuation formula to correct a correlation for measurement error.