Planning a Factor Analytic study

Michael Friendly
Psychology 6140

Note: This document was prepared some years ago, and covers material for which there are many more recent contributions. See the Further Reading section for some more recent pointers.


So, you want to do a factor analysis? Apart from understanding a modest amount of theory, there are a number of practical questions that arise in any factor analytic study. There are many ways of answering these questions in the factor analysis literature and in research "lore", but it is important to understand that there is much art (as well as science) to carrying out a factor analytic study, and judgment is often required.

The material below began as notes I copied from the blackboard in Karl Jöreskog's factor analysis course at Princeton in 1970/71. Over the years I've added topics that appeared as Frequently Asked Questions in consulting and teaching.

This document outlines the phases of a factor analytic study and a number of the practical questions and issues that need to be addressed. As an outline, it does not go into much detail. Instead, you should consult one or more of these sources:

Phases in a Factor Analytic Study

I. Reconnaissance Stage

  1. Preparatory Planning
    1. Do you really want to do a factor analysis? (Theory construction and testing vs. data summarization; account for common variance or all variance).
    2. Definition of domain: what kinds of tests to study? e.g., Guilford's "Structure of Intellect" model presented a theory which cross-classified any test of intellective performance along a number of dimensions. For factor-analytic study of a single construct (e.g., "anxiety") it is important to have a sufficiently detailed theoretical description to determine the relevant dimensions of the construct (e.g., trait-anxiety, state-anxiety, etc.)
    3. Examination of earlier literature: What variables used in previous studies, factors found, etc.
    4. Formulation of hypotheses:
      • How many factors expected?
      • What kind of factors? (Orthogonal, oblique, general, group)
      • Alternative hypotheses?
  2. Construction & selection of tests
    1. How many variables to use? (p)
      • Overdetermine the hypothesized factors: Need at least p = 2 variables to extract a common factor (by definition). It is better to have at least 3-5 variables believed to measure each factor. With k hypothesized factors, use p = 5 × k variables for safety.
      • Factor analytic principles and empirical studies suggest it is better to have more than the minimum number of variables/factor.
      • As the number of salient variables / factor increases, the communalities, rotational positions, and factor scores all become better determined.
      • It appears to be generally more difficult to replicate factors with fewer than 5 or 6 salient variables for each factor.
    2. Include pure-factor variables ("markers") wherever possible-- variables expected to load only on that factor.
    3. Avoid variables which are experimentally dependent-- where the result on one variable is necessarily dependent on another (e.g., systolic & diastolic BP; items on a questionnaire which are just minor rephrasings or which are based on the same context).
  3. Data collection
    1. What population is being sampled? Define the population to which you want to generalize results.
    2. Take pains to achieve random sampling. As in all statistics, the validity of inferences is threatened when samples are non-random.
    3. Avoid restriction of range, i.e., a sample that is homogeneous on some of the measures. This reduces the possible size of correlations.
    4. Sample size (N)
      • The more the better! Reliability and replicability increase directly with N.
      • Monte Carlo studies show that more reliable factors can be extracted with larger sample sizes.
      • Absolute minimum-- N = 5 × p, but you should have N > 100 for any serious factor analysis. This minimum applies only when communalities are high and p / k is high. Most major factor analytic studies use N > 200, some as high as 500-600.
      • Safer to use at least N > 10 × p.
      • The lower the reliabilities, the larger N should be.
    5. Plan for determining the reliability of each measure (e.g., test-retest on a subsample, or coefficient α for scales/tests composed of items).
    6. Plan for cross-validation (split-sample) or validation (replication).
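As a rough illustration of the reliability planning in the data collection list above, coefficient α can be computed from a respondents-by-items table of scores. This is only a minimal sketch: the function name `coefficient_alpha` and the example scores are made up for illustration, and it assumes complete data with every item scored in the same direction.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a respondents-by-items table of scores.

    `scores` is a list of rows: one row per respondent, one column per item.
    Assumes no missing values and that all items are keyed the same way.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column).
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5 respondents x 3 items, for illustration only.
example = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 4], [3, 3, 4]]
print(round(coefficient_alpha(example), 3))
```

In a real study the same calculation would be run once per scale, and the resulting values kept for use as communality bounds in the descriptive analysis stage.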
  4. Descriptive analysis
    1. Reliabilities of tests - gives upper bound on communalities, and good initial estimates (PRIORS statement in PROC FACTOR).
    2. Data screening - check for outliers, errors: probability plot of Mahalanobis squared distances from mean is useful. (Alternatively, the diagonal elements of H, the "hat" matrix can be used to check for multivariate outliers.)
    3. Distributions - transformations required? All variables should be multivariate normal. Lack of normality can distort the validity of the χ² tests for ML factor methods. At the least, make sure that all are reasonably symmetric, and transform any which are highly skewed.
    4. Sample stratification: Are there natural subgroups within the sample which might differ in either their means or in the pattern of correlation?
      • If significant differences in means exist, analyze within-cell correlation matrix (i.e., use PROC STANDARD to set the means in each group to zero before computing correlations.) Alternatively, code the group variable with dummy variables and examine the correlations with the factor variables.
      • If a different pattern of correlations is expected, consider doing a separate analysis for each group or testing the hypothesis of equal covariance / correlation matrices. An alternative, which does not require splitting the sample, is to include the dummy variable(s) in the analysis. If these dummy variables load (strongly) on any factor, the groups differ on that factor.
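The within-cell approach mentioned in the stratification bullets above (setting each group's means to zero before correlating) can be sketched in a few lines. This is an illustrative stand-in for the PROC STANDARD step, not a reproduction of it; the helper names and the small data set are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def center_within_groups(values, groups):
    """Subtract each group's own mean, so every group has mean zero."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group] for value, group in zip(values, groups)]


def within_group_correlation(x, y, groups):
    """Pooled within-group correlation: correlate the group-centered scores."""
    return pearson(center_within_groups(x, groups), center_within_groups(y, groups))


# Hypothetical data: two groups that differ greatly in their means.
x = [1, 2, 3, 11, 12, 13]
y = [3, 2, 1, 23, 22, 21]
groups = ["a", "a", "a", "b", "b", "b"]
print(pearson(x, y), within_group_correlation(x, y, groups))
```

In this made-up example the total-sample correlation is strongly positive only because the groups differ in their means, while the within-group correlation is negative. That is the kind of distortion the within-cell matrix is meant to remove.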
    5. Correlations
      • Matrix of full rank? (Linear dependencies?) Check the value of the determinant of the correlation/covariance matrix. If near zero, delete one or more variables.
      • R = Identity? (Sphericity test). If you cannot reject the hypothesis that the variables are all uncorrelated, you have no business doing factor analysis.
      • With discrete, ordinal (or binary) measures the alternatives are to treat them as continuous anyway (use ordinary Pearson correlations) or use special procedures developed for polychoric (or tetrachoric) correlations. Robustness studies are mixed; they suggest that discreteness may introduce little bias in the estimation of parameter values, but may affect standard errors and ML χ² tests more seriously, particularly when the number of response categories is small (2-4). Hence, for exploratory studies the consequences may not be serious. For confirmatory studies, the PRELIS program, a companion program to LISREL 7, provides a special method to estimate a covariance matrix from ordinal data and a weight matrix which is used in LISREL for such data. See Bollen (1989, 433ff) and the LISREL 7 Guide for further discussion.
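The two checks on the correlation matrix listed above (the determinant and the sphericity test) can be sketched together. The sketch assumes the usual Bartlett form of the sphericity statistic, −[(N − 1) − (2p + 5)/6] ln|R| on p(p − 1)/2 d.f.; the function names and the example matrix are hypothetical, and the determinant routine is a plain Gaussian elimination meant only for small matrices.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular: a linear dependency
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, d.f., determinant) for H0: R = identity."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix is not of full rank")
    statistic = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(det)
    return statistic, p * (p - 1) // 2, det


# Hypothetical 3 x 3 correlation matrix from N = 150 cases.
R = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
print(bartlett_sphericity(R, 150))
```

A determinant very close to zero points to a linear dependency among the variables; a statistic that is small relative to its degrees of freedom means the hypothesis of uncorrelated variables cannot be rejected.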

II. Exploratory Factor Analysis

  1. Adequacy of common factor model?
  2. Determining the number of factors
  3. Chi Square test from Maximum Likelihood solution
    Provides a statistical test of fit of the model with k factors, against the alternative hypothesis that Σ is unconstrained (any positive definite symmetric matrix). The test is based on the following assumptions, which are rarely fulfilled in practice:

    Rather than regarding χ² as a formal test statistic, one should regard it as a badness of fit measure in the sense that large χ² values correspond to bad fit and small values correspond to good fit. From this perspective, the statistical problem is not one of testing a given hypothesis (which may be considered false a priori), but rather one of fitting the model to the data to decide whether the fit is adequate or not. With greater N you can extract more statistically significant factors.

    The χ² measure is sensitive to sample size and very sensitive to departures from multivariate normality. Large sample sizes and departure from normality tend to increase χ² over and above what can be expected due to misspecification of the model. (See CFA below for alternative measures).

    A more reasonable way to use the χ² value is to compare the differences in χ² to the differences in degrees of freedom as more factors are added to the model. A large drop in χ² compared to the difference in d.f. indicates that the addition of one more factor represents a real improvement. A drop in χ² close to the difference in d.f. indicates that the improvement in fit is obtained by 'capitalizing on chance', and the added factor may not have real significance or meaning.
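The comparison of χ² drops with d.f. drops described above is simple arithmetic once the degrees of freedom are known. The sketch below assumes the standard count for an unrestricted ML solution with p variables and k factors, d.f. = [(p − k)² − (p + k)] / 2, so that going from k to k + 1 factors uses up p − k d.f. The function names and the χ² values are hypothetical.

```python
def efa_df(p, k):
    """Degrees of freedom for an unrestricted k-factor model on p variables."""
    return ((p - k) ** 2 - (p + k)) // 2


def chi2_drops(p, chi2_by_k):
    """Compare the drop in chi-square with the drop in d.f. as factors are added.

    `chi2_by_k` maps the number of factors k to the fitted chi-square value.
    Returns a list of (k_from, k_to, drop_in_chi2, drop_in_df) tuples.
    """
    ks = sorted(chi2_by_k)
    drops = []
    for k_small, k_large in zip(ks, ks[1:]):
        drop_chi2 = chi2_by_k[k_small] - chi2_by_k[k_large]
        drop_df = efa_df(p, k_small) - efa_df(p, k_large)
        drops.append((k_small, k_large, drop_chi2, drop_df))
    return drops


# Hypothetical chi-square values for 1, 2 and 3 factors on p = 9 variables.
for k_from, k_to, d_chi2, d_df in chi2_drops(9, {1: 120.0, 2: 45.0, 3: 38.0}):
    print(f"{k_from} -> {k_to} factors: chi-square drops {d_chi2:.1f} on {d_df} d.f.")
```

With these made-up numbers the second factor buys a drop far larger than its d.f., while the third factor's drop is about equal to its d.f., the pattern the text describes as capitalizing on chance.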

  4. Other goodness of fit measures
    There are a large number of alternative goodness of fit measures designed to overcome the limitations of the raw χ² test. Bollen (1989, 256-289) divides these into overall fit measures and incremental fit measures (how much better with one more factor?). Some of these are:
  5. Rotation
  6. Interpretation
  7. Reformulation of hypotheses and/or tests
    Summarize the discrepancies between the hypotheses and the rotated solution. Might there be a need to redesign or replace any of the tests?
  8. Cross validation or replication

III. Confirmatory Factor Analysis

  1. Number of factors
    Specify the number of factors based on exploratory analyses.
  2. Pattern of loadings: fixed vs. free parameters.
    Specify a hypothesis by constraining certain parameters in the factor matrices to be zero. The hypothesis is confirmed to the extent that the model still fits.

    Note: For a LISREL factor analysis model to be identified, it is necessary to fix at least one loading on each factor to a non-zero value (e.g., 1.0) in order to fix the measurement scale of that factor.

  3. Modification indices.
    The LISREL program calculates "modification indices" for each fixed and constrained parameter; PROC CALIS calls these "Lagrange multiplier" tests. This index is the expected decrease in χ² if a single constraint in the hypothesis is relaxed, and all estimated parameters are held fixed at their estimated values. Each modification index is a χ² with 1 df, and the parameter with the largest index will improve fit maximally. Relaxing parameters based on the modification index is only recommended when the parameter(s) freed make sense from a substantive point of view.

    Similarly, the t-values for each free parameter provide a test of the hypothesis that the parameter equals 0. PROC CALIS provides a Wald test statistic as well, which is a 1 df χ² value. Both statistics evaluate whether a restriction (setting the parameter = 0) can be imposed on the estimated model.
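Because modification indices and single-parameter Wald statistics are both referred to χ² with 1 df, their upper-tail probabilities can be computed directly from the complementary error function. The sketch below is only an illustration: the function names are hypothetical, and treating the squared t-value as the 1 df Wald statistic is the usual large-sample approximation, not something taken from LISREL or PROC CALIS output.

```python
import math


def p_value_1df(chi2_value):
    """Upper-tail probability of a chi-square statistic with 1 d.f."""
    if chi2_value <= 0:
        return 1.0
    return math.erfc(math.sqrt(chi2_value / 2.0))


def wald_from_t(t_value):
    """Large-sample 1 d.f. Wald chi-square implied by a parameter's t-value."""
    return t_value * t_value


# Hypothetical values: a modification index of 9.2 and a t-value of 1.4.
print(p_value_1df(9.2))
print(p_value_1df(wald_from_t(1.4)))
```

A small probability for a modification index suggests that freeing that parameter would improve fit, but as noted above this is only a reason to free it when the parameter makes substantive sense.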

  4. Nested hypotheses and difference in Chi Square.
    As described above under "Determining the number of factors", the χ² statistic is best regarded as a measure of badness of fit of the hypothesis. It makes sense to compare a series of hypotheses, H₁, H₂, H₃, ..., such that H₁ is the most stringent or restricted hypothesis, and H₂, H₃, ... successively relax some of the restrictions. If Hᵢ is wholly included in Hᵢ₊₁, then the difference in χ² between them can be regarded as a test of the parameters that are fixed in Hᵢ but free in Hᵢ₊₁.

    For example, if H₁ and H₂ both specify the same factor pattern, but H₁ fixes the factor correlations, Φ = I, while H₂ allows factor correlations to be free, the χ² difference is attributable to the correlations among the factors.

    H(1-2):    Φ = I     test by     Δχ² = (χ²₁ − χ²₂) on Δdf = (df₁ − df₂) d.f.
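The difference test for nested hypotheses can be sketched as follows. The χ² and d.f. values are hypothetical, the function names are made up for illustration, and the upper-tail probability uses a plain series expansion of the incomplete gamma function that is adequate for the moderate χ² values typical here (it is not meant for extreme values).

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with `df` d.f. exceeds x), via a series for the incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_difference_test(chi2_restricted, df_restricted, chi2_relaxed, df_relaxed):
    """Return (delta chi-square, delta d.f., p-value) for two nested hypotheses."""
    delta_chi2 = chi2_restricted - chi2_relaxed
    delta_df = df_restricted - df_relaxed
    if delta_df <= 0:
        raise ValueError("the restricted hypothesis must have more d.f.")
    return delta_chi2, delta_df, chi2_upper_tail(delta_chi2, delta_df)


# Hypothetical fits: orthogonal factors (restricted) vs. correlated factors (relaxed).
print(chi2_difference_test(52.3, 26, 40.1, 23))
```

With three factors, freeing the factor correlations releases three parameters, which is why the hypothetical d.f. differ by 3 in this example.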
  5. Other measures of goodness of fit.
    LISREL and PROC CALIS also give other indices which are useful in assessing the fit of a hypothesis:

    Goodness of fit index (GFI): A measure of the relative amount of variances and covariances accounted for by the model, and an adjusted goodness of fit value (AGFI), adjusted for degrees of freedom. Both measures are between 0 and 1, where 1=perfect fit. Jöreskog & Sörbom (1984) claim that, unlike χ², both the GFI and the AGFI are independent of sample size and relatively robust against departure from normality. Their distributional properties are unknown, however, so there is no significance test associated with them.

    It should be emphasized that the measures χ², GFI, and AGFI are measures of the overall fit of the model to the data and do not express the quality of the model by any other criteria. For example, it can happen that the overall fit of the model is very good, but one or more relationships in the model are poorly determined (as indicated by the squared multiple correlations), or vice versa. Furthermore, if any of the overall measures indicates that the model does not fit well, that fact does not tell what is wrong with the model. Diagnosing what part of the model is wrong can be done by inspecting the normalized residuals (which correlations are not well fit?) and/or the modification indices (which fixed parameters might be relaxed?).
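To make the degrees-of-freedom adjustment behind the AGFI concrete, the sketch below assumes the commonly cited form AGFI = 1 − [p(p + 1) / (2 × d.f.)] × (1 − GFI), where p is the number of observed variables. The function name and the input values are hypothetical; the GFI itself would be taken from the program output.

```python
def adjusted_gfi(gfi, p, df):
    """AGFI from the GFI, the number of observed variables p, and the model d.f.

    Assumes AGFI = 1 - [p(p + 1) / (2 * df)] * (1 - GFI).
    """
    if df <= 0:
        raise ValueError("model degrees of freedom must be positive")
    n_moments = p * (p + 1) / 2  # distinct variances and covariances
    return 1.0 - (n_moments / df) * (1.0 - gfi)


# Hypothetical model: 9 observed variables, 24 d.f., GFI of 0.95.
print(adjusted_gfi(0.95, p=9, df=24))
```

The adjustment penalizes models that achieve a given GFI by using up more of the available degrees of freedom, so the AGFI is never larger than the GFI.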

  6. Absolute vs. relative measures. There are now many new measures of goodness-of-fit, but they may be classified as

    Of all available software, only AMOS provides these model comparison statistics automatically when you fit a series of models. My CALISCMP macro provides similar model comparison statistics for a set of models fit using PROC CALIS.

Good luck!

Further Reading

© 1995 Michael Friendly
