## Overview

This list builds on the work on the Principal Components Analysis (PCA) and Exploratory Factor Analysis (EFA) pages on this site. It is intended as a guide for researchers who are considering PCA or EFA as a data reduction technique. The resources outlined below are intended to complement the existing resources on the technique-specific webpages.

## Description

**Theoretical/Statistical background and comparisons**

These two publications compare the two methods and present opposing views of whether EFA and PCA should be used on the same dataset.

Principal Components Analysis vs. Exploratory Factor Analysis. D. Suhr. SAS Working Paper 203-30: http://www2.sas.com/proceedings/sugi30/203-30.pdf

“Determine the appropriate statistical analysis to answer research questions a priori…It is inappropriate to run PCA and EFA with your data. PCA includes correlated variables with the purpose of reducing the numbers of variables and explaining the same amount of variance with fewer variables (principal components). EFA estimates factors, underlying constructs that cannot be measured directly.”

Jolliffe IT, Morgan BJ. Principal component analysis and exploratory factor analysis. Statistical methods in medical research 1992;1:69-95.

“Despite their different formulations and objectives, it can be informative to look at the results of both techniques on the same data set. Each technique gives different insights into the data structure, with PCA concentrating on explaining the diagonal elements, and factor analysis the off-diagonal elements, of the covariance matrix, and both may be useful.”

(Video) Exploratory Factor Analysis (Principal Axis Factoring vs. Principal Components Analysis) in SPSS

There are a number of other books and resources cited on the Advanced Epidemiology page for each method. Many resources cover both techniques but don’t necessarily compare and contrast the two. The online resources at the end of this handout provide introductory material and comparison of the two methods.

The overall goal of this guide is to provide resources that help a researcher navigate the junctures of the decision tree below, by sharing literature that compares the use of PCA, EFA, and other data reduction techniques.

## Readings

### Methodological Articles

The papers below review the use of PCA, EFA, and other data reduction techniques in the public health and health literature.

This paper is more theoretical. It reviews the underlying theory of PCA and EFA (and the connection between them), along with structural equation models and multiple indicators multiple causes (MIMIC) models, using well-being and poverty indices as a case study.

Krishnakumar J, Nagar AL. On exact statistical properties of multidimensional indices based on principal components, factor analysis, MIMIC and structural equation models. Social Indicators Research 2008;86:481-96.

Systematic review of major depressive disorder classification systems and the statistical methods used to identify symptom dimensions or latent classes. Based on 20 articles with 34 analyses, the authors found an equal number of factor analyses and PCAs, often conducted with the same scales and measures or on the same sample.

van Loo HM, de Jonge P, Romeijn JW, Kessler RC, Schoevers RA. Data-driven subtypes of major depressive disorder: a systematic review. BMC Med 2012;10:156.

This paper reviews 47 studies that used PCA, comparing their methods and describing common challenges and mistakes when using PCA for composite health measures. It suggests repeating the analysis across samples and using complementary methods such as factor analysis.

Coste J, Bouee S, Ecosse E, Leplege A, Pouchot J. Methodological issues in determining the dimensionality of composite health measures using principal component analysis: case illustration and suggestions for practice. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation 2005;14:641-54.

This paper outlines common errors in the use of EFA, based on a review of 60 studies in psychology journals, and provides useful suggestions for improving how EFA is used and reported in journals.

Henson RK, Roberts JK. Use of Exploratory Factor Analysis in Published Research: Common Errors and Some Comment on Improved Practice. Educational and Psychological Measurement 2006;66:393-416.

This paper reviews the use of EFA and the key decisions made when conducting it, based on 28 papers from high-impact nursing journals. It found that PCA was used more often than EFA (61% vs. 39%), although no paper explained why PCA was chosen over EFA. The paper outlines practical recommendations for replacing flawed and out-of-date “rules of thumb” for PCA and EFA use.

Gaskin CJ, Happell B. On exploratory factor analysis: A review of recent evidence, an assessment of current practice, and recommendations for future use. International Journal of Nursing Studies 2014;51:511-21.

### Application Articles

**PCA**

Nutritional epidemiology comparison of reduced rank regression, partial least-squares regression, and PCA.

DiBello JR, Kraft P, McGarvey ST, Goldberg R, Campos H, Baylin A. Comparison of 3 Methods for Identifying Dietary Patterns Associated With Risk of Disease. American journal of epidemiology 2008;168:1433-43.

Social epidemiology example. The authors concluded that using a single variable might work as well as constructing principal components.

Hurtado D, Kawachi I, Sudarsky J. Social capital and self-rated health in Colombia: The good, the bad and the ugly. Social science & medicine 2011;72:584-90.

Built environment research: development of a neighborhood deprivation index using PCA.

Messer LC, Laraia BA, Kaufman JS, et al. The Development of a Standardized Neighborhood Deprivation Index. Journal of urban health : Bulletin of the New York Academy of Medicine 2006;83:1041-62.

**EFA**

Nutritional epidemiology study of dietary patterns and their association with laryngeal cancer. It examines whether dietary patterns explain the determinants of risk better than the individual components of those patterns.

De Stefani E, Boffetta P, Ronco AL, Deneo-Pellegrini H, Acosta G, Mendilaharsu M. Dietary patterns and risk of laryngeal cancer: an exploratory factor analysis in Uruguayan men. International journal of cancer Journal international du cancer 2007;121:1086-91.

Nutritional epidemiology study comparing two dietary patterns generated through EFA (“traditional cooking” and “fruits and vegetables” pattern) with a hypothesis-driven Dietary Approaches to Stop Hypertension (DASH) pattern. No significant trends were found when comparing all three patterns, though women in Q3 of DASH were at lower risk than those in Q1.

Schulze MB, Hoffmann K, Kroke A, Boeing H. Risk of hypertension among women in the EPIC-Potsdam Study: comparison of relative risk estimates for exploratory and hypothesis-oriented dietary patterns. American journal of epidemiology 2003;158:365-73.

Social epidemiology paper using PCA and EFA synonymously: the authors write that they “conducted an exploratory factor analysis using principal components analysis.” EFA yielded two factors that reflected Perceived and Enacted Sexual Stigma among lesbian, bisexual and queer (LBQ) women, based on items on a sexual stigma scale.

Logie CH, Earnshaw V. Adapting and Validating a Scale to Measure Sexual Stigma among Lesbian, Bisexual and Queer Women. PloS one 2015;10:e0116198.

Built environment paper exploring environmental contributors to drug abuse using 32 variables for census tracts. Four factors (representing 55.8% of the variance) were identified. The authors argue that EFA can be more policy-relevant because it helps distinguish the influence of economic well-being, violence, and social disorganization (three of the four factors).

Bell DC, Carlson JW, Richard AJ. The social ecology of drug use: a factor analysis of an urban environment. Subst Use Misuse 1998;33:2201-17.

## Courses

Short course on PCA and EFA by Jose Manuel Roche at the Oxford Poverty and Human Development Initiative, with lecture video, slides, exercise files, a reading list, and links to other resources. Available here: http://www.ophi.org.uk/principal-components-analysis-and-factor-analysis-2010

Two introductory lessons on PCA and EFA, from Mike Clark, PhD, at the University of North Texas and Elizabeth Root at the University of Colorado. They explain how the two methods differ in the variance they analyze, and they also give a useful explanation of factor analysis scales along with guidance on which variables to include in an analysis:

http://www.unt.edu/rss/class/mike/6810/Principal%20Components%20Analysis.pdf and

http://www.colorado.edu/geography/class_homepages/geog_4023_s11/Lecture18_PCA.pdf

A resource page on EFA and PCA from the University of Wisconsin psychology department: http://psych.wisc.edu/henriques/pca.html

Five videos (2 hours) of introduction and tutorial on EFA and PCA from Econometrics Academy (by Ani Katchova). Note that the example conducts EFA and PCA on the same dataset: https://www.youtube.com/playlist?list=PLRW9kMvtNZOjaStLK9ldf_Yc8MB6TkCUx

More resources from Econometrics Academy are available here: https://sites.google.com/site/econometricsacademy/

Theoretical lecture on principal components analysis from “Opinionated Lessons in Statistics” by Bill Press, University of Texas. The main caution concerns over-interpreting the meaning of components: https://www.youtube.com/watch?v=frWqIUpIxLg&index=43&list=PLUAHeOPjkJseXJKbuk9-hlOfZU9Wd6pS0

A written tutorial on Principal Components Analysis. Lindsay I Smith, February 26, 2002. Accessed March 15, 2015. Available at: https://courses.cs.washington.edu/courses/cse528/09sp/pca.pdf

Brief written tutorial on Exploratory (and Confirmatory) Factor Analysis from Jamie DeCoster at the University of Alabama. “Overview of Factor Analysis.” Accessed March 16, 2005. Available at: http://stat-help.com/factor.pdf

## FAQs

**Is principal component analysis the same as exploratory factor analysis?**

PCA and EFA have different goals: **PCA is a technique for reducing the dimensionality of one's data, whereas EFA is a technique for identifying and measuring variables that cannot be measured directly** (i.e., latent variables or factors).

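**Is exploratory factor analysis necessary?**

Exploratory factor analysis is **useful for determining the underlying constructs for a set of measured variables**, particularly when there is no prior hypothesis about how many constructs there are or which variables measure them.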

**Is principal component analysis difficult?**

PCA is straightforward to compute, but **the resulting components can be difficult to interpret**.

Each component is a weighted combination of all the original variables, so after computing the components it is often difficult to determine which characteristics in the dataset are the most important. PCA also only summarizes the patterns present in the data it is given: it will not pick up even a basic invariance unless the training data clearly exhibit it.

**When would you run an exploratory factor analysis?**

Exploratory factor analysis (EFA) is generally used to discover the factor structure of a measure and to examine its internal reliability. EFA is often recommended **when researchers have no hypotheses about the nature of the underlying factor structure of their measure**.

**How are principal components analysis and EFA different?**

**In PCA, a retained component accounts for both specific variance and common variance**, whereas EFA takes into account only common variance. Here, each variable's variance is split into common variance (shared with the other variables), specific variance (systematic variance unique to that variable), and error variance.

**What is the difference between PCA and PAF?**

The key difference between principal axis factoring (PAF) and PCA is that **in PAF, the 1s on the diagonal of the correlation matrix are replaced with estimates of the communalities** before the matrix is decomposed.
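The contrast between PCA and PAF can be sketched in a few lines of Python. This is a minimal, illustrative implementation of *iterated* principal axis factoring (a common textbook variant that starts from squared multiple correlations as communality estimates and refines them until they stop changing), not the algorithm of any particular package. It assumes NumPy is available; the function names and the one-factor correlation matrix are hypothetical.

```python
import numpy as np


def pca_loadings(R, n_components):
    """PCA: decompose the correlation matrix R with 1s left on the diagonal."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def paf_loadings(R, n_factors, max_iter=500, tol=1e-8):
    """Iterated PAF: replace the diagonal 1s with communality estimates."""
    # Initial communalities: squared multiple correlations, 1 - 1 / diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # the only change from PCA
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)     # updated communalities
        if np.max(np.abs(new_h2 - h2)) < tol:
            break
        h2 = new_h2
    return loadings


# Hypothetical one-factor correlation matrix built from known loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

paf = paf_loadings(R, 1)
pca = pca_loadings(R, 1)
print("PAF:", np.round(np.abs(paf[:, 0]), 3))   # close to the true loadings
print("PCA:", np.round(np.abs(pca[:, 0]), 3))   # larger: unique variance is included
```

With these exact one-factor data, PAF recovers the loadings used to build the matrix, while the PCA loadings come out larger because the 1s on the diagonal fold each variable's unique variance into the component. Signs are taken as absolute values because an eigenvector's sign is arbitrary.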

**Can I do CFA without EFA?**

If you have an a priori expectation of what the structure should be for a given measure, then, **yes: you can (and should) go with CFA (confirmatory factor analysis)**.

**Do you need EFA before CFA?**

The choice between EFA and CFA is not arbitrary; it depends on the constructs you are using in your study. **If your model contains constructs that have not been well tested for reliability and validity, you must run an EFA before the CFA**.

**What is the minimum sample size for factor analysis?**

Exploratory factor analysis (EFA) is generally regarded as a technique for large sample sizes (N), with **N = 50** as a reasonable absolute minimum.

**Does PCA improve accuracy?**

PCA is very useful for speeding up computation by reducing the dimensionality of the data. In addition, **when the data are high-dimensional and the variables are highly correlated with one another, PCA can improve the accuracy of a classification model**.

**What type of data is good for PCA?**

PCA is most useful for **data sets with three or more dimensions**, because as the number of dimensions grows it becomes increasingly difficult to interpret the resulting cloud of data points directly. PCA is applied to data sets of numeric variables, and it helps produce better visualizations of high-dimensional data.

**What is the goal of exploratory factor analysis?**

In exploratory factor analysis (EFA), each observed variable is potentially a measure of every factor, and the goal is **to determine which relationships between observed variables and factors are strongest**.

**What are the assumptions of exploratory factor analysis?**

**The data should not contain outliers**, the sample size should be greater than the number of factors, and, because factor analysis is an interdependence technique, there should be no perfect multicollinearity among the variables.

**How do you report exploratory factor analysis results?**

Usually, you summarize the results of the EFA in one table that contains all items used in the EFA, their factor loadings, and the names of the factors. The notes to the table then state the extraction method, the rotation method, and the cut-off value used to extract factors.

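**When would you use PCA over EFA?**

Use PCA when the goal is data reduction and EFA when the goal is to identify latent constructs. If the communalities are large (close to 1.00), the two methods are likely to give similar results. Their assumptions also differ: PCA assumes the absence of outliers in the data, and EFA assumes a multivariate normal distribution when the maximum likelihood extraction method is used. Finally, PCA decomposes a correlation matrix with ones on the diagonal, whereas common factor methods such as PAF place communality estimates there.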

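**What is the difference between ANOVA and PCA?**

ANOVA–PCA combines the two: ANOVA is used to split the data matrix into effect matrices (one for each factor) and a residual matrix, and PCA is then applied. ANOVA–PCA differs from related methods in the way the effect matrices are analyzed: **it adds the residual matrix to the effect matrices before PCA**. Score plots obtained with ANOVA–PCA therefore immediately show the grouping of data points for the different levels of the independent variables.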
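A minimal sketch of ANOVA–PCA for a balanced two-factor design may make these steps concrete. It assumes NumPy, omits the interaction term, and uses hypothetical function names and simulated data; published implementations may differ in details such as how interactions and unbalanced designs are handled.

```python
import numpy as np


def level_means(M, labels):
    """Replace each row of M by the mean of the rows sharing its factor level."""
    out = np.zeros_like(M)
    for level in np.unique(labels):
        mask = labels == level
        out[mask] = M[mask].mean(axis=0)
    return out


def anova_pca(X, factor_a, factor_b):
    """Balanced two-factor ANOVA-PCA sketch (main effects only, no interaction)."""
    Xc = X - X.mean(axis=0)                            # remove the grand mean
    effect_a = level_means(Xc, factor_a)               # effect matrix for factor A
    effect_b = level_means(Xc - effect_a, factor_b)    # effect matrix for factor B
    residual = Xc - effect_a - effect_b                # residual matrix
    results = {}
    for name, effect in (("A", effect_a), ("B", effect_b)):
        M = effect + residual                          # add residuals back before PCA
        M = M - M.mean(axis=0)
        _, _, Vt = np.linalg.svd(M, full_matrices=False)
        results[name] = (M @ Vt[0], Vt[0])            # first-PC scores and loadings
    return results


# Simulated 2 x 2 design, 25 replicates per cell, 3 measured variables.
rng = np.random.default_rng(2)
a = np.repeat([0, 1], 50)
b = np.tile(np.repeat([0, 1], 25), 2)
X = rng.normal(0.0, 0.3, size=(100, 3))
X[:, 0] += 3.0 * a     # factor A shifts variable 0
X[:, 1] += 5.0 * b     # factor B shifts variable 1

res = anova_pca(X, a, b)
scores_a, loadings_a = res["A"]
scores_b, loadings_b = res["B"]
print("A loadings:", np.round(loadings_a, 2))
print("B loadings:", np.round(loadings_b, 2))
```

Because factor B's effect is removed before PCA on factor A's matrix (and vice versa), each first component points at the variable its factor shifts, and the scores separate that factor's levels.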

**How much variance should PCA explain?**

Some criteria say that the total variance explained by all retained components should be **between 70% and 80%**.
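Applying a cumulative-variance cut-off takes only a few lines. This is a minimal sketch assuming NumPy; the function name and the eigenvalues (imagined as coming from the correlation matrix of six standardized variables) are hypothetical.

```python
import numpy as np


def n_components_for_variance(eigenvalues, threshold=0.70):
    """Smallest number of components whose cumulative share of variance reaches threshold."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(ev) / ev.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1   # first index with cumulative >= threshold
    return min(k, len(ev)), cumulative


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6).
eigenvalues = [2.5, 1.2, 0.9, 0.6, 0.5, 0.3]
k70, cumulative = n_components_for_variance(eigenvalues, 0.70)
k80, _ = n_components_for_variance(eigenvalues, 0.80)
print(np.round(cumulative, 3))
print(k70, k80)   # → 3 4
```

Here a 70% cut-off keeps three components and an 80% cut-off keeps four, which shows how sensitive the choice can be to where in the 70% to 80% range the threshold is set.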

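**Do you need to standardize data for PCA?**

**Usually, yes: standardize the data before performing PCA when the variables are measured on different scales.** PCA calculates a new projection of the data set, and the new axes are based on the variances (standard deviations) of the variables, so a variable with a much larger variance, for example because of its units, will dominate the first components. Running PCA on the correlation matrix is equivalent to standardizing first.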
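The effect of units can be sketched with simulated data. This assumes NumPy; the variables (height in metres, weight in grams), the helper name, and all numbers are made up for illustration.

```python
import numpy as np


def first_pc(X):
    """Unit-length direction of largest variance (first principal axis)."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]


rng = np.random.default_rng(0)
n = 500
height_m = rng.normal(1.70, 0.10, n)                                    # metres
weight_g = 1000 * (70 + 60 * (height_m - 1.70) + rng.normal(0, 8, n))   # grams

X = np.column_stack([height_m, weight_g])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized (z-scores)

raw_axis = first_pc(X)   # dominated by weight_g, purely because of its units
std_axis = first_pc(Z)   # both variables weighted equally in magnitude
print("raw:", np.round(raw_axis, 3))
print("standardized:", np.round(std_axis, 3))
```

On the raw data the first axis is essentially the weight column alone; after standardizing, the two variables contribute equally, which is what a correlation-matrix PCA of two variables always gives.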

**What is the common goal of principal component analysis and factor analysis?**

Principal components analysis and factor analysis are common methods used to analyze groups of variables for the purpose of **reducing them into subsets represented by latent constructs** (Bartholomew, 1984; Grimm & Yarnold, 1995).

**Is principal axis factoring the same as factor analysis?**

In a very broad sense, **“common factor” analysis (or “principal axis factoring”) is used when we want to identify the latent variables that underlie a set of variables**, while “principal components” analysis is used to reduce a set of variables to a smaller set of components.

**Should EFA and CFA be applied to the same sample?**

If a researcher decides that EFA is the best approach for analyzing the data, the results from the EFA should ideally be confirmed with a CFA before using the measurement instrument for research. This confirmation **should never be conducted on the same sample as the initial EFA**.

**What's the difference between EFA and CFA?**

CFA and EFA are both methods of factor analysis. It is said that **EFA extracts a factor structure from the data whereas CFA is used to test if a factor structure fits the data** (or in other words to test a hypothesis).

**Can you do CFA in SPSS?**

**Not in base SPSS.** To run a CFA you need to purchase Analysis of Moment Structures (AMOS), or use another program such as LISREL or Mplus. If you do not have AMOS, LISREL, or Mplus, you can use R (free of charge), either on its own or integrated with SPSS; SPSS 23 connects to R 3.1.

**What do I do after EFA?**

You may consider **running a CFA** after the EFA. You can take the analysis further by using some of the techniques of structural equation modelling such as multi-group analysis or multilevel analysis.

**Is EFA a form of SEM?**

EFA is a data-driven approach which is generally used as an investigative technique to identify relationships among variables. SEM is an a priori theory approach which is most often used to determine the extent to which an already established theory about relationships among variables is supported by empirical data.

**When should I use CFA?**

In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social research. It is used **to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct (or factor)**.

**Does sample size matter in factor analysis?**

**The factor analysis literature provides a wide range of rough guidelines regarding an adequate sample size**. Most of these guidelines consistently advocate for large samples (say, a sample size of at least 200) to obtain high-quality factor analysis solutions.

**How many variables are needed for factor analysis?**

Is there a minimum number of input variables that an exploratory factor analysis requires? Mathematically, two. For practical and statistical reasons, **at least three**, because a common rule of thumb is that each factor should have substantial loadings on at least three variables.

**Does sample size affect factor analysis?**

Yes, although statistical power is only part of the picture. Power calculations determine how large a sample is needed for a confidence interval of a given width, or a p-value, to correspond to a scientifically meaningful effect size. Power is not the only sample-size issue, however, and not every statistical analysis uses p-values.

**When should you not use PCA?**

PCA should be used mainly for variables that are strongly correlated. If the relationships between variables are weak, PCA does not reduce the data well. Check the correlation matrix: in general, **if most of the correlation coefficients are smaller than 0.3**, PCA is unlikely to help.
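This rule of thumb is easy to check from the correlation matrix. The sketch below assumes NumPy and uses a hypothetical helper and simulated data; the 0.3 cut-off is the one quoted above, and how large a share counts as "most" is left to judgment.

```python
import numpy as np


def share_of_weak_correlations(X, cutoff=0.3):
    """Fraction of distinct variable pairs whose |correlation| is below cutoff."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    upper = R[np.triu_indices_from(R, k=1)]        # each pair counted once
    return float(np.mean(np.abs(upper) < cutoff))


rng = np.random.default_rng(4)
n = 1000
unrelated = rng.normal(size=(n, 5))                 # no shared structure
common = rng.normal(size=(n, 1))
related = common + 0.5 * rng.normal(size=(n, 5))   # one shared source

print(share_of_weak_correlations(unrelated))   # all pairs weak: PCA unlikely to help
print(share_of_weak_correlations(related))     # strong pairs: good candidate for PCA
```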

**On what type of data does PCA fail?**

When a data set is not linearly distributed, but is instead arranged along non-orthogonal axes or is well described by a geometric parameter (such as an angle), PCA can fail to represent the structure and to recover the **original data from the projected variables**.
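A small illustration of this failure mode, assuming NumPy and using made-up data: points on a circle are generated by a single parameter, the angle, yet PCA splits the variance evenly across two components, so keeping one component cannot recover the points.

```python
import numpy as np

# Points on a circle: one underlying parameter (the angle t) generates both columns.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])

eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
share_first = eigvals.max() / eigvals.sum()
print(round(share_first, 3))   # about 0.5: no single linear axis summarizes the data
```

A nonlinear method that can learn the angle would need only one dimension here, whereas linear PCA needs both.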

**Why does PCA reduce accuracy?**

Because the discarded components are dropped, **PCA can lose information that is important for classification** (for example, spatial information), so classification accuracy can decrease.

**How many principal components should I use?**

If the sole purpose of PCA is data visualization, the best number of components is **2 or 3**. If the goal is to reduce the size of the dataset, the best number of principal components is much smaller than the number of variables in the original dataset.

**Can PCA handle missing values?**

Input to PCA can be any set of numerical variables; however, the variables should be scaled relative to each other, and **traditional PCA will not accept any missing data points**. Each data point is scored by how well it fits a principal component (PC), based on a measure of variance within the dataset.

**How do you select the best number of principal components for the dataset?**

A widely applied approach is to decide on the number of principal components by **examining a scree plot**: eyeball the plot and look for the point at which the proportion of variance explained by each subsequent principal component drops off. This point is often called the elbow of the scree plot.
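Reading a scree plot is a visual judgment, but the numbers behind it are easy to tabulate. The sketch below assumes NumPy and uses hypothetical eigenvalues and helper names. It prints a crude text scree plot and uses the largest drop between consecutive eigenvalues as one simple, automatable stand-in for the elbow; that rule is an illustrative heuristic, not a standard criterion, and it can disagree with a careful visual reading.

```python
import numpy as np


def scree_table(eigenvalues):
    """Eigenvalues in decreasing order with their share of total variance."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return ev, ev / ev.sum()


def largest_drop_elbow(eigenvalues):
    """Crude stand-in for eyeballing: keep the components before the biggest drop."""
    ev, _ = scree_table(eigenvalues)
    drops = ev[:-1] - ev[1:]            # drop from each component to the next
    return int(np.argmax(drops)) + 1


eigenvalues = [2.6, 2.0, 0.5, 0.4, 0.3, 0.2]   # hypothetical, sum to 6
ev, share = scree_table(eigenvalues)
for i, (value, p) in enumerate(zip(ev, share), start=1):
    print(f"PC{i}: eigenvalue {value:.2f}  {'#' * int(round(40 * p))}")   # text scree plot
print("components before the elbow:", largest_drop_elbow(eigenvalues))
```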

**What is the benefit of using PCA?**

PCA can help **improve computational performance at a very low cost in model accuracy**. Other benefits include reducing noise in the data, feature selection (to a certain extent), and producing uncorrelated features from the data (uncorrelated, though not necessarily statistically independent).
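The claim that components are uncorrelated can be checked numerically. In this minimal sketch, which assumes NumPy and uses simulated data, the scores on the principal axes have a covariance matrix that is diagonal up to rounding, even though the original variables are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
common = rng.normal(size=(n, 1))
X = common + 0.3 * rng.normal(size=(n, 4))       # four strongly correlated variables

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs                             # project onto every principal axis

score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov[~np.eye(4, dtype=bool)]
print(np.round(np.corrcoef(X, rowvar=False), 2))  # large off-diagonal correlations
print(np.max(np.abs(off_diagonal)))               # essentially zero
```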

**What is one disadvantage of using PCA even when the dataset is a good candidate for it?**

**The independent variables become less interpretable.** PCA reduces your features to a smaller number of components, and each component is a linear combination of the original features, which makes it harder to read and interpret.

**Is principal components analysis exploratory?**

Principal components analysis (PCA, for short) is a variable-reduction technique that **shares many similarities with exploratory factor analysis**.

**What is the difference between CFA and EFA?**

**EFA is used when it is not known how many factors underlie the items or which items measure which factors, while CFA is used when there is a strong theory about the structure**.

**How is exploratory factor analysis subjective?**

Factor analysis is generally an exploratory/descriptive method that **requires many subjective judgments**. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

**What type of factor analysis should I use?**

**Exploratory Factor Analysis** should be used when you need to develop a hypothesis about a relationship between variables. Confirmatory Factor Analysis should be used to test a hypothesis about the relationship between variables.

**What are the assumptions of principal component analysis?**

PCA assumes linearity: the variables are related to one another in a linear way, so the dataset can be summarized by linear combinations of the variables. It also assumes that the variables are actually related to one another; otherwise there is little shared structure for the components to capture.

**How much variance should be explained in EFA?**

The variance explained by a factor solution cannot exceed 100%, and as a rule of thumb **it should not be less than 60%**. If the variance explained is as low as 35%, the data may not be useful, and you may need to revisit the measures and even the data collection process.
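For standardized variables and orthogonal factors, the proportion of variance explained is the sum of the squared loadings divided by the number of variables. The sketch below, assuming NumPy and a made-up loading matrix, shows the arithmetic; with oblique rotations the factors' contributions overlap and this simple sum no longer applies.

```python
import numpy as np

# Hypothetical loadings for 6 standardized items on 2 orthogonal factors.
loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.75, 0.00],
    [0.10, 0.80],
    [0.20, 0.70],
    [0.00, 0.60],
])

n_items = loadings.shape[0]
per_factor = (loadings ** 2).sum(axis=0) / n_items   # each factor's share of total variance
total = per_factor.sum()
print(np.round(per_factor, 3), round(total, 3))
```

In this made-up example the two factors together explain about 55% of the variance, just under the 60% rule of thumb.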

**How is PCA used for dimensionality reduction?**

PCA finds a lower-dimensional surface onto which to project the high-dimensional data. It does this **by finding the directions along which the data vary the most**; projecting onto the first few of these directions keeps as much of the variance as possible while reducing the number of dimensions. PCA does not use class labels, so the high-variance directions are not guaranteed to separate classes well.
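A minimal sketch of PCA as projection, assuming NumPy. It uses the singular value decomposition of the centered data (one standard way to compute principal components) and simulated data that lie close to a plane in three dimensions; the function name is hypothetical.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                    # PCA works on centered data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                              # rows: principal directions
    scores = Xc @ components.T                       # k-dimensional coordinates
    explained = (s ** 2 / np.sum(s ** 2))[:k]        # share of variance per direction
    return scores, components, explained, mean


rng = np.random.default_rng(5)
latent = rng.normal(size=(200, 2))                   # 2 underlying dimensions
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
X = latent @ A + 0.01 * rng.normal(size=(200, 3))    # 3-D data near a plane

scores, components, explained, mean = pca_project(X, 2)
X_hat = scores @ components + mean                   # map back to 3-D
print(scores.shape, np.round(explained.sum(), 4))
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```

Two components keep nearly all of the variance here, so the 3-D points can be rebuilt from 2-D scores with only a tiny error.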

**What is exploratory factor analysis in research?**

Exploratory factor analysis (EFA) is **a classical formal measurement model that is used when both observed and latent variables are assumed to be measured at the interval level**. A characteristic of EFA is that the observed variables are first standardized (mean of zero and standard deviation of 1).

**What is PCA's approach in factor analysis?**

PCA's approach to data reduction is to create one or more index variables from a larger set of measured variables. It does this using a linear combination (basically a weighted average) of a set of variables. The created index variables are called components.
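A minimal sketch of building an index variable from the first principal component, similar in spirit to PCA-based indices such as the neighborhood deprivation index cited in the Readings, though not that paper's exact procedure. It assumes NumPy, standardizes the variables, and uses simulated indicators and a hypothetical function name. Because an eigenvector's sign is arbitrary, the weights are flipped so that higher index values mean higher values of the indicators.

```python
import numpy as np


def first_component_index(X):
    """Index = standardized variables weighted by the first principal component."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    weights = eigvecs[:, np.argmax(eigvals)]
    if weights.sum() < 0:          # eigenvector sign is arbitrary; orient it
        weights = -weights
    return Z @ weights, weights


rng = np.random.default_rng(3)
latent = rng.normal(size=400)                          # unobserved "true" level
X = latent[:, None] * np.array([1.0, 0.8, 0.6]) + 0.5 * rng.normal(size=(400, 3))

index, weights = first_component_index(X)
print("weights:", np.round(weights, 3))
print("corr(index, latent):", round(np.corrcoef(index, latent)[0, 1], 3))
```

The weights play the role of the "weighted average" described above; with positively correlated indicators they all come out positive, and the resulting index tracks the simulated underlying level closely.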