PromptCloud

Posted on • Originally published at promptcloud.com

Exploratory Factor Analysis in R

Exploratory Factor Analysis (EFA) is a powerful statistical method used in data analysis for uncovering the underlying structure of a relatively large set of variables. It is particularly valuable in situations where the relationships between variables are not entirely known or when data analysts seek to identify underlying latent factors that explain observed patterns in data.

At its core, EFA helps in simplifying complex data sets by reducing a large number of variables into a smaller set of underlying factors, without significant loss of information. This technique is instrumental in various fields, including psychology, marketing, finance, and social sciences, where it aids in identifying patterns and relationships that are not immediately apparent.

The importance of EFA lies in its ability to provide insights into the underlying mechanisms or constructs that influence data. For example, in psychology, EFA can be used to identify underlying personality traits from a set of observed behaviors. In customer satisfaction surveys, it helps in pinpointing key factors that drive consumer perceptions and decisions.

Moreover, EFA can strengthen the validity and reliability of research findings. By identifying the underlying factor structure, it helps ensure that subsequent analyses, like regression or hypothesis testing, are based on relevant and concise data constructs. This not only streamlines the data analysis process but also contributes to more accurate and interpretable results.

In summary, Exploratory Factor Analysis is an essential tool in the data analyst’s arsenal, offering a pathway to decipher complex data sets and revealing the hidden structures that inform and guide practical decision-making. Its role in simplifying data and uncovering latent variables makes it a cornerstone technique in the realm of data analysis and interpretation.

What is exploratory factor analysis in R?

Exploratory Factor Analysis (EFA), often referred to simply as factor analysis, is a statistical technique used to identify the latent relational structure among a set of variables and reduce it to a smaller number of variables. In practice, this means that much of the variance in a large number of observed variables can be described by a few summary variables, i.e., factors.

Basic Concept and Mathematical Foundation:

  • The fundamental idea behind EFA is that there are latent factors that cannot be directly measured but are represented by the observed variables.
  • Mathematically, EFA models the observed variables as linear combinations of potential factors plus error terms. This model is represented as: X = LF + E, where X is the matrix of observed variables, L is the matrix of loadings (which shows the relationship between variables and factors), F is the matrix of factors, and E is the error term.
  • Factor loadings, which are part of the output of EFA, indicate the degree to which each variable is associated with each factor. High loadings suggest that the variable has a strong association with the factor.
  • The process involves extracting factors from the data and then rotating them to achieve a more interpretable structure. Common rotation methods include Varimax and Oblimin.
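The model above can be illustrated with a small simulation. This is a minimal sketch in base R under stated assumptions: the loadings, sample size, and variable count are hypothetical, and because respondents are stored as rows, the data are generated as F multiplied by the transpose of L, plus E (the row-wise form of X = LF + E). If the model holds, the correlations in the simulated data should be close to those implied by the loadings and the error variances.

```r
# Simulate data from the common factor model (all values are hypothetical).
set.seed(42)
n <- 5000                                   # number of simulated respondents

# Loadings matrix L: 6 observed variables, 2 latent factors (filled column by column)
L <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
              0.0, 0.0, 0.0, 0.8, 0.7, 0.6),
            nrow = 6, ncol = 2)

# Error variances chosen so that every observed variable has variance 1
psi <- 1 - rowSums(L^2)

Fmat <- matrix(rnorm(n * 2), nrow = n)                   # uncorrelated factor scores
E <- matrix(rnorm(n * 6), nrow = n) %*% diag(sqrt(psi))  # error terms
X <- Fmat %*% t(L) + E                                   # observed data, one row per respondent

# Correlation matrix implied by the model versus the one observed in the sample
implied <- L %*% t(L) + diag(psi)
max_gap <- max(abs(cor(X) - implied))
print(round(max_gap, 3))
```

The remaining gap reflects sampling error only, so it shrinks as n grows.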

Differences from Confirmatory Factor Analysis (CFA):

  • EFA differs from Confirmatory Factor Analysis (CFA) in its purpose and application. While EFA is exploratory in nature, used when the structure of the data is unknown, CFA is confirmatory, used to test hypotheses or theories about the structure of the data.
  • In EFA, the number and nature of the factors are not predefined; the analysis reveals them. In contrast, CFA requires a predefined hypothesis about the number of factors and the pattern of loadings based on theory or previous studies.
  • EFA is more flexible and is often used in the initial stages of research to explore the possible underlying structures. CFA, on the other hand, is used for model testing and validation, where a specific model or theory about the data structure is being tested against the observed data.

Exploratory Factor Analysis is a powerful tool for identifying the underlying dimensions in a set of data, particularly when the relationships between variables are not well understood. It serves as a foundational step in many statistical analyses, paving the way for more detailed and hypothesis-driven techniques like Confirmatory Factor Analysis.

Here is an overview of EFA in R.

[Image: overview of EFA in R]

As the name suggests, EFA is exploratory in nature – we don’t really know the latent variables, and the steps are repeated until we arrive at a lower number of factors. In this tutorial, we’ll look at EFA using R. Now, let’s first get the basic idea of the dataset.

1. The Data

This dataset contains 90 responses for 14 different variables that customers consider while purchasing a car. The survey questions were framed using a 5-point Likert scale with 1 being very low and 5 being very high. The variables were the following:

  • Price
  • Safety
  • Exterior looks
  • Space and comfort
  • Technology
  • After-sales service
  • Resale value
  • Fuel type
  • Fuel efficiency
  • Color
  • Maintenance
  • Test drive
  • Product reviews
  • Testimonials

Download the coded dataset now.

2. Importing the Data

Now we’ll read the dataset present in CSV format into R and store it as a variable.

[code language="r"]
data <- read.csv(file.choose(), header = TRUE)
[/code]

It’ll open a window to choose the CSV file and the header option will make sure that the first row of the file is considered as the header. Enter the following to see the first several rows of the data frame and confirm that the data has been stored correctly.

[code language="r"]
head(data)
[/code]
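Beyond head(), a few basic checks can confirm that the import behaved as expected. The following is a minimal sketch, assuming a data frame of 90 responses on 14 items scored from 1 to 5. The object name survey and its randomly generated values are hypothetical stand-ins, not the article's dataset, so the data carry no factor structure.

```r
# Hypothetical stand-in for the survey data: 90 respondents, 14 items scored 1 to 5
set.seed(1)
survey <- as.data.frame(matrix(sample(1:5, 90 * 14, replace = TRUE), nrow = 90))

# Basic checks worth running before any factor analysis
n_missing   <- sum(is.na(survey))                            # count of missing responses
value_range <- range(as.matrix(survey))                      # should stay within 1 and 5
all_numeric <- all(vapply(survey, is.numeric, logical(1)))   # every column is numeric

print(dim(survey))
print(value_range)
```

On the real data, the same checks would be run on the data object created by read.csv().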

3. Package Installation

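Now we’ll install the packages required for the rest of the analysis: psych and GPArotation. In the code below, we call install.packages() to install them and library() to load them into the session, since functions such as fa.parallel() and fa() are only available once psych is loaded.

[code language="r"]
install.packages('psych')
install.packages('GPArotation')
library(psych)        # provides fa.parallel(), fa() and fa.diagram()
library(GPArotation)  # supplies the oblimin rotation used later
[/code]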

4. Number of Factors

Next, we’ll determine how many factors to extract. This can be evaluated with methods such as parallel analysis and inspection of the eigenvalues in a scree plot.

Parallel Analysis

We’ll use the psych package’s fa.parallel() function to run the parallel analysis. Here we specify the data frame and the factoring method (minres in our case). Run the following to find an acceptable number of factors and generate the scree plot:

[code language="r"]
parallel <- fa.parallel(data, fm = 'minres', fa = 'fa')
[/code]

The console shows the number of factors suggested by the parallel analysis, which we treat as the maximum number of factors to consider. Here is how it looks:

“Parallel analysis suggests that the number of factors = 5 and the number of components = NA”

Given below is the scree plot generated from the above code:

[Image: scree plot from the parallel analysis]

The blue line shows eigenvalues of actual data and the two red lines (placed on top of each other) show simulated and resampled data. Here we look at the large drops in the actual data and spot the point where it levels off to the right. Also, we locate the point of inflection – the point where the gap between simulated data and actual data tends to be minimum.

Looking at this plot and the parallel analysis, anywhere between 2 and 5 factors would be a good choice.
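The idea behind parallel analysis can be sketched in base R: compare the eigenvalues of the observed correlation matrix with the average eigenvalues obtained from random data of the same shape, and keep the leading factors that beat the random benchmark. This is a simplified illustration under stated assumptions. It uses simulated data with two hypothetical factors and eigenvalues of the full correlation matrix, whereas fa.parallel() with fa = 'fa' works with factor eigenvalues and also reports resampled data, so its numbers will differ.

```r
set.seed(123)
n <- 300   # respondents
p <- 8     # observed variables

# Hypothetical data with two underlying factors, four variables each
f <- matrix(rnorm(n * 2), nrow = n)
load <- matrix(c(rep(0.8, 4), rep(0, 4),
                 rep(0, 4), rep(0.8, 4)), nrow = p)
x <- f %*% t(load) + matrix(rnorm(n * p, sd = 0.6), nrow = n)

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random data of the same shape, averaged over 100 simulations
sim_eig <- replicate(100, {
  r <- matrix(rnorm(n * p), nrow = n)
  eigen(cor(r), symmetric = TRUE, only.values = TRUE)$values
})
sim_mean <- rowMeans(sim_eig)

# Retain the leading eigenvalues that exceed the random-data benchmark
above <- obs_eig > sim_mean
n_retain <- if (all(above)) p else which(!above)[1] - 1
print(n_retain)
```

Plotting obs_eig and sim_mean against the factor number gives a picture comparable to the scree plot described above, with the retained factors lying above the simulated line.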

Factor Analysis

Now that we’ve arrived at a probable number of factors, let’s start off with 3 as the number of factors. To perform the factor analysis, we’ll use the psych package’s fa() function. Given below are the arguments we’ll supply:

  • r – Raw data or correlation or covariance matrix
  • nfactors – Number of factors to extract
  • rotate – Although there are various types of rotations, Varimax and Oblimin are the most popular
  • fm – One of the factor extraction techniques like Minimum Residual (OLS), Maximum Likelihood, Principal Axis etc.

In this case, we will select oblique rotation (rotate = "oblimin") as we believe that the factors are correlated. Note that Varimax rotation is used under the assumption that the factors are completely uncorrelated. We will use Ordinary Least Squares/Minres factoring (fm = "minres"), as it is known to provide results similar to Maximum Likelihood without assuming a multivariate normal distribution, and it derives solutions through iterative eigendecomposition, like principal axis factoring.
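The difference between orthogonal and oblique rotation can be sketched with base R alone. Note the substitutions: this example uses factanal() (maximum likelihood) with promax as the oblique rotation instead of fa() with oblimin, and the data are simulated from two hypothetical factors with a true correlation of 0.5. An orthogonal rotation keeps the factors uncorrelated by construction, while an oblique rotation implies a factor correlation matrix that can be recovered from the rotation matrix.

```r
set.seed(7)
n <- 1000

# Two hypothetical latent factors with a true correlation of 0.5
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(1 - 0.5^2) * rnorm(n)

# Six observed variables, three per factor, each with unit variance
lam <- c(0.8, 0.7, 0.6)
make_items <- function(f) sapply(lam, function(l) l * f + rnorm(n, sd = sqrt(1 - l^2)))
x <- cbind(make_items(f1), make_items(f2))
colnames(x) <- paste0("v", 1:6)

orth <- factanal(x, factors = 2, rotation = "varimax")  # orthogonal rotation
obl  <- factanal(x, factors = 2, rotation = "promax")   # oblique rotation

# Factor correlation matrix implied by the oblique rotation matrix
tmat <- solve(obl$rotmat)
phi <- tmat %*% t(tmat)
print(round(phi, 2))
```

A sizeable off-diagonal entry in phi is the kind of evidence that supports choosing an oblique rotation over Varimax.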

Run the following to start the analysis.

[code language="r"]
threefactor <- fa(data, nfactors = 3, rotate = "oblimin", fm = "minres")
print(threefactor)
[/code]

Here is the output showing factors and loadings:

[Image: three-factor output showing factors and loadings]

Now we need to look for variables whose loadings exceed 0.3 in absolute value (negative loadings are acceptable) and that do not load on more than one factor. So let’s first set a cut-off to make the output easier to read.

[code language="r"]
print(threefactor$loadings, cutoff = 0.3)
[/code]

[Image: three-factor loadings with the 0.3 cut-off applied]

As you can see, two variables no longer load on any factor above the cut-off, and two others load on two factors (double-loading). Next, we’ll try 4 factors.

[code language="r"]
fourfactor <- fa(data, nfactors = 4, rotate = "oblimin", fm = "minres")
print(fourfactor$loadings, cutoff = 0.3)
[/code]

[Image: four-factor loadings with the 0.3 cut-off applied]

We can see that each variable now loads on only one factor. This is known as simple structure.
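Checking for simple structure by eye gets tedious with many variables, so the same rule can be applied programmatically. The sketch below uses a small, invented loadings matrix (the item names, factor labels, and values are hypothetical and are not the article's results) and flags variables that fall below the 0.3 cut-off on every factor or exceed it on more than one.

```r
# Hypothetical loadings matrix: 5 items, 3 factors (values invented for illustration)
loads <- matrix(c( 0.72,  0.05,  0.10,
                   0.41,  0.38, -0.02,
                   0.12,  0.21,  0.08,
                  -0.03,  0.66,  0.11,
                   0.09, -0.04, -0.58),
                ncol = 3, byrow = TRUE,
                dimnames = list(paste0("item", 1:5), c("MR1", "MR2", "MR3")))

cutoff <- 0.3
salient <- abs(loads) > cutoff     # negative loadings count by their magnitude
n_salient <- rowSums(salient)      # number of factors each item loads on

no_loading    <- names(n_salient)[n_salient == 0]  # items below the cut-off everywhere
cross_loading <- names(n_salient)[n_salient > 1]   # items loading on several factors

print(no_loading)
print(cross_loading)
```

With a fitted model, the same logic could be applied to the loadings by first converting them to a plain matrix, for example with unclass(fourfactor$loadings). A solution has simple structure under this rule when both vectors come back empty.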

Run the following to look at the factor mapping.

[code language="r"]
fa.diagram(fourfactor)
[/code]

Adequacy Test

Now that we’ve achieved a simple structure it’s time for us to validate our model. Let’s look at the factor analysis output to proceed.

[Image: four-factor analysis output with fit statistics]

The root mean square of residuals (RMSR) is 0.05. This is acceptable, as the value should be close to 0. Next, we should check the RMSEA (root mean square error of approximation) index. Its value, 0.001, shows a good model fit as it is below 0.05. Finally, the Tucker-Lewis Index (TLI) is 0.93, an acceptable value considering it’s over 0.9.
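To make the RMSR more concrete, here is a minimal base R sketch of what it measures: the typical size of the gap between the observed correlations and the correlations reproduced by the fitted factors. The assumptions are that the data are simulated from two hypothetical factors and that the model is fitted with factanal() rather than fa(), so the exact value psych reports for a model may be computed slightly differently.

```r
set.seed(99)
n <- 500

# Hypothetical data generated from two uncorrelated factors
f <- matrix(rnorm(n * 2), nrow = n)
load <- cbind(c(0.8, 0.7, 0.6, 0, 0, 0),
              c(0, 0, 0, 0.8, 0.7, 0.6))
err_sd <- sqrt(1 - rowSums(load^2))
x <- f %*% t(load) + matrix(rnorm(n * 6), nrow = n) %*% diag(err_sd)

fit <- factanal(x, factors = 2)
lam <- unclass(fit$loadings)       # estimated loadings as a plain matrix

# Correlations reproduced by the model, and what is left unexplained
reproduced <- lam %*% t(lam) + diag(fit$uniquenesses)
resid <- cor(x) - reproduced

# Root mean square of the off-diagonal residuals
rmsr <- sqrt(mean(resid[lower.tri(resid)]^2))
print(round(rmsr, 4))
```

Because the simulated data follow the two-factor model exactly, the residuals are small. A value drifting away from 0 would suggest that the chosen number of factors leaves correlations unexplained.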

Naming the Factors

After establishing the adequacy of the factors, it’s time for us to name the factors. This is the theoretical side of the analysis, where we form the factors depending on the variable loadings. In this case, here is how the factors can be created.

[Image: variables grouped under the named factors]

The Importance of EFA in Data Analysis

Exploratory Factor Analysis (EFA) is a critical tool in data analysis, highly valued for its ability to simplify complex datasets, reduce dimensions, and reveal latent variables. The significance of EFA in various industries and research fields is multifaceted:

Simplifying Data

EFA helps in making large sets of variables more manageable. By identifying clusters or groups of variables that are closely related, EFA reduces the complexity of data. This simplification is crucial in making the data more understandable and in facilitating clearer, more concise interpretations.

Reducing Dimensions

In datasets with numerous variables, EFA serves as an efficient method for dimensionality reduction. It consolidates information into a smaller number of factors, making it easier to analyze without a significant loss of original information. This reduction is particularly useful in fields like machine learning and statistics, where handling large numbers of variables can be computationally intensive and challenging.

Uncovering Latent Variables

One of the most significant advantages of EFA is its ability to identify latent variables. These are underlying factors that are not directly observed but inferred from the relationships between observed variables. In psychology, for example, EFA can reveal underlying personality traits from observed behaviors. In marketing research, it can identify consumer preferences and attitudes that are not directly expressed.

Role in Various Industries and Research Fields

  • Market Research: In market research, EFA is used to understand consumer behavior, segment markets, and identify key factors that influence purchase decisions.
  • Psychology and Social Sciences: EFA is extensively used in psychological testing to identify underlying constructs in personality, intelligence, and attitude measurement.
  • Healthcare: In the healthcare sector, EFA helps in understanding the factors that affect patient outcomes and in developing scales for assessing patient experiences or symptoms.
  • Finance: EFA assists in risk assessment, portfolio management, and identifying underlying factors that influence market trends.
  • Education: In educational research, EFA is utilized to develop and validate testing instruments and to understand educational outcomes.

In each of these fields, EFA not only aids in data reduction and simplification but also provides critical insights that might not be apparent from the raw data alone. By revealing hidden patterns and relationships, EFA plays a pivotal role in informing decision-making processes, developing strategic initiatives, and advancing scientific understanding. The versatility and applicability of EFA across different domains underscore its importance as a fundamental tool in data analysis.

Conclusion

In this tutorial, we discussed the basic idea of exploratory factor analysis (EFA) in R and covered parallel analysis and scree plot interpretation. Then we moved on to factor analysis in R to achieve a simple structure and validated it to ensure the model’s adequacy. Finally, we arrived at names for the factors based on the variables. Now go ahead, try it out, and post your findings in the comment section.

If you’re intrigued by the possibilities of EFA and other data analysis techniques, we invite you to delve deeper into the world of advanced data solutions with PromptCloud. At PromptCloud, we understand the power of data and the importance of extracting meaningful insights from it. Our suite of data analysis tools and services is designed to cater to diverse needs, from web scraping and data extraction to advanced analytics.

Whether you’re looking to harness the potential of big data for your business, seeking to understand complex data sets, or aiming to transform raw data into strategic insights, PromptCloud has the expertise and tools to help you achieve your goals. Our commitment to delivering top-notch data solutions ensures that you can make data-driven decisions with confidence and precision.

Explore our offerings, learn more about how we can assist you in navigating the ever-evolving data landscape, and take the first step towards unlocking the full potential of your data with PromptCloud. Visit our website, reach out to our team of experts, and join us on this journey of data exploration and innovation.
