Core Web Vitals (CWV) are the metrics that Google considers the most important indicators of the quality of user experience on the web. The process of identifying and optimizing CWV issues has typically been reactive. Decisions about which technologies to use or which metrics to track are usually made by trial and error rather than informed by empirical research. A site may be built or rebuilt with a new technology, only for its owners to discover that it creates UX issues in production.
In this analysis, we examine the correlation between CWV and many different types of web characteristics simultaneously, rather than a single type in isolation, since web development choices are not made in a vacuum but in the context of the many parts of a website. We hope these results provide additional reference points for teams assessing web development choices, and we invite the community to help further the understanding of the interplay between CWV and web characteristics.
- Notable negative associations with largest contentful paint:
  - CMS - Joomla and Squarespace
  - UI frameworks - animatecss
  - Web frameworks - MicrosoftASPNet
  - Widgets - FlexSlider and OWLCarousel
- Notable negative associations with cumulative layout shift:
  - Bytes of images
  - Widgets - FlexSlider and OWLCarousel
This analysis is based on data from HTTP Archive. The HTTP Archive dataset is generated in a lab environment and contains detailed information on many characteristics of a website as well as performance data. Because it is lab-generated on a single set of hardware, HTTP Archive data is not completely reflective of real usage, and it only allows us to analyze LCP (largest contentful paint) and CLS (cumulative layout shift), since there is no user input from which to measure FID (first input delay). However, an advantage of lab generation is that all data is gathered on a single set of hardware with no bias in the types of websites loaded, which shields us from confounding due to user and device characteristics that we do not measure. Although we are not shielded from all confounding between website characteristics and web performance, this choice leaves far less confounding than a user-generated dataset, where we often have no information on the user and only limited device information.
We conferred with domain experts and established a list of web characteristics of interest:
- TTFB, font requests, and bytes of content of various types
- Counts of various types of third party requests
- Web technologies (coded as binary to represent whether a technology is used):
  - UI frameworks
  - Web frameworks
With LCP and CLS as the outcomes and the web characteristics as the predictors, we model the relationship between outcomes and predictors with random forest. Random forest is an ensemble learning algorithm for both regression and classification built from a set of decision trees, each trained on a bootstrap sample of the dataset and considering a randomly chosen subset of predictors at each split.
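As a minimal sketch of this modeling setup (not the actual pipeline used here), the two outcomes can be fit with scikit-learn's random forests; the features and data below are synthetic placeholders standing in for the HTTP Archive characteristics:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Illustrative predictors: numeric characteristics and a binary technology flag.
# Column names are hypothetical, not the actual feature set.
X = np.column_stack([
    rng.normal(500, 150, n),   # TTFB (ms)
    rng.poisson(20, n),        # count of third party requests
    rng.integers(0, 2, n),     # whether a given widget is used (binary)
])

# LCP is modeled as the log of its value (higher is worse).
lcp_ms = np.exp(7 + 0.002 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0, 0.3, n))
y_lcp = np.log(lcp_ms)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_lcp)

# CLS is modeled as a binary indicator: 1 if CLS < 0.1 (better), else 0.
cls_vals = rng.gamma(1.5, 0.05, n) + 0.1 * X[:, 2]
y_cls = (cls_vals < 0.1).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_cls)
```

Each tree in `reg` and `clf` is trained on a bootstrap sample and considers a random subset of the columns at each split, which is what the split-based measures below exploit.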
To assess the correlation between the outcome and each predictor, as well as their individual effects on the outcome, we derived a measure of correlation (% of higher >= split mean, %HSM) and a measure of effect size (mean split difference, MSD). Both measures are based on the splits that the trained decision trees make on the predictors. See the appendix for more details.
%HSM is bounded between 0 and 1: values close to 0 indicate negative correlation, values close to 1 indicate positive correlation, and values close to 0.5 indicate little correlation. MSD's magnitude is not bounded, and a large positive value indicates that the predictor appears to contribute positively to the mean of the outcome. Note that "positive" here is meant in the numerical sense and does not necessarily mean good.
Here, we present results on association and make note of specific characteristics that appear especially impactful on CWV.
When interpreting these results on association, it is important to note that the positive or negative impact of a particular web characteristic should only be interpreted relative to that of other web characteristics, and in the context of websites that employ an array of web technologies, various types of content, and different third party requests. For instance, if a given web technology shows a strong positive impact, this should be read as "this technology appears to be good for performance relative to other technologies," not as "adding this technology to a website will improve its performance."
LCP is modeled as the log of its numerical value, so higher values are worse.
A %HSM value close to 1 means that higher values of a numerical/count characteristic, or the presence of a technology, are strongly associated with higher values of LCP, and vice versa for %HSM close to 0 (high %HSM is worse).
Likewise, a relatively large and positive MSD means that higher values of a numerical/count characteristic, or the presence of a technology, show a strong negative impact on LCP, and vice versa for a relatively large and negative MSD (large positive MSD is worse).
In general, third party requests do not show strong correlation with or impact on LCP in the context of the other predictors we consider. This could be because most websites in HTTP Archive have a fair number of third party requests, so their effect could not be well ascertained.
Animatecss stands out among UI frameworks, and MicrosoftASPNet stands out among web frameworks.
Among widgets, FlexSlider and OWLCarousel both show strong positive correlation with LCP, and FlexSlider also shows a strong negative effect (a large positive MSD).
CLS is modeled as a binary indicator of whether a given threshold is met. 1 indicates a website has CLS < 0.1, and 0 otherwise, so 1s are better than 0s.
A %HSM value close to 1 means that higher values of a numerical/count characteristic, or the presence of a technology, are strongly associated with meeting the CLS threshold, and vice versa for %HSM close to 0 (low %HSM is worse).
Likewise, a relatively large and positive MSD means that higher values of a numerical/count characteristic, or the presence of a technology, show a strong positive impact on meeting the CLS threshold, and vice versa for a relatively large and negative MSD (large negative MSD is worse).
UI frameworks all show low impact. Among web frameworks, RubyonRails shows a fairly strong positive correlation with CLS compliance.
Among widgets, FlexSlider and OWLCarousel both show a fairly negative impact on CLS compliance.
This analysis is a first step in an effort to understand the impact of web characteristics on CWV more comprehensively. While the results highlight strongly associated characteristics, it would be valuable for the web community to delve further into the associations identified and ascertain which are truly causal and which are merely associative, so that web developers can be better informed. In the meantime, the web characteristics with strong negative correlations or effects should be seen as a signal of things that require more attention and/or planning. Finally, it would be of interest to refresh these analyses in the future to see whether the associations identified here still hold.
Random forest trains decision trees by making binary splits of the data. Each split is based on a particular predictor X and a cutpoint c, chosen according to a purity criterion, and takes the form X <= c versus X > c. All data points with X <= c are placed in the corresponding branch, and likewise for data points with X > c. The data points can then be split further in each branch based on other predictors in the same way. The measures of correlation and effect size we use exploit these splits.
Specifically, for a given predictor, we collect all splits made on that predictor. For each such split, we compute the outcome mean of the data points in the <= branch and in the > branch. %HSM (% of higher >= split mean) is the proportion of splits for which the outcome mean in the > branch is higher than that in the <= branch; it checks how frequently larger outcome means are associated with higher predictor values. MSD (mean split difference) is the outcome mean of the <= branch subtracted from that of the > branch, averaged across all relevant splits of the predictor; it checks the difference in outcome mean between data points with higher values of the predictor and those with lower values.
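These two definitions can be sketched as a toy computation (an assumed reading of the description above, not the authors' code): each split on the predictor is recorded as a pair of branch means, and %HSM and MSD are aggregated over those pairs.

```python
def hsm_and_msd(splits):
    """splits: list of (mean_le, mean_gt) pairs for one predictor, where
    mean_le is the outcome mean of the <= branch and mean_gt that of the > branch."""
    # %HSM: fraction of splits where the > branch has the higher outcome mean.
    hsm = sum(mean_gt > mean_le for mean_le, mean_gt in splits) / len(splits)
    # MSD: (> branch mean) minus (<= branch mean), averaged over all splits.
    msd = sum(mean_gt - mean_le for mean_le, mean_gt in splits) / len(splits)
    return hsm, msd

# Hypothetical splits for a predictor whose higher values mostly coincide
# with higher outcome means.
splits = [(7.2, 7.9), (7.0, 7.4), (7.5, 7.3)]
hsm, msd = hsm_and_msd(splits)
# hsm == 2/3; msd ≈ (0.7 + 0.4 - 0.2) / 3 = 0.3
```

With log LCP as the outcome, an %HSM near 1 and a large positive MSD for this predictor would mark it as negatively impactful, consistent with the interpretation rules in the results sections.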