gouse

Posted on Oct 20, 2022

Great Resignations, employee attrition analysis using Machine Learning Algorithms

#employee #prediction #attrition #machinelearning

Abstract:

Recently, attrition has been high across industries around the globe. In this article, we analyze some of the most influential reasons and factors behind it using ML algorithms.

Attrition occurs when an employee leaves the organization, for whatever reason. The attrition rate is the number of employees who leave an organization over a period of time divided by the average number of employees in the organization over that period. When the attrition rate is higher than usual, it becomes a matter of concern, because the company loses a large amount of talent. It is therefore advisable to predict employee attrition beforehand [2,3]: if the company knows which employees may leave, it can take preventive steps to contain attrition. In this analysis, we explore the most important factors and attributes influencing employee attrition, and how each one contributes to it. We apply data pre-processing techniques such as data extraction, feature engineering, and data sampling, followed by machine learning classification algorithms. Classification models of this kind can be deployed in companies to keep track of attrition risk and, in turn, to avoid or mitigate employee attrition.
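As a quick illustration of the attrition-rate definition above, here is a minimal Python sketch. The helper names are hypothetical, and it assumes the average headcount is approximated by the mean of the headcounts at the start and end of the period, which is one common convention rather than something this analysis specifies.

```python
def average_headcount(start_count: int, end_count: int) -> float:
    """Approximate average headcount as the mean of start and end counts (an assumption)."""
    return (start_count + end_count) / 2


def attrition_rate(leavers: int, avg_headcount: float) -> float:
    """Employees who left during the period divided by the average headcount."""
    if avg_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return leavers / avg_headcount


# Hypothetical figures: 12 leavers, 110 employees at the start, 90 at the end.
rate = attrition_rate(12, average_headcount(110, 90))
print(f"{rate:.1%}")  # 12.0%
```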

Keywords: Attrition, Classification, Prediction, Feature Extraction, Feature Engineering

Statement of problem and objective:

Attrition is a major problem in many organizations [4]. A small attrition rate is normal in any organization, but when it becomes high, it is a matter of concern: the reasons need to be investigated so that the company can take the measures required to reduce attrition in the future [5,6,7]. When many employees leave an organization, it suffers production losses, economic losses, loss of clients, and damage to its image. Hence, the reasons behind a high attrition rate must be investigated [12–14], and the task is to build a model that predicts employee attrition.

Methodologies for Analysis:

In line with the research question, we used the chi2 test statistic to find the features that affect employee attrition [15], and evaluated attrition predictions using several algorithms: Logistic Regression, Linear Discriminant Analysis, K Nearest Neighbors, Classification and Regression Trees, Gaussian Naïve Bayes, and Support Vector Machine. In the first stage, we applied data validation techniques and encoding techniques to convert categorical variables into numerical variables. Based on the sample experiment data,
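The conversion of categorical variables to numerical ones could look like the sketch below. It assumes a simple label (ordinal) encoding that assigns each category an integer code in sorted order, which is what scikit-learn's `LabelEncoder` does; the article does not say which encoder was used, and one-hot encoding is a common alternative for nominal features such as MaritalStatus. The helper name and data are hypothetical.

```python
from typing import Dict, List, Tuple


def label_encode(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer code, with categories in sorted order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical column values in the style of the IBM dataset.
marital_status = ["Single", "Married", "Divorced", "Single", "Married"]
codes, mapping = label_encode(marital_status)
print(mapping)  # {'Divorced': 0, 'Married': 1, 'Single': 2}
print(codes)    # [2, 1, 0, 2, 1]
```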

2.1 Data Acquisition: Dataset Description

The HRM dataset used in this research work is distributed by IBM Analytics [32]. It contains 35 features relating to 1500 observations and refers to U.S. data. All features relate to the employees' working life and personal characteristics (see Table 1).

Table 1. Dataset features: Age, Attrition (predicted), Business travel, Daily rate, Department, Distance from home, Education, Education field [33], Employee count, Employee number, Environment satisfaction, Gender, Hourly rate, Job involvement, Job level, Job role, Job satisfaction, Marital status [34–38], Monthly income, Monthly rate, Number of companies worked, Over18, Overtime, Percent salary hike, Performance rating, Relationship satisfaction [1,7], Standard hours, Stock option level, Total working years, Training times last year, Work-life balance, Years with company, Years in current role, Years since last promotion, Years with current manager.

Attrition: A high attrition rate leads to high recruitment costs for hiring replacement employees, so it is helpful for companies to know which factors influence employee attrition. Here, the chi2 test statistic is used to find strong relationships, or dependencies, between the attrition variable [22] and the input features of the given data.

2.2 Feature Engineering Techniques for Character Data:

When there are many predictors, the degree of association between each input feature and the target (outcome) can be measured with statistics such as chi2. Features with higher chi2 statistics are the best candidates for modelling, and only chi2 scores with p-values below 0.01 are treated as valid. Nine features have high chi2 values and p-values below 0.01: 'DistanceFromHome', 'JobLevel', 'MaritalStatus', 'OverTime', 'StockOptionLevel', 'TotalWorkingYears', 'YearsAtCompany', 'YearsInCurrentRole', and 'YearsWithCurrManager'. These are the nine features affecting attrition.
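The sketch below shows one way to compute a chi2 statistic and its p-value for a categorical feature against Attrition: Pearson's chi-squared test of independence on a contingency table, without Yates' continuity correction. This is an assumption about the method. scikit-learn's `sklearn.feature_selection.chi2` (often used with `SelectKBest`) computes a related but different score that treats feature values as counts, so the numbers from this sketch are not expected to match the scores reported in this section. The p-value uses the closed-form chi-squared tail for integer degrees of freedom; helper names and data are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for a chi-squared variable with an integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over i < dof/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: the dof = 1 tail plus one correction term per extra pair of dof.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def chi2_independence(feature, target):
    """Pearson chi-squared test of independence between two categorical columns."""
    n = len(feature)
    observed = Counter(zip(feature, target))
    row_totals, col_totals = Counter(feature), Counter(target)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected if independent
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical data: 50 employees work overtime (30 left), 50 do not (10 left).
overtime = ["Yes"] * 50 + ["No"] * 50
attrition = ["Yes"] * 30 + ["No"] * 20 + ["Yes"] * 10 + ["No"] * 40
stat, dof, p = chi2_independence(overtime, attrition)
print(round(stat, 2), dof, p < 0.01)  # 16.67 1 True
```

A feature would then be kept when its p-value falls below the 0.01 cut-off used in this analysis.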

i. Distance From Home: According to the chi2 test, this is one of the input features influencing attrition. Its chi2 value with the Attrition variable is 59.49, a high score indicating strong dependency of the target variable on this feature. The bar plot in [Fig01] shows the effect of Distance From Home on Attrition: employees who live 2 km from the office (close to it) are more likely to leave the company, while those who live much farther away are less likely to leave. The pie chart in [Fig02] shows that, among all employees who left the company, the largest group lives 2 km from the office. The data show that 11.81% of the employees who left live 2 km from the office and 10.97% live 1 km away.

ii. Job Level: Job Level is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 21.74, a good score indicating considerable dependency of the target variable on this feature. There are five job levels in the data, and only some of them influence attrition. The bar plot in [Fig03] shows the effect of Job Level on Attrition: employees at Job Level 1 are more likely to leave the company, while those at Job Levels 4 and 5 are less likely to leave. The pie chart in [Fig04] shows that, among all employees who left the company, the largest group is at Job Level 1. The data show that 60.34% of the employees who left are at Job Level 1, followed by 21.94% at Job Level 2.

iii. Marital Status: Marital Status is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 12.93, a good score indicating good dependency of the target variable on this feature. The bar plot in [Fig05] shows the effect of Marital Status on Attrition: Single employees are more likely to leave the company, while Divorced employees are less likely to leave. The pie chart in [Fig06] shows that, among all employees who left the company, the largest group is Single. The data show that 50.63% of the employees who left are Single, followed by 35.44% who are Married, and the remaining 13.92% who are Divorced.

iv. Over Time: Over Time is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 56.92, a high score indicating strong dependency of the target variable on this feature. The bar plot in [Fig07] shows the effect of Over Time on Attrition: employees who work overtime are more likely to leave the company than those who do not. The pie chart in [Fig08] shows that, among all employees who left the company, the larger group works overtime. The data show that 53.59% of the employees who left work overtime.

v. Stock Option Level: Stock Option Level is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 17.31, a good score indicating good dependency of the target variable on this feature. The bar plot in [Fig09] shows the effect of Stock Option Level on Attrition: employees at Stock Option Level 0 are more likely to leave the company than those at Stock Option Levels 1, 2, and 3. The pie chart in [Fig10] shows that, among all employees who left the company, most are at Stock Option Level 0. The data show that 64.98% of the employees who left are at Stock Option Level 0, followed by 23.63% at Stock Option Level 1.

vi. Total Working Years: Total Working Years is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 219.33, a high score indicating strong dependency of the target variable on this feature. The bar plot in [Fig11] shows the effect of Total Working Years on Attrition: employees with 1 year of total working experience are more likely to leave the company, while those with more than 11 years are less likely to leave. The pie chart in [Fig12] shows that, among all employees who left the company, the largest group has 1 year of total working experience. The data show that 16.88% of the employees who left have 1 year of total working experience, followed by 10.55% with 10 years.

vii. Years At Company: Years At Company is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 145.78, a high score indicating strong dependency of the target variable on this feature. The bar plot in [Fig13] shows the effect of Years At Company on Attrition: employees who have been at the company for 1 year are more likely to leave, while those who have been there for more than 10 years are less likely to leave. The pie chart in [Fig14] shows that, among all employees who left the company, the largest group had been at the company for 1 year. The data show that 24.89% of the employees who left had 1 year at the company, followed by 11.39% with 2 years.

viii. Years In Current Role: Years In Current Role is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 103.62, a high score indicating strong dependency of the target variable on this feature. The bar plot in [Fig15] shows the effect of Years In Current Role on Attrition: employees with less than 1 year (0 years) in their current role are more likely to leave the company, while those with more than 10 years in their current role are less likely to leave. The pie chart in [Fig16] shows that, among all employees who left the company, the largest group had less than 1 year in their current role. The data show that 30.80% of the employees who left had less than 1 year in their current role, followed by 28.69% with 2 years.

ix. Years With Current Manager: Years With Current Manager is another input feature that influences attrition according to the chi2 test. Its chi2 value with the Attrition variable is 120.49, a high score indicating strong dependency of the target variable on this feature. The bar plot in [Fig17] shows the effect of Years With Current Manager on Attrition: employees with less than 1 year with their current manager are more likely to leave the company, while those with more than 10 years with their current manager are less likely to leave. The pie chart in [Fig18] shows that, among all employees who left the company, the largest group had spent less than 1 year with their current manager. The data show that 35.86% of the employees who left had less than 1 year with their current manager, followed by 21.10% with 2 years.
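The percentages quoted from the pie charts above are shares of all leavers, which also depend on how large each group is; how likely a group is to leave is better read from the attrition rate within that group. The following sketch, with a hypothetical helper and toy data, computes both figures for one categorical feature.

```python
from collections import Counter


def leaver_breakdown(feature, attrition, positive="Yes"):
    """Per category: (share of all leavers, attrition rate within the category)."""
    totals = Counter(feature)
    leavers = Counter(f for f, a in zip(feature, attrition) if a == positive)
    n_leavers = sum(leavers.values())
    return {
        category: (leavers[category] / n_leavers, leavers[category] / count)
        for category, count in totals.items()
    }


# Hypothetical data: 10 Single (6 left), 30 Married (3 left), 10 Divorced (1 left).
status = ["Single"] * 10 + ["Married"] * 30 + ["Divorced"] * 10
left = (["Yes"] * 6 + ["No"] * 4) + (["Yes"] * 3 + ["No"] * 27) + (["Yes"] + ["No"] * 9)
for category, (share, rate) in leaver_breakdown(status, left).items():
    print(f"{category}: {share:.0%} of leavers, {rate:.0%} attrition rate")
# Single: 60% of leavers, 60% attrition rate
# Married: 30% of leavers, 10% attrition rate
# Divorced: 10% of leavers, 10% attrition rate
```

In this toy example, Married employees account for three times as many leavers as Divorced employees, yet both groups leave at the same 10% rate.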

2.3 Feature Engineering for Numerical Data:

The numerical variables available for modelling are "Age", "DailyRate", "HourlyRate", "MonthlyIncome", and "MonthlyRate". It is good practice to check the correlations among all numerical features in the dataset: if some of them are highly correlated, the redundant ones should be removed, because redundant features often degrade the performance of machine learning models. [table01] shows the correlations among all numerical features.
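A redundancy check of this kind might be sketched as follows. It computes Pearson correlations between numerical columns and drops the later feature of any pair whose absolute correlation reaches a threshold. The 0.9 threshold, the helper names, and the toy values (including the hypothetical `AnnualIncome` column, which is simply 12 × MonthlyIncome) are assumptions for illustration, and this greedy pairwise rule is only one simple option.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(columns: Dict[str, List[float]], threshold: float = 0.9) -> Tuple[List[str], List[str]]:
    """Keep the first feature of each highly correlated pair and drop the other."""
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) >= threshold:
            dropped.add(b)
    kept = [name for name in columns if name not in dropped]
    return kept, sorted(dropped)


monthly = [5000, 2500, 4000, 6000, 3000]
columns = {
    "Age": [25, 32, 41, 29, 50],
    "MonthlyIncome": monthly,
    "AnnualIncome": [12 * m for m in monthly],  # hypothetical, perfectly correlated
    "HourlyRate": [66, 40, 90, 55, 70],
}
kept, dropped = drop_redundant(columns)
print(kept)     # ['Age', 'MonthlyIncome', 'HourlyRate']
print(dropped)  # ['AnnualIncome']
```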

i. Age: "Age" is one of the numerical variables useful for predicting employee attrition. Its distribution is approximately Gaussian, with a skewness of 0.413 and a kurtosis of -0.404, both acceptable values. The data show that many employees left the organization at the ages of 29 and 31.

ii. Daily Rate: "Daily Rate" is another numerical variable useful for predicting employee attrition. Its distribution is symmetric, with a skewness of -0.0035 and a kurtosis of -1.203, both acceptable values; a kurtosis this close to -1.2 indicates a flat, uniform-like shape rather than a bell curve.

iii. Hourly Rate: "Hourly Rate" is one more numerical variable useful for predicting employee attrition. Its distribution is symmetric, with a skewness of -0.0323 and a kurtosis of -1.196, both acceptable values (again a flat, uniform-like shape). The data show that more employees with an Hourly Rate of 66 left the organization than at any other rate.

iv. Monthly Income: "Monthly Income" is another numerical variable useful for predicting employee attrition. Its distribution is not Gaussian: its skewness is 1.369, which is not acceptable, and its kurtosis is 1.005. A logarithmic transformation is therefore applied to bring it closer to a Gaussian distribution; after the transformation, the skewness is 0.286 and the kurtosis is -0.697, which are acceptable values.

v. Monthly Rate: "Monthly Rate" is another numerical variable useful for predicting employee attrition. Its distribution is symmetric, with a skewness of 0.0185 and a kurtosis of -1.214, both acceptable values (again a flat, uniform-like shape).
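The skewness and kurtosis checks and the log transform described above can be sketched as follows. The sketch uses the plain moment-based estimators (the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`, which reports excess kurtosis, so a normal distribution scores 0); pandas' `Series.skew` and `Series.kurt` apply a small-sample correction, so their values can differ slightly. It also shows that evenly spread (uniform-like) data give an excess kurtosis near -1.2. The helper name and the income figures are hypothetical.

```python
import math
from typing import List, Tuple


def skew_and_excess_kurtosis(values: List[float]) -> Tuple[float, float]:
    """Moment-based skewness and excess kurtosis (a normal distribution gives 0 and 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical right-skewed incomes: each value doubles the previous one.
incomes = [1000, 2000, 4000, 8000, 16000]
raw_skew, _ = skew_and_excess_kurtosis(incomes)
log_skew, _ = skew_and_excess_kurtosis([math.log(v) for v in incomes])
print(raw_skew > 0, abs(log_skew) < 1e-9)  # True True

# Evenly spread values 1..1000 behave like a uniform distribution.
_, flat_kurt = skew_and_excess_kurtosis(list(range(1, 1001)))
print(round(flat_kurt, 2))  # -1.2
```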

2.4 Machine Learning / AI Algorithms Description:

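i. Logistic Regression: Logistic regression models the probability of a binary outcome by passing a linear combination of the input variables through the logistic (sigmoid) function, which makes it suitable for binary classification problems such as attrition (Yes/No).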

ii. Linear Discriminant Analysis: Linear Discriminant Analysis (LDA) is a statistical technique for binary and multiclass classification. It assumes that the numerical input variables are Gaussian within each class, with a covariance matrix shared by all classes.

iii. K Nearest Neighbors: The k-Nearest Neighbors algorithm (KNN) uses a distance metric such as Euclidean distance to find the k training instances nearest to a new instance, and predicts the majority class among those neighbors (for regression, the mean of their outcomes).

iv. Naive Bayes: Naive Bayes estimates the probability of each class and the conditional probability of each input value given each class. For new data, these probabilities are multiplied together for each class, assuming that the input variables are independent of one another given the class (a simple, or naive, assumption), and the most probable class is predicted.

v. CART: Classification and Regression Trees construct a binary tree from the training data. Split points are chosen greedily by evaluating each attribute and each value of each attribute in the training data in order to minimize a cost function.

vi. SVM: Support Vector Machines (SVM) seek the line (in higher dimensions, the hyperplane) that separates two classes with the largest margin. The data instances closest to this separating line are called support vectors.
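To make two of the descriptions above concrete, here is a from-scratch sketch of a KNN classification (Euclidean distance, majority vote) and of the greedy CART split search on one numerical feature, using Gini impurity as the cost function, which is the default criterion of scikit-learn's `DecisionTreeClassifier`. These toy implementations, helper names, and data are for illustration only; they are not the library implementations used in the analysis.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def knn_predict(train_X: List[Sequence[float]], train_y: List[str], x: Sequence[float], k: int = 3) -> str:
    """Predict the majority class among the k training points nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def gini(labels: List[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: List[float], labels: List[str]) -> Tuple[float, Optional[float]]:
    """Greedily try every threshold 'value <= t' and return (weighted Gini, best t)."""
    best: Tuple[float, Optional[float]] = (float("inf"), None)
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if cost < best[0]:
            best = (cost, t)
    return best


# Hypothetical 2-D points and labels.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["No", "No", "No", "Yes", "Yes", "Yes"]
print(knn_predict(train_X, train_y, (2, 2)))  # No
print(knn_predict(train_X, train_y, (7, 8)))  # Yes

# Hypothetical years at company vs. attrition: the best split is "<= 2 years".
years = [0, 1, 1, 2, 5, 8, 10, 12]
left_company = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
print(best_split(years, left_company))  # (0.0, 2)
```

A full CART tree repeats this search over every feature and recursively on each resulting subset.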

2.5 Evaluating Models:

To avoid data leakage, the whole dataset is first divided into training and validation datasets. A Pipeline automates the scaling and evaluation of the algorithms, and accuracy is chosen as the evaluation metric. KFold cross-validation is used for resampling and evaluating the different algorithms, and MinMaxScaler is used to rescale each feature to the range [0, 1]. The algorithms evaluated are Logistic Regression, Linear Discriminant Analysis, K Nearest Neighbors, Classification and Regression Trees, Gaussian Naïve Bayes, and Support Vector Machine. The scores below are the mean and standard deviation of the accuracy over the 10 folds of KFold cross-validation.

i. Logistic Regression: mean accuracy 0.851, standard deviation 0.027 on the training data.

ii. Linear Discriminant Analysis: mean accuracy 0.845, standard deviation 0.024 on the training data.

iii. K Nearest Neighbors: mean accuracy 0.832, standard deviation 0.026 on the training data.

iv. Decision Tree Classifier (CART): mean accuracy 0.772, standard deviation 0.034 on the training data.

v. Naive Bayes: mean accuracy 0.768, standard deviation 0.037 on the training data.

vi. Support Vector Machine: mean accuracy 0.849, standard deviation 0.030 on the training data.
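The evaluation protocol in this section can be sketched without any library, as below: contiguous KFold splits (scikit-learn's `KFold` does not shuffle by default), a min-max scaler fitted on each training fold only, which is the leakage protection a scikit-learn `Pipeline` provides inside `cross_val_score`, and the mean and population standard deviation of the fold accuracies (the default of `numpy.std`). The model plugged in is a hypothetical majority-class baseline. With imbalanced classes such as attrition, it is worth comparing the accuracies above against such a baseline.

```python
import statistics
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[List[float]], str]


def kfold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds; the first n % k folds get one extra item."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size


def fit_minmax(rows: List[Row]) -> Callable[[Row], List[float]]:
    """Learn per-column min and max from the given rows; return a function scaling to [0, 1]."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def transform(row: Row) -> List[float]:
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]

    return transform


def cross_val_accuracy(X, y, fit_model, k=10):
    """Mean and population standard deviation of accuracy over k folds."""
    scores = []
    for train, test in kfold_indices(len(X), k):
        scale = fit_minmax([X[i] for i in train])  # fitted on the training fold only
        predict = fit_model([scale(X[i]) for i in train], [y[i] for i in train])
        correct = sum(predict(scale(X[i])) == y[i] for i in test)
        scores.append(correct / len(test))
    return statistics.mean(scores), statistics.pstdev(scores)


def majority_baseline(train_X, train_y) -> Predictor:
    """Hypothetical baseline model: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda row: label


# Hypothetical data: 20 employees, one feature, every fifth one left.
X = [[float(i)] for i in range(20)]
y = ["Yes" if i % 5 == 0 else "No" for i in range(20)]
mean_acc, std_acc = cross_val_accuracy(X, y, majority_baseline, k=5)
print(round(mean_acc, 3), round(std_acc, 3))  # 0.8 0.1
```

Here the baseline scores 0.8 simply because 80% of the toy employees stayed.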

2.6 Finalizing Model:

Based on the analysis above, Logistic Regression is the best model, with the highest mean accuracy of all the models. It was therefore selected as the final model, and its accuracy was checked on the unseen validation dataset, where it achieved an accuracy of 0.8639, a good score. The test data of this dataset is reserved for future predictions.

3 Findings:

Our research on employee attrition extracted the significant influencing factors through feature extraction and feature engineering techniques. We conclude that 'DistanceFromHome', 'JobLevel', 'MaritalStatus', 'OverTime', 'StockOptionLevel', 'TotalWorkingYears', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsWithCurrManager', 'Age', 'DailyRate', 'HourlyRate', 'MonthlyIncome', and 'MonthlyRate' are the features affecting attrition, and that the Logistic Regression algorithm performs best on this binary classification problem, with an accuracy of about 85%.

4 Conclusion:

A high attrition rate is a problem that must be carefully examined and investigated to find the reasons behind it, in order to avoid major losses for the organization. In our research, we identified the major factors driving employee attrition and developed models to predict possible employee attrition. This can help organizations take the steps required to avoid the losses caused by attrition, or implement preventive measures to retain employees who might leave. The factors listed above are the most influential drivers of employee attrition.

Figures referred:

References:

[1] Cockburn, I.; Henderson, R.; Stern, S. The Impact of Artificial Intelligence on Innovation. In The Economics of Artificial Intelligence: An Agenda; University of Chicago Press: Chicago, IL, USA, 2019; pp. 115–146.
[2] Jarrahi, M. Artificial intelligence and the future of work: Human-AI symbiosis in organizational decision making. Bus. Horiz. 2018, 61, 577–586.
[3] Yanqing, D.; Edwards, J.; Dwivedi, Y. Artificial intelligence for decision making in the era of Big Data. Int. J. Inf. Manag. 2019, 48, 63–71.
[4] Paschek, D.; Luminosu, C.; Dra, A. Automated business process management-in times of digital transformation using machine learning or artificial intelligence. In MATEC Web of Conferences; EDP Sciences: Les Ulis, France, 2017; Volume 121.
[5] Varian, H. Artificial Intelligence, Economics, and Industrial Organization; National Bureau of Economic Research: Cambridge, MA, USA, 2018.
[6] Vardarlier, P.; Zafer, C. Use of Artificial Intelligence as Business Strategy in Recruitment Process and Social Perspective. In Digital Business Strategies in Blockchain Ecosystems; Springer: Berlin/Heidelberg, Germany, 2019; pp. 355–373.
[7] Gupta, P.; Fernandes, S.; Manish, J. Automation in Recruitment: A New Frontier. J. Inf. Technol. Teach. Cases 2018, 8, 118–125.
[8] Geetha, R.; Bhanu Sree Reddy, D. Recruitment through artificial intelligence: A conceptual study. Int. J. Mech. Eng. Technol. 2018, 9, 63–70.
[9] Syam, N.; Sharma, A. Waiting for a sales renaissance in the fourth industrial revolution: Machine learning and artificial intelligence in sales research and practice. Ind. Mark. Manag. 2018, 69, 135–146.
[10] Mishra, S.; Lama, D.; Pal, Y. Human Resource Predictive Analytics (HRPA) For HR Management in Organizations. Int. J. Sci. Technol. Res. 2016, 5, 33–35.
[11] Jain, N.; Maitri. Big Data and Predictive Analytics: A Facilitator for Talent Management. In Data Science Landscape; Springer: Singapore, 2018; pp. 199–204.
[12] Boushey, H.; Glynn, S.J. There Are Significant Business Costs to Replacing Employees. Cent. Am. Prog. 2012, 16, 1–9.
[13] Martin, L. How to retain motivated employees in their jobs? Econ. Ind. Democr. 2018, 34, 25–41.
[14] involvement management and organizational performance: The mediating roles of job satisfaction and wellbeing. Hum. Relat. 2012, 65, 419–446.
[15] Zelenski, J.M.; Murphy, S.A.; Jenkins, D.A. The happy-productive worker thesis revisited. J. Happiness Stud. 2008, 9, 521–537.
[16] Clark, A.E. What really matters in a job? Hedonic measurement using quit data. Labour Econ. 2001, 8, 223–242.
[17] Clark, A.E.; Georgellis, Y.; Sanfey, P. Job satisfaction, wage changes, and quits: Evidence from Germany. Res. Labor Econ. 1998, 17, 95–121.
[18] Delfgaauw, J. The effect of job satisfaction on job search: Not just whether, but also where. Labour Econ. 2007, 14, 299–317.
[19] Green, F. Well-being, job satisfaction and labour mobility. Labour Econ. 2010, 17, 897–903.
[20] Kristensen, N.; Westergaard-Nielsen, N. Job satisfaction and quits - which job characteristics matters most? Dan. Econ. J. 2006, 144, 230–249.
[21] Marchington, M.; Wilkinson, A.; Donnelly, R.; Kynighou, A. Human Resource Management at Work; Kogan Page Publishers: London, UK, 2016.
[22] Van Reenen, J. Human resource management and productivity. In Handbook of Labor Economics; Elsevier: Amsterdam, The Netherlands, 2011.
[23] Deepak, K.D.; Guthrie, J.; Wright, P. Human Resource Management and Labor Productivity: Does Industry Matter? Acad. Manag. J. 2005, 48, 135–145.
[24] Gordini, N.; Veglio, V. Customers churn prediction and marketing retention strategies. An application of support vector machines based on the AUC parameter-selection technique in B2B e-commerce industry. Ind. Mark. Manag. 2016, 62, 100–107.
[25] Keramati, A.; Jafari-Marandi, R.; Aliannejadi, M.; Ahmadian, I.; Mozaffari, M.; Abbasi, U. Improved churn prediction in telecommunication industry using data mining techniques. Appl. Soft Comput. 2014, 24, 994–1012.
[26] Alao, D.; Adeyemo, A. Analyzing employee attrition using decision tree algorithms. Comput. Inf. Syst. Dev. Inf. Allied Res. J. 2013, 4, 17–28.
[27] Nagadevara, V. Early Prediction of Employee Attrition in Software Companies-Application of Data Mining Techniques. Res. Pract. Hum. Resour. Manag. 2008, 16,
[28] Rombaut, E.; Guerry, M.A. Predicting voluntary turnover through Human Resources database analysis. Manag. Res. Rev. 2018, 41, 96–112.
[29] Usha, P.; Balaji, N. Analysing Employee attrition using machine learning. Karpagam J. Comput. Sci. 2019, 13, 277–282.
[30] Ponnuru, S.; Merugumala, G.; Padigala, S.; Vanga, R.; Kantapalli, B. Employee Attrition Prediction using Logistic Regression. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 2871–2875.
[31] Microsoft Docs: Team Data Science Process.
[32] IBM HR Analytics Employee.
[33] CrowdFlower. Data Science Report. 2016.
[34] Antecol, H.; Cobb-Clark, D. Racial harassment, job satisfaction, and intentions to remain in the military. J. Popul. Econ. 2009, 22, 713–738.
[35] Böckerman, P.; Ilmakunnas, P. Job disamenities, job satisfaction, quit intentions, and actual separations: Putting the pieces together. Ind. Relations 2009, 48, 73–96.
[36] Theodossiou, I.; Zangelidis, A. Should I stay or should I go? The effect of gender, education and unemployment on labour market transitions. Labour Econ. 2009, 16, 566–577.
[37] Böckerman, P.; Ilmakunnas, P.; Jokisaari, M.; Vuori, J. Who stays unwillingly in a job? A study based on a representative random sample of employees. Econ. Ind. Democr. 2013, 34, 25–41.
[38] Griffeth, R.W.; Hom, P.W.; Gaertner, S. A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium. J. Manag. 2000, 26, 463–488.

DEV Community
