Project 3 - Rules of Attrition
The contents of this project are classified…
…by algorithm.
Project 3 at Metis saw our first foray into the world of classification algorithms. We were tasked with finding a dataset with somewhere between 1,000 and 100,000 entries, 10 or more features, and a question we could answer using binary or multiclass classification algorithms.
A large part of my work history is in Human Resources and Recruiting, so I set out to find a dataset to help me figure out what makes people quit their jobs, and what we can do to change that.
A Word on Attrition:
I think many business leaders underestimate the true cost of turnover. The reality is that at this moment in time, it is truly an employee’s market. Skilled workers have never been in such high demand, and that can be seen in trends in attrition.
In their annual study, The Work Institute found that in 2018, more than 27% of employees nationwide voluntarily left a job, accounting for more than 41 million job vacancies. Based on their survey data, they also conclude that 77% of these attritions were preventable.
The Society for Human Resource Management (SHRM) estimates that replacing an attrited employee can cost between 50% and 250% of their annual salary. They estimate that in 2018 alone, attrition costs for US employers topped $600 billion.
We’ll use some of these numbers later.
The Data:
For this project, I used a dataset called IBM HR Analytics Employee Attrition & Performance from Kaggle. This is a synthetic dataset from IBM’s data science team. There are 1470 individual employee entries and 35 features. On some light inspection, we find the following:
237 of 1470 entries are attrited (16%). As expected, there are trends that we see widely in HR on the whole. Attrited employees generally had:
fewer total working years, lower income, and lower job level
lower ratings for job satisfaction and job involvement
longer commutes and more frequent travel demands
None of these findings are exactly groundbreaking, but with this many features, I was sure there was something to uncover in this data.
The Model:
It is important to note that this dataset has an implicit “survivor bias”. Once employees quit, they don’t tend to un-quit, but employees who have not quit still can. We have to keep in mind that our currently un-attrited population includes people who are likely to quit in the future.
My primary goal with this investigation was to come out of it with an interpretable and actionable model. Being able to forecast accurately whether or not an employee is likely to quit is a neat party trick, but it becomes a pointless exercise if we don’t learn which features matter most for retention.
Given this, I used oversampling to balance out my attrition classes and scaled the features so that the model coefficients would be comparable to one another. I then ran a slew of boilerplate models and compared them to see which ones performed best.
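For readers who want to see what this step can look like, here is a minimal sketch. It is not the project's actual code: the data is synthetic, the choice to oversample only the training split is an assumption, and random oversampling via scikit-learn's `resample` stands in for whatever oversampling method was really used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic stand-in for the HR data (assumption): 1470 rows, 5 numeric
# features, and a minority "attrited" class encoded as 1.
n = 1470
X = rng.normal(size=(n, 5))
logits = -2.0 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Randomly oversample the minority class in the training split only,
# so the test split keeps the original class balance.
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
X_bal = np.vstack([majority, minority_up])
y_bal = np.concatenate(
    [np.zeros(len(majority), dtype=int), np.ones(len(minority_up), dtype=int)]
)

# Standardize features so coefficient magnitudes are comparable.
scaler = StandardScaler().fit(X_bal)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_bal), y_bal)

# Predicted probability of attrition for the untouched test split.
probs = model.predict_proba(scaler.transform(X_test))[:, 1]
```

Fitting the scaler on the training data and reusing it on the test data keeps information from the test split out of the model.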
As illustrated by this chart, the logistic regression and Random Forest models generally outperform the rest at almost every confidence level.
This wound up working in my favor in terms of interpretability: a logistic regression exposes a signed coefficient for each feature, and a Random Forest exposes a feature importance score for each feature.
At a predicted-probability threshold of 35%, our logistic regression has a recall score of 76%, which sits close to the 77% of attritions estimated to be preventable.
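To make the threshold idea concrete, here is a small sketch of how recall changes when the cutoff drops from the usual 50% to 35%. The helper `recall_at_threshold` is a hypothetical name, and the labels and probabilities are made up for illustration; they are not this project's predictions.

```python
def recall_at_threshold(y_true, probs, threshold):
    """Recall for the positive class when predicting 1 if prob >= threshold."""
    true_pos = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    actual_pos = sum(1 for y in y_true if y == 1)
    return true_pos / actual_pos if actual_pos else 0.0


# Made-up labels (1 = attrited) and predicted probabilities of attrition.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.90, 0.55, 0.40, 0.20, 0.45, 0.30, 0.10, 0.05]

recall_default = recall_at_threshold(y_true, probs, 0.50)  # catches 2 of 4 leavers
recall_lowered = recall_at_threshold(y_true, probs, 0.35)  # catches 3 of 4 leavers
```

Lowering the threshold catches more of the true leavers, at the price of flagging some employees who would have stayed (the 0.45 case above).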
Findings, Part 1:
Our model’s feature coefficients tell an interesting, but seemingly contradictory, story:
Strongest predictors of retention:
Years in current role
Stock option level
Years with current manager
Job Involvement
Strongest predictors of attrition:
Overtime obligations
Years at company
Years since last promotion
Frequent business travel
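Lists like the two above can be read off a fitted logistic regression by sorting its coefficients. The sketch below assumes attrition is encoded as 1 and that the features were scaled, so a negative coefficient points toward retention and a positive one toward attrition. The feature names and values are hypothetical placeholders, not the coefficients from this project.

```python
# Hypothetical coefficients keyed by placeholder feature names.
coefficients = {
    "feature_a": -0.8,
    "feature_b": 0.6,
    "feature_c": -0.1,
    "feature_d": 1.1,
    "feature_e": 0.05,
}

# Sort from most negative to most positive.
ranked = sorted(coefficients.items(), key=lambda kv: kv[1])

# Most negative coefficients: strongest predictors of retention.
retention_drivers = [name for name, coef in ranked if coef < 0][:2]

# Most positive coefficients: strongest predictors of attrition.
attrition_drivers = [name for name, coef in reversed(ranked) if coef > 0][:2]
```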
It’s not surprising that the longer an employee stays at a company, the more likely they are to leave; that’s just how time works.
In direct opposition to this, the longer an employee stays in a particular role and the longer they stay with their current manager, the more likely they are to stay in their job. I thought these two narratives were mutually exclusive until I visualized attrition against years with manager.
There seems to be a wall in an employee’s first year with their manager. More than one third of employees in this group wound up attriting. The attrition rate drops sharply after the first year and continues to decline slowly as time goes on.
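A view like this comes down to an attrition rate per group. Here is a rough sketch with pandas, using a tiny made-up sample. The column names `YearsWithCurrManager` and `Attrition` (with "Yes"/"No" values) are assumed to match the Kaggle file, and the numbers are not the project's results.

```python
import pandas as pd

# Tiny made-up sample; column names are assumed to mirror the Kaggle dataset.
df = pd.DataFrame({
    "YearsWithCurrManager": [0, 0, 0, 1, 1, 2, 2, 2, 7, 7],
    "Attrition": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "No", "No"],
})

# Share of employees who attrited, and group size, per year with manager.
attrition_rate = (
    df.assign(attrited=df["Attrition"].eq("Yes"))
      .groupby("YearsWithCurrManager")["attrited"]
      .agg(rate="mean", employees="size")
)
```

Keeping the group size next to the rate matters, since a high rate in a group of two or three people says very little.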
Visualizing the Cost of Attrition:
The estimated cost of attrition for this group of employees is a staggering $15.7 million.
This is based on the assumption that the cost of backfill is between 50% and 250% of an employee’s annual salary.
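As a back-of-the-envelope illustration of that assumption, the sketch below turns a list of monthly incomes into a low and high replacement-cost estimate. The function name `replacement_cost_range` and the incomes are made up, and the post does not state how a single figure was chosen within the 50% to 250% range, so this only shows the two bounds.

```python
def replacement_cost_range(monthly_incomes, low=0.5, high=2.5):
    """Return (low, high) replacement cost for a group of departed employees.

    Assumes annual salary is 12 x monthly income and that backfill costs
    between `low` and `high` times annual salary.
    """
    total_annual_salary = sum(12 * income for income in monthly_incomes)
    return low * total_annual_salary, high * total_annual_salary


# Made-up monthly incomes for three departed employees.
low_cost, high_cost = replacement_cost_range([3000, 5000, 10000])
# Total annual salary is 216,000, so the range is 108,000 to 540,000.
```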
This Tableau Dashboard breaks down the cost of attrition for each job level (1-5).
In this breakdown, we can see something that wasn’t clear in our initial findings…
Findings, Part 2:
Level 3 employees are not our largest population, nor are they the most expensive to replace, but the intersection of cost and population makes them indisputably the most expensive group overall.
Though they only make up 13.5% of our company’s attritions, they account for 34.3% of the total estimated losses.
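The gap between share of attritions and share of cost is easy to reproduce on toy data. The sketch below uses made-up leavers and assumes a flat cost multiplier across levels, in which case each level's share of cost equals its share of the leavers' total salary. The column names `JobLevel` and `MonthlyIncome` are assumed to match the Kaggle file.

```python
import pandas as pd

# Made-up departed employees: many low-paid level 1 leavers, a couple of
# level 3 leavers, and one each at levels 2 and 5.
leavers = pd.DataFrame({
    "JobLevel": [1, 1, 1, 1, 1, 1, 3, 3, 5, 2],
    "MonthlyIncome": [2500] * 6 + [10000, 10000, 19000, 5000],
})

COST_MULTIPLIER = 1.0  # assumption: same multiplier of annual salary at every level
leavers["est_cost"] = leavers["MonthlyIncome"] * 12 * COST_MULTIPLIER

by_level = leavers.groupby("JobLevel").agg(
    leavers=("MonthlyIncome", "size"),
    cost=("est_cost", "sum"),
)
by_level["share_of_leavers"] = by_level["leavers"] / by_level["leavers"].sum()
by_level["share_of_cost"] = by_level["cost"] / by_level["cost"].sum()
```

In this toy sample, level 3 is 20% of the leavers but the largest slice of the cost, while level 1 is 60% of the leavers and a smaller slice.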
Grouping the data by level, level 3 employees ranked:
4th of 5 in job involvement
5th of 5 in workplace satisfaction and job satisfaction
Conclusion:
As a general recommendation for this company, I would push for a more robust training program. This would kill two birds with one stone.
Firstly, robust management training could have a huge impact on entry-level retention. It would also significantly reduce instances of employees quitting during their first year with their manager.
Secondly, it would boost engagement and morale among level 3 employees. It’s no secret that middle management can be extremely tough. One of the hardest transitions an employee can make is from an individual contributor to a people manager. Equipping these employees with the tools they need to thrive can spare them from some of the challenges of this transition.
This would allow us to not work so hard on the workplace and focus more on the work.