Andrew Harrison Way

Project 2 - Regression Progression, or How I Learned to Stop Worrying and Love Machine Learning



A Confession:

Until my admission to Metis in late 2019, I hadn’t thought about statistics since Dr. Letarte’s class in 11th grade.  In that era, I was only peripherally interested in mathematics and significantly more interested in becoming a Broadway star.

Statistics was unlike any class that preceded it.  Algebra, Geometry, and Pre-calculus were all rooted in some sort of mathematical reality.  There were formulas and variables, the simple building blocks of math. Each question had a specific answer.  Stat was different. I was wary of things like distributions. If you couldn’t sort your hypothetical groups of people into neat, fractional buckets, what was the point?

Stat passed with barely-there marks, and I went through the rest of my education without having to reconsider Greek symbols ever again.  Ten years later I would face my own reckoning.


Arriving at a question:

We were prompted to find a question that could possibly be solved with linear regression.  It would be our first time utilizing machine learning techniques, our first model (or at least mine), plenty of firsts.

I’ve been obsessed with pop music for the last four or five years.  Coming from a background in musical theatre, I used to think it was cool to dismiss pop music for being made for the masses.  In reality, pop music is an incredible force for bringing people together. And 2019 was a great year for it. Billie Eilish pioneered intimate bedroom pop, Lil Nas X set a new record for longest run at no. 1 on the Hot 100 (19 weeks!), and Mariah Carey finally got the recognition she deserved for making a contemporary Christmas classic.

Using newly minted skills (aka BeautifulSoup), I scraped the Billboard Hot 100 for every week in 2019 to try and answer the following:

What is it about this music that we all love?  And if we can find it, can we use it to forecast how far a song will go?


Diving In:

This may seem shocking, but there’s really nothing about a song’s own features that predicts chart success, at least not in the models I was working with.  Having begun with 12 features, I used LASSO to weed them out: as its penalty grows, LASSO shrinks the coefficients of weak features to exactly zero, which drops them from the model.  After all that LASSO-ing, I was left with only 2 features in a model that performed nearly identically to my full-featured first draft.

[Figure: OLS, 12 features]

[Figure: OLS + LASSO, 2 features]
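
As a rough illustration of that LASSO step (not the original project code), here is a minimal sketch using scikit-learn.  Everything in it is an assumption made for the example: the data is synthetic rather than scraped from Billboard, only two of the 12 made-up features actually drive the target, and the penalty strength alpha=0.5 was picked by hand.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the scraped data: 200 "songs", 12 features.
n_songs, n_features = 200, 12
X = rng.normal(size=(n_songs, n_features))

# Assumption for illustration: only columns 0 and 1 influence the target.
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=1.0, size=n_songs)

# LASSO penalizes coefficient size, so features should share a common scale.
X_scaled = StandardScaler().fit_transform(X)

# Fit LASSO and keep only the features whose coefficients are not zero.
lasso = Lasso(alpha=0.5).fit(X_scaled, y)
kept = np.flatnonzero(lasso.coef_)

# Compare plain OLS on all 12 features with OLS on the surviving features.
full_r2 = LinearRegression().fit(X_scaled, y).score(X_scaled, y)
small_r2 = LinearRegression().fit(X_scaled[:, kept], y).score(X_scaled[:, kept], y)

print("features kept by LASSO:", kept.tolist())
print("R^2 with all features: ", round(full_r2, 3))
print("R^2 with kept features:", round(small_r2, 3))
```

On real data, alpha would normally be chosen by cross-validation (scikit-learn's LassoCV does this) and the two models would be compared on held-out songs rather than on the training set.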

Unsurprisingly, the two strongest indicators of how long a song will spend on the chart are:

  1. Peak position

  2. A metric from Spotify called “Popularity”.  Popularity is calculated from play count, with more recent plays weighted more heavily.  For example, a song played 1000 times today would have a higher Popularity score than a song played 1000 times yesterday.
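
To make the idea of recency weighting concrete, here is a toy sketch.  Spotify does not publish how Popularity is actually computed, so the exponential decay, the 7-day half-life, and the function name are all assumptions chosen purely for illustration.

```python
def recency_weighted_score(plays_by_days_ago, half_life_days=7.0):
    """Sum play counts, halving the weight of a play every `half_life_days`.

    `plays_by_days_ago` maps "days ago" (0 = today) to a play count.
    The decay rule is a hypothetical stand-in, not Spotify's formula.
    """
    return sum(
        count * 0.5 ** (days_ago / half_life_days)
        for days_ago, count in plays_by_days_ago.items()
    )


# The example from the text: 1000 plays today vs. 1000 plays yesterday.
played_today = recency_weighted_score({0: 1000})
played_yesterday = recency_weighted_score({1: 1000})

print(played_today > played_yesterday)
```

Under any rule of this kind, the same number of plays counts for less the older it gets, which is all the example in the list above relies on.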

This finding is even less impressive when you consider that the Hot 100 began incorporating streaming counts into its rankings in 2012, so the Popularity metric and the Hot 100 are basically counting the same thing.


Drawing a conclusion:

If anyone were able to answer the question “what makes a song popular,” it would probably not be a fledgling data scientist/former theatre kid with a dubious stats background.  And the answer would probably have been monetized by now.

Instead, I can only extrapolate what this lack of a pattern means.  I found some pertinent theory from far outside the realm of data science, and also outside the realm of music.

Raymond Loewy was an acclaimed mid-century French-American industrial designer.  He is responsible for logos for many staple companies including Exxon, Nabisco, and the US Postal Service, as well as for a design philosophy known as MAYA: Most Advanced, Yet Acceptable.

Human brains crave two things that are seemingly at odds: the thrill of the unknown and new, and the comfort of the familiar.  What Loewy aspired to do in his design was to push the boundaries of products into new territories, but never to the point where they felt foreign or unfamiliar.

I think this theory can be neatly applied to pop music.  The music that moves us tends to have that dualistic quality of being both warmly familiar and excitingly groundbreaking.  Given the opportunity, I would probably reevaluate these tracks to try and find what they don’t have in common rather than what they do.

Perhaps that will bring us one step closer to understanding what puts the pop in pop music.

Andrew Way, Jan 2020