The Data Science Pyramid


The data science pyramid represents the importance of data and methods for analyzing it to a business. There are several ways to read this chart.

If I’m a business and I understand the importance of data, I view this chart from the bottom to the top in terms of low value to high value. Rungs 1 through 4 are very important, which is why they are at the bottom: without them no analysis is even possible. But as a business I still see them as an investment and a cost of doing business. Rungs 5 and 6 are ultimately what I as a business really care about. Insight is slightly lower on the pyramid because, while important, what the business truly cares about is how that insight informs and shapes strategy.

As a data scientist, my main concerns tend to be about the lower rungs, 1 through 4. Given that many data scientists are technologists, they care deeply about platforms and tools (which language do we use, R or Python? which database, SQL or NoSQL?), about methods and algorithms (do we use linear regression or random forests?), and about data products (do we build a dashboard? which statistical measure do we use?).

It is important for us to understand that for a business those questions are largely irrelevant. As such, there’s a fundamental tension between the first four rungs and the top two with respect to how data science relates to business. Our job as data scientists is primarily to solve business problems that have to do with insight and strategy: to inform and help decision making rather than bog decision makers down with detailed calculations.

This tension became very evident to me when I recently interviewed a candidate for an analyst role. Contrary to the typical interview, he decided to give a presentation to the VPs to showcase a paper he had written on a unique method of doing analysis. It didn’t take long for me to see that he cared deeply about the methods he was using and the brilliant calculations he’d come up with to solve this particular problem. It was also very evident to me that he was really smart and knew his craft very well.

However, he didn’t think above rung 4 on the pyramid. He could not translate his findings into insight and ultimately into strategy for the business. He ended up alienating the VPs, who had very logical, business-related questions, while he kept thinking in terms of calculations and analysis. To him rungs 5 and 6 didn’t exist, but to the VPs they were all that mattered.

If we’re going to solve the projected 50%-60% talent gap [1] in data scientists, we’re not going to do it by focusing on how to do deeper analysis and number crunching, but instead by being aware of the ultimate value of data science: insight and strategy.

