How I taught myself data science in 90 days

I first encountered data science (aka data mining, machine learning, artificial intelligence, etc.) back in 2014. I was a newly minted analyst trying to expand my expertise beyond the basics, and my company had just hired its first data scientist.

I got curious about what she did, so after talking with her for a while, I was sobered to realize that I would need two more years of education and a graduate degree in statistics if I wanted to properly call myself a “data scientist.”

The only problem was that after my experience getting an MBA, I had sworn off any kind of academic-style learning. While I made a number of connections and friends, the degree itself had not taught me anything useful. In fact, I have learned more about business by reading books on my own time than I ever did in college.

If I was going to learn data science quickly, I would have to learn it on my own. I also realized that, with the exception of a few select fields, companies don’t care much about your degrees; they care about what you can do for them and what kind of value you can provide. In many cases, you can provide incredible value without needing another college degree.

What I really wanted was to understand enough key concepts of data science that I could apply them in the real world to produce something useful.

So I thought “hey, I’m a smart guy, I can probably figure this out.”

Yes, I was quite conceited back then.

The challenge for me was that I absolutely hated statistics, and when I took the class in college I found it incredibly hard to understand. Nonetheless, I began my quest.

I started by looking for a few books that taught the key concepts of data mining with a bent towards applicability. They were quite hard to find, and the few books I did find were dull. But I did my best to get through them, and I managed to learn a few theoretical concepts.

Next, I searched for videos on YouTube. There weren’t many, but what I did find was very interesting. There were some videos that demonstrated the use of a free visual data mining tool called RapidMiner.

The author had many examples, with the code and data easily downloadable, so I could try them on my own. It was exactly what I needed because it allowed me to see the concepts I was learning applied in the real world.

JIC vs JIT learning

I believe that the best way to learn is to solve a problem that you care deeply about or are strongly motivated to solve. It could be a personal problem or a professional one, and it should allow you to apply theoretical concepts to a concrete problem.

But why?

Almost all colleges apply the same framework towards learning. I call it Just In Case (JIC) learning.

You start by learning all the fundamental concepts first. You then apply these concepts to artificial textbook problems (which you don’t really care about) that are clearly laid out and usually have only one correct solution.

You then continue to learn more concepts that build on the fundamental ones you learned previously, and you continue to apply them to even more artificial problems you don’t care about, in the hope that some day you’ll need this knowledge to solve real problems.

This theory of learning assumes that knowledge builds on top of itself like a pyramid. In fact, many textbooks are set up this way: fundamental concepts first, more specific knowledge later.

In real life, however, you start with a very specific problem you’re trying to solve, you search for the solution and once you find it, you can generalize that solution to other similar problems. I call that Just In Time (JIT) learning. You only learn things just before you actually need them, which maximizes both usefulness and retention.

That’s why I chose a specific project at work to apply data science to, one that would both benefit the company and teach me how to do data science in practice. As I struggled with the project, I learned another dark secret nobody tells you about in college.

The problems you solve during your classes are artificially set up to be easily gradable, not to maximize learning.

For example, in pretty much all data science books and courses, the data you work with has already been selected, cleaned, and staged to make it easy for you to build the model. In real life it’s never that easy.
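To make this concrete, here’s a minimal sketch of the kind of cleanup real-world data demands before any modeling can begin. The file and column names are made up for illustration; it uses pandas:

```python
import pandas as pd

# Hypothetical raw export with made-up column names; in a course,
# this file would already be clean.
df = pd.read_csv("customer_orders.csv")

df = df.drop_duplicates()                # repeated rows from multiple exports
df["order_date"] = pd.to_datetime(       # mixed or invalid date formats
    df["order_date"], errors="coerce")   # become NaT instead of crashing
df["amount"] = pd.to_numeric(            # non-numeric junk ("N/A", "$1,200")
    df["amount"], errors="coerce")       # becomes NaN
df = df.dropna(subset=["order_date", "amount"])  # drop rows too broken to use
df = df[df["amount"] >= 0]               # negative amounts: refunds or errors?
                                         # someone has to decide

# Only now can you even start thinking about features and models.
```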

The hardest problem I struggled with during my project was figuring out what data to choose for my model in a way that made sense, and I had no one to ask. No articles or online courses to follow. I happened to find the answer in an obscure book written by a practitioner rather than in the theoretical books written by professors.

That’s how I managed to teach myself data science in just three months without having to go back to school and get a statistics degree. As a side benefit, I now understand and enjoy statistics. It makes sense to me because I have seen it applied in the real world.

Years later, in my new job, I repeated the same process to learn another aspect of data science and machine learning, this time by doing a hackathon project at work. It cemented the lessons I had learned and taught me even more valuable skills.

It is because of these and other similar experiences that I believe unstructured learning is the key to an amazing career. Many valuable things can only be learned the hard way, through experience, not by going to school.

Obliquity – Why some problems cannot be solved directly

On March 30, 2017, a large portion of the Interstate 85 (I-85) highway in Atlanta, GA collapsed after a massive fire raged underneath it.

As it was a key piece of infrastructure carrying thousands of cars every day, experts predicted severe traffic congestion and delays. Yet none of this materialized. People simply changed their behavior; in fact, Atlanta’s public transportation system (MARTA) reported a 25% spike in ridership following the incident.

On the other hand, adding a new road to an existing network can actually make congestion worse. This is known as Braess’s Paradox. Traffic congestion is one of those complex problems that simply cannot be solved with the direct solution of building more roads.
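The paradox is easy to verify with the standard textbook network (the classic example used to explain Braess’s Paradox, not data from the Atlanta incident): 4,000 drivers, two routes, and a “free” shortcut that makes everyone slower:

```python
# Classic Braess network: 4,000 drivers travel from Start to End.
# Route 1: Start -> A (n/100 min, n = drivers on the link), then A -> End (45 min).
# Route 2: Start -> B (45 min), then B -> End (n/100 min).
drivers = 4000

# Without the shortcut, drivers split evenly at equilibrium:
per_route = drivers / 2
time_before = per_route / 100 + 45               # 20 + 45 = 65 minutes

# Add a "free" shortcut A -> B (0 min). The selfish best choice for every
# driver is now Start -> A -> B -> End, so both variable links carry all 4,000:
time_after = drivers / 100 + 0 + drivers / 100   # 40 + 0 + 40 = 80 minutes

print(time_before, time_after)  # 65.0 80.0 -- the new road slowed everyone down
```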

Have there been times when you tried to tackle a problem head on and failed? Some problems are best tackled indirectly. Why?

In order to understand why this happens, we first have to understand a few things about complex systems. As explained in my previous article, there are three types of systems, categorized by the level of constraint on both the system and the agents operating in it.

While ordered systems are transparent (simple systems are transparent to everyone, and complicated systems are transparent to experts), complex systems seem transparent but are in fact opaque. We simply cannot know everything that happens in these systems. We think we know, but we usually have a very limited understanding of the complexity inherent in these domains.

John Kay calls this phenomenon Obliquity and explains it in detail in his book by the same title. He writes:

The environment—social, commercial, natural—in which we operate changes over time and as we interact with it. Our knowledge of that complex environment is necessarily piecemeal and imperfect.

The human mind is programmed to look for patterns and to seek causes, and this approach is often valuable. But that programming leads us to see patterns in random events and to attribute intentions where none existed. We believe we observe directness in obliquity.

Because of this, direct solutions almost never work as intended and usually have unforeseen consequences or adverse effects, like the increase in congestion when more roads are opened.

A good example of this is the so-called cobra effect, based on an anecdote about a bounty program in British colonial India, where the government tried to fix the problem of venomous cobras by offering a bounty for every dead cobra.

This worked initially, but then people started breeding cobras for income. When the government found out, they scrapped the program, causing the breeders to release their now-worthless cobras and making the problem worse.

It is because of this that I believe the first step in tackling any problem is to get a sense of the type of environment we’re dealing with.

If the environment or domain is simple, the solution should be self-evident. We simply sense what’s happening, then categorize, prioritize, and solve the problem.

If the domain or system is complicated, like a car’s engine or a software system, we hire experts to analyze the issue, get a sense of what the problem is, and solve it.

If we’re dealing with a complex domain or environment, we cannot solve the problem by analysis alone. We have to adopt a more experimental, discovery-based approach. We have to try things and see how they work; we have to probe, sense, and then respond accordingly.

You assess the situation quickly, form a hypothesis, design and carry out a small-scale, safe-to-fail experiment, and analyze the results. Then you reassess your hypothesis and figure out whether you’ve solved the problem. Other times you can leverage what’s already there, what you sense and see that’s already working.

Why plans are useless but planning indispensable

“Plans are worthless, but planning is everything”
-Dwight D. Eisenhower

“Failure to plan is planning to fail”
-Benjamin Franklin

We all have the fantasy of the perfect plan that goes off without a hitch. Heist movies like Ocean’s Eleven (and Twelve and Thirteen), The Italian Job, The Bank Job, etc. all fuel the fantasy that you can be a mastermind capable of seeing all the angles, predicting everyone’s behavior several moves ahead, getting the timing right down to the second, and achieving your goal exactly as you planned. In the real world, however, this is rarely the case. Why?

We live in a complex, interconnected world. Every action we take can cause ripples of unpredictability in the system. Complex systems are by their very nature unpredictable because there are no universal laws that govern them. Even if every agent in the system were to have simple rules by which they make decisions, the overall system behavior that emerges is unpredictable.

Accounting for all the possible scenarios quickly exceeds the capacity of even the most powerful of today’s computers. Just look at the weather. Despite all the advances in computational power and simulation capabilities, we still can only forecast the weather with any level of accuracy a few days in advance. The complex behavior of water molecules, air temperature, atmospheric pressure, initial conditions, and other factors makes it nearly impossible to analyze and predict what will happen.
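The classic toy demonstration of this sensitivity to initial conditions (my example, not a weather model) is the logistic map: two starting values that differ by one part in a billion diverge into completely different trajectories within a few dozen steps:

```python
# Logistic map: x' = r * x * (1 - x). At r = 4 it is fully chaotic.
r = 4.0
x1, x2 = 0.400000000, 0.400000001   # initial conditions differ by 1e-9

for step in range(1, 51):
    x1 = r * x1 * (1 - x1)
    x2 = r * x2 * (1 - x2)
    if step % 10 == 0:
        print(f"step {step:2d}: x1={x1:.6f}  x2={x2:.6f}  gap={abs(x1 - x2):.6f}")

# By around step 40 the two trajectories are completely uncorrelated,
# which is exactly why weather forecasts degrade after a few days.
```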

There are, however, systems which are highly predictable even if they seem very complex. A computer program’s behavior, for example, is very predictable (most of the time, anyway). A car’s various systems – the engine, transmission, brakes, electrical systems, etc. – are also very predictable, even though they are interconnected and interdependent.

So what’s the difference?

David Snowden’s Cynefin framework (pronounced kun-ev-in) recognizes three types of systems: Ordered, Complex and Chaotic. The difference between them is the level of constraint in each system.

Ordered systems are highly constrained, and as such their behavior is very deterministic and predictable. You can easily determine cause and effect, and the patterns you find are very likely to repeat in the future. Ordered systems are further divided into Simple and Complicated. A highly structured business process, for example (like getting a loan), is a Simple system. It’s highly constrained and relatively easy to fix or optimize. Cause-and-effect relationships are clearly visible, and you can predict with very high accuracy what will happen.

A car is an example of a Complicated system. It’s still Ordered because it’s highly constrained (there’s little to no variation beyond what’s been specified by the system designer) but the level of detail in the design makes it much harder to understand and notice cause and effect relationships. This is why you need highly trained professionals (experts) to analyze the system and figure out cause and effect relationships.

Complex systems, on the other hand, are only partially constrained. Complexity science is still a young and active field, but we do know a few things that can help us understand how these systems work. Complex systems are made up of agents that interact with each other and with the system based on their own rules and strategies, within the constraints imposed by the system.

In the example above we saw that cars were Ordered systems because of the high level of constraint in every aspect of their design; traffic, on the other hand, is only partially constrained, and as such it’s a Complex system. There are rules in the form of laws and guidelines, such as speed limits, traffic signs, traffic lights, highways, ramps, paved roads, direction of driving, etc., but these rules do not fully constrain driving. You can choose to drive fast or slow, change lanes frequently or not at all, slow down or speed up, turn left or right, etc.

This creates unpredictable emergent patterns such as accidents, traffic jams, and congestion or sparsity. On top of that, the traffic patterns from moment to moment and from day to day are completely novel and unique. There’s no way to know for sure when an accident will occur or when traffic will become congested. Even though you may know exactly why an accident happened, it doesn’t help you fully predict future accidents.
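You can watch patterns like these emerge in a toy simulation. The sketch below is a bare-bones variant of the Nagel–Schreckenberg traffic model (my example, with arbitrary parameters): each car follows three simple rules – speed up, don’t hit the car ahead, occasionally slow down at random – yet phantom traffic jams appear with no identifiable cause:

```python
import random

ROAD, N_CARS, V_MAX, P_SLOW, STEPS = 100, 30, 5, 0.3, 50

# Cars at random spots on a circular road, all starting at speed 0.
positions = sorted(random.sample(range(ROAD), N_CARS))
speeds = [0] * N_CARS

for _ in range(STEPS):
    new_positions = []
    for i, (pos, v) in enumerate(zip(positions, speeds)):
        gap = (positions[(i + 1) % N_CARS] - pos - 1) % ROAD  # empty cells ahead
        v = min(v + 1, V_MAX)          # rule 1: accelerate toward the speed limit
        v = min(v, gap)                # rule 2: never hit the car in front
        if v > 0 and random.random() < P_SLOW:
            v -= 1                     # rule 3: random, "human" slowdowns
        speeds[i] = v
        new_positions.append((pos + v) % ROAD)
    positions = new_positions

# Clusters of near-zero speeds are phantom jams: no accident, no bottleneck,
# just simple rules interacting.
print(speeds)
```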

Chaotic systems are highly unconstrained. Imagine for a second that one day none of the rules of driving applied. You could drive in the middle of the road if you wanted, drive backwards, go through red lights and stop signs, drive on the opposite side of the road, cut through lanes at will, make sudden u-turns, brake and accelerate as you wished, etc. What would happen? Complete and utter chaos. It would be impossible to predict anything.

Side Note: Temporarily removing constraints in a system is an excellent way to unclog bureaucratic gridlock in an organization and spur innovation. Dave Snowden calls this a “shallow dive into chaos,” but that’s a topic for another day.

So how does this relate to planning?

Most planning is done under the assumption of Ordered systems. We assume that the future is predictable from past events, so making plans is easy. Planning comes naturally to us because our brains function like cybernetic (goal-seeking) systems: we set a goal, and immediately our brain provides ways to achieve it.

Now, if the system you’re dealing with is highly constrained, these plans are very likely to succeed. For example, if you wanted to buy a house, you’d need a bank loan, and since getting a loan is an Ordered system, given certain criteria you can predict with very high accuracy whether you will succeed or fail.
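As a toy illustration (with completely made-up criteria), a loan pre-check in an Ordered system boils down to a handful of fixed rules, which is exactly what makes the outcome plannable:

```python
# Made-up lending criteria, purely for illustration: in an Ordered system
# the rules are fixed, so the outcome is knowable before you ever apply.
def loan_approved(credit_score: int, debt_to_income: float,
                  down_payment: float) -> bool:
    return (credit_score >= 680
            and debt_to_income <= 0.36
            and down_payment >= 0.10)

# Same inputs, same answer, every time -- you can plan around this.
print(loan_approved(720, 0.30, 0.20))  # True
print(loan_approved(620, 0.45, 0.05))  # False
```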

If we’re dealing with a complex system, however, or a chaotic one, we would be unable to account for all the possible future scenarios and contingencies, and our plans would be at best incomplete. Before the advent of GPS and turn-by-turn navigation systems with up-to-the-minute traffic data, it was impossible to plan a route down to the minute and be confident you would arrive at a particular time.

So the reason plans are useless is that, more often than not, they are incomplete and don’t account for all the possible contingencies in the complexity of today’s systems.

Why then is planning indispensable?

The process of planning gets us to think through many of the possible futures and scenarios that could unfold, and helps us be better prepared, by creating contingencies, if any of those future scenarios were to happen. Of course we can’t cover every single scenario, so we need to be agile and capable of course correction. The measure of true agility is the ability to ditch your plans halfway through when the situation has changed and made them obsolete, even if the sunk cost is high.

Always have multiple theories for explaining and understanding things

When trying to understand or explain something that’s happening, like a certain behavior pattern in your friends or significant other, or a trend in fashion, technology, etc., it helps to have more than one hypothesis (theory); even better if you have more than two. Assign each one a probability of being right.

Then, as you gather evidence for any of your multiple theories, you adjust the probabilities of what the correct explanation could be. You might also run multiple experiments to cover all your theories. This will lead you to a more accurate understanding of people and the world around you, which in turn leads to more accurate forecasts, better decisions, more confidence, and decreased levels of stress.
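This process of assigning probabilities and updating them as evidence arrives is essentially Bayesian updating. Here’s a minimal sketch with invented numbers:

```python
# Three competing theories with prior probabilities (invented numbers).
priors = {"theory_A": 0.5, "theory_B": 0.3, "theory_C": 0.2}

# How likely the new piece of evidence is under each theory (also invented).
likelihoods = {"theory_A": 0.1, "theory_B": 0.6, "theory_C": 0.3}

# Bayes' rule: posterior ~ prior * likelihood, then normalize.
unnormalized = {t: priors[t] * likelihoods[t] for t in priors}
total = sum(unnormalized.values())
posteriors = {t: p / total for t, p in unnormalized.items()}

print(posteriors)
# {'theory_A': ~0.17, 'theory_B': ~0.62, 'theory_C': ~0.21}
# One observation moved theory_B from underdog to front-runner,
# without forcing you to discard the alternatives entirely.
```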

I believe that there’s always more than one way to explain things; there’s always more than one theory that fits a situation, and I’m not attached to any one of them at first. This doesn’t mean that I like being wrong. In fact, it means that I want to be even more accurate, so I want to cover all my bases. As I gather more data, I eventually converge on a single theory, while keeping an open mind that it could still change in the future.

As humans, we’re addicted to being right; it’s a compulsion that threatens to derail our friendships and relationships. We want our intuition to be the correct one. It’s very easy to get emotionally attached to explanations that benefit us, make us feel smarter, more confident, and more proud, or that ensure we keep our jobs.

When you have multiple competing theories for why something is happening, you keep yourself open to possibility, and as a result you understand the world better. You might not look as smart or as self-assured as the person with a single theory, but more often than not you will end up making more accurate predictions and, in the end, being more justifiably confident than they are.

The Dangers of Optimization

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”
-Donald Knuth

In the field of Computer Science (CS), optimization is the process of changing the algorithm (the logic) of a program in order to improve its efficiency (i.e., make it run faster or consume fewer resources). To do this, certain assumptions need to be made, and my argument is that these assumptions are what make optimization dangerous, and not just in the context of programming.

First of all, in order to optimize a process you need to understand it very well. You need to know how it behaves in different circumstances and under various boundary conditions. If we were to look at a business process that grew organically (out of necessity) and that we want to optimize, we would usually need to see where the inefficiencies or bottlenecks are in the process and remove them.

Second, when you optimize, you may need to restrict boundaries, make certain assumptions, and use special cases, tricks, and complex trade-offs that achieve the required result, but at the expense of potentially over-complicating or over-specializing the process. This causes a loss of agility in the long run, or over-adaptation (as may be the case with over-optimized diets).

Let’s use a simple example from manufacturing to explain this. Let’s assume that you have a retoolable machine that produces widgets at a 5% defect rate. This means that 5 out of every 100 widgets are defective. When you’re looking to optimize the defect rate, you’re looking to produce fewer defective widgets without slowing down the machine or the manufacturing process. Suppose also that you know the process of making this widget very well, since you make millions of them every year.

Since you know the process well, you know that there are just a few ways to optimize the defect rate. You can, for example, utilize the machine better by redesigning the overall process, or you can install much more specialized machines that make just this one widget. Now you have an optimized defect rate, but it has come at the expense of over-specialization and a huge loss of flexibility or agility. Imagine what can happen after you’ve replaced your retoolable machines with specialized ones and the market changes to where it no longer needs those widgets. You’re practically out of business.

The danger is that not all processes are fully known. Even in a precise field such as computer science, there’s always some level of unknown: certain unexpected conditions or special circumstances where the program will fail. If you were to optimize the process without knowing all these cases, you run the risk of ending up with an incorrect program that no longer solves the original problem.
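Here’s a contrived but concrete illustration (my example) of how an optimization’s hidden assumption can silently turn a correct program into a wrong one:

```python
import bisect

def contains_linear(items, target):
    # Original: slow but correct for any list.
    return target in items

def contains_fast(items, target):
    # "Optimized" with binary search -- much faster on big lists, but it
    # silently assumes `items` is sorted. Nobody wrote that assumption down.
    i = bisect.bisect_left(items, target)
    return i < len(items) and items[i] == target

data = [3, 1, 2]                       # real-world data: not sorted
print(contains_linear(data, 1))        # True
print(contains_fast(data, 1))          # False -- the optimized version is now
                                       # a different (wrong) program
```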

Nowhere is this more prevalent than in digital marketing. First, let’s get one obvious thing out of the way: SEO, or search engine optimization, is not really optimization per se. You’re not optimizing your website to work better or faster; you’re adapting it to fit the search engine’s constraints.

The whole idea behind the concept of targeting is really about optimization. When you’re trying to target a certain segment of your audience, you’re trying to optimize your marketing dollars to increase efficiency. After all, why would you want to spend money on leads that will not respond to your offer?

There is nothing wrong with targeting (although Roy H. Williams, aka The Wizard of Ads, would beg to differ). The problems arise when you start to over-optimize while assuming that you really understand your audience and what offers they actually respond to. The truth is you really don’t, so all this over-optimization can end up hurting sales. There are many other things in a business you can optimize and improve, such as your close ratio.

A good example of this is a recent story I read in Roy H. Williams’s Monday Morning Memo (a highly recommended business newsletter, by the way). The story is about two lawyers who take different approaches to marketing. One believes in targeting and converts about 10% of inquiries, while the other believes in casting a wider net via radio advertising and ends up converting 60% of inquiries, proving once again that marketing is primarily about the messaging and the offer.

Implicit vs. Explicit Mental Models

There are two kinds of mental models, implicit and explicit. They are categorized based on the acquisition method (i.e., how they ended up in our minds).

Explicit Mental Models

Explicit models are the ones you learn from studying various disciplines such as math, physics, economics, etc. In my last post I talked about Charlie Munger and the mental models he uses to evaluate deals and make investment decisions.

He draws them out of various disciplines and then uses them in contexts where they weren’t necessarily built to be used. For example, my background is in computer science, which teaches the principles of computing.

Taking that model and applying it to any electronic device has allowed me to fix a lot of non-computer gadgets. It’s a great model to use for that purpose, but it fails terribly when applied to human interactions. You’ll need another model for that.

Another good example is the supply and demand model from economics. It’s a wonderful model for understanding many facets of human behavior. It can be applied on a micro level – like one-to-one daily transactions between humans – and on a macro level – like the economy of a country.

Note: Both the above examples illustrate the limits and failure of models in general, something that was discussed previously.

These are both examples of explicit models, where you learn the model from an outside source and then you apply it to a situation where it works.

Implicit Mental Models

Implicit models are the ones that your mind creates out of the various patterns it notices through the five senses. The mind is a pattern-matching machine. It seeks out patterns in the randomness and tries to make sense of them by creating models. These are also known as generalizations or beliefs.

Implicit mental models are harder to detect because they work essentially behind the scenes, filtering and distorting reality to fit what we believe. Yes, in case you didn’t know it, when presented with contrary evidence, humans rarely change their minds. Instead, they interpret the facts through their internal mental models, but that is a discussion for another day.

How do you pick up these implicit mental models? There are several ways. The first is through our culture. Culture indoctrinates us without us even being aware of it. You don’t know it’s there, you don’t know why it’s there; you just assume that’s how things are supposed to be. In fact, many people are unaware of the indoctrinating effect their culture has until they leave their country and live abroad for a while.

The second is through media. This is impossible to escape; every show you watch, every magazine or newspaper article, every movie, every song has built-in assumptions and ends up reinforcing the same mental models about reality over and over.

For example, it’s impossible to watch a romantic comedy nowadays without implicitly believing that you’re supposed to have some spark or chemistry with someone right off the bat in order to fall in love, which is then a prerequisite for a successful relationship and marriage. It’s only when you study the history of society that you understand that marriages in the past were often arranged for economic or political reasons.

The third is through your peer group. Even if you don’t try, if you hang out with a group of people long enough, you’ll eventually start to change and adapt your mental models to fit those of the leader of the group. This happens completely outside of your awareness, but the processes that occur in your mind (such as reframing and the change of meaning) are very powerful and can be utilized on purpose to upgrade your mind.

How do these models compare?

Of the two, implicit models are the ones that seem to be more deeply entrenched and more likely to operate outside of awareness. I believe this is due to the nature of the acquisition method. If a model was installed outside of our awareness, it will tend to operate outside of our awareness and control (or regulate) our life as if on autopilot.

There are benefits to this, of course. Since the brain can rely on a predetermined pattern, it doesn’t need to expend energy again to solve the same problem in the future. It writes neurological software and then sets it on autopilot, unless you explicitly go in, look at the code (by becoming aware of the underlying model), and refactor it.

Experiments performed on mice in a maze show that brain activity is very high the first time a mouse runs through the maze to find the hidden piece of cheese. In subsequent trials, brain activity levels off as the mice learn the path to the cheese. (See The Power of Habit by Charles Duhigg.)

On the other hand, being deeply entrenched, implicit models are very difficult to modify when you’re trying to rid yourself of some unwanted pattern of thought or behavior. Explicit models can also become deeply entrenched – this depends a great deal on the emotional charge during the “installation” process – but in general they tend to be easily updated, upgraded, or removed.

If you’ve learned Newtonian physics and then you delve into general relativity, it’s easy to update your mental model, which now becomes richer. The only trouble seems to be having the model you’ve learned from a book available to you in the moment when you need it to make a decision or solve a problem.

The power of context

One of the properties of mental models is the concept of a context: the situation when or where a model is appropriate. A context can be something like “work” or “home” or “with friends.” You could have the most amazing set of explicit models “installed” in your mind, but if they don’t permeate through to the right context, you’ll find yourself using suboptimal response and behavior patterns.

For example, you could have a set of useful mental models that you use at work, with your colleagues, bosses, underlings, etc. You could be the best manager in the company; your employees could love you, your colleagues could be asking you for advice, but when you go home you find yourself yelling at your spouse or your children. In fact you could be a completely different person.

It’s all in the interplay of implicit and explicit models. You’re not a different person; you just have a different set of models that you use implicitly for family life, and the work-life models don’t seem to permeate there. You’d have to first become aware of them and then put in some effort to get them “copied” over.

The Dangers of Mental Models – Intro to Mental Models Continued

In the previous post, I talked about what mental models are and how important they are to your thinking. As we delve deeper into refactored thinking, mental models are going to become crucial to understanding and implementing the process of refactoring your thoughts.

Mental Model Pitfalls:

First I want to talk about a few pitfalls that are common with mental models of any kind.

Humans have a tendency to simplify things in order to understand them better, but sometimes this simplification goes over the top and we end up with a dumbed-down model. There are two fallacies that are direct descendants of this tendency.

The first one I call the Single Model Fallacy, and it’s something that plagued me for a long time. The single model fallacy is simply the tendency to want to explain everything with the same model. This is not really anything new; science has long been pushing the idea that there is a single unifying theory that explains everything.

We see the same thing in areas like psychology, where different schools of therapy, from Freud to Skinner, tried to explain human behavior, and every single one claimed that its model was the right one. I subscribed to this view for way too long, trying desperately to come up with a single unifying theory for why we act the way we do.

It wasn’t until I read this quote from Charlie Munger (Warren Buffett’s partner and a billionaire in his own right) that I started to see my own faulty thinking. Mr. Munger claims that all you really need to make a decision is a “latticework of mental models” from various disciplines:

“You’ve got to have models in your head. And you’ve got to array your experience—both vicarious and direct—on this latticework of models. You may have noticed students who just try to remember and pound back what is remembered. Well, they fail in school and in life. You’ve got to hang experience on a latticework of models in your head.” –Charlie Munger (Worldly Wisdom)

The second one I call Model Reduction and Mapping, and it’s the idea of reducing something new that you don’t know so that it maps onto an existing set of concepts in your mind which you already know and understand.

When you’re learning new concepts and ideas, you tend to try to make sense of them from a frame of reference you already know. For example, if you’re studying physics, you will try to map the concepts you learn onto their math counterparts (speed is the derivative of distance, and acceleration is the derivative of speed). This helps you integrate your learning and refactor your thoughts so you understand things better.
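In symbols (standard calculus, nothing specific to this article), that mapping is just:

```latex
v(t) = \frac{dx}{dt}, \qquad a(t) = \frac{dv}{dt} = \frac{d^2x}{dt^2}
```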

There’s an inherent danger to this reduction: it can prevent you from learning anything truly new. If you’re always trying to map new concepts onto existing ones, you never really learn new things, and your view of the world tends to collapse rather than expand.

Ideologies, cults, and religions have an inherent (and secret, I might add) interest in teaching you how to reduce and map new concepts into their existing set of beliefs. They use techniques such as relabeling and reframing to make it seem like every new idea is something you already know about if you study their material. This collapsing effect is absolutely necessary in order to keep people mentally “chained” to them.

How do you prevent this from happening?

The first step is to allow any new material to sit in its own box in your mind and let it simmer there until you’ve had time to look it over and refactor it into either an existing model or its own category.

As far as the single model fallacy goes, it’s important to understand that the world as we know it is a far more complex system than we make it out to be. It might be decades before theoretical physicists agree on a unifying model of the world, if they ever get there. Human behavior is another very complex process to fully comprehend. Until then, we have plenty of available models to explain it and to influence it. Don’t stick to just one!