The Three Common Mistakes Made with Human Data

Zuzanna Pasierbinska-Wilson

With Human Data comes human fallibility, but awareness is the first step to mitigating our error-prone nature. In this piece, we will look at the most common mistakes we humans make when collecting, analyzing, and drawing conclusions from Human Data.

1. Mistaking correlation for causation.

You might remember your college Statistics 101 professor telling you that just because one thing, call it thing A, shows up frequently whenever thing B is around does not mean that thing B is causing thing A to happen. As tempting as it is to make that link, you have to be careful not to do so automatically. When two events occur together, it is a mistake to assume a cause-and-effect relationship by default. You can certainly form that hypothesis and test it, both backward against historical data and forward against new data, to get at the fundamental nature of the relationship, but you will make errors if you simply infer causation from correlation. How does this apply to Human Data? Suppose you notice that people tweet messages containing certain hashtags at a certain time, and separately that a TV show airs two hours earlier. You might conclude that people are tweeting in response to that show. But the show could be a re-run rather than a new episode, or it may be airing live in a different time zone. Even if your conclusion happens to be true, you need to inspect your assumptions and hypotheses carefully in order to draw the correct conclusions more often.
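To make this concrete, here is a minimal Python sketch using entirely synthetic, made-up numbers: a hidden common driver (roughly how many people are online at each hour) pushes both tweet volume and TV viewership up and down together, so the two signals correlate strongly even though neither causes the other.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
hours = np.arange(24 * 14)  # two weeks of hourly buckets

# Hidden common driver: how many people are awake and online at each
# hour of the day (zero overnight, peaking mid-afternoon).
audience = np.clip(np.sin((hours % 24 - 6) * np.pi / 18), 0, None)

# Tweet volume and TV viewership both follow the audience curve plus
# independent noise; neither one causes the other.
tweets = 1000 * audience + rng.normal(0, 50, hours.size)
viewers = 5000 * audience + rng.normal(0, 200, hours.size)

r, p = pearsonr(tweets, viewers)
print(f"r = {r:.2f}, p = {p:.1e}")  # r is very high; causation is zero
```

Run this and the correlation coefficient comes out near 1.0, even though by construction the only real relationship is the shared audience curve. This is exactly the trap in the TV-show example above.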

2. Falling victim to presentation bias.

We are all human, and it is perfectly within our nature to be subject to countless psychological and physiological biases. We are not machines. The way to rise above a bias is to identify it, understand its effect, and work around it to mitigate its influence on our decisions. Presentation bias is only one of the many biases we face, but it is a real factor in data analysis. It matters most to managers and business decision makers, though anyone on either end of the presentation pipeline can suffer from it. Essentially, presentation bias refers to how one describes, or is persuaded by, a set of options, and the degree to which those descriptions or decisions are shaped by how the options are laid out and presented. This bias is rooted in your experiences as a researcher, your philosophy as an executive, and your story as a developer. How you present information, how you lay it out, even what you choose to measure: all of these are colored by your own personal story, and the presentation of Human Data is subject to these human tendencies. The trap with Human Data in particular is that you must navigate three layers of it at once: the presentation bias of the researcher or developer, that of the person analyzing the data and making the decisions, and that of the humans generating the data you are studying.
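As a small, hypothetical illustration of how presentation shapes a decision, the snippet below frames one invented measurement in two different ways. The arithmetic is identical; only the framing changes, and each framing nudges the reader toward a different conclusion.

```python
# Invented counts, purely for illustration.
completed = 450
started = 500

retention = completed / started
print(f"Framing A: {retention:.0%} of users completed onboarding")
print(f"Framing B: {1 - retention:.0%} of users abandoned onboarding")
# Same data, same arithmetic, but "90% completed" invites celebration
# while "10% abandoned" invites an investigation.
```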

3. Not understanding a phenomenon called reification.

To save you a Wikipedia search: reification is the process of making something real, of bringing something into existence or into a state of being. This concept goes to the overall objective of your Human Data program or study: why are you doing this at all? Why are you measuring behaviors? What are you hoping to change or improve? Fundamentally, you want to change people's behavior. You want to influence a purchase decision. You want to enhance your brand's perception among a certain demographic. You want to increase your average cart size. You want to find out why customers are turning down certain offers, or buying a competitor's services, and sway their purchases back to your business. So you build models that reflect current consumer behavior, giving yourself a baseline against which to measure the success of your efforts, and then you use your Human Data to make changes and put new strategies in place. But fast-forward 6, 12, or 18 months: will your models still be accurate? If your efforts succeeded, they will not be, because the behaviors have changed. Your baseline has moved. You have reified a new behavior, and you can no longer expect the old trend line to hold, because something is now real and true that was not before.
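Here is a minimal sketch, again with invented numbers, of a baseline going stale: a trend fitted to pre-campaign behavior predicts well right up until the campaign succeeds and behavior jumps to a new level, at which point the old model quietly stops working.

```python
import numpy as np

rng = np.random.default_rng(7)
weeks = np.arange(52.0)

# Weeks 0-25: average cart size drifts up slowly (the pre-campaign world).
# Weeks 26-51: the campaign works and behavior jumps to a new level.
cart = 40 + 0.1 * weeks + rng.normal(0, 0.5, weeks.size)
cart[26:] += 8  # the reified change: the old baseline no longer holds

# Fit the baseline model on pre-campaign weeks only.
slope, intercept = np.polyfit(weeks[:26], cart[:26], 1)
predicted = intercept + slope * weeks

pre_err = np.abs(predicted[:26] - cart[:26]).mean()
post_err = np.abs(predicted[26:] - cart[26:]).mean()
print(f"mean error, weeks 0-25:  {pre_err:.2f}")
print(f"mean error, weeks 26-51: {post_err:.2f}")  # far larger: re-baseline
```

The fix is not a better model; it is recognizing that a successful intervention invalidates its own baseline, and re-measuring from the new normal.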

At DataSift, we give you a powerful platform that unifies all your Human Data in a single place and lets you ask any question of that data. What questions you ask, and what conclusions you draw from the answers, are up to you.

Written by Zuzanna Pasierbinska-Wilson

Zuzanna is SVP, Marketing at DataSift. You can follow her @fattypontoonski.