Big Data Meets Small Data

by Jennifer Cobb on 07/10/2012


I recently attended a day full of Big Data conversations at UC Berkeley. We heard from investors and startups, academics and technology professionals.  These people are thinking hard about the promises of Big Data.  I left with one overriding thought.  Big Data must remain deeply connected to the relatively small but very powerful element of individual purpose and intention if it is going to offer a valuable contribution to human knowledge and evolution.

When is data Big Data?  Most industry people answer in terms of the “three Vs” – variety, velocity and volume.  Big Data is composed of structured and unstructured data with lots of semantic variability; it tends to arrive in near real-time; and it tends to be counted in petabytes.  But while the three Vs may help us know Big Data when we see it, they do little to help us understand and use it.  The challenges here are vast and are well articulated in an excellent Big Data white paper.

Usability is a very real concern.  Even assuming we can get good at acquiring, cleaning and aggregating the data, which is a tall order, we still need to do analysis, modeling and interpretation.  And we know from years of work in AI that while algorithms can be very effective at some things, it is much harder for them to be good at things that are instinctive to humans.  Big Data without human instinct and interpretation in the mix remains a limited opportunity.

Yet the wish among many data scientists to have data equal truth creates a strong bias in this young industry toward the idea that the right algorithm will lead to better decision-making, across the board.  For some, data is more than a low-level layer in the hierarchy of data → information → knowledge → wisdom.  Gil Elbaz, founder of Factual, revealed this bias in an enlightening moment.

Factual is a start-up whose goal is to be the ultimate datamart, working to collect every fact in the world – a huge and daunting task – in order to “try and predict truth.”  Elbaz acknowledged the difficulty of this task, given that the very idea of a fact is a slippery concept: “Nothing is fact and nothing is an opinion.  There is a spectrum,” he commented.

Quentin Hardy of the NYT, who was interviewing Elbaz, asked him point blank, “Where do you not want to cede control to automated decision-making?  What is it that people do better?”  Elbaz paused for a long time and was unable to answer the question.  He could not think of a single example where a human decision would not be made better by the intervention of machines.

The problems with this point of view are myriad.  For one, machine-enabled decision-making is only as good as the algorithms governing it.  And those algorithms are limited by the people who create them.  Even Factual uses people to determine “truth,” crowdsourcing validation through Mechanical Turk.  If enough people validate a candidate fact, Factual decides it must be true.
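The validate-by-crowd idea described above amounts to a simple majority-vote threshold.  A minimal sketch of that aggregation logic, in Python – the function name, vote counts and thresholds here are illustrative assumptions, not Factual’s actual system:

```python
from collections import Counter

def validate_fact(judgments, min_votes=5, agreement=0.8):
    """Accept a candidate fact only if enough crowd workers agree.

    judgments: list of True/False votes from workers.
    Returns True/False on consensus, or None if undecided.
    """
    if len(judgments) < min_votes:
        return None  # not enough evidence yet
    label, votes = Counter(judgments).most_common(1)[0]
    if votes / len(judgments) >= agreement:
        return label  # consensus reached
    return None  # votes too split to call

# Five workers, four of whom validate the fact: 4/5 meets the 0.8 bar.
print(validate_fact([True, True, True, True, False]))  # → True
```

The sketch makes the limitation in the text concrete: “truth” here is whatever clears a human-chosen threshold on human-supplied votes – the algorithm never escapes the people behind it.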

The Tangle of Truth

The subjective and the objective become hopelessly entangled in the Big Data world, just as they do in the rest of reality.  How to untangle these twin elements of perception was a question that lay beneath many of the conference sessions.  DJ Patil of Greylock Partners, who cut his Big Data teeth at LinkedIn, believes Big Data needs to be taken out of the back office and made available to decision-makers throughout the company.  However, without the intervention of trained experts, the capacity for the data to be well understood drops.  His solution?  Use the data to have a conversation rather than to make a decision.  As Patil put it, “The best Big Data scientists are storytellers.”

If story is the interface to data, then won’t we tend to shape the data to fit the needs of the story?  Stories are inherently biased, as are the storytellers.  Here is where the humanities become important.  The social sciences have spent years shaping models that help us understand data as well as surface biases and assumptions.  One of the most interesting sessions at the conference came when the social scientists took the stage.  Their message was that the models matter.  The linguists have something to teach the data scientists.

Cook Your Data With Care

In a session about Big Data and development, the question was raised, “Is the data in charge of the politics or is the politics in charge of the data?”  I would add to this excellent question: am I in charge of the data or is the data in charge of me?  What would it mean to live in a world where the data is in charge of me?  Where I cede my decisions about where to go, what to buy, what to eat, who to befriend, where to live – to recommendation engines and other hip Big Data algorithms?  That sounds like a fast path to becoming passive consumers of our world rather than active participants.

The way forward will be both quantitative and qualitative.  The data we collect is as critical to the outcomes as the questions we ask.  We need subject matter experts, social scientists and statisticians.  Most of all, we need Small Data.

Small Data is my data.  It is the data about me that generates my knowledge and wisdom.  This data is foundational to my intentions, my wants, my needs, my intuitions.   Though I am deeply flawed, I remain the best subject-matter expert on my own life.

The promise of Big Data is that it can add a significant layer of intelligence into our world.  We can ask a whole host of important and interesting questions.  As danah boyd reminds us, “Raw data is both an oxymoron and false.  Data needs to be cooked with care.”  With strong, ethical frameworks around privacy and autonomy, we can analyze significant social and commercial trends for the first time.  But we need to understand that this analysis does not supply The Answer or The Truth.  It supplies another data point that we have the option of synthesizing with our unique intelligence into a larger picture.  Yes, Big Data can add immeasurably to our understanding, but only if we wrap it with our wisdom to reach our own truths.  Big Data needs Small Data to reach its true potential.
