Big Data 101: Should We Be Worried?

by Jennifer Cobb on 04/22/2011

Big Data makes some people breathless.  And it makes others shiver.  What both sides share is the understanding that Big Data is going to make a big difference in all of our lives.  Whether that difference serves the social good or not remains to be seen.

Big Data is the information produced by our online interactions.  Some people call it “data exhaust” – the by-product of our online shopping, browsing, networking, emailing, calendaring and the like.  Everything we do online leaves a digital trace somewhere.  And those digital traces of human activity are adding up to huge datasets.

Humans aren’t the only players in this mix.  The Internet of Things is poised to be an even bigger piece of the global Big Data pie.  As increasing numbers of things are tagged with devices that produce information – “I am too cold,” “I am no longer functioning,” “I am consuming too much electricity” – Big Data will be not just information about what we do, but information about everything.  Big Data is going to keep getting bigger.

The notion of massive data sets that offer, for the first time in history, global pictures of whole systems sounds pretty exciting.  Experts in fields ranging from sociology to energy efficiency to economics to philanthropy are jumping into the fray, trying to find ways of mining Big Data for insights, solutions and new products and services.  It has the potential to make our world better, faster and cheaper.

Andrew Zolli, curator at PopTech, recently recounted a story of how Big Data is helping to better people’s everyday lives.  This story focuses on Water for People, a nonprofit that works at a grassroots level with very poor communities around the world to build sources of clean water.  In a recent audit, the organization found that fully 60% of the wells it had put together were no longer working.  So it built an automated technology platform called FLOW that allows people in the field to update, on a point-by-point basis, the status of their wells.  The updating takes place via an Android device and syncs automatically to a Google Maps picture.  Today, the organization and its local partners can see at a glance 60,000 water sources and determine if they are green, yellow or red.

Zolli commented, “We are going to be producing a lot of data in real time for the first time ever to see if the behavioral change models we are putting into place actually work.  This terrifies many people in the space.  For the first time, the data about the thing that we did is going to be a primary asset that we can open source, remix, and do deep mash up analytics on top of.  The ability to push the data out into the field suggests that even when we fail, the precise data about how we failed is going to have enormous value.  The data may do the talking for us in the future.”

Peter Warden, in an article at O’Reilly Radar, argues that the real disruptive power of Big Data is price. With the introduction of Hadoop clusters (an open source, distributed computing model), anyone with an Amazon account can process huge amounts of data for just dollars an hour.  Processing costs are dropping in the same way storage costs dropped in the last decade.  Warden writes, “Now that processing has become cheap too, a whole universe of poverty-stricken hackers, academics, makers, reporters, and startups can do interesting things with massive data sets.”

Investors and marketers are also chasing the Big Data dream.  Data mining, knowledge management, personalization, and much more are huge opportunities for startups and established players to create new solutions.  And as Big Data moves increasingly to the cloud, making it accessible from anywhere, whole new categories of companies will emerge.  Ping Li, a partner at Accel, wrote in GigaOm of a new wave of enterprise technology solutions that are moving beyond traditional relational databases into new, disruptive architectures optimized for the cloud.  “As this cloud stack hardens, new applications and services – previously unthinkable – will come to light, in all shapes and sizes. But the one thing they will all have in common is Big Data.”

It all sounds very cool.  But there is a flip side.

Big Data has not-so-subtle overtones of Big Brother.  Who is collecting all of the information we leave in our digital wake, both intentionally and unintentionally?  How can it be protected from unwanted surveillance?  What can be learned about us from it?  And, perhaps even more significantly, how will this new knowledge impact the evolution of social norms?  As internet researcher danah boyd said in a recent speech, “Big Data isn’t arbitrary data; it’s data about people’s lives, data that is produced through their interactions with others, data that they don’t normally see let alone know is being shared.  The process of sharing it and using it and publicizing it is a violation of privacy. Our obsession with Big Data threatens to destabilize social situations and we need to consider what this means.”

Big Data analysis can be not only irresponsible but also deeply misleading.  One of the more insidious aspects of Big Data is our shared cultural belief that data are objective and that analyzing them produces facts, not opinions.  Once people in all professions begin to dig into Big Data sets and start to draw conclusions from them, we need to heed boyd’s words of warning: “Every act of data analysis involves interpretation, regardless of how big or mathematical your data is.”  In other words, as we build tools to analyze these massive data sets, we need to be attentive to where the data come from and not seek answers from them that they cannot reliably give.

Many highly respected professionals, from all fields, will turn to Big Data to understand and solve problems, support policy decisions and directions, and create innovations on all fronts.  We must remain mindful that the biases and misinterpretations of these data will impact people’s lives.  As boyd says, “The Uncertainty Principle doesn’t just apply to physics.  The more you try to formalize and model social interactions, the more you disturb the balance of them.”  A fair and useful warning.

Beyond that, we need to be aware that all the tiny pieces of data in those massive sets represent some action, somewhere at some time, that takes place in a specific context.  And that context often includes a person, like you and me.  When it comes to the safe, responsible and effective use of data, that context matters.  We need to find ways of housing Big Data not just in the cloud, but within a social construct that protects all of us who give birth to its many elements.
