Big Data Conversations Need a Big Data Definition

As part of my day job, I recently recapped the Federal Trade Commission’s workshop on “Big Data” and discrimination. My two key takeaways were, first, that regulators and the advocacy community wanted more “transparency” into how industry is using big data, particularly in positive ways, and second, that there was a pressing need for industry to take affirmative steps to implement governance systems and stronger “institutional review board”-type mechanisms to overcome the transparency hurdle that the opacity of big data presents.

But if I’m being candid, I think we really need to start narrowing our definitions of big data. Big data has become a term that gets attached to a wide array of different technologies and tools that really ought to be addressed separately. We just don’t have a standard definition. The Berkeley School of Information recently asked forty different thought leaders how they would define big data, and got basically forty different definitions. While there’s a common understanding of big data as more volume, more variety, and greater velocity, I’m not sure how any of those terms provides a foundation for talking about practices or rules, let alone ethics.

At the FTC’s workshop, big data was spoken of in the context of machine learning and data mining, the activities of data brokers and scoring profiles, wearable technologies, and the greater Internet of Things. No one ever set ground rules as to what “Big Data” meant as a tool for inclusion or exclusion. At one point, a member of the civil rights community was focused on big data largely as the volume of communications being produced by social media, while another panelist was discussing consumer loyalty cards. Maybe there’s some overlap, but the risks and rewards can be very different.

Buying and Selling Privacy Essay Published by Stanford Law Review Online

My essay on how “Big Data” is transforming our notions of individual privacy in unequal ways has been published by the Stanford Law Review Online. Here’s how they summed up my piece:

We are increasingly dependent upon technologies, which in turn need our personal information in order to function. This reciprocal relationship has made it incredibly difficult for individuals to make informed decisions about what to keep private. Perhaps more important, the privacy considerations at stake will not be the same for everyone: they will vary depending upon one’s socioeconomic status. It is essential for society, and particularly for policymakers, to recognize the different burdens placed on individuals to protect their data.

Digital Market Manipulation Podcast

[audio http://www.futureofprivacy.org/wp-content/uploads/FPFCast.Calo_.mp3]

The other week, Rebecca Rosen wrote a fantastic overview of Professor Ryan Calo’s new paper on “Digital Market Manipulation” in The Atlantic. “What Does It Really Matter If Companies Are Tracking Us Online?” she provocatively asked in her headline.

Conveniently, I was scheduled to speak with Professor Calo about his essay Consumer Subject Review Boards — A Thought Experiment, which looks at how institutional review boards (IRBs) were put in place to ensure ethical human testing standards and suggests a similar framework could be brought to bear on consumer data projects.

I was able to ask him about the concept of digital market manipulation, which seems to move beyond mere “privacy” concerns into questions of fundamental fairness and equality.
