Monthly Archives: February 2012

Big Data – 5 low-profile startups

Odiago is the brainchild of Hadoop and analytics experts Christophe Bisciglia and Aaron Kimball, and aims to improve the state of web analytics. Its first product, Wibidata, which is in private beta, lets websites better analyze their user data to build more-targeted features. It’s built atop Hadoop and HBase, but also plugs into companies’ existing data-management and BI tools. Current customers include Wikipedia, RichRelevance, FoneDoktor and Atlassian (with whom it shares office space).

- More on the 5 start-ups working on Big Data.

Privacy in the Age of Big Data

In a well-researched article, Privacy in the Age of Big Data, Prof. Omer Tene and Jules Polonetsky make some interesting arguments about privacy in the big data age we now live in. It is a good read if you work with big data.

The harvesting of large data sets and the use of analytics clearly implicate privacy concerns. The tasks of ensuring data security and protecting privacy become harder as information is multiplied and shared ever more widely around the world. Information regarding individuals’ health, location, electricity use, and online activity is exposed to scrutiny, raising concerns about profiling, discrimination, exclusion, and loss of control. Traditionally, organizations used various methods of de-identification (anonymization, pseudonymization, encryption, key-coding, data sharding) to distance data from real identities and allow analysis to proceed while at the same time containing privacy concerns. Over the past few years, however, computer scientists have repeatedly shown that even anonymized data can often be re-identified and attributed to specific individuals.[7] In an influential law review article, Paul Ohm observed that “[r]eidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization.”[8] The implications for government and businesses can be stark, given that de-identification has become a key component of numerous business models, most notably in the contexts of health data (regarding clinical trials, for example), online behavioral advertising, and cloud computing.
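To make the re-identification point concrete, here is a minimal sketch of a linkage attack, assuming a "de-identified" data set that still carries quasi-identifiers (ZIP code, birth year, sex) and a separate public list containing names. Every record, name, and field below is invented for illustration and does not come from the article; real attacks and real defenses are considerably more involved.

```python
# Toy linkage attack: join a "de-identified" data set to a public
# directory on shared quasi-identifiers. All data here is hypothetical.

deidentified_records = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02138", "birth_year": 1980, "sex": "M", "diagnosis": "flu"},
]

public_directory = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_year": 1972, "sex": "F"},
]

# Fields that are not names but can still narrow down an individual.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def quasi_key(row):
    """Build a lookup key from the quasi-identifier fields of a row."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link_records(records, directory):
    """Return (name, diagnosis) pairs for records that match exactly one person."""
    names_by_key = {}
    for person in directory:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])

    matches = []
    for record in records:
        candidates = names_by_key.get(quasi_key(record), [])
        # Only a unique match attributes the record to a specific individual.
        if len(candidates) == 1:
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link_records(deidentified_records, public_directory)
for name, diagnosis in reidentified:
    print(name, "->", diagnosis)
```

In this toy data, two of the three records link to exactly one name even though the names were stripped, which is the kind of result the quoted passage warns about. Coarsening the quasi-identifiers (for example, truncating ZIP codes) would leave fewer unique matches, at the cost of less precise analysis.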