For any startups trying to divine where the big data space is headed and where to focus their energies, there are worse places to look than Facebook. The company collects a lot of data, and in order to handle that data it has created, among other things, the Cassandra NoSQL data store and the Hive query language for Hadoop. Its Hadoop cluster currently stores more than 100 petabytes of user data. If there’s a good idea for an application to make big data technologies even more useful, chances are Facebook is already working on it.
So what's the problem? As Big Data science increases our ability to model or simulate complex systems, these models, ironically, become as complex as the real world. But they are not the real world. Whether it's astrophysics or the economy, building a computer model still demands leaving some aspects of the problem out. More importantly, the very act of translating the equations into digital form changes them in subtle ways, which means you are solving a slightly different problem than the real-world version.
This broad scenario portrays a world in which analytic insight and computing power are nearly infinite and cost-effectively scalable. Once enterprises gain access to these resources, many improved capabilities are possible, such as better understanding customers or better fraud reduction. The enabling technologies and trends on the 2012 Hype Cycle include quantum computing, the various forms of cloud computing, big data, complex-event processing, social analytics, in-memory database management systems, in-memory analytics, text analytics and predictive analytics. The tipping point technologies that will make this scenario accessible to enterprises, governments and consumers include cloud computing, big data and in-memory database management systems.
Executives need to understand that big data is not about subordinating managerial decisions to automated algorithms but deciding what kinds of data should enhance or transform user experiences. Big Data should be neither servant nor master; properly managed, it becomes a new medium for shaping how people and their technologies interact.
That’s why it’s a tad disingenuous when Google-executive-turned-Yahoo-CEO-thought-leader Marissa Mayer declares “data is apolitical” and that her old company succeeds because it is so (big) data-driven: “It all comes down to data. Run a 1% test [on 1% of the audience] and whichever design does best against the user-happiness metrics over a two-week period is the one we launch. We have a very academic environment where we’re looking at data all the time. We probably have somewhere between 50 and 100 experiments running on live traffic, everything from the default number of results to underlined links to how big an arrow should be. We’re trying all those different things.”
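The 1% test Mayer describes can be sketched in a few lines. This is a hedged illustration only: the bucketing rule, the metric name, and the two-week averaging are assumptions for demonstration, not Google's actual experiment infrastructure.

```python
"""Illustrative sketch of a 1% live-traffic experiment: route a small,
deterministic slice of users to the variant, then launch whichever arm
scores better on a happiness metric over the test window."""


def assign_bucket(user_id: int, test_fraction: float = 0.01) -> str:
    # Deterministically route ~1% of users into the experiment arm,
    # so a given user always sees the same design.
    return "variant" if user_id % 10_000 < test_fraction * 10_000 else "control"


def pick_winner(control_scores: list, variant_scores: list) -> str:
    # Compare the average happiness metric across the two-week window;
    # the better-scoring design is the one to launch.
    avg = lambda xs: sum(xs) / len(xs)
    return "variant" if avg(variant_scores) > avg(control_scores) else "control"


if __name__ == "__main__":
    print(assign_bucket(42))                      # this user stays in the experiment arm
    print(pick_winner([0.61, 0.59], [0.66, 0.68]))  # variant wins on the metric
```

A real system would add statistical significance testing before declaring a winner; this sketch only captures the decision rule described in the quote.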
Massive rivers of digital information are a snooze, visually. Yet that is the narrow, literal-minded view. Mr. Smolan’s new project, “The Human Face of Big Data,” which is being formally announced on Thursday, focuses on how data, smart software, sensors and computing are opening the door to all sorts of new uses in science, business, health, energy and water conservation. And the pictures are mostly of the people doing that work or those being affected.
“Big Data is a $100 billion market opportunity as estimated by Merrill Lynch,” Stanek said. “We are nearing a perfect storm: a history of failed Business Intelligence implementations that never lived up to their promise, yet analytics are needed today more than ever.”
Your data quality efforts need to be defined more as profiling and standards than as cleansing; this is better aligned with how big data is managed and processed. Because big data processing is batch in nature, it might seem obvious on the surface to institute data quality rules the way they have always been done. But the answer is to be more service-oriented: invoke data quality rules that improve standardization and sourcing during processing rather than fundamentally changing the data. In addition, data quality rules are invoked in a customized fashion, as service calls made from big data processing.
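The service-oriented approach above can be sketched as follows. This is a minimal illustration under stated assumptions: the rule, field names, and alias table are hypothetical, and the point is only that quality rules run as calls during processing while the raw source values are left untouched rather than cleansed in place.

```python
"""Sketch of data quality rules invoked as services during batch
processing: standardize on the way through, don't rewrite the source."""


def standardize_country(value: str) -> str:
    # A standardization "rule as a service": map known variants to a
    # canonical code; unknown values pass through, merely normalized.
    aliases = {"usa": "US", "united states": "US", "u.s.": "US", "uk": "GB"}
    return aliases.get(value.strip().lower(), value.strip().upper())


def process_batch(records: list) -> list:
    # Invoke the quality rule during processing; the raw "country"
    # field is preserved, and the standardized form is added alongside.
    return [{**r, "country_std": standardize_country(r["country"])} for r in records]


if __name__ == "__main__":
    batch = [{"id": 1, "country": "usa"}, {"id": 2, "country": " UK "}]
    print(process_batch(batch))
```

Keeping the raw value alongside the standardized one is what distinguishes this profiling-and-standards style from traditional cleansing, which would overwrite the data.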
This may seem like a boring set of numbers, but for the first time, it revealed an underlying pattern to war. “It shows that there is something going on in the way these wars are fought that is common to all,” Neil F. Johnson, a physicist at the University of Miami who participated in the research, told Nature. In the course of his research, he and his team collected data on 54,679 “violent events” reported in nine different conflicts, including those in Iraq, Afghanistan, Peru, and Colombia.
The beauty of such Big Data applications is that they can process Web-based text, digital images, and online video. They can also glean intelligence from the exploding social media sphere, whether it consists of blogs, chat forums, Twitter trends, or Facebook commentary. Traditional market research generally involves unnatural acts, such as surveys, mall-intercept interviews, and focus groups. Big Data examines what people say about what they have done or will do. That’s in addition to tracking what people are actually doing about everything from crime to weather to shopping to brands. It is only Big Data’s capacity for dealing with vast quantities of real-time unstructured data that makes this possible.
Big Data is in more places than you know, perhaps even your living room.
[Image: Smart thermostats made by Nest Labs. Credit: Jim Wilson/The New York Times]
Nest Labs makes a smart thermostat that promotes energy saving by studying its owner’s habits, predicting when people are home and how they are likely to use their heating and cooling. Using a clever system of rewards for the homeowner (green “leaves” for doing the energy-efficient thing), the thermostat is intended to save money through efficiency.