A really interesting post on how Facebook is tackling its enormous data warehouse by improving compression efficiency. More here.
By applying all these improvements, we evolved ORCFile to provide a significant boost in compression ratios over RCFile on our warehouse data, going from 5x to 8x. Additionally, on a large representative set of queries and data from our warehouse, we found that the Facebook ORCFile writer is 3x better on average than open source ORCFile.
We have rolled out this new storage format to many 10s of petabytes of warehouse data at Facebook and have reclaimed 10s of petabytes of capacity by switching from RCFile to Facebook ORCFile as the storage format. We are in the process of rolling out the format to additional tables in our data warehouse, so we can take further advantage of the improved storage efficiency and read/write performance. We have made our storage format available at GitHub and are working with the open source community to incorporate these improvements back into the Apache Hive project.
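To put the quoted compression ratios in perspective, here is a minimal sketch of the arithmetic, assuming "5x" and "8x" mean raw size divided by stored size. The function names are made up for illustration and are not part of any Facebook or Hive tooling.

```python
def stored_size(raw_size: float, compression_ratio: float) -> float:
    """Size on disk for raw_size units of data at the given compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_size / compression_ratio


def relative_saving(old_ratio: float, new_ratio: float) -> float:
    """Fraction of on-disk space reclaimed when moving from old_ratio to new_ratio."""
    old = stored_size(1.0, old_ratio)
    new = stored_size(1.0, new_ratio)
    return (old - new) / old


# Going from 5x (RCFile) to 8x (Facebook ORCFile), as quoted in the post.
saving = relative_saving(5.0, 8.0)
print(f"{saving:.1%}")  # 37.5%
```

In other words, under that assumption the same raw data takes roughly three eighths less disk space, which is how a change in ratio translates into reclaimed capacity at warehouse scale.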
While I don’t necessarily agree with all of them, this one seems very realistic and bound to happen.
Internet of Things (IoT) starts to gain traction.
The explosion in “things” will generate a corresponding wave of rich new sources with varying data formats for edge analytics. IT is still struggling to keep up with the massive information coming from the mobile worker. With the advent of IoT, that mass of information will be multiplied with every asset across an organization, from the pressure sensor on the wellhead to the video monitor at the gas station pump. Business decisions will be accelerated in organizations that have a strategy for rapidly organizing and normalizing data across their ecosystem.
An in-depth, well-written look at big data from an Australian perspective.
Over one third (37.8 percent) of US businesses have already invested in big data, ahead of Europe, the Middle East and Africa (EMEA), which is at 26.8 percent; APAC, which is on 25.6 percent; and Latin America, which is at a distant fourth, with 17.8 percent.
However, Asian businesses show plenty of big data ambition, and a further 44.7 percent will invest in the next two years, lifting the APAC adoption rate to the same proportions as the US, and comfortably ahead of EMEA.
Industries furthest down the investment path are media and communications (35 percent have invested), banking (34 percent), and services (32 percent), while government (16 percent) and utilities (17 percent) are laggards.
Read More – Big data: Big hype or big hope? | ZDNet.
With location-tracking firm Placed, the mobile ad network plans to show the impact of a mobile campaign on foot traffic to a given retail location. The Placed Attribution platform uses the company’s panel of 100,000 opted-in users whose behaviors the company tracks as they move in and out of over 100 million discrete locations a day.
The metrics from the opt-in panel will be integrated with a mobile campaign on Millennial’s network to determine which of the opted-in devices were exposed to an ad and then track whether the user entered the advertised location.
Read more: http://www.mediapost.com/publications/article/209640/millennial-adds-analytics-tools-to-push-mobile-roi.html
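The attribution idea described above (match opted-in panel devices against ad exposure, then check who entered the advertised location) can be sketched roughly as follows. This is only an illustration under stated assumptions: the device IDs are invented, the function and class names are hypothetical, and the comparison against unexposed panel members is my own addition, not something the article says Placed or Millennial does.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LiftReport:
    exposed_visit_rate: float
    unexposed_visit_rate: float

    @property
    def lift(self) -> float:
        """Difference in visit rate between exposed and unexposed panel devices."""
        return self.exposed_visit_rate - self.unexposed_visit_rate


def visit_rate(devices: set[str], visitors: set[str]) -> float:
    """Share of the given devices that were seen at the advertised location."""
    if not devices:
        return 0.0
    return len(devices & visitors) / len(devices)


def foot_traffic_lift(panel: set[str], exposed: set[str], visitors: set[str]) -> LiftReport:
    """Compare visit rates for panel devices that saw the ad against those that did not."""
    exposed_panel = panel & exposed      # only opted-in devices count
    unexposed_panel = panel - exposed
    return LiftReport(
        exposed_visit_rate=visit_rate(exposed_panel, visitors),
        unexposed_visit_rate=visit_rate(unexposed_panel, visitors),
    )


# Hypothetical data: six opted-in devices, one exposed device outside the panel.
panel = {"a", "b", "c", "d", "e", "f"}
exposed = {"a", "b", "c", "z"}
visitors = {"a", "b", "e"}

report = foot_traffic_lift(panel, exposed, visitors)
print(round(report.exposed_visit_rate, 2), round(report.unexposed_visit_rate, 2))
```

A real system would also have to deal with timing (visits must follow the exposure) and with panel bias, neither of which this sketch attempts.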
Starting with the renaming of Gartner’s usual Magic Quadrant for BI to include analytics systems, this year’s report has a lot to cheer about. There are clearer definitions of what makes up Business Intelligence and Analytics systems. They are broken down into 3 categories: Integration, Information Delivery and Analysis.
The image is self-descriptive, and more info on each vendor is available as part of this 35-page report.
For any startups trying to divine where the big data space is headed and where to focus their energies, there are worse places to look than Facebook. The company collects a lot of data, and in order to handle that data it has created, among other things, the Cassandra NoSQL data store and the Hive query language for Hadoop. Its Hadoop cluster currently stores more than 100 petabytes of user data. If there’s a good idea for an application to make big data technologies even more useful, chances are Facebook is already working on it.
via For the future of big data startups, look to Facebook — Data | GigaOM.
So what’s the problem? As Big Data science increases our ability to model or simulate complex systems, these models, ironically, become as complex as the real world. But they are not the real world. Whether it’s astrophysics or the economy, building a computer model still demands leaving some aspects of the problem out. More importantly, the very act of bringing the equations over to digital form means you have changed them in subtle ways and that means you are solving a slightly different problem than the real-world version.
via Big Data And Its Big Problems : 13.7: Cosmos And Culture : NPR.
This broad scenario portrays a world in which analytic insight and computing power are nearly infinite and cost-effectively scalable. Once enterprises gain access to these resources, many improved capabilities are possible, such as better understanding customers or better fraud reduction. The enabling technologies and trends on the 2012 Hype Cycle include quantum computing, the various forms of cloud computing, big data, complex-event processing, social analytics, in-memory database management systems, in-memory analytics, text analytics and predictive analytics. The tipping point technologies that will make this scenario accessible to enterprises, governments and consumers include cloud computing, big data and in-memory database management systems.
via Gartner’s 2012 Hype Cycle for Emerging Technologies Identifies “Tipping Point” Technologies That Will Unlock Long-Awaited Technology Scenarios.
Executives need to understand that big data is not about subordinating managerial decisions to automated algorithms but deciding what kinds of data should enhance or transform user experiences. Big Data should be neither servant nor master; properly managed, it becomes a new medium for shaping how people and their technologies interact.
That’s why it’s a tad disingenuous when Google-executive-turned-Yahoo-CEO-thought-leader Marissa Mayer declares “data is apolitical” and that her old company succeeds because it is so (big) data-driven: “It all comes down to data. Run a 1% test [on 1% of the audience] and whichever design does best against the user-happiness metrics over a two-week period is the one we launch. We have a very academic environment where we’re looking at data all the time. We probably have somewhere between 50 and 100 experiments running on live traffic, everything from the default number of results to underlined links to how big an arrow should be. We’re trying all those different things
via What Executives Don’t Understand About Big Data – Michael Schrage – Harvard Business Review.