Category Archives: Data Warehousing

posts on data warehousing technologies

Microsoft Unveils Database Products at PASS Conference

Microsoft released the first community technology preview (CTP) of the next-generation version of SQL Server, code-named Denali, on Nov. 9. But that is just one of several announcements to come out of the PASS Summit 2010 conference in Seattle this week. In addition to unveiling Denali, Microsoft also announced the release of SQL Server 2008 R2 Parallel Data Warehouse and the new Critical Advantage Program, which offers an end-to-end suite of pretested hardware and software configurations, services and support.

“SQL Server code-named Denali will help empower organizations to be more agile in today’s competitive market,” the SQL Server Team touted on its blog. “Customers will be able to efficiently deliver mission-critical solutions through a highly scalable and available platform. Industry-leading tools will help developers quickly build innovative applications while data integration and management tools help deliver credible data reliably to the right users and new user experiences expand the reach of BI to enable meaningful insights.”

More on eWEEK

Rising Tide in the Data Warehouse vs. Data Mart Debate

Is building an enterprise data warehouse (EDW) the best path to business intelligence (BI)? It’s a perennially vexing question that — thanks to a couple of recent trends in BI and data warehousing (DW) — has taken on new life.

The value of the full-fledged EDW seems unassailable. Over the last half-decade, however, some of the biggest EDW champions have moderated their stances, such that they now both countenance the existence of alternatives and, under certain very special conditions, are even willing to admit they’re useful. The result is that although the EDW is still seen as the Holy Grail of data warehousing, departmental (and even enterprise) data marts are now countenanced as well.

Active EDW giant Teradata Inc. is the foremost case in point, but other players are staking out similar ground, including relative newcomer Hewlett-Packard Co. (HP), which entered the high-end DW segment via its acquisition of Knightsbridge Solutions and markets Neoview, a DW appliance-like offering. (In addition to Neoview, HP also partners with both Microsoft Corp. and Oracle Corp. to market appliances in the 1 to 32 TB range.)

Interesting debate on TDWI.

The Flaws of the Classic Data Warehouse Architecture

The classic data warehouse architecture (CDWA) has served us well for the last twenty years. In fact, up to five years ago we had good reasons to use this architecture. The state of database, ETL, and reporting technology did not really allow us to develop something else. All the tools were aimed at supporting the CDWA. But the question right now is: twenty years later, is this still the right architecture? Is this the best possible architecture we can come up with, especially if we consider the new demands and requirements, and if we look at the new technologies available in the market? My answer would be no! To me, we are slowly reaching the end of an era, an era in which the CDWA was king. It is time for change. This article is the first in a series on the flaws of the CDWA and on an alternative architecture, one that fits the needs and wishes of most organizations for (hopefully) the next twenty years. Let's start by describing some of the CDWA flaws.

The first flaw is related to the concept of operational business intelligence. More and more, organizations show interest in supporting operational business intelligence. What this means is that the reports that the decision makers use have to include more up-to-date data. Refreshing the source data once a day is not enough for those users. Decision makers who are quite close to the business processes especially need 100% up-to-date data. But how do you do this? You don’t have to be a technological wizard to understand that, if data has to be copied four or five times from one data storage layer to another, to get from the production databases to the reports, doing this in just a few seconds will become close to impossible. We have to simplify the architecture to be able to support operational business intelligence. Bottom line, what it means is that we have to remove data storage layers and minimize the number of copy steps.
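A quick back-of-the-envelope calculation makes the point concrete. The sketch below is my own illustration, not something from the article; the layer names and refresh intervals are hypothetical. It adds up the refresh intervals of every storage layer the data is copied through, first for a classic multi-layer pipeline and then for a simplified one with fewer copy steps.

```python
# Rough illustration of why stacked copy steps rule out near-real-time reporting.
# Layer names and refresh intervals are hypothetical, not taken from the article.

def worst_case_latency(refresh_minutes):
    """Worst-case staleness of a report is roughly the sum of the refresh
    intervals of every layer the data is copied through."""
    return sum(refresh_minutes.values())

classic_dwh = {
    "staging area": 24 * 60,   # nightly batch load from production
    "central DWH": 24 * 60,    # nightly transform into the warehouse
    "data mart": 24 * 60,      # nightly extract per department
    "report cache": 60,        # hourly refresh of report datasets
}

simplified = {
    "central DWH": 15,         # micro-batch load every 15 minutes
    "report layer": 0,         # reports query the warehouse directly
}

print(f"classic architecture: up to {worst_case_latency(classic_dwh) / 60:.0f} hours stale")
print(f"simplified architecture: up to {worst_case_latency(simplified)} minutes stale")
```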

Great read from BEye Network. Part 1 and Part 2.

DW Appliances – Primer

TDWI has a great article on data warehouse appliances, which package an all-in-one solution for enterprises. Neat read.

In the BI world, the data warehousing appliance extends the appliance metaphor to the enterprise data center, with the vision of a high-performance database system that satisfies business intelligence (decision support) requirements and includes the server hardware, network interconnect, database software, and selected load, workload, scheduling, and administration tools needed for quick installation, loading, and ongoing monitoring.

Enterprise data warehousing appliances are popular because they get the job done in many data scenarios. However, in spite of their significant success, data warehousing appliances are not a one-size-fits-all proposition, nor, as any vendor will tell you, are they appropriate for every workload profile or data warehousing challenge. A diverse set of appliance vendors has emerged, including appliance offerings from large, established information technology (IT) stalwarts such as HP, IBM, Oracle, and Microsoft. Teradata objects to being called an “appliance,” though it also objects to not being named an IT stalwart relevant to the appliance market.

Best-of-breed innovators continue to contribute to market dynamics. Key differentiators that you should examine as a prospective buyer of a data warehousing appliance include: the number of successful installed customers in production willing to speak about their experiences (both positive and negative); the details of the technology itself (whether the database is open source and how it is customized, whether the server, disk, and network are commodity components and how they can be customized, and the breadth and maturity of complementary tools such as query and reporting, ETL, and data quality solutions); and the price of acquisition and cost of operation. Published results from public benchmarks (such as tpc.org) are also useful for starting a conversation about performance and price, though don't rely exclusively on the benchmark “winner” since results are frequently updated.
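For what it is worth, these differentiators lend themselves to a simple weighted scorecard during vendor evaluation. The sketch below is my own illustration of that idea, not something from the TDWI article; the vendor names, weights, and scores are hypothetical placeholders.

```python
# A minimal sketch of turning the differentiators above into a weighted
# scorecard. Vendors, weights, and scores are hypothetical placeholders.

criteria_weights = {
    "reference customers in production": 0.30,
    "technology fit (database, hardware, customization)": 0.30,
    "complementary tools (reporting, ETL, data quality)": 0.20,
    "acquisition price and cost of operation": 0.20,
}

# Scores on a 1-5 scale gathered from demos, reference calls, and RFP responses.
vendor_scores = {
    "Vendor A": {"reference customers in production": 4,
                 "technology fit (database, hardware, customization)": 3,
                 "complementary tools (reporting, ETL, data quality)": 4,
                 "acquisition price and cost of operation": 2},
    "Vendor B": {"reference customers in production": 3,
                 "technology fit (database, hardware, customization)": 5,
                 "complementary tools (reporting, ETL, data quality)": 3,
                 "acquisition price and cost of operation": 4},
}

for vendor, scores in vendor_scores.items():
    total = sum(criteria_weights[c] * scores[c] for c in criteria_weights)
    print(f"{vendor}: weighted score {total:.2f} out of 5")
```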

The Elusive Virtual Data Warehouse

Bill Inmon writes on the virtual data warehouse. Interesting Read.

Why then is the virtual data warehouse such a supremely bad idea? There are actually lots of reasons for the vacuity of virtue manifested by the virtual data warehouse. Some of those reasons are:

A query that has to access a lot of databases simultaneously uses a lot of system resources. In the best of circumstances, query performance is a real problem.

A query that has to access a lot of databases simultaneously requires resources every time it is executed. If the query is run many times at all, the system overhead is very steep.

A query that has to access a lot of databases simultaneously is stopped dead in its tracks when it runs across a database that is down or otherwise unavailable.

A query that has to access a lot of databases simultaneously shuffles a lot of data around the system that otherwise would not need to be moved. The impact on the network can become very burdensome.

A query that has to access a lot of databases simultaneously is limited to the data found in the databases. If there is only a limited amount of historical data in the databases, the query is limited to whatever historical data is found there. For a variety of reasons, many application databases do not have much historical data to begin with.
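To make Inmon's point concrete, here is a minimal sketch (my own illustration, not Inmon's) of the federated pattern he is criticizing: application code that re-queries and re-joins several source databases on every run. The table layout and sample rows are hypothetical, but it shows how every execution pays the full cost again and how a single unavailable source stops the query dead.

```python
# A minimal sketch of the "virtual" (federated) warehouse pattern: every query
# re-reads and re-joins the source systems, pays the full cost on each run, and
# fails outright if any one source is unavailable. Tables and rows are hypothetical.

import sqlite3

def make_source(rows):
    """Stand-in for an operational source system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return db

sources = [
    make_source([(1, 100.0), (2, 50.0)]),   # e.g. the billing system
    make_source([(1, 25.0), (3, 75.0)]),    # e.g. the web shop
    None,                                   # e.g. a system that is currently down
]

def virtual_query(sources):
    """Re-executed from scratch on every run; limited to whatever history the
    operational systems happen to keep."""
    totals = {}
    for src in sources:
        if src is None:
            raise RuntimeError("a source database is unavailable; query fails")
        for customer_id, amount in src.execute("SELECT customer_id, amount FROM orders"):
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals

try:
    print(virtual_query(sources))
except RuntimeError as err:
    print("federated query aborted:", err)
```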

Informatica Positioned In Leaders Quadrant In Data Quality Tools

From the Informatica press release. This is an important win for Informatica.

Informatica Corporation (NASDAQ: INFA), the leading independent provider of data integration software and services, today announced that it has been positioned by Gartner, Inc. in the leaders’ quadrant in the 2008 Magic Quadrant for Data Quality Tools report.

Ted Friedman and Andreas Bitterer, authors of the report, state: “Leaders in the market demonstrate strength across a complete range of data quality functionality, including profiling, parsing, standardization, matching, validation and enrichment. They exhibit a clear understanding and vision of where the market is headed, including recognition of noncustomer data quality issues and the delivery of enterprise-level data quality implementations. Leaders have an established market presence, significant size and a multinational presence.”

According to the report, “growth, innovation and volatility (via mergers and acquisitions) continue to shape the market for data quality tools. Investment on the part of buyers and vendors is increasing as organizations recognize the value of these tools in master data management and information governance initiatives.” The complete report, including the quadrant graphic, is available on the Informatica web site at http://www.informatica.com/dq_mq/.

SmartStream Banks on Informatica to Accelerate Customer ROI

From the Press Release -

Informatica Corporation (Nasdaq: INFA), the leading independent provider of data integration software, today announced that SmartStream Technologies, a leading provider of software to the financial services industry, is OEMing the Informatica PowerCenter data integration platform as part of its flagship Transaction Lifecycle Management (TLM) solutions.

In making Informatica PowerCenter the foundation of its SmartStream TLM Business Integration (TLM BI) offering, SmartStream is empowering those customers with complex data environments to accelerate the return on investment of their TLM deployments through the high-performance and cost-effective integration of data involved in transaction cycles.

“The increasing drive to streamline global banking practices means our software needs to manage highly complex and rapid transactions across platforms and different banks. By using Informatica, rather than continually creating bespoke data interfaces, we can enable a faster ROI while freeing our professional services teams to provide more value to customers,” said Neil Vernon, head of SmartStream’s Product Management Group. “We selected Informatica to power TLM BI following an evaluation where they scored highest against our key criteria of usability, reusability and performance. In addition, it was critical that TLM BI have the focus and support of a recognized best-of-breed vendor such as Informatica.”

Teradata Improves Analytics for Business Users

An article by Anshu Shrivastava of TMCnet on the new Teradata Warehouse Miner.

The technology of data mining discovers patterns in customer, financial and operational data that can provide valuable business insights, according to Teradata.

The newest enhancements to Teradata Warehouse Miner are supported by the recently announced SAS Scoring Accelerator for Teradata. Keith Collins, CTO at SAS, said that Teradata’s new functionality, and the recently released SAS Scoring Accelerator, are integrated and complementary solutions.

Teradata’s officials pointed out that an initial benchmark of the SAS Scoring Accelerator for Teradata demonstrated the ability to process records “45 times faster” than the traditional scoring method.

Moreover, the SAS Scoring Accelerator for Teradata also eliminates the need for manual translation of the SAS scoring code into SQL, or structured query language.
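For readers unfamiliar with the idea, in-database scoring simply means evaluating the model where the data lives instead of hauling every row out to a client tool. The sketch below is a generic illustration of that technique, not SAS, Informatica, or Teradata code; the table and the tiny linear "model" are hypothetical.

```python
# A minimal sketch of in-database scoring versus extract-and-score.
# This illustrates the general technique only; the table and model are hypothetical.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, tenure_months REAL, balance REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, 12, 500.0), (2, 48, 150.0), (3, 3, 900.0)])

# Extract-and-score: every row crosses the wire and scoring happens client-side.
def score_client_side():
    return [(cid, 0.02 * tenure + 0.001 * balance)
            for cid, tenure, balance in db.execute(
                "SELECT id, tenure_months, balance FROM customers")]

# In-database scoring: the same linear model expressed as SQL and evaluated
# where the data lives, so only the results move.
def score_in_database():
    return list(db.execute(
        "SELECT id, 0.02 * tenure_months + 0.001 * balance AS score FROM customers"))

print(score_client_side())
print(score_in_database())
```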

High-Performance DB and DWH Solution from Greenplum and Sun

From the Press Release -

Greenplum, a leading provider of database software for business intelligence, and Sun Microsystems, Inc. (NASDAQ: JAVA) today announced that Reliance Communications is using Greenplum Database, running on the Sun Data Warehouse Appliance, to power a range of applications, from legal and regulatory compliance to call detail record analysis.

Greenplum Database is the world’s fastest, most cost-effective solution for analyzing the massive amounts of information generated by surging worldwide usage of wireless and broadband services. The Data Warehouse Appliance powered by Sun and Greenplum is the industry’s first cost-effective, high-performance super-capacity data warehouse appliance. Purpose-built for high-performance, large-scale data warehousing, the solution integrates best-in-class database, server, and storage components into one easy-to-use, plug-and-play system.

“The Sun Data Warehouse Appliance running Greenplum Database is helping Reliance meet its goal of superior responsiveness in a challenging data environment — one that is characterized by rapid growth and increasing user demand,” said Raj Joshi, VP and Head (Decision Support Systems) at Reliance Communications Limited. “Deploying the joint Greenplum and Sun solution improved our response times and enabled Reliance Communications to improve our data management.”

Reliance Communications Limited is the telecommunications company of the Reliance ADA Group, which is one of India’s largest industrial groups. Reliance Communications is known for its innovative market offerings and practices. As Reliance has grown to more than 40 million subscribers, providing accurate and timely data support and analytics to all parts of the business has become a challenge. Turning around an ad hoc request against historical records could take multiple hours; even loading a day’s worth of data into the system could take up to three hours.

Getting Started with Unstructured Data

From TDWI Article -

To start down this path, you will obviously need to take a more holistic view of your organization’s information and technology architecture to learn what data is available to your end users. You also need to spend time learning what is missing today from the BI environment. Don’t be surprised if people at first cannot articulate their needs in this arena — most people do not believe current tools can support this analysis!

In conjunction with this internal fact-finding, stay abreast of the evolution of “unstructured” content software and service solutions. Although these concepts have been around for some time, some technological developments have emerged only recently to allow some of the more interesting analysis and integration opportunities in this area.

Finally, keep experimenting! The BI market has grown and matured substantially in the last several years, and this is an exciting new area where we can all stretch and investigate. As famous engineer Richard Buckminster Fuller once quipped, “There is no such thing as a failed experiment — only experiments with unexpected outcomes.”