The Business Intelligence Blog

Slicing Business Dicing Intelligence

Archive for the ‘Data Warehousing’ tag

The Flaws of the Classic Data Warehouse Architecture  

The classic data warehouse architecture (CDWA) has served us well for the last twenty years. In fact, up to five years ago we had good reasons to use this architecture. The state of database, ETL, and reporting technology did not really allow us to develop something else. All the tools were aimed at supporting the CDWA. But the question right now is: twenty years later, is this still the right architecture? Is this the best possible architecture we can come up with, especially if we consider the new demands and requirements, and if we look at new technologies available in the market? My answer would be no! To me, we are slowly reaching the end of an era. An era where the CDWA was king. It is time for change. This article is the first in a series on the flaws of the CDWA and on an alternative architecture, one that fits the needs and wishes of most organizations for (hopefully) the next twenty years. Let’s start by describing some of the CDWA flaws.

The first flaw is related to the concept of operational business intelligence. More and more, organizations show interest in supporting operational business intelligence. What this means is that the reports that the decision makers use have to include more up-to-date data. Refreshing the data from the source systems once a day is not enough for those users. Decision makers who are quite close to the business processes especially need 100% up-to-date data. But how do you do this? You don’t have to be a technological wizard to understand that, if data has to be copied four or five times from one data storage layer to another on its way from the production databases to the reports, doing this in just a few seconds becomes close to impossible. We have to simplify the architecture to be able to support operational business intelligence. Bottom line: we have to remove data storage layers and minimize the number of copy steps.

Great read from BEye Network. Part 1 and Part 2.
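
To put rough numbers on the copy-step problem described in the excerpt, here is a minimal sketch. The layer names and refresh intervals are invented for illustration; they are not taken from the article.

    # Illustrative only: worst-case staleness of a report when data must be
    # copied through several storage layers, each on its own refresh schedule.
    # Layer names and intervals are assumptions, not measurements.

    refresh_minutes = {
        "staging area": 60,        # hourly extract from the production databases
        "data warehouse": 1440,    # nightly load
        "data mart": 1440,         # nightly refresh
        "report cache": 60,        # hourly report refresh
    }

    # In the worst case, every hop can add its full refresh interval to the delay.
    worst_case = sum(refresh_minutes.values())
    print(f"Worst-case staleness: {worst_case} minutes (~{worst_case / 60:.0f} hours)")

    # Removing the data mart and report cache layers, as the article argues,
    # shrinks that bound accordingly.
    simplified = refresh_minutes["staging area"] + refresh_minutes["data warehouse"]
    print(f"With two layers removed: {simplified} minutes (~{simplified / 60:.0f} hours)")

Operational BI pushes that bound toward seconds, which is why the argument is for fewer layers rather than faster copies.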


Written by Guru Kirthigavasan

April 7th, 2009 at 6:08 pm

DW Appliances – Primer  

TDWI has a great article on data warehouse appliances, which offer an all-in-one solution for enterprises. Neat read.

In the BI world, the data warehousing appliance extends this metaphor to the enterprise data center with the vision of a high-performance database system that satisfies business intelligence (decision support) requirements and includes the server hardware, network interconnect, database software, and selected load, workload, scheduling, and administration tools needed for quick installation, loading, and ongoing monitoring.

Enterprise data warehousing appliances are popular because they get the job done in many data scenarios. However, in spite of their significant success, data warehousing appliances are not a one-size-fits-all proposition, nor, as any vendor will tell you, are they appropriate for every workload profile or data warehousing challenge. A diversity of appliance vendors has emerged, including appliance offerings from large, established information technology (IT) stalwarts such as HP, IBM, Oracle, and Microsoft. Teradata objects to being called an “appliance,” though it also objects to not being named as an IT stalwart relevant to the appliance market.

Best-of-breed innovators continue to contribute to market dynamics. Key differentiators, which you should examine as a prospective buyer of a data warehousing appliance, include the number of successful installed customers in production willing to speak about their experiences (both positive and negative); the details of the technology itself (whether the database is open source and how it is customized; whether the server, disk, and network are commodity components and how they can be customized; the breadth and maturity of complementary tools such as query and reporting, ETL, and data quality solutions); and the price of acquisition and cost of operation. Published results from public benchmarks (such as tpc.org) are also useful for starting a conversation about performance and price, though don’t rely exclusively on the benchmark “winner,” since results are frequently updated.


Written by Guru Kirthigavasan

March 24th, 2009 at 8:25 pm

The Petabyte BI World – Wired  

Sensors everywhere. Infinite storage. Clouds of processors. Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions. Because in the era of big data, more isn’t just more. More is different.

This month’s Wired magazine covers one of the scientific community’s most pressing concerns: the uncontrollable growth of data. This growth, in so many directions, is all but killing theories, as everything becomes more and more data driven.

There is a series of articles, ranging from what data miners are digging into today, to elaborate algorithms that predict air ticket prices, to how we can monitor epidemics hour by hour.

Whether you are a BI enthusiast or not, this month’s Wired cover story will challenge all your predictions about science and technology, even if you have a petabyte of data to support them! Read it, like, right now!


Written by Guru Kirthigavasan

July 15th, 2008 at 5:58 am

Ireland is facing a data tsunami  

From Silicon Republic, an interesting article on the exponential growth of data -

Currently, the amount of data worldwide is doubling every 11 months – by 2010 it will double every 11 hours.

Ireland stands in the direct path of this tidal wave of data, warns a senior executive with business intelligence (BI) giant SAS.

Dr John Brocklebank, director of analytic solutions at privately held software giant SAS Institute, believes Irish firms are ill-equipped to deal with this rapid growth in the world’s data.

He says the challenge for Irish companies is to capture and exploit the 1-2pc of data that is relevant to their decision-making processes and strategic objectives.

“We know that, on average, managers spend two hours a day looking for data and that more than half of this is useless to their decision-making process,” Brocklebank explains.

“Most frightening is that 42pc of managers say they accidentally use the wrong information to make a decision at least once a week. So never mind trying to deal with the data in its entirety; it needs to be made meaningful and accurate to support business decisions.”


Written by Guru Kirthigavasan

June 20th, 2008 at 7:28 am

SmartStream Banks on Informatica to Accelerate Customer ROI  

From the Press Release -

Informatica Corporation (Nasdaq: INFA), the leading independent provider of data integration software, today announced that SmartStream Technologies, a leading provider of software to the financial services industry, is OEMing the Informatica PowerCenter data integration platform as part of its flagship Transaction Lifecycle Management (TLM) solutions.

In making Informatica PowerCenter the foundation of its SmartStream TLM Business Integration (TLM BI) offering, SmartStream is empowering those customers with complex data environments to accelerate the return on investment of their TLM deployments through the high-performance and cost-effective integration of data involved in transaction cycles.

“The increasing drive to streamline global banking practices means our software needs to manage highly complex and rapid transactions across platforms and different banks. By using Informatica, rather than continually creating bespoke data interfaces, we can enable a faster ROI while freeing our professional services teams to provide more value to customers,” said Neil Vernon, head of SmartStream’s Product Management Group. “We selected Informatica to power TLM BI following an evaluation where they scored highest against our key criteria of usability, reusability and performance. In addition, it was critical that TLM BI have the focus and support of a recognized best-of-breed vendor such as Informatica.”


Written by Guru Kirthigavasan

June 17th, 2008 at 6:02 am

Teradata Improves Analytics for Business Users  

By Anshu Shrivastava of TMCNet, an article on the new Teradata Warehouse Miner.

The technology of data mining discovers patterns in customer, financial and operational data that can provide valuable business insights, according to Teradata.

The newest enhancements to Teradata Warehouse Miner are supported by the recently announced SAS Scoring Accelerator for Teradata. Keith Collins, CTO at SAS, said that Teradata’s new functionality and the recently released SAS Scoring Accelerator are integrated and complementary solutions.

Teradata’s officials pointed out that an initial benchmark of the SAS Scoring Accelerator for Teradata demonstrated the ability to process records “45 times faster” than the traditional scoring method.

Moreover, the SAS Scoring Accelerator for Teradata also eliminates the need for manual translation of the SAS scoring code into SQL, or structured query language.
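
To illustrate the general idea of in-database scoring (evaluating the model inside the database as SQL instead of extracting rows to a client tool), here is a minimal sketch using Python’s built-in sqlite3 module; the table, columns, and scoring formula are invented and have nothing to do with the actual SAS or Teradata implementations.

    import sqlite3

    # Toy customer table; in a real warehouse this would be millions of rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, recency REAL, frequency REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 0.9, 0.2), (2, 0.1, 0.8), (3, 0.5, 0.5)],
    )

    # The (made-up) scoring model is expressed as SQL and evaluated inside the
    # database, so only the small scored result set travels back to the client.
    rows = conn.execute(
        """
        SELECT id,
               0.6 * recency + 0.4 * frequency AS score,
               CASE WHEN 0.6 * recency + 0.4 * frequency >= 0.5
                    THEN 'high' ELSE 'low' END AS segment
        FROM customers
        ORDER BY score DESC
        """
    ).fetchall()

    for customer_id, score, segment in rows:
        print(customer_id, round(score, 2), segment)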


Written by Guru Kirthigavasan

June 14th, 2008 at 6:36 pm

From Text Analytics to Data Warehousing  

I liked Seth Grimes’s recent article about text analytics accuracy. His article today on Intelligent Enterprise pointed me to the IBM article on IBM® OmniFind™ Analytics Edition, which talks in detail about extracting unstructured data from e-mail, Web pages, news, and blog articles, and building a data warehouse out of it to unlock the huge potential that was previously untapped.

In recent months and weeks, the focus on unstructured data has been growing as businesses and vendors start to understand the power of this unstructured data and how it can be text mined and used to the benefit of the enterprise. And that’s a good thing.

A must read. Highly Recommended.

IBM OmniFind Analytics Edition

Text analytics enables you to extract more business value from unstructured data such as emails, customer relationship management (CRM) records, office documents, or any text-based data. IBM® OmniFind™ Analytics Edition provides rich text analysis capabilities and interactive visualization to enable you to find patterns and trends hidden in large quantities of unstructured information. The text analysis results from OmniFind Analytics Edition are in XML-format and can also be stored, indexed, and queried in a DB2 database. This allows you to incorporate your text analysis results into existing business applications and reporting tools by using regular SQL or SQL/XML queries. This article provides an overview of text analytics with OmniFind Analytics Edition and describes several ways of bringing its analysis results into DB2, in relational or pureXML™ format.
..
..
OmniFind Analytics Edition provides the ability to interactively explore and mine the results of text analysis, as well as structured data that is typically associated with unstructured text. For those of you familiar with business intelligence applications, you can think of it as content-centric business intelligence, in that it aggregates the results of text analysis to detect frequencies, correlations, and trends. Typical use cases include:

* Analysis of customer contact information (e-mails, chats, problem tickets, contact center notes) for insight into quality or satisfaction issues
* Analysis of blogs and wikis for reputation monitoring
* Analysis of internal e-mail for compliance violations or for expertise location
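
The storage-and-query pattern described in the excerpt (XML analysis results landed in a relational table and aggregated with ordinary SQL) can be sketched roughly as follows. This uses Python’s sqlite3 and ElementTree as stand-ins; it is not DB2, pureXML, or OmniFind code, and the XML schema is invented for illustration.

    import sqlite3
    import xml.etree.ElementTree as ET

    # Invented per-document output from a text-analytics engine.
    analysis_xml = [
        '<doc id="1"><sentiment>negative</sentiment><topic>billing</topic></doc>',
        '<doc id="2"><sentiment>positive</sentiment><topic>support</topic></doc>',
        '<doc id="3"><sentiment>negative</sentiment><topic>billing</topic></doc>',
    ]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_analysis (doc_id TEXT, sentiment TEXT, topic TEXT)")

    # Shred the XML into relational columns so reporting tools can reach it with SQL.
    for xml_text in analysis_xml:
        doc = ET.fromstring(xml_text)
        conn.execute(
            "INSERT INTO doc_analysis VALUES (?, ?, ?)",
            (doc.get("id"), doc.findtext("sentiment"), doc.findtext("topic")),
        )

    # Content-centric BI: aggregate the analysis results like any other fact table.
    query = (
        "SELECT topic, COUNT(*) FROM doc_analysis "
        "WHERE sentiment = 'negative' GROUP BY topic"
    )
    for topic, negatives in conn.execute(query):
        print(topic, negatives)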


Written by Guru Kirthigavasan

May 18th, 2008 at 6:29 pm

Sybase and Sun create the World’s Largest Data Warehouse  

One petabyte of mixed relational and unstructured data. That’s neat.

More from the Sybase Press Release -

Sybase, Inc. (NYSE: SY), the largest enterprise software and services company exclusively focused on managing and mobilizing information, today announced that the Sybase® IQ analytics server has set a new Guinness World Record™ by powering the world’s largest data warehouse on a Sun™ SPARC® Enterprise M9000 server. This accomplishment was achieved using Sybase IQ, BMMsoft ServerSM and the Sun Microsystems® Data Warehouse Reference Architecture. This winning combination enables more data to be stored in less space, searched and analyzed in less time, while consuming 91 percent less energy and generating less heat and carbon dioxide than conventional solutions.

Powered by the category-leading column-oriented database Sybase IQ, the data warehouse is certified to support a record-breaking one petabyte of mixed relational and unstructured data, more than 34 times larger than the largest industry standard benchmark and twice the size of the largest commercial data warehouse known to date. In total, the data warehouse contains six trillion rows of transactional data and more than 185 million content-searchable documents, such as emails, reports, spreadsheets and other multimedia objects.
..
..
Designed from the ground up as an analytics server, Sybase IQ produces its impressive results thanks to a unique architecture combining a column-oriented data structure with patented indexing and a scalable grid. Sybase IQ offers extraordinarily high performance at a lower cost than a traditional, row-oriented database. And, unlike traditional row-based data warehouses, Sybase IQ compresses stored data by up to 70 percent, creating optimal and elegant analytics solutions.

“The results of this benchmark showcase Sybase IQ’s capabilities to handle real-world scenarios, querying vast amounts of data representing the transactions processed across the worldwide financial trading networks over multiple years,” said Francois Raab, president, InfoSizing, the consulting firm that oversaw the benchmarking of the record. “Sybase IQ has proven its production strength in handling the volume of multimedia documents representative of the electronic communication between half a million financial traders.”
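
To make the column-oriented argument concrete, here is a small self-contained sketch with invented data, using zlib as a crude stand-in for Sybase IQ’s proprietary compression and indexing; it only illustrates why storing values column by column tends to compress better than storing the same rows record by record.

    import random
    import zlib

    random.seed(0)

    # Invented transaction rows: (trade date, ticker, quantity traded).
    rows = [("2008-05-13", "SY", random.randint(1, 1000)) for _ in range(10_000)]

    # Row-oriented layout: each record's fields are stored next to each other.
    row_layout = "\n".join(",".join(map(str, r)) for r in rows).encode()

    # Column-oriented layout: all values of one column are stored together, so
    # the highly repetitive date and ticker columns collapse almost entirely.
    columns = list(zip(*rows))
    column_layout = b"\n".join(",".join(map(str, c)).encode() for c in columns)

    print("row layout:    raw", len(row_layout), "compressed", len(zlib.compress(row_layout)))
    print("column layout: raw", len(column_layout), "compressed", len(zlib.compress(column_layout)))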


Data Warehousing on a Shoestring Budget  

TDWI is running a series on developing and deploying data warehouses frugally. It’s a three-part series. Read Part 1 and Part 2.

Although it may seem difficult, you can make choices that allow you to realize the benefits of data warehousing while also minimizing costs. By balancing technology and carefully positioning your business, your organization can quickly create cost-effective solutions using data warehousing technologies.

There are a few simple rules to help you develop a data warehouse on a shoestring budget:

* Use what you have
* Use what you know
* Use what is free
* Buy only what you have to
* Think small and build in phases
* Use each phase to finance or justify the remainder of the projects

It’s also a must read for businesses that have ample business sponsorship and enormous resources. Tough times in the marketplace like these call for an economical way of staying ahead of the business curve. And that’s exactly the point of this series.

I like Nathan Rawling’s detailed approach to this topic.


Written by Guru Kirthigavasan

May 13th, 2008 at 9:24 pm

Protegrity & Teradata announce DW encryption performance  

Press Release – Protegrity and Teradata today announce unmatched data warehouse encryption performance

Protegrity Corporation, the leading provider of Data Security Management solutions, and Teradata, the global leader in Enterprise Data Warehousing, today announced new cryptography performance of over 6 million decryptions and over 9 million encryptions per second, enabling customers to maximize data protection while minimizing the impact on business operations.

“Our research shows that businesses worry that data protection may impact application performance. These test results show that the partnership between Teradata and Protegrity delivers proven enterprise solutions that help customers protect data and achieve regulatory compliance with industry-leading system performance,” said Gordon Rapkin, president and chief executive officer of Protegrity.

The Protegrity Defiance® DPS uses Teradata User Defined Functions (UDFs) to embed encryption/decryption functionality in the database. Teradata’s high-performance UDF implementation and parallel architecture provide highly efficient execution, and then scale that performance linearly as the system grows in size.
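
As a rough, generic illustration of the UDF idea (database-resident functions that wrap encryption and decryption so protection happens inside SQL), here is a toy sketch using Python’s sqlite3 and the third-party cryptography package’s Fernet recipe. It is not Protegrity or Teradata code, and a real deployment would also need key management, which is omitted here.

    import sqlite3
    from cryptography.fernet import Fernet  # third-party: pip install cryptography

    # Toy key handling; a real system would use a key-management service.
    fernet = Fernet(Fernet.generate_key())

    conn = sqlite3.connect(":memory:")

    # Register encrypt/decrypt as user-defined functions so the protection is
    # applied inside SQL statements rather than in every application.
    conn.create_function("encrypt", 1, lambda value: fernet.encrypt(str(value).encode()))
    conn.create_function("decrypt", 1, lambda token: fernet.decrypt(token).decode())

    conn.execute("CREATE TABLE accounts (id INTEGER, ssn_enc BLOB)")
    conn.execute("INSERT INTO accounts VALUES (1, encrypt('123-45-6789'))")

    # Ciphertext at rest; plaintext only where the query explicitly decrypts it.
    print(conn.execute("SELECT ssn_enc FROM accounts").fetchone()[0][:20], "...")
    print(conn.execute("SELECT decrypt(ssn_enc) FROM accounts").fetchone()[0])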

“High performance data protection enables our customers to confidently meet their privacy and compliance obligations,” said Randy Lea, vice president, Teradata products and services. “The efficiency of this data protection solution frees our customers from the technical challenges to focus on the value of business analytics. This data, rich with insight, provides a better understanding of consumers.”

The test included the Defiance® Data Protection System utilizing the industry-standard Advanced Encryption Standard (AES) and Teradata 12 on a six-node (12 Intel® Xeon® processors) Teradata 5550 platform.

Since 2004, Protegrity and Teradata have collaborated on enterprise-level data encryption and management. In June 2005, the companies announced a global partnership to deliver database security for Teradata customers.

ABOUT TERADATA
Teradata Corporation (NYSE: TDC) is the world’s largest company solely focused on raising intelligence through data warehousing and enterprise analytics. Teradata is in more than 60 countries and on the Web at www.teradata.com.

ABOUT PROTEGRITY
Protegrity delivers centralized data security management solutions that protect sensitive information from acquisition to deletion across the enterprise. Protegrity’s customers maintain complete protection over their data and business by employing software and solutions specifically designed to encrypt data, safeguard web applications, and manage and report on security policy.

The company’s singular focus is on developing solutions that protect data. Protegrity employees are security technology specialists with deep expertise in encryption, key management, web application firewalls and security policy in distributed environments. Maximize security with minimal business impact with Protegrity’s Defiance® Suite, the high performance, transparent solution optimized for the dynamic enterprise.

To learn more, visit www.protegrity.com or call 203.326.7200


Written by Guru Kirthigavasan

May 13th, 2008 at 6:43 am