The Business Intelligence Blog

Slicing Business Dicing Intelligence

Archive for the ‘Data Mining’ tag

101 – Data Mining and Predictive Analytics  

In today’s world, where the mining of text, Web and media (unstructured data) is combined with structured data mining, the term information mining is a more appropriate label. By mining a combination of these, companies are able to make the best use of structured data, unstructured text and social media. The static and stagnant predictive models of the past don’t work well in the world we live in today. Predictive analytics should be agile enough to adapt to, and monetize, quickly changing customer behaviors, which are often identified online and through social networks.

Better integration of data mining software with the source data at one end and with the information consumption software at the other end has led to improvement in the integration of predictive analytics with day-to-day business. Even though there haven’t been significant advancements in predictive algorithms, the ability to apply large data sets to models and the ability to enable better interaction with business has led to improvements in the overall outcome of the exercise.

There is a great introduction to the world of data mining and predictive analytics here.

The article has

no responses yet

Written by Guru Kirthigavasan

January 24th, 2012 at 7:31 am

The Jargon of the Novel, Computed  

Scholars in the growing field of digital humanities can tackle this question by analyzing enormous numbers of texts at once. When books and other written documents are gathered into an electronic corpus, one “subcorpus” can be compared with another: all the digitized fiction, for instance, can be stacked up against other genres of writing, like news reports, academic papers or blog posts.

One such research enterprise is the Corpus of Contemporary American English, or COCA, which brings together 425 million words of text from the past two decades, with equally large samples drawn from fiction, popular magazines, newspapers, academic texts and transcripts of spoken English. The fiction samples cover short stories and plays in literary magazines, along with the first chapters of hundreds of novels from major publishers. The compiler of COCA, Mark Davies at Brigham Young University, has designed a freely available online interface that can respond to queries about how contemporary language is used. Even grammatical questions are fair game, since every word in the corpus has been tagged with a part of speech.
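The subcorpus comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not COCA's interface or method: the function names, the simple tokenizer, and the toy samples are all assumptions made up for the example. It ranks words by how much more frequent they are in one subcorpus than in another.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def relative_frequencies(texts):
    """Return each word's share of all tokens in a subcorpus."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def distinctive_words(target, reference, top=3):
    """Rank words in `target` by how much more frequent they are than in `reference`."""
    target_freq = relative_frequencies(target)
    reference_freq = relative_frequencies(reference)
    # Words absent from the reference get a small floor so the ratio stays finite.
    floor = min(reference_freq.values()) / 2
    ratios = {w: f / reference_freq.get(w, floor) for w, f in target_freq.items()}
    return sorted(ratios, key=ratios.get, reverse=True)[:top]

# Made-up toy samples, for illustration only (hypothetical, not corpus data).
fiction = ["She grimaced and shrugged.", "He shrugged, then grimaced again."]
news = ["The council said the budget passed.", "Officials said the vote was close."]

print(distinctive_words(fiction, news, top=2))
```

A real corpus would need proper tokenization and part-of-speech tags, and a plain frequency ratio is only one of several ways to measure how distinctive a word is.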

More…

The article has

one response

Written by Guru Kirthigavasan

July 31st, 2011 at 6:18 am

Portrait Software Utilizes Analytics to Provide PA/DM  

From Press Release -

Forrester evaluated the top nine predictive analytics and data mining (PA/DM) solution vendors across 53 criteria, segmenting them into three categories: current offering, product strategy, and market presence. As a leader offering “mature, high-performance, scalable, flexible, and robust PA/DM solutions,” Portrait received the 3rd highest score for Product Strategy and the 6th highest score for Current Offering.

Among the vendor products the Forrester(TM) Wave evaluated were Portrait Customer Analytics, Portrait Uplift Optimizer, and Portrait Self-service Analytics. According to the Forrester(TM) Wave, “Portrait provides a user-friendly, feature-rich PA/DM solution portfolio in support of real-time scoring, interaction optimization, uplift optimization, and campaign management for customer analytics.”

“Powerful customer analytics have always been the core driver of Portrait’s innovative marketing solutions, but analytics itself only takes you so far,” said Luke McKeever, CEO, Portrait Software. “Portrait’s ability to not only incorporate analytics but to action the insights they deliver enables us to provide our customers with highly intelligent solutions that help them operate as a customer-centric organization, differentiating them from their competitors while simultaneously improving their marketing ROI.”

The article has

one response

Written by Guru Kirthigavasan

February 16th, 2010 at 2:06 am

Microsoft Unveils Apps for Crime-Fighting Data Mining  

Once again, software is fighting crime. Microsoft unveiled a suite of tools and initiatives for law-enforcement groups “specifically designed to improve public security and safety,” the company said.
..
..
It’s also the latest example of law enforcement officials arming themselves with better technology to help fight crime. The FBI, for instance, said that new database and data-sharing efforts have resulted in solving a number of difficult highway serial killings.

Gathering that data is key. That’s why Microsoft this week said it is giving a free tool to INTERPOL called the Computer Online Forensic Evidence Extractor (COFEE), an application that “uses common digital forensics tools to help officers at the scene of the crime.”

The company is working on a mobile version for future release, Richard Domingues Boscovich, senior attorney for Microsoft’s Internet security program, told InternetNews.com in an e-mail.

A larger tool set for large-scale crimes is the Microsoft Intelligence Framework, which is aimed at helping intelligence and law enforcement agencies coordinate information to detect and prevent terrorism, and to solve organized and major crime cases. The framework offers tools for storing and analyzing evidence and information across a variety of sources.

From an EarthWeb article.

The article has

2 responses

Written by Guru Kirthigavasan

April 22nd, 2009 at 8:15 am

SPSS Rebrands Its Analytical Offerings  

The new version of the SPSS modeling product — the erstwhile Clementine — is now known as PASW Modeler 13; its text analysis product (formerly Text Mining for Clementine) is now PASW Text Analytics 13. SPSS says that, over the course of the year, the rest of the SPSS product line will update under the PASW umbrella — including Statistics and Data Collection.

David Vergara, director of product marketing for SPSS, explains that the change was intended to help customers and prospects understand what the products are doing and how each offering pieces together within the broader portfolio.

Aside from the name change, the new versions of SPSS products focus on usability — and not just for data experts. Wettemann says that SPSS has “recognized that moving beyond the data analyst audience is where you get the real power.” PASW Modeler 13 features a drag-and-drop interface, and functionality that will appeal to business users. Two integral updates include a “comments” tool, in which users can flag notes within the software, and automated data preparation. Data automation mitigates human error and avoids common issues in data quality.

From Destination CRM.

The article has

no responses yet

Written by Guru Kirthigavasan

April 14th, 2009 at 6:11 am

Data Mining Moves to HR  

For most of its eight-year history, Cataphora has focused on digital sleuthing. The company hunts for statistical signs of fraud. But in the past few years, Cataphora has been dispatching its data miners into a new market: statistical studies of employee performance.

The trend, though early, is unmistakable, and it extends far beyond Redwood City. Number crunching, a staple for decades in the quantifiable domains of engineering and finance, has spread in recent years into marketing and sales. Companies can now model and optimize operations, and can calculate the return on investment on everything from corporate jets to Super Bowl ads. These successes have led to the next math project: the worker. “You have to bring the same rigor you bring to operations and finance to the analysis of people,” says Rupert Bader, director of workforce planning at Microsoft (MSFT).

Such a mission might have been laughable a decade ago. But as the role of computers in the workplace expands, employees leave digital trails detailing their behavior, their schedule, their interests, and expertise. For executives to calculate the return on investment of each worker, their human resources departments are starting to open their doors to the quants.

From Business Week, an insightful article on how HR determines the value of each employee using data mining and analytics.

The article has

one response

Written by Guru Kirthigavasan

March 22nd, 2009 at 7:44 am

The Petabyte BI World – Wired  

Sensors everywhere. Infinite storage. Clouds of processors. Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions. Because in the era of big data, more isn’t just more. More is different.

This month’s Wired magazine covers one of the scientific community’s most important growing concerns: the uncontrollable growth of data. This growth in many directions is nearly killing theories, as everything becomes more and more data-driven.

There is a series of articles ranging from what data miners are digging into today, to elaborate algorithms that predict air ticket prices, to how we can monitor epidemics hour by hour.

Whether or not you are a BI enthusiast, this month’s Wired cover story will challenge all your predictions about science and technology, even if you have a petabyte of data to support them!! Read it, like, right now!!

The article has

4 responses

Written by Guru Kirthigavasan

July 15th, 2008 at 5:58 am

From Text Analytics to Data Warehousing  

I liked the recent article by Seth Grimes, which talks about text analytics accuracy. His article today on Intelligent Enterprise pointed me to the IBM article on IBM® OmniFind™ Analytics Edition, which talks in detail about extracting unstructured data from e-mail, Web pages, news and blog articles and building a data warehouse out of it to unlock huge, previously untapped potential.

In recent weeks and months, the focus on unstructured data has been growing as businesses and vendors start to understand the power of this data and how it can be text mined and used to the benefit of enterprises. And it’s a good thing.

A must read. Highly Recommended.

IBM OmniFind Analytics Edition

Text analytics enables you to extract more business value from unstructured data such as emails, customer relationship management (CRM) records, office documents, or any text-based data. IBM® OmniFind™ Analytics Edition provides rich text analysis capabilities and interactive visualization to enable you to find patterns and trends hidden in large quantities of unstructured information. The text analysis results from OmniFind Analytics Edition are in XML-format and can also be stored, indexed, and queried in a DB2 database. This allows you to incorporate your text analysis results into existing business applications and reporting tools by using regular SQL or SQL/XML queries. This article provides an overview of text analytics with OmniFind Analytics Edition and describes several ways of bringing its analysis results into DB2, in relational or pureXML™ format.
..
..
OmniFind Analytics Edition provides the ability to interactively explore and mine the results of text analysis, as well as structured data that is typically associated with unstructured text. For those of you familiar with business intelligence applications, you can think of it as content-centric business intelligence, in that it aggregates the results of text analysis to detect frequencies, correlations, and trends. Typical use cases include:

Analysis of customer contact information (e-mails, chats, problem tickets, contact center notes) for insight into quality or satisfaction issues
Analysis of blogs and wikis for reputation monitoring
Analysis of internal e-mail for compliance violations or for expertise location
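
The idea of aggregating text analysis results to detect frequencies and trends can be sketched roughly as follows. This is a generic Python illustration under stated assumptions, not OmniFind's API or output format: the sample notes, the category keywords, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical contact-center notes as (month, text) pairs; not real data.
notes = [
    ("2008-03", "Customer reports billing error on invoice"),
    ("2008-03", "Login problem resolved"),
    ("2008-04", "Billing error again, customer unhappy"),
    ("2008-04", "Another billing complaint about late fees"),
]

# Hypothetical issue categories and the keywords that signal them.
categories = {
    "billing": {"billing", "invoice", "fees"},
    "access": {"login", "password"},
}

def categorize(text):
    """Return the set of categories whose keywords appear in the text."""
    words = set(text.lower().replace(",", " ").split())
    return {name for name, keywords in categories.items() if words & keywords}

def monthly_counts(records):
    """Count, per month, how many notes mention each category."""
    counts = defaultdict(Counter)
    for month, text in records:
        counts[month].update(categorize(text))
    return counts

trend = monthly_counts(notes)
for month in sorted(trend):
    print(month, dict(trend[month]))
```

Keyword matching stands in here for real text analysis; the point is only that once each document carries extracted annotations, ordinary aggregation over time surfaces rising or falling issues.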

The article has

one response

Written by Guru Kirthigavasan

May 18th, 2008 at 6:29 pm

Microsoft Sets Sights on Data Mining Dominance  

“[We don't] have all the functionality of something like a SAS or an SPSS, because that’s just not our market,” he concedes. It comes down to a difference of scale, Farmer argues: SAS and SPSS typically target larger, more expensive deployments — typically with users well-versed in the usage of their tools. Microsoft is targeting a different kind of data mining consumer: the Excel analyst, for example, who might not have much (if any) experience — with data mining, predictive analytics, or statistical analysis for that matter.

“By the way, I don’t mean to say we can’t hit the high-end. Within Microsoft, we have our own database marketing team. We’re one of the largest companies in the world. We have a huge database marketing team who do classic customer analysis. These guys were all SAS users, but when they joined Microsoft, they started using our tools. The entire process runs on our database, they actually use the Excel [data mining] add-ins to do it. It’s not that there’s nothing they don’t miss, [it's that] they are able to achieve the same business results using our tools.”

Last year, Microsoft released a data mining and predictive analytic add-on for its Excel 2007 product (see http://www.microsoft.com/downloads/details.aspx?FamilyId=7c76e8df-8674-4c3b-a99b-55b17f3c4c51&DisplayLang=en). The add-on, which is similar to Microsoft’s well-known SQL Server BI Accelerator products, integrates natively with Excel 2007. It introduces a new “Data Mining” tab that exposes several pre-built functions, including forecasting, accuracy charting, cross-validation, exception highlighting, category detection, key influencers, shopping basket analysis (the last is a SQL Server 2008-only function) and many others.

From an article on ESJ.

The article has

no responses yet

Written by Guru Kirthigavasan

May 7th, 2008 at 6:15 pm

Data Mining Prescribed To Ensure Drug Safety  

From Info Week -

This week, WellPoint, one of the nation’s largest health insurers, revealed it’s investing millions of dollars in a three-year project to build such a drug surveillance system in collaboration with the FDA and several academic institutions, including Harvard University, University of Pennsylvania, and the University of North Carolina. The Safety Sentinel System will mine and analyze aggregate claims, lab, and pharmaceutical data from WellPoint’s 35 million members, who generate 1.4 billion “claim lines” of data each year, said Marcus Wilson, president of HealthCore, WellPoint’s medical outcomes research subsidiary, which WellPoint acquired in 2003 and which is overseeing the new project.

The article has

no responses yet

Written by Guru Kirthigavasan

April 21st, 2008 at 6:32 pm