Archive for the ‘Data Mining’ tag
Portrait Software Utilizes Analytics to Provide PA/DM
From Press Release -
Forrester evaluated the top nine predictive analytics and data mining (PA/DM) solution vendors across 53 criteria, grouped into three categories: current offering, product strategy, and market presence. As a leader offering “mature, high-performance, scalable, flexible, and robust PA/DM solutions,” Portrait received the 3rd highest score for Product Strategy and the 6th highest score for Current Offering.
Among the vendor products the Forrester(TM) Wave evaluated were Portrait Customer Analytics, Portrait Uplift Optimizer, and Portrait Self-service Analytics. According to the Forrester(TM) Wave, “Portrait provides a user-friendly, feature-rich PA/DM solution portfolio in support of real-time scoring, interaction optimization, uplift optimization, and campaign management for customer analytics.”
“Powerful customer analytics have always been the core driver of Portrait’s innovative marketing solutions, but analytics itself only takes you so far,” said Luke McKeever, CEO, Portrait Software. “Portrait’s ability to not only incorporate analytics but to action the insights they deliver enables us to provide our customers with highly intelligent solutions that help them operate as a customer-centric organization, differentiating them from their competitors while simultaneously improving their marketing ROI.”
Microsoft Unveils Apps for Crime-Fighting Data Mining
Once again, software is fighting crime. Microsoft unveiled a suite of tools and initiatives for law-enforcement groups “specifically designed to improve public security and safety,” the company said.
..
..
It’s also the latest example of law enforcement officials arming themselves with better technology to help fight crime. The FBI, for instance, said that new database and data-sharing efforts have resulted in solving a number of difficult highway serial killings. Gathering that data is key. That’s why Microsoft this week said it is giving a free tool to INTERPOL called the Computer Online Forensic Evidence Extractor (COFEE), an application that “uses common digital forensics tool to help officers at the scene of the crime.”
The company is working on a mobile version for future release, Richard Domingues Boscovich, senior attorney for Microsoft’s Internet security program, told InternetNews.com in an e-mail.
A larger tool set for large-scale crimes is Microsoft Intelligence Framework, which is aimed at helping intelligence and law enforcement agencies coordinate information to detect and prevent terrorism, and to solve organized and major crime cases. The framework offers tools for storing and analyzing evidence and information across a variety of sources.
From EarthWeb article.
SPSS Rebrands Its Analytical Offerings
The new version of the SPSS modeling product — the erstwhile Clementine — is now known as PASW Modeler 13; its text analysis product (formerly Text Mining for Clementine) is now PASW Text Analytics 13. SPSS says that, over the course of the year, the rest of the SPSS product line will update under the PASW umbrella — including Statistics and Data Collection.
David Vergara, director of product marketing for SPSS, explains that the change was intended to help customers and prospects understand what the products are doing and how each offering pieces together within the broader portfolio.
Aside from the name change, the new versions of SPSS products focus on usability — and not just for data experts. Wettemann says that SPSS has “recognized that moving beyond the data analyst audience is where you get the real power.” PASW Modeler 13 features a drag-and-drop interface, and functionality that will appeal to business users. Two integral updates include a “comments” tool, in which users can flag notes within the software, and automated data preparation. Data automation mitigates human error and avoids common issues in data quality.
From Destination CRM.
Data Mining Moves to HR
For most of its eight-year history, Cataphora has focused on digital sleuthing. The company hunts for statistical signs of fraud. But in the past few years, Cataphora has been dispatching its data miners into a new market: statistical studies of employee performance.
The trend, though early, is unmistakable, and it extends far beyond Redwood City. Number crunching, a staple for decades in the quantifiable domains of engineering and finance, has spread in recent years into marketing and sales. Companies can now model and optimize operations, and can calculate the return on investment on everything from corporate jets to Super Bowl ads. These successes have led to the next math project: the worker. “You have to bring the same rigor you bring to operations and finance to the analysis of people,” says Rupert Bader, director of workforce planning at Microsoft (MSFT).
Such a mission might have been laughable a decade ago. But as the role of computers in the workplace expands, employees leave digital trails detailing their behavior, their schedule, their interests, and expertise. For executives to calculate the return on investment of each worker, their human resources departments are starting to open their doors to the quants.
From Business Week, an insightful article on how HR departments are using data mining and analytics to determine the value of each employee.
The Petabyte BI World – Wired

Sensors everywhere. Infinite storage. Clouds of processors. Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions. Because in the era of big data, more isn’t just more. More is different.
This month’s Wired magazine covers one of the scientific community’s most pressing concerns: the uncontrolled growth of data. That growth, in many directions at once, is nearly killing off theory as more and more work becomes data-driven.

There is a series of articles, ranging from what data miners are digging into today, to elaborate algorithms that predict air ticket prices, to how we can monitor epidemics hour by hour.
Whether or not you are a BI enthusiast, this month’s Wired cover story will challenge all your predictions about science and technology, even if you have a petabyte of data to support them! Read it, like, right now!
From Text Analytics to Data Warehousing
I liked Seth Grimes’s recent article on text analytics accuracy. His article today on Intelligent Enterprise pointed me to the IBM article on IBM® OmniFind™ Analytics Edition, which explains in detail how to extract unstructured data from e-mail, Web pages, news, and blog articles and build a data warehouse from it, unlocking a huge, previously untapped potential.
In recent weeks and months, the focus on unstructured data has been growing as businesses and vendors start to understand its power and how it can be text-mined for the benefit of enterprises. And that’s a good thing.
A must read. Highly Recommended.

Text analytics enables you to extract more business value from unstructured data such as emails, customer relationship management (CRM) records, office documents, or any text-based data. IBM® OmniFind™ Analytics Edition provides rich text analysis capabilities and interactive visualization to enable you to find patterns and trends hidden in large quantities of unstructured information. The text analysis results from OmniFind Analytics Edition are in XML-format and can also be stored, indexed, and queried in a DB2 database. This allows you to incorporate your text analysis results into existing business applications and reporting tools by using regular SQL or SQL/XML queries. This article provides an overview of text analytics with OmniFind Analytics Edition and describes several ways of bringing its analysis results into DB2, in relational or pureXML™ format.
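To make the idea concrete, here is a minimal Python sketch of loading XML analysis results into a relational table and querying them with plain SQL. The XML layout, the `annotation` table, and the helper names are my own assumptions for illustration, not OmniFind's actual output format, and an in-memory SQLite database stands in for DB2.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical analysis output: one <document> per text, with the
# keywords the text analysis step extracted from it.
SAMPLE_XML = """
<documents>
  <document id="1">
    <keyword category="product">router</keyword>
    <keyword category="issue">overheating</keyword>
  </document>
  <document id="2">
    <keyword category="product">router</keyword>
    <keyword category="issue">dropped connection</keyword>
  </document>
  <document id="3">
    <keyword category="product">modem</keyword>
    <keyword category="issue">overheating</keyword>
  </document>
</documents>
"""


def load_annotations(conn, xml_text):
    """Flatten the XML into (doc_id, category, keyword) rows; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation "
        "(doc_id TEXT, category TEXT, keyword TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = [
        (doc.get("id"), kw.get("category"), (kw.text or "").strip())
        for doc in root.iter("document")
        for kw in doc.iter("keyword")
    ]
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def keyword_frequencies(conn, category):
    """Count how many documents mention each keyword in a category."""
    cur = conn.execute(
        "SELECT keyword, COUNT(DISTINCT doc_id) AS docs "
        "FROM annotation WHERE category = ? "
        "GROUP BY keyword ORDER BY docs DESC, keyword",
        (category,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_annotations(conn, SAMPLE_XML)
for keyword, docs in keyword_frequencies(conn, "issue"):
    print(keyword, docs)
```

Once the results sit in an ordinary table, any reporting tool that speaks SQL can count, join, and trend them alongside the structured data.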
..
..
OmniFind Analytics Edition provides the ability to interactively explore and mine the results of text analysis, as well as structured data that is typically associated with unstructured text. For those of you familiar with business intelligence applications, you can think of it as content-centric business intelligence, in that it aggregates the results of text analysis to detect frequencies, correlations, and trends. Typical use cases include:
Analysis of customer contact information (e-mails, chats, problem tickets, contact center notes) for insight into quality or satisfaction issues
Analysis of blogs and wikis for reputation monitoring
Analysis of internal e-mail for compliance violations or for expertise location
Microsoft Sets Sights on Data Mining Dominance
“[We don't] have all the functionality of something like a SAS or an SPSS, because that’s just not our market,” he concedes. It comes down to a difference of scale, Farmer argues: SAS and SPSS typically target larger, more expensive deployments — typically with users well-versed in the usage of their tools. Microsoft is targeting a different kind of data mining consumer: the Excel analyst, for example, who might not have much (if any) experience — with data mining, predictive analytics, or statistical analysis for that matter.
“By the way, I don’t mean to say we can’t hit the high-end. Within Microsoft, we have our own database marketing team. We’re one of the largest companies in the world. We have a huge database marketing team who do classic customer analysis. These guys were all SAS users, but when they joined Microsoft, they started using our tools. The entire process runs on our database, they actually use the Excel [data mining] add-ins to do it. It’s not that there’s nothing they don’t miss, [it's that] they are able to achieve the same business results using our tools.”
Last year, Microsoft released a data mining and predictive analytic add-on for its Excel 2007 product (see http://www.microsoft.com/downloads/details.aspx?FamilyId=7c76e8df-8674-4c3b-a99b-55b17f3c4c51&DisplayLang=en). The add-on, which is similar to Microsoft’s well-known SQL Server BI Accelerator products, integrates natively with Excel 2007. It introduces a new “Data Mining” tab that exposes several pre-built functions, including forecasting, accuracy charting, cross-validation, exception highlighting, category detection, key influencers, shopping basket analysis (the last is a SQL Server 2008-only function) and many others.
From an article on ESJ.
Data Mining Prescribed To Ensure Drug Safety
From Info Week -
This week, WellPoint, one of the nation’s largest health insurers, revealed it’s investing millions of dollars in a three-year project to build such a drug surveillance system in collaboration with the FDA and several academic institutions, including Harvard University, University of Pennsylvania, and the University of North Carolina. The Safety Sentinel System will mine and analyze aggregate claims, lab, and pharmaceutical data from WellPoint’s 35 million members, who generate 1.4 billion “claim lines” of data each year, said Marcus Wilson, president of HealthCore, WellPoint’s medical outcomes research subsidiary, which WellPoint acquired in 2003 and which is overseeing the new project.
MS in Verticals – Buys Predictive Analytics company, Farecast
Seattle P-I’s Venture Blog has the full story from the start to the end.
Farecast was started by University of Washington computer scientist Oren Etzioni, initially bankrolled by Madrona, built with people from local companies such as Alaska Airlines and AdRelevance and, ultimately, acquired by Microsoft.
Though Farecast had multiple bidders, McIlwain said Microsoft was a good fit since the two companies had worked together in the past and had a similar vision for online search. The proximity of the two companies also played a part, he said.
The acquisition follows the merger of Kayak.com and SideStep, the market leader in next generation travel search. That deal led to new opportunities for Farecast, including discussions with Microsoft which heated up in the past 90 days.
“That consolidation presented opportunities for Farecast … partly differentiated because of their predictive capabilities but also because of who they might have been able to align with in the industry to be a strong and differentiated number two, hoping some day to overtake and become number one,” he said.
Madrona has produced a number of hits recently, with the sales of ShareBuilder, World Wide Packets and iConclude.
Also, a quick analysis from The Motley Fool on this buy -
Microsoft needs more deals like this one, especially if the Microhoo deal comes undone, and the software giant has the means to go shopping. I’ve suggested that Microsoft pursue potential buyout candidates like The Knot (Nasdaq: KNOT) and Bankrate (Nasdaq: RATE) for the same reason that Farecast works. Whether it’s wedding planning, home refinancing, or booking that flight to visit your parents in Chicago, this is the quality traffic that Microsoft and Yahoo! lack right now.
Stanford students working on Netflix Algorithms
Anand Rajaraman, the co-founder of Kosmix, also teaches Data Mining at Stanford. Here’s an interesting note from his blog.
Some of his students are working on algorithms for the ongoing Netflix “Better Recommendation Logic” Prize of $1 million. Read it!
Here’s how the competition works. Netflix has provided a large data set that tells you how nearly half a million people have rated about 18,000 movies. Based on these ratings, you are asked to predict the ratings of these users for movies in the set that they have not rated. The first team to beat the accuracy of Netflix’s proprietary algorithm by a certain margin wins a prize of $1 million!
Different student teams in my class adopted different approaches to the problem, using both published algorithms and novel ideas. Of these, the results from two of the teams illustrate a broader point. Team A came up with a very sophisticated algorithm using the Netflix data. Team B used a very simple algorithm, but they added in additional data beyond the Netflix set: information about movie genres from the Internet Movie Database (IMDB). Guess which team did better?
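To get a feel for the task, here is a small Python sketch of a naive baseline: predict each movie's average rating and score the predictions by root-mean-square error (RMSE), which is how the Netflix Prize measured accuracy. The toy ratings and function names are made up for illustration; this is neither Netflix's algorithm nor either student team's approach.

```python
import math
from collections import defaultdict


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def fit_movie_means(ratings):
    """ratings: list of (user, movie, rating). Returns (global_mean, {movie: mean})."""
    if not ratings:
        raise ValueError("need at least one rating")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1
    global_mean = sum(totals.values()) / sum(counts.values())
    movie_means = {movie: totals[movie] / counts[movie] for movie in totals}
    return global_mean, movie_means


def predict(movie, global_mean, movie_means):
    """Predict a movie's mean rating; fall back to the global mean for unseen movies."""
    return movie_means.get(movie, global_mean)


# Toy training ratings (hypothetical): (user, movie, stars)
train = [("u1", "m1", 5), ("u2", "m1", 3), ("u1", "m2", 2), ("u3", "m2", 4)]
# Held-out ratings the model has not seen
held_out = [("u3", "m1", 5), ("u2", "m2", 2)]

global_mean, movie_means = fit_movie_means(train)
predictions = [predict(movie, global_mean, movie_means) for _u, movie, _r in held_out]
print(rmse(predictions, [r for _u, _m, r in held_out]))
```

A baseline like this ignores who is doing the rating. Extra signals, such as the IMDB genre data one team added, would enter as additional inputs to `predict`.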