Archive for the ‘Text analytics’ tag
SPSS Rebrands Its Analytical Offerings
The new version of the SPSS modeling product, the erstwhile Clementine, is now known as PASW Modeler 13; its text analysis product (formerly Text Mining for Clementine) is now PASW Text Analytics 13. SPSS says that over the course of the year the rest of its product line, including Statistics and Data Collection, will move under the PASW umbrella.
David Vergara, director of product marketing for SPSS, explains that the change is intended to help customers and prospects understand what each product does and how each offering fits into the broader portfolio.
Aside from the name change, the new versions of SPSS products focus on usability, and not just for data experts. Wettemann says that SPSS has “recognized that moving beyond the data analyst audience is where you get the real power.” PASW Modeler 13 features a drag-and-drop interface and functionality that should appeal to business users. Two key additions are a “comments” tool, which lets users attach notes within the software, and automated data preparation, which reduces human error and helps avoid common data-quality problems.
From Destination CRM.
Text Analytics Accuracy
Seth Grimes has written a very interesting article for the B-Eye Network on text analytics and how accurate it is in real deployments.
A must-read for text analytics teams.
Here are some interesting paragraphs from the article:
The accuracy of information retrieval (for instance, the results returned by a search) and of information extraction (where important entities, concepts and facts are pulled from “unstructured” sources) is typically measured by an f-score, a value based on two factors – precision and recall.
Precision is the proportion of information found that is correct or relevant. For example, if a Web search on “John Lennon” turns up 17 documents on Lennon and also 3 exclusively about Yoko Ono, who is of little interest but was associated with Lennon due to co-occurrence of the two individuals’ names in a large number of documents, then the precision proportion would be 17/20 or 85%.
Recall, by contrast, is the proportion of information found of information available. If there were actually 8 documents legitimately about John Lennon that were not found, perhaps because only a small portion of each was devoted to Lennon, leading to low “term density,” then the recall would be 17/25 or 68%.
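To make the arithmetic in the quoted example concrete, here is a minimal Python sketch that recomputes precision and recall from the Lennon counts (17 relevant documents returned, 3 irrelevant Yoko Ono documents returned, 8 relevant documents missed). The excerpt does not spell out the F-score formula, so the sketch assumes the standard F-beta measure, which for beta = 1 is the harmonic mean of precision and recall. The function names are illustrative only, not taken from the article or any particular library.

```python
def precision(tp: int, fp: int) -> float:
    """Share of returned documents that are relevant: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of relevant documents that were returned: TP / (TP + FN)."""
    return tp / (tp + fn)


def f_score(p: float, r: float, beta: float = 1.0) -> float:
    """Standard F-beta measure (assumed); beta = 1 gives the harmonic mean, F1."""
    if p == 0 and r == 0:
        return 0.0  # avoid dividing by zero when nothing relevant was found
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Counts from the John Lennon search example in the quoted article.
true_positives = 17   # documents about Lennon that the search returned
false_positives = 3   # Yoko Ono documents returned by mistake
false_negatives = 8   # Lennon documents the search missed

p = precision(true_positives, false_positives)
r = recall(true_positives, false_negatives)
f1 = f_score(p, r)

print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f}")
# prints: precision=0.85 recall=0.68 F1=0.756
```

With these counts F1 = 2 × 17 / (2 × 17 + 3 + 8) = 34/45, roughly 0.756. Because the harmonic mean is pulled toward the smaller of its two inputs, the weak recall drags the combined score slightly below the simple average of 0.765; raising beta above 1 would weight recall even more heavily.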