The Business Intelligence Blog

Slicing Business Dicing Intelligence


Text Analytics Accuracy  

Seth Grimes has written a very interesting article on the B-Eye Network about text analytics and how accurate it is in real-world deployments.

A must-read for text analytics teams.

Here’s an interesting excerpt from the article:

The accuracy of information retrieval (for instance, the results returned by a search) and of information extraction (where important entities, concepts and facts are pulled from “unstructured” sources) is typically measured by an f-score, a value based on two factors – precision and recall.

Precision is the proportion of information found that is correct or relevant. For example, if a Web search on “John Lennon” turns up 17 documents on Lennon and also 3 exclusively about Yoko Ono, who is of little interest but was associated with Lennon due to co-occurrence of the two individuals’ names in a large number of documents, then the precision proportion would be 17/20 or 85%.

Recall, by contrast, is the proportion of information found of information available. If there were actually 8 documents legitimately about John Lennon that were not found, perhaps because only a small portion of each was devoted to Lennon, leading to low “term density,” then the recall would be 17/25 or 68%.
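The excerpt doesn't say which f-score variant it means. The most common one is the balanced F1 score, the harmonic mean of precision and recall: F1 = 2PR / (P + R). For the Lennon example that works out to 2 × 17 / (2 × 17 + 3 + 8) = 34/45, or about 0.756. Below is a small Python sketch of that arithmetic. It assumes the balanced F1 (with an optional beta weight for the general F-beta score), and the function names are illustrative only. They are not taken from the article or from any particular library.

```python
# Sketch: precision, recall and F-score for the article's John Lennon search.
# Assumes the balanced F1 score (beta = 1); the excerpt does not name a variant.


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of retrieved documents that are relevant: TP / (TP + FP)."""
    retrieved = true_pos + false_pos
    return true_pos / retrieved if retrieved else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of relevant documents that were retrieved: TP / (TP + FN)."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else 0.0


def f_score(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Counts taken from the excerpt:
true_pos = 17   # Lennon documents the search returned
false_pos = 3   # Yoko Ono documents returned by mistake
false_neg = 8   # Lennon documents the search missed

p = precision(true_pos, false_pos)
r = recall(true_pos, false_neg)
f1 = f_score(p, r)

print(f"precision={p:.0%} recall={r:.0%} F1={f1:.3f}")
```

Running it prints precision=85% recall=68% F1=0.756. Because the harmonic mean is pulled toward the lower of the two scores, a system can't hide poor recall behind high precision. Setting beta above 1 weights recall more heavily, which may suit teams that care more about not missing relevant documents.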


Written by Guru Kirthigavasan

May 15th, 2008 at 7:01 am

Posted in Analytics, General
