Text Analytics Accuracy
Seth Grimes has written a very interesting article for the B-Eye Network on text analytics and how accurate it is in real deployments.
A Must Read for Text Analytics Teams.
Here’s an interesting excerpt from the article:
The accuracy of information retrieval (for instance, the results returned by a search) and of information extraction (where important entities, concepts and facts are pulled from “unstructured” sources) is typically measured by an f-score, a value based on two factors – precision and recall.
Precision is the proportion of information found that is correct or relevant. For example, if a Web search on “John Lennon” turns up 17 documents on Lennon and also 3 exclusively about Yoko Ono, who is of little interest but was associated with Lennon due to co-occurrence of the two individuals’ names in a large number of documents, then the precision proportion would be 17/20 or 85%.
Recall, by contrast, is the proportion of information found of information available. If there were actually 8 documents legitimately about John Lennon that were not found, perhaps because only a small portion of each was devoted to Lennon, leading to low “term density,” then the recall would be 17/25 or 68%.
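To make the arithmetic concrete, here is a small sketch that reproduces the John Lennon numbers from the excerpt. The excerpt does not spell out how the f-score combines the two factors, so this assumes the commonly used balanced F1 (the harmonic mean of precision and recall). The function name and its parameters are my own, for illustration only.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) computed from raw document counts.

    Assumption: the f-score is the balanced F1, i.e. the harmonic mean
    of precision and recall. The quoted article does not state a formula.
    """
    retrieved = true_positives + false_positives   # everything the search returned
    relevant = true_positives + false_negatives    # everything it should have returned

    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the excerpt: 17 Lennon documents found, 3 Yoko Ono documents
# returned by mistake, 8 legitimate Lennon documents missed.
precision, recall, f1 = precision_recall_f1(17, 3, 8)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
# prints: precision=0.85 recall=0.68 f1=0.756
```

Under that assumption, the example search scores about 0.76, between its 85% precision and 68% recall and closer to the weaker of the two, which is how a harmonic mean behaves.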