Text Analytics Accuracy
Seth Grimes has written a very interesting article for the B-Eye Network on text analytics and how accurate it is in real deployments.
A must-read for text analytics teams.
Here’s an interesting passage from the article:
The accuracy of information retrieval (for instance, the results returned by a search) and of information extraction (where important entities, concepts and facts are pulled from “unstructured” sources) is typically measured by an f-score, a value based on two factors – precision and recall.
Precision is the proportion of information found that is correct or relevant. For example, if a Web search on “John Lennon” turns up 17 documents on Lennon and also 3 exclusively about Yoko Ono, who is of little interest but was associated with Lennon due to co-occurrence of the two individuals’ names in a large number of documents, then the precision proportion would be 17/20 or 85%.
Recall, by contrast, is the proportion of information found of information available. If there were actually 8 documents legitimately about John Lennon that were not found, perhaps because only a small portion of each was devoted to Lennon, leading to low “term density,” then the recall would be 17/25 or 68%.
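To make the arithmetic in the quoted passage concrete, here is a minimal Python sketch that recomputes the precision and recall from the John Lennon example and then combines them into an f-score. The excerpt does not say which f-score variant is meant, so the balanced F1 (the harmonic mean of precision and recall) is an assumption on my part, and the function and variable names are purely illustrative.

```python
def precision(relevant_found: int, irrelevant_found: int) -> float:
    """Share of the returned documents that are actually relevant."""
    return relevant_found / (relevant_found + irrelevant_found)


def recall(relevant_found: int, relevant_missed: int) -> float:
    """Share of all relevant documents that were actually returned."""
    return relevant_found / (relevant_found + relevant_missed)


def f_score(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 (assumed here; the article excerpt
    does not name a specific variant).
    """
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Numbers from the quoted example:
lennon_found = 17   # documents about John Lennon that the search returned
ono_found = 3       # documents about Yoko Ono returned by mistake
lennon_missed = 8   # documents about John Lennon that the search missed

p = precision(lennon_found, ono_found)
r = recall(lennon_found, lennon_missed)
f1 = f_score(p, r)

print(f"precision = {p:.2f}")  # precision = 0.85
print(f"recall    = {r:.2f}")  # recall    = 0.68
print(f"F1        = {f1:.2f}")  # F1        = 0.76
```

Under that assumption, the example search scores roughly 0.76: noticeably below its 85% precision, because the harmonic mean is pulled toward the weaker of the two numbers.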