Category Archives: Text Analytics

The Jargon of the Novel, Computed

Scholars in the growing field of digital humanities can tackle this question by analyzing enormous numbers of texts at once. When books and other written documents are gathered into an electronic corpus, one “subcorpus” can be compared with another: all the digitized fiction, for instance, can be stacked up against other genres of writing, like news reports, academic papers or blog posts.

One such research enterprise is the Corpus of Contemporary American English, or COCA, which brings together 425 million words of text from the past two decades, with equally large samples drawn from fiction, popular magazines, newspapers, academic texts and transcripts of spoken English. The fiction samples cover short stories and plays in literary magazines, along with the first chapters of hundreds of novels from major publishers. The compiler of COCA, Mark Davies at Brigham Young University, has designed a freely available online interface that can respond to queries about how contemporary language is used. Even grammatical questions are fair game, since every word in the corpus has been tagged with a part of speech.

More…

SPSS Rebrands Its Analytical Offerings

The new version of the SPSS modeling product — the erstwhile Clementine — is now known as PASW Modeler 13; its text analysis product (formerly Text Mining for Clementine) is now PASW Text Analytics 13. SPSS says that, over the course of the year, the rest of the SPSS product line will update under the PASW umbrella — including Statistics and Data Collection.

David Vergara, director of product marketing for SPSS, explains that the change was intended to help customers and prospects understand what the products are doing and how each offering pieces together within the broader portfolio.

Aside from the name change, the new versions of SPSS products focus on usability — and not just for data experts. Wettemann says that SPSS has “recognized that moving beyond the data analyst audience is where you get the real power.” PASW Modeler 13 features a drag-and-drop interface, and functionality that will appeal to business users. Two integral updates include a “comments” tool, in which users can flag notes within the software, and automated data preparation. Data automation mitigates human error and avoids common issues in data quality.

From Destination CRM.