Text Analytics For Everyone
The power of text analytics is now readily accessible through intuitive and openly available software. Running analysis is straightforward enough, as the examples in this article illustrate. However, the prerequisites of impactful insights haven’t changed. Proper problem definition, domain expertise, and stakeholder engagement are key to well-guided mining and actionable output—whether you’re improving a user interface or the broader user experience.
Most estimates state that at least 80% of enterprise information and new data generated are in text form. The increasing potential that text analytics holds is undisputed:
- Municipalities around the world leverage content-based predictive and root cause analysis to support public welfare initiatives.
- Manufacturers and retailers use text analytics to support a range of applications including marketing, risk monitoring, staff recruitment, and more.
- The aviation industry analyzes reports from pilots, mechanics, and other personnel to identify patterns related to airline safety.
- The hospitality industry mines consumer-generated content to support operations, new service development, positioning, and pricing.
- The finance industry incorporates customer feedback in efforts to improve service levels and reduce fraud.
There are a number of freeware applications that support text analytics, opening the arena to curious practitioners across design and data science disciplines. Three platforms that are easy to use and that work nicely together as part of an integrated analytics framework are KH Coder, OpenOffice Base, and Wordaizer:
KH Coder, developed by Koichi Higuchi (Associate Professor of Social Sciences at Ritsumeikan University, Kyoto, Japan), is a great starting point for identifying themes in large unstructured data sets, such as online reviews or open-ended customer feedback. Three particularly useful features of KH Coder are content reduction, word frequency, and advanced analysis/data visualization capabilities:
- Stemming and lemmatization reduce words to their roots for association, while the stopword function allows you to exclude commonly occurring but irrelevant words.
- The frequency function provides a holistic snapshot of the most common themes, ranking words by occurrence and by part of speech.
- Advanced analytics and visualization capabilities, such as hierarchical cluster analysis and co-occurrence networks, provide greater insight into themes by illustrating relationships between key words.
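The content reduction and frequency steps above can be sketched in a few lines of Python. This is a minimal illustration, not how KH Coder works internally: the sample reviews, the stopword list, and the `crude_stem` helper are all hypothetical, and the plural stripping is only a stand-in for real stemming or lemmatization.

```python
import re
from collections import Counter

# Hypothetical sample reviews and stopword list, for illustration only.
REVIEWS = [
    "The beach was beautiful and the staff were friendly.",
    "Friendly staff, but the resort fees were a surprise.",
    "Loved the beach. The resort fees were not mentioned at booking.",
]
STOPWORDS = {"the", "and", "was", "were", "but", "a", "at", "not"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip a trailing plural 's'; a rough stand-in for real stemming."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def word_frequencies(reviews, stopwords):
    """Count reduced word forms across all reviews, skipping stopwords."""
    counts = Counter()
    for review in reviews:
        for token in tokenize(review):
            if token not in stopwords:
                counts[crude_stem(token)] += 1
    return counts


freq = word_frequencies(REVIEWS, STOPWORDS)
for word, count in freq.most_common(5):
    print(word, count)
```

Even on three short reviews, reducing "fees" to "fee" and dropping filler words lets the recurring themes (beach, staff, resort fees) rise to the top of the ranking.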
Co-occurrence networks highlight high-frequency words occurring together, creating content communities. Members of one community are highly interconnected but have weaker relationships with the rest of the network. The separation of communities, or white space, can be seen as uncharted territory and may represent opportunities with disproportionate advantages if bridged.
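To make the idea concrete, here is a minimal sketch of how co-occurrence edges might be derived from reviews. It assumes each review has already been reduced to a set of key words (the `DOCS` data is hypothetical) and uses raw counts of shared reviews as edge weights. Dedicated tools may use other similarity measures, so treat this as an illustration of the concept rather than a description of KH Coder's method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical pre-processed reviews: each entry is the set of key words
# found in one review.
DOCS = [
    {"beach", "staff", "friendly"},
    {"staff", "friendly", "resort", "fee"},
    {"beach", "resort", "fee"},
    {"resort", "fee", "surprise"},
]


def cooccurrence_counts(docs):
    """Count, for each unordered word pair, how many reviews contain both."""
    pairs = Counter()
    for words in docs:
        # Sorting gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(words), 2):
            pairs[(a, b)] += 1
    return pairs


def strong_edges(pairs, min_count=2):
    """Keep only pairs that co-occur in at least min_count reviews."""
    return sorted(pair for pair, n in pairs.items() if n >= min_count)


pairs = cooccurrence_counts(DOCS)
edges = strong_edges(pairs)
print(edges)
```

With this toy data, the surviving edges link "fee" with "resort" and "friendly" with "staff", which form two small communities with no strong edge between them.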
This example illustrates online reviews for a random hotel, with the fictitious name of Paradise Inn (location names have also been masked). Themes guests talk about range from the beach to resort fees, topics that can be further mined for greater insight to support positioning, communications, and product/service improvement initiatives.
OpenOffice Base, from the Apache Software Foundation, is a comprehensive database management platform suitable for simple queries and more complex relational table operations. Although not designed specifically as a content analysis tool, OpenOffice Base provides a flexible environment for text mining that is fast and fluid. With basic SQL commands, key themes can be quantified and rich qualitative insights uncovered. Among Base’s strengths:
- Ability to search for content meeting specific conditions with SQL commands.
- Well-integrated SQL command and output windows to quickly find what you want.
- Fully functioning click-and-drag GUI to perform operations such as view formatting and search.
This example builds on the above co-occurrence network to quantify the importance of each community based on the number of hotel guests mentioning key words associated with a given theme. Mention frequencies are rebased to all defined cases as a proxy for community importance.
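The quantification step can be sketched with a few SQL queries. The example below uses Python's built-in `sqlite3` module as a stand-in for Base, so the SQL dialect may differ slightly from what Base accepts. The `reviews` table, the sample text, and the theme keywords are hypothetical, and "rebased to all defined cases" is interpreted here as dividing by the number of reviews that mention at least one defined theme.

```python
import sqlite3

# Hypothetical reviews and theme definitions, for illustration only.
REVIEWS = [
    "The beach was beautiful and the staff were friendly.",
    "Friendly staff, but the resort fees were a surprise.",
    "Loved the beach. The resort fees were not mentioned at booking.",
    "The staff went out of their way to help us.",
    "Parking was easy to find.",
]
THEMES = {
    "beach": ["beach"],
    "service": ["staff", "friendly"],
    "fees": ["resort fee"],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO reviews (body) VALUES (?)", [(r,) for r in REVIEWS])


def mention_count(conn, keywords):
    """Count reviews whose text contains at least one of the keywords."""
    where = " OR ".join("LOWER(body) LIKE ?" for _ in keywords)
    params = [f"%{k.lower()}%" for k in keywords]
    sql = f"SELECT COUNT(*) FROM reviews WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]


# Reviews mentioning each theme.
counts = {theme: mention_count(conn, kws) for theme, kws in THEMES.items()}

# Base: reviews that mention any defined theme (the "defined cases").
all_keywords = [k for kws in THEMES.values() for k in kws]
base = mention_count(conn, all_keywords)

# Each theme's share of the defined cases, as a proxy for importance.
shares = {theme: n / base for theme, n in counts.items()}
for theme in THEMES:
    print(f"{theme}: {counts[theme]} of {base} ({shares[theme]:.0%})")
```

Because a single review can mention several themes, the shares are not expected to sum to 100%. The parking review matches no theme and is therefore excluded from the base.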
Further investigation suggests this hotel is especially strong on service elements. However, resort fees surprise a number of guests, representing an opportunity to improve communications and further differentiate. The verbatim here paraphrases one guest’s review.
Wordaizer is a data visualization tool that forms word clouds based on text frequencies. Wordaizer is similar to the popular Wordle application, but provides greater flexibility in terms of image modification and management. And because it is a desktop application rather than a web-based platform, your text stays on your own machine, which avoids data confidentiality concerns.
Wordaizer provides impactful illustrations of text, which developer Sybren Haagsma terms “wordlets”. As a build on the above analysis, Wordaizer illustrations can be used to succinctly capture strengths and opportunities. This particular example nicely captures many positioning elements of (the fictitiously named) Paradise Inn.
With its creative applications, Wordaizer is being used in business, scientific, and educational environments. Wordaizer also belongs to a broader suite of image and data visualization tools that capture the developer’s fascination with and passion for patterns.
Like other dimensions of consumer research and business intelligence, text analytics as a capability on its own isn’t a panacea and doesn’t automatically generate groundbreaking insight. Text analytics has its share of challenges, such as data access, integrity, and classification. And as with any other conduit of insight, softer and non-technical aspects trump all else in terms of importance. Consultant Meta Brown captured it well by explaining, “People who are using text analytics are saying that they are not getting the positive returns they had hoped for. They aren’t getting them because they didn’t start with an end in mind and work backwards.”
Text analytics holds great potential whether you work in design or in data science, as the platforms showcased above illustrate. However, it’s likely to have limited impact, even with the best of tools, if problem definition, contextual understanding, and stakeholder buy-in are not made priorities.
Image of word cloud courtesy Shutterstock