Most businesses have a huge amount of text-based data, such as memos, company documents, emails, reports, media releases, customer records and communication, websites, blogs and social media posts. Until recently it wasn’t always that useful, at least in terms of easily extracting business-critical insights. But that has all changed thanks to text analytics.
Understanding text analytics
Text analytics, also known as text mining, is a process of extracting value from large quantities of unstructured text data. While the text itself is structured to make sense to a human being (e.g. a company report split into sensible sections), it is unstructured from an analytics perspective because it doesn’t fit neatly into a relational database or the rows and columns of a spreadsheet. Traditionally, the only structured part of text was the name of the document, the date it was created and who created it.
Access to huge text data sets and improved technical capability mean text can be analysed to extract high-quality information above and beyond what the document actually says. For example, text can be assessed for commercially relevant patterns such as an increase or decrease in positive feedback from customers, or new insights that could lead to product tweaks. As such, text analytics is now capable of telling us things we didn’t already know and, perhaps more importantly, had no way of knowing before. And these insights can be incredibly useful in business.
Text analytics is particularly useful for information retrieval, pattern recognition, tagging and annotation, information extraction, sentiment assessment and predictive analytics. It could, for example, shed light on what your customers think of your product or service, or highlight the most common issues that your customers complain about.
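To give a feel for how the "most common issues" idea might work, here is a minimal Python sketch that counts the most frequent terms across a handful of customer complaints. The complaints, the stop-word list and the function name top_terms are all made up for illustration; commercial tools do far more (handling phrases, synonyms and spelling variants), so treat this as a rough starting point rather than a finished approach.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch).
STOP_WORDS = {
    "the", "a", "an", "is", "was", "it", "my", "to", "and", "i",
    "of", "for", "on", "in", "too", "very", "this", "that", "not", "but",
}


def top_terms(comments, n=3):
    """Return the n most frequent non-stop-word terms across all comments."""
    counts = Counter()
    for comment in comments:
        # Lower-case the text and keep only runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up complaints, for illustration only.
complaints = [
    "Delivery was late and the box was damaged.",
    "Late delivery again, very disappointing.",
    "The app crashes on login.",
    "Delivery took too long and support was slow.",
]

print(top_terms(complaints, 2))
```

Even a simple count like this can point to where to look first; in this invented sample, delivery problems come up more often than anything else.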
Making sure your text is analysis-ready
It’s not enough for the text to be in a digital format; it also needs to be datafied. If you copied a page from a book as a JPEG file, you would technically have a digital copy of the text but it would be no good for running text analytics. What you need is datafied text, like the text we see in many e-readers, which allow you to interact with the text (by highlighting sections, adding notes, searching the text, etc.). So, any old paper files that you want to analyse will need to be rendered in a format that is not just digital but also datafied.
Once the text is ready there are a number of commercially available text analytic tools that can help you. Which one you use will depend on your objective.
Text analytics in action
Unsure how you would use text analytics in practice? Say, for instance, you are concerned about the level of employee engagement in your company and decide to conduct an employee engagement survey. You could read through hundreds of questionnaire responses and that might give you some good ideas, or a sense of who is happy and who is not, but it wouldn’t really give you any indication of trends or what the collective was really feeling.
Text analytics would allow you to assess all that free-flowing unstructured text and establish trends or clusters of opinion across the business, within divisions and within specific teams.
Text analytics is also having a big impact beyond the world of business. In healthcare, for example, companies are using text analytics to extract large amounts of information from patient medical records – information that can then be used to understand the overall health of the population and improve treatment methods. One such company, Apixio, analyses the information found in electronic healthcare records, such as GP notes, consultant notes, radiology notes, pathology results, etc. To analyse this information, which comes in a wide variety of formats and may even be handwritten, they first have to turn it into something that computers can analyse. They do this using OCR (optical character recognition) technology to create a textual representation of the information that computers can read and understand. The data can then be analysed at an individual patient level, or it can be aggregated across the population in order to derive big-picture insights around disease prevalence, treatment patterns, etc. Apixio hopes that by mining such practice-based clinical data for information – who has what condition, what treatments are working, etc. – we can learn a lot about the way we care for individuals and make improvements based on actual knowledge of what works and what doesn’t.
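To make the employee survey example above a little more concrete, here is a minimal Python sketch of one simple way free-text responses could be scored and then averaged by team. The word lists, team names, responses and function names are all invented for illustration, and counting positive and negative words is only one basic technique; commercial tools typically rely on much larger lexicons or trained models.

```python
import re
from collections import defaultdict

# Tiny hypothetical word lists (assumptions for this sketch only).
POSITIVE = {"enjoy", "great", "supportive", "proud", "happy"}
NEGATIVE = {"stressed", "overworked", "unclear", "frustrated", "ignored"}


def score(text):
    """Positive-word count minus negative-word count for one response."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def average_score_by_team(responses):
    """Average the per-response scores within each team."""
    scores = defaultdict(list)
    for team, text in responses:
        scores[team].append(score(text))
    return {team: sum(s) / len(s) for team, s in scores.items()}


# Made-up (team, response) pairs, for illustration only.
responses = [
    ("Sales", "I enjoy the work and my manager is supportive."),
    ("Sales", "Great team, proud to be here."),
    ("Support", "I feel stressed and overworked most weeks."),
    ("Support", "Priorities are unclear but colleagues are supportive."),
]

print(average_score_by_team(responses))
```

A higher average suggests a more positive tone in that team's responses. The point is not the exact numbers but that hundreds of responses can be summarised and compared in a consistent way.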
A word of warning
Converting older, paper-based text documents into something that can be used for analysis can be very time consuming and expensive, so it’s best to be selective rather than attempting to analyse everything you have lying around in your archives. Also keep in mind that most data has a shelf life. Rather than converting old text into an analysis-ready format, it is often better to focus on the new text data you already have access to, such as emails and social media posts.
Where to go from here
If you would like to know more about strategy, KPIs and performance management, check out my articles on:
Bernard Marr is a bestselling author, keynote speaker, and advisor to companies and governments. He has worked with and advised many of the world's best-known organisations. LinkedIn has recently ranked Bernard as one of the top 10 Business Influencers in the world (in fact, No 5 - just behind Bill Gates and Richard Branson). He writes on the topics of intelligent business performance for various publications including Forbes, HuffPost, and LinkedIn Pulse. His blogs and SlideShare presentations have millions of readers.