Text Analytics: the reality of accuracy

December 18, 2015

The accuracy of automated text analytics is a big debate in the Voice of the Customer and big data world. Lucy Russell, Product Manager at eDigitalResearch – and a key driving force behind the release of our latest HUB Text Analytics tool – discusses the role of accuracy for verbatim analysis in everyday business situations.

When sharing our new HUB Text Analytics tool with clients, one of the first questions I’m always asked is ‘What level of accuracy can I achieve?’

Speaking to people keen to use automated verbatim analysis tools, I often hear that they’ve been promised 100% accuracy from this sort of technology in the past, only to find that this is never the case.

Accuracy is often dependent on a number of different factors. Using HUB Text Analytics, 100% accuracy is achievable, but only when using narrow topics and small data volumes – and that’s never the reality.

So, what is the reality? And how can you ensure you achieve the most from your text analytics tool?

Discovery vs. Categorisation

We often find that clients have one of two objectives for their text analytics tool – either they’re keen to discover emerging themes and trends, or to categorise topics into quantifiable data.

Discovery is all about speed – it’s key to understand what people are saying quickly in order to act, especially before your competitors do. It’s for this reason that accuracy is less important for discovery purposes – as long as the tool consistently looks at each comment in the same way, it will quickly highlight prominent words and phrases.
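The discovery approach described above can be sketched as a simple word-frequency count. This is a minimal, hypothetical illustration in Python – the comments, stop-word list and function names are invented for the example, and this is not how HUB Text Analytics works internally.

```python
from collections import Counter
import re

# Hypothetical customer comments - purely illustrative data.
comments = [
    "Delivery was fast but the packaging was damaged",
    "Fast delivery, friendly driver",
    "Packaging damaged again, very disappointing",
]

# A tiny stop-word list so common filler words don't dominate the counts.
STOPWORDS = {"was", "but", "the", "very", "again"}

def prominent_words(texts, top_n=5):
    """Count word frequency across all comments to surface emerging themes."""
    words = []
    for text in texts:
        words += [w for w in re.findall(r"[a-z']+", text.lower())
                  if w not in STOPWORDS]
    return Counter(words).most_common(top_n)

print(prominent_words(comments))
# e.g. 'delivery', 'fast', 'packaging' and 'damaged' all surface immediately
```

Note that the count is applied consistently to every comment – that consistency, rather than per-comment precision, is what makes discovery useful.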

On the other hand, quantifying what customers are saying is less about speed and more about accuracy. The first step on your accuracy mission, then, is to define your text analytics objective and understand how much of a part accuracy will play.

Customisation is key – one size does not fit all

Every source of textual data is different, which means that one size does not fit all when it comes to accuracy.

Whether you’re analysing customer service data alongside social media comments or simply looking at inputs from across different areas of your business, tailoring defined rules and parameters is key to achieving high levels of accuracy. While standard rule sets are often a great starting point, being able to tweak those standards continuously is a key part of achieving higher levels of accuracy.

Any text analytics tool that doesn’t offer users any customisation should be disregarded immediately.

The bigger the goal, the easier it is to score

Categories – and, more importantly, how many categories you use – have a big impact on the level of accuracy you can achieve.

Broadly speaking, the wider your category net, the higher the level of accuracy you’ll likely achieve. While a granular range of categories might provide a more specific assessment of what your customers are saying, the more categories there are for comments to be categorised against, the higher the chance that a comment ends up in the wrong one.

For example, take some comments about staff. It’s easy to tell from the comments below which department or staff role each customer is talking about.

“When I rang you up, Sandra was very helpful and answered my questions quickly.”

“Your Engineer was very polite and tidied up after himself.”

But customer comments don’t always give you the context required.

“I was very impressed with Pete, he gave great service and fixed my phone problem, thank you”

This particular comment doesn’t confirm whether Pete was a phone agent or an engineer; it does, however, fall into the broader category of staff. The association of a name with the term ‘phone’, though, could result in an ambiguous ‘phone agent’ categorisation.

Of course, you can adjust rules to exclude this particular scenario, but covering every possible sentence structure is impossible. There’s often a trade-off when trying to achieve perfection: you need to decide within your business what you’re after – an extremely high level of accuracy, or granular category detail.
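The ambiguity above can be made concrete with a toy keyword-rule categoriser. The rules, category names and function below are entirely hypothetical – they are not the HUB rule syntax – but they show how granular categories overlap on the Pete comment while a broad ‘staff’ category would not.

```python
import re

# Hypothetical keyword rules - illustrative only.
RULES = {
    "phone agent": [r"\brang\b", r"\bphone\b", r"\bcall\b"],
    "engineer": [r"\bengineer\b", r"\bfixed\b", r"\bvisit\b"],
}

def categorise(comment):
    """Return every category whose keyword rules match the comment."""
    text = comment.lower()
    return [cat for cat, patterns in RULES.items()
            if any(re.search(p, text) for p in patterns)]

# The ambiguous comment from above matches BOTH granular categories:
# 'phone' triggers the phone agent rule, 'fixed' triggers the engineer rule.
print(categorise("I was very impressed with Pete, he gave great service "
                 "and fixed my phone problem, thank you"))
```

A single broad ‘staff’ category containing all of these keywords would have categorised the comment correctly every time – which is exactly the accuracy-versus-granularity trade-off described above.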

Transparency is king

Any good text analytics tool will allow you to continuously alter your set-up – changes in language trends, the input of new data and new data sources will all influence the level of accuracy you can achieve when automating verbatim analysis.

In order to maximise a tool’s accuracy, it’s important to see where it might be going wrong and quickly and easily put it right. A primary goal of HUB Text Analytics is to give users control of accuracy levels and allow them to put right any wrongs quickly and easily; users have full control and visibility over criteria or, better yet, can utilise our team of in-house research experts, who will manage the whole process for them.

[Accuracy diagram]


Stop trying to achieve perfect results

Trying to reach 100% accuracy should never be an objective for text analytics users – while adjusting and tailoring rules and categories is key to achieving a high level of accuracy, too much interference could actually lead to a downturn.

Instead, an accuracy level of 80-90% is much more realistic and achievable.

Even the human brain could never understand, analyse and categorise comments accurately each and every time. Typos, misspellings, multiple word meanings, language trends and variability of sentence structure are all everyday occurrences that affect how textual data is treated.