October 16th, 2009 by Jessica Tsai
As a speaker in one of the final sessions of Forrester’s Business Technology Forum 2009, Forrester Research analyst Leslie Owens started off by thanking the last batch of attendees for sticking it out until the very end of the conference, which concluded last Friday.
Owens then began her presentation, “Leverage Information, Workers, And Technology To Thrive In A Lean Environment,” with an anecdote about how Pfizer built a text-mining platform. The pharmaceutical company had crunched all of its data, but it needed a way to access it: something more powerful than plain search. What the company needed was semantic search that recognizes terms that are synonymous or that belong to a class of items (e.g., all beta blockers).
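For readers who want a concrete picture of that kind of synonym-and-class matching, here is a minimal Python sketch. It assumes a small hand-built vocabulary; the terms in `VOCABULARY` and the functions `expand_query` and `semantic_search` are hypothetical illustrations, not a description of Pfizer’s actual platform.

```python
import re

# Hypothetical vocabulary: each key is a concept, each value is the set of
# synonyms or class members that should also match a search for that concept.
VOCABULARY = {
    "beta blocker": {"beta blocker", "beta-blocker", "propranolol", "metoprolol", "atenolol"},
    "heart attack": {"heart attack", "myocardial infarction"},
}


def expand_query(term: str) -> set[str]:
    """Return the term plus any synonyms or class members the vocabulary lists."""
    key = term.lower()
    return VOCABULARY.get(key, {key})


def semantic_search(term: str, documents: list[str]) -> list[int]:
    """Return indices of documents that mention the term or any of its expansions."""
    alternatives = sorted(expand_query(term))  # sorted for a deterministic pattern
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [i for i, doc in enumerate(documents) if pattern.search(doc)]


docs = [
    "Patients on metoprolol reported fatigue.",
    "Aspirin reduced the risk of stroke.",
    "A new beta-blocker trial began in March.",
]
print(semantic_search("Beta Blocker", docs))  # [0, 2]: a class member and a synonym
```

A plain keyword search for “beta blocker” would miss both matching documents here; the vocabulary is what lets the search find them.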
Forrester defines text analytics as a “process of extracting and analyzing information patterns in text collections.” In Owens’s gloss, “process” means changing behaviors and the way you look at information, and “extracting and analyzing” refers to what you pull out.
The four capabilities of text analytics are:
- entity extraction (pulling out things that can be identified and broken down);
- categorization (grouping similar information together);
- relationship mapping (connecting entities to one another, even when the connection is abstracted from the text itself); and
- sentiment analysis (revealing the mood and tone of the text, which means trusting a computer to pick up on human sentiment). Sentiment analysis, however, isn’t ready for “prime time” because of sarcasm and all the other characteristics that make our language so rich.
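To make the four capabilities concrete, here is a deliberately naive Python sketch that runs each one over a short customer complaint. It assumes a hand-built dictionary and rule set; every term, entity type, category, and function name in it is hypothetical, and commercial tools rely on much larger vocabularies and, increasingly, machine learning. The last line shows the weakness Owens described: a word-counting sentiment scorer reads a sarcastic complaint as positive.

```python
import re
from itertools import combinations

# Hypothetical entity dictionary (a toy stand-in for a vocabulary "cartridge"):
# surface form -> entity type.
ENTITIES = {
    "acme router": "PRODUCT",
    "firmware": "COMPONENT",
    "wi-fi": "COMPONENT",
    "support line": "CHANNEL",
}

# Hypothetical categorization rules: category -> entity types that trigger it.
CATEGORY_RULES = {
    "technical issue": {"COMPONENT"},
    "service experience": {"CHANNEL"},
}

# Hypothetical word-level sentiment lexicon.
LEXICON = {"great": 1, "love": 1, "broken": -1, "slow": -1, "drops": -1}


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Entity extraction: return (term, type) for each dictionary term found."""
    lowered = text.lower()
    return [
        (term, etype)
        for term, etype in ENTITIES.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


def categorize(text: str) -> set[str]:
    """Categorization: assign every category whose trigger types were extracted."""
    types = {etype for _, etype in extract_entities(text)}
    return {cat for cat, triggers in CATEGORY_RULES.items() if triggers & types}


def map_relationships(text: str) -> list[tuple[str, str]]:
    """Relationship mapping: link entities that co-occur in the same sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        terms = [term for term, _ in extract_entities(sentence)]
        pairs.extend(tuple(sorted(p)) for p in combinations(terms, 2))
    return pairs


def score_sentiment(text: str) -> int:
    """Sentiment analysis: sum lexicon scores over the words in the text."""
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z]+", text.lower()))


review = ("The firmware update made my Acme Router slow. "
          "The Wi-Fi drops every hour. The support line was great.")
print(extract_entities(review))
print(categorize(review))
print(map_relationships(review))  # [('acme router', 'firmware')]
print(score_sentiment(review))    # -1
# A sarcastic complaint fools the naive scorer: it comes out positive.
print(score_sentiment("Great, the router is broken again. Just great."))  # 1
```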
The talk was originally supposed to focus on content analytics, but Owens changed its topic as a result of the research she conducted this summer. She found that there wasn’t yet a full set of analytics tools covering all content (text, image, video, audio), whereas current business pains demand text analytics, whether the text is a snippet or an entire paper.
Owens pointed to the ubiquitous phrase “voice of,” whether it’s the voice of the industry, employees, competitors, or customers, all of which can give companies deep insight into their own business. “Voice of the customer” tends to be the “theme” text analytics vendors try to push, but it’s more than that, she said. You’re not only listening; you’re solving problems, and you save money when you know where the problems are coming from (e.g., scientific literature, satisfaction surveys, technician notes, patent filings). Moreover, social media is giving companies an opportunity to listen proactively and make changes in response.
Owens went on to discuss the challenge publisher McGraw-Hill faced when 30 million pages’ worth of insight was locked in its database. The company created topic pages based on SQL queries, then automated those queries into rules that could be run against the text itself. This allowed readers interested in a topic to find it easily within specified topic domains, which Owens called a revolution for the publishing industry. Scientific and publishing communities are early adopters of text analytics because they have the “best” text: their content is the most polished and trusted.
Where text analytics hits its sweet spot is in the merging of structured and unstructured content. Business intelligence vendors are excited about doing more than just talking about data and looking back on it, Owens said. They’re interested in finding out WHY consumers bought and what action to take as a result of that insight.
Attendees expressed concern about relying on insight from text analytics. Accuracy, Owens admitted, simply isn’t there yet. Semantic specialty vendors are tackling the market in bits and pieces (Owens tracks about 40 to 50 vendors, such as Clarabridge and Attensity). Listening platforms are typically marketing-oriented semantic tools that let companies pick up mentions of their brands. Information management vendors (IBM, Oracle) are branching into this area as well, while also OEMing technology from the specialty vendors.
What text analytics ultimately comes down to is improving the customer experience. If, for instance, call centers are converting all their voice recordings to text and mining that data, perhaps the goal should be not to rush customers off the phone but to hear what they have to say.
Today, the technology is still targeted mainly at sophisticated end users, but more packaged, turnkey solutions will eventually become available. It’s a waste of employees’ time to have to be so diligent about tagging, Owens said. She admitted that she used to be cynical but now believes entity extraction works. However, text analytics is domain-dependent, so don’t just hand off a mess of files and expect them to be sorted magically.
Going forward, companies should make the case for text analytics on two grounds: it lets them listen to the customer, and it helps solve internal business problems that a simple search can’t identify. All of these outcomes can be measured, Owens said, which will help companies identify where they need to go and where to spend; and by eliminating waste, companies will be practicing lean.
One audience member asked how much time and energy needs to go into defining the vocabulary. According to Owens, the work is currently done with the help of pre-existing vocabulary databases and dictionaries, also called “cartridges.” The industry is split between machine learning approaches and rules-based approaches. Academics are now moving toward machine learning, in which the system gets progressively smarter by breaking down sentences, learning how words are used, and essentially connecting its own dots.
The technology is still a work in progress, especially when it comes to making business sense of it. This, of course, made it hard for Owens to answer questions about challenges that remain unresolved, prompting a request of her own: “Any less-tricky questions?”


