An Interview with Antonio S. Valderrábanos
At Cebit in Hannover, Germany, I caught up with Antonio S. Valderrábanos, the founder of Bitext. His company has been growing rapidly, announcing deals with Salesforce and the Spanish government. Bitext provides multilingual semantic technologies, with what may be the highest accuracy in the market, for companies that use text analytics and natural language interfaces. I interviewed him in April 2008. Bitext has added significant technology to its multilingual content processing system. In addition to support for more languages, the company is getting significant attention for its flexible sentiment analysis system. The company is participating in US and European technical meetings; for example, the KNIME workshops which took place in Zurich, Switzerland, after Cebit. The full text of my interview appears below.
The obvious question is, “How did you get involved in semantics?”
As you know, at the beginning we worked at large multinational companies that invested in natural language processing, such as IBM and Novell, and at research institutes like Carnegie Mellon University.
These projects were usually run by programmers and engineers who did not dig too deeply into the power of linguistics. We thought that NLP problems required a more flexible design, one capable of adjusting to the ambiguity inherent in natural languages.
Will you give me an example?
Yes, for example, “flies” may be a noun, but also a verb. We say “time flies like an arrow” versus “fruit flies like bananas”. We thought that computers should be able to parse both sentences and get the right meaning.
With that goal in mind, we started the development of our NLP platform. This platform is flexible enough to perform multilingual analysis just by exchanging grammars, not modifying the core engine.
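The swap-the-grammar idea can be illustrated with a toy sketch: a fixed core routine whose per-language behavior comes entirely from the grammar it is given. This is a hypothetical illustration, not Bitext's actual architecture, and the tiny lexicons are invented for the example.

```python
# Toy grammar-swappable engine: the core logic never changes; only the
# grammar object passed in does. Lexicons here are hypothetical and minimal.

def analyze(tokens, grammar):
    """Tag each token using only the supplied grammar's lexicon."""
    return [(tok, grammar["lexicon"].get(tok.lower(), "UNK")) for tok in tokens]

# Two interchangeable "grammars" for two languages.
english_grammar = {"lexicon": {"time": "NOUN", "flies": "NOUN|VERB", "like": "VERB|PREP"}}
spanish_grammar = {"lexicon": {"el": "DET", "tiempo": "NOUN", "vuela": "VERB"}}

print(analyze(["Time", "flies"], english_grammar))
print(analyze(["El", "tiempo", "vuela"], spanish_grammar))
```

Note how "flies" carries both possible tags; a real engine would use context to resolve the ambiguity, which is exactly the "time flies" problem described above.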
What’s the advantage of this approach?
Our system and method give us a competitive advantage with regard to quick development and deployment. The most important information challenges we resolve for clients have to do with multilingual, multipurpose content access based on text analytics involving entity/concept/event extraction and sentiment analysis. Currently, our NLP platform can handle 10 languages. Unlike most linguistic platforms, our system "snaps in" to existing software. Customers want solutions which can be up and running quickly and deliver enhanced information access. Bitext delivers on both counts.
When did you become interested in text and content processing?
My interest in text/content processing goes all the way back to my university years, when I was studying linguistics and computer science. Back then, computers were still very rudimentary as far as NLP processing goes, but I started out with dictionary development using a hard-copy thesaurus of the Spanish language, containing over 100,000 words classified with semantic relations.
The dictionary was useful but hard to use on paper, so I decided to develop a computer program for looking up words in it. I was able to "computerize" it, so to speak.
I derived considerable satisfaction from completing the first Spanish dictionary that could be used on a computer. From this experience, my interest in computational linguistics grew. Founding Bitext was a natural step. I think I progressed from a hard copy dictionary to the broader issues of content processing, from morphological, to syntactic, to lexical and semantic processing.
Bitext has been investing in search, content processing, and text analysis for several years. What are the general areas of your research activities?
Our main area of research is focused on deep language analysis, which captures the semantics of text.
Our work involves dealing with word meanings and truly understanding what they mean, interpreting wishes, intentions, moods or desires. Having this level of linguistic analysis, we can enrich many different business applications which need semantics in order to be effective.
What do you mean “applications”?
Good question. The applications range from search engines and NL interfaces (like virtual assistants) to text analytics solutions for social customer relationship management, social media monitoring, enterprise feedback management, voice of the customer, and business intelligence.
Many vendors argue that mash-ups and data fusion are "the consumerized way information retrieval will work going forward." I am referring to structured and unstructured information. How does Bitext perceive this blend of search and user-accessible outputs?
Consumerization brings us closer to the user, and this is very good news. Learning what it is that users want is an important information asset.
Consumerization can help us identify applications that users want, like review sites, and design semantic enhancements for those sites. Remember we complement third-party applications with NLP functionality. It will be interesting to see which search or text analytics applications get traction.
Regarding data mash-ups, these approaches centralize access to information and they will trigger high value applications. So merging data is good. However, systems have to be clever enough to keep track of the origin of the information, since this origin provides context. As you know, data from a customer support system is different from data from a business process management system or a legacy repository of unstructured information.
Without divulging your firm's methods or clients, will you characterize a typical use case for your firm's search and retrieval capabilities / text analytics?
We just need to know what type of content our client considers useful for her business purposes, and then we program the relevant linguistic structures which describe those purposes in our NLP platform. It is as simple as that.
Let me give you an example. Let's say a client wants to detect angry customers sending short messages via Twitter. What we do is analyze typical examples of angry comments about the customer's products or services and program linguistic structures such as "subject 'I' plus strongly negative verb," with which our NLP platform will detect sentences like "I resent/hate/detest/loathe your customer service" and their variations.
In the end, we semantically interpret these sentences as negative comments. But we can detect not only sentiment; we can go beyond it, looking for intent, threats, product feature wish lists, and so on.
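The kind of rule described here can be sketched in a few lines. This is a hedged illustration only: the verb list and the pattern are invented for the example, and a production grammar would be far richer than a single regular expression.

```python
import re

# Illustrative rule: subject "I" followed by a strongly negative verb.
# The verb list is an assumption, not Bitext's actual grammar.
NEGATIVE_VERBS = ["resent", "hate", "detest", "loathe"]
PATTERN = re.compile(
    r"\bI\s+(?:really\s+)?(" + "|".join(NEGATIVE_VERBS) + r")\b",
    re.IGNORECASE,
)

def is_angry(message: str) -> bool:
    """Return True when the message matches the 'I + negative verb' rule."""
    return PATTERN.search(message) is not None

print(is_angry("I hate your customer service"))        # True
print(is_angry("I really loathe the new update"))      # True
print(is_angry("Customer service resolved my issue"))  # False
```

A keyword-only system would flag any message containing "hate"; the linguistic structure anchors the negative verb to the first-person subject, which is what gives the rule its precision.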
The result of our linguistic analysis is made available via our API. Usually, the whole customization process takes several days, maybe a little more, depending on the number of linguistic structures to detect and the languages involved.
So far, we have prepared solutions, or "verticalized," for several different markets. One example is our significant work in call centers and customer support applications. We support social customer relationship management with sentiment analysis, and voice of the customer using sentiment analysis and categorization. In addition, we can constantly monitor improvements and enhance accuracy.
In the medium term, we are planning to merge text analytics with search functionalities resulting in a natural language search interface coupled with facets based on concept and entity extraction.
What are the benefits to a commercial organization or a government agency when working with your firm?
That’s my favorite question. Bitext delivers accuracy, reliability, and flexibility. Each of these three benefits springs from our approach to deep linguistic analysis. We analyze every word in every sentence, and we extract their meaning. We perform sentiment analysis not by mere numbers and statistics, but by truly "understanding" what it is that the words and their context mean. Based on our tests, our method goes well beyond what some of the other vendors offer in accuracy. With clients for whom accuracy is critical, we commit to delivering a given accuracy, say 80%, and commit to that by contract.
Can you name some specific advantages?
Sure. Bitext has a modest memory footprint. In fact, a typical implementation consumes about five megabytes of space for memory and data. Our text throughput can hit 12 megabytes of content per hour and per core. Our system can be easily scaled. Unlike some vendors we are platform agnostic. Our system runs on Windows or Linux. We offer easy integration with existing enterprise software or enterprise search systems.
How does an information retrieval engagement with your firm move through its life cycle?
Our approach rests on flexibility, the main feature of our technology. We have an NLP platform which performs a modular linguistic analysis of any type of text and, as such, we can fine-tune grammars and dictionaries to better handle the output required by our customers.
In a nutshell, we take a personal interest in each of our clients’ requirements. Our client focus means that we can go beyond canned solutions based on keywords and delve deeply into linguistic structure. In this way we provide a solution which can detect semantically meaningful information which fits customer’s needs.
Remember, we can do so in multiple languages because of our modular approach. When a new language capability is required, we can exchange the dictionaries and grammars of the languages we support without changing the NLP core engine.
In addition, we make our technology available via our API so our customers can try the technology right away and provide feedback for improving coverage and precision.
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What's your firm's approach to presenting "outputs" for end user reuse or for mobile access?
Our data is easily fed into visualization applications. For example, Salesforce uses Bitext technologies. We have implemented a similar strategy for QlikView, Information Builders, Oteara, and others. In search, we follow the same approach: we provide semantic capabilities to search engines/interfaces.
I am on the fence about the merging of retrieval within other applications. What's your take on the "new" method which some people describe as "search enabled applications"?
We are merging search and text analytics as two ways of making life easier for users. Taking a top-down approach for natural language search or semantic search, users can express more powerful queries with less effort. We also have a bottom-up approach with text analytics applied at indexing time: users get a handle to identify texts at a higher level, like texts that have been categorized as complaints or orders for new products, to name two examples.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are "cut off" from access to more robust systems. How does your firm see the computing world over the next 12 to 18 months?
iPads, tablet devices in general, and mobile phones are becoming the main computing devices in a world where almost everybody will be always online. This opens a whole new arena for mobile applications, which will have to cater to every need mobile users may have.
Our claim is that wherever written text is involved, be it via social media or the output of a speech recognizer, language technologies such as the ones developed by Bitext will have a prominent role to play. This means that there will be more demand for text analytics in our Big Data world, more demand for semantics which make sense of users’ requests, and more intelligent insights which can predict the future behavior of users.
A computing revolution has arrived, and it is gaining momentum.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business?
The most relevant trend we see is the integration of pure search (enhanced with natural language processing functionalities) with text analytics. The idea is that a user can ask a question, for example, for negative opinions on a given product that mention customer service. In this environment, text analytics enriches the indexing phase so the index is not based on isolated words, as usual, but on meaningful expressions (like entities, "New Balance", or concepts, "customer service") and on connections between these expressions ("New Balance improves its customer service"). Being able to query in natural language and select only texts with certain features (for sentiment, category, and so on) opens a new era for end users exploiting text databases.
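The difference between a word index and an expression-enriched index can be sketched in miniature. This is a hypothetical toy, assuming a fixed list of expressions in place of a real entity/concept extractor, and the documents are invented.

```python
# Toy expression-based index: documents are keyed by meaningful expressions
# (entities, concepts) rather than isolated words. The EXPRESSIONS list
# stands in for the output of a real extractor (an assumption for this demo).
from collections import defaultdict

EXPRESSIONS = ["new balance", "customer service"]

def build_index(docs):
    """Map each known expression to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        lowered = text.lower()
        for expr in EXPRESSIONS:
            if expr in lowered:
                index[expr].add(doc_id)
    return index

docs = {
    1: "New Balance improves its customer service",
    2: "Great shoes from New Balance",
    3: "Their customer service never answers",
}
index = build_index(docs)
print(sorted(index["customer service"]))  # [1, 3]
print(sorted(index["new balance"]))       # [1, 2]
```

A query like "negative opinions mentioning customer service" would then intersect the expression postings with a sentiment label per document, instead of matching the bare words "customer" and "service" wherever they happen to occur.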
This trend will be meaningful in different areas like social CRM, where user comments from social media can be connected to the customer support function, and in business intelligence, where a system will connect classical numeric indicators with user comments coming from the contact center. Business management will no longer be about numbers alone, but about text integrated with numbers.
Where does a reader get more information about your firm?
Our Web site (www.bitext.com) has detailed information about our technology. And I would suggest that your readers look at our demos for the API and also our live demo on sentiment analysis for the tablet market on Twitter (in English) and the telecom companies in the Spanish market (in Spanish). If a person wants to see how our system operates in a text-rich environment on a Web site, I would suggest a look at NaturalFinder running against DARPA content.
Bitext’s capabilities and product lineup have grown organically and in response to client demand. Bitext positions its natural language and search technology as adding functionality to an existing system.
Many vendors advocate a “rip and replace” approach. Bitext does not; it can add natural language functionality or support for structured and unstructured information to your existing search solution. Bitext’s architecture allows the firm’s technology to integrate with almost any enterprise application.
Coupled with the company’s strong support for non-English content, we think that Bitext warrants a close look.
Stephen E Arnold, March 20, 2013