Bitext

An Interview with Antonio Valderrábanos

Artificial intelligence and text analytics have become the motivating force behind making sense of human utterances. Artificial intelligence knits together a number of techniques and technologies. The result is software that can learn and take action, often without human intervention. Examples range from human-machine interfaces to personalizing information access in a newsfeed, and from recommending a book or video to allowing Google Home to answer a spoken question.

Text analytics refers to the application of mathematical procedures to content, whether written or spoken. The data generated provide important signals about hard-to-discern information concerning people, places, events, and products. Combining smart software with text analytics clears hurdles that for decades made it difficult to identify an important item of information regardless of language, and to use that information to make modern systems more fluid, responsive, and intelligent. One of the world leaders in text analytics, Bitext, has its headquarters in Madrid, Spain. The company’s technology enables some of the world’s largest companies to deliver products and breakthroughs that break down information access barriers.

The reason for Bitext’s surge to success is the shift from traditional desktop computers to mobile technology. Today’s mobile-first environment forces organizations to provide useful, responsive solutions accessed through flexible and user-friendly interfaces, such as always-improving keyboards and voice interfaces. Some devices provide basic services like writing an email; others make it possible for a person to dictate an email in one language and have it rendered in a second language, or to interact with a robot using voice commands.

The success of the Amazon Echo and Dot has allowed an eCommerce company to create a new product category at a time when other capable technology companies continue to produce me-too products. Also, a new generation of consumers expects its devices to respond to voice instructions, regardless of the user’s native language. Appliances, automobiles, and home entertainment systems need voice interfaces and deeper word knowledge. The era of button-pushing is giving way to a more intuitive, more natural way of turning on a coffee maker or getting the information needed to make decisions.

Bitext is a leading technology company specializing in high-accuracy natural language applications that perform according to the user’s expectations. The company has created multilingual text analysis technology in more than 30 languages. It takes a linguistic approach to text analysis, leveraging the knowledge of computational linguists for accurate, fine-grained results. Unlike competitors’ systems, Bitext’s proprietary multilingual platform covers the full range of analysis, from tokenization, lemmatization, part-of-speech tagging, and parsing to lexical, syntactic, and semantic analysis.
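As a rough illustration of the first stages named above (tokenization, lemmatization, and part-of-speech tagging), here is a minimal Python sketch. It is a toy built on stated assumptions, not Bitext’s proprietary platform: the `LEXICON` dictionary, the `Token` type, and the `analyze` function are hypothetical, and the sketch stops short of parsing and semantic analysis.

```python
import re
from typing import NamedTuple

# Hypothetical toy lexicon: surface form -> (lemma, part-of-speech tag).
# A production linguistic platform would use large per-language dictionaries.
LEXICON = {
    "i": ("i", "PRON"),
    "never": ("never", "ADV"),
    "liked": ("like", "VERB"),
    "likes": ("like", "VERB"),
    "it": ("it", "PRON"),
    "the": ("the", "DET"),
    "car": ("car", "NOUN"),
    "cars": ("car", "NOUN"),
}


class Token(NamedTuple):
    text: str
    lemma: str
    pos: str


def tokenize(text: str) -> list[str]:
    # Split into word tokens and standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def analyze(text: str) -> list[Token]:
    """Tokenize, then look up a lemma and tag for each token."""
    tokens = []
    for word in tokenize(text):
        key = word.lower()
        if key in LEXICON:
            lemma, pos = LEXICON[key]
        elif re.fullmatch(r"[^\w\s]", word):
            lemma, pos = word, "PUNCT"
        else:
            # Unknown words fall back to their lowercase form and an "X" tag.
            lemma, pos = key, "X"
        tokens.append(Token(word, lemma, pos))
    return tokens


for tok in analyze("I never liked the cars."):
    print(tok.text, tok.lemma, tok.pos)
```

A dictionary-driven design like this mirrors, in miniature, the lexical-resource approach described later in the interview; real systems need far larger resources per language and must disambiguate words that can carry more than one tag.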

I previously interviewed the entrepreneur and technologist who founded Bitext in 2008. Flash forward almost a decade, and Bitext’s clients are some of the most important and recognized technology companies in the world. Bitext has non-disclosure agreements with these clients that prevent the company from naming them or describing world-changing products in development. Nevertheless, Bitext is one of a small number of companies whose technology allows a consumer to speak in one language and have a service or product understand what he or she wants.

I had heard that one of the world’s largest information and content processing companies had adopted Bitext’s technology. To learn as much as I could about this significant business deal, I spoke with Sr. Valderrábanos in his office in Madrid, Spain. Bitext was founded in 2008 to provide OEM multilingual linguistic technology for text analytics, search, and other business areas. Sr. Valderrábanos holds a Ph.D. in computational linguistics, with a specialization in parsing. The company also has a US office in Silicon Valley, a short drive from some of its major customers.

A decade ago, Bitext had business relationships with specialist firms like dtSearch, a developer of search systems for Microsoft Windows. Today, Sr. Valderrábanos has expanded the reach of his company’s natural language and “linguistic” innovations to cover artificial intelligence, knowledge graph databases, and chatbots.

The full text of my conversation with Sr. Valderrábanos appears below:

What is the problem your company and its technology have solved?

Our view is that understanding text or human utterances is the next big challenge for many industries and for some very important global companies, like Amazon and Alexa, Google and its Home and Now projects, and Microsoft and Cortana. The need to understand natural language and the information it contains reaches from software devices or bots to worldwide services like customer analysis. Think of increasingly capable mobile phone assistants like Siri or Samsung’s Viv. Imagine what innovators will build when messaging applications such as Telegram, WhatsApp, or Facebook Messenger gain cross-language capabilities. In today’s world, technology like Bitext’s can add new dimensions to the analysis of multilingual information in national security and fraud detection software.

Is there an end point for language understanding?

On one hand, it may seem that once software can understand human language, not much work is left to do. On the other hand, Bitext thinks that linguistic applications will form a new type of operating system. If we are correct that language understanding creates a new type of platform, it follows that innovators will build more new things on this foundation. That means there is no end point, just more opportunities to realize new products and services.

When you look across the landscape of language understanding technology, what shortcomings do you see in the available technologies, systems, and products?

It is true that there are many companies and research groups working very hard to allow software to make sense of human language in text or spoken form. I think that for machine learning and artificial intelligence, one big problem is the need to assemble hand-annotated datasets. Subject matter experts have to assemble information that can train systems, provide entities and variations in names, and ensure that new content with new twists on language is available to train and retrain systems.

What happens if the systems are not properly trained?

That’s a good question, one that many people do not consider. Over time, the accuracy of the system degrades, because the training data does not change while the content the system is processing does. This “drift” produces answers that are not related to the user’s information need. The key to effective training is high-quality data. We enrich text with both language knowledge (phrases and dependencies coming from our parser) and world knowledge, what people call knowledge graphs or event databases. This combination enhances feature generation and feature extraction in today’s AI applications.
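One simple way to notice the kind of “drift” described above is to track how much incoming text falls outside the vocabulary a system was trained on; a rising out-of-vocabulary rate hints that training data needs refreshing. The Python sketch below is an illustrative assumption, not Bitext’s method, and the function names and the 20% threshold are hypothetical.

```python
import re


def vocabulary(texts):
    """Set of lowercase word types seen in a collection of texts."""
    return {w for t in texts for w in re.findall(r"\w+", t.lower())}


def oov_rate(train_vocab, texts):
    """Fraction of word tokens in `texts` that never appeared in training."""
    words = [w for t in texts for w in re.findall(r"\w+", t.lower())]
    if not words:
        return 0.0
    unseen = sum(1 for w in words if w not in train_vocab)
    return unseen / len(words)


# Hypothetical threshold: flag drift when more than 20% of tokens are unseen.
DRIFT_THRESHOLD = 0.20


def drift_detected(train_texts, new_texts, threshold=DRIFT_THRESHOLD):
    """True if the new texts' out-of-vocabulary rate exceeds the threshold."""
    return oov_rate(vocabulary(train_texts), new_texts) > threshold


train = ["my order arrived late", "the order was damaged"]
incoming_ok = ["my order was late"]
incoming_new = ["the chatbot glitched and froze my crypto wallet"]
print(drift_detected(train, incoming_ok))   # False
print(drift_detected(train, incoming_new))  # True
```

Vocabulary overlap is only one crude drift signal: it catches new words but not familiar words used in new ways, which is one reason curated, high-quality training data still matters.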

What’s the other problem?

Okay, the other problem I see is that today’s language-centric systems usually require a long time to deploy. A solution to today’s language understanding problem can take months to implement. By the time the system is online and ready to use, the problem may have changed or the market may have moved on.

So both problems are related to synchronization. In one case, it is the time required to make a system work for the customer’s information. In the other case, the time required to deploy and become operational is so long that the solution no longer matches the market need. Is that close to what you see?

Yes, exactly. Timing translates to costs, missed opportunities, and the types of problems that many people experience when trying to extract customer insights with a high degree of accuracy.

Does Bitext suffer from these timing problems?

No, not very often. We have worked very hard to reduce the time required to set up and deploy our technology. We have, I believe, two competitive advantages. The first is our proprietary technology. The second is speed: the time from a client’s realizing he or she has a problem to putting a high-value solution in front of users or customers is short, because our systems can be put into use quickly. Our work for car manufacturers like Renault, Volkswagen, or Audi is a good example; deployment times are measured in weeks.

I have noticed that many search and retrieval vendors say they have software that allows a company to deploy a virtual assistant for customer service. I am not confident that keyword search does a very good job of helping a customer get an answer from a customer support call or from a Web site with a software agent behind the scenes. What’s your view?

Search depends on understanding what the user wants. It is possible, of course, for a search vendor to create rules for certain types of customer questions. But what if the customer has a question the rules cannot answer? The result is that many customer support implementations are not very helpful and may cause the customer to complain on social media or go to another vendor. Search is one thing. Understanding what human utterances mean is another. My opinion is that virtual assistants have not yet established themselves as a way of getting value out of expensive mobile devices, and beyond some trivial tasks, like sending a short text message or setting an alarm, they have not drastically changed our lives. However, many large companies are investing in this direction, so I’m optimistic about the future of virtual assistants.

How significant are problems in text analysis?

Losing one customer because a voice-enabled customer support application angers him or her is not a big issue for some large companies. But errors in text analysis can have significant consequences. In my experience, many of the current systems are still unreliable and unpredictable. For example, speech recognition and chatbots sometimes work perfectly and sometimes get what you are saying ridiculously wrong. If a doctor is using voice to obtain information about a treatment for a child in the emergency room, no one wants that service to be either unpredictable or wrong. Furthermore, would you send an important business email dictated to a speech recognition system without carefully checking it a couple of times?

No, I would not.

There have been some very big scandals that could have been prevented or curtailed with Bitext’s type of text analysis technology. Think about information intercepted by government intelligence services that is never analyzed, because the systems are not fast enough or generate so many errors that no one trusts their outputs. At Enron, a compliance officer could have identified fraud and taken steps to intervene. The LIBOR price fixing could have been identified from the analysis of available information. Stephen, I ask you, “How many expensive scandals could we prevent with text robots, giving confidence and safety to shareholders and taxpayers?”

With the shift to mobile devices, why is traditional interaction moving towards text analysis and bots?

We have been thinking about this for a while. As you know, attention spans are now different from what they were when I was doing my university coursework. Young and old today focus on multitasking. Many, many people expect everything to be immediate and fast.

Can you give me an example?

Of course. Many people do not edit their messages before publishing on social media or searching in a search engine. Keyboards slow down our current frenetic pace and can be a speed bump to efficient multitasking. Technologies like natural language and voice interfaces complement the way people work and relax today. Right now, digital keyboards are the main interaction channel with mobile devices, so improving their performance makes a big difference for the end user.

I understand. How do you describe your company to a Fortune 500 company or a government executive who wants to know what Bitext technology can do to solve problems related to analyzing large volumes of text?

I am not sure I have a one- or two-word answer that sums up our company. Google can say, “Online advertising.” For us, it would be “instant understanding,” of course. My explanation is that Bitext has created an easy-to-scale, multilingual deep linguistic analysis (DLA) platform. Our technology reduces costs and increases user satisfaction in voice applications or customer service applications. I see it as a major breakthrough in the state of the art.

What’s the next level of detail you offer to help the executive who needs a solution for his or her company’s mobile application?

I then explain that our type of linguistic analysis is in a better position to extract structure from text. Our technology for deep linguistic analysis is based on knowledge about language (dictionaries or lexical resources, grammars, ontologies, and knowledge graphs), and it can handle the structure of language at all levels (morphology, syntax, and semantics). By taking into account the structure of language, Bitext DLA accurately understands complex phenomena like negation (“I never liked it”) and conditionality (“I’d like it if it were cheaper”), especially in complex cases where two sentences have similar wording but entirely different meanings (like “I don’t plan to buy this product” and “I’d plan on buying it if it were cheaper”). So our DLA is specifically designed to find the structure in (apparently) unstructured text. The result is that mobile applications incorporating Bitext reduce user frustration. Happier customers translate to more revenue and reduced customer loss.
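A deliberately simple Python sketch can show why the two similarly worded purchase sentences above need different labels. It assumes English input and uses hypothetical cue-word lists (`NEGATION_CUES`, `CONDITIONAL_CUES`) and a hypothetical `purchase_intent` function. It is surface keyword matching, not deep linguistic analysis, and is not how Bitext DLA works internally.

```python
import re

# Hypothetical cue-word lists. A structure-aware system would use the parse
# of the sentence rather than word lists like these.
NEGATION_CUES = {"not", "never", "no", "don't", "doesn't", "didn't", "won't"}
CONDITIONAL_CUES = {"if", "unless"}


def words(sentence: str) -> list[str]:
    # Normalize curly apostrophes and keep them inside words,
    # so "don't" stays a single token.
    return re.findall(r"[a-z']+", sentence.lower().replace("\u2019", "'"))


def purchase_intent(sentence: str) -> str:
    """Label a purchase statement as conditional, negated, or positive."""
    tokens = words(sentence)
    # Conditional cues are checked first, so "I wouldn't buy it unless..."
    # is labeled conditional rather than negated.
    if any(t in CONDITIONAL_CUES for t in tokens):
        return "conditional"
    if any(t in NEGATION_CUES for t in tokens):
        return "negated"
    return "positive"


print(purchase_intent("I don't plan to buy this product"))          # negated
print(purchase_intent("I'd plan on buying it if it were cheaper"))  # conditional
print(purchase_intent("I plan to buy this product"))                # positive
```

Because it only looks for cue words, this toy mislabels a sentence such as “Not only did I buy it, I loved it” as negated, which is exactly the kind of case that structure-aware analysis is meant to handle.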

Does Bitext integrate into an existing software system or does it replace what the licensee already has?

We fit into what the licensee has. We can be seen as an add-on that provides additional quality data for any AI engine.

Give me an example, please.

Okay, you know about the widely adopted natural language processing technology known as the Stanford University Parser, right?

Yes, it is pretty good and open source too.

Yes. Bitext DLA provides parsing data for text enrichment across a wide range of languages, for informal and formal text, and for different verticals, to improve the accuracy of deep learning engines and reduce training times and data needs. Bitext works in this way with many other organizations’ systems. With Bitext, the language coverage of an existing system is easy to expand. We now have text analysis technology in more than 30 languages. If a licensee needs a particular language capability, we can easily add it to our language library.

Are there new products and services coming in 2017 and 2018?

Yes, we have a strong record of developing new services. We anticipate introducing a solution for training data bottlenecks, one that will make deploying a Bitext-enabled solution even faster. We want to provide a fraud detection solution that can integrate easily with existing systems from such companies as BAE Systems, IBM, and others. Plus, we want to tune our fraud detection solution so that it can increase the effectiveness of content analysis systems used by law enforcement and intelligence organizations worldwide. We also have some new components arriving soon that will improve smartphone usability and customer support solutions. These components involve named entity resolution, event extraction, and extended world knowledge or knowledge graph data.

What are the research and development projects your teams in Spain and the US are working on?

Stephen, please appreciate that I cannot reveal details. But from a bird’s-eye view, I can say that Bitext wants to enhance our system’s ability to watch what users do and then incorporate those cues into our system. We want to provide these inputs to our system and to the specialists who optimize our numerical recipes. Also, we have been talking about moving from our present system to a more comprehensive artificial intelligence system. Our idea is to take our existing innovations and push the boundaries of what one can do with today’s hardware and infrastructure. We know some companies in Boston and Silicon Valley have secured millions of dollars to work on this type of large-scale system. Stephen, do you know a company interested in funding us so we can invent a solution for global, real-time language analysis?

How big is the market for linguistic technology?

That’s a good question. We estimate that the market for artificial intelligence applied to text is worth US $420 billion today. Statista estimates that the number of smartphone users worldwide in 2017 is 2.32 billion. If only a third of those users adopt speech interfaces, that is a market of close to 0.8 billion people. The market gets larger when one considers fraud, customer service, insights generation, and other markets. What’s interesting is that none of these are new opportunities. The challenges have been around for decades.
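The smartphone figure above is simple arithmetic. Spelled out, using the 2.32 billion users attributed to Statista and the interviewee’s assumption that one third of them adopt speech interfaces:

```latex
\frac{2.32 \times 10^{9}\ \text{users}}{3} \approx 0.77 \times 10^{9}\ \text{users} \approx 0.8\ \text{billion people}
```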

But there are dozens upon dozens of companies in addition to IBM, Facebook, Samsung, Amazon, Microsoft, and Google working in NLP and content analysis today. What is Bitext’s system and method?

That’s a question I never fully answer. We have developed proprietary approaches built on a foundation of the scientific study of language rather than on an engineering problem-solving approach. I believe you call these “numerical recipes.”

Yes, I prefer that term to algorithms.

Well, we have algorithms combined with high quality linguistic data, and we have a number of very capable people.

Do they have Ph.D.s like you?

Sure, we have some people with deep academic training, Ph.D.s and Ph.D. students. We also have experts with backgrounds in engineering and mathematics who allow us to shape our technology to make sense of human utterances in dozens upon dozens of languages. When it comes to enabling new types of interfaces, university training is important, but it may not be the most important thing. Bitext is a company composed of talented people who make numerical recipes that work. That’s our “secret sauce,” as some say in Silicon Valley.

Where can a reader get more information about Bitext?

The best place is our Web site at www.bitext.com. We have white papers, a blog, and explanatory materials.

Thank you.

ArnoldIT Comment

Bitext has emerged as one of the leaders in text analysis. Its clients include government agencies, search and content processing companies, and diversified high technology companies. The company has a proprietary solution for making sense of text. The technology integrates with existing systems, whether proprietary, open source, or commercial off-the-shelf. In the last two years, Bitext has become a leader in linguistic methods for delivering next-generation voice interfaces and content understanding and analysis solutions. It is a company to watch.

Stephen E. Arnold, March 21, 2017
