An Interview with Oleg Rogynskyy
Oleg Rogynskyy's goal is to democratize text analytics, and turn it into the coolest thing on the block. He is the founder and CEO of Semantria, a text and sentiment analysis startup that's disrupting the market. The company's goal is to provide text analytics to anyone, regardless of technical skill, in less than 3 minutes and for under $1000. When he's not leading the troops at Semantria, Oleg is actively following world politics, reading the latest in sci-fi, listening to house music, and travelling to countries that don't speak English.
ArnoldIT’s Ric Manning spoke with him in October 2013. The full text of the interview appears below.
What's the history of Semantria?
I founded Semantria in 2011. I've been in the industry for many years, first working for Nstein and then with Lexalytics. While working with Lexalytics, I realized that text analytics technology is mature enough that it could be very useful for your average Joe.
I felt like there were thousands of untapped use-cases, so I asked myself one important question: how can I make text analytics and sentiment analysis available to a nontechnical user, in under three minutes, and for less than $1000? And that's how Semantria was built.
When did you become interested in text and content processing?
Let me think. I believe I first encountered this technology in 2007, when I met with the Nstein guys who were doing cool editorial, user-generated, and social media content processing for publishers. The buzzword was text mining.
I was pulled in because text analytics is the cornerstone for any artificial intelligence application. Context is the most important piece of artificial intelligence, as well as the most complex one, and that's what text analytics does: it understands context. I mean, technology like voice-to-text tells you what someone's saying, other technology can do logical decision-making, but when it comes to understanding context, all those things are secondary.
What information challenges and problems does the company seek to resolve for its clients?
That’s a very good question. At Semantria, we want to simplify and democratize access to text analytics technology. We want people to be able to get up and running in no time, with a small budget, and actually derive value from our technology. The classic story is you buy a system worth $100k and don't deploy it.
I have heard that Nstein technology was often licensed but not put into production. Is that true?
You will have to ask the Nstein and OpenText guys about that. I don’t have any comment on other companies’ activities.
And at Semantria?
We wanted to build something that works right away, and can be put on your credit card, because we know that many organizations interested in our technology may not have hundreds of thousands of dollars to spend. Do you have an American Express Black card?
No. What problems does Semantria address?
So we make it simple for our clients to solve the following problems:
First, some organizations have too much text to read. For example, a Twitter stream or surveys with many responses.
Also, there is the need to move quickly and reduce the time to get to market. Many survey results come with an expiry date before they're irrelevant.
Then there is reporting the information. Anyone can use their Excel smarts to build simple/interesting reports and visuals out of unstructured data. But that can take some time, and Semantria accelerates this step.
Finally, users need to analyze text with the same impartiality each time. A human might see a glass as half full or half empty, but Semantria will always see a glass with water.
Semantria uses the Salience engine from Lexalytics. Why did you choose that technology and how is it deployed?
I've had a very long and successful relationship with Jeff Catlin and Seth Redmore at Lexalytics, and their technology is, in my opinion, the best of breed. There's a reason why the largest tech firms out there rely on Lexalytics' software. I mean, every day, your average executive relies on data generated by Lexalytics somewhere in their business intelligence or analytics system. It's heavily configurable, proven, and has the best reputation on the market. Plus, the people who make it are quite savvy.
What have you done with Lexalytics’ software?
We took the Lexalytics software, which is an enterprise-level, text-processing engine, and wrapped it into a series of cloud services that expose its powerful engine, in a simple fashion, through the simplest REST API.
We sponsor lots of hackathons, and I've seen people integrate Semantria and start processing text in less than 15 minutes. I've also seen random people who've just learned about Semantria online get set up in less than 20 minutes without asking us a single question. It'll probably take you more time to set up your microwave.
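To give a sense of what "a simple REST API" integration tends to involve, here is a minimal sketch in Python. The endpoint URL, the payload fields (`id`, `text`), and the bearer-token header are all assumptions made for illustration; they are not Semantria's documented API. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only; NOT Semantria's real API.
API_URL = "https://api.example.com/v1/documents"


def build_analysis_request(doc_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits one document."""
    # Assumed payload shape: a document identifier plus the raw text.
    payload = json.dumps({"id": doc_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_analysis_request("doc-1", "The new release is great.", "demo-key")
```

Sending the request would be a single `urllib.request.urlopen(request)` call against a real service, which is roughly the amount of code a hackathon integration of this kind needs.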
Does Semantria have specific target markets for its products?
The majority of our customers are from English-speaking countries such as the US, Canada, UK, and Australia. Since we released support for five languages and, as of September 2013, Chinese (Mandarin), we've seen skyrocketing demand for these languages, in particular Portuguese and German. Given this, we're rapidly expanding our presence in new markets such as Brazil, Germany, Mexico, France, and, of course, China.
Is your positioning focused on small and mid-sized organizations or on larger firms?
We want to serve organizations of any size.
For instance, our software is being used by some of the largest consumer software companies out there, because we are the only ones who can tell the difference between "dirty windows" and the software Windows.
On the other end of the spectrum, our Excel Add-In has proven to be an indispensable tool for numerous academic researchers, consultants, and people in similar fields. Examples of our users include Accenture, IBM PriceWaterhouseCoopers, many system integrators in India, a marketing director at a clothing manufacturing company, the chief scientist at the largest manufacturer of scented candles, and more. The list of customers is quite varied.
Does your system replace existing software?
That’s an interesting question. Yes, lately we've also been replacing lots of IBM SPSS Text Analytics seats. I think this is giving us new visibility within IBM.
On the enterprise side of things, I think there are some rumors about why one of our competitors just went back to its funding sources to raise $80 million. I surmise that the company is subsidizing the discounts needed to keep some customers from switching away to services like ours or to those of our quite nimble partners like NewBrandAnalytics or Empathica.
What areas of research has your team been exploring?
I don’t want to go into too much detail. I can say that we have a two-pronged approach.
The first is Deep Learning. Some firms in this sector, like AlchemyAPI, claim deep learning is something new that nobody has ever done, but this is untrue. Lexalytics has been doing it for years, quietly, just like Google, Facebook, and others, and is at the forefront of Deep Learning research and application. Lexalytics has its Concept Matrix. I find this method quite forward-looking. The best part is, we're running Lexalytics' Salience under the hood of Semantria.
What’s the impact on your customers?
It means our Deep Learning technology is backed by years of research and hundreds of happy customers. We keep refining our implementation of this industry-leading approach.
And the second area of research?
The second part of our research involves crowdsourcing. Imagine we have hundreds of Semantria instances in the cloud, servicing hundreds of customers in multiple verticals.
Because our service is customizable, everyone's text processing experience can be unique and tailored to them. We have thousands of heavily qualified executives or researchers teaching our cloud how to better process text, every minute of every day, on top of Lexalytics’ extensive Deep Learning efforts. And we do learn: our accuracy is growing exponentially every month.
What is the competitor response to your approach?
Some of our competitors rely exclusively on machine learning and neural networks to train their capabilities. The problem with letting a machine learn on its own is that machine learning can easily get out of control, and learn something incorrect which will give you bogus results.
Even worse, it's often impossible to go back, find out what caused the problem, and unlearn it. In our case, the machines not only learn from patterns, but also from qualified and well-trained people.
So we rely on deep learning, and use humans to make sure our machines learn correctly. Based on what our customers say, most competitors cannot beat our approach.
Without divulging your firm's methods or clients, can you characterize a typical use case for your firm's text and sentiment analysis capabilities?
I can highlight one or two examples to show how we work with our customers.
Recently, a customer experience management company started pulling in negative mentions of a client's brand from social media. With our advice, the firm was able to use the mentions to create tickets in real time for customer support to resolve. This led to near real-time support for their customers, fewer product returns, and much more love on social media.
We have some standard use cases from companies that use us for social media monitoring (analyzing tweets, comments, etc.), customer experience management (analyzing customer feedback, service tickets, reviews, etc.), and market research (open-ended survey responses). We make these available, and if customers have a question, we assist them.
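The ticketing use case described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the customer's actual system: the `Mention` and `Ticket` types, the sentiment scale of -1.0 to 1.0, and the `NEGATIVE_THRESHOLD` cutoff are all hypothetical, and the sentiment scores are assumed to have been produced already by a text analytics service.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    author: str
    text: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)


@dataclass
class Ticket:
    ticket_id: int
    author: str
    summary: str


NEGATIVE_THRESHOLD = -0.3  # hypothetical cutoff for "negative enough to act on"


def open_tickets(mentions, threshold=NEGATIVE_THRESHOLD):
    """Create one support ticket for each mention at or below the threshold."""
    tickets = []
    for mention in mentions:
        if mention.sentiment <= threshold:
            # Ticket ids are assigned sequentially; the summary is capped at 80 chars.
            tickets.append(Ticket(len(tickets) + 1, mention.author, mention.text[:80]))
    return tickets


mentions = [
    Mention("@ana", "My order arrived broken, very disappointed.", -0.8),
    Mention("@raj", "Love the new packaging!", 0.7),
    Mention("@lee", "Support never answered my email.", -0.5),
]
tickets = open_tickets(mentions)
```

In a live setup the `mentions` list would be replaced by a stream of scored social media posts, and `open_tickets` would call the support desk's own ticketing interface instead of building a list.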
What are the benefits to a commercial organization when working with your firm?
This is one of my favorite topics. We include unlimited training and support with every subscription package, because we like to make sure our customers are taken care of every step of the way. We're always on the cutting edge of technology because of our affiliation with Lexalytics, which gives us access to their latest technology, and we're SaaS (in the cloud), which means we can deploy it overnight.
What about the platform’s advantages?
We offer a sophisticated customer crowdsourcing platform, as mentioned before. Since we have hundreds of customers working with the same cloud, our machine learns from them every day. Things like what they change in the configuration, how they analyze data, which dictionaries they use, and so on, are picked up by our machines. We then use this data to determine what our users want, and improve the service quality for all our customers.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content and the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume problem?
Since everything in Semantria is heavily customizable by the user, the rate of change of existing content objects depends entirely on the user. Some customers change throughput flows daily, some change yearly.
Keep in mind that there are several global dictionaries and schemas we employ that affect every user.
Would you give me some specifics?
Sure, there are our sentiment-bearing phrase dictionary, our Wikipedia-based topic map, and our curated named entity dictionaries, as well as a part-of-speech tagger and grammatical parser. We update these periodically.
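As a rough illustration of how a sentiment-bearing phrase dictionary can be applied, the following Python sketch scores text by summing the weights of the phrases it contains. The phrases, the weights, and the longest-phrase-first matching rule are assumptions chosen for the example; they do not describe the Lexalytics or Semantria implementation. Because the scoring is a pure lookup, the same text always receives the same score, which is the kind of consistency a human reader cannot guarantee.

```python
import re

# Toy phrase dictionary; phrases and weights are hypothetical.
PHRASE_SCORES = {
    "half full": 0.5,
    "half empty": -0.5,
    "great": 0.8,
    "not great": -0.6,
    "broken": -0.7,
}


def score_text(text, phrase_scores=PHRASE_SCORES):
    """Sum the weights of all dictionary phrases found in the text.

    Longer phrases are matched first and removed, so "not great"
    is counted once and its inner "great" is not counted again.
    """
    # Normalize to lowercase words separated by single spaces.
    remaining = " ".join(re.findall(r"[a-z']+", text.lower()))
    total = 0.0
    for phrase in sorted(phrase_scores, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        matches = re.findall(pattern, remaining)
        total += phrase_scores[phrase] * len(matches)
        # Blank out matched phrases so shorter entries cannot re-match them.
        remaining = re.sub(pattern, " ", remaining)
    return total


review = "The screen is not great and the case is broken."
review_score = score_text(review)
```

Here `review_score` is negative, since only "not great" and "broken" contribute. Updating the dictionary on a schedule changes the scores for every user at once, without touching the scoring code.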
What’s the cycle for updating these resources?
Our sentiment-bearing phrase dictionaries are updated on a three-month cycle. The Wikipedia topic map, however, is regenerated every six months, because processing the entirety of Wikipedia is extremely processor-intensive.
Our entity extraction list is curated on a daily basis. We have a team of dedicated people scavenging the internet and updating our entity list based on thousands of different sources.
And what’s the delay between your updating and the availability of the new dictionaries to your customers?
Since we're SaaS, there's no latency for index updates: our system sits on many eight-core Amazon Web Services instances.
Right now our cluster is based on roughly 100 machines on the Amazon Elastic Compute Cloud and can process thousands upon thousands of documents per second. If we need more throughput, the system scales easily.
Another challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content "push", report generation, and personalization within a workflow?
We leave the integration solution to the customer, and we don't do any report generation. In terms of push, we offer a standard set of functionality. Our approach allows us to push processed data to customer systems in real time.
In terms of customization and personalization, everything in Semantria is customizable through our REST API, as well as through our Semantria Excel Add-In, which means customers can tailor their experience to their needs in real time via the API or, for non-technical users, via Excel.
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What's your view of visualization?
We don't present outputs; however, we work with a fantastic company called The Nerdery, who are the masters of visualization. They integrate well with us and are great people to work with.
Also, many of our users process content with our Excel Add-In, and they use native Excel charting to create great-looking, easily accessible reports that are compatible with Excel, PowerPoint, and most of the Microsoft Office suite.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users who embrace mobile platforms are cut off from access to more robust systems. What's your view of mobile access and how does your firm see the computing world over the next 12 to 18 months?
We are happily in the cloud, and in the cloud we trust. We have Android and iOS software development kits in the works, so whoever wants to talk to our API from mobile devices will be doing it with ease very soon.
Put on your wizard hat. What are the three most significant technologies that you see affecting your text mining business?
I see faster evolution and commoditization of cloud resources, especially Amazon Web Services offerings. The cheaper and more powerful the cloud becomes, the cheaper, faster, and more precisely we can do text mining and sentiment analysis.
I think that parallel processing frameworks like Hadoop allow us to process more, faster, and in a more distributed fashion. At the same time they allow our customers to deal with more data, thus increasing the need for our analysis of that data.
No one can ignore mobile technology. The volume of information landing on your mobile is going to grow exponentially. At some point you're going to need someone or something to read your text messages, emails, tweets, and other messages for you. Text mining can consume all of the content, keeping context in mind, and deliver the right information when you need it, straight to your device.
Where does a reader get more information about your firm?
You can visit us at https://semantria.com, or try us out by pasting your favorite blog post in our web demo at https://semantria.com/demo. We also have a strong following on Quora. You can look us up there for third-party opinions. We also tap Twitter, Facebook, and LinkedIn.
Semantria is one example of a next-generation content processing company breaking with tradition. In the past, text processing was expensive, complex, and required massive investments in infrastructure to deploy. The Semantria approach reduces the cost, the learning curve, and the walls that make it difficult for many organizations to tap text processing technology. If an organization is looking for a powerful, easy-to-deploy solution to text processing challenges, Semantria is worth a look.
Stephen E. Arnold, October 22, 2013