An Interview with Ric Upton
Digital Reasoning (www.digitalreasoning.com) has been on our radar for more than a year. The company is privately held, but we have heard that the firm's bookings continue to grow. The introduction of sophisticated analytics for entities has sparked what is now a new discipline – entity based analytics. We were able to speak with Digital Reasoning’s Dr. Ric Upton. Dr. Upton will be leading Digital Reasoning's Washington, DC area office and team. The full text of my interview with him appears below.
How did you get interested in content processing?
I began my career working for the US Intelligence Community where a vast amount of content is collected and analyzed. As the growth in the quantity of data that was being collected accelerated, it became imperative that we find ways to process and analyze the data more intelligently, rapidly and efficiently.
What attracted you to Digital Reasoning?
I was attracted to Digital Reasoning after having looked at a number of firms offering technologies and solutions in this field. I evaluated those firms both as a potential partner and as a potential investor. I felt that Digital Reasoning had one of the most compelling approaches to dealing with content processing, integrating state-of-the-art NLP, entity analytics, and context-based reasoning with a highly scalable platform, available in both enterprise and cloud-based offerings.
When you think about Digital Reasoning's unique approach to entity oriented analytics, what distinguishes the firm's technology from other firms offering "intelligence" focused software?
Digital Reasoning’s approach is a complete solution. Our flagship product, Synthesys®, offers an integrated suite of intelligent mechanisms leveraging advanced NLP and related technologies to identify and extract entities from vast quantities of unstructured (and structured) data, sophisticated methods for dealing with co-referencing and context to make sure the entities extracted can be associated with a unique, real world person, place or thing, and highly effective mechanisms for identifying and understanding the relationships between entities over space and time. Most other solutions in this field are focused on a part of the problem; Digital Reasoning’s solution is the first to deal with the entire problem, eliminating the time, cost, and complexity of building an integrated solution at our customers’ sites.
Big data is the undoing of many search and content processing systems. How does Digital Reasoning handle high volume flows of real time information? Are there technical differentiators or is Digital Reasoning relying on cloud type scaling?
Digital Reasoning’s solution relies on our customers and partners to capture flows of real time information. That said, our processing and analytics often have to complement these high volume data flows. We do this in part through judicious use of cloud-based processing augmented by intelligent methods of processing and storing data as it becomes available so that we can avoid the need to perform batch processing or redundant processing of previously-captured data. Our architecture is also adaptable and allows key elements of our processing to be performed closer to the edge of the network, potentially reducing the amount of data, and the associated latency, that can often stymie systems trying to deal with high volume, real-time data flows.
There has been considerable confusion about the value of content centric analytics. Digital Reasoning focuses on people, places, events, numbers, and other entities. What's the value of the firm's approach?
Digital Reasoning’s approach to big data analytics differs from solutions that apply conventional statistical algorithms to structured data. Instead, we are focused on automating the understanding of the meaning buried within content that is not well-structured or completely unstructured.
The most rapidly increasing form of data today is unstructured data: data contained within email, SMS messages, tweets, blogs, wikis, and social media, in addition to more conventional newswires, cables and the like. The quantity of this data has grown to the point where it is not possible for human beings, even human beings augmented by sophisticated search and text handling tools, to look at it and analyze it in a timely fashion. Our customers are looking to find and extract facts (relationships between people, places, events, etc.) that are actionable, whether that means influencing a significant financial decision, uncovering an insider threat, better understanding the composition or demographics of a particular social network, or identifying a threat to our national security.
Most analysts fall into a routine when it comes to analyzing data. Tables of numbers are often ignored. Systems output flashy graphics, but some of the visuals are impenetrable to me. What does Digital Reasoning do to make its systems' outputs useful to an analyst or business professional?
Digital Reasoning often leverages the reporting and visualization capabilities embedded in the user interface tools the analysts within a particular organization are familiar with and trained in. That said, we are processing and analyzing data that can generate very complicated associations and involve millions of entities or more, which makes the results challenging to assimilate. We are constantly investigating technologies that allow analysts to focus on and more quickly understand the data – and questions – they are interested in. This is a cognitive problem, one that can’t be dealt with using only visualization techniques or better chunking of the data.
What specific analytics problems does Digital Reasoning address? Can you give me a typical example of how Digital Reasoning allowed a client to make more informed decisions?
Our flagship product, Synthesys®, solves the problem of deriving actionable intelligence from massive amounts of unstructured and structured text. Our customers have too much data to understand, and they do not have the luxury of time or armies of people to read it all. But they need to understand the real meaning of the data in context so they can act upon it as if it had all been read and comprehended.
A typical customer might be trying to completely understand how to locate an individual within massive amounts of reports where the connections with people, places, organizations, contact info, and even names and aliases are not always clear. Sifting through all this data to accurately develop this profile even among misspellings, aliases, code names, etc. is typically something that can only be done by reading. Our ability to automate understanding is critical to customers with concerns about time, accuracy, completeness, or even the ability to leverage the massive amount of data they have generated.
Content exists in a bewildering number of formats, types, and sizes. What types of content can Digital Reasoning process and make actionable? What must a licensee do if an unknown type of content has to be made available to the Digital Reasoning system?
Digital Reasoning can deal with virtually any form of text content in existence today. If there is a format or type unique to a customer or that we haven’t seen before, APIs are available to ensure we can ingest it. The most critical issue goes beyond form or format, however. By this I mean the way in which information is expressed in a particular form of content. The way in which information is expressed in a tweet or SMS message differs substantially from the way it might be expressed in a blog or email. Our system has the ability to learn and adapt to the “language” – the way in which information is expressed – used in different forms of content. We can do this without developing an a priori model of that language, using a very efficient training mechanism. We can process and analyze information expressed in languages we have never seen before, whether the glyphs of tweets or Mandarin Chinese, in weeks. This allows our system to adapt rapidly to new or emerging forms of content, and multiplies the number of problems to which it is applicable.
Over the years, content analysis has become a utility in some cases and just another portfolio tool in others. Where does Digital Reasoning's Synthesys product fit in an organization? Does it "play well" with other enterprise systems?
Since our platform touches the data at ingest in raw form, we become an integral element of the business process pipeline for processing data and making it available to analysts, before they’ve even seen it. We also provide a number of interactive analytics that then allow the analyst to explore and understand the data like never before. Synthesys is not just a tool; it’s a platform that can profoundly impact the entire way in which analysts perform their work.
What are the benefits to a licensee who uses Digital Reasoning's Synthesys system to extract meaning from big data? Don't other vendors assert the same argument? What will differentiate Digital Reasoning going forward as the competition heats up?
Many vendors are addressing the problem of entity extraction in one form or another. As I mentioned earlier, however, we provide entity extraction that is rich, extracting entities that are unique in the real world, and that can be grounded both in space and time. We don’t just extract a name; we can develop and create a persona – the sum of what a person is called, where they have been and when, their relationships with other personas, their behaviors over time, etc. This is a profound advance in the state of the art of entity extraction and a more sophisticated sense of extracting meaning. We can do similar things with events. The key is to produce facts that are meaningful and actionable to an analyst. The facts should provide an “audit trail” of evidence concerning how they were derived, and should not be limited to assertions, but have some predictive consequence as well. No one else does this.
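[Editor's illustration] The persona and "audit trail" ideas in the answer above can be made concrete with a small sketch. This is not Digital Reasoning's implementation or the Synthesys data model; the class names, fields, and sample reports below are all hypothetical and made up for illustration. It only shows one plausible way to keep every extracted fact attached to the document it came from.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """Where a fact came from: a hypothetical document id and the quoted text."""
    doc_id: str
    snippet: str


@dataclass
class Persona:
    """Hypothetical persona record (an assumption, not the Synthesys model)."""
    names: dict = field(default_factory=dict)      # name -> [Evidence]
    sightings: dict = field(default_factory=dict)  # (place, date) -> [Evidence]
    relations: dict = field(default_factory=dict)  # (relation, other) -> [Evidence]

    def add_name(self, name, evidence):
        self.names.setdefault(name, []).append(evidence)

    def add_sighting(self, place, date, evidence):
        self.sightings.setdefault((place, date), []).append(evidence)

    def add_relation(self, relation, other, evidence):
        self.relations.setdefault((relation, other), []).append(evidence)

    def audit_trail(self):
        """Return sorted (fact, doc_id) pairs so each fact traces to a source."""
        trail = []
        for name, evs in self.names.items():
            trail += [(f"called {name}", e.doc_id) for e in evs]
        for (place, date), evs in self.sightings.items():
            trail += [(f"seen in {place} on {date}", e.doc_id) for e in evs]
        for (relation, other), evs in self.relations.items():
            trail += [(f"{relation} {other}", e.doc_id) for e in evs]
        return sorted(trail)


def build_example():
    """Build a persona from two made-up reports (illustrative data only)."""
    p = Persona()
    first = Evidence("report-1", "J. Smith arrived in Lisbon")
    second = Evidence("report-2", "John Smith met A. Jones")
    p.add_name("J. Smith", first)
    p.add_name("John Smith", second)
    p.add_sighting("Lisbon", "2011-03-04", first)
    p.add_relation("met", "A. Jones", second)
    return p
```

Calling `build_example().audit_trail()` lists each fact next to the report it was derived from. The sketch assumes the hard part, deciding that "J. Smith" and "John Smith" refer to the same real-world person, has already been done upstream; it does not attempt co-reference resolution or any predictive analysis.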
What are the steps a licensee must follow once the Digital Reasoning system has been licensed? How does the licensee get Digital Reasoning up and running? Are there partners available to assist a Digital Reasoning licensee?
A licensed Synthesys customer has a few basic steps to get up and running – similar to most large scale enterprise software solutions today. Synthesys runs in a scalable Hadoop environment on commercially available hardware. Software from any of our infrastructure partners – Cloudera, DataStax, MapR and IBM – is a potential option. If needed, installation and training services can be provided by Digital Reasoning or one of our growing number of integration partners. More information on all Digital Reasoning technology and integration partners can be found on our website.
When you look forward six to 12 months, what are the major trends you are monitoring in entity extraction? Is any other firm pushing into the entity oriented analytics sector which Digital Reasoning dominates?
We are always monitoring better ways to process and understand language using hybrid syntactic/semantic methods and automated learning techniques. But we also recognize that entity extraction is not just limited to text. Entities can be extracted from many other forms of media, for example video or audio streams. An enormous amount of this data is being generated and collected today, and we are only now beginning to see effective ways of dealing with it that will allow it to be treated as an equal data partner with more textually oriented sources. We are also highly focused on architecture as it affects scalability, with a strong feeling that more and more processing and analytics will have to be deployed to the edge of our networks. This means that the term cloud processing will evolve in a new and exciting way to exploit resources that might not even exist today.
As for other emerging competitors, the number is definitely growing with the growth in the market, but we still have distinct advantages because we provide an “integrated circuit” for big data analytics, combining many key components into an integrated solution. This, combined with our ability to achieve actionable intelligence without data preparation or ontologies, gives us confidence that we can maintain our leadership position.
How does a reader get more information about Digital Reasoning?
For more about the company or our flagship product, Synthesys, visit the company website at www.digitalreasoning.com. Visitors will find information about the people, products and partners, and can download white papers or request a live demonstration.
Digital Reasoning Systems continues to make remarkable advances in tackling problems associated with real time processing and entity based analytics. The company's technology works with structured and unstructured data.
The value of Digital Reasoning is not only the ability to leverage an organization’s existing knowledge base, but also to reveal critical hidden information and relationships that may not have been apparent during manual or other automated analytic efforts. Synthesys™ is a registered trademark of Digital Reasoning Systems, Inc. We recommend that readers struggling with big data take a close look at Digital Reasoning.
Stephen E. Arnold, September 20, 2011