An Interview with Christopher Ahlberg
Recorded Future, a privately held firm, has distinguished itself in several ways. First, the company received financial support from In-Q-Tel, the investment arm of the US intelligence community, and from Google, a company known for its voracious interest in next-generation technology. Second, the company has ignited the blogosphere with the fact-filled and informative posts on its Web log. Topics have included scanning the horizon of automobile technology, illustrated with eye-popping visualizations of “big data”. Third, the company has been the subject of considerable discussion by analysts, competitors, and legal experts.
On April 2, 2011, I spoke with Christopher Ahlberg, the founder of Recorded Future. The company competes in a market sector that, for lack of a better term, I call “predictive analytics.” Part business intelligence and part content processing, Recorded Future processes a wide range of inputs, analyzes entities and other identified elements, and uses the content and metadata to fuel numerical processes. The object of the computationally intensive operations is to provide insights about likely outcomes. The name of the company, Recorded Future, evokes both traditional content processing and the next-generation techniques of predictive analytics.
Like Palantir, a data analytics company that made headlines after landing $90 million in venture funding, Recorded Future uses easy-to-grasp, high-quality graphical outputs. The system can output tables and hot linked results lists. Recorded Future recognizes that users want to “see” information in a context as well as have the ability to dig into the underlying data or explore a particular set of outputs through time. But what sets Recorded Future apart is the interest from both In-Q-Tel and Google, which makes the firm like a company that nails a million-dollar contract and wins the Fields Medal on the same day.
I was able to talk with Mr. Ahlberg at his home base of Boston, Massachusetts. The development group of Recorded Future is in Sweden. The full text of my interview with Mr. Ahlberg appears below.
Thanks for taking the time to talk with me. What’s the driver for Recorded Future?
The founders behind Recorded Future were part of the same team that built Spotfire.
That’s a TIBCO company now, right?
Yes. It sounds as if you are familiar with Spotfire, which is a tool for visualizing and analyzing large sets of structured data. We ended up selling that company after building it to US$60 million, which we thought was a healthy business. After a short break, my colleagues and I started to think about what to do next.
What really stood out for us was the incredible richness of the Internet as a data source. Google and others certainly have shown how powerful indexes can show you the path to interesting documents. But what if we could make "the Internet" available for analysis?
So we set out to organize unstructured information at very large scale by events and time.
Can you give me an example?
Of course, a query might return a link to a document that says something like "Hu Jintao will tomorrow land in Paris for talks with Sarkozy" or "Apple will next week hold a product launch event in San Francisco". We wanted to take this information and make insights available through stunning user experiences and application programming interfaces. Our idea was that an API would allow others to tap into the richness and potential of Internet content in a new way.
Hakia has made quite an impact with its SENSEnews service. Is your system applicable to the financial community’s interests as well?
Absolutely. Let me provide you with some color on what our system can do. In quantitative analysis (for example in finance) we can prove the predictive power of our data (stock returns, volatility). The method applies to other areas as well. For instance, we think about this as providing data through user experiences for end users to do analysis. The key differentiator for our system is that the query and results can be about the past or the future.
Applications range from law enforcement to financial analysis to health and medical challenges.
When did you become interested in text and content processing?
That’s a good question. I've always had a keen interest in data analysis and visualization. Even back in 1993 as part of my PhD I worked on what was called the FilmFinder which took large amounts of textually oriented data (what's now IMDB) and allowed you to explore that data in a visual manner.
Later, when we were working on Spotfire in finance and government, we had lots of interest in visualizing and analyzing textual data. Some of this work required our working with outputs that were generated by a range of text analysis tools.
It struck me that if we could turn textual information into temporal events (through clever linguistics) we could organize data for analysis. I saw that if we actually built the whole stack as a service for people, we could do this in a really attractive fashion and solve some significant and difficult information problems for people.
Recorded Future has been investing in search, content processing, and text analysis for a number of years. What's the R&D strategy going forward?
We started thinking about how to do this in early 2009 and have had engineering staff (by now a pretty good group) on this for about 18 months. We've deployed our product to some of the absolutely most sophisticated analytic organizations in both finance and government. Also, we gained a whole range of users around the world. The goals we've set out have been quite ambitious: to allow analysts to work with the "Internet as a source" and to succeed in actual financial predictions.
We know our technology delivers results, so we knew we were on a very promising path that few had traveled successfully. Thus, support of In-Q-Tel and Google has been extremely positive. It is good to have highly regarded organizations validate one’s methods. Each organization gives us access to quite sophisticated people in the domains in which we work.
For the technical work, we do not have any magic. There is just hard work and our desire to deliver results to our users and customers. Some people think that In-Q-Tel and Google have magic. But there is no magic, just more work and lots of problem solving. We benefit from both organizations’ interest in and constructive criticism of our system.
Many vendors argue that mash ups and data fusion are "the way information retrieval will work going forward." I am referring to structured and unstructured information. How does Recorded Future perceive this blend of search and user-accessible outputs?
Yes, I think it's very true that we live in a time when data can and should come together better than ever before. Now, that doesn't mean that that's easy.
Architecturally we try to address this by building our user interfaces on the very same API that we provide to customers and partners so that we force ourselves to have a very standard, transparent way of accessing our data. We document this publicly.
Our customers are very interested in mashing up data from different sources. I think you call this “data fusion” in your Inteltrax.com blog. We want to mash up data and applications, not just data.
Would you give me an example?
Of course, this might be integrating our timeline visualizations with geospatial applications. Alternatively we integrate our data on corporations through identifiers such as stock tickers with external equity pricing/returns data. We try to prepare for these scenarios. We've published open examples; for instance, http://www.predictivesignals.com/2010/12/seconds-away-from-news-analytics.html, with our data loaded in Google Spreadsheets, R, etc. The Recorded Future data will gain in value when our system becomes more pervasive.
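As a rough illustration of the ticker-based integration Mr. Ahlberg describes, the following Python sketch joins a handful of event records to external daily returns on a shared (ticker, date) key. The field names, the sample records, and the join_events_with_returns helper are all hypothetical; this is not Recorded Future's API or data format.

```python
from collections import Counter

# Hypothetical event records keyed by stock ticker (not Recorded Future's schema).
events = [
    {"ticker": "AAPL", "date": "2011-03-01", "type": "product_launch"},
    {"ticker": "AAPL", "date": "2011-03-01", "type": "analyst_comment"},
    {"ticker": "IBM", "date": "2011-03-02", "type": "earnings_call"},
]

# Hypothetical external daily returns, keyed by (ticker, date).
returns = {
    ("AAPL", "2011-03-01"): 0.012,
    ("IBM", "2011-03-01"): -0.004,
    ("IBM", "2011-03-02"): 0.007,
}


def join_events_with_returns(events, returns):
    """Attach the number of events seen for each (ticker, date) to its return."""
    counts = Counter((e["ticker"], e["date"]) for e in events)
    return [
        {"ticker": t, "date": d, "return": r, "event_count": counts.get((t, d), 0)}
        for (t, d), r in sorted(returns.items())
    ]


rows = join_events_with_returns(events, returns)
for row in rows:
    print(row)
```

The resulting rows could then be loaded into a spreadsheet or a statistics package, with the event count treated as one more column next to the pricing data.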
Without divulging your firm's methods, will you characterize a typical use case for your firm's content processing, tagging, and search and retrieval capabilities?
We have two primary use cases. One is end users doing analytic research. This use case can be as simple as "I'm looking to buy a large block of Apple shares. Find me upcoming events, scheduled and unscheduled/speculative, for Apple over the next 12 months so that I can weigh these external catalysts into my analysis".
The other major use case is integrating our data into quantitative analysis. For instance, an analyst may have an equity or commodity pricing model and would like to weigh in events and time as a factor.
What are the benefits to a commercial organization or a government agency when working with your firm?
We realize that what we're doing is something totally new. As you know, there are plenty of tools for information/entity extraction, etc. But we'd like to think that the way we focus on higher end concepts such as events and time is fairly unique. We've packaged all of this up into a hosted service that users can access without even having to think about "entity extraction" or the like.
Building such a service certainly takes some fine tuning. We try to be very humble about that. And we have been most fortunate to build a solid group of customers who're very successful in using our tools. Quite encouraging!
How does an information retrieval engagement move through its life cycle?
Because we deliver a hosted service, we can listen to the customer and respond to each customer’s requirements. As a cloud or hosted service, we essentially manage the whole cycle. We add new sources on a continuous basis, new concepts that are extracted, new UI improvements, new API calls, etc.
Our commitment to innovation and system enhancements is a big part of the ethos of the company.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the problems of “big data”?
This is a great question. Of course, there is delay in information. When a barge with cobalt gets stuck somewhere in Congo and the information eventually hits a trading floor in Chicago, that information doesn't travel in milliseconds.
What we do is to tag information very, very carefully. For example, we add metatags that make explicit when we locate an item of data. We tag when that datum was published. We tag when we analyzed that datum. We also tag the actual time point (past, present, future) to which the datum refers. The time precision is quite important. Tagging time this carefully makes it possible for end users and modelers to deal with this important attribute.
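The four time tags described above can be pictured as fields on a single record. The Python sketch below is only an illustration under stated assumptions: the TaggedDatum class, its field names, and the sample values are hypothetical and do not come from Recorded Future.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TaggedDatum:
    """Hypothetical record carrying the four time tags mentioned in the interview."""
    text: str
    found_at: datetime      # when the item was located
    published_at: datetime  # when the source published it
    analyzed_at: datetime   # when it was analyzed
    refers_to: datetime     # the time point the statement is about

    def tense(self) -> str:
        """Classify the referenced time relative to publication.

        Exact equality for "present" is a simplification for this sketch.
        """
        if self.refers_to > self.published_at:
            return "future"
        if self.refers_to < self.published_at:
            return "past"
        return "present"

    def discovery_lag(self) -> timedelta:
        """Delay between publication and the moment the item was found."""
        return self.found_at - self.published_at


datum = TaggedDatum(
    text="Apple will next week hold a product launch event in San Francisco",
    found_at=datetime(2011, 4, 1, 9, 30),
    published_at=datetime(2011, 4, 1, 8, 0),
    analyzed_at=datetime(2011, 4, 1, 9, 31),
    refers_to=datetime(2011, 4, 8),
)
print(datum.tense(), datum.discovery_lag())
```

Keeping the four times separate lets a modeler measure the delay in the information itself, as in the cobalt barge example, apart from the date the statement is about.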
At this stage in our technology’s capabilities, we're not trying to claim that we can beat someone like Reuters or Bloomberg at delivering a piece of news the fastest. But if you're interested in monitoring, for example, the co-incidence of an insider trade with a product recall we can probably beat most at that.
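Monitoring the co-incidence of two event types can be sketched as a search for pairs of events about the same entity that fall within a time window. The Python example below is a minimal illustration only: the company names, event labels, seven-day window, and co_incidences function are hypothetical, and the interview does not say how Recorded Future performs this matching.

```python
from datetime import date, timedelta

# Hypothetical (entity, event type, date) tuples; not real data.
events = [
    ("ACME", "insider_trade", date(2011, 3, 1)),
    ("ACME", "product_recall", date(2011, 3, 4)),
    ("BETA", "insider_trade", date(2011, 3, 1)),
    ("BETA", "product_recall", date(2011, 4, 20)),
]


def co_incidences(events, type_a, type_b, window):
    """Return (entity, date_a, date_b) where both event types occur within window."""
    hits = []
    for entity_a, kind_a, day_a in events:
        if kind_a != type_a:
            continue
        for entity_b, kind_b, day_b in events:
            same_entity = entity_b == entity_a
            close_in_time = abs(day_b - day_a) <= window
            if kind_b == type_b and same_entity and close_in_time:
                hits.append((entity_a, day_a, day_b))
    return hits


matches = co_incidences(events, "insider_trade", "product_recall", timedelta(days=7))
print(matches)
```

The nested loop is quadratic and is used here only for clarity; at Internet scale some form of indexing by entity and time would presumably be needed.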
Another challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content "push", report generation, and personalization within a work flow?
Okay, good points. We've built and provided integrations to environments such as Google Spreadsheets, R, Spotfire, etc. We have loads and loads of ideas for how our data can hit productivity environments. I can’t reveal the details of what is coming from Recorded Future. I can ask you to take a close look at the enhancements to our cloud service that will become available in the near future.
Is that a “recorded future?”
Yes, the enhancements I referenced have a 0.999999 probability of becoming available. Our customers are quite vocal in their needs, and we are responding as you and I are talking today.
There has been a surge in interest in putting "everything" in a repository and then manipulating the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can "disappear" or become unavailable. What's your view of the repository versus non repository approach to content processing? What are the "hooks" between content processed by Recorded Future and a more traditional type of analytics system?
To be honest, at this stage we have really not worked at this at all. We are focused on Internet content and will be tackling other types of content in the near future.
No problem. I appreciate your focus. Let me ask about visualization. Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem for those in stressful operational situations. What's your firm's approach to presenting "outputs" that end users can easily ingest?
I think the challenge is not so much whether visualization is good or not. I know that you and I can discuss at length ways to help an end user get the fact or insight needed. I think the key is that in some use cases users would like to explore/visualize/analyze very actively, whereas in other use cases users would like to just be alerted to interesting patterns or see a report that someone else has done. For example, in Recorded Future anyone can share an analysis on Twitter or Facebook. That's the level of ease we want to provide for sharing.
Thank you for providing such interesting insights into Recorded Future’s capabilities. However, I am on the fence about the merging of retrieval within other applications. What's your take on the "new" method which some people describe as "search enabled applications"?
Our plan is not really at all to compete with someone like Autonomy, Endeca, or Exalead. We want to build a great service which indexes "the Internet" and makes it available for analysis. We believe that will be very valuable for people in finance, government, marketing, sales, etc. Think about it perhaps as the next generation of business intelligence.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are "cut off" from access to more robust systems. How does Recorded Future see the computing world over the next 12 to 18 months?
You are right. There certainly is a massive thrust of data and applications moving into the cloud. We'd like to ride that wave. Business intelligence and search have been behind there. But it's happening now and we'll ride that wave. But I think you're right to be concerned that this might, in fact, cut off users from internal data and applications. We have some interesting strategies in mind to address this.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business?
The three trends affecting what we're doing are, first, the shift from on premises systems to the cloud. Data and tools are moving into hosted data centers and the pace is, in my opinion, accelerating.
Second, I am interested in the rapid uptake of scalable database technologies. These systems allow us to index very large amounts of data. Recorded Future’s innovations thrive on large volumes of data.
Third, I think HTML5 is important. That technology allows us to build very compelling, Web-based user experiences.
Where does a reader get more information about Recorded Future?
I think I would suggest that anyone wanting more information visit http://www.recordedfuture.com. I also want to invite people to read our blogs such as http://blog.recordedfuture.com/ and http://www.analysisintelligence.com/. You can also email us at email@example.com.
Recorded Future is going to be a disruptive company. The firm has a solid base of customers among governmental entities in the US and in Europe. With the support of Google, Recorded Future is going to find that interest among Google’s enterprise customers is a certainty. Like other companies offering next-generation technology, Recorded Future will have to continue to innovate and ward off competitive thrusts from giants like IBM, Oracle, and SAP. In addition, established players in analytics like i2 Ltd and the upstart Palantir will challenge Recorded Future in certain markets. Nevertheless, our view is that the future of Recorded Future is bright. This is a company to watch.
Stephen E. Arnold, April 5, 2011