An Interview with Raul Valdes-Perez and Jerome Pesenti
Vivisimo established itself as a leader in federated search; that is, taking a query, passing it to multiple systems, and returning a single result list. But federation poses a special problem: duplicate results. Vivisimo's approach virtually eliminated duplicate results while introducing another useful feature: on-the-fly clustering. The company's innovations won it technical accolades and sparked the growth of this Carnegie Mellon University spin-off.
I spoke with the two founders of Vivisimo: Raul Valdes-Perez (born in Cuba but an official Chicago, Illinois, native since the age of five) and Jerome Pesenti (a French-educated computer scientist who finds the bone-chilling Pittsburgh, Pennsylvania, weather "interesting"). I became more familiar with the company when my son worked for Vivisimo for two years. I spoke with both Raul and Jerome at their Pittsburgh headquarters. There was a buzz of excitement in the air: Vivisimo had just landed more than $4 million in venture funding. As with Endeca, an infusion of capital gives a search and content processing company an opportunity to expand its services, scope, and impact in a market hungry for solutions that work.
For clarity, I have merged each of these technologists' comments into a single answer to my questions.
Last week, you announced that the company had closed on a $4M investment from North Atlantic Capital. To my knowledge, this is the first time I’ve heard of Vivisimo using venture capital. Can you tell me why, after eight years, you are raising capital and what Vivisimo plans to do with it?
This is Vivisimo's first tapping of external capital in some five years; our last investment was a very small sum provided by Innovation Works, an organization funded by the Commonwealth of Pennsylvania. We’ve been growing nicely as a profitable company, so the purpose of working with North Atlantic is to help fund our expansion in sales, marketing, and support. We feel we have the best product, so we need to extend our reach: get the word out, generate more leads, and create and support more sales channels. Jeff Hornung is now part of our team as Vivisimo's vice president of strategic alliances. The investment will let us expand while the market is in a high-growth phase.
How did Cuba and France come together to create Vivisimo?
A personal referral. I was in Paris for a thesis defense at the University of Paris. I asked the thesis chairman, the insightful Prof. Ganascia, to recommend top students to visit Carnegie Mellon University, where I was teaching computer science and engaged in a number of research projects.
To make a long story short, Prof. Ganascia suggested Jerome [Pesenti], who was looking for a 16-month stay at a foreign university. In France, a student can discharge his military obligation by studying in another country.
I did not know that.
Yes, that is right. Jerome was an exceptional intellect, and he had a rich, multidisciplinary background. That was a real fit for my research interests. We invited him to Carnegie Mellon University, and he arrived for the fall 1998 semester. He did tell us that the weather in Pittsburgh was slightly different from that in Paris.
At Carnegie Mellon, I launched him into a research project with "fail-fast" instructions: if we didn't see a way to achieve a 10x improvement fairly quickly, drop it and move on to project number two. Luckily, we didn't see a path to the 10x goal in that first research effort. So the next research question was this: how can a user make better sense of a mass of search results, with less effort, than with the status quo of listing the results one by one?
Ultimately, that project and our work gave rise to Vivisimo's founding technology.
By the way, I [Dr. Raul Valdes-Perez] appreciate your reference to my Cuban birthplace. But I actually grew up in Chicago after my parents went into exile just shy of my fifth birthday. I think Vivisimo is Cuba-Windy City-French: a global mixture, like our customer base.
Vivisimo started out with a great federating and deduplicating function. Was that the core of the Vivisimo technology?
Vivisimo's founding technology was a novel algorithm for text clustering that solved one key problem for users: grouping search results into labeled categories such that the categories were natural looking, concise, accurate, and distinctive. We describe this as a presentation that would not repeat the same material under a slightly different name. The jargon for this is federation and de-duplication, which, from our point of view, is not technically very accurate.
Federated search and deduping were developed in parallel, originally as a way to demonstrate the power of clustering on Web search results from multiple search engines. We were among the first researchers to crack these problems in a way that is intuitively useful to a user.
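To make the federation-plus-de-duplication idea concrete, here is a minimal sketch, not Vivisimo's method. It assumes each engine returns a ranked list of (url, title) pairs, treats two results as duplicates when a hypothetical URL-normalization rule maps them to the same key, and orders the merged list with reciprocal rank fusion, a standard fusion technique swapped in here for illustration (k=60 is a conventional default, not a Vivisimo parameter).

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivial variants compare equal (assumed rule).

    The scheme is dropped, so http and https copies count as duplicates;
    a leading "www." and a trailing slash are also ignored.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def federate(result_lists, k=60):
    """Merge ranked lists from several engines into one de-duplicated list.

    result_lists maps an engine name to its (url, title) pairs in rank order.
    Duplicates collapse onto one key, and their reciprocal-rank scores add up.
    """
    scores, titles, sources = {}, {}, {}
    for engine, results in result_lists.items():
        for rank, (url, title) in enumerate(results, start=1):
            key = normalize_url(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
            titles.setdefault(key, title)  # keep the first title seen
            sources.setdefault(key, []).append(engine)
    ordered = sorted(scores, key=lambda key: scores[key], reverse=True)
    return [(key, titles[key], sources[key]) for key in ordered]
```

A document returned by several engines accumulates score from each of them, so agreement between sources lifts it in the merged list while its duplicate copies disappear.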
How did you make the decision to jump from federation to full-scale "behind the firewall" search?
Vivisimo started by selling our clustering/federated search engine product. Over the course of two or three years, we had the opportunity to interact with most of the search solutions in the commercial channel.
We were a small company, and we realized that better-known and much larger search system vendors had some functional gaps. In fact, our clustering/federated search engine was pushing in new directions. Our customers told us, "There's a need for a better search system built on your technology."
We learned that there was a big need for a better search solution! Those years also allowed us to understand and collect customer requirements and identify where existing solutions were failing. In the end, we designed a search solution from scratch to better address the market's needs.
Since Vivisimo is a Carnegie Mellon spin-off, we already had the necessary expertise on staff. Many of our employees were CMU alumni with strong backgrounds in information retrieval, making it easier for us to rapidly develop a full-featured search platform incorporating de-duplication, federation, and other Vivisimo innovations such as scalability, high performance, and easy customization.
What is the unique benefit of Vivisimo in enterprise installations? Obviously you can deduplicate results and you can federate. But what else is distinctive?
The Velocity platform has two major differentiators. One is its great flexibility and adaptability for complex information technology environments, and the second is the user experience: the experience of the person doing the search.
The first point, flexibility and adaptability, is critical for the success of ambitious search deployments. A search engine needs to interact with a very large number of information and enterprise systems, has tremendous scaling requirements, and may need to adapt its workflow or relevance to particular data structures or user needs. Unlike many of our competitors, we pay great attention to the configurability and deployability of our system. We engineered Velocity (now the official name of our system) to be one of the most flexible and scalable platforms, and it can be administered from a Web browser. You can perform point-and-click configuration through a Web browser for things that often require API programming on our competitors' systems.
What are some examples?
Okay, let's say we have to connect to a large number of IT systems: for instance, document repositories from content management systems, several security frameworks, and single sign-on functionality. Instead of opening a programming editor, you use our graphical interface, which eliminates the need to write or change scripts to make use of these features.
Another administrative function is supporting a distributed or a redundant architecture to ensure 24x7 uptime. You can perform the needed configurations from the browser-based administrative interface. Busy system administrators really appreciate this feature.
You can also perform very granular customizations to control how content is extracted and processed. No direct interaction with XML and other files is needed. Not only does our approach save time, it sharply reduces the chance for an error.
How do you implement your indexes?
Velocity also supports a unique index structure. Our technique is based on text flow rather than vector representation. Velocity's approach enables a very rich syntax, extensive query customization and personalization.
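The interview does not define "text flow." One plausible reading, assumed here and not confirmed by the source, is an index that keeps the order of words (their positions) instead of reducing each document to a bag-of-words vector. Keeping positions is what makes exact-phrase and proximity queries possible. The sketch below builds a small positional inverted index; `build_index` and `phrase_search` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens, in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map term -> {doc_id: [positions]}, preserving word order."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids (sorted) of documents containing the exact phrase."""
    terms = tokenize(phrase)
    if not terms:
        return []
    # Candidates must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in sorted(candidates):
        # The phrase matches if term i appears at position start + i.
        if any(all(start + i in index[t][doc_id] for i, t in enumerate(terms))
               for start in index[terms[0]][doc_id]):
            hits.append(doc_id)
    return hits
```

A vector representation would score "the stadium on three rivers" and "Three Rivers Stadium was imploded" alike for the query "three rivers stadium"; the positional index matches only the second as a phrase.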
What do you mean by personalization?
Velocity can adapt to complex data structures like hierarchical attachments, versions, and dereferenced metadata on an individualized basis. The administrator can control this individualization, of course. We also support such unique features as field-level security, field-level updates, and early binding without latency. Velocity delivers control and security without compromising performance.
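In enterprise search, "early binding" usually means copying access-control lists into the index at crawl time, so results can be filtered during the query itself instead of checking each hit against the source system afterward (late binding). The sketch below is an illustration under that assumption, not Velocity's design: it stores an ACL per field, so a user both matches against and sees only the fields they are entitled to. The record layout and group names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SecuredField:
    value: str
    allowed_groups: frozenset  # ACL captured at indexing time (early binding)


@dataclass
class Document:
    doc_id: str
    fields: dict = field(default_factory=dict)  # field name -> SecuredField


def visible_view(doc, user_groups):
    """Return only the fields this user may see; no call to the source system."""
    return {name: f.value for name, f in doc.fields.items()
            if f.allowed_groups & user_groups}


def search(docs, term, user_groups):
    """Match `term` only against fields the user is allowed to read."""
    term = term.lower()
    results = []
    for doc in docs:
        view = visible_view(doc, user_groups)
        if any(term in value.lower() for value in view.values()):
            results.append((doc.doc_id, view))
    return results
```

Because restricted fields are removed before matching, a user cannot find a document through content they are not allowed to read, yet may still see the parts of it they are entitled to.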
How does this benefit me as a Velocity user?
This is our second point: the focus on the experience of the end user.
Why is this?
When a system is deployed inside an enterprise, it does not have "competition," unlike sites on the Web. People are used to a simple and successful search experience on the Web, which definitely raises the bar inside the firewall for search vendors. Unlike the Web, where link analysis allows for very good relevance, a good search experience behind the firewall requires advanced yet easy-to-use dynamic navigational tools like clustering, faceting, spotlighting, and personalized ranking, all of which are default functionality in the Velocity platform.
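Of the navigation tools listed, faceting is the easiest to show in a few lines: after a query, count the values of selected metadata fields across the matching results so the interface can offer them as filters, then narrow the results when the user picks one. This is a generic sketch, not Velocity's implementation; the field names are hypothetical.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count each metadata value per facet field across a result set."""
    facets = {name: Counter() for name in fields}
    for doc in results:
        for name in fields:
            value = doc.get(name)
            if value is not None:
                facets[name][value] += 1
    # Most common values first, as a navigation sidebar would list them.
    return {name: counter.most_common() for name, counter in facets.items()}


def refine(results, name, value):
    """Apply a facet selection: keep only results whose field equals value."""
    return [doc for doc in results if doc.get(name) == value]
```

Facet counts are computed from the current result set rather than the whole collection, so the options shown always lead to at least one result.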
Did you anticipate these issues when you formed Vivisimo?
Interestingly, these two differentiators are directly inherited from Vivisimo's history. The flexibility of the Velocity platform has much to do with our "federation" root. By this I mean our distributed architecture, Velocity's powerful and extensible transformation layer, and the use of XML at every step, including configuration, query workflow, and data transformation.
Our emphasis on a powerful, no-nonsense user experience comes from our clustering root. By this I mean clustering as a text mining technology that non-experts can use widely without extensive customization.
I like your USA.gov service. How did you displace AT&T and Fast Search & Transfer for this plum engagement?
Through dialogue with the USA.gov (previously known as FirstGov.gov) staff, we learned what their challenges and goals were. We then developed a vision of what the service could look like and implemented it at gov.clusty.com. They liked it very much, and we believe it greatly influenced their choice.
Some of their challenges were the high costs, frozen implementation, poor ranking, and consequently the subpar ratings they were getting for the service. Less than a year after the makeover, USA.gov was awarded the Pioneer Award from Federal Computer Week and received high ratings from academic evaluators of e-government services.
In your work with Microsoft, what have you learned about indexing content residing on US government servers? Is it substantive, or is there a lot of "soft" information and information that can't be indexed because of latency problems with government servers?
We have learned that there is a wealth of very interesting content just waiting to be leveraged and made properly accessible/searchable to citizens and researchers worldwide.
There isn't any major issue with the data itself or the way it's served. It only needs to be properly leveraged by the search platform. By this I mean that the search platform must be flexible enough to adapt to the content, which is not the case with a generalist Web search engine.
We are constantly working with governments (we also provide search to the governments of New Zealand and Israel) to surface this content and make it more usable for all citizens.
The folder concept struck me as one of Vivisimo's most useful features. I know that Northern Light tried to implement folders, but ran into performance issues. Your system always worked with almost zero latency. What is your approach?
My recollection is that the Northern Light Web search engine relied on a pre-built taxonomy.
Velocity clustering builds the folders on the fly, without any pre-processing, achieving the same end-user benefit, but without the costs and headaches of taxonomy building.
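The interview describes Vivisimo's clustering only by its goals (natural, concise, accurate, distinctive labels), not by its algorithm. As a rough stand-in, and explicitly not Vivisimo's method, the sketch below builds folders at query time from words and two-word phrases shared by result snippets, with no pre-built taxonomy. The stopword list, the `min_size` threshold, and the rule of excluding the query's own terms are illustrative assumptions.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for",
             "was", "is", "with", "by", "its", "at", "from"}


def candidate_labels(text):
    """Single words and adjacent word pairs, stopwords removed."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    labels = set(words)
    labels.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return labels


def cluster(snippets, exclude=(), min_size=2, max_clusters=5):
    """Group result snippets into labeled folders at query time."""
    exclude = {w.lower() for w in exclude}
    docs_by_label = defaultdict(set)
    for i, text in enumerate(snippets):
        for label in candidate_labels(text):
            # Query terms appear everywhere, so they make useless folder names.
            if not any(part in exclude for part in label.split()):
                docs_by_label[label].add(i)
    folders, covered = [], set()
    # Prefer labels covering many results; on ties, prefer longer phrases.
    order = sorted(docs_by_label,
                   key=lambda lab: (-len(docs_by_label[lab]), -len(lab.split()), lab))
    for label in order:
        members = docs_by_label[label]
        if len(members) < min_size or members <= covered:
            continue  # too small, or adds nothing new (keeps folders distinctive)
        folders.append((label, sorted(members)))
        covered |= members
        if len(folders) == max_clusters:
            break
    return folders
```

Run on four snippets about Three Rivers Stadium with the query terms excluded, this forms an "implosion" folder and a "pirates" folder: labels that come from the results themselves rather than from anyone's taxonomy.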
Except in rigid examples like eCommerce search, I actually think that taxonomies don't help end users much. I don't hear much about taxonomies anymore, but when they were more in fashion, I used the example of searching for Three Rivers Stadium, Pittsburgh's old combined baseball/football stadium, which was imploded and replaced with two spanking new sports fields.
Search results reporting on the implosion would turn up in a Construction Industry folder in Northern Light, whereas clustering would form a crisp, informative folder called Implosion, which would not be found in anybody's world taxonomy. The same query on our system put the "implosion" folder with pertinent results in our clustering display.
There's a huge push for mobile search. What's your view of this type of search?
Six months ago, Raul told me, "You are one of those guys who won't carry around a mobile phone." I didn't carry any mobile device. That makes me pretty unusual for a computer scientist, I guess.
The few times I tried to use PDAs in the past, they always ended up back in their box after a few days of fiddling with them. Today, I am Vivisimo's smartphone addict, because I now keep in touch even while vacationing in a tropical paradise far from Pittsburgh's winter.
What made a difference this time?
Three big improvements in technology convinced me to change from a mobile Luddite to a power user.
First, I now have a multipurpose mobile device; in fact, my Motorola Q is even lighter and slimmer than my previous voice-only phone. Second was a faster, reliable connection, and third was a usable Web browser. Very quickly I realized that I could use my smartphone for 90 percent of the tasks I previously did on my laptop, while carrying less than five percent of the weight.
While becoming a heavy smartphone user, I also quickly became a frustrated user.
With a smartphone, you are so close, but still so far from full office productivity. For example, finding an old email is either impossible (if older than a few weeks) or requires scrolling down a huge list. Reading Microsoft Office document attachments is a zooming-and-panning thumb workout. Accessing our hosted CRM solution required downloading additional software with no improved usability. And finally, accessing most office resources such as fileshares, knowledge bases, and wikis was just impossible.
As low-power devices with great always-on connectivity, smartphones are the perfect thin client. However, many mobile applications are still built on the old synchronization model, which was designed for the poor connectivity of PDAs. As a result, emails, contacts, and attachments need to be downloaded to the device rather than fetched dynamically from a server on demand.
What's the fix?
The simplest, easiest, and fastest solution to these problems is search. Beyond finding information, mobile search takes care of three important factors that make it an ideal mobile gateway: connectivity, security, and presentation.
The search engine takes care of connecting to each repository, grabbing the data and the security framework, and converting the data to a text format (HTML/XML) that can be viewed on simple devices. Now, instead of having to build a mobile application on top of each repository, we think you can just leverage the search infrastructure to make all these repositories securely accessible through the search application.
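The gateway pattern described above can be sketched in three small steps: each connector fetches documents from its repository along with that repository's own notion of permissions, the permissions are mapped to one common vocabulary of users, and each readable hit is rendered as lightweight HTML a small screen can display. This is a read-only illustration under stated assumptions; the two connector classes, their permission formats, and the 200-character snippet limit are hypothetical.

```python
import html
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    body: str
    principals: frozenset  # who may read it, in one common vocabulary


class FileShareConnector:
    """Hypothetical file share whose ACL entries are 'DOMAIN\\user' strings."""

    def __init__(self, files):
        self.files = files  # list of (title, body, acl_entries)

    def fetch(self):
        for title, body, acl in self.files:
            # Map CORP\alice to alice so every repository uses the same names.
            users = frozenset(entry.split("\\")[-1].lower() for entry in acl)
            yield Hit(title, body, users)


class WikiConnector:
    """Hypothetical wiki whose pages list readers as a comma-separated string."""

    def __init__(self, pages):
        self.pages = pages  # list of (title, body, "alice,bob")

    def fetch(self):
        for title, body, readers in self.pages:
            users = frozenset(r.strip().lower() for r in readers.split(",") if r.strip())
            yield Hit(title, body, users)


def mobile_search(connectors, query, user, snippet_len=200):
    """Return small HTML fragments for hits the user may read (read-only)."""
    query = query.lower()
    fragments = []
    for connector in connectors:
        for hit in connector.fetch():
            if user.lower() not in hit.principals:
                continue  # security is enforced before anything is rendered
            if query in hit.title.lower() or query in hit.body.lower():
                snippet = hit.body[:snippet_len]
                fragments.append(
                    f"<h3>{html.escape(hit.title)}</h3><p>{html.escape(snippet)}</p>")
    return fragments
```

Adding a new repository means writing one more connector, not one more mobile application; the device only ever receives escaped HTML fragments.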
This sounds too good to be easy, doesn't it?
Well, there is a catch. Search only provides a universal “read” gateway; it does not reproduce all the functionality of the underlying applications, just the “access” part. This limitation is a necessary trade-off: it limits the complexity of the integration, keeps the search workflow very simple, and allows search to be ubiquitous. On a smartphone, given the limited input capabilities, the ability to “write” is often not critical.
Will you be entering the mobile search sector? If so, what will you be doing?
We made two announcements regarding mobile search in 2007. First, we introduced a mobile version of Velocity that extends enterprise search to all mobile devices. Second, we introduced mobile search on our consumer demonstration site, Clusty.
The interest in search continues to ratchet upward. What do you see as the major trends in search for the enterprise over the next 12 months?
I think there are several trends to watch.
First is what I call "universal search." This is the ability to search, from a single point, all important information for all employees in a large organization: seamlessly handling a large number of repositories and many different security frameworks, and adapting to different user profiles.
Second is collaboration. This means giving users the means not only to find information via search but also to share knowledge easily with each other and collaborate directly in the search interface. Search is the only application that has the ability to integrate and present content from all applications through a single interface, so it is a natural collaboration point.
Finally, personalization. This means giving users more say in, and more control over, how the search functions (relevance, data sources, metadata, alerts and feeds, and dashboards, among other features). This is a continuation of our Enterprise 2.0 initiative, announced in late 2007.
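A minimal sketch of user-controlled relevance, assuming the engine already returns a base score per result: the user's profile supplies per-source weights and a set of muted sources, and results are re-ranked on top of the base score. The `Profile` fields are hypothetical and are not a description of Velocity's personalization settings.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    source_weights: dict = field(default_factory=dict)  # e.g. {"wiki": 2.0}
    muted_sources: set = field(default_factory=set)


def personalize(results, profile):
    """Re-rank (doc_id, source, base_score) tuples using the user's own settings."""
    reranked = []
    for doc_id, source, score in results:
        if source in profile.muted_sources:
            continue  # the user chose to drop this data source entirely
        weight = profile.source_weights.get(source, 1.0)  # default: no change
        reranked.append((doc_id, source, score * weight))
    return sorted(reranked, key=lambda r: r[2], reverse=True)
```

Keeping the user's adjustments as a thin layer over the engine's base score means the administrator's relevance model stays intact while each user tunes the final ordering.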
There's quite a bit of confusion about "behind the firewall" search in the wake of Autonomy's buyout of Verity, the failure of Entopia, and Microsoft's acquisition of Fast. What's your take on the state of the market?
The key market implication is that, contrary to earlier claims by Microsoft, simpler enterprise search functionality like SharePoint's isn't good enough for the complex problems of the enterprise, especially cross-repository search that needs to harmonize disparate security models.
As for Vivisimo: we already competed with Fast. We don't think Microsoft is going to make Fast's product any better, at least not for a while. It may get them more customers from people who are very Microsoft-centric in their software purchases, but those weren't great sales prospects for us anyway.
In this storm of market turbulence, I think enterprise search customers will react by seeking simple predictability: software that works well and deploys quickly and reliably, with implementation expertise available from its designer and vendor.
I know you don't want to let out any secrets, but what's the next big thing from Vivisimo?
The next big things for Vivisimo will be along the lines of the trends that we highlighted above. Also, expect to see us announce wins in some of the largest corporations, demonstrating that universal search is now a reality in the most challenging environments where there are more than 100,000 users, dozens of repositories, and multiple security frameworks.
In Beyond Search, Vivisimo is one of a handful of companies I pegged as "up and coming vendors to watch". The company provides solutions that users embrace. The interface innovations rest upon a solid core of engineering. Unlike many search systems, Vivisimo delivers performance, scalability, and stability. These virtues complement the sophisticated on-the-fly text processing functions that make content access and exploration easy, fast, and enjoyable.
Stephen E. Arnold, March 24, 2008