An Interview with Runar Buvik
Runar Buvik came to my attention with his Open Test Search (Beta) service. You can find the service at http://www.opentestsearch.com/. Enter a query, select the search engines you wish to compare, and click Search. The results will display the hit lists from a Microsoft search system, SearchBlox, the Google Mini (no longer available from Google but representative of the GB-7007 and GB-9009 approach), Thunderstone, Constellio, mnoGoSearch, and the now retired IBM OmniFind Yahoo service. When I first looked at the service, I noticed a search system called Searchdaimon. I met with Mr. Buvik in London in the summer and then followed up with him on September 20, 2012.
I learned that Searchdaimon is an enterprise search system which is proprietary but designed to support open source systems. The interesting aspect of Searchdaimon is that the system is positioned as an appliance which can be run within VMware, Xen, and VirtualBox as a “virtual appliance.” The system also runs from the Amazon cloud. For those looking for a turn-key solution, Searchdaimon is available as an appliance. The system can be snapped into Microsoft environments. An optional connection to Microsoft Active Directory is available.
The full text of my interview with him appears below:
Thanks for taking the time to talk with me.
Glad to do it. I read the Search Wizards Speak series. It has a great deal of useful information which is not otherwise easy to find in one place.
What's the history of Searchdaimon's system?
Searchdaimon was a spinoff from the information retrieval community at the Norwegian University of Science and Technology (NTNU).
My co-founder Magnus Galåen and I started to do research into Internet search around 1998. We were both studying information retrieval at NTNU. In 2002 we met investors with an interest in information retrieval, Stian Rustad and Espen Øxnes. The idea was that we would commercialize the search technology we developed.
In 2003, we made our first product, Boitho.com.
What was Boitho’s focus?
That product was an Internet search engine. It opened many doors, and we partnered with ExactSeek.com to make an Internet search engine for the US market. We took our experience with scaling and promising technologies like virtualization and began to work on an enterprise search version of our system. Boitho.com focused on the Norwegian eCommerce sector. The ExactSeek product contains content about more than 500 million Web sites.
When did you become interested in information retrieval?
I first got interested in information retrieval and search technology when I was working on a Web portal project in the late 1990s. I was quite young, and it seemed to me that the opportunity in making information findable was quite large.
When I started my studies at NTNU, I discovered a large, highly regarded information retrieval community. Some of the founders of the original Fast Search & Transfer company were involved in the program or were professors. I studied in Trondheim and, as you may know, first Fast Search, then Google and Yahoo, set up research and development operations in Trondheim, which was becoming a magnet for advanced thinking about search and retrieval.
Where does your online comparison service fit in?
You know that in addition to working at Searchdaimon I also run a website called Open Test Search. I was looking at different systems, and it seemed to be a good use of my time to put up side-by-side comparisons of enterprise search engines. I gathered together the systems I had tested to make it easy for anyone to see how the systems perform.
Currently I have a setup with nine search engines. Each has indexed the same datasets. Users can then run the same query in all of them to compare the results side by side.
I have also made the software available. Anyone can download the scripts and code to set up their own comparison. I have heard that many companies looking for a search system use my comparison page to evaluate different search engines as part of a purchasing process.
What information challenges does the company seek to resolve for its clients?
Searchdaimon primarily makes a ready-to-run virtual search appliance called Searchdaimon ES.
“ES” means “enterprise search,” right?
Yes. The basic version delivers a comprehensive enterprise search solution without charge. The free version supports 10 users. For more users, we offer a very competitively priced license. The system, whether for a few users or thousands, provides site search and search in internal corporate data.
What are the main features of the system?
The functions are comparable to the features and functions available from HP Autonomy, Endeca, Exalead, and other aggressively marketed systems. For example, Searchdaimon offers filtering, sorting, content federation, search suggestions, spell checking of user queries, stemming and lemmatization, a graphic interface for the administrative services, logs, statistics, and the other components of a modern enterprise information retrieval system. The ES is a fully featured enterprise search solution that can index different content types scattered across multiple servers and storage devices. The system offers full text search to end users.
Yes, we offer documentation.
Do I need to register to try the free version?
No, we want to make it easy for those looking for a more effective search system to try Searchdaimon. To get a copy just navigate to http://www.searchdaimon.com/download/. That’s it.
What about your customers’ use of the system?
I can’t mention our customers’ names, but most of our customers index the content on the organization’s Intranet, on file shares, in both SQL and NoSQL databases, in email repositories, and in content management systems. The user navigates to a single search screen, enters a query, and the results list presents a federated or unified set of relevant documents.
Can I use Searchdaimon ES within other applications?
Yes, because we are fundamentally committed to open source software, we provide the documentation and access methods needed to integrate Searchdaimon into other applications. It is also easy to build a search-based application on top of the Searchdaimon system.
Do I need to use on premises hardware for a search based application or a large scale enterprise search system?
The customer can use either an on-premises or a cloud-based approach. We designed the system to make it easy to deploy Searchdaimon ES in many ways. Most of our customers run the system either as a virtual machine on their own VMware, Xen, or VirtualBox servers or in a cloud.
Searchdaimon has been investing in search, content processing, and text analysis for several years. What are the general areas of your research activities?
Traditionally most of Searchdaimon’s focus has been on advanced information retrieval techniques and handling large datasets. We made our own search technology in-house in the C programming language.
Later as our technology became more mature we spent a great deal of time making our technology easy to get started with. Now you can have a basic search system up and running in minutes. A customer can just start Searchdaimon ES from the Amazon cloud and index some Web sites.
I think our main emphasis has been ease of use. We are now moving into new areas, but I don’t want to talk about those yet.
What can your system acquire and index?
Out of the box, one can index Web sites, RSS feeds, SharePoint, Microsoft Exchange, Twitter, Zendesk, SuperOffice, WordPress, and most types of file shares and databases. It is also possible to push data to Searchdaimon ES over HTTP and to create your own data connectors using our framework directly from our Web-based administrator panel. In addition, one can use connectors from EntropySoft.
Many vendors argue that mashups and data fusion are "the way information retrieval will work going forward." How do you see this blend of search and user-accessible outputs?
I am a strong believer in having good data summaries in the search results. Searchdaimon ES will present structured data directly as a table among the search results. Often you'll see the information you need in the user interface, without having to open the data source itself. For example, for a hit in a SharePoint list of contacts, we build a virtual contact card.
Without divulging your firm's methods or clients, will you characterize a typical use case for your firm's search and retrieval capabilities?
A typical user would be a medium-size company in the knowledge industry. They would run our system either on their own VMware or Xen virtualization platform or in the Amazon cloud. In the future we believe more and more of our customers will be using cloud-based solutions. We expect to have an announcement about Microsoft Azure very shortly.
What are the benefits to a commercial organization or a government agency when working with your firm?
Searchdaimon is easy to get started with. It ships ready to run and doesn’t require consultants to get you started. We also have a price advantage over comparable systems.
But that doesn’t mean that our system isn’t as advanced as the competitors. The ES is very flexible, and can be used to do almost any search project, including Internet scale, if a customer needs that.
What does your firm provide to customers to help them deal with the volume problem?
Searchdaimon's background is Internet search, so we are used to handling large volumes of data. There is no practical limit to the number of Searchdaimon ES virtual servers that can be clustered together. Therefore, big data presents no particular problems.
How is indexing handled?
Searchdaimon ES does index updates in batches. This makes it cheap to handle large volumes, though there will be a delay of several seconds between adding new content and its appearing in the search results. The only exception is deletion, where you can delete data in real time.
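Mr. Buvik does not describe how Searchdaimon implements this, so what follows is only a hypothetical Python sketch of the general pattern he outlines: new documents are queued and become searchable only when a batch is committed, while deletions take effect at once through a set of "tombstones" checked at query time. The class and method names (BatchedIndex, add, delete, commit, search) are invented for illustration and are not Searchdaimon's API.

```python
from collections import defaultdict


class BatchedIndex:
    """Hypothetical toy inverted index, not Searchdaimon's code.

    Additions are batched (visible only after commit());
    deletions are immediate (tombstones filtered at query time).
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> searchable doc ids
        self.doc_terms = {}               # doc id -> terms currently indexed
        self.pending = {}                 # doc id -> text waiting for commit()
        self.deleted = set()              # tombstones applied in search()

    def add(self, doc_id, text):
        # Queue the document; it is not searchable until the next commit().
        self.pending[doc_id] = text

    def delete(self, doc_id):
        # Immediate: drop any queued version and tombstone the committed one.
        self.pending.pop(doc_id, None)
        self.deleted.add(doc_id)

    def commit(self):
        # Apply every queued addition in one batch.
        for doc_id, text in self.pending.items():
            # Remove postings left by a previously indexed version of the doc.
            for term in self.doc_terms.pop(doc_id, set()):
                self.postings[term].discard(doc_id)
            terms = set(text.lower().split())
            for term in terms:
                self.postings[term].add(doc_id)
            self.doc_terms[doc_id] = terms
            # A re-added document is no longer considered deleted.
            self.deleted.discard(doc_id)
        self.pending.clear()

    def search(self, term):
        # Committed hits minus anything deleted since the last commit.
        hits = self.postings.get(term.lower(), set())
        return sorted(hits - self.deleted)


idx = BatchedIndex()
idx.add(1, "quarterly sales report")
idx.add(2, "sales meeting notes")
print(idx.search("sales"))  # [] because the batch is not committed yet
idx.commit()
print(idx.search("sales"))  # [1, 2]
idx.delete(2)
print(idx.search("sales"))  # [1] because the delete is visible immediately
```

Batching amortizes the cost of rewriting index structures across many documents, which is the trade-off he mentions: cheap bulk indexing in exchange for a short delay before new content is searchable.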
There has been a surge in interest in putting "everything" in a repository and then manipulating the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can "disappear" or become unavailable. What's your view of the repository versus non repository approach to content processing?
I agree that putting everything into the repository sounds like a good idea. In addition to being able to serve documents that have disappeared for some reason, one could develop additional services, such as letting users see the revision history of files, and the repository could serve as a single access point for data. Mobile devices would also benefit greatly, because they may not be able to access the underlying data source, but a simple HTML-based search system works on every device. However, because of security issues and our need to keep our search engine as efficient as possible, Searchdaimon hasn’t yet made any attempt to build in a data warehouse.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business?
The ongoing adoption of cloud computing and virtualization has made it much easier to experiment with search technology. In the old days one had to physically receive our search appliance through the mail. Now you can launch it as a virtual machine in the Amazon cloud with a single click, and pay only $0.12/hr.
Data volume is now growing exponentially. We are routinely seeing users with mailboxes holding hundreds of thousands of emails, and search projects where they plan to collect “everything” for “all time”. The wide spread of advanced mobile devices is also exciting. Maybe we will soon see something like Apple’s Siri for internal enterprise data?
Where does a reader get more information about your firm?
Our Web site http://www.searchdaimon.com/ has all the details.
Searchdaimon is a company that offers a fully featured, price-competitive enterprise search system. With the consolidation of search vendors like Autonomy and Exalead, Searchdaimon is poised to provide a robust search system that can be used as a foundation for search-based applications. The firm’s support of cloud implementations is one of its distinguishing features. Worth a look.
Stephen E. Arnold, October 9, 2012
Sponsored by Augmentext