An Interview with Franz Kögl
At the Enterprise Search Summit in London, England, June 2012, I spoke with representatives of a German search and content processing company. IntraFind opened for business in 2000. What I found interesting was the firm’s strong commitment to open source search technology. I am working on a report about open source search, and IntraFind is one of the commercial firms involved in open source search with a track record that spans more than a decade.
I was able to arrange a series of follow-up conversations with one of the founders of the company, Franz Kögl. The full text of our interview and subsequent discussion appears below:
What's the history of your firm?
IntraFind was founded by Bernhard Messer and me in 2000 with one idea. We wanted to give internet users a system which would deliver simple and quick access to the information in an enterprise.
Early on, we decided to focus on Intranet and enterprise search. For more than a decade, we have had the vision of providing a system which can extract what we call “the hidden knowledge” from unstructured and structured textual information. We wanted to handle the many different types of content sources. IntraFind has now grown to 27 people, and more will follow soon. We also have well-known reference customers within the enterprise segment.
How do your clients perceive your solution?
Ah, good point. I think that from our clients’ view, IntraFind provides a solution which makes it really simple for every employee to access corporate knowledge in a quick and easy way.
An important point is that IntraFind means more than just search. We go, as you say, “beyond search.” After more than 11 years doing enterprise search, I still find the challenges exciting. We work closely with our customers to make clear the wide range of possible use cases enabled by IntraFind products.
About two years ago we started implementing solutions for metadata generation. Again, this is completely beyond search. Our clients have reacted very positively to the additional ways to make use of IntraFind software products in day-to-day work.
When did you become interested in text and content processing?
In 1999, Bernhard Messer, my founding partner, and I set out to provide consulting and professional services to organizations struggling with information access. As you know, the German language is very complex, e.g., full of irregularities and multiple-word terms. And so we started to develop.
Yes, I remember learning the word bildungsroman and reading Wilhelm Meisters Lehrjahre. Tough sledding for me.
Yes, for me too. The point is that stemmers do not deliver the quality we need for German. Therefore, we came up with the idea of developing a morphologically enhanced, semantics-based system which shifts the burden of complexity from the user to our system.
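To make the stemmer point concrete, here is a minimal sketch, not IntraFind's method. It contrasts a naive suffix-stripping stemmer with a lookup in a tiny, hypothetical lexicon that also splits compounds. The word list, suffix rules, and function names are assumptions made up for illustration.

```python
from typing import List, Optional

# Toy illustration only: the lexicon and rules below are hypothetical and
# far smaller than anything a production German analyzer would use.
LEXICON = {
    "haus": "haus", "häuser": "haus",
    "tür": "tür", "türen": "tür",
    "buch": "buch", "bücher": "buch",
    "schlüssel": "schlüssel",
}
SUFFIXES = ("en", "er", "e", "n", "s")
LINKERS = ("", "s", "n")  # optional linking elements between compound parts


def naive_stem(word: str) -> str:
    """Strip the first matching suffix; knows nothing about umlauts or compounds."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def decompose(word: str) -> Optional[List[str]]:
    """Split a word into lexicon lemmas, trying the longest leading part first."""
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        lemma = LEXICON.get(word[:end])
        if lemma is None:
            continue
        rest = word[end:]
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            tail = decompose(rest[len(linker):])
            if tail is not None:
                return [lemma] + tail
    return None  # no full split into known words exists
```

With this toy data, `naive_stem` maps "Haus" and "Häuser" to different strings, so a search for one misses the other. `decompose` reduces both to the same lemma and breaks "Haustürschlüssel" into its three parts.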
What are other areas of IntraFind’s research activities?
We actively fund research and development to constantly provide best-of-breed technology to our customers. And of course we also invest in research to support our comprehensive text analytics stack. The work underway includes semantic technologies, ontologies, machine learning, natural language processing, relationship and sentiment analysis, and entity extraction. We are also doing research in what we call “opinion mining,” which is an important part of our work with social content.
Do you have government support for your research?
Yes, some. For example, we are now working on a research project of the German Federal Ministry of Education and Research. We are identifying customer statements and opinions regarding specific products and services in content from the social Web.
Facebook and Twitter?
Yes, but other sources as well, particularly those used in Germany. In the project we are pre-processing the content, which often lacks the meaningful context that, for example, a Word document from a specific author in a particular company department has. We want to prepare the content so it can be further analyzed in a social customer relationship management solution.
Do you have any other research underway?
Yes, we are working to make scaling a large system easier for a system administrator responsible for an IntraFind implementation. Our field work has made clear that a wizard-driven administrative interface is important. We have an easy-to-deploy system, and we have recognized that scaling is an area in which system administrators need and want more automation.
Many search vendors are now in the business intelligence business. Some replace a results list with a report or a mash up of different content. Does your firm support these new-style interfaces for those using an IntraFind system?
Yes, we have many clients who want more than a Google-style laundry list of results. We believe that the future is to improve the information access to unstructured data with structured information. At IntraFind, the system gives the user an "artificial" structure to support users' demands.
For example, we are putting significant effort into metadata extraction and classification. The idea is to allow users to have a more structured search experience. We also want to offer more interface options which allow users to click on a related item or topic and get more information without another trip to the search box.
Our TopicFinder and Named Entity Recognition technologies are being well received by our customers. With IntraFind, the licensee can implement a presentation of information or a search page which meets the needs of the organization and its users. The old-style results list can be displayed, but the newer interfaces are more engaging, make suggestions, and provide quick and easy access.
Without divulging your firm's methods or clients, will you characterize a typical use case for your firm's search and retrieval capabilities?
Our typical use cases range from simple full-text search with file share indexing within an enterprise to intelligent metadata extraction from various source systems, covering both unstructured and structured data. They range from small (10,000 items) to large scale (400 million items), and from simple user interfaces to knowledge cockpits within search-based applications and very specialized solutions which are tightly integrated into working processes. We are very strong at providing a single point of search across special data sources and at enriching those data with intelligent classification and extraction methods. A very successful business line of ours is providing our SDK for embedding in third-party products.
What are the benefits to a commercial organization or a government agency when working with your firm?
We have more than 12 years’ experience in information retrieval. Our team of professionals has high-value, deep expertise in the area of enterprise search and text mining. We have first-rate skills in the use of open source modules, including Lucene. Our team has improved the quality of the Lucene system. IntraFind provides customers a very stable, high quality and fully tested system.
IntraFind has a strong customer focus. We are able to provide standard software which can be installed within 15 minutes, but if there are customer-specific requirements we are also flexible enough to adapt the software to our customer’s individual demands.
I would say that IntraFind’s strong customer focus, its flexibility and deep know-how, in combination with our best-of-breed technology are the strongest arguments why companies would want to work with IntraFind.
How does an information retrieval engagement with your firm move through its life cycle?
Our approach is, “Think big, start small.” Many of our customers start with an application like an internal search. We then can extend the service piece by piece. For example, a new requirement comes from another department or business unit. It is pretty typical for us to start work in a single department. Then the client decides to use our iFinder as its corporate search solution.
What is special about our approach is that we combine computational linguistics with information retrieval. It's not the idea itself which is unique, but the way we have implemented it.
What does your firm provide to customers to help them deal with the “big data” problem that many organizations are experiencing?
Good question. Huge amounts of data are frequently discussed at client meetings. We can handle almost any volume of data. We have different methods to match specific client situations. If updating the index is a key consideration, we work with the client to make clear that the update time of the full-text index is largely dependent on the connected sources and the attendant system.
If the source system implements an "event" driven mechanism, as content management systems or file shares do, updates to indexed documents are processed within a few seconds. This is what we call "near real-time search".
For processing data from various sources we provide several technological approaches. The first approach is a classical crawling or poll approach in which the data are crawled from the source system and indexed. The poll approach also supports incremental updates of changed data within the source system. The crawling performance and frequency can be customized and adjusted according to the individual customer's needs and environment requirements.
Our second approach is the push approach, in which the source systems themselves trigger the indexing process whenever a source item is changed. Here we offer Web services and also unified XML interfaces to allow this kind of data transfer.
The third approach is event-based indexing, in which the source system needs to provide the ability to trigger events in case of changes. Our special connectors subscribe to those events and process any change on the fly. Using the push approach or the event-based approach, we can say that we provide instant updates to an index.
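The three approaches just described can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not IntraFind's connectors or API. The names `Index`, `Source`, `poll`, and `push` are hypothetical, and a real push interface would sit behind a Web service rather than a direct function call.

```python
from typing import Callable, Dict, List, Tuple


class Index:
    """Stand-in for a full-text index: doc id -> (version, text)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, str]] = {}

    def upsert(self, doc_id: str, version: int, text: str) -> bool:
        """Store the document if it is new or newer; report whether it changed."""
        current = self.docs.get(doc_id)
        if current is not None and current[0] >= version:
            return False
        self.docs[doc_id] = (version, text)
        return True


class Source:
    """Toy source system holding versioned documents."""

    def __init__(self) -> None:
        self.items: Dict[str, Tuple[int, str]] = {}
        self.listeners: List[Callable[[str, int, str], object]] = []

    def save(self, doc_id: str, text: str) -> None:
        version = self.items.get(doc_id, (0, ""))[0] + 1
        self.items[doc_id] = (version, text)
        # Event-based approach: subscribed connectors hear about the change at once.
        for listener in self.listeners:
            listener(doc_id, version, text)


def poll(source: Source, index: Index) -> int:
    """Poll approach: crawl every item, but re-index only what changed."""
    return sum(
        index.upsert(doc_id, version, text)
        for doc_id, (version, text) in source.items.items()
    )


def push(index: Index, doc_id: str, version: int, text: str) -> bool:
    """Push approach: the source system itself calls the indexer."""
    return index.upsert(doc_id, version, text)
```

In this sketch a second `poll` over an unchanged source indexes nothing, which is the incremental behavior described above. Registering `index.upsert` in `source.listeners` makes every later `save` reach the index immediately, with no crawl in between.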
Do you support personalization of information access within a work flow?
We offer a search entry point via a dashboard (part of our user interface), where users can see new documents matching their “query agents” or the topics they defined. This “self profiling” is assisted by text analytics. Personalization is reflected in such user-configurable settings as individual favorite repositories, the individual choice of widgets on the user interface (e.g., facets) and other per-user configuration of the UI, or the use of individual usage parameters to influence the relevance of search results.
What's your view of the repository versus non repository approach to content processing?
Actually, we do not see our software as a content repository. What we do is analyze data at indexing time and make them searchable for the user. Access to the original data is provided via a response from the connected system. We have only a few pieces of additional content in our search software; for example, user-generated content such as annotations.
What is your view of visualization in search systems?
We have run a number of experiments with different visualization techniques and determined that some kinds of visualization distract the user and so can reduce the user’s productivity. Maybe this will change in the future, but currently our clients expect something which is clear and easy to understand. Visualizations are quite useful in some marketing situations.
What’s your view of search enabled applications?
For sure, in the future every modern application will be “search enabled.” We offer a very intuitive software development kit. Licensees can use this to enrich enterprise applications with IntraFind technology.
Do you support mobile information access?
iPads, iPhones, and other mobile devices are great for accessing emails, surfing the internet and so on. Information access is something for which a mobile device is well suited. There are many apps which work brilliantly. We will have a mobile version of our user interface in the very near future. We want to provide these new devices with a more intuitive search experience.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business?
I think the technologies which permit content access from anywhere are going to be quite important. There will be more and more search-driven applications. Also, the use of virtual systems as well as increased data storage capability are going to trigger innovation.
What we have to do in the future is to ensure that IntraFind provides the same usability to each user regardless of the access device. We are preparing virtual appliances to make it very easy for administrators to install and run an enterprise search system. We want to make it easy for IntraFind to integrate in a seamless way with massive data systems.
Where does a reader get more information about your firm?
Please visit www.intrafind.com.
IntraFind is a company anchored in “next generation search.” Lucene provides the basic foundation for the company’s search solution, but IntraFind has also invested in text mining and entity extraction technology. As a result, the company offers a solution which moves beyond keyword retrieval. If an organization is looking for a modern search system which permits the development of search-based applications, IntraFind is worth a close look and a test drive.
Stephen E Arnold, July 10, 2012