An Interview with Paul Doscher
Open source search offers organizations and Web solutions an alternative to the proprietary commercial information retrieval systems which have dominated for decades. In the last 18 months, many of the high-profile, proprietary search solutions have been acquired by larger companies. The amount paid for some of these firms has been significant, even by the standards of Silicon Valley. Hewlett Packard paid about $10 billion for the IDOL technology and Autonomy’s wide ranging content processing products. Oracle, in the span of a few months, purchased Endeca, a company whose technology dates from the late 1990s, and InQuira, a natural language processing vendor focusing on the customer service market.
Lexmark acquired Isys Search Software and Brainware in an effort to provide its customers with ways to find information. The acquisition parade was kicked off with the Microsoft purchase of the Fast Search & Transfer assets in early 2008. Dassault Systèmes acquired Exalead and has shifted the information retrieval company’s focus, surprising some licensees with the changes. In short, an earthquake has rippled across the search-and-retrieval landscape.
What’s happened in the space of a few years is that a gap has opened in the market for search solutions which work and arrive without the handcuffs, price tags, and attitude of traditional enterprise search vendors. Open source search has been legitimized by IBM. After supporting the open source Eclipse initiative, IBM embraced Lucene and incorporated that open source technology into its expensive Content Analytics and Watson solutions. You remember Watson. The natural language processing system “won” the television game show Jeopardy! The IBM marketing machine left out the fact that TV game shows are “produced” and that the core technology was open source software. Those savvy in search realized that if IBM could build products on Lucene and Solr, the technology was ready for prime time.
Lucid Imagination is one of the leading vendors, if not the leading vendor, of open source search solutions. Since the firm was founded in 2008, the company has become a stalwart in the Lucene/Solr open source search community and the leading provider of open source enterprise search solutions. The company conducts training sessions, operates a dynamic series of conferences about open source search, and provides engineering and technical services, software, and support to clients worldwide.
The company’s new President and CEO is Paul Doscher, the former President and CEO of Jaspersoft, one of the most successful open source analytics companies, and the former CEO of Dassault Exalead, Inc., a vendor of a proprietary enterprise search system.
I spoke with Paul in Lucid Imagination’s new headquarters located near Oracle, the database giant, and Google, the online search and advertising system. The full text of my interview with him appears below:
Thanks for taking the time to speak with me. When did you join Lucid Imagination?
Thanks for giving me the opportunity to talk about Lucid Imagination. To answer your question, I joined Lucid in December 2011. I was the CEO of Dassault Exalead in the US, and I found Lucid Imagination’s open source business model an interesting opportunity. As you know, open source software is now recognized as big business.
Yes, Red Hat just posted $1.0 billion in revenue, right?
That’s right. IBM’s use of Lucene has been a wake up call to many of the enterprise search vendors. I saw an opportunity to respond to what I think is an important turning point in the enterprise software market in general and for search and findability in particular.
What's the history of Lucid Imagination?
Lucid Imagination was formed in November 2008 as a commercial enterprise to help sponsor the existing Apache Lucene/Solr open source community. The company wanted to broaden the community globally and increase the adoption of the technology within corporate information technology.
A combination of fostering the community and delivering value-added products and services?
Yes, the community is the key to the success of open source software. In a traditional proprietary enterprise search company, upgrades are planned. Bug fixes may be grouped. New versions may come along according to a timetable set by the vendor’s product roadmap. In open source, the community spots a problem and fixes it. If a new feature is needed, members of the community will create the service or perhaps two or three services. The solution which delivers will bubble to the top and everyone can take advantage of this process. Lucid is a believer in the value and efficiency of the open source model.
What information challenges does the company seek to resolve for its clients?
Like most enterprise search software, Lucid Imagination provides the opportunity for corporate IT to develop and build more valuable, interactive, query-based information applications incorporating a wide variety of content for far more effective decision making.
Can you give me an example?
You mentioned the conferences we sponsor. Each program is filled with examples of high-profile Lucid Imagination customers leveraging open source software. When a user taps into Netflix, the search system is based on Lucene/Solr. Cisco Systems, EMC, and eHarmony rely on open source search. Most Fortune 1000 companies are testing, using, and shifting systems to make use of open source search and content processing systems. In fact, the larger companies are emerging as among the most innovative users of open source search technology. A couple of academics studied why open source is surging in large companies. The reason, as I recall, is that these organizations have the most to gain by using open source search to improve operational efficiency. A large customer service department relying on open source search gets a solid solution without the costs and licensing restrictions imposed by vendors of proprietary software. Lock in and lock down are not part of the equation for organizations looking to reduce costs and improve efficiency.
Why open source search now?
Open source has been around for years, and the recent acquisitions you've already mentioned are creating concern as to the longevity of many proprietary search solutions. Open source has actually become the 'safe' choice. In addition, proprietary search solutions have run into increasing friction. The costs of deploying, optimizing, and maintaining a proprietary solution have continued to go up. Licensees of these systems now have to respond to new types and volumes of digital information. Many of the proprietary search solutions restrict how and what a licensee can do once the license agreement is signed. Today organizations need the flexibility to adapt and make changes. A proprietary solution may not permit the licensee to make enhancements. If a change is made, the proprietary search vendor may “own” the fix and will add that innovation to its core product. The licensee who created the fix gets nothing and may have had to pay for the right to innovate. As corporate information technology struggles to keep up with escalating business information demands and an ever growing mountain of content of all types, open source search provides a cost effective and efficient way to develop applications to address the challenges and opportunities in today’s enterprise.
When did you become interested in text and content processing?
I spent much of my early career in the database and business intelligence markets, where I learned the opportunity created by these technologies but also their limitations. Search technology, effectively deployed, can greatly expand the concept of information retrieval using natural language ad-hoc query capabilities, enabling the next wave of innovation to benefit decision makers in the enterprise. I followed that interest to Dassault Exalead. The idea of providing search-based applications was a good one, but the time, cost, and limitations of traditional licensing agreements blunt some of the payoff. I realized that my interest was not just in the technology; I was fascinated by how search could transform an organization's decision making. Adding open source’s community strikes me as a 21st century way to solve a fundamental problem for managers.
Lucid Imagination has been investing in open source search and text analysis for several years. What are the general areas of your research activities?
Lucid Imagination employs open source developers across a number of significant open source projects such as Lucene, Solr, Hadoop, Mahout, and TIKA, among others. We do this for the purpose of advancing the functionality of these projects so they are used by the broadest possible audience, from higher education to government to the largest and most complex enterprises. We are listening to our customers and trying to develop value-added solutions which meet needs which are often specific to a particular business problem. Our commercial enterprise search solution includes some components such as connectors to access proprietary file formats. We offer these specialized components on a fee basis while supporting the broad application for installations which do not have to tap into content locked in a Lotus Notes repository, for example. We license some of our connectors from a unit of Lexmark, the printer company. We have to assess a reasonable fee for these because we are obligated to compensate Lexmark for their software. Our research and development are driven by the problems our customers face. Customers, not engineers near my office, are in the best position to identify an issue in my experience.
What are you contributing to the community?
Our approach is to contribute 100 percent of the code we develop for the open source projects back to the community. For example, recently some of our key developers donated a major set of functionality to the Solr trunk to significantly advance the scalability of that platform.
Don’t other commercial companies using your Solr enhancements get a free ride?
We think that supporting the community is more important than what a particular entrepreneur or competitor will do with something Lucid’s engineers contribute. The key to Lucid’s success boils down to the quality of our people and our ability to respond to customer needs. We are adding value up and down the customer experience chain. That’s what makes Lucid thrive. Great software is a result of implementation, engineering, and support, not just bits and bytes.
Many vendors argue that mash ups and data fusion are "the way information retrieval will work going forward." How does Lucid Imagination perceive this blend of search and user-accessible outputs?
That’s a great observation. Mash ups are a large part of the Web and many commercial dashboards. Lucid Imagination provides the capability to contextually integrate content directly from the source systems. Our approach helps the user to avoid duplication and confusion. When information is presented out of context, the user has to be able to “see” and “access” the underlying information. Lucid’s approach puts the information in one place and the interface can be set up to meet the needs of a particular user or group of users. Using a connector framework and “fuzzy” logic matching algorithms, Lucid Imagination integrates structured and unstructured content into a highly scalable index. The index can be used as an application development platform to serve this integrated content into a wide variety of query based UIs that are easily consumed by decision makers. We have taken the search-enabled application and converted finding into a fundamental information access platform.
Without divulging your firm's methods or clients, will you characterize a typical use case for your firm's search and retrieval capabilities?
Lucid Imagination’s customers use our technology platform for an extremely wide range of applications. These include powering the world’s largest business to consumer Web sites (for example, Facebook and Twitter), internal and external business to business Web sites, customer support applications, log management, e-mail search and discovery, supply chain and logistics applications, and health care customer record management, to name just a few examples. Lucid Imagination’s Lucene/Solr technology is also used in many OEM applications as embedded search technology to provide better insight and analytics into the application content. Our approach to an engagement is to listen to what our customers need, prepare an action plan, and then deliver. In a sense, our approach is the type of involvement that many software companies have stepped away from. We have an enthusiastic group of engineers and professionals who work with clients to meet their needs.
What are the benefits to a commercial organization or a government agency when working with your firm?
I mentioned the benefits of open source search; specifically, zero license cost for unlimited development. The payoff, of course, is quicker time to value for the enterprise. Open source search delivers a lower overall total cost of ownership. There is no vendor lock-in plus the on-going, real-time improvements in functionality. The peer review, committers model where experts make sure a change has no adverse consequences is a huge plus. There have been some commercial proprietary search vendors who have shipped updates which have broken the existing search system’s indexes. The open source approach helps minimize these types of issues. If an organization wants to reduce costs and improve findability, open source search is a viable and attractive alternative to the old-style approach of proprietary software vendors.
One challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content "push", report generation, and personalization within a workflow?
LucidWorks Platform, our commercially licensed software, gives organizations the opportunity to develop multiple interfaces that can cue off of business rules for event notifications and "push" notifications, to develop business intelligence dashboards for more real-time query access, or even to use standardized queries for easy report generation and deployment.
What's your view of the repository versus non repository approach to open source search? What are the "hooks" between content processed by your system and commercial repositories or data warehouses?
Lucid Imagination delivers a search enabling platform. Some vendors want to put "everything" in a repository and then manipulate the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can "disappear" or become unavailable.
What connectors does Lucid provide?
We offer a full line up of content connectors. When an organization licenses the commercial enterprise solution, we provide a ready-to-connect system which can process more than 100 file types and most standard file systems. The connectors are a combination of software licensed from third-party specialists and work Lucid’s engineers have done to support specific, often unusual, types of enterprise content.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business?
I am not sure I am much of a wizard. We are tracking several developments and responding to these as our customers point out particular challenges. First, we see a growing trend in adopting big data platforms like Hadoop and other MapReduce-type data management systems and methods. These platforms mandate a highly scalable application development framework to support the next wave of information retrieval applications across a broad range of different content types. Interest in big data is moving from business intelligence into more general business activities. We are developing technology in this area that will dramatically facilitate the adoption of Big Data strategies, with major announcements within the next few weeks. Second, social media information is increasingly important, and corporations need to be more closely connected to the changing tide of consumer sentiment. Social content is now a major information stream and it is important that organizations can find, discover, and access the information in these high-volume streams. Lucid Imagination offers LucidWorks for the cloud, empowering organizations with the ability to integrate social media data and internal data in SaaS based applications where ease of development and acceleration of deployment provide tremendous value to business decision makers. LucidWorks for the cloud is also being used to power search within a growing number of B-2-C websites. Third, we see the continued advancement of mobile devices as the standard interface for business decision makers and the inevitable (my opinion) change from a “pull” based to a “push” based information retrieval model. Lucid provides a search-and-retrieval solution which can give those who require mobile search a framework for innovation. Traditional search is not a good fit for some mobile applications.
To a large extent the foundation for supporting these technology advancements is already a part of the base open source platform, but a deeper, richer set of features will need to be developed to address issues like new mobile/cloud based security protocols or the need for more and more real-time event driven analysis.
Where does a reader get more information about your firm?
Our Web site at www.lucidimagination.com provides contact details. I will be at the upcoming Lucid conference, LUCENE REVOLUTION in Boston, May 7-10. Look for me there.
In 2003, when I wrote the first edition of the Enterprise Search Report, open source search was not a viable option for most organizations. Today, there are a number of vendors providing an open source search solution. The vast majority of these are the inspired efforts of one or a small team of developers. Most of the open source search solutions are one-offs like the Basho Riak Search solution, tailored to a specific problem such as finding content in a MySQL database like Sphinx Search, a “one man band” like Constellio and ElasticSearch, or a government-funded effort to reduce costs and work around problems with commercial proprietary search solutions. Lucid Imagination is an established company with brick-and-mortar offices, solid financial foundations, and a staff of more than two dozen engineers and developers. The company’s business model is focused on contributing to the open source community and delivering value-added services and support to its commercial and government clients. If you are looking to enhance findability in your organization, ArnoldIT recommends that you take a close look at the open source search solution provided by Lucid Imagination.
Stephen E Arnold, April 16, 2012
Sponsored by Pandia.com