An Interview with Miles Kehoe
Miles Kehoe is one of the go-to experts in search and related technologies. In 2012, he joined LucidWorks as the head of the firm’s professional services business. LucidWorks is one of the fastest growing vendors in enterprise search. Unlike the search vendors of the early 2000s, LucidWorks uses open source Lucene and Solr to provide organizations with robust, high-performance information retrieval systems.
Open source search technology is having a significant impact on the enterprise search market. In addition, many start-ups are embracing open source search technology.
LucidWorks (formerly Lucid Imagination) has established itself as the Red Hat of open source search vendors. One of the principal reasons for LucidWorks’ rapid growth in the last year is its engineering depth.
Enterprise search has to be integrated into an organization’s existing software and systems. As a result, licensees of LucidWorks’ technology require access to experts who can handle the complex software challenges in organizations worldwide. LucidWorks supports on premises installations, cloud implementations, and hybrid deployments.
Mr. Kehoe spoke with me on January 25, 2013. The full text of our conversation appears below.
What's the 2013 positioning of LucidWorks?
I don’t think LucidWorks’ positioning has really changed in 2013. In terms of the technology, we continue to provide full support for Solr and Solr Cloud, a remarkable open source search technology. And we continue to improve our commercial offering, LucidWorks Search, for those companies that like the power and capabilities of Solr/Solr Cloud but want to have a more traditional packaged product with full support.
We have also introduced LucidWorks Big Data, a rich ecosystem of tightly integrated Apache projects that work together to dramatically alter the information access experience. With our big data offering, organizations don’t need to download packages from a number of sites and spend months trying to glue them together into a single, integrated solution.
One issue I’ve seen our customers struggle with is the confusion with the introduction of so many new ‘products’ based on Lucene/Solr.
Lucene is the low-level API that supports searching and retrieval of textual content. Solr utilizes the low-level Lucene APIs, but provides a number of higher-level ways to interface with search. Now that Apache has merged the Lucene and Solr projects, a number of organizations claim that their product is “based on Lucene/Solr” when in fact they utilize only the lower-level API and may have issues integrating with other technologies that support Solr.
Some of these projects based on the Lucene API implement really cool capabilities that were just not available in Solr 3.x; but now that Solr 4 (‘Solr Cloud’) is out, those exciting features are moot because more robust capabilities are just part of standard Solr Cloud.
What’s your background in information retrieval?
My introduction to text and content processing came in 1989 when I started working at Verity. I remember thinking of all the unique applications that great search could power; and I guess I just fell in love with the technology. It wasn’t until after we started SearchButton.com in 1998 that I realized how critical the business side of search was for users; and now I preach the importance of managing search to keep users happy.
LucidWorks has been investing in search, content processing, and text analysis for almost a decade.
My role at LucidWorks is to run the professional services organization. I can tell you that my former business partner at New Idea Engineering, Mark Bennett, now also here at LucidWorks, is looking at ways of improving our commercial offering, LucidWorks Search. As software evolves to “cross the chasm” from open source developers to busy Enterprise administrators, additional time-saving features become essential. These are features that wouldn’t seem important to coders, but are worth paying for if you’re managing a half dozen different components of enterprise software. You can expect to see our Enterprise UI evolve over the next year.
Many vendors argue that mash ups, unified information access and data fusion are "the way information retrieval will work going forward." I am referring to structured and unstructured information. How does LucidWorks perceive this blend of search and user-accessible outputs for 2013 and beyond?
Let’s say, for example, you want to target mobile users. The trend there is to embed search into each application. And if a company is going to serve all of its users in one project, they might decide to do an initial mobile-friendly HTML5 application, vs. specific versions for iOS, Android, etc. So in that case they’d leverage Solr’s REST API and JSON support to weave search results into the application. In fact, Solr could be used to drive most of the screens of the application, including “browsing” activities that you’d traditionally think a database would be needed for; you don’t have to be searching for text in order to use Solr. Solr 4 even supports certain types of join functionality.
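As a rough illustration of the REST and JSON approach described above, the sketch below builds a Solr select URL that asks for JSON output and pulls document titles out of a response. The host, the collection name `products`, and the `title` field are hypothetical, and the sample response is hand-written in the general shape of Solr's JSON output rather than captured from a real server.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, rows=10):
    """Build a Solr select URL that requests JSON output (wt=json)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def extract_titles(response_text):
    """Return the (hypothetical) 'title' field of each returned document."""
    payload = json.loads(response_text)
    return [doc.get("title", "") for doc in payload["response"]["docs"]]


# Hand-written sample in the general shape of a Solr JSON response (assumption).
SAMPLE_RESPONSE = json.dumps({
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "1", "title": "Red shoes"},
            {"id": "2", "title": "Blue shoes"},
        ],
    },
})

if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "products", "title:shoes"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A mobile HTML5 front end would issue the same kind of request from the browser or from a thin server layer and render the `docs` list directly.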
Without divulging your firm's methods or clients, will you characterize a typical use case for your firm's search and retrieval engagements?
Our customers are typically organizations that see search as a key competitive advantage, who have big problems finding content, and who want the security of owning their own future. That said, I’d categorize our customers into one of two camps.
The first are organizations where search is a key element of their product or service. Consider giant eCommerce sites with millions of unique products and SKUs; or information companies with huge data stores that deliver content to demanding users. For these customers, search has to be cost-effective, feature-rich, highly scalable and completely reliable – it just works. Solr runs some of the largest sites on the web; and some of the largest and most critical commercial and government agencies around the world. The pricing that commercial vendors typically charge for these large high-volume sites drives organizations to open source search and LucidWorks.
The second type of organizations that come to LucidWorks are the companies that use traditional ‘enterprise search’ in-house. They need to enable the information workers in their respective organizations to provide information to their employees. Think of any type of business that provides information and market intelligence to their employees for internal or external consumption – say a bank or a consulting firm. For them, open source and LucidWorks Search represent an insurance policy against obsolescence.
What are the benefits to a commercial organization or a government agency when working with your firm?
I’d say that the main benefit organizations have in selecting LucidWorks for enterprise search is the safety of open source.
In the last decade, the ‘safe’ choices in search were companies like Verity, UltraSeek and FAST. Mergers and acquisitions of the last 10 years, along with the forced obsolescence and expensive conversions that followed, have shown that no software platform is a truly safe choice.
At the same time, the Apache Project has worked steadily to improve and extend Solr to the point where it meets or exceeds the power and scalability of virtually all commercial platforms. The only advantage that commercial vendors enjoyed was documentation and support. LucidWorks provides Solr support and documentation, as well as a feature-rich enterprise search platform. And as I mentioned earlier, our customers can get the platform and support they need, without being locked in.
Finally, because we employ many of the folks who wrote, support, and enhance Lucene and Solr, we can help facilitate many of the enhancements to those core projects over time because of the feedback from our customers.
What differentiates your firm’s approach from other firms’ methods?
Just as we have different products and services for our customers, we can customize our engagements to meet our customers’ needs.
Some of our customers want to have deep product expertise in-house, and with training, best practice and advisory consulting, and operations/production consulting, we help them come up to speed. We also provide ongoing technical and production support for mission critical applications – just last month an eCommerce site ran into production problems on the Friday afternoon before Christmas. We were able to help them out and have them at full capacity before dinner.
Not to dwell on it, but what sets LucidWorks apart is the people. We employ a large number of the team that created and enhances Lucene and Solr, including Grant Ingersoll, Steve Rowe and Yonik Seeley. We also have significant expertise on the business side. At the top, Paul Doscher grew Exalead from an unknown firm into a major enterprise search player over just a few years; my former business partner Mark Bennett and I have built up deep understanding of search since our Verity days in the early 1990s.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the Big Data problem?
Big data is really complex, and I don’t think any single product – open source or commercial – provides everything that you need. You need a data repository capable of storing lots of documents that take up lots of space. You need great search – the primary interface to big data. And you need analytics and reporting to understand what the data is telling you.
Our LucidWorks Big Data product comes bundled with everything you need, integrated in a single package: LucidWorks Search, Hadoop, HBase, Mahout, Pig, and more. You download our offering on a virtual machine and you’re up and running.
And real time search?
Regarding ‘near real time’ search, I have to admit that I think Verity had the best solution to this back in 1989. On the other hand, since that is no longer available – in fact not even mentioned in other commercial search platforms – I can tell you that LucidWorks Search supports very granular incremental updates based on Solr’s ‘near real time’ functionality. We can set updates to happen based on time (in milliseconds) or on the number of documents submitted for indexing.
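To make the time-based option concrete, here is a minimal sketch of how an indexing client might ask plain Solr to make new documents searchable within a given number of milliseconds, using Solr's `commitWithin` update parameter. It only prepares the request and does not send it. The host, the collection name `products`, and the document fields are hypothetical, and how LucidWorks Search exposes these settings is not covered in the interview.

```python
import json
from urllib.parse import urlencode


def build_update_request(base_url, collection, docs, commit_within_ms):
    """Prepare (but do not send) a JSON update request for Solr.

    commitWithin asks Solr to commit the submitted documents within the
    given number of milliseconds. Document-count thresholds are typically
    set server side (autoCommit maxDocs in solrconfig.xml), not per request.
    """
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url}/{collection}/update?{query}"
    headers = {"Content-Type": "application/json"}
    body = json.dumps(docs)
    return url, headers, body


if __name__ == "__main__":
    # Hypothetical documents; field names are illustrative only.
    docs = [{"id": "42", "title": "Green shoes"}]
    url, headers, body = build_update_request(
        "http://localhost:8983/solr", "products", docs, commit_within_ms=500
    )
    print(url)
    print(headers)
    print(body)
```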
What does your firm offer licensees to address this issue of content "push", report generation, and personalization within a work flow?
It’s been said that really great search has to provide the right content to the right user at the right time. Verity did this well back in 1989; and this is an area where LucidWorks Search excels as well. Users can select any query and either save it for future manual execution, or request periodic email notification of new content. The built-in options are hourly, daily, or weekly, but these can be configured programmatically for finer control.
There has been a surge in interest in putting "everything" in a repository and then manipulating the indexes to the information in the repository. What are the "hooks" or connectors between sources, the content processed by your system, and commercial repositories or data warehouses?
Well, I’m a believer in including as much of your content as possible into a single search index. To some extent, we can thank Google for teaching users that all content is available from the search box; and there are so many issues with federated search that I really think it’s a best practice to index it all in one place.
That said, a good enterprise search solution has to be able to access all of the content, no matter where it lives and no matter what its format.
LucidWorks Search provides a framework to create custom connectors. But more importantly, we support the Google Connection Manager architecture so our customers can utilize any of the connectors they may find for the Google Search Appliance.
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What's your firm's approach to presenting "outputs" for end user reuse or for mobile access?
LucidWorks believes in delivering the output format that our users want, rather than limiting them to a single option. Our standard product can deliver content in JSON, XML and many other formats.
In addition, LucidWorks Big Data includes extensive visualization and analytics tools for the deeper analysis needs of big data.
I am on the fence about the merging of retrieval within other applications. What's your take on the "new" method which some people describe as "search enabled applications"?
I tend to like ‘search based applications’ when they serve a vertical need within the organization. For example, dashboards for financial or intelligence analysts can convey a visual summary of activity across a number of parameters – as long as they have the ability to ‘drill down’ easily. On the other hand, for content that is to be generally available in the organization, I prefer to see the ‘über index’ approach where everything is indexed into a common collection.
Solr, LucidWorks Search, and LucidWorks Big Data run on premises; in virtual environments, either on premises or at a hosting facility; or on Amazon AWS and Microsoft Azure.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are "cut off" from access to more robust systems. How does your firm see the computing world over the next 12 to 18 months?
One of the nice things about LucidWorks Search is that we are completely device agnostic. If you want to support mobile clients, you’re free to use HTML5, Java/JScript, or any tool of your choosing to create an interface – or a dedicated search based application. Whatever you do, make sure you consider the user and the device. I hate looking at web pages created for my large screen desktop monitor on my Android ‘Chrome’ browser – I have enough trouble reading regular printouts, much less microscopic fonts!
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business? How will your company respond?
Predicting future technologies is above my pay grade, but I think some critical directions the search market needs to address include:
- Improved security: Especially with respect to distributed security and trust models like those used in Microsoft Office 365; and single-sign-on type security across enterprise repositories. Security is not yet ‘done right’.
- Non-textual document formats: An increasing amount of corporate content will come in formats we’ve pretty much ignored thus far. Think of video, audio, and image data. The first two we could probably integrate today; I’m not sure if we yet have the technology to recognize objects and people in photos – at least, not outside of security agencies in Washington DC!
- Better search management tools: I’m a firm believer that, beyond a certain functionality, the difference between average search and great search is a function of how well the search is managed. Yet very few search tools provide adequate tools to track, tune, and enhance enterprise search results. I think we’ll see ‘Search Managers Toolkit’ and other tool suites within a few years.
- As a bonus, I’ll throw in another prediction: better content mark-up tools. I think we’ll see more widespread use of content and query tools like Search Technology’s Aspire, Microsoft’s SP 2013 search query tool, and Pingar’s entity extraction and taxonomy tools.
How is your firm different from and better than the open source search solution from ElasticSearch?
You know I come from a ‘platform neutral’ background, and I know many of the folks involved with ElasticSearch. Their product addresses many of the shortcomings in Solr 3.x, and a year or two ago that would have been a coup.
But now, Solr 4 completely addresses those shortcomings, and then some, with SolrCloud and ZooKeeper. ES says it doesn’t require a pesky ‘schema’ to define fields; and when you’re playing with a product for the first time, that is kind of nice. On the other hand, folks I know who have attempted production projects with ES tell me there’s no way you want to go into production without a schema.
Apache Lucene and Solr enjoy a much larger community of developers. If you check the Wikipedia page, you’ll see that Lucene and Solr both list the Apache Software Foundation as the developer; ElasticSearch lists a single developer, who, it turns out, has made the vast majority of updates to date. While it is based on Apache Lucene, ElasticSearch is not an Apache project.
Both products support RESTful API usage, but ElasticSearch requires all transactions to use JSON. Solr supports JSON as well, but goes beyond to support transactions in many formats including XML, Java, PHP, CSV and Python. This lets you write applications to interact with Solr in any language and with any protocol you want to use.
But the most noticeable difference is that Solr has an awesome Web Based Admin UI; ES doesn’t. If you’re only writing code, you might not care, but the second a project is handed over to an Admin group they’re bound to notice! It makes me smile every time somebody says ES and “ease of use” in the same sentence – you remember the MS DOS prompt back in 1990? Although early adopters enjoyed that “simplicity”, business people preferred mouse-based systems like the Mac and Windows. We’re seeing this play out all over again – busy IT people want an admin UI – they don’t want to spend all day at what amounts to a “web command line”, stitching together URLs and JSON commands.
Where does a reader get more information about your firm?
The best way to contact us is via our web site (www.lucidworks.com) or by calling us at 650-353-4057. I’d also invite any of your readers to contact me directly at the above number or on LinkedIn (http://www.linkedin.com/in/mileskehoe).
In my analysis of open source search for IDC, I rated LucidWorks as one of the leading vendors in enterprise search. Other firms with open source components have not yet achieved the technical critical mass of LucidWorks. Proprietary search vendors are integrating open source search technology into their systems in an effort to reduce their technology costs. At this time, LucidWorks is one of the leading vendors of enterprise and Web-centric search.
Stephen E. Arnold, January 29, 2013