LucidWorks

An Interview with Mark Bennett

Mark Bennett of LucidWorks

Mark Bennett, formerly a partner at New Idea Engineering, joined LucidWorks in December 2012. LucidWorks' accelerating growth has attracted top talent in search, analytics, and adjacent disciplines in the last year. Mr. Bennett has more than 20 years’ enterprise search experience. He has deep knowledge across the major commercial search platforms. Mr. Bennett joined the LucidWorks engineering team, where his seniority and talent are being applied to core engineering projects to further product innovation.

On March 1, I spoke with Mr. Bennett about the trajectory of search in 2013. The full text of my discussion with him appears below:

What attracted you to the information retrieval sector?

I arrived via databases and migrated to search technology.

Would you stay where there were three inches of snow on the ground in February? Let me phrase the question another way. What’s your background?

When I was a kid I had trouble reading and finding things at the library. It was a very annoying place, but worth it if you found the right thing.

Was this the finding tools like the card catalog?

No, it was vision related. When text started being put in electronic form and you could search for things quickly, I was hooked! Search technology was like having super powers.

I had also heard a story about these new "search engines" playing a key role in the development of high-temperature superconductors back in those days. I recall being impressed with the idea that a computer could help researchers so dramatically.

Did you work for a search vendor?

Yes, I started working at Verity maybe 20 or 25 years ago. That work just cemented me into search and all the related disciplines. I still retain my interest in physics, but I'm enjoying my work in the information, Big Data, and analytics market.

Do you like math?

Yes, I love mathematics. I've always liked SQL and other data management technologies. I am particularly fond of parsers. Each of these interests fits quite well with my work in search here at LucidWorks.

In the course of your career, which online search systems have you worked with? What are the major trends you have discerned in the course of that work?

As you know, Verity was one of the pioneers in enterprise and large-scale information retrieval back in the 1990s. My work there was an excellent foundation in the challenges digital information access continues to present to users and organizations.

When I joined, Verity was the “Google” of its day. Our technology was at least a decade ahead of its competitors. Verity had hyperlinks five years before the World Wide Web even existed. Verity real-time could watch data feeds and automatically detect and route important messages. Verity topics were a very powerful taxonomy tool. This was all so cool.

What waves of change are you tracking?

One change I have noticed is that things are being rediscovered and re-hyped, in both business and tech. At Verity we were doing new things; for example, indexing local content, implementing a security token system to ensure appropriate access, and coping with network technology which was changing quickly.

Also, there is a renewed interest in real time search these days, which is the third time around by my count.

From my point of view, the trick is to learn from the previous cycles and then to get out in front the next time it starts coming around. If we're going to reinvent the wheel, then let's at least get to the round version a bit quicker this time.

One thing we're reliving right now is the split between "developer packaging" and "enterprise packaging". They're both completely legitimate concerns, but at opposite ends of the spectrum.

Can you elaborate?

In a nutshell, search, analytics, and content processing vendors have to recognize that what is needed to allow developers to use the product is different from what is required to sell the product and deliver software which users embrace. Think of Dilbert and his pointy-haired boss: they both care about software, but in very different ways.

Are you working on this at LucidWorks?

Yes, and I hope to help LucidWorks deliver to both parts.

Where does open source fit in?

That’s a good question. One notion that some commercial vendors don't get, and it's one of the reasons that open source is still doing well, is that building search systems usually focuses on what I call “atypical” use cases.

What’s that mean?

If search is a secondary concern for you, and you had pretty generic requirements, then you either downloaded a free search app or bought a Google appliance and were done with it. The challenge that keeps search specialists engaged is the problem of dealing with outliers--bizarre business requirements that every project seems to unearth. Outliers are the new norm.

Search is not a “one size fits all” solution?

Right, users have quite particular and individualized needs. The new norm requires search to meet these fine-grained information retrieval requirements. Where is your content coming from, what format, what quality? And then who'll be doing the searches, and what will they do with the results?

When I think about this shift from an “appliance” approach to a personalized approach, it is not completely unexpected.

So while some engines drop features that "only three percent of people will ever use", other groups realize that it's the tools that matter, assembling applications quickly, tailoring outputs to a work task or user, and other actions which are quite different from the “one size fits all” approach you mentioned.

Let’s jump back to open source software. Where does open source search and content processing enter the information retrieval picture?

Open source sometimes provides a viable alternative to proprietary enterprise search systems. Open source software can be viewed as an extended, iterative, public conversation about exactly what search should be, but people vote with code contributions instead of dollars.

What’s this mean for proprietary search solutions?

Some organizations will use open source because management recognizes its efficiencies. Other organizations will embrace open source because a vendor like LucidWorks offers 24x7 support and has world-class engineers available to customize the system. The feature set is different as well: enterprise buyers care about analytics and data quality, and would prefer a graphical UI. Still other organizations will stick with what they have traditionally licensed year after year, indifferent to whether what’s in an IBM solution is open source or whether a product is totally proprietary, like Oracle Endeca or Oracle InQuira. There are justifications for doing that as well.

The costs are the key point, right?

In some cases. License fees, to some degree. The costs associated with obtaining specialized software which only works with a proprietary system. Engineering support is another factor. But I've always counseled clients that the main benefit of open source is flexibility and control. You're not guaranteed to save money on a project-level basis.

What makes proprietary search so expensive?

That’s a good question. First, let’s take a typical example. If a company has its data in XML, JSON, or some handy database, that company sidesteps content acquisition costs, content normalization hassles to some degree, and the expense of the computational load. Any of these can become an expensive proposition. Open source software can reduce some of these costs because the community has produced tools and because the methods are documented.

But the reality in the typical enterprise is so different from what "should" be the case; for example, some companies still need to use spiders and scrapers for their own content behind their own firewall. Commercial vendors’ fees have gone up in order to recover the costs of connectors and specialized systems to deal with disparate content. My view is that open source should be the baseline. Then available resources can be applied only when the open source solutions are not able to meet the organization’s specific needs, or when a commercial product can really save some time, or provide more BI, etc.

What’s a proprietary vendor do?

Keep buying other proprietary vendors I guess! By necessity the commercial companies have to figure this out. Acquisitions allow vendors to offer a complete information management and retrieval solution along with integration and support services. This is still very attractive to some large customers.

To be fair, there are issues which open source vendors have to consider as well. Open source often comes with the label “Some assembly required.”

What are the principal differences between the old-style proprietary approach of a STAIRS III or BRS Search and a system based on Lucene/Solr?

Let me boil down my view. We are now getting a lot of smart people to think about how to improve information retrieval. Because of the broad, dynamic discussion, my view is that the Boolean style solutions are fitting into more robust blends of technology. Today we have natural language processing, facets, and smart software which can winnow results and present them in a more easily apprehendable form than a long laundry list of hits.

I talked to a vendor 10 years ago about a particularly tough search problem. He ticked off a half dozen reasons why it was really very hard to solve and not worth the effort. Years later the open source people visited this same problem, came up with a similar list, and then diligently worked through those items. LucidWorks, for instance, delivers facets, suggestions, advanced file storage, and high performance without the punishing costs of a proprietary solution.

Where do open source search solutions fit in today's enterprise?

Business problems and software can be thought of as an extended conversation. What are the issues, what does the data look like, etc? If the problem is well defined and there's a reasonably priced commercial software package that handles it, then it's a no-brainer.

My view is that organizations shouldn't go with open source just to save money. Open source makes sense when the problems are either not well defined, or are very complex. Flexibility and innovation become extremely important. Access to a range of code libraries becomes important. Community becomes important when seeking inputs from individuals who may have ideas about solving a tough problem. An organization’s in house team and consultants can then tackle problem solving without the handcuffs proprietary software locks on.

For example, some giant eCommerce sites have extremely demanding requirements and highly specialized business requirements that would curl your toes. If search is the front door to billions in sales, then the company has to make the system work. When the commercial vendors of proprietary search systems ruled the roost, it was actually their professional services organizations that had the biggest hand in making those sites work well, and of course they had software tools that they had created and presumably knew very well (not always the case).

In hindsight, I wonder if their success was really based on all of their patents and proprietary tech, or just that they had good tools and a stable of engineers on hand.

What does LucidWorks offer to those organizations looking for an open source search solution?

We offer a range of solutions, services, and support options. Each client is different and often has unique requirements.

If a LucidWorks licensee has most tasks well in hand, my colleagues and I at LucidWorks can just be available as needed. For some organizations, we are a security blanket for quick support.

If the licensee needs some technical design assistance or has not finalized the system architecture, then LucidWorks can provide the necessary professional services to handle these tasks.

There are also clients who require enterprise levels of administrative support and system monitoring.

Many of our licensees migrate through various states over the course of working with us. Again, it is an extended conversation as their business evolves. For those customers, we offer the ultimate "have it your way" model.

How does LucidWorks' commercial version of Lucene/Solr differ from the version available from Apache?

That’s an excellent question. LucidWorks' commercial version has extra user interface screens to speed through things that would otherwise be done manually via a script or a command line.

The administrative interface does more than save a coder time. Our user interface allows a company significantly more flexibility in staffing. The idea is that a person with basic technical skills can handle many tasks without having to reach for the manual or write a script.

Can you give me an example?

Sure. Business folks can boost and block documents by query, for example. Without our interface, the administrator would have to edit XML files on some server somewhere. Big companies need things that would bore the heck out of an open source coder, so I think both audiences would happily agree that this type of feature segmentation delivers a win/win. Lower costs, quicker set up, and less time fiddling with scripts and command line instructions.
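Editor's illustration: in open source Solr, boosting and blocking documents for a given query is commonly handled by the query elevation component, which reads rules from an XML file (often named elevate.xml). The Python sketch below shows roughly what hand-maintaining such a file involves. It is an assumption that this is the kind of XML editing Mr. Bennett has in mind, and it does not describe how the LucidWorks interface works internally. The function name, query text, and document IDs are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_elevation_xml(rules):
    """Build an elevate.xml-style document from a dict of rules.

    rules maps a query string to {"boost": [ids], "block": [ids]}.
    All names and IDs here are illustrative, not taken from a real system.
    """
    root = ET.Element("elevate")
    for query_text, actions in rules.items():
        query = ET.SubElement(root, "query", {"text": query_text})
        # Documents listed without "exclude" are pinned to the top of the results.
        for doc_id in actions.get("boost", []):
            ET.SubElement(query, "doc", {"id": doc_id})
        # Documents marked exclude="true" are removed from the results.
        for doc_id in actions.get("block", []):
            ET.SubElement(query, "doc", {"id": doc_id, "exclude": "true"})
    return ET.tostring(root, encoding="unicode")


# Hypothetical rule: for the query "return policy", promote one document and hide another.
example_rules = {"return policy": {"boost": ["doc-101"], "block": ["doc-999"]}}
elevation_xml = build_elevation_xml(example_rules)
print(elevation_xml)
```

Even in this small form, the administrator needs to know the file format, the document IDs, and where the file lives on the server, which is the kind of overhead an administrative screen is meant to remove.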

In the open source search sector, there are dozens of companies offering information retrieval solutions. Can you highlight the differences between LucidWorks and a company like ElasticSearch or Datastax?

I can try. The vendors you mention have undertaken laudable efforts, and many are based on Lucene. ElasticSearch is very focused on developer packaging and coders love that. Despite all the comparisons done lately, the target audiences for most open source solutions are very different.

If you spin up a copy of Solr you've got a very powerful Web user interface, and LucidWorks gives you even more of an administrative user interface. But when you fire up ElasticSearch, you've got a REST API. The system listens on a port in a quite attractive manner.

You've probably worked with people who have remained faithful to the Unix or DOS command prompt and still get quite a bit of work done.

We still do some work via the command line.

Yes, and I still work from the command prompt many times during the day, although for things like email and web browsing I use the GUI versions.

But when I watch a Windows or Mac power user for a day, and then watch a senior, bearded Unix command prompt guru—both get a lot of work done. My point is that each is a different type of power user.

I do wonder what happens when an ElasticSearch developer hands off an application to a busy information technology person or an operations team to manage. Either those new owners will need to know the "Web command line" (URL and JSON syntax) extremely well, or, if they don't, an administrative framework will be needed.

As you know, neither Solr nor LucidWorks requires extras to be built. Of course, just as a modern Unix bash shell prompt is an improvement over an old Bourne command prompt, open source Solr could use a bit of "developer" polishing. Solr's XML and Jetty-based stack looked pretty cool back in the 2000s, but feels a bit dated now.
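Editor's illustration: the "Web command line" amounts to composing a URL and a JSON body by hand. The Python sketch below builds a simple full-text match query for ElasticSearch's `_search` endpoint without sending it. The host, index name, field name, and helper function are hypothetical, chosen only to show the shape of the request.

```python
import json
from urllib.parse import quote


def build_search_request(host, index, field, text, size=10):
    """Return the (url, body) pair for a simple full-text match query.

    Nothing is sent over the network; this only assembles the request.
    """
    url = f"http://{host}/{quote(index)}/_search"
    body = json.dumps({"query": {"match": {field: text}}, "size": size})
    return url, body


# Hypothetical host, index, and field names.
url, body = build_search_request("localhost:9200", "articles", "title", "enterprise search")
print("POST", url)
print(body)
```

An operations team inheriting such an application would need to be comfortable reading and writing requests of this kind, or would need a management layer built on top of them.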

We have addressed some of these issues with LucidWorks. Other open source vendors are caught in an envelope which makes detailed technical knowledge a basic requirement for keeping a system humming along. LucidWorks is a step beyond most commercial proprietary search systems in my opinion because it serves both groups of users.

Doesn’t that echo the role Verity played a quarter century ago?

Yes, that’s one reason why I find LucidWorks’ solution to be in the right place at the right time with the right technology. I also landed here a couple of years earlier and am more vocal than I was back at Verity. And, of course, Verity was a success: they did an IPO and acquired other companies. I'm hoping for similar success here.

What are the principal challenges an organization may face when replacing a proprietary search system like SharePoint / Fast ESP search with a LucidWorks solution?

I'd like to see better SharePoint connector options, better security integration. You interviewed my colleague, Miles Kehoe, a couple of months ago, and I agree with him. There needs to be some WebParts developed, preferably in the open source Solr stack, to allow for modular search apps to be quickly assembled.

I'd like to invite anybody with code or interest in that area to contact me at mark.bennett at lucidworks dot com. I would remind people that there are really two things here: First, getting SharePoint content into Solr, and, second, allowing SharePoint users to search against Solr. Although many companies will do both, these are technically two different integration points.

What services does LucidWorks offer to streamline the transition from proprietary search to a more open approach?

We're quite fortunate with LucidWorks’ technology and human resources. Our professional services team has experience with many other search engines. There are different ways to get content into Solr, and various options for porting the user interface. Chances are we've worked with many of the pieces before and know how to crack tough problems quickly. If an issue is a first-time event, I am confident we can develop a solution.

Licensees of legacy systems have to decide which aspects of the old search application worked well and which didn't. No two conversions will ever be identical, but that's why we're here.

With search systems providing access to structured and unstructured content, what features and services does LucidWorks offer to make integration a more streamlined process?

We offer a bunch of options. My approach is to begin with understanding where the source data are stored. Then I want to know if the data are going to remain in that location or will the data migrate? If the content is in a database and standards compliant, the content acquisition could be little more than using Solr's DIH.

What’s that?

Solr’s DIH is the Data Import Handler.
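Editor's illustration: once a Data Import Handler has been configured for a database source, an import is typically started with an HTTP request to the handler. The Python sketch below only assembles such a URL. It assumes the handler is registered at the conventional `/dataimport` path; the Solr base URL, the core name `products`, and the helper function are hypothetical.

```python
from urllib.parse import urlencode


def dih_full_import_url(solr_base, core, clean=True, commit=True):
    """Build the URL that asks a Data Import Handler to run a full import.

    Assumes the handler is registered at /dataimport for the given core.
    """
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),    # remove existing documents before importing
        "commit": str(commit).lower(),  # commit the index when the import finishes
    }
    return f"{solr_base.rstrip('/')}/{core}/dataimport?{urlencode(params)}"


# Hypothetical Solr location and core name.
import_url = dih_full_import_url("http://localhost:8983/solr", "products")
print(import_url)
```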

What’s the process?

The licensee can do it, or our engineers can handle the data import. If the content acquisition has some twists and turns, we have the LucidWorks Big Data (LWBD) tool. The software goes way beyond LucidWorks Search (LWS) to process massive amounts of data and apply machine learning techniques to it.

Our CTO has a vision for moving beyond normal search applications, and I'm still coming up to speed on all of LucidWorks’ capabilities.

What is the principal advantage an organization enjoys when it works with LucidWorks?

That’s a good question. Our customers will be working with an organization that has many of the core committers in both Lucene and Solr, and a number of other engineers who have worked with a range of search, content processing, and analytics systems. I haven't been as impressed with a company’s technical team since I first started back at Verity.

We serve both developers and busy IT/operations professionals who have to keep software running.

Also, LucidWorks interoperates smoothly with other search systems and enterprise applications. We don't need to "own" every island of data or every single search app in a client’s organization. LucidWorks does not have that old-style mindset.

We are also comfortable with clients who need situational help via the phone or email. On the other hand, if a customer needs us to handle a complete project, we do that too. We're pretty flexible. There's quite a bit of good open source / do-it-yourself karma here.

I think some customers just like the security of having somebody they could call if they ever got in over their heads.

Put on your wizard hat. What are the major trends in search and retrieval that you are monitoring?

It’s hard to predict the future with certainty. I can highlight several issues which interest me and which may have an impact in the months ahead.

First, innovations like client-side-only search apps built with HTML5 are a big trend. However, I am concerned about the functionality available to a traditional programmer. The differences in browsers and the explosion of malware engineering trouble me a bit. I will adapt. It is possible that there will be some new security issues which could come along for the HTML5 ride.

Second, some of the applications will overlap with mobile apps. From an engineering point of view, programmers will adapt and best practices will evolve. However, the users may find themselves having to make decisions about which apps they need.

Third, Big Data will be transitioning from batch-oriented to real-time processing. There's a lot of work yet to be done there. And all these search indexes can be used for more than just search! Same thing for search logs.

Fourth, I think that there will be more refinement coming to what I call hybrid open source / commercial business models (or "developer packaging" versus "enterprise packaging"). Software may be on a track to becoming just one voice in a much larger, more overarching conversation among different constituents about what each is trying to accomplish.

Where can a reader get more information about LucidWorks?

The best way to contact us is via the LucidWorks Web site.

ArnoldIT Comment

In my analysis of open source search for IDC, I rated LucidWorks as one of the leading vendors in enterprise search. Other firms with open source components have not yet achieved the technical critical mass of LucidWorks. Proprietary search vendors are integrating open source search technology into their systems in an effort to reduce their technology costs. At this time, LucidWorks is one of the leading vendors of enterprise and Web-centric search.

Stephen E Arnold, March 4, 2013
