An Interview with Mark Brandon
A revolution in search and retrieval has taken place. Companies like Hewlett Packard, IBM, Oracle, and Microsoft offer search solutions. But these established companies rely on technology that may be 20, 30, even 40 years old.
In the real world, companies like Elasticsearch have built more modern, flexible systems using open source technology. The shift in enterprise search from the proprietary technology model to a more open approach has happened. Representative of the new wave of information retrieval solutions is Qbox.
Qbox offers a range of search-centric services using Elasticsearch, an open source search solution that has forced less well-funded vendors like LucidWorks to scramble and established vendors like HP Autonomy to do some fancy dancing.
Qbox, of course, is not the only enterprise search upstart, but the company embodies a different approach, which may put further pressure on enterprise search vendors scrambling to close deals that generate sufficient revenue to keep customer support lines open and deal with the myriad technical hurdles their technology presents. Qbox’s customers benefit from more modern technology and a business model built on meeting customer needs, not operating a system that extends and embraces the business practices of proprietary search technologies.
I spoke with Mark Brandon about his fast-growing search business. The full text of the interview appears below.
Thanks for taking the time to talk. What's the history of your company?
Qbox opened for business about one year ago. We provide Elasticsearch hosting, solutions, and support. You could think of us as a "Mongolab for Elasticsearch".
What does that mean for your customers?
In a nutshell, we offer solutions that work and deliver the benefits of open source technology in a cost-effective way. Customers are looking for search solutions that actually work.
When did you become interested in text and content processing?
Before the company was involved in search, I was personally involved in a project to create an application marketplace for cloud software. While that project went nowhere, I did learn a thing or two about what it takes to create a great search experience: performance, faceting, filtering, aggregations, and so on.
After a grueling experience learning how to build it, I decided that there was a problem to be solved. That was when I teamed up with Sloan Ahrens and we productized what I had learned.
What's your background?
I am a graduate of the University of Texas at Austin. I have an undergraduate degree in history and a master's degree in technology commercialization. For as long as I can remember, I have had a technical knack.
Particularly important for me was my experience as a middleware salesman for Oracle and hardware and software salesman for Dell. When we set up Qbox, my partner and I were confident that we could both build and sell a solution.
Can you describe how you developed your interest in information access?
That's a great question. During my first week at Oracle, I asked one of my colleagues if they could share with me the names of the middleware buyer contacts at my 50 or so named accounts. One colleague said, "certainly", and moments later an Excel spreadsheet popped into my inbox.
I was stunned. I asked him if he was aware that "Excel is a Microsoft technology and we are Oracle."
He said, "Yes, of course."
I responded, "Why don't you just share it with me in the CRM System?" (the CRM was, of course, Siebel, an Oracle product).
He chortled and said, "Nobody uses the CRM here."
My head exploded. I gathered my wits to reply back, "Let me get this straight. We make the CRM software and we sell it to others. Are you telling me we don't use it in-house?"
He shot back, "It's slow and unusable, so nobody uses it."
As it turned out, with around 10 million corporate clients and about 50 million individual names, if I had to filter for "just middleware buyers", "just at my accounts", "in the Northeast", I could literally go get a cup of coffee and come back before the query was finished.
If I added a fourth facet, forget it. The CRM system would crash. If it is that bad at one of the world's biggest software companies, how bad is it throughout the enterprise?
When did that experience take place?
Well, that happened in 2008. Ninety percent of the data that exists today didn't exist back in 2008. The problem is only getting worse. Qbox solves this type of everyday search problem.
Your company has embraced Elasticsearch. Why?
When our previous search product proved to be too cumbersome, we looked for an alternative to our initial system. We tested Elasticsearch and built a cluster of Elasticsearch servers. We could tell immediately that the Elasticsearch system was fast, stable, and customizable.
But aren't other search systems as good?
Maybe. But we love the technology because of its built-in distributed nature, and we felt like there was room for a hosted provider, just as Cloudant is for CouchDB, Mongolab and MongoHQ are for MongoDB, Redis Labs is for Redis, and so on.
Qbox is a strong advocate for Elasticsearch because we can tailor the system to customer requirements, confident the system makes information more findable for users.
What are the general areas of your research activities related to Elasticsearch technology?
Thanks. Another great question. Right now, we have developed quite an expertise for deployment across several cloud vendors, sizes, configurations, and data centers. We've also built a seamless way to scale. All of that is built into our service. Spinning up one node of Elasticsearch is not hard. It's when you have multiple nodes to manage that a managed service starts to make sense. We have innovations that give us an edge in many enterprise deployments. Our team will continue to add value with this type of functional innovation.
Many vendors argue that mash ups and data fusion are "the way information retrieval will work going forward." What's your view?
I would say that this has always been the way that information retrieval works. Take ETL, for example.
Yeah, sorry. That's Extract, Transform, and Load. ETL is itself a product category that helps end users gather disparate data streams, normalize them, and then make them searchable. Accessing content is an Elasticsearch strength.
What's your view of this blend of search and user-accessible outputs?
The technology is evolving in many different directions. As for making search more useful, you have features such as aggregations (facets of facets), percolation (essentially, search in reverse), and all sorts of machine learning incorporated to help users recognize patterns before they even know it is valuable.
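To make "facets of facets" concrete, here is a minimal sketch of a search request body that nests one terms aggregation inside another. The field names (`region`, `product_line`) and the bucket names (`by_outer`, `by_inner`) are hypothetical and chosen only for illustration; the sketch builds the JSON body and does not contact a cluster.

```python
import json


def nested_terms_aggregation(outer_field, inner_field, size=10):
    """Build a search body with one terms aggregation nested in another.

    Field names are supplied by the caller; nothing here is specific to
    any real index.
    """
    return {
        "size": 0,  # ask for aggregation buckets only, no document hits
        "aggs": {
            "by_outer": {
                "terms": {"field": outer_field, "size": size},
                # The inner aggregation is computed within each outer bucket.
                "aggs": {
                    "by_inner": {
                        "terms": {"field": inner_field, "size": size}
                    }
                },
            }
        },
    }


# Hypothetical fields: count documents per region, then per product line
# within each region.
body = nested_terms_aggregation("region", "product_line")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to an index's search endpoint; each outer bucket in the response then carries its own list of inner buckets.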
Is consumerization opening new opportunities or distracting people from more challenging applications of your firm's technology?
It's a little bit of both. On the lower end are some really snazzy front-end, self service solutions like Swiftype or Algolia that make it possible for a business user to configure simple search experiences like blog search.
On the higher end, even the most sophisticated search engineers would have a hard time utilizing all the capabilities of Elasticsearch. We like to think we are in the middle, ideal for users that have a more sophisticated need than blog search and appreciate managed services because they don't have their own search engineers.
Without divulging your firm's methods or clients, will you characterize a typical use case for your firm's information access capabilities?
Our clients' demands are varied. Sometimes even the clients aren't that open about explaining what they're trying to do. We are fans of the Seasonal Recipe App, an open source project that helps users discover recipes that go along with local food sources.
Recognizing the breadth of ingredients, the seasonality of their availability, and then incorporating the geo-search functions is extraordinarily complex. We have contributed to this project by donating the infrastructure.
We have also helped out a project developing HebMorph, a Hebrew language full-text indexer. When you consider that Hebrew has a different alphabet and is read from right to left, you can appreciate how difficult this is. Itamar Syn-Hershko, the Israeli developer behind the project, is a genius. When you consider the amount and importance of Hebrew language historical documents that have not yet been indexed, you can also grasp how significant it is.
What are the benefits to a commercial organization or a government agency when working with your firm?
In both cases, our service is there to help people who have a search problem but certainly can't afford the time and expense of hiring in-house search engineers. Managed services still make a lot of sense for a whole lot of enterprises.
How does an information retrieval engagement with your firm move through its life cycle?
To be clear, most people know us as a relatively easy self-service hosted Elasticsearch. We also occasionally do some professional services, which is appropriate when a user needs help constructing the search application that goes on top of Elasticsearch. For those, we have different models, by the project or by the hour. If a company also would like ongoing application level support from the same people who wrote the code, we have partnered with Elasticsearch to provide that service.
What's unique to your firm's approach?
The speed and flexibility with which we can work. Our firm is young enough and small enough that we can fit our work into the client's needs, not the other way around.
How do you match your solution to the licensee's challenge?
We just ask. We have a scoping call to discern the customer's challenges, timelines, budget, and resource constraints, provide a proposal, and see if it is accepted.
I am interested in the management approach you take, assuming there is one.
When delivering projects and with our own internal projects, we use an Agile Method, developing Milestones and timelines. We use the low-cost agile tool from Scrumwise.
What does your firm provide to customers to help them deal with the volume/latency problem?
Systems have latency. The issue is what do you do to reduce indexing and query processing latency. Our instances are not like many hosted solutions. We are deploying dedicated nodes to the cloud provider and data center of the user's choice. We have very little abstraction between you and Elasticsearch otherwise, and this sets us apart from competitors.
First, users should want their search infrastructure next door to their primary data stores, whether that's in the US, UK, Asia, or wherever. Reducing the cross-data center latency is a huge advantage.
Second, because we have little abstraction, the latency is no more burdensome than if you were to provision your own cloud instances. We do not add anything that would make this worse. The noisy neighbor problem is non-existent.
What does your firm offer licensees to address this issue of content "push", report generation, and personalization within a work flow?
Elasticsearch is a foundational technology, but there are lots of visualization libraries that integrate well with it. Kibana and Logstash (which, together with Elasticsearch, are collectively known as the ELK Stack), for example, handle any time-series data well. AngularJS and D3 can also be used.
There has been a surge in interest in putting "everything" in a repository and then manipulating the indexes to the information in the repository. What's your view of the repository versus non repository approach to content processing?
Elasticsearch can be used as a primary data store. Most people don't use it that way, but instead integrate with several data stores like Mongo, HBASE, JDBC, Cassandra, Couch, and others. There are unofficial integrations, dubbed "Rivers", that can integrate, but these are unsupported.
In practice, constructing your own hooks is usually the way clients proceed, and it is also the method recommended by Elasticsearch engineers.
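As a rough illustration of what such a hook might look like, the sketch below turns records pulled from a primary data store into the newline-delimited body that the Elasticsearch Bulk API accepts. The function name `to_bulk_payload`, the index name `contacts`, and the sample records are all hypothetical; this is one possible shape under those assumptions, not Qbox's method. Older Elasticsearch versions also expect a `_type` in the action line, which is omitted here.

```python
import json


def to_bulk_payload(index_name, records, id_field="id"):
    """Serialize records as a Bulk API body (newline-delimited JSON).

    Each record becomes two lines: an "index" action line naming the
    target index and document id, then the document itself.
    """
    lines = []
    for record in records:
        action = {"index": {"_index": index_name, "_id": str(record[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows, as they might come back from a primary data store.
rows = [
    {"id": 1, "name": "Ada", "role": "middleware buyer"},
    {"id": 2, "name": "Grace", "role": "database buyer"},
]
payload = to_bulk_payload("contacts", rows)
print(payload)
```

A hook like this would typically run whenever the primary store changes, with the payload posted to the cluster's `_bulk` endpoint by whatever HTTP client the application already uses.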
What's your firm's approach to presenting "outputs" for end user reuse or for mobile access?
Most business users would ask what it's all for if you can't visualize the data. So, it's crucial to have this in mind. But, for us, it's a custom project. We can scope it and execute it, but visualization is not a product in and of itself. Elasticsearch provides a JSON response to your query and how you style this response in your front end application is up to you. There's no reason it can't be tailored to any device.
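For readers who have not seen that JSON response, the sketch below parses a small, made-up search response and reduces it to the rows a front end would render. `SAMPLE_RESPONSE`, the `title` field, and the helper `summarize_hits` are illustrative assumptions; real responses carry more keys, and the `total` value is a plain number in older versions and an object in newer ones, so the sketch accepts both.

```python
import json

# A hand-written, illustrative response; not output from a real cluster.
SAMPLE_RESPONSE = """
{
  "took": 3,
  "hits": {
    "total": 2,
    "hits": [
      {"_id": "1", "_score": 1.4, "_source": {"title": "Spring asparagus soup"}},
      {"_id": "2", "_score": 0.9, "_source": {"title": "Asparagus risotto"}}
    ]
  }
}
"""


def summarize_hits(raw_json, title_field="title"):
    """Return (total, rows) where rows hold only what a UI needs."""
    response = json.loads(raw_json)
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # newer versions wrap the count in an object
        total = total["value"]
    rows = [
        {
            "id": hit["_id"],
            "score": hit["_score"],
            "title": hit["_source"].get(title_field, ""),
        }
        for hit in hits["hits"]
    ]
    return total, rows


total, rows = summarize_hits(SAMPLE_RESPONSE)
for row in rows:
    print(row["id"], row["title"])
```

Because the rows are plain dictionaries, the same output can feed a desktop page, a mobile view, or any other front end.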
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. How does your firm see the computing world over the next 12 to 18 months?
We are device-agnostic. We provide search end-points, and our customers can construct the experience for whatever device(s) are relevant to them with that aforementioned JSON response. However, application speed in mobile is just as critical as on desktop. The current state of mobile search is dreadful, though. I saw a study where the average query speed during Black Friday for the top mobile stores was six seconds or thereabouts. On desktop, that would be unacceptable. Almost all ecommerce users have abandoned you by that point. So, optimizing for mobile will be a growth business for certain. If I had more time to devote to it, I would certainly dream up a library of mobile search design patterns that better incorporate faceted search for ecommerce, because while not every user searches that way, those that do are three times more likely to be buyers than browsers.
Put on your wizard hat. What are the three most significant technologies that you see affecting your content processing system?
I think the tools are getting better to help business users (non-developer users of data) get better at constructing their own applications. Right now, the business user just instructs the developer what they want and at some point in the future, they might deliver it. Also, when you combine machine learning and natural language processing with deep search, the possibilities are endless. IBM's Watson is a nifty start, combining business intelligence tools, machine learning, and natural language, but we've still barely scratched the surface. You can also bet that it will be an ever-escalating arms race.
Where does a reader get more information about your firm?
A few years ago, enterprise search required an on-premises installation. Search software could take months to install, configure, and stabilize. Companies like Qbox are demonstrating that cloud technology and open source software can deliver functional, affordable solutions quickly and at a fraction of the cost of an old style Autonomy- or Endeca-type solution. Qbox is one of the new breed of enterprise search vendors. We think the company's approach deserves a close look.
Stephen E. Arnold, July 2, 2014