An Interview with Raphael Perez
The interest in open source search contributes to the schemozzle in enterprise search and content processing. When one reads the pontifications of the self appointed search experts, one learns that open source search is thriving or diving, innovating or imitating, and pulsing or panting.
What does one expect of English majors with a feel for the metaphors of Andrew Marvell, home economics experts turned enterprise architecture mavens, and failed Web masters? (Andrew Marvell was for a while the secretary of John Milton, another potential search expert in today’s world.)
The reality is that open source software is becoming a major factor in many commercial organizations. Palantir, an outfit which has accepted more than $150 million in venture funding, uses open source software. Will the commercialization of open source software in general and open source search in particular undermine “open source” itself?
In 2006, IBM dumped its internal search efforts and embraced Lucene. The code was open source and bugs were fixed by the community. IBM, however, figured out how to keep its open source “love” and generate revenue from services and proprietary software.
Now Lucene is the beating heart of the aging IBM’s OmniFind, Content Analytics, and Watson products. I think of Watson as a little like Liberace after his first face lift and tummy tuck.
There is also Lucid Imagination. I find the firm interesting since it is trying to implement the RedHat model using venture funding; that is, software which is mostly free and fees for engineering expertise monetized in such a way that the venture folks are happy with the growth and upside of their investments. What has set Lucid Imagination apart in the last nine months is its management “challenge”. First, Mark Krellenstein, who helped found the company, embraced other opportunities. Next, CEO Eric Gries left the helm. Assorted open source community “wizards” are in and then out and then in again. I find the flip flops at MarkLogic easier to follow than Lucid’s shifts.
There are open source search “vendors” at every turn it seems. There are SearchBlox, Flax, and Xapian. (I did my October Online Magazine article about open source search confusion, listing about two dozen past and present open source search systems with interesting and often opaque names. The story should be in print or online in January 2012, which underscores why some search experts are former real journalists.) In September 2011, I learned that open source search vendor Tesuji has shifted course. So not even open source is without stormy seas in today’s financial climate.
I mention formerly open source search vendors who have morphed ever so gradually into full fledged Gordon Gekkos similar to the protagonist in the 1987 film Wall Street. (Some search vendors should make so much money as the film which explains much about American enterprise search marketing, by the way.) I would love to toss these chestnuts on the fire of this interview, but I don’t have the energy to deal with 35 year olds who know as much about the chemistry of a thorium reactor as they do about the history of search.
I learned about Open Search Server the old fashioned way—I stumbled across a reference to the company and chased down Raphael Perez, the president of the company.
Like a number of open source enterprise search projects, the trigger was a company’s hunt for an information retrieval system that was available at a price the company could afford or offered the specific features the firm required. The platform was Lucene and the work began at Infopro, a French company. After some sleuthing I was able to contact Raphael Perez, one of the experts who worked on the initial project and now serves as CEO of Open Search Server.
The full text of my interview with him appears below:
Thanks for taking the time to speak with me.
No problem. It is my pleasure answering your questions.
What’s the origin of Open Search Server?
The OpenSearchServer project started in a French B2B media group in 2007. The company was looking for a search solution.
Because no solution was available at a decent price or offered all the desired features, the decision was made to create an in-house solution as an open source project based on Lucene.
Emmanuel Keller, then the chief information officer, led the project, and after two years of work more than 12 applications were installed and providing high value results. In December 2009, Emmanuel purchased the rights to the solution and formed a company to develop the community and offer it high level professional services, support, and community management. It was the start of the story, and I joined Emmanuel some weeks after the launch.
What’s your role?
I am the CEO of the now 18 month old company. We work on enhancing our solution, engaging the community, and providing professional services to enterprise users.
Is the company also called Jaeksoft?
Yes, OpenSearchServer is the open source operation, and the development and distribution are handled by Jaeksoft SARL, which offers open source enterprise search solutions for developers.
As I said, we offer OpenSearchServer, an open source enterprise search tool that allows users to develop search-based applications using index and full text algorithms. The firm delivers support and services for OpenSearchServer developers as well as customized developments.
What do you provide via Jaeksoft?
We offer remote management, and we design new applications and implementations. We also do performance tuning and optimization of OpenSearchServer installations. The Jaeksoft entity was set up in January 2010 and is based in Paris, France. Since October 2011 it has been hosted in the Paris Technological Incubator with 30 other high tech startups, very near the new French National Library.
There is considerable turmoil in open source in my opinion. First, Google and Oracle have a legal battle underway, the outcome of which is little more than a spin of the roulette wheel. In addition, Tesuji, an open source vendor in Hungary, has shifted to a different line of business and Lucid Imagination has been flipping executives the way the chef does at the local International House of Pancakes. Is this a stable environment?
We are really committed to open source, and this is for us a key driver of our project and our company. Don’t forget that commercial open source is a new way of building businesses, and companies need to fine tune their offers and business models to stay afloat while matching customers’ expectations.
Of course, among the more than 300,000 projects available on SourceForge, no more than a few dozen will become successful commercial projects, and most of them won’t even try to. But there is certainly traction in enterprises for serious open source projects which are backed by skilled teams. Projects whose teams know what customers need and how to define and execute a strong road map are in demand. A successful open source initiative requires a strong community manager as well.
Litigation is something else. My personal view is that the Oracle-Google disagreement has no real link to open source software or search. We all know the game the software giants play. These big firms compete to purchase as many patents as they can, and then the owners of the intellectual property use the legal system to block their competitors from moving forward. The big companies often try to shut down the competition. It is not our game and we wish to stay in a “full open” model.
What about Lucid and the other open source search “plays”? None has had the success of RedHat.
I cannot comment on other companies. With regard to the specifics you mention, I just have the information available in the media and blogs like yours.
I think some of the companies you mention will try to find a way to capitalize on the huge Solr market position and create value for their investors.
What’s your background and why the interest in search?
I have long experience in software and the Internet. When I met Emmanuel and was asked to lead this project, I did some research to understand the market and opportunities. It did not take me long to understand the huge potential and the growing need of companies and developers for a high performing, easy to use open source solution.
Also, the origin of the project was very good: Infopro wished to find one search solution for its 15 subsidiaries, with needs ranging from job boards to professional paid directories to sophisticated database management. The company had 12 B2B publications, related Web sites, and various other digital activities. OpenSearchServer brings value to all of them.
The needs were so different that the team focused on the only common thing, which was the open nature of Lucene and its algorithmic internals. I can say that it was Lucene’s possibilities which pulled us into this field. We were fortunate to have very talented programmers who worked hard to add even more features and possibilities to it. In fact, we have created some features in Lucene with the objective of better integration in our framework and greater ease of use. For example, the full faceting possibilities were coded with this intent. We also created the collapsing of returned answers, as it was not available at that time in Lucene.
A major module we added is an integrated crawler that offers a very easy way to bring data to the index from URLs, file systems, or databases. We give full manual control to users, who can plan and optimize the crawling with exclusion lists, a scheduler, fine tuning of the maximum number of documents to be crawled each minute, and many administration functions.
Right now we are working on matching customer needs with solutions. We want to provide solutions customers can easily master, with a short learning curve and easily demonstrable advantages and value, measured not against other search players but against databases.
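[For readers unfamiliar with the two terms Mr. Perez uses above, faceting and collapsing can be illustrated with a short Python sketch. This is not OpenSearchServer or Lucene code. The hit records, field names, and function names are hypothetical and chosen only to show the ideas: faceting counts results per field value, and collapsing keeps one representative result per field value.]

```python
from collections import Counter

# Hypothetical search hits; the field names are illustrative only.
hits = [
    {"id": 1, "site": "a.example", "type": "job", "score": 0.9},
    {"id": 2, "site": "a.example", "type": "job", "score": 0.8},
    {"id": 3, "site": "b.example", "type": "directory", "score": 0.7},
    {"id": 4, "site": "a.example", "type": "directory", "score": 0.6},
]


def facet_counts(hits, field):
    """Faceting: count how many hits carry each value of `field`."""
    return dict(Counter(hit[field] for hit in hits))


def collapse(hits, field):
    """Collapsing: keep only the best-scoring hit for each value of `field`."""
    best = {}
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        # setdefault stores the first (highest scoring) hit seen per value.
        best.setdefault(hit[field], hit)
    return list(best.values())


if __name__ == "__main__":
    print(facet_counts(hits, "type"))
    print([hit["id"] for hit in collapse(hits, "site")])
```

[A search interface would typically show the facet counts as clickable filters next to the result list and use collapsing to avoid a page of near-duplicate answers from one source.]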
What are the technical enhancements your team has developed?
One major module our enterprise customers appreciate is called Classifier. It brings a very innovative set of features for applications that need automatic classification and matching, which are highly valued in many businesses. Offering this module helps us differentiate ourselves for customers. We also offer log reporting tools and a SOAP Web service.
Without divulging your firm’s methods or clients, will you characterize a typical use case for your firm’s search and retrieval capabilities?
Our solution is a development platform to create any kind of index based applications and it offers all that a serious project requires (crawler, connectors, reporting, back office, Web services). So it is often the starting point of a growing number of developments that can be supported by OpenSearchServer.
But maybe what people like best in our solution is our full openness and the great ease of accessing the most sophisticated features to create advanced applications, as well as our flexibility in doing business and our experience with enterprise information technology needs. Some examples:
- MiMTiD: a Texas based company offering music and movie majors a system based on OpenSearchServer in order to identify over 100 million instances of copyright infringement.
- AXA Investment Managers: One of the leading asset management companies in the world, AXA IM uses OpenSearchServer on its 55 worldwide websites and web applications. Some are used by the general public and others have access restricted to company analysts.
- ETAI: A worldwide pioneer in car information, ETAI is an Infopro group company. A top player in the European market, ETAI shifted from offline information to all digital. Today, several thousand car dealers and service centers use one of its subscription based car documentation applications, based on OpenSearchServer, to better serve owners of more than 41,000 models.
What are the benefits to a commercial organization or a government agency when working with your firm?
That’s a good and key question.
The answer relates to my previous comments. We focus on providing the best set of features with great performance for the best return on investment. Access to search technologies should come at a fair cost, whether that cost is the license, the learning curve, or the level of programmer required.
We believe open source search is now mature, and we should tackle enterprise challenges rather than focus on any other point. Another important point is to adapt the sales and marketing approach to new audiences that are not full text specialists or techno-centric.
How does an information retrieval engagement with your firm move through its life cycle?
I think OpenSearchServer projects have a life cycle similar to other enterprise software. The main areas we try to affect are shortening the learning curve to enter a project and allowing corrections in an easy way without keeping users from accessing their applications and search modules.
What does your firm provide to customers to help them deal with big data?
Volume and big data are key drivers of our research and development because there are strong expectations from users and CIOs. It is the same for database vendors like Oracle. Organizations have to respond to “big data”, which is happening within companies, not just on the Web. The information in big data will allow organizations to make employees more productive.
What can your system acquire and index?
Clients want to index what’s on their servers and content accessible via the Internet.
Access to content from different sources and systems is a key point. I can tell you that we are very active in this direction. We are partnering with the Manifold Connectors Framework project to which we are a contributor and we already have some customers doing beta testing on an OpenSearchServer able to access data within FileNet (IBM), SharePoint (Microsoft), Meridio (Autonomy) or Open Text Livelink.
Also, as I mentioned earlier, OpenSearchServer has a powerful crawler module that can index content from the web (internet and intranet), file systems, and databases.
I am on the fence about the merging of retrieval within other applications. What’s your take on the “new” method which some people describe as “search enabled applications”? What does your firm offer?
We believe this new method is really a strong trend. For many needs, search platforms are a great complement to applications and open completely new horizons to users.
Just think about Gmail. What is Gmail? It is not perfect but it is not just another Web mail. Gmail is a search enabled Web mail, completely relying on search technologies.
You can talk to many Gmail users, and they will tell you that they love it because they can forget where the e-mail is or when they received it. Gmail will always find it for them.
This is a good example of what users want and what CIOs want to do for their companies: software that goes beyond users’ expectations and helps them focus on their jobs. Productivity improves with this approach.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are “cut off” from access to more robust systems? How does your firm see the computing world over the next 12 to 18 months? What products / services will you be focusing on to deliver on your next vision?
You’re right, all these consumer gadgets are really booming because they give users, and especially newcomers to technology, real freedom to access data from wherever they are.
You know that last week, within the first 24 hours of iPhone 4S pre-sales, Apple took over one million orders. People love the iPhone. What you and I know is that those nice devices only exist because of the fabulous services made ready for them and that these services rely on strong infrastructures.
The biggest challenge for the cloud is to show it is able to face its growing success. Amazon’s online outage in April 2011 and RIM’s four day outage in October 2011 make clear that cloud services are not easy. However, I think these cloud services will become stronger tomorrow because no one, neither companies nor consumers, wants to be without the convenience of anytime, anywhere access.
We are working to make our solution ready and appealing on these distributed and highly scalable architectures.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business? How will your company respond?
I believe the search platforms are going to reach a new level. No longer will the problem be to answer a query launched by a user. What is coming are services that provide users with a lot more than a correct answer.
How good can an answer be when it is made of 12,000 links (in the enterprise world, a lot fewer answers than in Google)? A key direction for me is how search systems will help navigate and narrow searches within answers; another direction is to have a collaborative view of search. Workgroup applications have helped users be more productive. The search tools will learn from a query to better answer the next query and will adapt their “memory” to the user profile.
On the other hand, the big data experience and operations on large indexes will be greatly improved and will be more and more paired and synchronized with large database deployments.
As another direction, I would say that the coming lower prices for SSD drives will bring a new paradigm in hosting, with new opportunities and challenges. That will allow innovative solutions to better serve users and answer their needs.
And last but not least, we will need to adapt our platform to match the expectations of today’s database developers, who will be on search platforms tomorrow.
Where does a reader get more information about your firm?
The best place is our Web site. We always enjoy receiving calls to discuss all these issues and think together on new projects and challenges.
Open source search has an increasing appeal. After all, what chief financial officer will ignore a “free” software product? However, open source vendors have to walk a knife edge. If the “community” turns against an open source vendor, the affected company may have difficulties ranging from the mundane like hiring open source savvy developers to more esoteric problems such as open source license litigation.
OpenSearchServer is following the path taken by such successful open source search firms as Lemur Consulting (FLAX) and SearchBlox. Like some of the open source search firms which have ingested venture capital, OpenSearchServer was raising a first round of venture funding in February 2011, and its objective is to become a clear European leader and one of the top worldwide players.
If your organization is looking for an open source search solution, you will want to take a close look at what OpenSearchServer offers. Worth a look.
Stephen E. Arnold, October 31, 2011