An Interview with Luc Manigot
The French have been leaders in technology for centuries. If you are like me, the names of Henri Poincaré, Pierre de Fermat, and René Descartes are synonymous with the analytical thinking which underpins much of my work in information retrieval and content analytics.
France has been at the forefront of information technology for decades. Today, a number of innovative companies are breaking through the barriers that information access has placed in front of busy professionals.
Sinequa, based in Paris, continues to capture clients in Europe and is thriving in an economic climate where other companies have been forced to close their doors or seek a buyer.
I have tracked Sinequa's technology and business activities for many years. In the course of my monitoring of the firm, I have learned that the company’s mission is to empower users - both employees and customers - with real-time, intuitive, business-focused access to information. Unlike some firms, Sinequa taps information in a range of file types, sources, and systems.
The company applies sophisticated analytic techniques to provide what the company calls “unified information access.” The payoff for licensees is smoother business operations and decision making. Instead of emphasizing content processing methods, Sinequa’s approach puts the focus on solving specific business problems in customer support or competitive analysis. The Sinequa technology enables greater productivity and permits more effective collaboration.
According to the company:
Hundreds of thousands of people in more than 250 organizations rely on Sinequa's intuitive tools to create search-based applications and integrate intelligent enterprise search functionality into enterprise- and industry-specific applications. Our solutions are used across the business in banking, consulting, consumer products, government, media, telecommunications, manufacturing and retail. See http://goo.gl/jteTl
On February 25, 2013, I spoke with Sinequa’s chief operating officer Luc Manigot. The full text of our conversation appears below.
Thanks for taking the time to talk with me. I have a positive view of French technology because we studied the work of Poincaré and Fermat in my math classes.
Yes, the educational system in France is very strong, particularly in math and engineering. We benefit from it via several directors and employees. In fact, the strength of Sinequa’s technology rests on a solid technical base.
Yes, but before we talk about the system, what's the history of Sinequa?
Let me give you an analogy. Like a Hollywood motion picture trilogy, the history of Sinequa can be told through a succession of three stories.
The first one is about a French private laboratory, founded during the 1980s, which matured a set of deep skills in the field of natural language processing. These included text generation, an automatic summarizer, a grammar checker, a translation component, etc.
The second part of the story is what I call the Web expansion period, in the early 2000s, when Sinequa decided to implement a semantic search engine for French newspapers, e-commerce Web sites, and several other corporate solution applications.
By this time, Sinequa had mastered some quite sophisticated linguistic technologies and developed innovative methods to make information access more effective. At this time, our engineers focused on developing a robust engine drawing on those multi-lingual capabilities.
The current stage is our development of a next generation semantic search engine. Since 2006, we have been one of the key players in the enterprise search market. As you know, in 2006 Autonomy, Fast Search & Transfer, and IBM were aggressively seeking to expand their customer base. We knew that we had a more effective system, and since 2006, we have been providing search systems which help an organization work more efficiently via information access built on Sinequa’s platform.
So we just decided to develop a better solution than theirs.
What does the Sinequa system provide to a licensee?
Good question. Today we do think that our solution offers a real advantage over others, and customers stand to gain by replacing other solutions with ours. We have a growing business in replacing legacy search systems from other companies. Siemens, for example, has found that Sinequa provides its employees with information access, not headaches.
Can you elaborate on this information access idea?
Our positioning today implies that Unified Information Access to the full complement of structured and unstructured enterprise data is our ultimate goal. But we can do more! Today Sinequa is able to provide Big Data solutions when an organization wants to extract value from billions of documents and data records. But of course, we also deliver what you call “classical” enterprise search solutions.
We cover a very large span of markets and their content needs. The most interesting thing is the broad range of usages our solutions are used for. For example, what we call the 360 degree view of a customer, of a product or a technology. Sinequa is an extended business intelligence solution. With our system it is possible to eliminate some existing enterprise systems. Sinequa makes it possible to identify experts within an organization. Our licensees can optimize marketing campaigns. Mobile telephony operators can minimize customer churn and turnover with our system.
And what about the linguistic technology?
Our engineers have worked on linguistic methods for many, many years. We have a long history which makes our linguistic technology a key differentiator today.
Do you use open source technology?
We monitor open source software. However, much of our work is based on proprietary systems and methods which are our secret sauce. We have done extensive testing and we know that our system’s performance pivots on our optimized algorithms.
We control our own linguistic, semantic, text mining, and connectivity technologies. As a result, we can integrate our high-value methods very deeply into the various levels of a software stack.
And the payoff?
Not surprisingly, our system is very much faster than other aggregated technologies. Our customers have measured a factor of up to eight. This means that a single server does the work which requires eight machines running a competitor’s software. In addition to processing speed, the hardware footprint delivers a significant cost savings to our licensees.
When did you become interested in text and content processing?
I came to text processing technology in 1996 after studying mathematics at the university. Two years later, I participated in the development of a patented method to define the overall meaning of a text with the aid of semantic dictionaries within an 800-dimension vector space model that underpins Sinequa. For me, it was very exciting at that point to imagine all the potential usages of such an idea and now to see the technology solve customer problems effectively.
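As an illustrative aside, the general idea of placing a text in a vector space with the help of a semantic dictionary can be sketched in a few lines of Python. This is not Sinequa's patented method, whose details are not given in the interview. The dictionary entries, the three dimensions (standing in for the 800 mentioned above), and the function names `text_to_vector` and `cosine_similarity` are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical semantic dictionary: word -> list of (dimension index, weight).
# Three dimensions stand in for the 800 mentioned in the interview.
SEMANTIC_DICTIONARY = {
    "bank": [(0, 1.0)],
    "loan": [(0, 0.8)],
    "credit": [(0, 0.9)],
    "router": [(1, 1.0)],
    "network": [(1, 0.9)],
    "patent": [(2, 1.0)],
}
DIMENSIONS = 3


def text_to_vector(text):
    """Sum the dictionary weights of every known word into one vector."""
    vector = [0.0] * DIMENSIONS
    for word, count in Counter(text.lower().split()).items():
        for dimension, weight in SEMANTIC_DICTIONARY.get(word, []):
            vector[dimension] += weight * count
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


query = text_to_vector("credit")
finance_doc = text_to_vector("bank loan")
network_doc = text_to_vector("router network")
```

In this toy setup, `cosine_similarity(query, finance_doc)` is close to 1 and `cosine_similarity(query, network_doc)` is 0, even though the query shares no literal word with either document. That is the appeal of matching on meaning rather than on strings.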
Is your method in the DNA of Sinequa?
Yes, to a certain extent. And I have continued to contribute, particularly to our linguistic functions. We have many brilliant engineers at Sinequa. Of course, the DNA of Sinequa consists of many important elements, and the multi-language approach has been one of the keystones of our approach.
One other point: I understand the term DNA also to acknowledge that a few fortuitous factors have found their way into our genome, amongst them a research project for the French Ministry of Defense, several PhD theses, and, of course, the steady succession of brilliant ideas from the Sinequa technical team.
Sinequa has been investing in search, content processing, and text analysis for several years. What are the general areas of your research activities?
It is well known that full linguistic indexing and search pipelines in many languages are keys for a search engine that is to provide relevant results to its users today. Business is global and content in many languages is very important to decision making.
In addition, we have focused on named entity extractions and other text mining capabilities. These are, in our view, complementary tools to produce structure in unstructured or semi-structured data silos.
I want to emphasize that Sinequa’s research continues every day. Our methods are under constant development because of the complexity of natural languages.
Once you dig as deeply into content analytics as we do, an amazing and virtually infinite range of challenges becomes visible to our engineers. We are working in the field of military intelligence. We are focusing on methods to find experts (beyond the self-declared experts in social networks) by analyzing an unstructured document corpus. We are researching the detection of chemical components in patents and technical surveys. Our customers are bringing us problems which require research and innovative thinking to solve. Our system’s architecture helps our licensees identify new challenges which neither they nor we previously considered.
Many vendors argue that mash ups and data fusion are “the way information retrieval will work going forward.” How does Sinequa perceive this consumerization of search?
To my mind, mash-ups are just one way to present data, possibly from different sources. And before you can “fuse” data and mash it up, you have to do the hard work of digging through large amounts of heterogeneous data to find the tidbits that are relevant to a particular user.
Mashing up raw data from different sources is no use to anyone. To fuse data or knowledge that represents the same real-world objects, you need the kind of strong analytics we provide to detect what belongs to the same semantic category. Then a system like ours can “fuse” results with other data, like geographic position or customer history, and the like.
And that is what our system does. But the whole process of getting relevant data and a complete view to a user needs deep analytic capacities across heterogeneous sources. No Web wizardry will give you that.
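To make the fusion step concrete, here is a minimal Python sketch that groups records from different sources under one key per real-world entity and then merges their attributes. The `normalize_name` function is a deliberately crude stand-in for the "strong analytics" described above, and the record fields, the suffix list, and the sample data are assumptions invented for the example, not part of any Sinequa interface.

```python
# Legal suffixes ignored when comparing company names (assumed list).
LEGAL_SUFFIXES = {"inc", "sa", "ltd", "corp"}


def normalize_name(name):
    """Crude entity key: lowercase, drop punctuation and legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    tokens = [token for token in cleaned.split() if token not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def fuse_records(records):
    """Merge records that normalize to the same entity key."""
    fused = {}
    for record in records:
        key = normalize_name(record["name"])
        entity = fused.setdefault(key, {"sources": []})
        entity["sources"].append(record["source"])
        for field, value in record.items():
            if field not in ("name", "source"):
                # Keep the first value seen for each attribute.
                entity.setdefault(field, value)
    return fused


records = [
    {"name": "Acme Corp.", "source": "crm", "city": "Paris"},
    {"name": "ACME corp", "source": "support", "open_tickets": 3},
    {"name": "Globex S.A.", "source": "crm", "city": "Lyon"},
]
fused = fuse_records(records)
```

Here the two Acme records collapse into one entity carrying both the CRM city and the support ticket count. The hard part in practice is the matching step, which is exactly where a naive mash-up of raw data falls short.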
In my view, consumerization led us to provide graphically constrained layouts for smartphones and tablets. These devices have stimulated our engineers to find ways to give our licensees access to relevant data from anywhere. But we continue to serve professional users within enterprises and administrations. Their expectations are influenced by their private use of information technology, Web and social networks. But we are not providing tools to interact with their personal friends.
Without divulging your firm's methods or clients, will you characterize a typical use case for a client who licenses the Sinequa system?
We target large enterprises and administrations. When these organizations look for search, business intelligence systems, or unified information access technology, they are usually losing time trying to find relevant information at various points in their business processes. There are situations where the lack of information access leads to slow responses to customers, to requests for proposals, price quotes, or technical information. Many organizations point to problems in addressing customer satisfaction or pinpointing causes for customer turnover or churn. A common problem is the cost associated with traditional information processing systems. The CFO is exasperated because costs keep rising while the existing systems remain works in progress, not reliable solutions. Other organizations want to eliminate redundant information systems. A large number of organizations need ways to find which employees or contractors have a particular area of expertise or a supplier with a specific capability.
How do clients determine that Sinequa delivers what the prospect needs?
Usually we get selected after a formal procurement process. I think you have called this “a bake off” or “head to head comparison.”
It is not uncommon that we get invited to provide a demonstration and price quotation, even if we are not always high on the client’s list of potential vendors. We urge these organizations to look at the traditional solutions like those available from Hewlett Packard, IBM, or Microsoft. In direct comparisons, we often win the account because of higher relevancy of search results (due to superior content analytics) and much faster implementation of a proof of concept.
We have more than 120 connectors and a streamlined installation process. As a result, the company comparing systems can see firsthand how Sinequa deploys and performs. The prospect can push content into our system and see how the system performs and can be scaled.
Our system is orders of magnitude faster in set up and content and query processing. Performance boosts of 3X or more are at the low end of the scale. Some clients have reported performance improvement of 7X and higher.
Is Sinequa delivering a “one size fits all” solution like the Google Search Appliance?
For the last couple of years, we have not seen “typical use cases” anymore. Our customers develop ever more ingenious use cases. Nevertheless, a few are recurring; for example, the need to get the 360 degree view of a given topic such as a customer, the expert search requirement, and the need to do effective technology searches in order to avoid expensive redevelopments of technology already available to a company.
Our system makes it easy to deploy a basic solution or build an “InfoApp” or search based application on our platform. The application integrates into the customer’s business processes.
The good thing about this profusion of usages is that customers keep extending the use of our platform. Hence our business with the installed base is growing, making forecasts more reliable. The Google Search Appliance approach is not exactly our approach, and we rarely compete head-on. We help solve our licensees’ problems. We don’t force them to fit into a “search microwave” with a single control knob and a start button.
What are the benefits to a commercial organization or a government agency when working with your firm?
That’s a good question.
Agility. R&D responsiveness, a motivated team, one of the lowest total costs of ownership available today. We also have independence from any big software vendor for whom search and unified information access may only be secondary on its business agenda.
I would also say that at Sinequa, our team works to integrate new functionality, developed because our customers have signaled a new function is required. We develop and maintain our catalogue of smart connectors, essentially driven by our customers’ needs, while most of our competitors that are part of large companies tend to favor connectors to their company’s applications and data sources.
In addition, our strong integration of the whole software stack and product packaging offer a very low total cost of ownership. A solid TCO is almost impossible to achieve with open source solutions.
Why? Isn’t open source software positioned as a cost-saving option?
In our experience, open source software usually requires scarce and quite costly resources to adapt, implement, roll out, troubleshoot, and maintain an enterprise solution.
How does an information retrieval engagement with your firm move through its life cycle?
If we look back to any large company having chosen the Sinequa solution, one point which is clear is that the customer has remained satisfied. I think the main reason is that we don’t consider the engineering support we provide as an annoyance or just troubleshooting. We have made customer satisfaction the fuel for our research. Sinequa’s technical roadmap is based on what the customer needs.
This is what I meant when I said Sinequa has agility and R&D responsiveness. Of course, I’m not saying that every deployment needs a set of additional developments on our part, but we consider that in a quickly changing world, it is important to keep a close relation with customers and their new projects.
This used to be called “customer intimacy” before the term got a bad taste when it turned out that in most companies it had no grounding in reality. It does for us. The challenge is to keep growing without losing the benefits of this approach.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the Big Data challenge?
Sinequa ES provides a highly scalable grid architecture, making it possible to parallelize and load balance the collecting, indexing, storing and searching stages of the process. Once more, even though people may find it strange that a relatively small company like Sinequa claims that its engine is orders of magnitude faster than that of many well-known systems from much larger companies, our customers have verified our claim and found it correct. So we suggest that people try it and check for themselves in a proof of concept competition.
Enterprise search is a particularly tough problem for many organizations. Where does Sinequa set itself apart?
Right. First, let’s talk about volume and variability in a classical enterprise search project.
Document sources and other data silos will be crawled and re-crawled according to the required level of freshness of data. It is a misconception to believe that real-time indexing of huge volumes of data is mandatory in all cases.
And even if the search engine is capable of doing so, the cost of making available additional bandwidth on existing data sources may be higher than the benefits. In general, it’s important to set up the incremental indexing frequency as a good compromise between infrastructure constraints, business needs, and user expectations. In optimal cases, Sinequa is able to index millions of documents per day and per server without any latency between the indexing stage and data findability in a search.
Next, real time indexing is also possible by push methods through various application programming interfaces for any critical case. For example, take an enterprise workflow where indexing is needed at a given step, and a document indexed at this step must be retrievable less than five seconds later. We can do this.
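The trade-off between scheduled re-crawling and push indexing described above can be sketched as a small planning function. This is only an illustration under stated assumptions: the `Source` fields, the `plan_indexing` function, and the 15-minute minimum crawl interval are hypothetical and do not describe Sinequa's configuration model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Source:
    name: str
    freshness_seconds: int  # how stale search results may be for this source
    supports_push: bool     # whether the source can notify the indexer itself


def plan_indexing(
    source: Source, min_crawl_interval_seconds: int = 900
) -> Tuple[str, Optional[int]]:
    """Pick push indexing only when crawling cannot meet the freshness need.

    min_crawl_interval_seconds models the infrastructure constraint: the
    source should not be re-crawled more often than this (assumed value).
    """
    if source.freshness_seconds < min_crawl_interval_seconds and source.supports_push:
        return ("push", None)
    # Otherwise crawl as rarely as the business need allows, but never more
    # often than the infrastructure tolerates.
    interval = max(source.freshness_seconds, min_crawl_interval_seconds)
    return ("crawl", interval)


workflow_plan = plan_indexing(Source("approval workflow", 5, True))
archive_plan = plan_indexing(Source("document archive", 86_400, False))
wiki_plan = plan_indexing(Source("team wiki", 60, False))
```

The workflow step with a five-second requirement gets push indexing, the archive is crawled daily, and the wiki, which cannot push, falls back to the shortest crawl interval the infrastructure allows, a compromise of the kind described in the answer above.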
User satisfaction with a system is based on matching work task requirements with information. Sinequa permits this type of tuning.
Another challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content "push", report generation, and personalization within a work flow?
This is just a particular case of setting up various user profiles according to their roles. We have many projects for government agencies like the French military, directories, and news organizations. Our system uses workflows to let knowledge managers filter and categorize the document flow first, before making it available to other basic users. We provide a large set of features to design workflows; for example, public saved queries, public baskets, e-mail alerting or RSS feeds, folksonomy tagging of the indexed documents, and personalized content streams.
What are the value-added elements your system applies to content and data?
We categorize and structure data and add automatically created metadata plus any useful facet from any available metadata (date of the documents, formats, authors, named entities, extracted textual concepts, etc.)
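As a rough illustration of turning metadata into facets, the Python sketch below counts the values of selected metadata fields across a handful of documents. The field names, sample documents, and the `build_facets` function are assumptions for the example. A production engine would compute such counts inside its index rather than over a list in memory.

```python
from collections import Counter, defaultdict


def build_facets(documents, facet_fields):
    """Count how many documents carry each value of each facet field."""
    facets = defaultdict(Counter)
    for document in documents:
        for field in facet_fields:
            value = document.get(field)
            if value is None:
                continue
            # Multi-valued metadata (e.g. named entities) arrives as a list.
            values = value if isinstance(value, list) else [value]
            facets[field].update(values)
    return {field: dict(counter) for field, counter in facets.items()}


documents = [
    {"format": "pdf", "author": "Dupont", "entities": ["Siemens", "Paris"]},
    {"format": "docx", "author": "Martin", "entities": ["Siemens"]},
    {"format": "pdf", "author": "Dupont"},
]
facets = build_facets(documents, ["format", "author", "entities"])
```

With these sample documents, `facets["format"]` is `{"pdf": 2, "docx": 1}`. A search interface can show such counts next to the result list so that users can narrow a query by format, author, or extracted entity.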
Additional capabilities like “automatic classifiers” (by post-processing the indexed corpus and mapping it into a given ontology) or “text mining agents” (extracting business phrases or relations from document contents) are very useful ways to drill down into huge volumes of data without having any vision of the underlying storage organization.
As already mentioned, Sinequa has a catalogue of more than 120 indexing connectors to various repositories, including management systems but also standard flat storage repositories.
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What's your firm's approach to presenting "outputs" for end user reuse or for mobile access?
For now, Sinequa ES 8.5 provides dashboard capabilities, for “light BI” purposes and user-friendly layouts. But the question of usability is real. We are actually working on different customer projects which demonstrate how Sinequa can replace a Business Objects or a Cognos-type solution.
We are not really competing regularly against dedicated business intelligence solutions, but it’s a very interesting approach to consider how a search engine, coming along with multi-dimensional inner representation and structuring of indexes combined with a graphical layout can take a BI solution to the next level. Sinequa can provide a much simpler way to deliver real-time and flexible data analysis to its users without having to assemble or code complex business intelligence queries. With the Sinequa solution, a customer can very easily set up a graphical dashboard and let the user refresh it as frequently as necessary.
By the way, interfaces for users in “stressful” situations will usually be designed and implemented for each such use case.
I am on the fence about the merging of retrieval within other applications. What's your take on the "new" method which some people describe as "search enabled applications"?
Integrating enterprise search properly with business applications can in many cases bring added value, agility and other benefits. There is nothing new about this. Confusion comes from the terms that have been used to describe this activity.
What do you mean?
There are timid and radical approaches. SBAs or search based applications are not really new or search based. At Sinequa, we take a more advanced view. We focus on search enabled applications or search enhanced applications. Today, people use the more general term of “InfoApps”.
The really interesting thing is how much new functionality Search adds to existing work environments.
Can you give me an example?
Sure. We are currently working for Crédit Agricole, a large bank here in France, and one of the main French telecommunication providers. In both of these engagements the customer is designing a fully operational “new workplace” based on intensive usage of Sinequa indexes.
In this type of use case, a significant amount of the enterprise data may well be offloaded from mainframes. This type of shift makes the economic equation particularly interesting. Industrial companies like Siemens or large service providers like Atos also tend to add the benefit of the Sinequa search engine to various applications. User interaction with the system becomes very different from the common search interface.
We make the development of InfoApps easy for almost any programmer. Our set of APIs supports such methods and languages as SOAP, REST, JSON, Java, .NET, and PHP, among others.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. How does your firm see the computing world over the next 12 to 18 months?
We think that embedding applications in mobile devices is certainly a good approach. However, these devices are not our primary focus, just part of the modern user’s ecosystem.
We do think that an efficient search infrastructure will necessarily be housed in professional data centers (even if most of them may be hosted on private clouds quite soon). Access may well be via portable devices, but no serious amount of data will be stored on these devices, if only for security reasons. At Sinequa we provide a secure system that allows the user’s access to be fast and responsive on whatever device he is using to access an InfoApp.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business?
Let me come at the question this way.
First, I think that cloud based multi-node infrastructures are important. We are currently deploying a set of Sinequa nodes on Microsoft’s Azure cloud solution for one of our customers. We are probably just at the beginning of such a deployment model.
Second, the Big Data storage and distributed computation technologies are attracting considerable attention in many different organizations. We are looking at a range of smart combinations between search and Big Data technologies.
Third, the new generation of content analytics is going to continue to have an impact. Combining natural language processing and extended business intelligence is an exciting area. But the field is quite new, and there is a great deal of experimentation still going on. We are working on some exciting linguistic solutions, but I don’t want to go into detail at this time.
Where does a reader get more information about your firm?
Through the Sinequa site. There are different options for contacting my colleagues and me. We will promptly respond to questions and documentation requests. We can also arrange for demonstrations of our system.
Interest in enterprise information access continues to change and increase. Sinequa’s approach is refreshing because the company focuses on the business problem the licensee of the Sinequa system needs to solve. The firm’s approach “snaps in” to existing systems which the client is using. Massive reengineering is not required to acquire, process, extract, and make actionable the data and information the client wants to analyze.
Also, Sinequa’s approach can add a new dimension to mundane tasks such as locating information about a client’s order history. For customer support, the Sinequa system makes it quick and easy for a person to locate information pertinent to a specific issue. Business decision making becomes a matter of looking at on-point information directly relevant to a particular issue or task.
If your organization is seeking a solution to information challenges, we strongly suggest that you investigate the Sinequa system.
Stephen E Arnold, February 25, 2013