Recommind: Following the Search Imperative
January 10, 2008
I opened my Yahoo alerts this morning, January 10, 2008, and read:
Recommind Predicts 2008 Enterprise Search and eDiscovery Trends: Search Becomes the Information Foundation of the … — Centre Daily Times Wed, 09 Jan 2008 5:32 AM PST
According to the enterprise search and eDiscovery technology experts at Recommind, 2008 will be the year that enterprise search and eDiscovery converge to become top areas of focus for enterprises worldwide, creating substantial growth and evolution in the management of electronic information.
The phrase “foundation of the electronic enterprise” struck me as meaningful and well-turned. Most search experts know Recommind by name only. I profiled the company in the third edition of The Enterprise Search Report, the last one that I wrote. I support the excellent fourth edition, but I did not do any of the updating for that version of the study. I’m confining my efforts to shorter, more specialized analyses.
The company once focused on the legal market. My take on the company’s technology was that it relied on Bayesian algorithms.
The Recommind product can deliver key word search. The company has a patented algorithm that implements “probabilistic latent semantic analysis.” I will discuss latent semantic indexing in “Beyond Search”. For our purpose, Recommind’s system identifies and analyzes the distribution of concept-related words in a document. The approach uses statistical methods to predict an item’s relevance.
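To make the idea concrete, here is a minimal sketch of textbook probabilistic latent semantic analysis fitted with expectation-maximization. It is not Recommind’s patented implementation, which is not described here. The function names, the toy documents, and the choice of two topics are all assumptions made for illustration.

```python
import random


def normalize(values):
    """Scale non-negative numbers so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def fit_plsa(docs, num_topics, iterations=50, seed=0):
    """Fit P(topic | doc) and P(word | topic) from word counts using EM.

    docs is a list of dicts mapping word -> count. Illustrative only.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    topics = range(num_topics)
    # Random positive starting points for both distributions.
    p_z_d = [normalize([rng.random() + 0.1 for _ in topics]) for _ in docs]
    p_w_z = [dict(zip(vocab, normalize([rng.random() + 0.1 for _ in vocab])))
             for _ in topics]
    for _ in range(iterations):
        new_z_d = [[0.0] * num_topics for _ in docs]
        new_w_z = [dict.fromkeys(vocab, 0.0) for _ in topics]
        for d, doc in enumerate(docs):
            for word, count in doc.items():
                # E-step: how strongly each topic explains this word in this doc.
                post = normalize([p_z_d[d][z] * p_w_z[z][word] for z in topics])
                # Accumulate expected counts for the M-step.
                for z in topics:
                    new_z_d[d][z] += count * post[z]
                    new_w_z[z][word] += count * post[z]
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [dict(zip(vocab, normalize([row[w] for w in vocab])))
                 for row in new_w_z]
    return p_z_d, p_w_z, vocab


def word_probability(p_z_d, p_w_z, doc_index, word):
    """P(word | doc): the model's estimate of how well a word fits a document."""
    return sum(p_z_d[doc_index][z] * p_w_z[z].get(word, 0.0)
               for z in range(len(p_w_z)))


# Hypothetical toy collection: two legal documents, two search documents.
docs = [
    {"contract": 3, "lawsuit": 2, "attorney": 1},
    {"attorney": 2, "lawsuit": 1, "deposition": 2},
    {"index": 3, "query": 2, "crawler": 1},
    {"query": 2, "crawler": 2, "index": 1},
]
p_topic_doc, p_word_topic, vocab = fit_plsa(docs, num_topics=2)
for d in range(len(docs)):
    print(d, [round(p, 3) for p in p_topic_doc[d]])
```

A word can receive a non-zero probability for a document in which it never appears, as long as it belongs to a topic that document draws on. That is the sense in which a system of this kind predicts relevance from concept-related words rather than from exact key word matches.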
The Recommind implementation of these algorithms differentiates the company’s system from Autonomy’s. Autonomy, as you may know, is the high-profile proponent of “automatic” or “automated” text processing. The idea (and I am probably going to annoy the mathematicians who read this article) is that Bayesian algorithms can operate without human fiddling. The phrase “artificial intelligence” is often applied to a Bayesian system when it feeds information about processed content back into the content processing subsystem. The notion is that Bayesian systems can be implemented to adapt to the content flowing through the system. As the system processes content, the system recognizes new entities, concepts, and classifications. The phrase “set it and forget it” may be used to describe a system similar to Autonomy’s or Recommind’s. Keep in mind that each company will quickly refine my generalization. For my purposes, however, I’m not interested in the technology. I’m interested in the market orientation the news story makes clear.
Recommind is no longer a niche player in content processing. Recommind is moving into the heartland of IBM, Microsoft, and Oracle: big business, the Fortune 1000, the companies that have money and will spend it on systems that enhance the firm’s revenue or control the firm’s costs. Recommind is an “enterprise content solutions vendor”.
Some History
Lawyers are abstemious, far better at billing their clients than spending on information technology. Recommind offered a reasonably priced solution for what’s now called “eDiscovery.”
eDiscovery means collecting a set of documents, typically those obtained through the legal discovery process, and processing them electronically. The processing part can have a number of steps, ranging from scanning, performing optical character recognition, and generating indexable files to performing relatively simple file transformation tasks. A simple transformation task is to take electronic mail and segment the message and save it, then save any attachment such as a PowerPoint presentation. Once a body of content obtained through the legal discovery process is available, that content is indexed.
Legal discovery means, and I am simplifying in this explanation, that each side in a legal matter must provide information to the opposing side. In complex matters, more than two law firms and usually more than two attorneys will be working on the matter. In the pre-digital age, discovery involved keeping track of each discovered item manually, affixing an agreed upon identification number on the item, and making photocopies. The photocopies were (and still are in many legal proceedings) punched and placed in binders. The binders, even for relatively modest legal actions, can proliferate like gerbils. In a major legal action, the physical information can run to hundreds of thousands of documents.
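The email transformation step can be sketched with Python’s standard `email` package. This is a minimal illustration, not any vendor’s pipeline. The function name `segment_email`, the sample addresses, and the fake attachment bytes are assumptions invented for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def segment_email(raw_bytes):
    """Split one raw message into header fields, body text, and attachments."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        {
            "filename": part.get_filename() or "unnamed",
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),  # decoded bytes
        }
        for part in msg.iter_attachments()
    ]
    return {
        "subject": str(msg["subject"]),
        "sender": str(msg["from"]),
        "body": body,
        "attachments": attachments,
    }


# Build a hypothetical message with one presentation attached.
sample = EmailMessage()
sample["From"] = "alice@example.com"
sample["To"] = "bob@example.com"
sample["Subject"] = "Q3 deck"
sample.set_content("Slides attached.")
sample.add_attachment(
    b"fake presentation bytes",
    maintype="application",
    subtype="vnd.ms-powerpoint",
    filename="q3.ppt",
)

record = segment_email(sample.as_bytes())
print(record["subject"], [a["filename"] for a in record["attachments"]])
```

Each piece (message body and every attachment) can then be stored under its own identifier and handed to the indexing step separately.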
eDiscovery, therefore, is the umbrella term for converting the text to electronic form, indexing it, and making that index available for those authorized to find and read those documents.
The key point about discovery is that it is not key word search. Discovery means that the system somehow finds out the important information in a document or collection of documents and makes that finding evident to a user. No key word query is needed. The user can read an email alert, click on a hot link that says, “The important information is here”, or view a visual representation of what’s in a mass of content. Remember: discovery means no key word query, no reading of the document to find out what’s in it. Discovery is the most recent Holy Grail in information retrieval despite its long history in specialized applications like military intelligence.
Recommind found success in the eDiscovery market. The product was reasonably priced, particularly when compared to a brand name, high profile system such as those available from Autonomy, Endeca, Fast Search & Transfer, iPhrase (now a unit of IBM), and Stratify. Instead of six figures, think in terms of $30,000 to $50,000. For certain law firms, spending $50,000 to manipulate discovered materials electronically was preferable to spending $330,000.
The problem with the legal market is that litigation and legal matters come and go. For a vendor of eDiscovery tools, marketing costs chew away at margins. Only a small percentage of law firms maintain a capability to process case-related materials in a single system. The pattern is to gear up for a specific legal matter, process the content, and then turn off the system when the matter closes. Legal content related to a specific case is encumbered by necessary controls about access, handling of the information once the matter is resolved, and specific actions that must be taken with regard to the information obtained in eDiscovery; for example, expert witnesses must return or destroy certain information at the close of a matter.
The point is that eDiscovery systems are designed to make it possible for a law firm to comply with the stipulations placed on information obtained in the discovery process.
Approaches to eDiscovery
Stratify, now a unit of Iron Mountain, is one of the leaders in eDiscovery. Once called Purple Yogi and the darling of the American intelligence community, Stratify has narrowed its marketing to eDiscovery. The Stratify system performs automatic processes along with key word indexing of documents gathered via legal discovery. The system has been tuned for legal applications. Licensees receive knowledge bases with legal terms, a taxonomy, and an editorial interface so the licensing firm can add, delete, or modify the knowledge bases. Stratify is priced in a way that is similar to the approach taken by the Big Three (Autonomy, Endeca, and Fast Search & Transfer) in search; that is, fees in the hundreds of thousands of dollars are more common than $50,000 fees. First, larger license fees are needed because the marketing costs are high, and the search vendors have to generate enough revenue to avoid plunging into financial shortfalls. Second, the higher fees make sense to large, cash rich organizations. Many companies want to pay more in order to get better service or the “best available” solution. Third, other factors may be operating such as the advice of a consultant or the recommendation of a law firm also working on the matter.
eDiscovery can also be performed using generalized and often lower-cost products. In the forthcoming “Beyond Search: What to Do When Your Search System Doesn’t Work”, I profile a number of companies offering software systems that can make discovered matter searchable. For most of these firms, the legal market is a sideline. Selling software to law firms requires specialized knowledge of legal proceedings, a sales person familiar with how law firms work, and marketing that reaches attorneys in a way that makes them comfortable. The legal market is a niche, and although anyone can buy the names of lawyers from various sources, lawyers are not an easy market to penetrate.
Recommind, therefore, has shifted its marketing from the legal niche to the broader, more general market for Intranet search or what I call “behind the firewall” search. The term “enterprise search” is devalued, and I want to steer clear of giving you the impression that a single search system can serve the many information access needs of a growing organization. More importantly, there’s a belief that “one size fits all” in search. That is a misconception. The reality is that an organization will have a need for many different types of information access systems. At some point in the future, there may be a single point solution, but for the foreseeable future, organizations will need separate, usually compartmentalized systems to avoid personnel, legal, and intellectual property problems. I will write more about this in “Beyond Search” and in this Web log.
Trajectory of Recommind
Recommind’s market trajectory is important. The company’s shift from a niche to a broader market segment illustrates how content processing companies must adapt to the revenue realities in selling search solutions. Recommind has moved into a market sector where a general purpose solution at a competitive price point should be easier to sell. Instead of the specialized sales person for the niche market, a sales person with more generalized experience can be hired. The number of law firms is limited, and that market has become saturated. The broader enterprise market consists of the Fortune 1000 and upwards of 15 million small- and mid-sized businesses. Most of these need and want a “better” search solution. Recommind’s expansion of its marketing into this broader arena makes sense, and it illustrates what many niche vendors often do to increase their revenues.
Here’s the formula and a diagram to illustrate this marketing shift:
- Increase the number of prospects for a search system by moving to a larger market. Example: from lawyers to general business; from the intelligence community in Washington, DC to business intelligence in companies; or from pharmaceutical text mining to general business text mining.
- Simplify the installation, minimizing the need for specialized knowledge bases, tuning, and time-consuming set up. Example: offer a plug-and-play solution, emphasize speedy deployment, provide a default configuration that delivers advanced features without manual set up and time-consuming “training” of the system.
- Maintain a competitive price point because the “vendor will make it up on volume”. With more customers and shorter buying cycles, the vendor will have increased chances to land a large account that generates substantial fees when customization or special functionality are required.
- Boost the return on investment for research, development, sales, marketing, and customer support. The business school logic is inescapable to many search vendors. Whether these MBA (master of business administration) assumptions prove false is not my concern at this point. Search vendors can’t make their revenue goals in small niches and remain profitable, grow, and fund R&D. The search vendors have to find a way to grow and expand margins quickly. The broader business market is a solution that most content processing companies implement.
Implications of Market Shifts
Based on my research, several implications of moving upmarket, offering general purpose solutions, and expanding service options receive scant attention in the trade and business press. Keep in mind that my data and experience are unique. Your view may be different, and I welcome your view points. Let’s look at what I have learned:
First, smaller, specialized vendors have to move from a niche to a broader market. Examples include the aforementioned Stratify, which moved from the U.S. intelligence niche to the broader business market, only to narrow its focus within that market to handling special document collections. Iron Mountain saw value in this positioning and acquired Stratify. Vivisimo, which originally offered on-the-fly clustering, has repositioned itself as a vendor of “behind the firewall” search. The company’s core technology remains intact, but the firm has added functionality as it moves from a narrow “utility” vendor to a broader, “behind the firewall” vendor. Exegy, a vendor of special purpose, high-throughput processing technology, has moved from intelligence to financial services. This list can be expanded, but the point is clear. Search vendors have to move into broader markets in order to have a chance at making enough sales to generate the return investors demand. Stated another way, content processing vendors must find a way to expand their customer base or die.
Second, larger vendors — for example, the Autonomys, Endecas, and their ilk — must offer more and more services in an effort to penetrate more segments of the broader search market. Autonomy, in a sense, had to become a platform. Autonomy had to acquire Verity to get more upsell opportunities and more customers quickly. And the company had to diversify from search into other, adjacent information access and management services such as email management with its acquisition of Zantaz. The imperative to move into more markets and grow via acquisition is driving some of the industry consolidation now underway.
Third, established enterprise software vendors must move downmarket. IBM, Microsoft, and Oracle have to offer more information management, access, and processing services. A failure to take this step means that the smaller, more innovative companies moving from niches into broader business markets will challenge these firms’ grip on enterprise customers. Microsoft, therefore, had to counter the direct threat posed by Coveo, Exalead, ISYS, and Mondosoft (now SurfRay), among others.
Fourth, specialized vendors of text mining or business intelligence tools will find themselves subject to some gravitational forces. Inxight, the text analysis spin out of Xerox Palo Alto Research Center, was purchased by Business Objects. Business Objects was then acquired by SAP. After years of inattention, companies as diverse as Siderean Software (a semantic systems vendor with assisted navigation and dashboard functionality) and MarkLogic (an XML-on-steroids and data management vendor) will be sucked into new opportunities. Executives at both firms suggested to me that their products and services were of interest to superplatforms, search system vendors, and Fortune 1000 companies. I expect that both these companies will be themselves discovered as organizations look for “beyond search” solutions that work, mesh with existing systems, and eliminate if not significantly reduce the headaches associated with traditional information retrieval solutions.
I am reluctant to speculate on the competitive shifts that these market tectonics will bring in 2008. I am confident that the market for certain content processing companies is very bright indeed.
Back to Recommind
Recommind, therefore, is a good example of how a niche vendor of eDiscovery solutions can and must move into broader markets. Recommind is important, not because it offers a low-cost implementation of the Bayesian algorithms in the Autonomy system. Recommind warrants observation because it makes a useful case study of certain search sector market imperatives visible. As the diagram depicts, albeit somewhat awkwardly, each segment of the information retrieval market is in movement. Niche players must move upmarket and outwards. Superplatforms must move downmarket and into niches. Business intelligence system vendors must move into mainstream applications.
Exogenous Forces
The diagram omits two important exogenous forces. I will comment on these in another Web log article. For now, let me identify these two “storm systems” and offer several observations about search and content processing.
The first force is Lucene. This is the open source search solution that is poking its nose under a number of tents. IBM, for example, uses Lucene in some of its search offerings. A start up in Hungary called Tesuji offers Lucene plus engineering support services. Large information companies like Reed Elsevier continue to experiment with Lucene in an effort to shake free of burdensome licensing fees and restrictions imposed by established vendors. Lucene is not likely to go away, and with a total cost of ownership at a baseline of zero in licensing fees, some organizations will find the system warranting further investigation. More importantly, Lucene has been one of the factors turbo charging the “free search software” movement. The only way to counter certain chess moves is a symmetric action. Lucene, not Google or other vendors, is the motive force behind the proliferation of “free” search.
The second force is cloud computing. Google is often identified as the prime mover. It’s not. The notion of hosted search is an environmental factor. Granted, cloud based information retrieval solutions remain off the radar for most information technology professionals. Recall, however, that the core of hosted search is the commercial database industry. LexisNexis, Dialog, and Ebscohost are, in fact, hosted solutions for specialized content. Blossom Software, Exalead, Fast Search & Transfer, and other content processing vendors offer off-premises or hosted solutions. The economics of information retrieval translate to steadily increasing interest in cloud based solutions. And when the time is right, Amazon, Google, Microsoft, and others will be offering hosted content processing solutions. In part it will be a response to what Dave Girouard, a Google executive, calls the “crisis in IT”. In part, it will be a response to economics. Very, very few professionals understand the total cost of information retrieval. When the “number” becomes known, a market shift from on premises to cloud-based solutions will take place, probably with some velocity.
Wrap Up
Several observations are warranted:
First, Recommind is an interesting company to watch. It is a microcosm of broader industry trends. The company’s management has understood the survival imperative and implemented a solution that becomes obvious in today’s market. Expand or stagnate.
Second, tectonic forces are at work that will reshape the information retrieval, content processing, and search market as it exists today. It’s not just consolidation; search and its cousins will become part of a larger data management fabric.
Third, there’s a great deal of money to be made as these forces grind through the more than 200 companies offering content processing solutions. Innovation, therefore, will continue to bubble up from U.S. research computing programs and outside the U.S. Tesuji in Hungary is just one example of dozens of innovative approaches to content processing.
Fourth, the larger battle is not yet underway. Many analysts see hand to hand combat between Google and Microsoft. I don’t. I think that for the next 18 to 24 months, battles will rage within niches, among established search vendors, and among the established enterprise software vendors. Google is a study in “controlled chaos”. With this approach, Google is not likely to mount any single, direct attack on anything until the “controlled chaos” yields the data Google needs before deciding on a specific course of action.
Search is dead. At least the key word variety. Content processing is alive and well. The future is broader: data management and data spaces. As we rush forward, opportunities abound for licensees, programmers, entrepreneurs, and vendors. We are living in a transition from the Dark Ages of key word search to a more robust, more useful approach.
Stephen E. Arnold
10 January 2008
Little-Known Search Engines
January 9, 2008
Here’s a run down of little known engines with links to their web sites.
As I work to complete “Beyond Search: What to Do When Your Search Engine Doesn’t Work,” I reviewed my list of companies offering search technology. I could not remember much about several of them.
Here’s what triggered my checking to see what angle each of these companies takes, or in some cases, took towards search and retrieval.
- Aftervote — A metasearch engine with a “vote up” or “vote down” button for results.
- AskMeNow — A mobile search service that wanted my cell number. I didn’t test it. The splash page says AskMeNow.com is a “smart service”.
- C-Search Solutions — A search system for “your IBM Domino domain.” The company offers a connector to hook the Google Search Appliance to Domino content.
- Ceryle — A data management system that generates topics and associations.
- Craky.com — Site had gone dark when I tested it on January 8, 2008. It was a “search engine for impatient boomers”.
- Dumbfind — An amazing name. Dumbfind describes itself as a “user generated content site.” A social search system, I believe.
- Exorbyte — A German high-performance search system. Lists eBay, Yahoo, and the ailing Convera as customers.
- Eyealike — A visual search engine. The splash page says “you can search for your dream date.” Alas, not me. Too old.
- Ezilon — not Ezillion which is an auction site. A Web directory and search engine.
- Idée Inc. — The company develops advanced image recognition and visual search software. Piximilar is the company’s image search system.
- Kosmix — An “intelligent search engine”. The system appears to mimic some of the functions of Google’s universal search system.
- Linguistic Agents — The company’s search technology bridges “language and technology”.
- Paglo Inc. — This is a “search engine for information technology” on an Intranet. The system discovers “everything on your network”.
- Q Phrase — The company offers “discovery tools”.
- Semantra — The system allows you to have “an intelligent conversation with your enterprise databases.”
- Sphinx — Sphinx is a full text search engine for database content.
- Surf Canyon — In beta. The system shows related information when you hover over a hit in a results list.
- Syngence — A content analytics company, Syngence focuses on “e-discovery”.
- Viziant — The company is “a pioneer in delivering tools for discovery.”
- Xerox Fact Spotter — Text mining tools developed at Xerox “surpass search”. The description of the system seems similar to the Inxight system that’s now part of Business Objects which is now owned by SAP.
Several observations are warranted. First, I am having a difficult time keeping up with many of these companies’ systems. Second, text mining and other rich text processing solutions are notable. Semantics, linguistics, and other techniques to squeeze meaning from information are hard-to-miss trends. The implication is that key word search is slipping out of the spotlight. Finally, investors are putting up cash to fund a very wide range of search-and-retrieval operations. Even though consolidation is underway in the search sector, there’s a steady flow of new and often hard-to-pronounce vendors chasing revenue.
Stephen E. Arnold
9 January 2008, 11:00am
Thoughts on Microsoft Buying Fast Search & Transfer
January 8, 2008
To start the New Year, Microsoft bought Fast Search & Transfer for about $1.2 billion, a premium over Fast’s share price before the stock was delisted from the Oslo exchange on January 7.
I’ve tracked Fast for more than seven years, including a stint performing an independent verification and validation of the firm’s technology for the U.S. Federal government. Some good background links:
- Useful overview of Fast’s history
- Another good overview
- Summary of some of the issues facing the company
Most of the coverage of the acquisition focuses on the general view that Microsoft will integrate Fast’s search technology into SharePoint. With upwards of 65 million installations of SharePoint, Microsoft’s content management and search platform, Fast’s technology looks like a slam dunk for Microsoft.
I want to look at three aspects of this deal that may be sidelights to the general news coverage. The news stories appearing early on January 8, 2008, hit three points. First, Microsoft gets enterprise search technology that can add some muscle to the present search technology available in Microsoft Office SharePoint Server (MOSS). Second, shareholders get a big payday, including the institutional shareholders hit hard by Fast’s unpredictable financial results and flatlined share price. Third, synergies in research, technology, and customers make the deal a win for Microsoft and Fast.
Now, let’s look at the sidelights. I think that one or two of these issues will become more important if the deal closes in the second quarter of 2008 and the Fast technology is embraced by Microsoft’s various product groups. None of these issues is intended to be positive or negative. My goal is to discuss “behind the firewall search” or what the trade press calls “enterprise search”. This is distinct from Web search which indexes content on publicly-accessible Web servers in most cases. The “behind the firewall” type of search indexes content on a company’s own servers and its employees’ computers. The idea is that the “behind the firewall search” tackles the wide range of information and file types found in an organization. To illustrate: an organization must index standard file types like Word documents and Adobe Portable Document Format files. But the system must also be able to handle information stored in enterprise applications built on SAP technology or with IBM’s technology. There’s another twist to “behind the firewall search”. That’s security. Certain information cannot be available to anyone but a select and carefully vetted group of users. One example is employee salary information. Another is research data for a new product. Finally, “behind the firewall search” has to be able to generate useful results when there aren’t indicators like the number of times a document is clicked on or viewed. As you may know, Google’s Web search system uses these cues to determine relevancy. In an organization, a very important piece of information may have zero or very low accesses. In a patent matter, a “behind the firewall search” system must be able to pinpoint that particular piece of information because it may be the difference between a successful legal resolution and a costly misstep.
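The security requirement can be illustrated with a small sketch of what is often called security trimming: a hit is returned only if the user belongs to a group allowed to see the document. The sketch also ranks on document content alone, with no click or view counts. The `Document` class, the group names, and the scoring rule are hypothetical and do not reflect how Fast, SharePoint, or any other product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to see this document


def secure_search(index, query, user_groups):
    """Return ids of matching documents the user may see, best match first."""
    terms = query.lower().split()
    hits = []
    for doc in index:
        # Security trimming: skip anything the user is not cleared to read.
        if not (doc.allowed_groups & user_groups):
            continue
        words = doc.text.lower().split()
        # Content-only score: term occurrences, no click or view signals.
        score = sum(words.count(term) for term in terms)
        if score:
            hits.append((score, doc.doc_id))
    # Highest score first; ties broken alphabetically by document id.
    return [doc_id for score, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


# Hypothetical three-document index.
index = [
    Document("salary-2008", "employee salary bands for 2008", frozenset({"hr"})),
    Document("handbook", "employee handbook and salary review policy",
             frozenset({"hr", "engineering"})),
    Document("prototype", "research data for new product prototype",
             frozenset({"research"})),
]

print(secure_search(index, "salary", frozenset({"engineering"})))
print(secure_search(index, "salary", frozenset({"hr"})))
```

In this toy index, an engineer searching for “salary” sees only the handbook, while a member of the HR group also sees the salary bands. Filtering before scoring means a restricted document never reaches the result list at all.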
Web Roots
Fast Search & Transfer’s technology has deep roots in Web indexing. Fast pulled out of Web indexing for the most part in 2003. In 2003, Fast sold its Web search division to Overture, subsequently acquired by Yahoo. With its focus on enterprise search, Fast’s engineers crafted enterprise functions on the high-speed, Linux-based indexing system that powers AlltheWeb.com. Fast’s Web roots have been wrapped in three types of extensions. First, Fast wrote new code to make integration with other enterprise systems easier. Second, Fast used some open source software as a way to perform certain tasks such as data management. Third, Fast acquired technology, such as the Publishing Applications business unit it bought from Nextpage in 2004, and a number of other properties, including the Convera RetrievalWare business. Convera was a “behind the firewall” search vendor that had fallen into the quagmire that sucks cash in an attempt to make search systems work the way licensees want. The point is that today’s Fast search system is complicated. There are quite a few subsystems “glued” to other components. It’s the nature of information to make today’s solution a smaller piece of what customers want. Over time, “behind the firewall search” systems become hugely complex. The figure below, taken from a 2005 Fast Search presentation once available via the Google cache, provides a good indication of what makes a Fast system tick.
Staffing
Fast Search has some outstanding engineers. Not only is John Lervik (CEO) a Google-caliber technologist, Bjorn Laukli (chief technical officer at one time) is a search wizard. Fast Search’s management team has turned to sales and marketing professionals. One of these individuals, Ali Riaz, now CEO of Attivio, Inc., burnished the Fast Search image and fueled sales. In the wake of Mr. Riaz’s departure, Fast Search had to trim some costs. More than 140 employees were terminated, yet in 2006 and 2007 Fast Search also expanded its technical hiring. The company handled the shift from pure technology in the pre-Riaz era to a sales-driven organization when Mr. Riaz was at the helm from 2000 to 2006, and then back to a more engineering focus in the post-Riaz era. Not surprisingly, institutional investor pressure increased. The Fast Search Board of Directors looked for ways to get the company on an equal revenue and earnings footing with arch-rival Autonomy plc. Arguably, Autonomy’s acquisitions (Verity in search and Zantaz in email compliance services) have been more beneficial to Autonomy’s revenue growth than Fast Search’s acquisitions such as Platefood in advertising and Agent Arts, a content recommending system. In short, there’s been some contention between sales and engineering, institutional investors and the board of directors, and the board of directors and senior management. Joseph Krivickas’ joining the firm as President and Chief Operating Officer in July 2007 marked a turning point for Fast Search, culminating in the Microsoft deal.
Customers
My Washington, DC affiliate (BurkeHarrod LLC) involved me in a study of satisfaction with “behind the firewall search” systems in the last half of 2007. The data revealed that in our sample of US scientists and engineers, 62 percent of the respondents to the statistically-valid survey were dissatisfied with their existing “behind the firewall search” systems. My examination of the publicly-available customers of Autonomy, Endeca, and Fast Search revealed an overlap of about 50 percent among Fortune 1000 firms. The significant overlap is not surprising because large organizations have units with different search requirements. Incumbent systems are not eliminated, creating a situation where the typical large organization has five or more “behind the firewall search” systems up and running. Autonomy’s acquisition of Verity and Fast Search’s acquisition of Convera were about customers. Granted, each acquired company brought new technical capabilities to its buyer. The real asset was the customer base. I learned when researching the first three editions of The Enterprise Search Report that customers are usually in “search procurement mode”. No single system is right for the information access requirements of a large organization.
New Direction?
One final issue warrants a brief comment. In the last five years, there’s been a shift in information access methods. In the early 2000s, key word search was the basic way to find information in an organization. Today users want their information retrieval systems to suggest where to look, offer point-and-click interfaces somewhat similar to Yahoo’s so a user can see at a glance what’s available, and make it easy to pinpoint certain types of information needed to perform routine work tasks. Key word search systems have to bulk up with additional technology to deliver these types of information retrieval functions. The challenge, not surprisingly, is cost. With ever cheaper processors and storage, performing additional indexing and content processing tasks seems trivial. Yet rich text processing or metatagging adds complexity to already sophisticated systems. The market wants features that can be expensive and problematic to implement. Perhaps this is why investors are keen to fund next-generation search systems that go beyond key word search into linguistic, semantic, and intelligent systems. For the company that can deliver the right mix of functionality at the right price, a financial windfall awaits. In the meantime, there’s the general dissatisfaction and churn that is evident in the present consolidation in the search sector.
Beyond Search Net-Net
These sidelights may be outside the mainstream of those tracking the information access industry. My view may be summarized in four observations:
- First, Microsoft SharePoint is complex. The Fast Search enterprise search platform (ESP) is complex. Integrating two complex systems will be a challenge. Microsoft’s engineers and Fast Search’s engineers are up to this task. The question will be “How long will the meshing take?” If speedy, Microsoft can expand its service offering and put another hurdle in the path of companies like Google eager to win more of the Microsoft market. If slow, the delay will allow further incursions into Microsoft territory by Google as well as IBM, Oracle, and SAP, among others.
- Second, customers may be wary of escalating risk. Just as Autonomy had to reassure Verity search system users after that buy out, Microsoft will have to keep Fast Search’s more than 2,000 customers in the fold. The loss of some key accounts as a result of the deal will consume additional sales and marketing resources, thus adding to the cost of the acquisition. Companies like Autonomy and Endeca will be quick to make an attempt to win some of Fast Search’s more lucrative accounts such as its deal with Reed Elsevier for the SCIRUS service. Upstarts like Exalead, ISYS Search Software, Siderean, and others will also seek to provide a seamless replacement for the Fast Search solution. Other customers will be content to use an existing Fast Search system, worrying about changes when they occur. The search sector is about to get much more interesting and fast, pun intended.
- Third, investors react to the news of $1.2 billion changing hands in predictable ways. I look for more interest in companies in the search sector. I can also envision the acquisition of Autonomy by a larger firm. In fact, looking forward 12 months, I see a series of shifts in the search landscape. There will be more search interest by the superplatforms such as Google, IBM, Oracle, and other enterprise software vendors. These large firms will want to expand their share of the Fortune 1000 market and capture an increasing share of the small- and mid-sized market. Upstarts range from Paris-based Exalead to the almost-unknown Tesuji in Hungary. My list of “behind the firewall search” vendors numbers more than 50 companies, excluding specialist firms that offer specialized “snap ins” for content processing.
- Lastly, I think further consolidation in search will take place in 2008 and 2009. In the midst of these buy outs, customers will vote with their dollars to create some new winners in “behind the firewall” search. I will offer some thoughts on these in a future write up.
2008: Best or Worst of Times?
January 8, 2008
For search system vendors, it’s a Dickensian “best of times, worst of times” business climate. Some companies like Autonomy have diversified, acquired competitors, and marketed effectively. Others have rushed from crisis to crisis, smothering bad news with excuses.
There are some up-and-comers in the “behind the firewall” search market. Companies to watch include the surging ISYS Search Software. It’s reliable. It’s speedy. And it sports some “must have” bells and whistles, including entity extraction and on-the-fly classification. Also worth watching is the semantic technology vendor Siderean Software. For companies wanting assisted navigation and the slicing and dicing that semantic metatags permit, Siderean’s system is worth a long, hard look. There are dozens of others making customers happy and reducing the hassles associated with finding information in an Intranet.
There are some companies struggling to keep their revenue in growth mode and leaping over rivulets of red ink. In 2007, Mondosoft, a Danish search system, floundered. It’s now part of the burgeoning SurfRay technology holdings. Entopia (Belmont, California) died quietly. Fast Search & Transfer survived some financial challenges and then with little warning withdrew from the Norwegian stock exchange. Is this a positive signal or a more ominous one?
The point is that many traditional search-and-retrieval vendors look one way and see the success of a Google. Looking in another direction, they see warning signs that the “behind the firewall” sector is ripe for consolidation or an increasingly stringent shake out.
The best strategy for 2008 is to look for companies that can deliver a solution that works without a huge balloon payment for technical support and customization. A second tip is to look outside the US. ISYS has its technical roots in Australia. Exalead is a Paris-based company. Little-known Bitext operates from Madrid.
Procurement teams have a tendency to use what’s available. IBM, Microsoft, and Oracle offer search systems, often as a bonus when another enterprise product is licensed. Lucene beckons because some believe it’s free, as long as the licensee has open-source-savvy engineers on tap. Many enterprise systems such as content management systems include a search-and-retrieval system. When budgets are tight, the CFO asks, “Why pay again?”
My recommendation is to look at the up-and-comers in “behind the firewall” search. The brand names are safe, but you might be able to save money, time, and technical headaches by widening your horizons.
Google 2008 Publishing Output
January 1, 2008
If you had any doubt about Google’s publishing activities, check out “Google Blogging in 2008”. The article by Susan Straccia provides a run down of the GOOG’s self publishing output. Google has more than 120 Web logs. The article crows about the number of unique visitors and tosses in some Googley references to Google fun. Pushing the baloney aside, the message is clear: Google has an effective, global publishing operation focused exclusively on promoting Google. Toss in the Google Channel on YouTube.com, and the GOOG has a communication, promotion, and distribution mechanism that few of its rivals can match. In my opinion, not even a major TV network in the US can reach as many eyeballs as quickly and cheaply as Googzilla. Competitors have to find a way to match this promotional 30mm automatic Boeing M230 chain gun.
Stephen Arnold, January 1, 2009

