Exclusive Interview with Ana Athayde of Spotter
January 19, 2010
Search solutions have the attention of some executives who want actionable information, not laundry lists of results. I learned about an information retrieval company that I knew nothing about from Ana Athayde. Ms. Athayde developed Spotter as a consequence of her work in business intelligence for a large international organization. She told me, “Laundry lists are not often helpful to a business person.” I agree.
Spotter is what I would describe as a next-generation content processing company. The firm’s technology combines content acquisition, content processing, and output generation in a form tailored to a business professional. Spotter’s chief technology officer (Olivier Massiot) previously worked at the pioneering content processing company, Datops SA.
In an exclusive interview on January 18, Ana Athayde, the founder of Spotter (based in Paris with offices in several European cities and the US), provides insight into her vision for next-generation information retrieval. She described the approach her firm takes for customers with an information problem this way:
Our clients ask for strategic input on a brand or market; they require more than a general alert and subject monitoring as provided by the services of popular search engines. Spotter clients expect to know more about their customers and what motivates them, learn about their company’s reputation, and about the current risk pervasive in their environment; not simply obtain an internet search-result report. Our clients need deep dive analysis for decision-making, not just a simple dashboard tool and quantitative graphic displays. They want to be able to interpret what it all means and not just receive a simple data-dump. Spotter provides content analysis and leading edge solutions that meet our customers’ analytical needs such as the ability to map and analyze information pertinent to their business environment, so as to gain a strategic business advantage and make new discoveries. Our solutions solve complex problems and deploy these results throughout the enterprise in a form that makes the information easy to use.
A number of companies are providing knowledge management and business intelligence services that output reports. I asked Ms. Athayde, “What’s the Spotter difference?” She said:
I think the key point we try to make clear is our “bundle”; that is, we deliver a solution, not a collection of puzzle pieces. Our ability to capture, monitor and analyze decisions and their impact requires rich, higher order meta data constructs. Many companies such as Autonomy, Microsoft, and Oracle also promise similar services. But once this has been done, the process of information toward decision is not complete. The main competitive advantage of Spotter is to be able to provide to its clients a full decision-making solution which includes, as I mentioned, analytics and our decision management system… Our solution is engineered to link efficiency and quality control throughout the content processing “chain.”
You can read the full interview with Ms. Athayde in ArnoldIT.com’s Search Wizards Speak feature. For more information about Spotter, visit the firm’s Web site at www.spotter.com. Search Wizards Speak provides one of the most comprehensive sets of interviews with search and content processing vendors available. There are now more than 44 full text interviews. The information in these interviews provides a different slant than the third party “translators” who attempt to “interpret” how a search system works and “explain” a particular vendor’s positioning or approach. The Search Wizards Speak series is a free service from ArnoldIT.com that lets you read the full text of interviews with key players in the search and content processing sector. Primary source material is the first place to look if you want facts, not fluff.
Stephen E Arnold, January 19, 2010
Full disclosure. Spotter’s sales manager tried to give me a mouse pad. I refused. As a result, no one paid me anything to chase down Ms. Athayde, interview her, and go through the hoops needed to understand the Spotter system. Because the Spotter team seemed quite Euro-centric, I will report my sad state of non compensated work to the US Department of State. An organization sensitive to the needs, wants, and desires of non US people and entities.
A Possible Trajectory for Open Source Search
January 18, 2010
I read “How Hadoop Startup Cloudera Is Evolving” in the Industry Standard. Let me define a couple of terms to help me get my point across. Hadoop is an open source version of next generation technology for data intensive distributed applications. Hadoop has some hooks into some Google technology for data management wizardry. You can get some additional information in “Map Reduce Programming with Apache Hadoop”. A Cloudera employee (Doug Cutting) is pegged as the person who created Hadoop. With Hadoop as a top level Apache project, variants have emerged. Yahoo, for example, has its own Hadoop distribution. (No, I don’t know why. That’s what makes Yahoo Yahoo I suppose.) Hadoop is a big deal and both Google and IBM want to build Hadoop awareness among budding data wizards.
Now back to the Industry Standard article. For me the key point was that “Cloudera has also quietly released a proprietary data integration app. It “doesn’t replace an Informatica or Ab Initio,” says Cloudera CEO Mike Olson, but it does provide extract and transform features. The data integration app will be formally released this quarter as part of the overarching Cloudera Data Platform. No price has been determined yet, said Olson.”
The trajectory suggests to me that open source provides a technical and marketing angle. The revenue generating part of the approach is proprietary technology and services. Cloudera’s investors rightly want their money back and a payoff from their $11.0 million in funding to Cloudera. Open source is shifting from “open” to a more highly spiced approach.
Now what’s this have to do with open source search? In my view, the Cloudera approach is another example of using open source as a marketing hook that dangles lower cost, community support, and higher performance in front of very hungry chief financial officers. The Cloudera approach makes clear that the on premises console hooks into the Cloudera cloud service creating a very practical approach to issues of control and on premises security methods.
Will this highly spiced approach reduce the total cost of ownership of an open source search solution? The answer is a “maybe”. Here’s why:
- License fees are not an issue at the outset; additional software, if required, may require license fees
- Maintenance can be zero if you have the expertise to manage the system. If you don’t, you will have to pay for maintenance service
- Customization can be a do-it-yourself job. If you cannot do it yourself, you will need to pay for that service
- Integration can be embedded in other costs, but if extensive customization is required, that cost will surface.
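The four cost buckets above can be put into a back-of-the-envelope model. What follows is a minimal sketch, not a pricing tool: every dollar figure is a hypothetical number picked for illustration, the three-year horizon is an assumption, and the function name `total_cost` is mine. Staff salaries for the do-it-yourself case are deliberately left out, which flatters that scenario.

```python
def total_cost(license_fees, maintenance, customization, integration, years=3):
    """Rough total cost of ownership over a planning horizon.

    license_fees is treated as a one-time charge; the other three
    arguments are annual costs. All inputs are hypothetical dollars.
    """
    return license_fees + years * (maintenance + customization + integration)


# Open source with in-house expertise: no license, no paid maintenance,
# do-it-yourself customization; only integration shows up as a cost.
in_house = total_cost(0, 0, 0, 20_000)

# Open source without in-house expertise: an add-on license plus paid
# maintenance and paid customization.
outsourced = total_cost(15_000, 40_000, 30_000, 20_000)

# A commercial system: larger license up front, smaller service bills.
commercial = total_cost(100_000, 20_000, 25_000, 20_000)

print(in_house, outsourced, commercial)
```

With these made-up inputs the outsourced open source case lands within a few percent of the commercial one, which is the “maybe” in numeric form. Change the assumptions and the ranking can flip either way.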
In short, the open source angle is more of a marketing play than a significant difference from commercial software. When a proprietary software vendor jumps into the open source game, there is a marketing and revenue reason that overshadows the “do good” reasons.
Life is good with open source solutions if you are knowledgeable. If you aren’t, I am not confident that the total cost savings will be significant when I look out across staff turnover, the need for special functions, and a change driven by a change elsewhere in the organization’s technical infrastructure.
The Cloudera trajectory strikes me as one to monitor. Will the mission control folks say “Mission accomplished” or “Houston, we have a problem”?
Stephen E Arnold, January 18, 2010
No one paid me to write this opinion. Do I report this to the FCC, an outfit that seeks opinions?
SAP Does an About Face on Some Fees
January 16, 2010
I think SAP provides the same type of information a long range scout does. The scout goes out, checks out the territory, and then comes back and reports. The scout’s bosses listen to what the scout says and then decide what to do. The whole set up of an organization model with scouts and bosses “in the rear with the gear” is fascinating to me. Vendors of complex, time consuming, resource intensive old fashioned software are trying to change. With revenue slipping and the economic crisis lingering, SAP provides useful information. Read “Price Hike Ditched as SAP’s Enterprise Support Makes a Comeback” and you can catch a glimpse of what other enterprise software vendors may be pondering; namely, a price cut and cooing to make peace with customers. Losing a customer is an expensive proposition because it is a one two hit. The company loses the revenue and then has to spend dough to find more customers to take up the slack. In my opinion, the most interesting comment in the write up was:
“This move shows that SAP is listening to user groups, and therefore its customers, taking on board our feedback and making changes to meet the needs of all SAP users,” he said in a statement.
My thought is that this is a placeholder move. SAP will be under more price pressure, not less. Once the company blinks, the customers know that SAP must cave. Lower prices mean bad news for enterprise software vendors and resellers. Lower prices may force cloud alternatives to lower their rates. In short, am I seeing a phase change in enterprise software? Yep.
Stephen E Arnold, January 16, 2010
No one paid me to write this. I may fly over the SAP headquarters next week, but I don’t think I will visit. Do I report this to the FAA or to the DOT?
Concept Searching and Its Busy January 2010
January 14, 2010
Concept Searching (“Retrieval Just Got Smarter”) has had a busy January 2010.
The company made several announcements about its information retrieval software.
First, the company inked a deal with Union Square Software to use the Concept Searching technology in Union Square’s Workspace product. The Union Square Workspace is an email, document, and knowledge management product for the construction industry.
Second, the company announced support for Microsoft Windows Server 2008 R2’s File Classification Infrastructure. Like other Microsoft centric solutions, Concept Searching provides a snap in that extends the features of the Microsoft product.
Third, the company landed a deal with the Consumer Product Safety Commission to deliver search and classification for the CPSC’s public Web site and for its intranet.
The company was founded in 2002 with the goal of developing statistical search and classification products that “delivered critical functionality… unavailable in the marketplace.” The company’s software processes text, identifies concepts, and allows unstructured information to be classified via semantic metadata. The company supports SharePoint and other platforms. The company says:
Concept Searching are the only company to offer a full range of statistical information retrieval products based on Compound Term Processing. Our unique technology automatically identifies the word patterns in unstructured text that convey the most meaning and our products use these higher order terms to improve Precision with no loss of Recall. The algorithms adapt to each customer’s content and they work in any language regardless of vocabulary or linguistic style.
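For readers who want the arithmetic behind “improve Precision with no loss of Recall,” here is a minimal sketch of the two standard measures. It is not Concept Searching’s code; the function name `precision_recall` and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant hits / documents retrieved
    recall    = relevant hits / relevant documents that exist
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query with three truly relevant documents.
relevant = {"d1", "d2", "d3"}

# A noisy result list: all three relevant documents plus three misses.
baseline = precision_recall(["d1", "d2", "d3", "d7", "d8", "d9"], relevant)

# A tighter result list: the same three relevant documents, one miss.
sharper = precision_recall(["d1", "d2", "d3", "d7"], relevant)

print(baseline)  # (0.5, 1.0)
print(sharper)   # (0.75, 1.0)
```

In this toy case precision rises from 0.5 to 0.75 while recall stays at 1.0, which is the shape of the result the vendor describes. Whether a given product delivers it on real content is a matter for testing.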
The company’s headquarters is in the UK, and the firm’s marketing operations are in McLean, Virginia. If you want more information, you can download a 13 megabyte video from K2 Underground.
Stephen E. Arnold, January 14, 2010
Oyez, oyez. A freebie. I shall report this public service to Securities House next time I am in London.
Microsoft SharePoint and Word Template Files
January 14, 2010
Short honk: The Beyond Search goslings spotted a document on the Microsoft Support site. Its title is “The Microsoft Office SharePoint Server Enterprise Search service does not full-text index Office Word 2007 template files (.dotx) in Office SharePoint Server 2007.” No big deal but with the hassle over XML, I found the fact interesting.
Stephen E Arnold, January 13, 2010
Another freebie. I think I have to report my not being paid for a SharePoint post to the Bureau of Labor Statistics. Is that right? I think the outfit shares office space with the US Postal Museum.
Enterprise Search Deployment Time
January 14, 2010
Our Overflight service snagged a news item in May 2009. The title was “Airbus Licenses Vivisimo Velocity Search Platform”. The release was good news for Vivisimo and straightforward, saying:
Vivisimo (Vivisimo.com), a leader in enterprise search, has entered into a major agreement with aircraft manufacturer Airbus for the license of the Vivisimo Velocity Search Platform. The license covers the corporate-wide intranet for Airbus and some extranet services for Airbus customers, indexing up to two petabytes of data for more than 50,000 users. Vivisimo had already provided search for a group within Airbus before winning the company’s broader corporate business in a competitive setting. In a solution proof of concept, Vivisimo Velocity demonstrated its capability to handle the complexity of Airbus’ many data repositories while respecting the company’s various security parameters.
When I read this, I thought that Airbus made a wise decision. A deployment and an evaluation process were used. That’s smart. Most organizations license an engine and then plunge ahead.
The news item I received in my email this morning was equally clear. “Airbus Lifts Off Vivisimo Velocity to Provide More than 50,000 Users the Power of Search” states:
Vivisimo (Vivisimo.com), a leader in enterprise search, today announced the successful installation of its award-winning Vivisimo Velocity Search Platform with the world’s leading aircraft manufacturer Airbus. Through this deployment, Velocity is powering search across its corporate-wide intranet and its customers, indexing up to two petabytes of data for more than 50,000 users.
After a quote the news release said:
In less than one month since the completed installation of Velocity, search has become the fastest growing application on the customer portal (AirbusWorld) homepage in terms of usage, which has resulted in increased page views.
I think the uptake information is good news for Airbus users and for Vivisimo. The other upside of my having these two statements is that it is possible to calculate roughly the time required for a prudent organization to move from decision to deploy to actual availability of the search service. The deal was signed in May 2009, and the system went online about January 2010. That means that after the trial period, another six months was required to deploy the system.
Several observations:
- Appliance vendors have indicated that their solution requires less time. One vendor pegs the deployment time in a matter of days. Another suggested a month for a complicated installation.
- The SaaS search vendors have demonstrated a deployment time of less than four hours for one test we ran for a governmental unit. Other vendors have indicated times in the days to two week periods, depending on the complexity of the installation. The all time speed champ is Blossom.com, which we used for the Threat Open Source Information Gateway project.
- System centric vendors with solutions that snap into SharePoint, for example, have indicated an installation time of a half day to as much as a week, depending on the specific SharePoint environment.
- Tool kit vendors typically require weeks or months to deploy an enterprise search system. However, in certain situations like a search system for a major publishing company’s online service, the time extended beyond six months.
What’s this mean? Vivisimo’s installation time is on a par with other high profile systems’ deployment times. The reason is that the different components must be integrated with the clients’ systems. In addition, certain types of customization—not always possible with appliances or SaaS solutions—are like any other software set up. Tweaking takes time.
With Google’s emphasis on speed, the Google Search Appliance is positioning itself to be a quicker install than some of the high profile enterprise systems.
What’s this mean? It looks to me that one group of vendors and services can deliver speedier installations. Other vendors offset speed with other search requirements. Beyond that obvious statement, I will have to think about the cost implications of deployment time.
Stephen E. Arnold, January 14, 2010
No one paid me to write this short article. Why would anyone pay me? It’s been 65 years of financial deprivation. I think I have to report this monetary fact to the Social Security folks.
End to End Move from Knowledge Tree and Capsys
January 14, 2010
I continue to think about the information retrieval food chain. I read in CBR Online’s “Knowledge Tree Integrates Document Management Software with Capsys Capture” that two companies have hooked their systems together. The story references a module, which I interpreted as a connector or code shim that permits the handshaking of the systems. In addition, the module:
provides users access to Capture, a browser-based, thin client application that requires no workstation software to be installed or maintained. The software also includes a workflow and design component that allows users to manage their capture processes, without requiring programming or scripting, based on the document requirements specific to their business…
The documents from Capsys, after “characterizing and indexing,” are moved to Knowledge Tree’s repository.
My thought is that customers do not want to buy a document scanning and an optical character recognition product and then hook those components into an existing system. The move makes sense, but notice that the search and retrieval functions are essentially subsystem or utility services, not the center stage performer.
As end to end solutions find their way into organizations, several questions come to mind:
- How will organizations search their various content objects? In effect, with search working within an application, will an organization end up with even more Balkanization of information?
- What are the cost implications of the licensing organization having multiple search “stubs” or search systems? Won’t inclusion of search in different vendors’ systems create more search systems for employees to learn and more duplication of content, more systems management work, and more costs?
- If search is embedded and a commodity, won’t the logical endpoint be more pressure placed upon organizations to license one super system in order to make the hassles, time, and costs manageable?
I will keep thinking about the food chain and its integration in terms of search and content processing. In the meantime, this tie up looks like a solid tactical move to me.
Stephen E Arnold, January 14, 2010
A freebie. I will report this to the Bureau of Prisons which is in charge of capturing my disclosures about compensation.
PolySpot Lands Crédit Agricole SA
January 13, 2010
PolySpot, a French systems development company, has landed the Economic Research Department of Crédit Agricole SA as a customer. The system will be used with the financial institution’s bilingual Intranet portal. The story I saw appeared in Communauté Finance Opérationnelle. The PolySpot system will provide:
- Access to structured and unstructured data
- Theme suggestions
- Simple and advanced search options
- Programmable Custom Alerts
- Sort options
- Faceted navigation (grouping results by different criteria)
- Access rights management
- Stored query support.
You can get more information about PolySpot’s search and content processing system at www.polyspot.com/. You can read an interview with a PolySpot executive in the ArnoldIT.com Overflight service.
Stephen E Arnold, January 13, 2010
Nope, an unpaid post. When I am in Paris, I hide out in the flea market at Porte de Clignancourt. I will report this to the CRS shortly.
Search Vendors Working the Content Food Chain
January 13, 2010
In the last six months, I have noticed that three companies are making an effort to respond to ZyLAB’s success in the end-to-end content processing sector. There has been some uninformed and misleading discussion of search and content processing companies’ shift to vertical market solutions. I think this view distorts what some vendors are doing; namely, when one company finds a way to make sales, the other vendors pile into the Volkswagen. This is not so much “imitation as flattery”. What is happening is that sales are tough to make. When a company finds an angle, the stampede is on. In a short period of time, an underserved sector in search and content processing has more people stomping around than Lady Gaga.
Let’s go back in history, a subject that most of the poobahs, azure chip consultants, and self appointed experts avoid. The idea that certain actions have surfaced before is no fun. Identifying a “new” trend is easier, particularly when the trend spotter’s “history” extends to his / her last Google query.
The Mobius strip is non-orientable, just like search solutions that provide end-to-end solutions. A path on a Mobius strip can be twice as long as the original strip of paper. That’s a good way for me to think about end-to-end search and content processing systems. Costs follow a similar trajectory as well.
In the dim mists of time, one of the first outfits to offer an end-to-end solution to content acquisition, indexing, and search was, believe it or not, Excalibur. The first demonstration I received of the Excalibur RetrievalWare technology included scanning, conversion of the scanned image’s text to ASCII, indexing of the ASCII for an image, and search. The information processed in that demonstration was a competitor’s marketing collateral. There were online search systems, but these were mostly small scale systems due to the brutal costs of indexing large domains of HTML. A number of companies were pushing forward with the idea of integrated scanning systems. Sure, in the 1990s you could buy a high end scanner and software. But in order to build a system that minimized the fiddly human touch, you had to build the missing components yourself. Excalibur hooked up with resellers of high end scanners from companies like Bell+Howell, Fujitsu, and others. The notion of taking a scanned image, performing optical character recognition on the page image via in memory processing, and then indexing that ASCII was a relatively new method. UMI (a unit of Bell+Howell) had a sophisticated production process to do this work. Big outfits like Thomson were interested in this type of process because lots of information in the early 1990s was still in hard copy form. To make a long story short, the Excalibur engineers were among the first to create a commercial product that mostly worked, well, sort of. The indexing was an issue. Excalibur embarked on a journey that required enhancing the RetrievalWare product and generating ready-to-use controlled vocabularies for specific business sectors like defense and banking. As you may know, Excalibur’s original vision did not work, so the company morphed into a search and content processing company with a focus on business intelligence. The firm renamed itself as Convera.
The origins of the company were mostly ignored as the Convera package of services chased government work, commercial accounts like Intel and the National Basketball Association (data center SaaS functions for the former and video searching for the hoopsters). When those changes did not work out too well, Convera refocused to become a for fee version of the free Google custom search engine. That did not work out too well either, and the company has been semi-dissolved.
Why’s this important?
First, the history shows that end-to-end processing is not new. As with many of the hot search innovations, I find the discoveries of the azure chip crowd a “been there, done that” experience. Processing paper and making it searchable is a basic way to approach certain persistent problems.
Second, the synopsis of the Excalibur trajectory makes clear that senior managers of search and content processing companies scramble, following well worn paths. The constant repositioning and restating of what a technology allegedly does is a characteristic of search and content processing.
Third, the shifts and jolts in the path of the Excalibur / Convera entity are predictable. The template is:
- Start with a problem
- Integrate
- Sell
- Engineer fixes on the fly
- Fail
- Identify a new problem
- Rinse, repeat.
What has popped out of my Overflight intel system is that law firms are now looking for a solution to a persistent information problem; that is, when a legal matter fires up, most search systems work just fine with content in electronic form. The hitch is that a great deal of paper is produced. If something exists in digital form and one law firm must provide that information to another law firm, some law firms convert the digital information to paper, slap on a code, and have FedEx deliver boxes of paper. The law firm receiving this paper no longer has the luxury of paying minions to grind through the paper. The new spin on the problem is that the law firm’s information technology people want to buy a hardware-software combination that allows a box of paper to be put in one end and the steps between the hard copy and the searchable, electronic instance of the documents to be magically completed.
Well, that’s the idea. Some of the arabesques that vendors slap on this quite difficult problem include:
- Audit records so a law firm knows who looked at what when and for how long
- A billing method. Law firms want to do invoices, of course
- A single point solution so there is “one throat to choke”.
What the companies want is what Excalibur asserted it had almost 20 years ago.
ZyLAB, under the firm hand of Johann Scholtes (a former Dutch naval officer), has made inroads in this market sector. You can read an interview with him in the Search Wizards Speak series, so I won’t recycle that information in this write up.
Autonomy was quick to move to build out its end-to-end solutions for law firms and other clients with a paper and digital content problem. In fact, Autonomy just received an award for its end-to-end eDiscovery platform.
Brainware offers a similar system. That company, a couple of years ago, told me that it had to add staff to handle the demand for its scanning and search solution. Among the firm’s largest customers were law firms and, not surprisingly, the Federal government. You can read an interview with a Brainware executive (who is an attorney) in the Search Wizards Speak series.
I learned that Recommind has inked a deal with Daeja Image Systems for its various document processing software components. The idea is to be able to provide an end-to-end solution to law firms, government agencies, and other outfits that need a system that provides access to paper based content and digital content.
Let’s step back.
What this addled goose sees in these recent announcements is that the “new” is little more than a rediscovery that law firms have not yet cracked the back of the paper to digital job and been able to get a search system that provides access to the source material. Sure, there were solutions 20 years ago, but those solutions don’t meet a continuing need. Notice that this problem has been around for a long time, and I don’t think the present crop of solutions will solve the problem fully.
Search Merging with CMS
January 13, 2010
When you have a CMS “hammer”, you have the opportunity to see an information problem as something that can be pounded with CMS. Let me be upfront. Most organizations are not in the information business. The idea that Big O’s tires in Kentucky is an information company is not just silly; it’s a financially imprudent assertion. Big O’s is a retail operation that sells tires and services. The company’s Web site is a marketing effort, but when you need tires for your Hummer with a gun mount, you have to haul on over to the closest Big O’s, pony up cash, and get your tires mounted, balanced, and bolted on. Sure, information is important to the Big O operation, but like many other businesses, Big O’s moves tires. Information is an enabler, sort of a digital lubricant. A person dressed up in a Daniel Boone outfit holding a sign that says, “50% off Tires. Today only.” is information. But the pointy end of the business is selling tires.
Just hop right into the CMS tanning bed. It will make you look and feel great. Oh, there may be some risks, but what’s more important? Looking great or becoming a human Blutwurst?
When I read CMS Wire’s short article “MySource Matrix”, I was surprised that search is becoming part of CMS. Yikes. CMS, content management systems, refers to a bunch of software components that perform integrated content operations for Web sites. There are document management systems that help nuclear power plants keep track of engineering change orders. And there are really expensive enterprise publishing systems from Hewlett Packard and StreamServe that manage and output certain types of enterprise information. I grant that when you can’t find a document, you can’t do much with any of these systems. So, search is a utility. Search in any of these three types of content systems often is not particularly good. Vendors license “stubs” and stick them in CMS and related systems so when more features are needed, the vendors can turn on the taxi meter. Software cannot put an editorial sense into an organization. Humans have to do that, and humans often are not able to perceive the problem or its optimal solution when basking in the vendor’s tanning salon.
Here’s the passage from Squiz that caught my attention:
They’ve [Squiz, Funnelback, and MySource Matrix] chosen this direction because they see the lines between CMS and search blurring, where some projects may need search-based vertical applications rather than starting with a separate CMS and search library. According to Morgan [Squiz executive], this approach will reduce integration costs and increase access to data across an organization.
Note: Squiz owns the Funnelback search system. You can see this in action on the Australian Resource Centre for Healthcare Innovation or ARCHI.
Most CMS, DMS, and enterprise publishing systems are complicated beasties. Although each has a contribution to make to certain organizations, the path to a functioning, easy to maintain content system can be a long, difficult one. In my experience, CMS means managing a Web site. CMS has been stretched into DMS territory, and some of the vendors with the biggest marketing horn have floundered and ended up chum for the M&A crowd. The document management systems that focus on a specific content purpose like the aforementioned ECOs work well, but one needs to have a records management specialist handy. The enterprise publishing systems are not widely known outside of certain market sectors. These cost a lot of money and suffer from one fatal flaw in my opinion. Most lack an information infrastructure service or foundation. With no foundation, the structure built on it is dicey.
This notion of having everything in one place so anyone can edit, repurpose, and search is a great idea. Today, the cost of achieving that utopia can be high, both in time and money.
I can see the direction this marketing angle will lead. Thank goodness I am old and won’t have to deal with the wackiness these big marketing ideas unleash on cash strapped organizations struggling to keep their systems from breaking the bank each time those systems crash. There’s a lot of opportunity in content, but fuzzy thinking may not be what Boards of Directors and CFOs want.
Stephen E Arnold, January 13, 2010
I want to disclose to the Office of Management and Budget that I was not paid to point out the financial issues of fuzzy thinking. I bet this article was a surprise to them. Don’t Federal content and document management systems work like spinning tops?

