Beyond Search: A New Look, More Search Information
March 20, 2008
We’ve introduced a new look for Beyond Search: The Web log. Several people — including Blossom Software’s wizard, Alan Feuer — told me that it was impossible to find posts. The “earthworm” format is now officially gone. It’s been eaten by the squawking goose logo, a three-column format, and more hot links to the Beyond Search essays.
Keep in mind that Beyond Search is both a Web log and the title of my new study to be published by Frank Gilbane. The Web log, however, has been a surprise to me. We have readers from Australia to Norway and China. Not a day passes without an email, a comment, or a telephone call sparked by the information presented in Beyond Search: The Web Log.
Here’s a rundown of the changes:
- My essays will appear in the left-hand column (Column A, if you are familiar with Gutenberg-style publishing lingo). These will be the interviews, profiles, and opinion pieces characteristic of the postings of old.
- The center column will contain — what I said I would never include — news. I am fortunate to have an anonymous professional assisting me with these stories. I think it will take me a month or so to sort out the “real” news from the “faux” news. If you have a story, please send me an email (seaky2000 at yahoo.com). Maybe you can submit an article, and I will pay a modest amount for your work. I want to go slowly with inputs from the news pro on the East Coast plus over-the-transom ideas as well.
- The right-hand column will feature a search box, Google advertisements, and hot links to the various types of information in the Web log.
You may notice some white space at the foot of each column. I’m trying to figure out what widgets to include. Expect some fiddling around over the next two or three months.
Civita: The Paradox of Disintermediation
March 19, 2008
In December 2007, Antonio Maccanico, director, Associazione Civita in Rome, Italy, asked me to contribute an essay to a forthcoming publication focused on new information and communications technology. The full essay will not appear in print until later in 2008, but I wanted to highlight several of the points in my essay because each is germane to the tumultuous search and content processing sector. When the Italian language publication becomes available, I will post a link to the full text of my essay “Open Access and Other New Communication Technology Plays: The Temptation and Paradox of Disintermediation Roulette”.
First, the title. One of the issues that arises when a new search or content processing technology becomes available is its usefulness. Most vendors assert that their newest system brings numerous benefits to a licensee, user, or business partner. A positive, optimistic outlook is one of the essentials of mental health. However, I’ve learned to be conservative when it comes to benefits. This essay for Associazione Civita reminds the reader that many new information technologies are powerful disintermediators.
Disintermediation means cutting out the middle man or woman. If it is possible to buy something cheaper directly from the manufacturer, many people will. The savings can be a few pennies or orders of magnitude. Information technology disintermediates. In my experience, this is a categorical affirmative. The benefit of information technology — particularly search and content processing — is that it creates new opportunities. We are in the midst of an information discontinuity. Publishers — classic intermediaries between authors and readers — are learning about disintermediation as I keyboard this summary. Libraries continue to struggle with disintermediation as students rely on Google, not reference books, for research. The paradox, then, is that dislocation is inevitable. So far, the information revolution has created more opportunities overall. Users are “winners”. Some entrepreneurs are “winners”. Some traditional operations are trying to adapt lest they become “losers”.
Second, the core of my argument in this essay for Associazione Civita boils down to three issues. Let’s look at each briefly. Please appreciate that I am extracting a segment from a 20-page essay:
- Web sites, Web services, and Web applications do not guarantee success. In fact, inexperience or bad decisions about what to “Web-ify” can drag an organization down, and, in terms of revenue, plunge the operation into the red. Therefore, significant effort is required to create a browser experience that attracts users and continues to build usage. The costs of development, enhancements, and sales are often far greater than expected. In terms of search and content processing, customers learn (often the hard way) that there is neither money nor appetite for making the system perform as advertised. I see no change in this paradoxical situation. The more you want to do with content, the farther behind you fall.
- Information on its own won’t ensure success. Users are now savvy when it comes to access, interface, ease of use, and clarity. I learned yesterday about a new search system that uses the Apple iPhone “flipping page” metaphor to display search results. The view of the venture firm pumping millions into this start-up is that the interface is as important as clever algorithms and relevant results. I never thought I would say this, but I agree. A flawed user experience can doom a superior search and content processing system within 30 seconds of a user’s accessing the service.
- Assumptions have to be verified with facts. Echoing in my mind is the catch phrase made famous by President Reagan: “Trust, but verify”. One of the twists in the information world is that the snazzier the demonstration, the greater the gullibility factor. A “gullibility factor” is a person’s willingness to accept the demo as reality. Assumptions about what search and content processing can do contribute to most information retrieval project failures. We stop at “trust” and leapfrog over “verify”.
What happens when a system works well? What takes place when an entrepreneur “invents” a better mouse trap? What takes place when senior management uses a system and gets useful results quickly and without an engineer standing by to “help out the suit”?
Disintermediation. When systems work, the likelihood of staff reductions or business process modification goes up. The idea is that software can “reduce headcount”.
This issue is particularly sensitive for libraries, museums, academic institutions, and certain citizen-facing services. The more effective a system is, the easier it is to justify marginalizing certain institutions, people, and manual work processes. As we pursue ever more potent search and content processing, keep in mind that the imperative of disintermediation follows closely behind.
Stephen Arnold, March 19, 2008
ISYS Search Software: A Case Study about Patent Analysis
March 18, 2008
One of the questions I’m asked is, “What tools do you use to analyze Google’s patent applications, patents, and engineering documents?” The answer is that I have a workhorse tool and a number of proprietary systems and methods. On March 7, 2008, the AIIM conference organizers gave me an opportunity to explain my techniques to about 40 attendees.
Before beginning my talk, I polled the audience for their interest in text analysis. Most of those in the audience were engaged in or responsible for eDiscovery. This buzz word means “taking a collection of documents and using software to [a] determine what the contents of the document collection are and [b] identify important people, places, things, and events in those documents.” eDiscovery needs more than key word, Boolean, and free text search. The person engaged in eDiscovery does not know what’s in the collection, so crafting a useful key word query is difficult. In the days before rich text processing tools, eDiscovery meant sitting down with a three-ring binder of hard copies of emails, depositions, and other printed material. The lucky researcher began reading, flagging important segments with paper clips or colored bits of paper. The work was time-consuming, tedious, and iterative. Humans — specifically, this human — had to cycle through printed materials in an effort to “connect the dots” and identify the substantive information.
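As a toy illustration (mine, not a description of any vendor’s product), here is the kind of first-pass triage such software performs so the reviewer does not have to guess key words: enumerate candidate entities across the whole collection. The folder layout and the crude regex patterns are assumptions for the sketch:

```python
# Toy first-pass triage for an unknown document collection: instead of
# guessing key words, surface the people and dates the collection
# mentions most often. Crude regexes stand in for the real entity
# extraction a commercial system performs.
import re
from collections import Counter
from pathlib import Path

PERSON = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")   # naive "First Last"
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")     # naive M/D/Y

people, dates = Counter(), Counter()
for doc in Path("collection").glob("*.txt"):          # assumed folder layout
    text = doc.read_text(errors="ignore")
    people.update(PERSON.findall(text))
    dates.update(DATE.findall(text))

print("Most mentioned names:", people.most_common(10))
print("Most mentioned dates:", dates.most_common(10))
```

A reviewer can scan that output and then craft informed queries, which is the point: the software reads everything first so the human does not have to.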
You can see a version of the PowerPoint deck used on March 6, 2008, here. The key points in the presentation were:
- Convert the source documents into a machine-manipulable form. This is a content transformation procedure. We use some commercial products but rely on custom scripts for most of our transformation work. The reason is that patent applications and patents are complicated and very inconsistent. Products from open source projects and third-party vendors are not easily customized. (A simplified sketch of this transformation step appears after this list.)
- Collections are processed using ISYS Search Software. This system — which we have been using for more than five years — generates an index of the documents in the collection, identifies entities (names of people, for example), and provides a number of useful access points by category. We typically copy claims or sections of the abstract and run additional queries in order to pinpoint similar inventions. In the case of Google’s disclosures, this iterative process is particularly important. Google routinely incorporates other patent applications by reference in its filings. Chasing down these references is difficult without ISYS’s functionality. Keep in mind that other vendors’ systems may work as well, but I have standardized in order to minimize the surprises that certain text processing systems spring on me. ISYS has proven to be reliable, fast, and without unexpected “gotchas”. You can learn more about this system here.
- Specific documents of interest are then reviewed by a human who creates an electronic “note card” attached to the digital representation of the source document and to its Portable Document Format instance if available. Having one-click access to patent applications and patents in PDF form is essential. The drawings, diagrams, figures, and equations in the text of a document must be consulted during the human analysis process.
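To make the transformation step in the first bullet concrete, here is a simplified sketch of the idea. It is not one of the scripts we actually use; the section headings, the document number pattern, and the folder layout are illustrative assumptions:

```python
# Simplified patent transformation: normalize a raw patent text file into
# labeled fields ready for indexing. Real filings are far messier; this
# section set and number pattern are illustrative only.
import json
import re
from pathlib import Path

SECTIONS = ("ABSTRACT", "CLAIMS", "DESCRIPTION")      # simplified section set

def transform(raw: str) -> dict:
    """Split one patent document into a dictionary of labeled fields."""
    record = {"number": None}
    number = re.search(r"\bUS\d{7,11}\b", raw)        # naive document number
    if number:
        record["number"] = number.group()
    for section in SECTIONS:
        # Capture text between this heading and the next one (or end of file).
        pattern = rf"{section}\s*(.*?)(?={'|'.join(SECTIONS)}|\Z)"
        hit = re.search(pattern, raw, re.S)
        record[section.lower()] = hit.group(1).strip() if hit else ""
    return record

out_dir = Path("patents_clean")
out_dir.mkdir(exist_ok=True)
for path in Path("patents_raw").glob("*.txt"):        # assumed input folder
    record = transform(path.read_text(errors="ignore"))
    (out_dir / (path.stem + ".json")).write_text(json.dumps(record))
```

The payoff is consistency: once every document carries the same labeled fields, the indexing system can offer the category and entity access points described above.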
The PowerPoint deck is here. What software do you use to analyze patent applications, patents, and engineering documents? Let me know.
Stephen Arnold, March 18, 2008
SAS Buys Teragram Corporation
March 17, 2008
SAS Institute Inc. (Cary, North Carolina) announced today that it had acquired Teragram “to strengthen industry-leading text mining, analytics”. Teragram was founded in 1997 by two technology wizards from Luxembourg. I’m working to chase down more details of the deal. SAS (a tight-lipped, privately-owned company best known for its industrial-strength analytics) seems like an ideal owner of the low-profile, privately-held firm with offices in Cambridge, Massachusetts.
Among the capabilities SAS highlighted in its announcement today are these Teragram functions:
- Natural language processing
- Automatic categorization
- Enterprise search
- Mobile search.
When Inxight Software was gobbled up by Business Objects (another analytics outfit), I had a hunch SAS would rethink its use of the Inxight tools. SAS was in a difficult position because a competitor or semi-competitor could make life somewhat uncomfortable. Then SAP, the German software giant, bought Business Objects. SAS had to take action in order to increase its degrees of text analytics freedom. With Teragram, SAS has options and some interesting technology.
Look for a summary of Teragram’s technology in a future essay. I decided not to include this company in Beyond Search. Rumors about a change at Teragram surfaced earlier this year, and I have learned that rewriting studies to adapt to acquisitions and business failures is not much fun.
If you want a jump start on Teragram’s customers, click here. To check out Teragram’s explanation of its rules-based approach to content processing, click here. I will cover this particular aspect of Teragram’s technology in another essay.
More buyouts are looming. With the deepening financial morass in the US, I also believe some of the weaker search and content processing firms are going to turn off their lights. The cost of developing, innovating, and maintaining text processing technology is far greater than most people know.
SPSS — a direct competitor — acquired LexiQuest Inc., a linguistics-based text mining system developer. SPSS, therefore, took control of its text mining and analytics fate with this 2002 purchase. Licensing technology yields some significant benefits, but when a technology provider goes belly up or is purchased by a competitor, the happy face can morph into a frown quickly.
Stay tuned. More Teragram information will appear in a day or two.
Stephen Arnold, March 17, 2008
OmniFind: IBM’s Search Workhorse
March 17, 2008
This weekend (March 15 and 16, 2008), I engaged in a telephone call and a quick email exchange about IBM’s “behind the firewall” search strategy. I did include IBM in my new study Beyond Search for The Gilbane Group, but I narrowed my remarks to the search workhorse, IBM OmniFind. According to my sources, OmniFind makes use of Lucene, an open source search system. I wrote about Lucene in this Web log in January 2008.
In the last week, a barrage of information appeared in my RSS newsreaders. The headlines were compelling because IBM relies on skilled wordsmiths; for example, “IBM Upgrades Enterprise Search Software”. Catchy.
What’s New?
What are the enhancements to OmniFind Enterprise Edition 8.5? Let’s take a quick look at what’s new:
- An improved user interface. IBM has introduced a “dashboard” called Top Results Analysis. The dashboard provides a single point for Lotus Notes / Domino information. The dashboard supports Japanese, Korean, and Chinese double-byte encoding.
- Beefier connectors to Lotus Notes / Domino and the Lotus Quickr* service so OmniFind users have access to IBM’s behind-the-firewall collaboration system, which IBM seems to be suggesting is now a “social networking function”. *[Quickr is a newish team collaboration component.]
- Enhanced support for the FileNet P8 content management system.
- Support for ProAct, a business intelligence and text mining application that supports the IBM “standard” known as UIMA (Unstructured Information Management Architecture). A simplified explanation of UIMA is that it lays out rules so it is easier for IBM partners, customers, and third-party developers to “hook into” the IBM WebSphere and OmniFind frameworks (see the conceptual sketch after this list).
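Because UIMA itself is a Java framework, the fragment below is only a conceptual sketch in Python of the pipeline-of-annotators idea, not the actual Apache UIMA API. Every name in it is my illustration:

```python
# Conceptual sketch of the UIMA idea: a pipeline of independent annotators
# that each read a shared document and contribute typed annotations to it.
# This illustrates the architecture, not the Apache UIMA Java API.
import re
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)   # (type, start, end)

def year_annotator(doc: Document) -> None:
    for m in re.finditer(r"\b\d{4}\b", doc.text):
        doc.annotations.append(("Year", m.start(), m.end()))

def org_annotator(doc: Document) -> None:
    for m in re.finditer(r"\b(IBM|Google|SAS)\b", doc.text):
        doc.annotations.append(("Organization", m.start(), m.end()))

# A third party can add an annotator without touching the others; that is
# the "hook into" promise in a nutshell.
pipeline = [year_annotator, org_annotator]
doc = Document("IBM shipped OmniFind 8.5 in 2008.")
for annotator in pipeline:
    annotator(doc)
print(doc.annotations)
```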
My calls to IBM went unanswered. No surprise there. I dug through my open source files and came up with some information that may presage what the “dashboard” interface will permit. IBM has technology described as the “WebSphere Portlet Factory”. The idea is that “portlets” can be used in an interface. In my files, I located these examples of “portlets”. The OmniFind interface makes use of this technology to provide data about search results graphically, one-click access to collaboration, and similar “beyond search” functionality. The graphic displays appear to make use of Cognos technology, acquired in late 2007.
This illustration combines screen grabs of various IBM portlet functions. Each “portlet” comes from different IBM open source materials.
If your blood runs IBM blue, you will recall that IBM offers a mainframe search system, hyped WebFountain, and invested significantly in semantic search. One IBM wizard defected to Google and “invented” semantic technology which Google has disclosed in its patent applications for its Semantic Web. You may also recall that whatever IBM “sells” is not an application that you load on your server and head home early. IBM’s approach is to make available components, services, and hardware. You can dip into the IBM virtuous circle for comparatively modest sums. For instance, the iPhrase technology for call center self-service support starts at less than $5,000. Scaling to enterprise-class platforms, not surprisingly, requires a more substantial investment. IBM — like Microsoft, Oracle, and SAP — deploys systems that routinely hit seven figures and often soar higher.
Observations
The IBM release of version 8.5 of OmniFind triggers several thoughts. Let me highlight them. If you disagree, let me know.
First, OmniFind 8.5 affirms IBM as the digital Julia Child. Mrs. Child — a self-taught French chef and TV personality — crafted recipes any American housewife with patience could convert into “real” French cooking. OmniFind is not an application. It is a group of components, servers, and functions. To make an IBM solution, you combine the ingredients. There you have it.
Second, I have a tough time figuring out where the IBM-developed technology goes. The defection of Dr. Ramanathan Guha to Google and the five PSE patent applications struck me as important. But IBM has quite a bit of innovation that seems to fade when the company acquires another firm. A good example is the iPhrase technology. IBM uses this technology, but it did not make use of the semantic inventions from the unit where Dr. Guha worked. I find this interesting.
Third, each year I vow to track down a person who is in the IBM search unit. This year I tried to contact Aaron Brown, who I heard was the head honcho of IBM’s Information Management Group’s search unit. No joy. Each year I remain puzzled over the various versions of WebSphere, the differences between OmniFind Discovery and plain-vanilla OmniFind, the search functionality of DB2, the role of SearchManager/370 (a mainframe search solution), the WebFountain text services, the free Yahoo edition of OmniFind, and the search oddments in Lotus Notes, among others.
In closing, I will track down OmniFind 8.5, load it on one of my aging NetFinity 5500s, and give the system a shakedown run. You probably thought that I was anti-IBM. Wrong. I’m a believer in the firm’s hardware, and I’ve done a number of projects with WebSphere, “portlets”, DB2, and various other cogs in the IBM machine.
I am frustrated, though. I can’t figure out the product lineup, how the pieces fit together, and how to navigate among the large number of options and combinations. If a fan of IBM can’t figure out OmniFind and its enhancements, how can potential customers crack the code?
Stephen Arnold, March 17, 2008
Endeca’s Pete Bell Interviewed
March 17, 2008
Endeca broke the $100 million revenue barrier in 2007, and the company has received additional financial backing from Intel and SAP. Endeca’s Pete Bell spoke with me in March 2008 and provided substantive information and insight into Endeca’s success.
Mr. Bell said: “We’re thriving as an Information Access platform whose architecture is based on a new class of database.” At the outset of the interview, I was operating on the assumption that Endeca was a search engine. Mr. Bell’s explanation of the company’s approach gave me a fresh appreciation of the firm’s engineering prowess. For example, Mr. Bell said:
Since imitators were playing catch up, nearly everyone else grafted facets onto their existing engine, so they do things like manage facets through application-side code. If you have a thousand products and three facets, that could work. But it gets ugly when you need to scale or want to make changes. But since we architected for facets from the very beginning, we built it into the engine. We’ve evolved industrial strength infrastructure for this.
With news of the Intel and SAP financial support, I wanted to know what those additional funds would fuel. Mr. Bell said:
Intel and SAP give us the opportunity to plan a product roadmap today that will be ready for how enterprises look three years from now…. It’s all about multi-core — what would you do with an 80 core chip? … Intel wants visionaries to create the demand for its next generations of chips. As for SAP, their software manages a lot of the world’s most valuable data. Today, the SAP data support business processes … But as soon as you veer off from a specific process, it can be difficult to make use of those data.
You can read the complete interview with Mr. Bell in the Search Wizards Speak section of the ArnoldIT.com Web site.
If you want more information about Endeca, click here.
Stephen Arnold, March 17, 2008
Google and the Enterprise
March 16, 2008
On March 4, 2008, I delivered a short talk followed by open discussion at the AIIM Conference in Boston, Massachusetts. The title of my talk was “Google: Disrupting the Traditional World of Enterprise Applications”.
The core idea of my talk is that Google is poised to offer more enterprise applications and services. Most of the attention is directed at what some journalists characterize as “Microsoft Office killers”. That emphasis is incorrect. The more interesting enterprise functions include map-related capabilities and data integration and management functions.
Unfortunately, I do not reproduce the online sessions I access when talking in front of an audience, nor do I reproduce all of the visuals I use in my PowerPoint deck. Some of these figures come from my studies, the copyrights to which have been assigned to litigious publishers, an evil breed indeed. If you want to talk to me about an in-person briefing, you can send me email here: ait at arnoldit.com. I’m cutting back on my travel, but I will give your request serious attention. You can also buy copies of The Google Legacy and Google Version 2.0 from Infonortics, Ltd., in Tetbury, Gloucestershire; in April 2008, my new study, Beyond Search: What to Do When Your Search System Won’t Work, will be available from Frank Gilbane. Most of the information in this AIIM briefing came from these studies.
Transportation
The first example uses Google’s familiar Maps and Earth services. You can look at digital versions of maps, plot routes, and see the names of local businesses, among other well-worn functions. With a click, you can replace the map with a satellite image. New features make it possible for you to upload data, display those data on a map, and perform a wide variety of manipulations. The audience continues to enjoy looking at Google’s examples as well as those from clever Google Map hackers. Here’s a St. Patrick’s Day Google Map that gives you some idea of the ways in which multimedia data can be embedded in a Google Map.
So what’s this have to do with an enterprise, government agency, or association? Quite a bit. The example I used in my talk is that Google is in the transportation routing, management, and logistics business. Few know about its capabilities in this field. When I asked a Googler about it, the response I received was, “I don’t know anything about that function.” While not surprising to me, the response illustrates how Google’s employees often lack a “big picture” view of what the company’s engineers have created and other Googlers have sold.
My example is Google’s transportation routing system in use in San Francisco, California. A Google employee can use a mobile phone to request a shuttle pickup via SMS. Google’s automated system receives the request, figures out when a shuttle can be dispatched to the Googler’s location, and sends an SMS back to the Googler with the shuttle’s arrival time. The Google system pushes the updated routing information to the human driver of the shuttle, who proceeds to the location.
In this way, Google can provide “green” transportation services without the cost and hassle of traditional bus routes. You can read more about this invention in Google patent document US20060149461.
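The patent document spells out the method; as a rough illustration (toy data and logic of my own, not Google’s implementation), the dispatch decision boils down to something like this:

```python
# Toy version of the dispatch flow: a rider requests a pickup, the system
# picks the shuttle that can arrive soonest, and a reply reports the
# estimated wait. Grid coordinates and speeds are invented for the sketch.
from dataclasses import dataclass

@dataclass
class Shuttle:
    ident: str
    x: float
    y: float
    speed: float   # grid units per minute

def eta_minutes(s: Shuttle, x: float, y: float) -> float:
    # Manhattan distance over a street grid divided by shuttle speed.
    return (abs(s.x - x) + abs(s.y - y)) / s.speed

def handle_pickup_request(shuttles, x, y):
    best = min(shuttles, key=lambda s: eta_minutes(s, x, y))
    # In the system the patent describes, this reply would return to the
    # rider as an SMS, and the updated route would go to the driver.
    return f"Shuttle {best.ident} arriving in about {eta_minutes(best, x, y):.0f} minutes"

fleet = [Shuttle("A", 0, 0, 1.0), Shuttle("B", 5, 5, 2.0)]
print(handle_pickup_request(fleet, 4, 4))   # shuttle B wins: 2 units at 2/min
```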
What’s this have to do with the enterprise? The technology disclosed in the patent document suggests to me that Google can use the same system and method to:
- Provide shuttle routing services to municipalities or to private operators of services such as airport shuttles
- Offer cloud-based routing services to trucking and delivery companies
- Apply the functions to broader logistics problems so that data and routing can be displayed in real time.
One of the fastest-growing businesses at Google, according to my sources, is what is known as geospatial services and applications. But Google’s capabilities give it the flexibility to move quickly and without warning into adjacent business sectors such as logistics.
The Google Search Appliance
This section of my talk described the GSA, or Google Search Appliance. The enterprise group at Google has its own engineers and sales team. With each version of the GSA, the system improves. With more than 10,000 customers worldwide, the GSA is arguably one of the most widely used behind-the-firewall search systems. It’s no longer necessary to license a GSA to provide Web site search. Google’s free custom search engine can handle that job.
But the GSA is less interesting to me than the OneBox API. In my talk, I showed several examples of how Google’s OneBox API makes it possible to use the GSA to federate information. (Federation means that a search system takes information from multiple sources and displays one relevance-ranked list of results.) But I find laundry lists uninteresting, so consider the sketch below instead.
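Here is a toy sketch of what federation means in practice: merge scored results from several sources into one relevance-ranked list. The sources, scores, and the simple normalization scheme are my illustration, not the OneBox API:

```python
# Toy federation: normalize each source's scores to a common 0-1 range,
# then merge everything into a single relevance-ranked list. Sources and
# scores are invented for illustration.
def normalize(results):
    top = max(score for _, score in results)
    return [(title, score / top) for title, score in results]

sources = {
    "intranet": [("Travel policy", 12.0), ("Expense form", 7.5)],
    "exchange": [("J. Smith calendar", 0.9), ("J. Smith contact", 0.7)],
}

merged = []
for name, results in sources.items():
    merged += [(title, score, name) for title, score in normalize(results)]

for title, score, name in sorted(merged, key=lambda r: -r[1]):
    print(f"{score:.2f}  {title}  [{name}]")
```

The hard parts in a real deployment (security trimming, making relevance comparable across sources) are exactly what this sketch glosses over.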
The GSA goes “beyond search” as you can see in this Google screen shot I sucked down from the Web before the link went dead.
The tiny red and green bars in the screen shot graphic show the GSA pulling data about the query from a Microsoft Exchange Server. The traditional list of results is enriched with a dynamic view of the schedule of the person who is the subject of the query. In short, the GSA lets you pull a colleague’s phone number, schedule, and other related information by typing a first and last name into the GSA search box.
I find this suggestive, but I challenged the audience to tell me whether a system can apply a certainty score to each result or provide a one-click way to determine where these data originated. Audiences at my talks rarely speak up, and on this occasion, a few people shook their heads. The others in the audience didn’t know how to query on certainty or lineage and let my question hang unanswered.
Google is working on a system that adds these types of queries to its search system. Information about this function is scarce, and I am now looking through Google’s public information to find more clues. So far, I have a buzz word, “uncertainty score”, and a name that may or may not be correct, Dr. Levy. Stay tuned on this subject. You will find a 10-page discussion of this fascinating extension to Google’s search technology in Beyond Search.
What’s this function bring to the enterprise? That’s a difficult question to answer. I think that it would be very useful to have a score such as 79 percent or 21 percent attached to each search result to indicate how likely the information is to be correct. Right now, few people looking for information give much thought to the reliability of data or to their provenance. This technology, if my research is accurate, would allow Google to expand its services to the enterprise for competitive intelligence. Law enforcement, of course, would be quite interested in knowing the likelihood of an item’s being accurate.
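Nobody outside Google knows how such a score would be computed. A back-of-the-envelope sketch (entirely my assumption) might weight a source’s track record and the number of independent sources corroborating an item:

```python
# Back-of-the-envelope certainty score: combine a prior reliability for
# the source with corroboration from independent sources. The halving
# rule and the numbers are guesses for illustration, nothing more.
def certainty(source_reliability: float, corroborating_sources: int) -> float:
    # Each independent corroboration shrinks the remaining doubt by half.
    doubt = (1.0 - source_reliability) * (0.5 ** corroborating_sources)
    return round(100 * (1.0 - doubt))

print(certainty(0.60, 0))   # lone blog post: 60 percent
print(certainty(0.60, 2))   # same item corroborated twice: 90 percent
```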
Wrap Up
Despite the small turnout for the talk, Information Week ran a short news item about one of the examples I used to illustrate my theme. You can read the story here. More information about a less visible Google enterprise application appears in the Entchev TIS Architects Web log. The New Jersey Transit Authority has partnered with Google to allow NJTA customers to plan their trips using Google. You can read this story here.
I’ve posted a PDF version of the PowerPoint deck I used to illustrate my talk. You will find that information on the ArnoldIT Web site on Monday, March 17, 2008. I have to jump through some hoops to create a usable PDF, so don’t look for this deck until Monday, probably about 4 pm US Eastern time.
Stephen Arnold, March 16, 2008
Googzilla’s Assault and Old Media’s Nuclear Winter
March 15, 2008
Last summer, I gave a short talk to the Board of Directors of one of the US’s major news gathering organizations. I knew what to expect. I had worked for Barry Bingham Jr., owner of the once-prestigious Courier Journal & Louisville Times Co. After I left the Courier Journal a year or so after Gannett bought the Louisville mini-conglomerate, I worked for Ziff Communications. Bill Ziff was a media sensation. He created niche magazines, building properties to a peak and then — almost without fail — selling them at a premium price. Mr. Ziff was not “old media”; he was among the first to be a “new media” innovator. Health issues forced changes at Ziff, and his empire was sold. His leaving the New York publishing sector was, I believe, a respite for the news and publishing companies against which he and his staff waged commercial warfare.
Old Media Wants to Understand “the Google”
Last year, I was entering the sacred confines of “old media” in the Big Apple. My task was to participate in one of those interview-style presentations so much in vogue. A polished “old media” journalist was going to pepper me with questions. To make the interview more spontaneous, I was not given the questions in advance. Goodness, I had to match wits with an “old media” alchemist. I knew only that the subject was Google, an annoyance that continued to challenge the hegemony of “old media”. Even though I was tainted by my association with Mr. Ziff, I assumed that this tweedy group perceived me as sufficiently tame to “explain” what Google was doing to “old media’s” revenues.
Google — then as now — won’t answer my questions. I’m a lone goose consultant who earned his wings reading Google technical papers, watching dull videos of Googlers explaining yet another technical facet of Google, and reading patent applications filed by Google’s equally quotidian attorneys. But to the dewans (Indian princes) in the audience, I was marginally less annoying than real Googlers.
The “interview” clanked over well-worn cobblestones. The leitmotif was advertising, specifically money that Google was “taking” from old media. Each question fired at me meant, “How can these guys make so much money doing online things?” Pretty sophisticated questions, right?
The Business Model Problem
Newspapers and magazines sell advertising. Subscriptions contribute a modest amount to the bottom line. Historically, each “new medium” allows ad revenue to flow from “old media” (Point A) to “new media” (Point B). Radio sucked money, so newspapers like the Courier Journal got into the radio business. When I joined the Courier Journal, the newspaper owned AM and FM radio stations. When TV came along, more ad “blood” flowed. The Courier Journal bought a television station. When databases came along, the Courier Journal entered the database business. Most “old media” watched the Courier Journal’s online antics from the sidelines. Google has figured out the shortest distance from Point A to Point B.
Revenue from commercial online services in the 1980s did not come from advertising. Customers paid for access, dealing with specialized timesharing companies. Ziff entered the online business in three ways. We offered electronic products tailored to information technology professionals. Our sales force called on companies and sought licensing deals. Ziff also entered the commercial database business, quickly becoming, by the late 1980s, one of the dominant players in full text and reference databases for libraries. And we also struck a deal with CompuServe and created a Ziff-branded online service called ZDNet. Both the Courier Journal and Ziff worked long and hard to make money from online. It’s a tribute to the professionals in the Courier Journal’s online business and to Ziff’s various electronic publishing units that both organizations generated positive cash flow.
Google’s Secret
Google, on the other hand, had something that my colleagues at the Courier Journal and Ziff Communications lacked. No, it wasn’t smarter people. No, it wasn’t better programmers. Google had a business model “borrowed” from Yahoo-Overture and the burgeoning online “environment” generally described as the Internet. By 2004, when Google went public, Google’s business model and the exploding user base of “the Internet” had ignited Google’s online advertising business.
In less than three years, Google had poked its business model into numerous nooks and crannies of “old media”. Unlike the tame online services of the 1980s, Google’s approach operated “off the radar” of “old media”. Old media used traditional ad sales mechanisms. Old media thrived on an inner circle of competitors who behaved in a polite, club-like environment. Old media called the shots for advertisers, who, by and large, had no choice but to deal with “old media” on “old media’s” terms.
Not Google. Google allowed anyone to buy an ad online. Anyone could sidestep “old media” and traditional advertising rules of engagement. More disturbing, other companies were looking at Google’s business model and trying to squeeze money from electronic ads. None of these competitors played by the rules crafted over decades of “old media” fraternizing. Disintermediation, the Internet, and a better business model — these are the problems “old media” has to resolve and quickly.
So, there I was. I answered questions about Google’s ad business. I answered questions about Google’s technical approach to online. I answered questions about Google’s arrogance. I don’t think the interviewer or the audience found my answers to their liking. I can say in retrospect that nothing I said about Google made any sense to these “old media” types. I could have been speaking Thagojian, and the effect would have been the same. “Old media” didn’t understand what the Courier Journal did in 1980, what Ziff did in 1990, or what Google was doing in the early years of the 21st century.
Watching the Money Disappear
Why am I dragging you through history and this sordid tale of my sitting under hot lights answering questions about a company that won’t answer my email? This post caught my attention this morning (6 am, March 15, 2008) in the Charlotte Airport: “Google Sucks Life Out of Old Media: Check Out the 2007 Share Shift” by Henry Blodget.
The gist of Mr. Blodget’s Web log post is: “The year-over-year growth of revenue on Google.com (US) — approximately $2 billion — was more than twice as much as the growth of ad revenue in all of the offline media companies in this sample combined. This is such an amazing fact that it bears repeating: A single media property, Google.com (US), grew by $2 billion. All the offline media properties owned by the 13 offline media companies above, meanwhile — all of them — grew by about $1 billion.”
What this means is that “old media” are going the way of the dodo unless the “old media” club gets its act together. One of the more controversial statements I made to the dewans in their posh NY digs was, “Surf on Google.” The idea is simple. Google is the big dog in the advertising kennel. Instead of watching Googzilla eat your lunch, find a way to harness Google. I use the phrase Surf on Google to connote sitting down and figuring out how to create new revenue using Googzilla as an engine, not an enemy.
Problems with Newspapers
I was speaking some unintelligible language to these “old media” dewans. Even an old dinosaur like me listens to an iPod, reads news on a mobile device, and often throws out, unopened, the print newspapers delivered each day. Why don’t I look at these traditional news injection devices? Let me count the ways:
- Courier Journal. It just sucks. The recycled news is two or three days “old” when it hits print. I get current information via RSS, Google News, Yahoo News, and the BBC Web site, among others.
- Financial Times. I get a paper three days out of six. This outfit can’t work out delivery to Harrods Creek despite its meaty price tag.
- New York Times. I look at the Monday business section and maybe flip through the paper. I no longer spend an hour with the Sunday New York Times.
- USA Today. I look at McPaper’s weather map and scan its TV grid to see if the History Channel is running the “Naked Archaeologist,” my current must-see program.
- Wall Street Journal. I scan the headlines and check out the tech information. The banks for which I work expect me to know what the Journal says, but I’m not sure my clients read the paper very thoroughly anymore. Online is easier and quicker.
People in my son’s cohort spend less time with “old media” than I do. When I sit in airports, I watch what college students do. My sample is small, but I don’t see many 20-somethings immersed in “old media”. If you want to understand what young people do for news, embrace ClickZ stats. It’s free and useful.
I find it encouraging that the Wall Street Journal, the New York Times, and the Financial Times “reinvent” their Web sites — again and again. But the engine of “old media” is advertising, and no spiffy Web site is going to make up for lost ad revenue.
Did my statement in June 2007 “Surf on Google” have an impact? Not that I can see. “Old media” are building a fort out of dead trees, traditional technology, and battle tactics used by cities besieged by Alexander the Great. The combatant — Google — is armed with nuclear weapons and is not afraid to use them.
For “old media”, Mr. Blodget’s summary of the financial devastation is confirmation that “old media” now finds itself suffering nuclear winter. There are some fixes, but these are not easy, not comfortable, not traditional, and not cheap. I’m glad I’m in the sunset of my career and no longer sitting in meetings trying to figure out how to out-Google Google. Innovation, not contravallation, is needed. Can “old media” respond? I’m betting on Google and its progeny.
Stephen Arnold, March 15, 2008
Rain on the Search Parade
March 14, 2008
The storm warnings flash across the sky. This morning (March 14, 2008), Bear Stearns is rumored to face a Carlyle-like liquidity crisis.
But so far no lightning has hit the search lightning rods. In fact, the unsettled financial weather has had no visible effects. The Google-DoubleClick deal is done. The Microsoft-Fast Search tie-up is nearing port. Yahoo says that it is embracing the Semantic Web, whatever that means (semantically, of course). France funds a Google killer. Radar’s Twine spools out. Business as usual in the search sector. But still we have no “real” solution to the “problem” of Intranet search, what I call behind-the-firewall search. The marketing razzle dazzle can’t mask the pain begging for lidocaine.
The turmoil in the financial market, the degrading dollar, and the $1,000 per ounce gold price seem to have little impact on search and retrieval so far. Anyone who suggests that a problem looms or that an actual panic could occur is an alarmist. I don’t want to sound any alarms.
InfoWorld’s Web log contained a post that has to make search vendors pant with revenue lust. Jon Williams wrote here on March 13, 2008:
Every system we build has a search function built into it, usually hand-crafted (proprietary). Why? … Search on the internet, whether it be google, youtube, facebook, amazon, ebay, or linkedin, is solved for me, I always find what I need. And I believe the same is true for most consumers. But why not in the enterprise? Seems like a solution waiting to happen.
Spot on, Mr. Williams. Spot on. This unanswered need is why you won’t hear gloom and doom from me. Search often sucks, and whoever solves this problem can make their investors happy in our down market.
An Entrepreneur’s Concern
At dinner yesterday evening (March 13, 2008) in Palo Alto’s noisy Fish Market, I showed the president of a hosted application company my current list of 150 next-generation search and content processing companies. Most of the outfits on this list won’t resonate with you. Bitext operates from Madrid, Spain. Thetus has offices near Microsoft’s stomping grounds. PolySpot is tucked away in Paris, France. He had heard of none of these companies and few of the others on my list.
He said, “There are so many on this list unknown to me.” Not unusual. He then asked me, “How can these companies survive so much competition? I think the market downturn will make it very hard for these companies. Right?”
I said, “Yep, tough sector. But no one has the one right answer. Not Google. Not IBM. Not the seven score newcomers on my list.”
The search market remains a triathlon, one of those “iron” versions that require competitors to climb mountains, swim rapids, and bicycle from Burlingame to Boise. But there are some formidable hurdles search vendors must overcome; namely:
Oversupply. Without rehashing dear old Samuelson’s Economics (now in its 18th edition, I think), you have an embarrassment of riches in search. You have high-profile, publicly-traded “brands” like Autonomy. You have market-leading companies like Endeca. You have up-and-coming vendors like Coveo, Exalead, ISYS Search Software, and Vivisimo. You have state-of-the-art deep extraction providers like Attensity and Exegy (bet you never heard of Exegy, right?). You have free search software such as Lucene and Flax. You have such super-platforms as IBM, Microsoft, Oracle, and SAP including search with every enterprise application licensed. You have specialists in entity extraction (Inxight / Business Objects), semantics (Siderean), and ANSI-standard controlled terms (Access Innovations). You get the idea. Can the market support hundreds of vendors of search and content processing?
Confusion. You don’t want me to belabor this point. There’s a great deal of confusion about search, content processing, text mining, and related disciplines. The easiest way to illustrate this is to provide you with a handful of the buzz words that I have collected in the last two weeks. How many of these can you define? How many of these do you use in your discourse with colleagues? Here are the “Cs” through the “Ks” only:
Collective knowledge systems
Community portals
Composite applications
Conferencing
Context aware games
Context aware mobile search
Context aware search
Context search
Faceted search
Folksonomy
Formal language
Geospatial search
Glass boxes
Instant messaging
Intelligent agents
Knowledge base
Knowledge computing
Knowledge management
Knowledge spaces
Confused buyers often drag their heels as they try to decipher the nuances of search-speak.
Skepticism. Some vendors have told me that potential customers are skeptical about some search features and functions. For example, on a telephone call with a non-U.S. search system vendor, a principal in the company told me, “The nest has been fouled. Two prospects told me today that our two to five day deployment time was impossible. Their incumbent system took more than a month to get installed and another two months of effort before deployment.” As organizations get more behind-the-firewall search experience, those organizations’ employees know that some vendor claims may be a blend of wishful thinking and science fiction.
Over-confidence. I don’t have much to say about this human failing. Most chief technical officers overestimate what they know about search and retrieval. Most Intranet search problems have their roots anchored in the licensees’ assumptions about what their systems can do, their knowledge of search systems, and their ability to figure out software. I get my Greek myths mixed up, but there were, as I recall, quite a few stories about the nasty effects of pride. “Flame out” and Icarus resonate with me.
Loosey-goosey pricing. In the course of the research for my new study Beyond Search, I encountered one vendor whose president refused to give me a starting price for the system. I said, “Take your total revenue, divide it by the number of customers you have, and I will use that number as the average price.” He sputtered in anger. Let’s face it. Unless something is free, most search software comes with a price tag. Even a free system such as Lucene costs money because someone who gets a salary has to babysit the Lucene system. More and more vendors are tap dancing on the cost of their licenses, services, and support. I suspect that these vendors want to hold out to get the best possible price. Maybe these vendors don’t want other customers to know that a price is rising or falling?
Adam Smith’s “invisible hand” will reach out to strangle me. Economics in March 2008, however, continues to surprise the Wall Street set. Last time I checked, the super-secret Carlyle Group did not expect fellow bankers to demand cash.
How untoward!
But if some of the best-known financial services companies are in the doo-doo, what will become of the more than 300 firms engaged in search and retrieval? Even the Teflon-coated Google has drawn criticism. Today (March 14, 2008) Google’s share price will open at $443, down from its 52-week high of $747. Microsoft will pay $1.2 billion for a chance at bat to hit a search home run. That’s a pricey swing, methinks. In my conversations at conferences, I detect a note of concern about making numbers. Entrepreneurs are thoughtful.
Wrap Up
To wrap up, I believe the search landscape will be pockmarked with Entopia-like shutdowns. I also anticipate more strident marketing. Sigh. There will be some buyouts, but there will be some firms that cannot sell out. One reader of this Web log wondered if Autonomy was an example of a company that many look at but none has carried over the threshold. Maybe the right suitor has not come forward? I believe that some countries will intervene in order to keep certain search firms in business. Anyone think that the French government has this as a motive for the funding of its Google killer? Other companies will give away search software and try to make money via services and consulting. And don’t forget the bundling option. Every time I buy an IBM server, I get Lotus Notes. Perhaps Microsoft and Oracle will use the same tactic to “lock in” customers.
The big concern I have is that search’s “bird flu” will land. The weaker firms will die after a tough fight. The stronger firms will capture a larger share of the market. Instead of the surfeit of choices we have today, we may end up with fewer choices, higher prices, and a stifling of innovation. What do you think? End or beginning for behind-the-firewall search?
Stephen Arnold, March 14, 2008
Yahoo Goes Semantic
March 13, 2008
Yahoo has embraced the Semantic Web. Yahoo’s Web log stated:
In the coming weeks, we’ll be releasing more detailed specifications that will describe our support of semantic web standards. Initially, we plan to support a number of microformats, including hCard, hCalendar, hReview, hAtom, and XFN. Yahoo! Search will work with the web community to evolve the vocabulary framework for embedding structured data. For starters, we plan to support vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, and others based on feedback. And, we will support RDFa and eRDF markup to embed these into existing HTML pages. Finally, we are announcing support for the OpenSearch specification, with extensions for structured queries to deep web data sources.
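To make the microformat support concrete, here is a toy sketch (mine, not Yahoo code) of what hCard support lets a crawler do: lift structured contact data out of ordinary HTML. The sample markup is invented; the class names (vcard, fn, org, tel) are the real hCard vocabulary:

```python
# Toy hCard extraction: pull structured contact fields from HTML marked
# up with hCard class names. Requires the beautifulsoup4 package; the
# sample page is invented for illustration.
from bs4 import BeautifulSoup

html = """
<div class="vcard">
  <span class="fn">Stephen Arnold</span>
  <span class="org">ArnoldIT.com</span>
  <span class="tel">555-0100</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for card in soup.select(".vcard"):
    record = {}
    for fieldname in ("fn", "org", "tel"):   # hCard classes map to vCard fields
        el = card.find(class_=fieldname)
        record[fieldname] = el.get_text(strip=True) if el else None
    print(record)
```

Multiply that by every crawled page, and you can see why a search engine wants publishers to adopt the markup.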
Interesting, but maybe these two lads knew something I didn’t. What strikes me about this announcement is that Google’s Programmable Search Engine, disclosed in a series of patent applications in February 2007, is a more sophisticated, well-conceived approach. But Google has kept its semantic technology under wraps.
Amazon, like Yahoo, has moved more quickly than Google. Jeff Bezos has deployed cloud computing, introduced storage, and launched a hosted data management service. Google has these technologies and disclosed each in patent applications.
The question for me is, “Is Google content to let Amazon and Yahoo operate like lab experiments?”
Google doesn’t answer my email, so I can’t provide any insight based on information from the Googleplex. Google’s professionals are a heck of a lot more intelligent than I am. Yet Google is hanging back, allowing two of its rivals to push forward in areas where Google has a core competency.
I find this puzzling. Do you?
Stephen Arnold, March 13, 2008