Exalead Tightens NewspaperArchive Tie Up

March 26, 2010

A happy quack to the reader who alerted me to a Marketwire story about Exalead’s deal with NewspaperArchive.com. Exalead is one of the most interesting search applications and content processing companies we monitor. The story I read was “NewspaperArchive.com Scales With Exalead”.

The story reported:

NewspaperArchive.com is the largest historical newspaper database online. It contains tens of millions of newspaper pages from 1753 to present. Every newspaper in the archive is fully searchable by keyword and date, making it easy for people to quickly explore historical content. NewspaperArchive.com had bumped up against limitations of having nearly 100 million records. After the switch to Exalead in December 2009, NewspaperArchive.com has been able to scale again, increasing the number of records by 20% while at the same time reducing the amount of hardware by 75%.
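The story's two percentages can be combined into a single figure. The sketch below works out how many more records each unit of hardware handles after the switch. It assumes that "nearly 100 million" can be rounded to 100 million and that the 75% hardware reduction is measured in the same units before and after (for example, server count). The function name is illustrative only, not anything Exalead or NewspaperArchive.com published.

```python
def records_per_hardware_gain(record_growth: float, hardware_reduction: float) -> float:
    """Return the factor by which records per unit of hardware changed.

    Both arguments are fractions, e.g. 0.20 for a 20% change.
    """
    records_after = 1.0 + record_growth        # 120% of the original record count
    hardware_after = 1.0 - hardware_reduction  # 25% of the original hardware
    return records_after / hardware_after


# Assumption: "nearly 100 million" rounded to 100 million for illustration.
baseline_records = 100_000_000
print(round(baseline_records * 1.20))                   # records after the 20% increase
print(round(records_per_hardware_gain(0.20, 0.75), 2))  # gain in records per hardware unit
```

Under those assumptions, each unit of hardware carries roughly 4.8 times as many records as before. The story does not say how query speed changed, so this figure speaks only to storage density.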

The performance angle is important based on our research. There are very few companies with the engineering and architecture to deal with the types of data flows found in many organizations today. One of the founders of Exalead worked on the AltaVista.com search system. I have identified a number of Exalead innovations that moved beyond the Digital Equipment approach to search. One of the most important is scaling and a design that permits enterprise applications to break free of their lockstep methods of making data available to users. Exalead can give today’s iPod-savvy user a way to access business information with the fluidity of downloading a tune from Apple’s system. In the enterprise, this type of functionality is a rare animal in my experience.

Exalead, founded in 2000,

…is the leading search-based application platform provider to business and government. Exalead’s worldwide client base includes leading companies such as PricewaterhouseCoopers, ViaMichelin, GEFCO, American Greetings and Sanofi Pasteur, and more than 100 million unique users a month use Exalead’s technology for search. Today, Exalead is reshaping the digital content landscape with its platform, Exalead CloudView™, which uses advanced semantic technologies to bring structure, meaning and accessibility to previously unused or under-used data in the new hybrid enterprise and Web information cloud. CloudView collects data from virtually any source, in any format, and transforms it into structured, pervasive, contextualized building blocks of business information that can be directly searched and queried, or used as the foundation for a new breed of lean, innovative information access applications. Exalead is an operating unit of Qualis, an international holding company, with offices in Paris, San Francisco, Glasgow, Milan and Darmstadt.

I want to let you know that the last time I was in Paris I got a preview of Exalead’s forthcoming search application technology. I am not at liberty to let le chat out of the bag, but I will be describing the system when Exalead makes a formal announcement.

You can get more information about Exalead at www.exalead.com. Additional information about NewspaperArchive is available at

Nginer, Timesaving Metasearch

March 23, 2010

I find that for certain types of queries, I need a metasearch system. I have mentioned Devilfinder before, and I like that service because it generates the top 100 hits for queries. I can scan the list and get a good sense of what’s available. That type of overview is useful because I prefer to hit the Web before using the for-fee services.

A reader sent me a link to Nginer.com, a system that displays results from Google, Bing, Yahoo, Clusty and several others in one window. I ran several queries and found the service quite useful. Here are the points I noted:

  1. A drop down list lets me confine the query to a collection; for example, blogs, books, and social bookmarks, among other subsets.
  2. A tabbed display which lets me look at the top hits across a number of search and metasearch systems. I like this approach because I can tell at a glance which system is indexing certain content quickly.
  3. A tag cloud that shows me what is getting traffic at the time I ran the query.

I did not like two things. First, there was no button to let me clear the results and run another query. No big deal, just annoying. Second, I wanted a way to eliminate systems that, in my experience, rarely yield useful results for my types of queries. Otherwise, I was a happy camper.

My recommendation is that you take a test drive.

Google Beyond Text Sample Chapter

March 23, 2010

Stephen E Arnold’s new monograph about Google technology is nearly complete. The study will be available in May 2010 from Intellas Press. The monograph focuses on Google’s non-text technical disclosures. Like Mr. Arnold’s three previous Google studies, it concentrates on Google’s technical capabilities. Mr. Arnold’s 2009 study, Google: The Digital Gutenberg, examined the company’s infrastructure as a digital River Rouge; that is, information goes in one end and complete content objects come out the other. Google Beyond Text tackles rich media, including images, audio, and video. The research has consumed more than one year and relies upon Google’s open source disclosures in technical papers, financial filings, and patent documents. You can get a sense of the type of information in this important new study by requesting a sample chapter from the monograph. The chapter, which is available in draft form and without charge, discusses the gap in Google’s non-text capabilities. Among the companies discussed are Catch Media, La La Media, and DoubleTwist. The book is intended for technically inclined individuals, investors, Google competitors, and those interested in a book that skips the “Sergey and Larry eat pizza” approach to Google’s technical systems and methods. You can request the sample chapter by navigating to http://www.theseed2020.com/gbt/. A PDF will be sent to you without charge. The hope is that you will provide some constructive criticism and perhaps order a copy of the monograph when it becomes available.

Ken Toth, March 23, 2010

This was a sponsored post supported by ArnoldIT.com. This is a marketing article.

Exclusive Interview with the Founder of Hot Neuron

March 23, 2010

What happens when a theoretical physicist focuses his attention on the problems of content processing? One answer is the Hot Neuron technology. Dr. Bill Dimm, after a successful career in physics and finance, founded Hot Neuron to “develop innovative methods and algorithms that help people find and organize information that will make their companies more productive.”

In an exclusive interview for the ArnoldIT.com feature Search Wizards Speak, Dr. Dimm said:

Clustify analyzes the text of your documents and groups related documents together into clusters. Each cluster is labeled with a few keywords to tell you what it is about, providing an overview of what the document set is about, and allowing you to browse the clusters by keyword in a hierarchical fashion. The aim is to help the user more efficiently and consistently categorize documents, since he or she can categorize an entire cluster or a whole group of clusters with a single mouse click. Our approach to forming clusters is impacted by that goal. We use a modified agglomerative algorithm to ensure that the most similar documents get clustered together, and we allow the user to specify how similar documents must be in order to appear in the same cluster. By choosing a high similarity cutoff, the user can be confident that it is safe to categorize all documents in the cluster the same way. Clustify can also do automatic categorization by taking documents that have already been categorized, finding similar documents, and putting them in the same categories.
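Dr. Dimm does not describe his modification to agglomerative clustering, so what follows is only a minimal sketch of the general idea under stated assumptions, not Clustify's code. Documents become bag-of-words vectors, similarity is cosine similarity, and clusters merge with complete linkage, which guarantees that every pair of documents in a cluster meets the user's similarity cutoff (the property that makes one-click categorization of a whole cluster safe). A simple k-nearest-neighbor vote illustrates the "find similar documents and copy their categories" idea. All function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (no stemming, stop words, or weighting)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs: list[str], cutoff: float) -> list[list[int]]:
    """Complete-linkage agglomerative clustering of document indices.

    Repeatedly merges the two most similar clusters, where cluster similarity
    is the similarity of their LEAST similar pair of documents, and stops when
    no merge would keep every pair at or above `cutoff`.
    """
    vecs = [vectorize(d) for d in docs]
    n = len(docs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while True:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                link = min(sim[i][j] for i in clusters[x] for j in clusters[y])
                if link >= cutoff and link > best:
                    best, pair = link, (x, y)
        if pair is None:
            return clusters
        x, y = pair
        merged = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
        clusters[x] = merged


def label(docs: list[str], members: list[int], k: int = 3) -> list[str]:
    """Label a cluster with its k most frequent terms."""
    counts = Counter()
    for i in members:
        counts.update(vectorize(docs[i]))
    return [term for term, _ in counts.most_common(k)]


def categorize(new_doc: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Give a new document the majority category of its k most similar labeled documents."""
    v = vectorize(new_doc)
    ranked = sorted(labeled, key=lambda pair: cosine(v, vectorize(pair[0])), reverse=True)
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]


docs = [
    "merger acquisition bank deal",
    "bank merger deal shareholders",
    "soccer match goal striker",
    "striker goal soccer league",
]
groups = cluster(docs, cutoff=0.5)
print(groups)  # [[0, 1], [2, 3]]
print([label(docs, g) for g in groups])
training = [(docs[0], "business"), (docs[1], "business"), (docs[2], "sports"), (docs[3], "sports")]
print(categorize("goal striker transfer", training))  # sports
```

Complete linkage is a conservative choice: raising the cutoff yields smaller, tighter clusters. The naive merge loop is roughly cubic in the number of documents, so a production system would need indexing or approximation to handle large collections.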

I asked Dr. Dimm about the intense competition in the text processing sector. He said:

For companies that do original research and adapt their products to their customers’ needs (like us, of course), there is a fair amount of opportunity for differentiation–customers really need to try the products and see what works in their situation. The companies that just pull an algorithm out of a book or mimic another product will be left competing on price.

You can see the technology in action at Dr. Dimm’s MagPortal.com site. To read the full text of this exclusive interview with an innovative thinker in information retrieval, see the Hot Neuron interview. For more information, visit http://www.hotneuron.com.

Stephen E Arnold, March 23, 2010

A free write up and a free article. I will report this “free” stuff to the Department of Labor. I know the DOJ will care.

Infrastructure Ripple from SharePoint

March 22, 2010

Navigate to Thor Projects and read the article “Infrastructure Ripple Effect – The Story of Servers, Racks and Power.” I have about 48 inches of screen real estate and I needed all of it to read the article. The layout is – in a word – interesting. The point of the write up, in my opinion, is summarized in this passage from the article:

I am reminded that any change creates a ton of little ripples.

If an information technology pro runs into problems with a single server, I wonder what the impact of more massive on-premises changes might be.

Mauro Cardarelli’s “Where Does SharePoint Still Fall Short?” came to mind when I thought about adding hardware. He wrote:

Let’s face it; the interface for security management is confusing and cumbersome… even for people who use it every day. What are the consequences? First, you increase the likelihood of security breaches (i.e. showing content to the wrong audience). Second, you increase the likelihood of giving users permissions greater than necessary. Finally, you increase the likelihood of having a security model that is highly diluted and overly complex. This is probably why the 3rd party market for SharePoint administration has been so strong… someone needs to pay attention to what these folks are doing! But I would argue that this is reactive (versus proactive) management… and things need to be taken one step further.

Hardware and security. Hmmm.

Stephen E Arnold, March 22, 2010

No one paid me to write this article. I will report this to the Salvation Army, an outfit that knows about work without pay. Perhaps the cloud access to SharePoint will obviate the problem?

Coveo and GEICO Host Webinar on March 23, 2010

March 21, 2010

Fierce Media has asked Beyond Search to facilitate a discussion about “how GEICO thinks about leveraging its data-rich enterprise systems to generate real-time business value and intelligence.” The participants are GEICO and Coveo as well as Stephen E Arnold.

Topics include how the Coveo system can:

  • Enable improved business intelligence and decision making through dynamic dashboards and information mashups that provide actionable business information
  • Access structured and unstructured data from across enterprise systems and repositories without complex integration or data migration, improving efficiency and cost effectiveness through a unified indexing layer
  • Lower the cost of legacy system integrations and upgrades, and reduce time-consuming data migration
  • Optimize social networks and incorporate the value of collaboration and just-in-time information exchange into the knowledge ecosystem

The audio program will be on Tuesday, March 23, 2010 beginning at 11:00am Eastern/8:00am Pacific. More information about Coveo may be found at http://www.coveo.com. You can register here.

Ben Kent, March 21, 2010, Beyond Search

This is a sponsored post.

InQuira Embraces the Cloud

March 19, 2010

I read “InQuira Puts Its Knowledge Solutions in the Cloud” and learned that the approach “is in no way a lightweight version.” On-premises search systems can be tough to install, tune, and maintain. Blossom has been, in my opinion, one of the trailblazers for hosted search, and it offers a robust, powerful, and customizable solution. InQuira is moving in that direction as well.

According to the write up:

InQuira has existing partnerships with Oracle CRM On Demand, Oracle’s Siebel offering, and Genesys Telecommunications Laboratories. The newest on-demand offering will extend the company’s reach…“[InQuira] has a really established reputation as the best-of-breed intelligent search vendor that quickly and easily integrates with everyone,” says John Ragsdale, vice president of technology research for the Technology Services Industry Association (TSIA).

One feature of the approach is that storage is provided in an “on demand” model.

You can get more information from www.inquira.com.

Stephen E Arnold, March 19, 2010

Freebie. No one paid me to write this. I will report non-payment to the Bureau of Labor Statistics, an outfit that tracks work for no compensation each day, every day.

Pew Documents What Some Info Vendors Will Learn the Hard Way

March 17, 2010

There are some tricks to learning. To memorize a list, put each item in a room of your house and walk the rooms, recalling each item by association. One of my classmates remembered the names of the Great Lakes with a mnemonic word. I prefer to look at survey data and let the numbers do the talking. The write up “Pew: Readers Prefer Ad Supported News to Pay Walls” provides me with some evidence that the dreams of traditional publishers to make yesterday’s revenue from gizmos like the iPad and the Nook might be just a figment of the imagination.

The article reports this observation from Pew, the oh-so-reliable research outfit:

when it comes to online news, getting people to pay for content they otherwise value is “like trying to force butterflies back into their cocoons.”

Yikes. People must not know this factoid, which is pretty well understood among the savvy but ageing commercial database publishing crowd.

I found this passage fascinating:

First things first: Pew notes that last year, online advertising saw its first decline since 2002. Numbers from eMarketer said that revenues fell by a total of $1 billion between 2008 and 2009. Still, a full 81 percent of Internet surfers say they’re cool with online ads if it means the content remains free, although “much of that is because they find them easy to ignore.” Further, 21 percent said they click on ads “at least sometimes”—much higher than we expected—and that number goes up when the user is more active. For example, among daily Internet surfers, 28 percent reported clicking on ads. For people who visit at least six sites per day, the click rate is as high as 37 percent.

Where’s the revenue going to originate? In my opinion, the former country club owners will be looking for regulatory help in the form of a “news tax” or some financial piece of the online revenue action from the new owners of the information country club. I caddied for peanuts and I don’t think the new country club proprietors will be too keen to give up too much cash to run “real news”.

Stephen E Arnold, March 16, 2010

Free, free as a goose. No one paid me to write this article. My reference to a goose reminded me of the Bethesda Country Club member who bludgeoned a swan to death decades ago to much fanfare. I will report my killing of this story to the new manager of that country club in suburban Washington.

Another Google Jibe

March 15, 2010

Poor, poor Google. From top of the world to a punching bag in less than three months. This new decade is proving to be a challenging one for Google. I just read “Six Delusions of Google’s Arrogant Leaders.” I want to disclose that I too have been accused of being arrogant. Now I don’t have any good reason to be arrogant. I just find that approach works for me, but, please, keep in mind that I am an addled goose, live in rural Kentucky, and am wandering slowly toward being 66 years old. I am no sports car in today’s NASCAR ego race.

But Google! According to the write up, Google is coming across as “cocky”. I don’t want to run down the six delusions. I urge you to go directly to the write up and suck up the juiciness yourself. However, I can point to two of the examples and offer a comment.

The first is “users are hungry for Google synergy.” I am not sure what synergy means. I know that the Google platform is one that works like a giant plastic bag wrapped around the earth. The idea is to put everyone in the bag and keep them there. This is mostly complete, but about 25 percent of Web users are outside of the bag, and Google wants to get them in one way or another. Whether users want this is irrelevant. What this delusion makes clear is that Google is retrofitting public relations baloney to match what the company has been working on for about a decade. What’s interesting is that it has taken mavens, pundits, and “real” journalists some 120 months to figure out the Google game plan. Who’s delusional? Google, which has mostly accomplished its mission, or the folks just figuring out that Google has been and will continue to push the Google PR line?

The second delusion is that “Google is a worker’s utopia.” Okay, when you take money to do work, by definition, this situation is not utopia for the workers. Companies can make work less onerous or more meaningful, but it is work. I don’t think the Googlers I know are doing much more than drinking the Google Kool-Aid, trying to build their knowledge value, and getting some money. Like Apple, Google operates a reality distortion field, and, let’s face it, having Google on one’s résumé is arguably more impressive than a degree in Harry Potter studies from Frostburg College. My view is that Google manipulates its workers as effectively as it manipulates the media. Like the media, Google employees play along. It’s a game with high stakes, but it is a game. Google knows exactly what it is doing.

Now what’s the arrogance? The arrogance is not unique to Google. I call this the Math Club Syndrome. Here’s how it works. A group of folks with specialized interests and skills bond, sort of like a golf foursome from Sigma Chi fraternity. The difference is that no one understands the Math Club and most people understand and envy the Sigma Chi golf foursome. As a method of coping with a world that simply does not understand math, the math club becomes insular. The club’s rules are insider rules and act like a protective barrier. No problem until the math club becomes the first next generation supra-national company jousting on an apparently equal footing with China, the Department of Justice, and giants like Microsoft.

What do we expect from the Math Club? I expect Math Club behavior, complete with the insider jokes about janitors in patent documents. (Oh, janitors is a way of describing Google’s semi autonomous agents which “clean up” statistical anomalies in petascale flows of data. Snort, snort, get it. Janitor equals Dilbert’s garbage collector, the smartest person in the comic strip. Oh, you don’t get it? Well, there you have it. A mismatch between Math Club humor and you, gentle reader.)

My view is that it is time to quit worrying about Google’s power and time to start figuring out how to surf on Google. My column for KMWorld and this month’s column for the Smart Business Network address two different ways to surf on Google. I don’t grouse. I accept that over the last decade Google has emerged as a new ecosystem. You can’t kill it because the Googlers who leave the company spawn Google-centric entities. My last count tallied a couple of hundred of these Xoogler ventures. And Facebook is not much more than a “legacy” of Google. Maybe Facebook will become the new Google, but that won’t change the arrogance.

Math Club is congruent with arrogance. Reality. Live within it; don’t deny it.

Stephen E Arnold, March 14, 2010

No one paid me to write this article. Because I have not been paid and I refer to psychological behavior, I will report my writing for no pay to the Surgeon General who understands such esoteric notions as delusions.

Newsosaurs

March 15, 2010

I read “It’s Hard To Watch The Newsosaurs Turn A Blind Eye To Their Own Extinction” right after I flipped through the New York Times’s Sunday magazine clone from the Wall Street Journal outfit. Let me comment on each information MIRV and offer a couple of observations from my search vantage point.

First, TechCrunch’s write up has a killer comment:

Everyone wants to wall off the Web and keep grazing on declining ad revenues.

I agree. This is a combination of fear, anger, and ostracism. I enjoy pointing out that in the information economy, the traditional giants no longer own the country club. Each day, the former owners find their future will be as caddies to the new information elite. This is, I suppose, a bitter pill to swallow. The TechCrunch article includes the much-quoted “burn the boats” admonition from one of the early superstars of the zippy-doo Web that is not the cat’s pajamas. As with Google’s advice to a struggling industry, the listeners believe that their outfits have already burned the boats, embraced technology, and reinvented themselves. This mismatch between advice and its perception is characteristic of the domain collision that is now taking place. The passage that caught my attention in the TechCrunch write up was:

The longer media companies wait, the bigger disadvantage they will have when they cross over to the other side and find a whole new host of competitors who never had any print legacy businesses to protect. Those competitors right now are blogs and online news hubs who are still furry little rodents in the underbrush, but who won’t stay little forever. The sooner print media companies cross over, the sooner they can be on pure offense. Their online strategies and business models won’t be crippled by any allegiance, or need to protect, to the old print business. If they wait until their online revenues become 25 or 50 percent before they fully commit, it will be too late.

I don’t disagree with the thought. I disagree with the “will be too late.” It is too late.

The example to which I refer is the oversized, glossy, 80-plus-page WSJ Magazine filled with “reading.” Well, that’s interesting. I just counted about 32 pages of ads plus a number of features for which it is tough for me to determine whether they were placed for consideration or are actual editorial. The stories focused on cars and fashion with a profile tossed in for good measure.

I remember being told by my Financial Times’s delivery agent before I dropped my print subscription that he tossed the magazine insert because it was too much of a hassle. I wonder if my delivery person for my Saturday WSJ will follow the same path.

Did I read any of the stories? The answer is, “No.” None of them appealed to me. I have a person who works for me who drives a Mini Cooper, and it seems to have constant tire problems. I am tired of with-it executives who overcame hardship. Who hasn’t? Fashion? Not interested. I wear black Travel Smith jackets, black never-wrinkle pants, and black shoes that do not set off any alarms anywhere I travel. Spare me the trendy. Was there any financial info, business intelligence, or juicy insights into making money grow? Nope. The WSJ added sports, and now it is adding a New York Times’s magazine type publication every couple of months.

What’s my take?

  1. WSJ is going after the NYT advertisers. That’s okay, but the effectiveness of print ads has to be demonstrable. That might be tough unless the editorial product provides some content consideration. The boundary between an auto story and an advertiser might be getting a few molecules narrower, might it not?
  2. The problem with traditional media is not content; the problem is finance and business models. Offering me 30 pages of ads in 80 pages of paper is somewhat 17th century in today’s world.
  3. The Financial Times’s last home delivery offer to me was $50 a year. Will the Wall Street Journal face the same subscription challenge as readers discover that blending sports, Details magazine editorial, and business profiles might be out of step with what subscribers like me do on a Saturday?

Now search? How will I be able to locate the Gucci suit on the WSJ Web site? Answer: Not until the WSJ figures out image indexing and some other search tricks. I bet that when the iPad version of the WSJ Magazine comes out I will be able to click on a suit and see a map of locations where I can buy a suit that will fit most 20 year old soccer players. Maybe for some folks. Not for me.

Stephen E Arnold, March 14, 2010

No one paid me to write this article. I will report a failure to charge for my writing to the editor of the Army Times, an outfit focused on information in the modern world.
