Google: A Utility Company

February 12, 2009

There are several news stories bopping around the datasphere this morning (February 12, 2009). The gist of these stories is that the GOOG bought an abandoned paper mill in lovely rural Finland. The speculation about the intent of this acquisition is all over the map. Here’s a synopsis of the items I found interesting. If you want the kicker to this post, skip the dot points and jump to the final paragraph. Now the news and analysis that feed naive minds:

  • Reuters reported here that the former Stora plant will be a data center.
  • PaidContent.org and the Washington Post reported here that the Stora Enso facility consumed 1,000 gigawatt hours of power.
  • Global Markets reported here that part of the site will be transferred to the city of Hamina for other industrial uses.

The missing item is that Google’s patent document 20080209234, “Water Based Data Center,” contains interesting language describing systems and methods that can be applied to traditional power generation facilities. I wish I knew how to translate “Google Power & Light” into Finnish. Google Translate yields this phrase: “google teho ja kevyiden yritys”. I prefer GPL myself. I pronounce the acronym as “goo-pull”.

You can poke around other Google documents here. The “site:” instruction may be helpful too.

Stephen Arnold, February 12, 2009

IBM: Covering Its Bases

February 12, 2009

The love affair between IBM and Microsoft cooled years ago. After the divorce, Microsoft took the bank account and the personal computer industry. IBM entered counseling and emerged a consulting firm with some fascination for its former vocation as the world’s leading computer and software company.

Jump to the present day. IBM has batted its eyes at Googzilla. IBM and Google have teamed to stimulate the flow of programmers from universities. You can refresh your memory here. In 2008, I received a copy of a letter to an intermediary that said, in part, “we understand Googzilla quite well.” The outfit interested in this answer was not the addled goose. The interested party was a certain government agency. That outfit was not confident IBM understood Googzilla fully. I wrote about this in a Web log story last year. You can find the article here.

IBM issued a news release here that has been picked up by various information services. The headline makes clear that IBM is not going steady with the GOOG: “IBM to Deliver Software via Cloud Computing With Amazon Web Services”. You can read the full article here. In a nutshell, IBM wants to

make available new Amazon Machine Images (AMIs) at no charge for development and test purposes, enabling software developers to quickly build pre-production applications based on IBM software within Amazon EC2. The new portfolio will over time extend to include Service Management capabilities from IBM Tivoli software for Amazon EC2 to help clients better control and automate their dynamic infrastructures in the cloud.
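In practical terms, “development and test purposes” means booting one of the IBM images on EC2 like any other AMI. Here is a minimal sketch using the boto3 AWS SDK for Python (a modern library, so anachronistic for 2009, and the image id is a placeholder since the release does not list the actual IBM AMI ids):

    # Sketch: launch a development instance from a hypothetical IBM AMI.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId="ami-00000000",   # placeholder for an IBM dev/test image
        InstanceType="m1.small",  # modest capacity for pre-production work
        MinCount=1,
        MaxCount=1,
    )
    print("Launched", response["Instances"][0]["InstanceId"])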

The idea is a good one. But the significance of this deal is that IBM is making clear to the GOOG that a certain someone is no longer numero uno. IBM is playing the field. Amazon has outpaced Google in some cloud services while spending a fraction of the billions Google has invested. What’s IBM know that I don’t?

Stephen Arnold, February 12, 2009

Email Alert Web Site Online

February 12, 2009

A new Web site launching Monday will send you e-mail alerts. Crap, I Missed It! is somewhat similar to Yotify.com (the Beyond Search addled goose wrote about Yotify.com here). CrapIMissed wants to distinguish itself by focusing on “sweet spot” alerts with info unavailable elsewhere, such as Amazon.com bestsellers, new audio CDs, upcoming concerts, and so on. This isn’t like Google Alerts. You don’t enter a couple of keywords and get an info dump as-it-happens or daily for every search term. Crap rolls your request into a single e-mail message per day compiled from only new information. There’s no account creation, there’s a reference archive of your alerts, and there’s a one-step unsubscribe. For us goslings who suffer daily information overload, it may be worth a try.
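The “one message per day, new items only” model is simple to picture in code. A minimal sketch, assuming a hypothetical fetch_items() source and a local file to remember what has already been sent:

    # Sketch of a once-a-day digest: mail only the items not seen before,
    # all rolled into a single message. fetch_items() is a stand-in for
    # whatever source (bestsellers, concerts, CDs) the alert watches.
    import json
    import smtplib
    from email.message import EmailMessage
    from pathlib import Path

    SEEN_FILE = Path("seen_items.json")

    def fetch_items():
        # Placeholder: return (id, title) pairs from an alert source.
        return [("b1", "New bestseller"), ("c7", "Upcoming concert")]

    def send_daily_digest(recipient):
        seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()
        new_items = [(i, t) for i, t in fetch_items() if i not in seen]
        if not new_items:
            return  # nothing new today means no e-mail at all
        msg = EmailMessage()
        msg["Subject"] = "Your daily alert: %d new item(s)" % len(new_items)
        msg["From"] = "alerts@example.com"
        msg["To"] = recipient
        msg.set_content("\n".join(title for _, title in new_items))
        with smtplib.SMTP("localhost") as server:
            server.send_message(msg)
        SEEN_FILE.write_text(json.dumps(sorted(seen | {i for i, _ in new_items})))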

Jessica Bratcher, February 12, 2009

Yahoo Monetizes BOSS Soon

February 12, 2009

A happy quack to the reader who alerted me to Yahoo’s business model for BOSS (Build Your Own Search Service). BOSS has been “free”. The reasons range from building traffic to expanding the reach of Yahoo. Free is generally good. Spidering the Web and indexing content can be expensive. Yahoo knows this first hand. Certain vendors have been quick to embrace BOSS because the price was right. Besides, Yahoo’s own search results often leave me unsatisfied. I have written about Cluuz.com because the company uses BOSS, and in my opinion the firm’s technology makes much better use of the Yahoo index than Yahoo itself does. Now Yahoo wants to charge outfits like Cluuz.com, a small Canadian company, to use BOSS. The news story in Network World here said:

Once Yahoo introduces BOSS fees towards mid-2009, it will also increase the number of search results an engine can obtain via a single API call to 1,000 from 50. The fees vary depending on the type and quantity of search result involved. Yahoo will also offer SLAs to promote the creation of more sophisticated BOSS search engines.
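For context, a BOSS query is a plain HTTP request. A minimal sketch against the BOSS v1 web search endpoint as Yahoo documented it at the time (the service has long since been retired; the appid is a placeholder, and the field names follow the v1 documentation):

    # Sketch of a Yahoo BOSS v1 web search call. "count" is the per-call
    # result limit the Network World story says will rise from 50 to
    # 1,000 under the fee plan.
    import json
    from urllib.parse import quote
    from urllib.request import urlopen

    APP_ID = "YOUR_BOSS_APPID"  # placeholder

    def boss_search(query, count=50):
        url = ("http://boss.yahooapis.com/ysearch/web/v1/" + quote(query) +
               "?appid=" + APP_ID + "&format=json&count=%d" % count)
        with urlopen(url) as resp:
            data = json.load(resp)
        return data["ysearchresponse"]["resultset_web"]

    for hit in boss_search("content processing"):
        print(hit["title"], hit["url"])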

Yahoo assumes that some of the outfits using BOSS will monetize their services. I think that if a company using Yahoo’s BOSS could monetize its services, the company would be monetizing now. The economy is a bit shaky and among the hardest hit are small search and content processing companies. Yahoo and others of its ilk give away search systems. Presumably Yahoo perceives a revenue win.

In my opinion, BOSS users may start looking around for ways to shift from BOSS to another service. Microsoft, are you listening? The GOOG never listens but maybe the idea of tweaking Yahooligans again might spark some activity.

Maybe this is a brilliant play that will reverse Yahoo’s search fortunes? I hope so. I am uncomfortable watching the company follow the trajectory of Ask.com and America Online, among others in the information processing game.

Stephen Arnold, February 12, 2009

Exalead Accelerates

February 12, 2009

Exalead is flexing its muscles by powering search for the famous Wellcome Trust Sanger Institute (http://www.sanger.ac.uk). Sanger does genome research involving large scale sequencing and analysis, like the Human and Cancer Genome projects. The institute needed a powerful search solution for its monstrous scientific data requirements, and Exalead offered a scalable, flexible program that can handle a lot of abuse: the data quantity at Sanger grows by about 120 million records annually. Now a simple text request will return results from a pool of 500+ million records, a mere puddle compared to the projected 20 billion files over time. You can read the case study here. Looks like Exalead has a huge task ahead of it. If it succeeds, it will be indexing possibly one of the biggest public databases in the world. A happy quack to the Exalead team.

Jessica West Bratcher, February 12, 2009

Business Intelligence: A Useful and Intelligent Review

February 11, 2009

A happy quack to the reader who alerted me to Stephen Swoyer’s “How BI Habits Are Changing” here. In general, I agree with his points. What impressed me was the jargon free analysis of the changes in business intelligence. For me, the most interesting comment was:

“SAP has a lot of account control in SAP shops that makes up-selling to Business Objects possible. With IBM, it seems the WebSphere and consulting services bring influence,” she [Cindi Howson, founder of BIScorecard.com] says. “Customers that are only using DB2 — so that’s all that IBM is in the account — don’t seem to be looking at Cognos in a new light. So again, it really depends who has strong account control…” [emphasis added]

You may find other nuggets in Mr. Swoyer’s write up. I don’t get too excited about business intelligence, but I think one can discern a movement in this market sector that may provide some hint of what lies in the future for enterprise search and content processing. Worth reading.

Stephen Arnold, February 11, 2009

More SEO Oh, Oh

February 10, 2009

Years ago when I worked at Ziff Communications I listened to John Dvorak. He was an insightful thinker, and his expertise was not leveraged at Ziff. I left the company coincident with Bill Ziff’s announcement of his intention to sell his empire. Mr. Dvorak continued to work on Ziff properties after the break up. I have followed his comments over the years, and I regret that I was probably a suit in a crowd in meetings we both attended.

This morning I read PCMag.com’s “SEO Fiascoes: The Trouble with Search Engine Optimization” and noticed that he wrote the article. The article does a good job of pointing out what I have long known. The statement “…from what I can tell its proponents are modern snake-oil salesmen” is consistent with my research findings.

I steer clear of the SEO crowd. I gave talks at several SEO conferences six or seven years ago, and it was clear to me that this group of “experts” was promising Web site traffic by tricking the indexing subsystems. I mentioned this in The Google Legacy (2005), and that accomplished one thing: no SEO conference organizer wanted me on their program.

Please, navigate here and read Mr. Dvorak’s column. I am not going to summarize it. He is a good writer, and an addled goose can do little to make his message more clear. A happy quack for his taking a stand on an issue and, indirectly, for calling out the consultants who snooker clients into paying to get traffic without investing in high value content.

Stephen Arnold, February 10, 2009

Patent Search from Perfect Search

February 10, 2009

The Perfect Search Corp. Google patent search attracted significant traffic on February 9, 2009. If you have not tried the system, navigate to http://arnoldit.perfectsearchcorp.com and then open a new window or tab in your browser. You will want to point that new tab or window at http://www.google.com/patents. Now run the following query in each system: programmable search engine.
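If you would rather script the side-by-side comparison, here is a tiny sketch that opens both systems (Google Patents accepted a “q” query parameter at the time; the Perfect Search site is opened at its landing page because its query URL scheme is not documented here):

    # Sketch: open both patent search systems for the same query.
    import webbrowser
    from urllib.parse import quote_plus

    QUERY = "programmable search engine"
    webbrowser.open("http://www.google.com/patents?q=" + quote_plus(QUERY))
    webbrowser.open("http://arnoldit.perfectsearchcorp.com")  # type the query here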

Here’s what Google displays to you:

[Screenshot: Google Patents results for the query “programmable search engine”]

The Google results force me to track down the full patent document to figure out if it is “sort of” what I want.

Run the same query, programmable search engine, in the ArnoldIT.com Perfect Search system and you get these results:

[Screenshot: ArnoldIT.com Perfect Search results for the same query]

The ArnoldIT.com and Perfect Search collection makes it easy to decide if a hit is germane. A single click delivers the PDF of the patent document. Everything is clear and within a single interface.

Try these services and run the same query. You will find that you can dig into Google’s technology quickly and without the silly extra steps that other services insist upon. The USPTO search system is here. How about that USPTO search interface? I don’t want to name the vendor who provides this system, and I don’t want to call attention to this company’s inability to make a user-accessible system. FreePatentsOnline.com is here. Do you understand the results and how you access the various parts of a patent document? I do, but it takes quite a bit of work.

Notice the differences. First, the abstract or summary of the patent is more useful because it contains substantive detail. Second, the key words in the query appear in bold face, making it easy to spot the terms from your query. Third, notice the link to the PDF file. You don’t see fragments of the patent document. You get one click access to the full patent document plus the diagrams, if any. Fourth, because the collection includes only Google patent documents, you can easily explore the technical innovations that often “fly under the radar” of the Google watchers who deal with surface issues.

Information about the Perfect Search system is here. You can read an interview with one of the Perfect Search senior engineers, Ken Ebert, here. A happy quack to the Perfect Search team for contributing to this free Google patent document search and full text delivery service. Right now the system includes the Google patent documents that I have been able to identify in the course of my research into Google’s technical infrastructure. I cannot say with certainty that this collection has every Google patent application and granted patent. If you know of a Google patent document I have overlooked, please let me know. I am not an attorney, so take my advice: don’t use this system as your only source of Google patent information. It’s free and not the whiz bang service that West and Lexis provide for a fee. A hefty fee, I might add.

Stephen Arnold, February 10, 2009

More Google Layoffs

February 10, 2009

Interesting article at http://blog.searchenginewatch.com/blog/090208-230735 concerning the GOOG and quiet employee cuts. According to the blog post, some industry people are all a-Twitter with the news, acting like it was unexpected. But Google has already been in the news about some paring down that included RIFs (engineers too), the apparent closure of its office in Trondheim, Norway (http://arnoldit.com/wordpress/2009/01/15/google-shutters-trondheim-office/), and the announcement that it was reorganizing the Google Israel offices (http://arnoldit.com/wordpress/2009/02/08/google-israel-re-googles/). With the economy the way it is, why are people surprised that even the GOOG would have a deadwood reduction plan? As a side note, one of the addled geese here at Beyond Search wrote about the great Twitter versus Google battle. You can read that article here: http://arnoldit.com/wordpress/2009/02/09/google-threatened-by-twitter/. Maybe Twitter’s the as-it-happens news source, replacing the you-were-there of the dead tree crowd.

Jessica W. Bratcher, February 10, 2009

Semantic Engines’ Dmitri Soubbotin: Exclusive Interview

February 10, 2009

Semantics are booming. Daily I get spam from the trophy generation touting the latest and greatest in semantic technology. A couple of eager folks are organizing a semantic publishing system and gearing up for a semantic conference. I think these efforts are admirable, but I think that the trophy crowd confuses public relations with programming on occasion. Not Dmitri Soubbotin, one of the senior managers at Semantic Engines. Harry Collier and I were able to get the low-profile wizard to sit down and talk with us. Mr. Soubbotin’s interview with Harry Collier (Infonortics Ltd.) and me appears below.

Please, keep in mind that Dmitri Soubbotin is one of the world class experts in search, content processing, and semantic technologies who will be speaking at the April 2009 Boston Search Engine Meeting. Unlike fan-club conferences or SEO programs designed for marketers, the Boston Search Engine Meeting tackles substantive subjects in an informed way. The opportunity to talk with Mr. Soubbotin or any other speaker at this event is a worthwhile experience. The interview with Mr. Soubbotin makes clear the approach the conference committee takes for the Boston Search Engine Meeting: substance, not marketing hyperbole, is the focus of the two day program. For more information and to register, click here.

Now the interview:

Will you describe briefly your company and its search / content processing technology?

Semantic Engines is mostly known for its search engine SenseBot (www.sensebot.net). The idea of it is to provide search results for a user’s query in the form of a multi-document summary of the most relevant Web sources, presented in a coherent order. Through text mining, the engine attempts to understand what the Web pages are about and extract key phrases to create a summary.

So instead of giving a collection of links to the user, we serve an answer in the form of a summary of multiple sources. For many informational queries, this obviates the need to drill down into individual sources and saves the user a lot of time. If the user still needs more detail, or likes a particular source, he may navigate to it right from the context of the summary.

Strictly speaking, this is going beyond information search and retrieval – to information synthesis. We believe that search engines can do a better service to the users by synthesizing informative answers, essays, reviews, etc., rather than just pointing to Web sites. This idea is part of our patent filing.

Other things that we do are Web services for B2B that extract semantic concepts from texts, generate text summaries from unstructured content, etc. We also have a new product for bloggers and publishers called LinkSensor. It performs in-text content discovery to engage the user in exploring more of the content through suggested relevant links.
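[What Mr. Soubbotin describes maps onto classic extractive summarization. A toy sketch of the general idea, not SenseBot’s actual pipeline: score sentences by the frequency of their content words across all sources, then keep the best ones in their original order.

    # Toy extractive multi-document summarizer. Real systems, SenseBot
    # included, use far richer text mining; this shows only the shape.
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on"}

    def summarize(documents, max_sentences=5):
        sentences = [s.strip()
                     for doc in documents
                     for s in re.split(r"(?<=[.!?])\s+", doc)
                     if s.strip()]
        freq = Counter(w for s in sentences
                       for w in re.findall(r"[a-z']+", s.lower())
                       if w not in STOPWORDS)

        def score(sentence):
            tokens = re.findall(r"[a-z']+", sentence.lower())
            return sum(freq[t] for t in tokens if t not in STOPWORDS) / (len(tokens) or 1)

        top = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
        return " ".join(s for s in sentences if s in top)  # original order

— Ed.]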

What are the three major challenges you see in search / content processing in 2009?

There are many challenges. Let me highlight three that I think are interesting:

First, Relevance: Users spend too much time searching and not always finding. The first page of results presumably contains the most relevant sources. But unless search engines really understand the query and the user intent, we cannot be sure that the user is satisfied. Matching words of the query to words on Web pages is far from an ideal solution.

Second, Volume: The number of results matching a user’s query may be well beyond human capacity to review them. Naturally, the majority of searchers never venture beyond the first page of results – exploring the next page is often seen as not worth the effort. That means that a truly relevant and useful piece of content that happens to be number 11 on the list may become effectively invisible to the user.

Third, Shallow content: Search engines use a formula to calculate page rank. SEO techniques allow a site to improve its ranking through the use of keywords, often propagating a rather shallow site up on the list. The user may not know if the site is really worth exploring until he clicks on its link.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

Not understanding the intent of the user’s query and matching words syntactically rather than by their sense – these are the key barriers preventing search engines from serving more relevant results. NLP and text mining techniques can be employed to understand the query and the Web page content, and come up with an acceptable answer for the user. Analyzing Web page content on the fly can also help in distinguishing whether a page has value for the user or not. Of course, the infrastructure requirements would be higher when semantic analysis is used, raising the cost of serving search results. This may have been another barrier to broader use of semantics by major search engines.

What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?

Smarter, more intelligent software. We use text mining to parse Web pages and pull out the most representative text extracts of them, relevant to the query. We drop the sources that are shallow on content, no matter how high they were ranked by other search engines. We then order the text extracts to create a summary that ideally serves as a useful answer to the user’s query. This type of result is a good fit for an informational query, where the user’s goal is to understand a concept or event, or to get an overview of a topic. The closer together the source documents are (e.g., in a vertical space), the higher the quality of the summary.
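[The “drop the sources that are shallow on content” step can be imagined as a simple density test. A hedged sketch, not Semantic Engines’ method: reject pages whose visible text is too short or too thin relative to the raw markup.

    # Toy shallow-content filter: strip tags, then require a minimum
    # amount of running text relative to the raw page size. A real
    # system would look at sentence structure, not byte counts.
    import re

    def is_shallow(html, min_words=150, min_text_ratio=0.05):
        text = re.sub(r"<[^>]+>", " ", html)   # crude tag stripping
        words = re.findall(r"\w+", text)
        ratio = len(" ".join(words)) / max(len(html), 1)
        return len(words) < min_words or ratio < min_text_ratio

— Ed.]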

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

More and more, people expect to have the same features and user interface when they search at work as they get from home. The underlying difference is that behind the firewall the repositories and taxonomies are controlled, as opposed to the outside world. On one hand, it makes it easier for a search application within the enterprise as it narrows its focus and the accuracy of search can get higher. On the other hand, additional features and expertise would be required compared to the Web search. In general, I think the opportunities in the enterprise are growing for standalone search providers with unique value propositions.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major breakthrough over the next 36 months?

I think the use of semantics and intelligent processing of content will become more ubiquitous in 2009 and further. For years, it has been making its way from academia to “alternative” search engines, occasionally showing up in the mainstream. I think we are going to see much higher adoption of semantics by major search engines, first of all Google. Things have definitely been in the works, showing as small improvements here and there, but I expect a critical mass of experimenting to accumulate and overflow into standard features at some point. This will be a tremendous shift in the way search is perceived by users and implemented by search engines. The impact on the SEO techniques that are primarily keyword-based will be huge as well. Not sure whether this will happen in 2009, but certainly within the next 36 months.

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

I expect to see higher proliferation of Semantic Web and linked data. Currently, the applications in this field mostly go after the content that is inherently structured although hidden within the text – contacts, names, dates. I would be interested to see more integration of linked data apps with text mining tools that can understand unstructured content. This would allow automated processing of large volumes of unstructured content, making it semantic web-ready.

Where can we find more information about your products, services, and research?

Our main sites are www.sensebot.net and www.semanticengines.com. LinkSensor, our tool for bloggers/publishers, is at www.linksensor.com. A more detailed explanation of our approach, with examples, can be found in the following article: http://www.altsearchengines.com/2008/07/22/alternative-search-results/.

Stephen Arnold (Harrod’s Creek, Kentucky) and Harry Collier (Tetbury, Glou.), February 10, 2009
