Maybe the Google Fatal Flaw Revealed
March 4, 2009
Mashable, the go-to Web log for interesting cloud applications, has a blockbuster of an article here. “Why Googlers Are Leaving to Start Social Sites (And Invites to One of Them)” reveals a flaw at Google that is likely to get worse before Googlers address the issue. The comment that triggered this post was:
According to Reddy, “Most Google infrastructure is based on the original search thinking that scaling is done by using lots of cheap hardware using software layers to protect against machine failures. While this works really well for certain problem classes, there is a “scalability and complexity tax” which most new services pay in terms of development speed, even though they don’t need it in the initial phases.” The reason she sees more opportunity with Likaholix is because Google can’t always leverage open source tools due to infrastructure limitations, where Likaholix can “leverage as many open source tools as possible,” admitting that they “could not have made this much progress over the last 7 months or so in terms of product, UI and engineering if we were to build this at Google.”
Quite an interesting point. If this statement is accurate, Google’s not inept; Google is hamstrung by the wizardry that catapulted it to a dominant position in Web search. The Innovator’s Dilemma comes alive.
Stephen Arnold, March 5, 2009
Yahoo BOSS Queries per Second
March 4, 2009
Update: March 4, 2009, 5:30 pm Eastern. A relevant link from Lemur Consulting: http://www.flax.co.uk/blog/2009/03/04/performance-metrics/
A number of readers have commented via the blog feedback and by email about the Autonomy metrics I summarized here. To provide some baseline data, I dipped into my search archive and located an item that appeared in Search Engine Land in December 2008. I don’t know if these data are accurate, but judging from the feedback on the Autonomy metrics, readers are not shy about providing other data points. You can find the “Yahoo BOSS Now Serving 100 Queries per Second” write up here.
I also had a copy of a presentation given in 2004 by Gurmeet Singh, Information Sciences Institute. You can still find those data here. What is interesting about these data is that a Web interface chops down the query per second rate. In the 2004 report, 8,000 queries per second were achieved on the test system without the Web interface. Combine a large database with a Web interface and the QPS rate drops to hundreds of queries per second. Complex queries knock performance down as well. The 2004 data hit 800 queries per second with greater drop offs when the database is larger. As I reviewed these 2004 data, I recalled reading Google technical documents about the importance of optimizing for Web interfaces. Google’s engineers must have seen similar performance in their own work, which may have influenced the “speed” angle of Chrome. Who knows? Google won’t tell me.
You will have to draw your own conclusions about:
- Autonomy’s performance data cited above
- The validity of the ISI data
- The computational capability of Yahoo BOSS
- Google’s 2000 queries per second referenced in my Autonomy summary.
In my experience, metrics for search systems are difficult and expensive to determine. The variables have to be squeezed out when comparing systems. The notion of an apples to apples comparison is difficult in today’s financial climate. I take most performance data with a liberal amount of nuoc cham.
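To show why the variables matter, here is a minimal sketch of how a queries per second figure might be measured. Everything in it is an assumption for illustration: `measure_qps`, `toy_search`, and the dictionary "index" are hypothetical stand-ins, not any vendor's test harness. A real benchmark would also have to control for index size, query complexity, concurrency, and the Web interface overhead discussed above.

```python
import time

def measure_qps(run_query, queries, warmup=10):
    """Return queries per second for run_query over the given queries."""
    # Warm-up pass so lazy initialization does not skew the timed run.
    for q in queries[:warmup]:
        run_query(q)
    start = time.perf_counter()
    for q in queries:
        run_query(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed

# Hypothetical stand-in for a search back end: a plain dictionary lookup.
index = {f"term{i}": [i] for i in range(1000)}

def toy_search(q):
    return index.get(q, [])

queries = [f"term{i % 1000}" for i in range(5000)]
qps = measure_qps(toy_search, queries)
print(f"{qps:,.0f} queries per second (toy in-memory lookup, no Web interface)")
```

The number this prints says nothing about a production search system. It only shows that a QPS figure is meaningless without a statement of what was queried, how, and through which layers.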
Stephen Arnold, March 4, 2009
YAGG: Google Groups
March 4, 2009
ZDNet Web Logs reported here that “Archived discussions on all Google Groups were unavailable for a short time this afternoon.” Groups are social. Social is hot. Might be a good idea to keep these services up and running. And don’t beat up on me. I’m not a Googler, just an addled goose. An old addled goose. I am just pointing to the post by Ed Burnette. If you have forgotten, YAGG is yet another Google glitch. Coming with shorter intervals between each event. What do you think? Normal? Signs of deterioration? Growing pains? Indifference? I am clueless.
Stephen Arnold, March 4, 2009
SEO Cheat Sheet
March 4, 2009
I never thought much about cheating in school. I just grunted along and took what grades I earned. For readers who do have a fondness for cheat sheets, short cuts, and line jumping, here’s a link for you–“The Web Developer’s SEO Cheat Sheet” by Danny Dover. There’s a link on the site to a PDF version of the “cheat sheet”. You will learn about “important” tags, indexing limits aka stay under the stuffing ceilings, syntax, url conventions, redirects, bot factoids, bot traps, robots.txt syntax, and sitemap syntax. I scanned the tips and concluded that this is less of a “cheat sheet” than a check list to avoid silly errors. One person’s cheat sheet is another person’s reminders.
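For readers who have not looked at robots.txt syntax, here is a small sketch using Python's standard `urllib.robotparser` module. The rules, the bot name `ExampleBot`, and the example.com URLs are made up for illustration and are not taken from Mr. Dover's cheat sheet.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules: keep all bots out of /private/, allow the rest.
rules = """\
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

home_ok = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
private_ok = parser.can_fetch("ExampleBot", "http://www.example.com/private/report.html")
print(home_ok, private_ok)
```

A check like this is one way to catch the silly error of blocking pages a site owner wants indexed.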
Stephen Arnold, March 4, 2009
Twitter: For the Poor, the Downtrodden
March 4, 2009
If this post is accurate, Twitter is “the poor” person’s email. I read this and thought of the Statue of Liberty. Not sure why. Twitter may be like the arrival from another land who showed up, worked hard, and defined a category of search; specifically, real time search. The story “Google CEO: Twitter A ‘Poor Man’s Email System’” by Dan Frommer strikes me as accurate and eerily authentic. For me the key segment of the article was:
I think the innovation is great. In Google’s case, we have a very successful instant messaging product, and that’s what most people end up using.
If Mr. Frommer had not labeled this as a statement attributed to chief Googler Eric Schmidt, I would have hooked the statement to a Microsoft executive. Several comments:
- I don’t use Twitter but I use http://search.twitter.com. It is useful and it beats Google to the news punch on certain topics by minutes and many times by hours.
- The demographic of Twitter users strikes me as similar to what Google’s user base was prior to the consumerization of search after the Google IPO. In short, Twitter has to be viewed as an important service, and it is an important service attracting high profile people who talk about the Twitter service.
- The assumption that Twitter users will switch to Google’s system is possible, but I think Twitter has some decent legs. Is Twitter perfect? Nope. Is it important? Yep.
Google is starting to sound like Microsoft. Google, like IBM and Microsoft before it, is showing that it has lost its ability to think and act with the agility it possessed just a few short years ago. Just my opinion.
Stephen Arnold, March 4, 2009
Shift in Online Behavior May Be Evident
March 4, 2009
Enid Burns, ClickZ, wrote “More Time Spent Online Communicating than Getting Entertained” here. I think the data summarized in her article may be harbingers of a shift for some demographic sectors. She summarizes a report from Netpop Research, so I don’t want to recycle her analysis. The most important point for me was this statement:
Time spent communicating online went from 27 percent of time online in 2006 to 32 percent in 2008. Communication, in the survey, includes activities such as e-mail, instant messaging, posting to blogs, and photo sharing. “We’re really looking to create personal relationships and communicating with people,” said Josh Crandall, managing director of Media-Screen Crandall.
Three observations:
- The Internet technology is absorbing broader human and communication functions. The pace will accelerate and saturation will occur in the foreseeable future in developed nations. Landlines are goners and the new net-based comm modes will ignite considerable change and innovation.
- The demographic push on organizations means that social functions of those connected will move to cyberspace. Implications and consequences are difficult to pinpoint. I think the impact will be significant, leaving some traditional online companies behind quickly unless these outfits adapt.
- These services will want to coalesce into what I call a natural monopoly. This too has significant implications for users, regulators, and organizations competing in this emergent ecosystem.
In short, if these data are accurate, the next revolution is underway. Save the Google, Microsoft, and Yahoo T shirts. These may become collector’s items if these firms don’t adapt to the traffic speeding down the information superhighway. Think roadkill.
Stephen Arnold, March 4, 2009
Beyond Keyword Search
March 4, 2009
An interesting tie up between LinkedIn and Twitter caught my attention. The story appeared in Search Engine Journal. Dev Basu’s “LinkedIn Teams Up with Twitter through Company Buzz” reported here that the networking service LinkedIn and the micro blogging service Twitter have teamed to offer an enterprise service. Mr. Basu wrote:
Every second thousands of people are sending out messages about topics and companies through twitter. Company Buzz lets you tap into this information flow to find relevant trends and comments about your company. Install the application and instantly see what people are saying.
This is an interesting development. Confusion about the meaning of the term “search” is commonplace. In a telephone conversation yesterday, two people on the conference call used the word “search” to describe what their organization needed. I asked each to define their understanding of the word “search”. One said, “We need to find specific data in our research reports. Not the whole document. Just the pertinent chunk.” The other said, “We need to know who knows what about a specific topic.”
The word “search” is used without much thought given to what different people mean when they throw the buzzword around.
This deal between LinkedIn and Twitter comes close to what quite a few people in the last couple of months have been describing as “search”. Key word retrieval has a place, but users want more. Will LinkedIn and Twitter dominate this market space? Hard to say. I think the deal is one to watch.
Stephen Arnold, March 4, 2009
MapReduce in a Browser: A Glimpse of the Google in 2011
March 4, 2009
I have no idea who is behind Igvita, but I will pay closer attention. You will want to read “Collaborative Map-Reduce in the Browser” here. When I read the write up and scanned the code, I thought, “Yep, this is the angle the Google is taking with Chrome, containers, and a bunch of other Googley patent documents’ ‘inventions’.” I won’t spoil your fun. For me, the most important information in the write up is the diagram. A happy quack to Igvita. Heck, have two quacks.
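For readers new to the map-reduce idea, a generic word count sketch follows. This is not the Igvita code and it does not run in a browser. The function names and the sample documents are my own assumptions, and the three phases run in one process where a real system would spread the map work across many workers.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

# Each document could be mapped by a different worker; here it is one loop.
documents = ["the goose quacks", "the goose is addled"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

The point to take away is that the map step needs no shared state, which is what makes it a candidate for farming out to many machines or, in the Igvita scenario, many browsers.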
Stephen Arnold, March 4, 2009
Autonomy IDOL Metrics
March 3, 2009
I was updating my files and noticed that the company had added metrics to its IDOL write up. You can find the information here. Among the information I noted were these points:
- Support over 470 million documents on 64-bit platforms
- Accurately index in excess of 110 GB/hour with guaranteed index commit times (I.e. how fast an asset can be queried after it is indexed) of sub 5ms
- Execute over 2,600 queries per second, with subsecond response times on a single machine with two CPUs when used against 70 million pieces of content, while querying the entire index for relevant information
- Support hundreds of thousands of enterprise users, or millions of web users, accessing hundreds of terabytes of data
- Save storage space with an overall footprint of less than 15% of the original file size.
These metrics are quite amazing. To buttress the argument, the company quotes a number of consultants. Happy customers include Satyam, a firm that has been in a bit of a swamp. The write up about Autonomy IDOL’s security support is equally remarkable. I did a calculation based on public data about Google. You can find that write up here. Notice that Autonomy’s system processes more queries per second than Google’s, if these data are accurate. If you have other metrics about Autonomy or any other search engine, feel free to post these data in the comments section of this Web log.
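To put the quoted numbers on a common footing, here is some back-of-the-envelope arithmetic. The inputs come from the Autonomy bullets above and from the Google figure of 2,000 queries per second referenced elsewhere in this Web log. The assumptions are mine: decimal units (1 GB = 1,000 MB), a load sustained around the clock, and work split evenly across the two CPUs. None of this verifies the vendor's claims.

```python
# Figures quoted in the Autonomy IDOL write up (unverified vendor data).
INDEX_RATE_GB_PER_HOUR = 110
AUTONOMY_QPS = 2_600
CPUS = 2

# Google figure referenced elsewhere in this Web log (my own estimate, unverified).
GOOGLE_QPS = 2_000

# Assumption: 1 GB = 1,000 MB and 3,600 seconds per hour.
index_mb_per_second = INDEX_RATE_GB_PER_HOUR * 1000 / 3600

# Assumption: the rate is sustained for a full 86,400-second day.
queries_per_day = AUTONOMY_QPS * 86_400

# Assumption: both CPUs are fully busy and share the load evenly.
cpu_ms_per_query = CPUS / AUTONOMY_QPS * 1000

qps_ratio = AUTONOMY_QPS / GOOGLE_QPS

print(f"Indexing rate: {index_mb_per_second:.1f} MB per second")
print(f"Queries per day: {queries_per_day:,}")
print(f"CPU time budget per query: {cpu_ms_per_query:.2f} ms")
print(f"Autonomy to Google QPS ratio: {qps_ratio:.1f}")
```

Under these assumptions each query gets well under one millisecond of CPU time against 70 million pieces of content, which is one reason the figures are so striking.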
Stephen Arnold, March 3, 2009
SEO: Good, Bad, Ugly
March 3, 2009
A happy quack to the reader who sent me a link to the February 20, 2009, article by George for Insiders View: Insurance Blow here. “More and More SEO Scams” made the statement:
It seems that there are few whitehat agencies these days. I always advocate some gray hat to stay on top and some blackhat to determine what others are doing. But this is getting ridiculous. The economic climate has pushed people out of the city so instead of brokering toxic investments, they’re now brokering SEO services.
Strong words. I had seen the About.com posting “How to Avoid Being Taken by SEO Scams and Bad SEO Companies” here, but I was not sure how widespread the problem was. Dave Taylor here made this comment in his “SEO Company Promises Top Three Positions: A Scam?”:
Of all the aspects of the Internet, none seems to be so full of con artists and purveyors of dubious businesses than so-called search engine optimization companies. The reason for this is that the basics of SEO (which I’ll call it for simplicity) are simple and can be explained in five minutes. Heck, Google even has a free guide to SEO best practices.
Image source: http://3.bp.blogspot.com/_jhSlOGUoB5k/R-1-flxJm0I/AAAAAAAAE40/y1pVNDBfyXE/s400/scam.jpg
Several thoughts:
- As the economy slides toward a financial black hole, some companies hope their Web sites can be a source of sales leads and revenue. Managers turn to their marketing advisors and Web professionals to deliver a return on the Web investment. Pressure increases.
- The dominance of Google in Web search means that a company not in the Google index does not exist in some cases. A company whose product or service does not come up on the first page of Google results may not get much traffic.
- The quality of Web sites (content, coding) becomes increasingly important. But quality takes thought, time, and effort.
When one mixes these three ingredients together, search engine optimization becomes a must. If a company can afford to buy Google AdWords, then the Web site must have compelling landing pages and the technical plumbing to make it easy for the person landing on a link to take the desired action.