Web Search Usage Statistics
October 16, 2009
comScore has responded to earlier data about Web search vendors’ September 2009 market share. I found the TechCrunch write up What 5% Drop? ComScore Says Bing Search Share Stayed Steady In September thought provoking. For me, the meat of the article was this comment:
According to comScore, Bing’s U.S. search market share remained steady at 9.4 percent in September, up from 9.3 percent in August. That is not blowing the doors off of anything, but it is at least holding its own.
I find it interesting that the estimates of traffic are viewed as absolutes. Most of the companies creating league tables use proprietary methods to generate their data. Variances are to be expected, and the margins of error can be significant. In one case in 2008, I looked at data for companies in one industry in Europe. I had “real” logs. I also had reports on traffic from a number of vendors. What I learned was that the variance between the actual logs of the sites’ traffic and the commercial league tables ran as high as 20 percent.
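To make the arithmetic concrete, here is a minimal sketch of how one might compare logged traffic against a vendor panel estimate. The sites and figures are invented for illustration; real log analysis is messier.

```python
# Hypothetical illustration: comparing "real" server-log visit counts
# against a commercial panel's traffic estimates. All numbers invented.
def percent_variance(actual: int, estimate: int) -> float:
    """Variance of an estimate relative to the actual logged figure."""
    return (estimate - actual) / actual * 100

site_traffic = {
    # site: (visits from server logs, vendor panel estimate)
    "site-a.example": (1_200_000, 1_410_000),
    "site-b.example": (850_000, 700_000),
}

for site, (actual, estimate) in site_traffic.items():
    v = percent_variance(actual, estimate)
    print(f"{site}: {v:+.1f}% variance vs. logs")
```

A 20 percent gap, as in my European data, is the difference between a site looking like a leader and looking like an also-ran.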
I don’t have an answer for the usage variances. The advantage goes to the company that can count everything and avoid statistical methods. Estimates are going to create some false impressions in my experience.
Stephen Arnold, October 16, 2009
Government Attic
October 16, 2009
A happy quack to the reader who sent me a link to Government Attic. The Web site is a repository of US government documents obtained via a Freedom of Information Act request. The site uses a Google custom search engine, which works quite well. The site says:
Governmentattic.org provides electronic copies of hundreds of interesting Federal Government documents obtained under the Freedom of Information Act. Fascinating historical documents, reports on items in the news, oddities and fun stuff and government bloopers, they’re all here. Think of browsing this site as rummaging through the Government’s Attic — hence our name.
Useful.
Stephen Arnold, October 16, 2009
Guha and the Google Trust Method Patent
October 16, 2009
I am a fan of Ramanathan Guha. I had a conversation not long ago with a person who doubted the value of my paying attention to Google’s patent documents. I can’t explain why I find these turgid, chaotic, and cryptic writings of interest. I read stuff about cooling ducts and slugging ads into anything that can be digitized, and I yawn. Then, oh, happy day. One of Google’s core wizards works with attorneys and a meaningful patent document arrives in Harrod’s Creek goose nest.
Today is such a day. The invention is “Search Result Ranking Based on Trust”, which you can read courtesy of the ever reliable USPTO by searching for US7,603,350 (filed in May 2006). Dr. Guha’s invention is described in the patent in this way:
A search engine system provides search results that are ranked according to a measure of the trust associated with entities that have provided labels for the documents in the search results. A search engine receives a query and selects documents relevant to the query. The search engine also determines labels associated with selected documents, and the trust ranks of the entities that provided the labels. The trust ranks are used to determine trust factors for the respective documents. The trust factors are used to adjust information retrieval scores of the documents. The search results are then ranked based on the adjusted information retrieval scores.
Now before you email me and ask, “Say, what?”, let me make three observations:
- The invention is a component of a far larger data management technology initiative at Google. The implications of the research program are significant and may disrupt the stressed world of traditional RDBMS vendors at some point.
- The notion of providing a “score” that signals the “reliability” or lack thereof is important in consumer searches, but it has some interesting implications for other sectors; for example, health.
- The plumbing to perform “trust” scoring on petascale data flows gives me confidence to assert that Microsoft and other Google challengers are going to have to get in the game. Google is playing 3D chess and other outfits are struggling with checkers.
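For the curious, the ranking flow in the abstract can be sketched in a few lines. This is my reading of the patent language, not Google’s implementation; the data model, the averaging of trust ranks, and the multiplicative adjustment are all my assumptions.

```python
# Sketch of the trust-ranking idea in US 7,603,350's abstract: entities
# apply labels to documents, each entity carries a trust rank, and trust
# factors derived from those ranks adjust base IR scores. The combination
# formula below (average, then multiply) is an assumption, not Google's.
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    ir_score: float                               # base information-retrieval score
    labels: dict = field(default_factory=dict)    # label text -> labeling entity

def trust_factor(trust_ranks: list) -> float:
    # Assumption: average the trust ranks of the entities that supplied
    # a matching label; documents with no matching label are unadjusted.
    return sum(trust_ranks) / len(trust_ranks) if trust_ranks else 1.0

def rank(docs, query_label, entity_trust):
    scored = []
    for d in docs:
        ranks = [entity_trust[e] for lbl, e in d.labels.items() if lbl == query_label]
        scored.append((d.ir_score * trust_factor(ranks), d.doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

Run against two documents, one labeled by a high-trust entity and one by a low-trust entity, and the high-trust document can outrank a document with a better raw IR score, which is exactly the health-sector implication noted above.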
You can read more about Dr. Guha in my Google Version 2.0. He gets an entire chapter (maybe 30 pages of 10 pt type) for a suite of inventions that make it possible for Google to be the “semantic Web”. Clever company, brilliant guy, Guha is.
Stephen Arnold, October 15, 2009
Dust Up between Libraries and Publishers Possible
October 16, 2009
The New York Times reported that some libraries are lending digital books. You will want to read the original article “Libraries and Readers Wade Into Digital Lending” yourself. For me the most important statement in the write up was:
Publishers, inevitably, are nervous about allowing too much of their intellectual property to be offered free. Brian Murray, the chief executive of HarperCollins Publishers Worldwide, said Ms. Smith’s proposal was “not a sustainable model for publishers or authors.”
I am intrigued that Microsoft and Yahoo pulled out of the digital book game. Google faces tough sledding but a compromise seems to be possible. Even government national libraries are slow to the starting line.
The world’s traditional book foundations seem to be under increasing stress. Exciting. The business model of libraries is about to collide with the business model of publishers. After centuries of living in harmony, friction seems to be increasing.
Stephen Arnold, October 17, 2009
Google Probes the Underbelly of AutoCAD
October 15, 2009
Remember those college engineering wizards who wanted to build real things? Auto fenders, toasters, and buildings in Dubai. Chances are the weapon of choice was a software product from Autodesk. Over the years, Autodesk added features and functions to its core product and branched out into other graphic areas. In the end, Autodesk was held captive by the gravitational pull of AutoCAD.
In one of my Google monographs, I wrote about Google’s SketchUp program. I recall several people telling me that SketchUp was unknown to them. These folks, I must point out, were real, live Google experts. SketchUp was a blip on a handful of users’ radar screens. I took another angle of view, and I saw that the Google coveted the engineering wizards when they were in primary school and had a method for keeping these individuals in the Google camp until they designed their last, low-cost fastener for a green skyscraper in Shanghai.
No one really believed that this was possible.
My suggestion is that some effort may be prudently applied to rethinking what the Google is doing with engineering software that makes pictures and performs other interesting Googley tricks. The first step could be reading the Introducing Google Building Maker article on the “official” Google Web log. I would gently suggest that the readers of this Web log buy a copy of the Google trilogy, consisting of my three monographs about Google technology. Either path will give you some food for thought.
For me, the most interesting comment in the Google blog post was:
Some of us here at Google spend almost all of our time thinking about one thing: How do we create a three-dimensional model of every built structure on Earth? How do we make sure it’s accurate, that it stays current and that it’s useful to everyone who might want to use it? One of the best ways to get a big project done — and done well — is to open it up to the world. As such, today we’re announcing the launch of Google Building Maker, a fun and simple (and crazy addictive, it turns out) tool for creating buildings for Google Earth.
The operative phrase is “every built structure on Earth”. How is that for scale?
What about Autodesk? My view is that the company is going to find itself in the same position that Microsoft and Yahoo now occupy with regard to Google. Catching up is impossible. Leapfrogging is the solution. I don’t think the company can make this type of leap. Just my opinion.
Stephen Arnold, October 15, 2009
Another freebie. Not even a lousy Google mouse pad for my efforts.
How to Make Money with Google AdSense Video Released
October 15, 2009
You can watch a four minute video that provides a quick primer on how to make money with AdSense. To view the video, navigate to ArnoldIT.com and click on the Video link or click here. The video has been produced by the ArnoldIT.com team to fill a gap in the flood of information about AdSense. “The idea,” said Stephen E. Arnold, “was to put in one place a quick overview and links that a person needs to get started with AdSense. My hope is that libraries will point patrons who want to find possible business ideas to these videos.” He added, “Google provides the information, but we learned from our client work that a quick overflight of the Google money making options was needed.”
The video series was announced at the International Computers in Library Conference in London, England today, October 15, 2009. In his talk, Mr. Arnold said, “Google offers the same type of opportunity for third parties as did Microsoft in the early 1980s. In these tough economic times, an understanding of the revenue potential the Google platform provides is a prudent business step.”
Five more videos in the “How to Make Money with Google” series will be released in the coming weeks. A person looking for extra revenue or a way to build a new career by focusing on the opportunities presented by the Google platform can view one or more of these videos to get ArnoldIT.com’s view about what Google offers.
The next free video “Search Engine Optimization Consulting” will be released at the end of October 2009. Other free videos in the series cover writing programs for the Google platform, becoming a Google partner, and introductory and wrap up videos.
The videos are provided without charge for two reasons. According to Mr. Arnold, “We received client questions and spam promising “get rich quick” schemes regarding Google. I decided it would be a useful exercise to produce brief, factual videos to make clear that Google is a significant opportunity for motivated individuals, organizations, and commercial enterprises. Many people see Google as a one trick pony, even though Google has matured into a platform for programmers, consultants, and computer service businesses.”
The full series will be out by the end of November 2009 and can be viewed as individual videos or as a 35 minute program. ArnoldIT.com is not affiliated with Google. The videos were designed and funded by Stephen E. Arnold.
Jessica Bratcher, October 15, 2009
No one paid for this write up.
Google Wants to Be a Media Company = Content Delivery Network Rumors
October 15, 2009
Barron’s is one of those business newspapers that blends caution with molecules of nouns to whip investors into a frenzy of uncertainty. Barron’s “Akamai Rallies on Rumor of Google Bid” is an interesting write up. CDNs or content delivery networks are complicated. Akamai has proprietary technology, legions of ISPs on board, and nifty methods for getting popular content to a user quickly. An investor type, who actually bought me lunch at Taco Bell, floated this idea past me. I pointed out:
- Akamai is a sophisticated outfit
- Akamai has plumbing in place and on-board ISPs who get financial and bandwidth benefits from their support of the Akamai methods. These involve the injection of smart bits in packets and some other magic
- Video is becoming the method of communication in the emerging semi-literate world of the US of A
- Companies with a plan to be a media giant can benefit from owning an Akamai or similar outfit because it generates revenue and provides a convenient way to slash certain operational costs.
Barron’s said:
Briefing.com notes that AKAM calls are seeing buying interest this morning amid “GOOG for AKAM chatter.” I’m not sure that Google really wants to be in the content delivery network business, particularly given a spreading view on the Street that AKAM’s results could be hurt by intensifying pricing pressure in the CDN market. But clearly, somebody believes the rumor.
See? Fan and back pedal. Fan and back pedal.
With churn the name of one popular game on Wall Street, I sure don’t know if Googzilla is going to gobble up the staff and the technology at Akamai. Google has its own CDN in place, but with the volume of rich media that will be coming down the road in the months ahead, this type of acquisition makes sense to me. Akamai has technology, ISP relationships, plumbing, and people. Did I mention really good people?
Stephen Arnold, October 15, 2009
Sadly no one paid me to write this article. The investor on Friday bought me a chicken thing with a made up name, though.
The Microsoft UX Wins AP Love
October 14, 2009
A happy quack to the reader who sent me the link to Gawker’s “AP’s Betting the Farm Microsoft Will Crush Google”. The story reports that Microsoft’s new interface (user experience or UX) approach is going to allow Microsoft to catch up with Google. If you are a fan of the AP’s view of technology, check out the article. If you think that Google’s 80 percent market share is too large a gap to narrow, you may want to skip the article. For me the most interesting point in the write up was the hint that Google and the AP have not been engaged in productive, frequent discussions. I don’t think the AP is sufficiently Googley to click with the Mountain View crowd.
Stephen Arnold, October 14, 2009
Exclusive Interview with CTO of BrightPlanet Now Available
October 13, 2009
William Bushee, BrightPlanet’s Vice President of Development and the company’s chief technologist, spoke with Stephen E. Arnold. The exclusive interview appears in the Search Wizards Speak series. Mr. Bushee was among the first search professionals to tackle Deep Web information harvesting. The “Deep Web” refers to content that traditional Web indexing systems cannot access. Deep Web sites include most major news archives as well as thousands of specialized sources. These sources typically represent the best, most definitive content sources for their subject area. For example, in the health sciences field, the Centers for Disease Control, National Institutes of Health, PubMed, Mayo Clinic, and American Medical Association are all Deep Web sites, often inaccessible to conventional Web crawlers like those of Google and Yahoo. BrightPlanet supported the ArnoldIT.com analysis of the firm’s system. As a result of this investigation, the technology warranted an in-depth discussion with Mr. Bushee.
The wide ranging interview focuses on BrightPlanet’s search, harvest, and OpenPlanet technology. Mr. Bushee told Search Wizards Speak: “As more information is being published directly to the Web, or published only on the Web, it is becoming critical that researchers and analysts have better ways of harvesting this content.”
Mr. Bushee told Search Wizards Speak:
There are two distinct problems that BrightPlanet focuses on for our customers. First we have the ability to harvest content from the Deep Web. And second, we can use our OpenPlanet framework to add enrichment, storage and visualization to harvested content. As more information is being published directly to the Web, or published only on the Web, it is becoming critical that researchers and analysts have better ways of harvesting this content. However, harvesting alone won’t solve the information overload problems researchers are faced with today. The answer to a research project cannot be simply finding 5,000 raw documents, no matter how good they are. Researchers are already overwhelmed with too many links from Google and too much information in general. The answer needs to be better harvested content (not search), better analytics, better enrichment and better visualization of intelligence within the content – this is where BrightPlanet’s OpenPlanet framework comes into play. While BrightPlanet has a solid reputation within the Intelligence Community helping to fight the “War on Terror,” our next mission is to be known as the commercial and academic leaders in harvesting relevant, high quality content from the Deep Web for those who need content for research, business intelligence or analysis.
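The harvesting half of that pitch is, at bottom, query-driven retrieval: submitting searches to database-backed sites instead of following hyperlinks, which is why ordinary crawlers miss the content. A minimal Python sketch of the idea follows; the endpoint and parameter names are entirely hypothetical, and BrightPlanet’s actual harvester is far more involved.

```python
# Conceptual sketch of query-driven ("Deep Web") harvesting: rather than
# crawling links, the harvester submits queries to a site's search form
# and collects the result pages. The endpoint URL and the q/rows/start
# parameter names are invented for illustration.
from urllib.parse import urlencode

def build_queries(endpoint: str, terms: list, page_size: int = 50):
    """Yield the request URLs a harvester would fetch for each term."""
    for term in terms:
        params = {"q": term, "rows": page_size, "start": 0}
        yield f"{endpoint}?{urlencode(params)}"

# Example: a hypothetical government archive search endpoint.
urls = list(build_queries("https://archive.example.gov/search", ["influenza", "H1N1"]))
```

A real system would then fetch each URL, page through results, and hand the documents to the enrichment and visualization stages Mr. Bushee describes.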
You can read the full text of the interview at http://www.arnoldit.com/search-wizards-speak/brightplanet.html. More information about the company’s products and services is available at http://www.brightplanet.com. Mr. Bushee’s technology has gained solid support from some professional researchers and intelligence agencies. BrightPlanet has moved “beyond search” with its suite of content processing technology.
Stephen Arnold, October 13, 2009
Google and Content Processing
October 12, 2009
I find the buzz about Google’s upgrades to its existing services and the chatter about Google Books interesting but not substantive. My interest is hooked when Google provides a glimpse of what its researchers are investigating. I had a conversation last week that pivoted on the question, “Why would anyone care what researchers or graduate students working with Google do?” The question is a good one, and it illustrates how angle of view determines what is or is not important. The media find Google Books fascinating. The Web log authors focus on incremental jumps in Google’s publicly accessible functions. I look for deeper, tectonic clues about this trans-national, next generation company. I sometimes get lonely out on my frontier of research and analysis, but, as I said, perspective is important.
That’s why I want to highlight a dense, turgid, and opaque patent application with the fetching title “Method and System for Processing Published Content on the Internet”. The document was published on October 8, 2009, by the ever efficient USPTO. The application was filed on June 9, 2009, but its technology drags like an earthworm through a number of previous Google filings in 2004 and more recent disclosures, such as the control panel for a content owner’s administering of a distribution and charge back for content. As an isolated invention, the application is little more than a different charge at the well understood world of RSS feeds. The problem Google’s application resolves is inserting ads into RSS content without creating “unintended alerts”. When one puts the invention in a broader context, the system and method of the invention is more flexible and has a number of interesting applications. These are revealed in the claims section of the patent application.
Keep in mind that I am not a legal eagle. I am an addled goose. Nevertheless, what I found suggestive is that the system and method hooks into my analysis of Google’s semantic functions, its data management systems, and, of course, the guts of the Google computational platform itself for scale, performance, and access to other Google services. In short, this is a nifty little invention. The component that caught my attention is the controls made available to publishers. The idea is that a person with a Web log can “steer” or “control” some of the Google functions. The notion of an “augmented” feed in the context of advertising speaks to me of Google’s willingness to allow a content producer to use the Google system like a giant information facility. Everything is under one roof and the content producer can derive revenue by using this facility like a combination production, distribution, and monetization facility. In short, the invention builds out the “digital Gutenberg” aspect of the Google platform.
Here’s how Google explains this invention:
The invention is a method for processing content published on-line so as to identify each item in a unique manner. The invention includes software that receives and reads an RSS feed from a publisher. The software then identifies each item of content in the feed and creates a unique identifier for each item. Each item then has third party content or advertisements associated with the item based on the unique identifier. The entire feed is then stored and, when appropriate, updated. The publisher then receives the augmented feed which contains permanent associations between the third party advertising content and the items in the feed so that as the feed is modified or extended, the permanent relationships between the third party content and previously existing feed items are retained and readers of the publisher’s feed do not receive a false indication of new content each time the third party advertising content is rotated on an item.
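As I read that passage, the core trick is a stable per-item identifier with a one-time, persistent ad binding. A rough Python sketch of that mechanism follows; the hashing basis and data shapes are my guesses, not anything disclosed in the application.

```python
# Sketch of the mechanism in the quoted abstract: give each feed item a
# stable identifier, bind an ad to that identifier once, and reuse the
# binding on later fetches so rotating ads do not make old items look
# new. Hashing the item's guid (or link+title) is my assumption about
# how the "unique identifier" could be derived.
import hashlib

ad_bindings = {}   # item id -> ad, persisted across feed fetches

def item_id(item: dict) -> str:
    basis = item.get("guid") or (item["link"] + item["title"])
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]

def augment(feed_items, pick_ad):
    """Return feed items with a permanently associated ad attached."""
    out = []
    for item in feed_items:
        uid = item_id(item)
        if uid not in ad_bindings:          # bind once, permanently
            ad_bindings[uid] = pick_ad(item)
        out.append({**item, "ad": ad_bindings[uid], "id": uid})
    return out
```

Because the binding is keyed on the identifier rather than on feed position, re-fetching the feed with a different ad rotation leaves previously seen items byte-for-byte stable, which is what prevents the “false indication of new content”.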
The claims wander into the notion of a unique identifier for content objects, item augmentation, and other administrative operations that have considerable utility when applied at scale within the context of other Google services such as the programmable search engine. This is a lot more interesting than a tweak to an existing Google service. Plumbing is a foundation, but it is important in my opinion.
Stephen Arnold, October 12, 2009