Addressing PageRank’s Issues

In 1996, when Backrub (Larry Page’s precursor to PageRank) became available to some users, the idea of using links as a signal of importance was new to Web search. The approach itself was not new: Dr. Eugene Garfield had used it widely in his link analysis work. Dr. Garfield was among the first to apply citation analysis to STM (scientific, technical, medical) information. Researchers warmly embraced the idea. Papers referenced frequently by other researchers were, in most cases, more important than papers referenced infrequently.

Mr. Page’s insight was applying citation analysis to work by Web researchers like Dr. Jon Kleinberg, and meshing links with clever math. By 1998, Backrub had morphed into Google. Mr. Page and Mr. Brin melded their respective math and computer science skills. A misspelling of a mathematical term yielded Google, and the rest is a blur of growth and innovation.

PageRank: Long in the Tooth

Flash forward a decade. PageRank has been tuned, enhanced, and re-engineered. Google’s wizards have pushed the venerable system into audio, video, and mobile search. Google has used increasingly sophisticated mathematics to deliver relevant results to its users. Some users type in a single term such as vacation, trusting Google to deliver useful links in a few milliseconds. Other users navigate to the advanced search page and specify file type, Boolean operators, and domains to query. Either way, Google continues to deliver useful results.

Unlike Backrub, which was used by a small number of people, Google today processes upwards of a billion queries a day. The company indexes more than 12 billion Web pages. More important, Google accounts for upwards of 60 percent of the queries on the Internet worldwide.

PageRank now includes more than 100 “factors” in its relevance calculations. The amount of computational horsepower is mind-numbing. What few people realize is that PageRank is becoming like a decade-old race car. The machine can run fast, but it lacks some of the features of newer machines.

Google, always willing to look at its technology with a critical eye, has made progress in creating a “wrapper” for PageRank. This approach does not throw out PageRank. Indeed, Google is likely to continue to use it for certain processes. But the “wrapper” – variously called next-generation search or the little-known acronym PSE (programmable search engine) – adds important new capabilities to Google’s system.

What’s Broken?

SEO (search engine optimization) mavens can boost a Web site’s ranking. Some of the techniques are those that Google encourages; for example:

  • A Google-style site map
  • Page titles that are brief, descriptive, and conceptually related to the Web site’s domain name
  • Appropriate use of heads and subheads
  • Original, information-filled content
  • Accurate coding.
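To make the first item concrete, here is a minimal sketch of generating a site map in the public sitemaps.org XML format, which Google accepts. The URLs and dates are hypothetical placeholders, and the function name build_sitemap is our own; treat this as an illustration under those assumptions, not as Google’s tooling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is an ISO date string such as "2007-08-01", or None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
pages = [
    ("https://www.example.com/", "2007-08-01"),
    ("https://www.example.com/models.html", None),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

A production site map would normally be saved as a file with an XML declaration and submitted through Google’s own webmaster tools; the sketch only shows the structure of the entries.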

SEO experimentation has revealed a number of glitches, wormholes, and behaviors in PageRank. Not surprisingly, some people want their Web site to appear at the top of a Google results list. Here’s a screenshot of a search for Google patents. Notice that the first result points to Google itself but the fourth result points to

'Google patents' search engine results

Contrast this with a query for the terms translation guide. What you see is a page of links. The substance is shallow, but there are sponsored results on this page. This is an example of SEO spoofing Google. Users like me don’t like these types of pages because they waste our time. Google doesn’t appreciate “content free” pages either. To find them, Google must spend computer cycles, and its engineers must do the grunt work of creating algorithms that detect these pages and keep a lightweight page out of the top result position. As a matter of principle, we are not too enthusiastic about SEO. We do believe that Web sites with solid content, well-formed code, and informative metatags are more satisfying than mavens’ skullduggery.

Another Issue: Pre-Google Companies

As Google tweaks PageRank, the company stamps out certain types of SEO abuse. However, certain PageRank adjustments can create additional challenges for legitimate companies. One interesting “problem” surfaced in July 2007. While it is not fully resolved, our engineers identified it as a glitch that Google’s new Programmable Search Engine may address.

The idea behind the PSE is that a Web site owner can create specific instructions for Google. When the Googlebot visits a site, the special instructions are processed by the Google spider and then added to a site information file. Instead of Google figuring out what the Web site owner wants done with the site’s content, the site owner can provide Google with the rules and instructions for a site, its data, and its peculiarities, strengths, and weaknesses.

Let me illustrate.

A Kentucky-based company, founded before Google was the powerhouse it is today, opened its doors under the name Sumerset Houseboats. When the Web came on the firm’s radar, the firm acquired the domain name

The company has many enthusiastic customers for its line of luxury houseboats. Professional athletes, wealthy individuals, and companies with a love for “tailgating” on a lake as a prelude to a big sporting event design lavish vessels. Sumerset custom builds these floating palaces and delivers them throughout the continental United States.

As Google labored over the last nine years to fight SEO mavens, the company instituted a number of very important technologies. It’s not possible to cite all of these modifications in this short document, but here are two germane to this site:

  • Google instituted technology to correct common misspellings. For example, type Brittney Speers into a Google search box. Google dutifully says, “Did you mean Britney Spears”, automatically correcting the error.
  • Google also developed watch lists with certain words of interest in the post-9/11 era. The company name Sumerset contains the root word Sumer, the land between the Tigris and Euphrates rivers. Mesopotamia, for those who don’t recall their Middle Eastern history, is now smack in the middle of the geography of the Iraq War.
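To see why a name like Sumerset can trip a spelling corrector, consider this minimal sketch. Google has not published how its “Did you mean” feature works; the sketch swaps in Python’s standard difflib similarity matching as a stand-in, and the vocabulary list, the cutoff value, and the function name suggest are all hypothetical.

```python
import difflib

# Hypothetical vocabulary of "known" words; a real engine would draw on
# query logs and a far larger dictionary.
VOCABULARY = ["somerset", "summer", "sunset", "houseboat"]


def suggest(query, vocabulary, cutoff=0.8):
    """Return a suggested correction, or None if the query looks fine
    or nothing in the vocabulary is similar enough."""
    word = query.lower()
    if word in vocabulary:
        return None  # already a known word, nothing to correct
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("Sumerset", VOCABULARY))   # the brand name is "corrected"
print(suggest("houseboat", VOCABULARY))  # known word, no suggestion
```

Because the legitimate brand name differs from a common word by a single letter, a similarity-based corrector treats it as a typo. This is the kind of collision the site owner has to work around.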

The question becomes, “How can this legitimate site stay clear of the autocorrection feature in Google and not be confused with Sumer, a geographic place?”

Some Commonsense Steps

Some SEO mavens might suggest creating parallel Web sites using different domain names. When a visitor comes to these dummy sites, each click redirects the user to the “real” Web site. Other tricks include hiding key words in the pages by making them the same color as the background. This is a “trick” used by BMW, the German automobile manufacturer. Some SEO gurus scour the Web for content, make minor changes, and then stuff the site with recycled information in an attempt to fool Mother Google.

When Google’s tweaks raise the height of the hurdle, consider these steps to protect a site’s ranking:

  • Original, substantive content. Navigate to the sitemap for What do you find? Links to content. Various formats are presented to the user, but most of the documents carry a “semantic payload.” Each document is on a single topic. Sumerset Houseboats can leverage this simple technique. Company executives are implementing this “content strategy” at this time (August 2007).
  • Follow Google’s rules. This means scouring the site for bad links (404 errors), incorrect code, and oversights like forgetting to include a unique, meaningful page title on each page.
  • Include a Google site map. If you need specific instructions, click here to get Google’s up-to-date instructions. (You may need to register to view this Google page.)
  • Insert appropriate headlines and subheadlines in each page. Keep in mind that the Googlebot consumes only a portion of extremely long Web pages, so keep your content short and to the point. Instead of extremely long pages with more than 100,000 characters, break the information into smaller chunks.
  • Use metatags. Many SEO mavens pooh-pooh metatags. In our view, metatags are a 21st century way to describe indexing. Well-chosen index terms are a very useful addition to a Web page. Furthermore, there are dozens of Web indexing operations in addition to Google. These services make use of metatags, so we believe good indexing is a plus whether Google favors metatags or not.
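Several of these steps can be checked automatically before a page goes live. The following is a minimal sketch using only Python’s standard html.parser module. The class name PageAudit, the function audit, and the sample pages are our own illustrative assumptions; the 100,000-character threshold comes from the guideline above. It does not check links for 404 errors, which requires fetching each URL.

```python
from html.parser import HTMLParser

MAX_CHARS = 100_000  # page-length threshold mentioned in the steps above


class PageAudit(HTMLParser):
    """Collects the title, heading count, and named metatags of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = 0
        self.meta_names = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("h1", "h2", "h3"):
            self.headings += 1
        elif tag == "meta" and attrs.get("name"):
            self.meta_names.add(attrs["name"].lower())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html):
    """Return a list of problems found in one page's HTML source."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.title.strip():
        problems.append("missing page title")
    if parser.headings == 0:
        problems.append("no headings")
    if "description" not in parser.meta_names:
        problems.append("no description metatag")
    if len(html) > MAX_CHARS:
        problems.append("page longer than 100,000 characters")
    return problems


# Hypothetical pages, for illustration only.
good_page = (
    "<html><head><title>Luxury Houseboats</title>"
    '<meta name="description" content="Custom-built houseboats">'
    "</head><body><h1>Models</h1><p>Details.</p></body></html>"
)
bad_page = "<html><head></head><body><p>" + "x" * 100_001 + "</p></body></html>"

print(audit(good_page))
print(audit(bad_page))
```

Running the audit across every page of a site, and comparing the collected titles for duplicates, would extend the sketch to the “unique, meaningful page title” requirement.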

Getting Ready for the Next-Generation PageRank

The information about the Programmable Search Engine is hard to find. Much of it is buried in hard-to-read patent applications. For example, try your hand with US20070038603 “Sharing Context Data across Programmable Search Engines” or the equally snappy US20070038614 “Generating and Presenting Advertisements Based on Context Data from Programmable Search Engines.”

Sumerset Houseboats (splash screen shown below) is working to adjust to the changes that Google and other Web indexing firms are making to their systems. The job is not easy, and it is one that is likely to be a work in progress.

Keep these points in mind: There is a significant benefit for sites that can boost their rankings in a results list. The temptation to cheat in order to boost a site’s rankings is great. The burden, therefore, shifts to legitimate Web site owners to rely on solid content, not tricks. Let content, adherence to indexing guidelines, and clean code be the touchstones.

ArnoldIT Comment

With Google’s adjustments to PageRank now moving forward for September release, content, not trickery, becomes more important. The need to provide Google with XML that instructs Google how to process a particular site will weed out many unscrupulous SEO mavens. ArnoldIT recommends that you talk with a responsible Web marketing firm such as

