YAGG: Google Talk

February 24, 2009

Tweets and posts are flying by about an alleged phishing exploit for Google email. Mashable reports here that another issue may be poking its snout into hapless users’ lives. Adam Ostrow wrote:

Gmail is now being attacked by a phishing scam that is spreading like wildfire.

If true, YAGG strikes again. You get a message “check me out” with a link to a tinyurl. Click the puppy and you go to “a site called ViddyHo.” Lucky you. Your contacts get an email. Nifty. Love those tiny URLs which mask the destination URL.
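If you want to peek behind one of those tiny URLs before clicking the puppy, a few lines of Python will do it. A minimal sketch, assuming the requests library is installed and the shortener answers a HEAD request with an ordinary redirect (some services block automated lookups); the example link is hypothetical:

```python
import requests

def expand_short_url(short_url: str) -> str:
    """Follow redirects from a shortened URL and return the final destination."""
    # HEAD keeps the request light; allow_redirects chases the redirect chain,
    # and the final landing page ends up in response.url.
    response = requests.head(short_url, allow_redirects=True, timeout=10)
    return response.url

# Hypothetical usage; substitute a real shortened link:
# print(expand_short_url("http://tinyurl.com/example"))
```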

Stephen Arnold, February 24, 2009

Amazon: Outage Reported

February 24, 2009

The old US of A’s computing infrastructure seems to be showing that it ain’t what it used to be. ComputerWorld’s Sumner Lemon wrote here “Amazon Search Engine Suffers Brief Outage.” I have not been too thrilled with some of the features of the A9 system. But my quibbles are minor compared to the search system’s not working. Search is the means by which Amazon generates the bulk of its money. The vaunted cloud services are still modestly sized French fries at the Amazon revenue feast. The system was down for about an hour, but, hey, cloud services are supposed to have rock solid uptime, this addled goose thought.
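For perspective, here is the back-of-the-envelope availability arithmetic, assuming the outage really did run about an hour and measuring against a 30-day month:

```python
# Rough availability math: one hour of downtime in a 30-day month.
outage_hours = 1.0
hours_per_month = 30 * 24          # 720 hours

availability = 1 - outage_hours / hours_per_month
print(f"Availability: {availability:.4%}")        # roughly 99.8611%

# "Five nines" (99.999 percent) allows only about 26 seconds per month.
five_nines_seconds = (1 - 0.99999) * hours_per_month * 3600
print(f"Five nines budget: {five_nines_seconds:.0f} seconds per month")
```

One hour out of a 30-day month works out to roughly 99.86 percent availability, a long way from the five nines benchmark often invoked for infrastructure services.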

Stephen Arnold, February 24, 2009

Google: Yet Another Google Glitch

February 24, 2009

YAGG (a new acronym pronounced like “gag” as in choke) has been coined by the goslings in the mine drainage pond. The addled goose has little to add to the Washington Post’s headline “Trouble in the Clouds: Gmail Turns into Gfail” here. Xooglers are sending nasty grams to other Googlers, when Gmail and MOMA work, obviously. Glitches are becoming very Vista-like in the opinion of the addled goose. The reasons offered by mainstream dead-tree publications omit such interesting causes as:

  • Googlers are smart, but the size of the company has made the culture susceptible to the Microsoft product management disease
  • Dependencies within the system are usually trapped by Google’s compile-time checks and the peer quality assurance project, but as more Googlers become too busy, little errors can grow up to be big mistakes. Google has not created an Orkut-class issue, but the Gmail issue is more immediate
  • Problems are evident in such unrelated areas as ad metrics, malware flagging, and, today, mail.

Too bad there is no competitor in a position to challenge the GOOG. A decade of indifference has created a culture of failure among Google’s direct competitors, and now a soupçon is evident to the addled goose in some Google functions. Just my honking opinion. I don’t have a fix. The future is evident to me for some Google services. I can see that vista before me. Can you? More pointedly, can you see your Gmail?

Stephen Arnold, February 24, 2009

Twitter Security: An Oxymoron

February 24, 2009

PCWorld’s Joan Goodchild wrote an interesting article about Twitter’s security issues here. She identifies three potential areas of concern. First, a URL shortener can send a hapless user to an unknown and potentially harmful location. Second, she identifies a lack of email authentication. And, third, my favorite: Twitter can be useful to those who want to “follow” a person. The addled goose is confident that these three issues do not exhaust the security vulnerabilities. The goose does not directly Twitter, send tweets, or fiddle with Twitter ecosystem tools. Those who follow the goose often want to cook it. Could Twitter users get their geese cooked?

Stephen Arnold, February 24, 2009

Mysteries of Online 8: Duplicates

February 24, 2009

In print, duplicates are the province of scholars and obsessives. In the good old days, I would sit in a library with two books. I would then look at the data in one book and then hunt through the other book until I located the same or similar information. Then I would examine each entry to see if I could find differences. Once I located a major difference such as a number, a quotation, or an argument of some type, I would write down that information on a 5×8 note card. I had a forensics scholarship along with some other cash for guessing accurately on objective tests. To get the forensics grant, I had to participate in cross examination debate, extemporaneous speaking, and just about any other crazy Saturday time waster my “coaches” demanded.

Not surprisingly, mistakes or variances in books, journals, and scholarly publications were not of much concern to some of the students who attended the party school that accepted an addled goose with thick glasses. There were rewards for spending hours looking for information and then chasing down variances. I recall that our debate team, which was reasonably good if you liked goose arguments, was up against a team from Dartmouth College. I was listening when I heard a statement that did not match what I had located in a government reference document and in another source. The opponent from Dartmouth had erroneously presented the information. I gave a short rebuttal. I still remember the look of nausea that crossed our opponent’s face when she realized that I had presented what I found in my hours of manual checking and reminded the judges that distorting information suggests an issue with the argument. We won.


For most people, two individuals citing the same source is an example of duplicate information. Upon closer inspection, duplication does not mean identical in gross features. Duplication drills down to the details of the information: one must determine which item of information is at variance, then figure out why, and then decide which version of the duplicate is most likely correct.

That’s when the fun begins in traditional research. An addled goose can do this type of analysis. Brains are less important than persistence and a tolerance for some dull, tedious work. As a result, finding duplicative information and then figuring out variances was not something that the typical college sophomore spent much time doing.

Enter computer systems.
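Software can shoulder the drudgery. A minimal sketch, assuming two text records already in hand, flags likely duplicates with word shingles and then reports where the near-duplicates diverge; the function names and the 0.8 threshold are illustrative, not anyone’s production method:

```python
import difflib

def jaccard_similarity(text_a: str, text_b: str, shingle_size: int = 3) -> float:
    """Score overlap between two texts using word shingles (overlapping word n-grams)."""
    def shingles(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + shingle_size]) for i in range(len(words) - shingle_size + 1)}
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def report_variances(text_a: str, text_b: str) -> list:
    """List the lines that differ between two near-duplicate records."""
    diff = difflib.unified_diff(text_a.splitlines(), text_b.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

# If jaccard_similarity(a, b) is above, say, 0.8, treat the pair as duplicates
# and hand report_variances(a, b) to a human for the judgment call.
```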


Deep Web, Surface Sparkles Occlude Deeper Look

February 23, 2009

You can read pundits, mavens, and wizards comment on the New York Times’s “Exploring a Deep Web that Google Can’t Grasp.” The original is here for a short time. Analyses of varying degrees of usefulness appear in Search Engine Land and the Marketing Pilgrim’s “Discovering the Rest of the Internet Iceberg” here.

There’s not much I can say to reverse the flow of misinformation about what Google is doing because Google doesn’t talk to me or to the pundits, mavens, and wizards who explain the company’s alleged weaknesses. In 2007, I wrote a monograph about Google’s programmable search engine disclosures. Published by Bear Stearns, this document is no longer available. I included the dataspace research in my Beyond Search study for The Gilbane Group in April 2008. In September, I wrote, with Sue Feldman, about Google’s dataspace technology. You can get a copy of the dataspace report directly from IDC here. Ask for document 213562. Both of these studies explicate Google’s activities in structured data and how those data mesh with Google’s unstructured information methods. I did a detailed explanation of the programmable search engine inventions in Google Version 2.0. That report is still available, but it costs money and I will be darned if I will restate information that is in a for-fee study. There are some brief references to these technologies available at ArnoldIT.com without charge and in the archive to this Web log. You can search the ArnoldIT.com archive at www.arnoldit.com/sitemap.html and this Web log from the search box on any blog page.

[Image: Google search result for the query “lga sfo”]

This sure looks like “deep Web” information to me. But I am not a maven, wizard, or pundit. Nor do I understand search with the depth of the New York Times, search engine optimization experts, and trophy generation analysts. I read patent documents, an activity that clearly disqualifies me from asserting that Google can’t perform a certain action it has described in its open source disclosures. Life is easier when such disclosures are ignored or excluded from the research process.

So what? Two points:

  1. Google can and does handle structured data. Examples exist in the wild at base.google.com and by entering the query “lga sfo” from Google.com’s search box.
  2. Yip yap about the “deep Web” has been popular for a while, and it is an issue that requires more analysis than assertions based on relatively modest research into the subject.

In my opinion, before asserting that Google is baffled, off track, clueless, or slow on the trigger, look a bit deeper than the surface sheen on Googzilla’s scales. No wonder outfits are surprised by some of Google’s “effortless” initiatives. Dealing only in superficialities, they miss the substance that resides under the surface.

Pundits, mavens, wizards, please, take a moment to look into Guha, Halevy, and the other Googlers who have thought about and who are working on structured, semistructured, and unstructured data in the Google data environment. That background will provide some context for Google’s apparent sluggishness in this “space”.

Stephen Arnold, February 23, 2009

Exclusive Interview, Martin Baumgartel, From Library Automation to Search

February 23, 2009

For many years, Martin Baumgartel worked for a unit of T-Mobile. His experience spans traditional information retrieval and next-generation search. Stephen Arnold and Harry Collier interviewed Mr. Baumgartel on February 20, 2009. Mr. Baumgartel is one of the featured speakers at the premier search conference this spring, where you will be able to hear his lecture and meet with him in the networking and post-presentation breaks. The Boston Search Engine Meeting attracts the world’s brightest minds and most influential companies to an “all content” program. You can learn more about the conference, the tutorials, and the speakers at the Infonortics Ltd. Web site. Unlike other conferences, the Boston Search Engine Meeting limits attendance in order to facilitate conversations and networking. Register early for this year’s conference.

What’s your background in search?

When I entered the search arena in the 1990s, I came from library automation. Back then, it was all about indexing algorithms and relevance ranking, and I did research to develop a search engine. During eight years at T-Systems, we analyzed the situation in large enterprises in order to provide the right search solution. This included, increasingly, the integration of semantic technologies. Given the present hype about semantic technologies, it has been a focus in current projects to determine which approach or product can deliver in specific search scenarios. A related problem is to identify the underlying principles of user interface innovations in order to know what’s going to work (and what’s not).

What are the three major challenges you see in search / content processing in 2009?

Let me come at this in a non-technical way. There are plenty of challenges awaiting algorithmic solutions, but I see more important challenges here:

  1. Identifying the real objectives, fighting myths. Implementing internal search hasn’t become any easier for an organization today. There are numerous internal stakeholders, paired with very high user expectations (they want the same quality as with Internet search, only better, more tailored to their work situation, and without advertising…). Keeping the analysis sharp becomes difficult in an orchestra of opinions, in particular when familiar brand names get involved (“Let’s just take Google internally; that will do.”)
  2. Avoiding false simplicity. Although many CIOs claim they have “cleaned up” their intranets, enterprise search remains complex, both technologically and in terms of successful management. Tackling the problem with a self-proclaimed simple solution (plug in, ready, go) will provide search, but perhaps not the search solution needed, and with hidden costs, especially in the long run. At the other extreme, a design that is too complex, with the purchase of dozens of connectors, is likely to burst your budget.
  3. Attention. Recently, I heard a lot about how the financial crisis will affect search. In my view, the effects only reinforce the challenge of drawing enough management attention to search to make sure it is treated like other core assets. Some customers might slow down the purchase of some SAP add-on modules or postpone a migration to the next version of their backup software. But the status of those solutions among CIOs will remain high and unquestioned.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

There’s no unique definition of the “enterprise search problem” as if it were a math theorem. Therefore, you find somewhat amorphous definitions of what is to be solved. Let’s take the scope of content to be searched: everything internal? And nothing external? Another obstacle is the widespread belief in shortcuts. A popular example: let’s just index the content in our internal content management system; the other content sources are irrelevant. That way, the concept of completeness in the search/result set is sacrificed. But search can be as grueling as a marathon: you need endurance and there are no shortcuts. If you take a shortcut, you’ve failed.

What is your approach to problem solving in search and content processing?

Smarter software definitely, because the challenges in search (and there are more than three) are attracting programmers and innovators to come up with new solutions. But, in general, my approach is “keep your cool”. Assess the situation, analyze tools and environment, design the solution and explain it clearly. In the process, interfaces have to be improved sometimes in order to trim them down to fit with the corporate intranet design.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?

We’ll see how far a consolidation process will go. Perhaps we’ll see discontinued search products where we initially didn’t expect it. Also, the relationship raised in the following question might be affected: software companies are unlikely to cut back on core features of their products. But integrated search functions are perhaps candidates for the scalpel.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

I’ve seen it the other way around: Customer Support Managers told me (the Search person) that the built-in search tool is OK but that they would like to look up additional information from some other internal applications. I don’t believe that built-in search will replace stand-alone search. The term “built-in” tells you that the main purpose of the application is something else. No surprise that, for instance, the user interface was designed for this main purpose and will, as a result, not address the typical needs of search.

Google has disrupted certain enterprise search markets with its appliance solution. What can a vendor do to adapt to this Google effect?

To address this Google effect, a vendor should point out where it differs from Google and why.

But I see Google as a significant player in enterprise search, if only for the mindset of procurement teams you describe in your question.

As you look forward, what are some new features / issues that you think will become more important in 2009?

The issue of cloudsourcing will gain traction. As a consequence, not only small and medium-sized enterprises will discover that they might not invest in in-house content management and collaboration applications but use a hosted service instead. This is when you need more than a “behind the firewall” search, because content will be scattered across multiple clouds (CRM cloud, Office cloud). I’m not sure whether we will see a breakthrough there in 36 months, but the sooner the better.

Where can I find more information about your services and research?

http://www.linkedin.com/in/mbaumgartel

Stephen E. Arnold, www.arnoldit.com/sitemap.html and Harry Collier, www.infonortics.com

NSA Oral Histories Available

February 23, 2009

If you are looking for a test corpus against which to benchmark a search system, take a look at the National Security Agency’s declassified oral history interviews. A happy quack to the reader who alerted me to BeSpecific’s write up “Declassified Oral History Interviews Posted by National Security Agency” here. Grab ’em quick. The NSA, according to the write up, has reworked its Web site. I enjoyed “Doing Business with the NSA.” Interesting, if not exactly in line with how the world in Beltway Bandit land often works. For more NSA content, run this query on Uncle Sam.
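If the transcripts are saved locally as plain text, a toy indexer is enough to start benchmarking. A minimal sketch, assuming a directory of .txt files; the directory name and query term below are made up for illustration:

```python
import re
from collections import defaultdict
from pathlib import Path

def build_inverted_index(corpus_dir: str) -> dict:
    """Map each term to the set of file names that contain it."""
    index = defaultdict(set)
    for path in Path(corpus_dir).glob("*.txt"):
        text = path.read_text(errors="ignore").lower()
        # Index each distinct alphabetic token once per document.
        for term in set(re.findall(r"[a-z]+", text)):
            index[term].add(path.name)
    return index

# Hypothetical usage:
# index = build_inverted_index("nsa_oral_histories")
# print(sorted(index.get("cryptology", set())))
```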

Stephen Arnold, February 23, 2009

Number 13 in the Biggest Technology Goof List

February 22, 2009

ComputerWorld published here, on February 22, 2008, a list of “The 25 Greatest Blunders in Tech History.” I find these lists amusing. I paddled right by the first 12 and the last 12. I focused on blunder number 13:

Search portals. Where are they now? At the height of the dot-com boom, web surfers had a plethora of search engines to choose from: AltaVista, Excite, InfoSeek, Lycos, and many more. Today, the major players of the past are mostly dead. A few have soldiered on, such as Ask.com, but only after repeated redesigns. Chalk it up to old-fashioned hubris. Instead of concentrating on their search offerings, the first-generation search engines fell victim to the portal arms race. They built up dashboards full of sports scores, stock quotes, news headlines, horoscopes, the weather, email, instant messaging, games, and sponsored content – until finding what you wanted was like playing Where’s Waldo. Neither fish nor fowl, they became awkward combinations of search portals and general-interest portals. The world went to Yahoo for the latter. And when an upstart called Google appeared with a clean UI and high-quality search, users told the other engines to get lost.

The consequence of the portal mania? Our pal Googzilla. The failure of portals opened the door to my favorite example of received wisdom (portals are the future) creating the space for a hyperconstruct to reshape online, search, and a number of other businesses. I would have moved this goof into the top 10. But 13 remains an unlucky number for the companies that jumped on the portal bandwagon a decade ago.

Stephen Arnold, February 22, 2009.

Google: Suddenly Too Big

February 22, 2009

Today Google is too big. Yesterday and the day before Google was not too big. Sudden change at Google or a growing sense that Google is not the quirky Web search and advertising company everyone assumed Googzilla was?

The New York Times’s article by Professor Randall Stross, available temporarily here, points out that some perceive Google as “too big.” Mr. Stross quotes various pundits and wizards and adds a tasty factoid that Google allowed him to talk to a legal eagle. Read the story now so you can keep your pulse on the past. Note the words “the past.” (You can get Business Week’s take on this same “Google too powerful” theme here.)

The fact is that Google has been big for years. In fact, Google was big before its initial public offering. Mr. Stross’s essay makes it clear that some people are starting to piece together what dear Googzilla has been doing for the past decade. Keep in mind the time span–decade, 10 years, 120 months. Also note that in that time interval Google has faced zero significant competition in Web search, automated ad mechanisms, and smart software. Google is essentially unregulated.

Let me give you an example from 2006 so you can get a sense of the disconnect between what people perceive about Google and what Google has achieved amidst the cloud of unknowing that pervades analysis of the firm.

Location: Copenhagen. Situation: Log files of referred traffic. Organization: Financial services firm. I asked the two Web pros responsible for the financial services firm’s Web site one question, “How much traffic comes to you from Google?” The answer was, “About 30 percent?” I said, “May we look at the logs for the past month?” One Webmaster called up the logs and in 2006 in Denmark, Google delivered 80 percent of the traffic to the Web site.

The perception was that Google was a 30 percent factor. The reality in 2006 was that Google delivered 80 percent of the traffic. That’s big. Forget the baloney delivered from samples of referred traffic: even if the Danish data were off by plus or minus five percent, Google has a larger global footprint than most Webmasters and trophy generation pundits grasp. Why? Sampling services get the market share data in ways that understate Google’s paw prints. Methodology, sampling, and reverse engineering of traffic lead to the weird data that research firms generate. The truth is in log files, and most outfits cannot process large log files, so “estimates,” not hard counts, become the “way” to truth. (Google has the computational and system moxie to count and perform longitudinal analyses of its log file data. Whizzy research firms don’t. Hence the market share data that show Google in the 65 to 75 percent share range with Yahoo 40 to 50 points behind. Microsoft is even further behind, and Microsoft has been trying to close the gap with Google for years.)
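The Copenhagen exercise boils down to counting referrers in the raw access log. A minimal sketch, assuming a standard Apache/NGINX combined log format in which the referrer and user agent are the last two quoted fields; the file name is illustrative:

```python
import re

def google_referral_share(log_path: str) -> float:
    """Fraction of referred visits whose referrer is a Google domain."""
    # Combined log format ends with: "referrer" "user-agent"
    tail = re.compile(r'"([^"]*)" "[^"]*"$')
    referred = from_google = 0
    with open(log_path, errors="ignore") as handle:
        for line in handle:
            match = tail.search(line.rstrip())
            if not match or match.group(1) in ("", "-"):
                continue  # no referrer recorded: direct traffic
            referred += 1
            if "google." in match.group(1).lower():
                from_google += 1
    return from_google / referred if referred else 0.0

# Hypothetical usage:
# print(f"{google_referral_share('access.log'):.0%} of referred visits came from Google")
```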

So now it’s official because the New York Times runs an essay that says, “Google is big.”

To me, old news.

In my addled goose monographs, I touched on data my research unearthed about some of Google’s “bigness”. Three items will suffice:

  • Google’s programming tools allow a Google programmer to be up to twice as productive as a programmer using commercial programming tools. How’s this possible? The answer is the engineering of tools and methods that relieve programmers of some of the drudgery associated with developing code for parallelized systems. Since my last study, Google Version 2.0, Google has made advances in automatically generating user-facing code. If the Google has 10,000 code writers and you double their productivity, that’s the equivalent of 20,000 programmers’ output. That’s big to me. Who knows? Not too many pundits, in my experience.
  • Google’s index contains pointers to structured and unstructured data. The company has been beavering away to the point that it no longer counts Web pages in billions. The GOOG is in trillions territory. That’s big. Who knows? In my experience, not too many of Google’s Web indexing competitors have these metrics in mind. Why? Google’s plumbing operates at petascale. Competitors struggle to deal with the Google as it was in the 2004 period.
  • The computations processed by Google’s fancy maths are orders of magnitude greater than the number of queries Google processes per second. For each query there are computations for ads, personalization, log updates, and other bits of data effluvia. How big is this? Google does not appear on the list of supercomputers, but it should. And Google’s construct may well crack the top five on that list. Here’s a link to the Google Map of the top 100 systems. (I like the fact that the list folks use the Google for its map of supercomputers.)

The real question is, “What makes it difficult for people to perceive the size, mass, and momentum of Googzilla?” I recall from a philosophy class in 1963 something about Plato and looking at life as a reflection in a mirror or a dream. Most of the analysis of Google with which I am familiar treats fragments, not Die Gestalt.

Google is a hyper construct and, as such, it is a different type of organization from those much loved by MBAs who work in competitive and strategic analysis.

The company feeds on raw talent and evolves its systems with Darwinian inefficiency (yes, inefficiency). Some things work; some things fail. But in chunks of time, Google evolves in a weird, non-directive manner. Also, Google’s dominance in Web search and advertising presages what may take place in other market sectors as well. What’s interesting to me is that Google lets users pull the company forward.

The process is a weird cyber-organic blend quite different from the strategies in use at Microsoft and Yahoo. Of its competitors, Amazon seems somewhat similar, but Amazon is deeply imitative. Google is deeply unpredictable because the GOOG reacts and follows users’ clicks, data about information objects, and inputs about the infrastructure’s machine processes. Three data feeds “inform” the Google.

Many of the quants, pundits, consultants, and MBAs tracking the GOOG are essentially data archeologists. The analyses report what Google was or what Google wanted people to perceive at a point in time.

I assert that it is more interesting to look at the GOOG as it is now.

Because I am semi-retired and an addled goose to boot, I spend my time looking at what Google’s open source technology announcements seem to suggest the company will be doing tomorrow or next week. I collect factoids such as the “I’m feeling doubly lucky” invention, the “programmable search engines” invention, the “dataspaces” research effort, and new patent documents for a Google “content delivery demonstration,” among others (many others, I wish to add).

My forthcoming Google: The Digital Gutenberg explains what Google has created. I hypothesize about what the “digital Gutenberg” could enable. Knowing where Google came from and what it did is indeed helpful. But that information will not be enough to assist the businesses increasingly disrupted by Google. By the time business sectors figure out what’s going on, I fear it may be too late for these folks. Their Baedekers don’t provide much actionable information about Googleland. A failure to understand Googleland will accelerate the competitive dislocation. Analysts who fall into the trap brilliantly articulated in John Ralston Saul’s Voltaire’s Bastards will continue to confuse the real Google with the imaginary Google. The right information is nine tenths of any battle. Applying this maxim to the GOOG is my thought.

Stephen Arnold, February 22, 2009
