Google Books: Advantage to Whom
January 2, 2009
The Register on December 31, 2008, published Chris Castle’s “Is Google’s culture grab unstoppable?” here. The article is in two parts and does a good job of summarizing Google’s deal to sidestep the copyright issues with Google Books. For me, the most interesting comment in the article was:
Regulators should care who controls the Google Books registry because it can easily reach out to other content. Google is well on its way to dominating all search and advertising, and now maybe a significant share of online content. Google’s ability to accomplish transparent accounting is definitely in doubt.
In my opinion, Google is going to be difficult to manage, channel, or control. And not just in books.
Stephen Arnold, January 2, 2009
Google 2009 Lego Blocks
January 1, 2009
Do you want to know what the Google will do in 2009? If the answer is yes, you will want to navigate to TechCrunch’s article about the top 10 Google services here. Take a gander at the traffic for each of these. Then click to Simply Google here and review the listing of Google’s products and services. You can find this list here. Now to see the future, pick a high traffic site from the TechCrunch list and pick six Google services as your Lego blocks. How can you combine these six products and services on one of Google’s high traffic sites to create a YAGPOS (Yet Another Google Product or Service)? In 2009, Google will be mixing and matching what exists and deploying these combo services on its high traffic sites, pushing these hybrids into the enterprise via the Google Search Appliance, or converting a plain Jane service like Google Product Search into a procurement manager’s dream system. Oh, no recoding required. Ramp up time? A day, maybe two. Has anyone seen a T shirt with this message: WWGD; that is, What Will Google Do? Google will convert companies who don’t compete with the GOOG into competitors in 2009. The change will be sudden and surprising. Google’s Lego block approach to new products and services will be surprising. Proud parents Larry and Sergey will beam with the outputs of their progeny.
Stephen Arnold, January 1, 2009
eBay’s Challenge in 2009: Googzilla
December 31, 2008
In my September 2007 study Google Version 2.0, published by Infonortics Ltd. in Tetbury, Glou. here, I commented on Google’s eCommerce capabilities. I included a diagram that outlined one of the scenarios that was taking shape as I did the research for that report. The idea was for Google to ignore Amazon and focus on a weakening eBay. Without fanfare, Google would attract sellers. Then Google’s “as is” back office capabilities were known mostly by AdSense participants who received accounting reports and checks from the GOOG without much fuss or hassle. Flash forward to 2009, and we see more of the Google eCommerce strategy becoming visible. First, Google uses the Amazon service to deliver music to its Android based mobile devices. And, Pete Barlas, Investor’s Business Daily, summarized data about Google’s growing influence in eCommerce. You can read “Google’s Product Search Catching Up Fast with Shopping Rivals” here for a short period of time. (Yahoo News stories often go dark quickly, so you may have to hunt around for this December 29, 2008, news story.) For me, the key comment in the article was:
Google Product Search had 11.8 million unique visitors in November. That’s up a whopping 786% from the year-ago period — the biggest one-year increase by far of any online comparison shopping service, says market tracker comScore.
Mr. Barlas includes other useful nuggets in his story; for example, the display of “sample searches” to intrigue shoppers. But I wanted to add two comments about this Google service not discussed in Mr. Barlas’ article:
First, eBay is a wounded duck. The signals of discontent have been flashing for a while. Fees have gone up. The issues about fraud and PayPal’s customer service continue to spark discussions among eBay users. The shift to fixed price items produces a more predictable shopping experience, but Amazon and now Google offer more efficient mechanisms. The eBay model is wearing out. When I look for a computer part, I have to wade through pages of irrelevant listings. Try a search for NC6000 and you will see hundreds of batteries and components. There is no way to limit the scope of the query to a working laptop computer.
Second, Google moves slowly and on numerous fronts at one time. As a result, it is easy to ignore a Google activity. Competitors like eBay have not figured out how to identify key Google moves, track them, and put defensive actions into play. Google seems to be floundering along with lava lamps and Odwalla juice and then the Google’s shadow falls across the eBay business. Surprise. Google has been a player in eCommerce for six or seven years. Now, Googzilla is in the front yard. Eeek.
In Google Version 2.0, the investments Google made in R&D make it clear that there are six business sectors in which Google is making similar strategic moves. Eeek is not a satisfactory response. By the time research data shows the shift is underway, the damage is done. Remediation is difficult, expensive, and likely to be ineffective. In my opinion, 2009 will be a pivotal year for eBay.
Stephen Arnold, December 31, 2008
Spreadsheet Fever: Information Overload Cost Estimate
December 31, 2008
Trophy kids with MBAs seem to have migrated from banking to consulting, or at least some did. When I read “Info Overload Costs $900 Billion, Blame Mr. Rogers” in Ars Technica here, I laughed–a while in fact. The Ars Technica story reported findings from a study generated by a New York azure chip consultant. As you may know, there are blue chip consultants like McKinsey and Bain. Then there are azure chip consultants. These are outfits hoping that the intellectual wavelength shifts to allow their firms to jump to the blue chip level. With studies like the one referenced by Ars Technica, it will be a while before the recruiters for the blue chips beat a path to the wizards behind the Information Overload Calculator here. You must visit the link, plug in your values, and get your rock solid cost estimate. The outfit behind the Calculator is Basex here. The firm’s tag line is a great one indeed: “Management science for the knowledge economy.” From my mud floored office in Harrods Creek, the “knowledge economy” does not look too peppy at this point in time. As a survivor of the original Booz, Allen & Hamilton meat grinder, I am also skeptical of “management science.” In my opinion, “management science” is an oxymoron, but what do I know. Earlier today I learned to great fanfare that Microsoft is a software company. Man, what an insight was that revelation. Ars Technica handles the write up of the Calculator with journalistic objectivity. The comment in the article that interested me was this one about search:
Another big time sink, says Spira [an expert cited by Ars Technica] , comes from the need to sort through reams of data to find the particular piece of information a worker needs. He claims that about half of web searches fail—other estimates put the figure closer to 30 percent—and that almost half of what those users regard as successful are really partial failures, because the information recovered is outdated or inaccurate. Part of the solution, Spira argues, is better search algorithms. “Most search is done now by keyword,” says Spira, “and that’s a terrible way of searching—by itself. It’s not terrible when it’s done in conjunction with taxonomies. But if I can’t narrow down my search, how does the search engine know what my goal is?”
You can figure out whom the Ars Technica reporter is interviewing. For me, this is one more trip down the return on investment garden path. Estimates of time wasted are not useful in my opinion. I worked on a year long study of innovation for a Fortune 50 company. I watched quite a few geniuses and wizards in action. I recall thinking that these guys and gals fiddled around a lot. One wizard at a large defense company told me that ideas came while she slept. She worked on “work” on weekends. During the work day, she fiddled around. The product emerging from this clear example of wasting time was the machine gun ammunition clip for a high speed cannon.
The leap to search is a common mental jump among trophy and entitlement thinkers. The idea of remembering, analyzing, and synthesizing data over time is a novel one. Check out the article and copy down the data. I think quite a few azure chip outfits will be recycling the data or inventing their own content free calculators. In my view, using the Calculator just wastes time.
Stephen Arnold, December 31, 2008
ClickZ: Year 2008 as Search Terms
December 31, 2008
For you search engine optimization lovers, navigate to “The Year in Search: A 2008 Review” by Enid Burns, ClickZ here. Ms. Burns has gathered the top searches for 2008. The resulting word list provides you with an indication of what will suck traffic to your Web site. It may help if your Web site is about one of the topics in the word list, but I know a couple of Web site wranglers who worry more about spoofing Google, Microsoft, and Yahoo than about content. Use the list as you will. The Lycos system returned “poker” as the number one search term. If you want to buy a portal and risk your savings, acquire Lycos. None of the terms is surprising because the search terms mirror what’s hot. Web search is different from enterprise search in many ways, but I have yet to see Louis Vuitton or Clay Aiken in the enterprise search logs I have had the opportunity to review.
Stephen Arnold, December 31, 2008
Cruel to Cuil
December 30, 2008
TechCrunch pushed the boulder off the hill, and now the avalanche is crashing down on Cuil. I wrote about this service when it first rolled out. You can read that article here. You can find CNet’s take on the failure of Cuil here. Matt Asay’s “Breaking the Google Habit” summarizes the Web search traffic chasm between the GOOG and every other Web search service. Keep in mind that the Google is not doing as well in China, India, Russia, and a couple of other places. But for most of the US of A, search means Google. (When I make this statement in my public lectures, I get entitlement children and trophy crazed 20 somethings chewing on my ankle. Folks, I am just reporting data, not imagining them. What do you call a 65 to 70 percent market share in Web search and nearly 26,000 Google Search Appliances and nearly complete saturation of US government mapping activities with Google Maps?) Mr. Asay picks up the theme of search as a habit. I have mentioned this characteristic of online once or twice in the last 30 years, but it’s a novel idea for CNet. The point is that Cuil started strong and ended up sucking air behind such stars as Ask.com and AOL.com. For me, the most important comment in Mr. Asay’s write up was:
..for competitors looking to kick the Google search habit, you can’t take the Cuil route and compete on search. It just won’t matter if you’re better. You need to create a different, compelling habit.
Wait a minute. I need to get out my acid free paper and archival ink. I want to write that down. I bet the Cuil venture fund check writers would prefer to capture their thoughts with branding irons on Cuil flesh.
Stephen Arnold, December 30, 2008
Wring Value from Google Analytics
December 30, 2008
I fielded several questions in the days before the ghost of Christmas present visited the coal country where I live in rural Kentucky. One question pertained to Google Analytics. If you haven’t seen analytics in action, point your browser to http://trends.google.com and you can see some of what’s possible. A Google search for “Google Analytics” works well too. To dig more deeply into Google Analytics, you will want to read Kissmetrics’s article “50 Resources for Getting the Most Out of Google Analytics” here. My Web log gets so little traffic, analytics depress me. But you, gentle reader, probably have a high traffic site, and you will benefit by clicking through sites and services grouped by useful headings; for example, “Beginner” and “Plugins, Hacks & Additions”. Very useful collection of information. Highly recommended.
Stephen Arnold, December 30, 2008
Microsoft in the Crystal Ball
December 30, 2008
Five scenarios for Microsoft’s future made the shortlist at InfoWorld. You can read the full, remarkably choppy story here. I don’t want to spoil your fun as you scan the five scenarios the wise journalists cooked up for today’s intellectual meal. I can mention one of the scenarios, however. For example, after noting that Microsoft has wandered a bit in 2008, one of the future scenarios is a gentle drift downwards. When I read this, I thought, “If the economy tanks, how gentle will that crash be for the Redmond wizards?” In my opinion, not too gentle. You can work through the other four scenarios which strongly suggest that someone at InfoWorld might want to sign up for an evening MBA program at one of the universities near the InfoWorld offices. The scenario that I think warrants a bit of thought is the break up. The company may be worth more chopped into three or more segments. With a stock price in the $20 range, beleaguered investors and users who have to clean up Word’s crazy behavior on a Windows machine by firing up Word for a Mac may force the issue. If more trouble looms, maybe Mr. Gates will come back. Marketing is not closing the gap between Redmond and the GOOG. Technology, not Zunes, is necessary. And quickly. Time is dribbling away. The company’s 2008 acquisitions provide additional evidence that Microsoft finds itself in a strange new Googley world.
Stephen Arnold, December 30, 2008
Dead Tree Update: Chicago and Suburban Shoppers
December 29, 2008
Newsweek Magazine, a dead tree publication in some danger of marginalization, published “Chicago’s Newspapers Facing a Troubled Future” here. When I read this article, I had the impression that the author, F.N. D’Alessio, was writing about Newsweek and the Associated Press. Mr. D’Alessio refers to newspaper “addicts”. I don’t know too many. I receive four dead tree newspapers: the Courier Journal, USA Today (affectionately known as McPaper), the New York Times, and the Wall Street Journal. I used to get the Financial Times, but the delivery was so erratic I dropped the paper in January 2008. I received an offer of a year’s subscription for $99, and I threw it in the trash. Too much hassle trying to work through clumps of papers arriving twice a week. For me, the most significant comment in the Newsweek story was a comment about the Tribune’s rival, the Chicago Sun Times:
Hollinger’s biggest move was to create the Sun-Times Media Group by buying up 70 suburban and neighborhood newspapers, more than a dozen of which are dailies. Some of those are profitable, and some newspaper analysts envision the Sun-Times company shutting down the namesake paper and keeping the suburban ones.
I read this as a clear statement that big city papers are gone geese. Check out the Tribune’s online version of the newspaper. It is a disaster. My discussion of this wounded duck is here.
The future for dead tree outfits–if there is to be one–is to become ad supported, micro publications serving narrow markets. For years, I thought the Gaithersburg Gazette had potential. Now that type of publication along with penny shoppers may be the margin of the information world available to the dead tree crowd.
You can make money in niches, but the revenue will buy used Malibus, not the flashy Mercedes the princes of journalism see as suitable transportation.
Stephen Arnold, December 29, 2008
Duplicates and Deduplication
December 29, 2008
In 1962, I was in Dr. Daphne Swartz’s Biology 103 class. I still don’t recall how I ended up amidst the future doctors and pharmacists, but there I was sitting next to my nemesis Camille Berg. She and I competed to get the top grades in every class we shared. I recall that Miss Berg knew that there were five variations of twinning: three dizygotic and two monozygotic. I had just turned 17 and knew about the Doublemint Twins. I had some catching up to do.
Duplicates continue to appear in data just as the five types of twins did in Bio 103. I find it amusing to hear and read about software that performs deduplication; that is, the machine process of determining which item is identical to another. The simplest type of deduplication is to take a list of numbers and eliminate any that are identical. You probably encountered this type of task in your first programming class. Life gets a bit more tricky when the values are expressed in different ways; for example, a mixed list with binary, hexadecimal, and real numbers plus a few more interesting versions tossed in for good measure. Deduplication becomes a bit more complicated.
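The mixed-list case can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function names `to_number` and `dedupe` are hypothetical, and the sketch assumes hexadecimal values carry a `0x` prefix and binary values a `0b` prefix. The idea is to convert every value to one canonical form before comparing.

```python
from fractions import Fraction


def to_number(text):
    """Convert a decimal, hex ("0x...") or binary ("0b...") string to an exact value.

    Assumption for this sketch: the prefix tells us the base.
    """
    s = text.strip().lower()
    if s.startswith(("0x", "-0x")):
        return Fraction(int(s, 16))
    if s.startswith(("0b", "-0b")):
        return Fraction(int(s, 2))
    # Fraction parses "26", "26.0" and "2.6e1" exactly, with no float rounding.
    return Fraction(s)


def dedupe(values):
    """Keep the first spelling of each distinct numeric value, in input order."""
    seen = set()
    unique = []
    for v in values:
        key = to_number(v)
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique


mixed = ["26", "0x1A", "0b11010", "26.0", "27", "2.6e1"]
print(dedupe(mixed))
```

Five of the six strings are different spellings of the same number, so only two entries survive. The hard part is the definition of "identical", which here is the choice of canonical form.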
At the other end of the scale, consider the challenge of examining two collections of electronic mail seized from a person of interest’s computers. There is the email from her laptop. And there is the email that resides on her desktop computer. Your job is to determine which emails are identical, prepare a single deduplicated list of those emails, generate a file of emails and attachments, and place the merged and deduplicated list on a system that will be used for eDiscovery.
Here are some of the challenges that you will face once you answer this question, “What’s a duplicate?” You have two allegedly identical emails and their attachments. One email is dated January 2, 2008; the other is dated January 3, 2008. You examine each email and find that the difference between the two emails is the inclusion of a single slide in one of the two PowerPoint decks. What do you conclude?
- The two emails are not identical, so include both emails and both attachments
- The earlier email is the accurate one and exclude the later email
- The later email is accurate and exclude the earlier email.
Now consider that you have 10 million emails to process. We have to go back to our definition of a duplicate and apply the rules for that duplicate to the collection of emails. If we get this wrong, there could be legal consequences. A system developer who generates a file of emails by relying on a mathematical process to determine that a record is different may be using a method too crude to deal with the problem in the context of eDiscovery. Math helps, but it is not likely to handle the onerous task of determining near matches or the reasoning required to determine which email is “the” email.
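To make the gap between a mathematical match and a near match concrete, here is a hedged Python sketch. The email layout (a dict with `sender`, `subject`, `body`, and `attachments` fields), the function names, and the 0.9 threshold are all assumptions made for illustration. Leaving the date out of the fingerprint is also a policy choice, not a rule. A hash catches exact copies; a similarity score only flags candidates for a person to review.

```python
import hashlib
from difflib import SequenceMatcher


def fingerprint(email):
    """SHA-256 over whitespace-normalized text fields plus attachment bytes.

    The date is deliberately excluded (an assumption of this sketch).
    """
    h = hashlib.sha256()
    for field in ("sender", "subject", "body"):
        h.update(" ".join(email[field].split()).lower().encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
    for blob in email.get("attachments", []):
        h.update(hashlib.sha256(blob).digest())
    return h.hexdigest()


def classify(a, b, threshold=0.9):
    """Return 'exact', 'near' (needs human review) or 'distinct'."""
    if fingerprint(a) == fingerprint(b):
        return "exact"
    same_header = a["sender"] == b["sender"] and a["subject"] == b["subject"]
    ratio = SequenceMatcher(None, a["body"], b["body"]).ratio()
    if same_header and ratio >= threshold:
        return "near"
    return "distinct"


jan2 = {"sender": "a@example.com", "subject": "Deck", "body": "See attached.",
        "attachments": [b"slide1 slide2"]}
jan3 = {"sender": "a@example.com", "subject": "Deck", "body": "See attached.",
        "attachments": [b"slide1 slide2 slide3"]}
print(classify(jan2, jan3))
```

The two sample emails differ only by one slide in the attachment, so the hash says "different" and the similarity check says "near". Which of the two is "the" email remains a human decision.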
Which is Jill? Which is Jane? Parents keep both. Does data work like this? Source: http://celebritybabies.typepad.com/photos/uncategorized/2008/04/02/natalie_grant_twins.jpg
Here’s another situation. You are merging two files of credit card transactions. You have data from an IBM DB2 system and you have data from an Oracle system. The company wants to transform these data, deduplicate them, normalize them, and merge them to produce one master “clean” data table. No, you can’t Google for an offshore service bureau; you have to perform this task yourself. In my experience, the job is going to be tricky. Let me give you one example. You identify two records which agree in field name and data for a single row in Table A and Table B. But you notice that the telephone number varies by a single digit. Which is the correct telephone number? You do a quick spot check and find that half of the entries from Table B have this variant, or you can flip the analysis around and say that half of the entries in Table A vary from Table B. How do you determine which records are duplicates?
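The spot check described above can be sketched in Python. Everything named here is hypothetical: the `txn_id` and `phone` field names, the helper functions, and the assumption that the two tables share a key. The sketch only measures how widespread the one-digit variation is. It does not decide which table holds the correct number.

```python
def digits(phone):
    """Strip punctuation so '(502) 555-0101' and '502-555-0101' compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


def digit_mismatches(a, b):
    """Count differing digit positions, or return None if the lengths differ."""
    da, db = digits(a), digits(b)
    if len(da) != len(db):
        return None
    return sum(x != y for x, y in zip(da, db))


def compare_tables(table_a, table_b, key="txn_id", phone="phone"):
    """Bucket rows that share a key as same, one_digit_off, or different."""
    by_key = {row[key]: row for row in table_b}
    report = {"same": [], "one_digit_off": [], "different": []}
    for row in table_a:
        other = by_key.get(row[key])
        if other is None:
            continue  # no counterpart in Table B; not a duplicate candidate
        d = digit_mismatches(row[phone], other[phone])
        if d == 0:
            report["same"].append(row[key])
        elif d == 1:
            report["one_digit_off"].append(row[key])
        else:
            report["different"].append(row[key])
    return report


table_a = [{"txn_id": 1, "phone": "502-555-0101"},
           {"txn_id": 2, "phone": "502-555-0102"}]
table_b = [{"txn_id": 1, "phone": "(502) 555-0101"},
           {"txn_id": 2, "phone": "502-555-0192"}]
print(compare_tables(table_a, table_b))
```

If half the shared rows land in the `one_digit_off` bucket, the variation is probably systematic, such as a transformation error in one source system, and resolving it takes a rule or a third source of truth.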