More Social Network Issues
January 12, 2009
Social search, social networks, and social pitfalls–the cheerleaders don’t want the social bandwagon to be delayed, but trouble looms. Google’s Orkut made clear the issues that can arise when a social network becomes the playground of some interesting people in Brazil. Now you can read “(Under)mining Privacy in Social Networks”, by a trio of Googlers, here. The Google write up identifies some obvious flaws; for example, exposing information unintentionally. But the more significant part of the paper, in my opinion, is its references to merging social graphs. The dataspace drum beats are getting louder.
Stephen Arnold, January 12, 2009
Ask.com’s Search Technology Advances
January 12, 2009
Ask.com keeps trying. On January 8, 2009, the company announced “Semantic Search technology Advances from Ask.com.” You can read the company’s statement here. The company asserts:
In October last year we introduced our proprietary DADS (Direct Answers from Databases), DAFS (Direct Answers from Search), and AnswerFarm technologies, which are breaking new ground in the areas of semantic, web text, and answer farm search technologies. Specifically, the increasing availability of structured data in the form of databases and XML feeds has fueled advances in our proprietary DADS technology. With DADS, we no longer rely on text-matching simple keywords, but rather we parse users’ queries and then we form database queries which return answers from the structured data in real time. Front and center. Our aspiration is to instantly deliver the correct answer no matter how you phrased your query.
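Ask.com does not say how DADS works beyond this description. Here is a minimal Python sketch of the general idea (parse a question, then turn it into a database lookup) under loud assumptions: the `capitals` table, its rows, the regex patterns, and the `direct_answer` function are all hypothetical and are not Ask.com’s technology. Several phrasings map to one SQL query, and anything the patterns do not anticipate falls through.

```python
import re
import sqlite3

# Hypothetical structured data standing in for a database or XML feed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capitals (country TEXT PRIMARY KEY, capital TEXT)")
conn.executemany(
    "INSERT INTO capitals VALUES (?, ?)",
    [("france", "Paris"), ("germany", "Berlin"), ("japan", "Tokyo")],
)

# Hypothetical query patterns: different phrasings, same database query.
PATTERNS = [
    re.compile(r"^what(?: is|'s) the capital of (?P<country>[\w ]+?)\??$"),
    re.compile(r"^(?P<country>[\w ]+?)(?:'s)? capital\??$"),
    re.compile(r"^capital of (?P<country>[\w ]+?)\??$"),
]


def direct_answer(query):
    """Parse a natural-language query into a SQL lookup; return None if nothing fits."""
    q = query.strip().lower()
    for pattern in PATTERNS:
        match = pattern.match(q)
        if match:
            row = conn.execute(
                "SELECT capital FROM capitals WHERE country = ?",
                (match.group("country").strip(),),
            ).fetchone()
            return row[0] if row else None
    # No pattern matched: a real system would fall back to keyword search here.
    return None
```

A question such as “What’s the capital of France?” and the terse “Germany capital” both become the same parameterized `SELECT`; a query outside the patterns returns `None`.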
The idea is that a user–assuming there is enough traffic to make the site viable in 2009–can enter a query any way he or she wishes. The Ask.com system will figure out the query and provide a Direct Answer. Let’s check out the system.
My first query was, “What’s the daily show?” The system responded with the top result “The Daily Show with Jon Stewart.” Good. My second query was, “What is a dataspace’s application?” The system responded by asking me the question, “What is a data spaces application?” The first result was a link to Sourceforge’s information about EQUIP2. Sorry, in my mind the correct answer was a link to the ACM papers about dataspaces. My third query was, “What is an information manifold?” This is not a trick question, because there is a technical paper with a title that contains the bound phrase “information manifold.” The Ask.com system asked me, “What is an information mannford?” I don’t know what a “mannford” is.
For the types of questions a middle school student might ask, the new system will work pretty well. For popular culture topics, the system will probably be better than some I have examined this week. For the types of queries I have about technologies that address the known weaknesses of traditional semantic processing, Ask.com won’t help me too much. That’s good. Knowing what questions to ask allows me to feed my goslings. Ask.com won’t put me out of a job this year. One final point: I clicked on “mannford”. It’s a city in Oklahoma. No dataspaces among that state’s wide open spaces. Look west, young search, look to Mountain View, California.
Stephen Arnold, January 12, 2009
Xsearch CEO Norbert Weitkämper Interviewed
January 12, 2009
Weitkämper Technology–based in Staffelsee, Germany–is a search and content processing vendor with a low profile in North America. The firm offers a multi-source search suite that incorporates proprietary technology for fast content and query processing. The company’s XSEARCH package can be customized to focus on a client’s specific needs. It offers nine modules: Clustering Engine, Suggest, DidYouMean, Summarizer, Linguistic Engine, Federated Search, Facet Navigator, Entity Extractor and Intelligent Classifier.
Norbert Weitkämper, an industrial engineer, developed Xsearch after working in electronic publishing. He was dissatisfied with the search results available from commercial products. He told Search Wizards Speak:
As we are specialized on search for more than a decade our package is very well tuned; not only for speed but also for content for example. We will combine our new HitEngine with our established technologies like Linguistic, Did-You-Mean, clustering, synonyms and ontologies, or our personal ranking mechanisms. They are already released, we just have to melt them together.
He added:
For the complex roman languages our linguistic engine with its morphologic analysis is a big advantage, because algorithmic approaches like Bayesian or Porter, which are doing a good job for English, are a miserable failure.
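A toy Python comparison shows the point. The `strip_suffix` function is a crude English-style suffix stripper, not the actual Porter algorithm, and the six-entry Italian lexicon is a hypothetical stand-in for the resources a morphological analyzer relies on; I do not know how Xsearch implements its engine. The irregular Italian verb andare (“to go”) does the rest: suffix stripping leaves six unrelated strings, while a lexicon lookup maps every form to one lemma.

```python
# Toy English-style suffix stripper (NOT the real Porter algorithm).
ENGLISH_SUFFIXES = ("ing", "ed", "es", "s")


def strip_suffix(word):
    """Remove the first matching English ending, keeping a stem of 3+ letters."""
    for suffix in ENGLISH_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical full-form lexicon, the kind of data morphological analysis uses.
ITALIAN_LEXICON = {
    "vado": "andare", "vai": "andare", "va": "andare",
    "andiamo": "andare", "andate": "andare", "vanno": "andare",
}


def lemmatize(word):
    """Look the word up in the lexicon; unknown words pass through unchanged."""
    return ITALIAN_LEXICON.get(word, word)


forms = ["vado", "vai", "va", "andiamo", "andate", "vanno"]
stems = {strip_suffix(f) for f in forms}   # six different index terms
lemmas = {lemmatize(f) for f in forms}     # one index term: "andare"
```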
On the subject of semantic analysis, Mr. Weitkämper said:
Semantic analysis is much more difficult for European languages than for English. We are already able to integrate thesauri or ontologies. I have not seen any system yet which meets the requirements for semantic analysis – at least when you have a closer look into the system. But storing information in a quick and accessible way is even more important for this approach, as you have to consider much more than only keywords and positions. So I can imagine that our optimized index structure may help also in this field to achieve adequate results in an acceptable amount of time.
More information about the company is available at its Web site, http://www.weitkamper.com. The full text of the interview with Mr. Weitkämper is at http://www.arnoldit.com/search-wizards-speak/xsearch.html.
Stephen Arnold, January 12, 2009
British Library Debunks Myth of a Google Generation
January 11, 2009
Libraries are fighting for money and a role in the digital world. The plight of white shoe publishers is well known. Newspapers, once the lifeblood of information, are now stuffed with soft news or, what’s worse, old information. The shift from desktop boat anchor computers to sleek handheld devices is moving forward. Flagship PC vendors like Dell Computers are in a fight for Wall Street respectability. The television and motion picture pashas believe that the fate of the traditional music publishing business is not theirs.
On January 16, 2008 (the date and the information come from this source), the British Library press room issued or issues or will issue “Pioneering Research Shows Google Generation Is a Myth.” The news release summarizes the study Information Behaviour of the Researcher of the Future. Here’s the link I located, but it did not work without some clicking around. The report strikes me as something developed in an alternate universe where the Googleplex and its information system are small potatoes indeed.
He does not exist, but this member of the Google generation made it to the cover of the British Library study debunking the myth. In the future, this lad will be retrieving information from a mobile device, no PC or library required, thinks this addled goose.
According to the press release:
Commissioned by the British Library and JISC (Joint Information Systems Committee), the study calls for libraries to respond urgently to the changing needs of researchers and other users. Going virtual is critical and learning what researchers want and need crucial if libraries are not to become obsolete, it warns. “Libraries in general are not keeping up with the demands of students and researchers for services that are integrated and consistent with their wider Internet experience”, says Dr Ian Rowlands, the lead author of the report.
Now this paragraph seems to suggest that “something” has happened and that libraries must “respond urgently to the changing needs of researchers and other users.” My hunch is that libraries are not riding the Google wave but paddling along, trying to keep Googzilla’s spiky back in view.
Most of these curves head south, right? © British Library 2009 and presumably in the universe which I inhabit.
The news release also suggests libraries must turn to “Page 2.0”, which I presume is another silly reference to the made up world of Search 2.0, Enterprise 2.0, and Web 2.0. The news release from the future ends with the mysterious phrase “The panel:”.
Keep in mind that I am writing this notice on January 11, 2009, at 9:30 am Eastern time. The news release is from the future. It has a date of January 16, 2009. One would think that the British Library, operating outside the normal space-time continuum, could do more than tell me that the Google generation is a myth. Clever headline aside, libraries must define a role for themselves before funding dwindles even more. University libraries might be grandfathered into the institutional budget. Other types? Might be a tough sell.
In my opinion, what does not exist among some in the library profession is a firm grip on the here and now. I am 65, and I think the Google generation exists. I wish it were not so, but it exists, and the world, one hopes, will be better for the generation’s presence. Libraries seem to exist in a medieval world. Even Shakespeare is in step with the shift from paper to digital information. Consider Hamlet’s statement from one of the versions of the play crafted from Shakespeare’s foul papers:
Let us go in together,
And still your fingers on your lips, I pray.
The time is out of joint—O cursèd spite,
That ever I was born to set it right!
Nay, come, let’s go together.
No myth this, sprites.
Stephen Arnold, January 11, 2009
Microsoft’s Data Robustness
January 11, 2009
The “we may go out of business” Seattlepi.com Web site ran a story with the cruel title “Microsoft’s Servers Overloaded by Interest in Windows 7.” You can read this sort of weird headline and its accompanying story here. The story makes clear that Microsoft’s investments in its data centers were not up to the load imposed by the faithful downloading Windows 7.
The misstep was described as a “borkfest” by Lifehacker here. This goose isn’t sure what a borkfest is, but he can make a guess. Gina Trapani’s article nails the problem. She wrote:
If lack of infrastructure to handle an insane traffic spike over a few hours was truly the problem (even though these were conditions Microsoft created), there are lots of alternatives they could’ve used that would have kept their servers up. In fact, users have been happily downloading and distributing the Windows 7 beta build 7000 now for weeks using an efficient file-sharing protocol called BitTorrent.
When the GOOG streamed its live concert test last year, the Googlers tapped Akamai. Did Microsoft use its own content delivery network? Did Microsoft contract out the job? Whoever handled the job may want to check out another line of work, in my opinion. Seattlepi.com quotes a Microsoft Web log. I noted this sentence: “We are adding some additional infrastructure support to the Microsoft.com properties before we post the public beta.” Good idea.
Stephen Arnold, January 11, 2009
Yahoo: Slipping and Dipping
January 11, 2009
I have deep skepticism about third party data. Nevertheless, when reports about Web site traffic and online advertising share appear, the data get snapped up the way Tess goes for a dropped chicken wing. Silicon Alley Insider’s “Yahoo’s Share of All Search Advertisers Drops 36% in Q4 (YHOO)” is worth reading. You can find the story and the scary red line here. Let’s assume the data are accurate. Bad news for Yahoo. Let’s assume the data are off a tad, say, down 18 percent in Q4. Slightly less bad news. If the Yahooligans continue to slip, the GOOG benefits. Yahoo started as a directory, became a portal, and then floundered. Like a person overboard in the Arctic waters off Nordkaap, even a strong swimmer succumbs. A weak swimmer, well, not much chance. Yahoo is now in the Arctic waters.
Stephen Arnold, January 11, 2009
Business Week: All Over the GOOG
January 10, 2009
Business Week may want to rename one of its editorial sections “Google Week.” The editors at Business Week crank out articles about Google. Most are interesting, but some of the Google coverage is–well, let me be gentle–obvious. Here’s an example: “Small Businesses Love Google, Even When Things Go Wrong.” Now we know that search is not very good. I know that folks with multiple PhDs and big IQs will beg to differ, but I point to the research I have done, that Jane McConnell in Paris has done, and that Martin White in London has done. Our data reveal that about two thirds of the users of a search system are dissatisfied. Now Business Week has embraced a Nielsen-WebVisible survey that says 92 percent of Internet users are satisfied with Web search. But–and this is an important “but”–“39 percent of them frequently can’t find companies they’re looking for.” Search doesn’t work too well. Imagine that. You must read the Business Week article here, which includes a link to the news release from the big time research outfit here.
In my opinion, the reason people love Google has to do with the imprint Google has stamped on two thirds of the people who look for information on Google. Google is search. Search is Google. If a free service works in a manner one can describe as “good enough”, that’s okay. The key is the brand power and magnetism Google possesses. Perception is a big part of a search system’s success. Google’s been working on perception for a decade, and the GOOG has done a bang-up job. Now if we can loosen people’s grip on the view of Google as an ad company, I would be a happier goose.
Stephen Arnold, January 10, 2009
Microsoft Innovation According to Network World
January 10, 2009
Mitchell Ashley wrote “Top 8 Microsoft Research Projects to Improve Our Lives” here. Straight away, let me remind you that I am a goose, an addled goose at that. I doubt very much that Mr. Ashley was thinking about geese when he penned his headline. What are the eight research projects that may improve a human’s life? For starters, these eight “socio-digital systems” (whatever that phrase means) include “I know it’s here somewhere.”
The idea is that big hard drives are available, economical, and sucking data into their depths. But wait, humans use other digital devices like SD cards and USB sticks. (Not this goose. I lose them.) So there are two socio-digital innovations that address this problem. You will have to read Mr. Ashley’s article for the other six innovations. I focus on search and content processing.
The two innovations are called “digital shoebox” and “family archive”. Here is what Mr. Ashley wrote:
It’s like the data management version of the cryogenic-freezing program: We all keep creating personal digital content and buying more disk drives in hopes that someday they’ll discover a cure for the information archiving, searching and retrieval of all that stuff before our time on earth is up.
Now I must admit that I had a tough time figuring out what these innovations are. I turned to Microsoft Live Search for elucidation. I noted a reference dated 2002 to this technology, and I saw a December 5, 2007, Business Week article here about this innovation. I jumped back to the 2002 reference to digital shoebox research here and then back to the 2007 reference. Same invention. I asked myself, “At what point does an innovation become JAT, or just another technology?” I think five years is a long time to move from innovation in one part of R&D to the public relations office in another part of the R&D department.
From my quick scan of these documents, I think the digital shoebox is a server that indexes information objects and points to where they are. I am not sure how the digital shoebox works on non-text objects, what metadata are generated, how the index update operates, or what the indexing overhead is.
The family archive proved to be easier to locate. Microsoft offers a brief description here. The key point for me was:
This project aims to understand the needs of families to interact with, manage, and archive materials which are important in preserving and sharing family memories. We are developing a system which allows the input and safe archiving of both digital and physical media, and which allows natural interaction with those media. This work has been informed by our in-depth studies of “photowork” and of “videowork“.
I think the archive adds smarter software to the digital shoebox.
My hunch is that Microsoft wants to make it easy for a Vista user to dump data anywhere. The Microsoft technology will sniff out the data and index it. When the user wants to find something, the “server” (probably a software component, not a power-sucking, six-figure system) will allow the user to browse, search, and click metadata like a date or some other tag like Wesak and see hits that match.
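Here is a minimal Python sketch of the kind of pointer index I am guessing at. Everything in it is hypothetical (the `ShoeboxIndex` class, the location strings, the dates, the tags); it is not Microsoft’s design. The index stores where each object lives plus a date and tags, so a click on a date or on a tag like Wesak returns locations on whatever device holds the file.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    location: str                       # where the object lives (hypothetical device paths)
    date: str                           # ISO date, browsable as metadata
    tags: set = field(default_factory=set)


class ShoeboxIndex:
    """Hypothetical 'digital shoebox': keeps pointers and metadata, not the files."""

    def __init__(self):
        self.records = {}
        self.by_term = defaultdict(set)  # tag or date -> ids of matching objects

    def add(self, obj_id, record):
        self.records[obj_id] = record
        for term in record.tags | {record.date}:
            self.by_term[term.lower()].add(obj_id)

    def find(self, term):
        """Return the sorted locations of every object carrying this tag or date."""
        ids = self.by_term.get(term.lower(), set())
        return sorted(self.records[i].location for i in ids)


index = ShoeboxIndex()
index.add("p1", ObjectRecord("usb:/photos/temple.jpg", "2008-05-19", {"Wesak", "photo"}))
index.add("p2", ObjectRecord("sd:/DCIM/0042.jpg", "2008-05-19", {"photo"}))
index.add("d1", ObjectRecord("C:/Users/goose/notes.txt", "2008-12-01", {"notes"}))
```

Here `index.find("wesak")` points to the USB stick, and `index.find("2008-05-19")` returns both photos, one on the SD card and one on the stick. The questions I raise above (non-text objects, metadata generation, index updates, overhead) are exactly what a sketch like this leaves out.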
A few thoughts.
First, these are interesting search and content processing ideas. I need to test these systems to see if my life becomes easier with them. My previous brushes with smart information object metatagging systems have been a love-hate affair. Some systems I downright hate. Others I sort of love. So far none of Microsoft’s search technologies has made me swoon. I am thinking about search in Outlook, native search in XP, free SharePoint search, and the Byzantine Microsoft Fast ESP system.
Second, the notion of dumping data locally is out of step with what my research suggests young people want to do. The notion of dumping stuff is viable. The last set of interviews I did revealed that dumping should be automatic and the data dump should be located on a server somewhere. The idea was that when the device dies, a new device can suck data from the dump in the sky.
Finally, with Microsoft’s share of the search market slipping, Microsoft needs to make market share gains. R&D is okay when it yields more than words about innovation. I want innovation.
Stephen Arnold, January 10, 2009
Social Search: Manipulating for Money
January 9, 2009
Mike Elgan wrote “How China’s 50 Cent Army Could Wreck Web 2.0” here. The point of this article is that a person with money can hire Chinese computer users to insert comments into social networks. The infusion of posts would, in effect, distort the much-ballyhooed wisdom of crowds. Mr. Elgan does a good job of explaining how this army works and pointing out the fragility of user-dependent Web 2.0 services. I think he strays from the tethering ring when he asserts that the Chinese “army” can undermine free speech, but otherwise, he’s spot on.
However–and I know you relish my “howevers”–a few of my addled goose observations are now in order.
First, the “social network” revolution is not as zippy as most pundits assert. Mr. Elgan’s write up explains how the person with money can pay to make a specific issue, product, or person percolate upwards. Money can’t buy happiness but it sure can buy visibility in a Web 2.0 service that depends on user inputs.
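A minimal Python sketch makes the arithmetic plain. The post counts are invented for illustration, and `rank_by_mentions` is a hypothetical, deliberately naive ranking that treats each favorable post as one vote, roughly the signal a crowd-driven service trusts.

```python
from collections import Counter


def rank_by_mentions(posts):
    """Rank items by how many posts mention them favorably (naive 'wisdom of crowds')."""
    counts = Counter(posts)
    return [item for item, _ in counts.most_common()]


# Hypothetical organic activity: three products, honest enthusiasm.
organic = ["product_a"] * 120 + ["product_b"] * 80 + ["product_c"] * 40
# Hypothetical purchased posts, all pushing one product.
paid = ["product_c"] * 150

before = rank_by_mentions(organic)          # product_c is last
after = rank_by_mentions(organic + paid)    # product_c jumps to first
```

Nothing in the ranking function can tell a paid post from an organic one, which is the fragility Mr. Elgan describes.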
Second, social networking is more of a marketing story than a technology innovation. Sure, MySpace.com and Facebook.com move well beyond discussion fora and individual Web pages. These sites have knitted together functions and surfed on young-at-heart users who need a way to connect in today’s Jetsons world. As the young-at-heart grow old and infirm, their use of network communication methods will persist, but these methods are extensions of older technologies, not sudden inventions.
Third, the implications of a technology cannot be accurately predicted. As a result, when an issue arises with a technology application or suite of technology applications like social networks, the “fix” will be more technology. My concern with MySpace.com and Facebook.com stems not from what they do but from the new technologies these services will require to handle the problems. For example, what’s the fix for the Chinese “army” issue? Think more stringent controls. The casualty is not free speech. It is freedom.
Stephen Arnold, January 9, 2009
Google Revenue: Why the One-Trick Pony View Persists
January 9, 2009
A one-trick pony is a pony that does one thing, like letting kids sit on its back. The one-trick pony is highly prized among certain carnival concession owners. Kids who are afraid of big animals may also cotton to the one-trick pony. Wall Street likes any type of animal as long as it makes a lot of money. Google’s viewed as a one-trick pony because the company makes money from advertising. Ignore the boundaries that separate Google’s different types of advertising. The one-trick pony at the carnival might do some more interesting things in its stall at night with another pony. For the carnival impresario and the Wall Street crowd, Google’s a one-trick pony, shown below:
Google Blogoscoped presented a textbook example of how the one-trick pony view of Google is perpetuated. Navigate here and scan the table that shows “how Google makes money”. The useful list of more than 80 products and services includes three ways to make money. For each product and service, Blogoscoped tallied how Google monetizes it. The three ways for Google to make money are–you guessed it–one-trick ponies. There are ads and some fees to get involved with ads; for example, Google charges a fee to become an AdWords advertiser. But this is a variation on the one-trick pony’s ribbons, not a change to the one-trick pony.
I think it would be useful to consider these types of revenue horses. In 2009, a couple of them will be given their heads. Competitors may find these ponies harder to ride than the docile one-trick pony that sits quietly as noisy kids climb on and off; to wit:
- Payments from educational institutions for various Google services. Example: the fees paid by New South Wales to license Google services for school kids
- License fees from the oft-reviled but highly-disruptive Google Search Appliance
- Subscription fees to commercial Google products and services; for example, Google Earth or SketchUp
- Payments from partners to become one of Google’s best pals
- Fees assessed to organizations when one of their top dogs decides that paying Google for Postini email archiving is better than getting caught unprepared for a discovery process.
Do you want to be standing flat-footed in front of this group of ponies? Source: http://www.summers-photo.co.uk/Feb2007/images/Stampede_jpg.jpg
My list of other revenue ponies is longer than this group of five. Looking at the GOOG too closely or from one narrow angle makes it difficult to perform these tasks:
- Assess which pricing models could be implemented with little or no warning for unmonetized products and services; for example, charging me to look at pages when running a Google Book Search
- Place Google in a competitive context where advertising will not work; for example, Google charges for certain content constructs that it creates and that are not available from other online services. Think of a directory of specialized vendors in a specific market like video production.
- Understand what business models Google will have to implement in order to meet its financial objectives and Wall Street’s expectations; for example, if travel advertising goes down, what monetizing options are available to Google to address that shortfall.
If one wants to understand Google, one may want to keep track of the revenue herd. Granted, ads generate a whopping 95 percent of Google’s “now” revenue. But going forward, I like to watch those ponies. One or two may grow up to be different revenue animals. More about these options appears in my forthcoming Google and Publishing study, available this spring from Infonortics Ltd. here.
Stephen Arnold, January 9, 2009