Alpha Cold like Cuil or Hot like Google
May 11, 2009
Adam Ostrow has an excellent write up about the Wolfram Alpha system. He works through the limited examples in a useful way. Compared to the MIT Technology Review analysis, Mr. Ostrow beat the pants off MIT’s Alpha reviewer. He gathers screenshots of the mash ups and the answers that the Alpha demo has on offer. For me, the most interesting comment in the article was:
Ultimately, it’s hard to see how Wolfram Alpha could be called either the next Google or the next Cuil. Rather, it seems to have the ambition of making accessible a whole different type of information, that could be quite useful to a significant subset of Internet users. And eventually, that might make it a good complement, but not a replacement, for today’s leading search engines.
Clip and save this write up for reference.
Stephen Arnold, May 9, 2009
XML as Code and Its Implications
May 11, 2009
I read Tom Espiner’s ZDNet article “EC Wants Software Makers Held Liable for Code” here. I have been thinking about his news story for a day or two. The passage that kept my mind occupied consists of a statement made by an EU official, Meglena Kuneva:
“If we want consumers to shop around and exploit the potential of digital communications, then we need to give them confidence that their rights are guaranteed,” said Kuneva. “That means putting in place and enforcing clear consumer rights that meet the high standards already existing in the main street. [The] internet has everything to offer consumers, but we need to build trust so that people can shop around with peace of mind.”
Software makers for some high profile products shift the responsibility for any problems to the licensee. The licensee is accountable but the software maker is not. I am not a lawyer, and I suppose that this type of thinking is okay if you have legal training. But what if XML is programmatic? What does that mean for authors who write something that causes some type of harm? What about software that generates content from multiple sources and one of those sources is defined as “harmful”? The blurring of executable code and executable content is a fact of online life today. Good news for lawyers. Probably not such good news for non-lawyers in my opinion.
Stephen Arnold, May 11, 2009
Autonomy Scores a PR Coup
May 11, 2009
If you are in the search marketing business, you may want to do a case study of Autonomy. The London Times’s story “It May Seem Confusing but Autonomy Can Help” by Mike Harvey was a master stroke. You can read the full text of the write up here. With headlines going to Google, Microsoft, and Wolfram Alpha, Autonomy’s management has wrested attention from these firms and slapped the attention on its products. The subhead for the article made a case for building an organization’s information framework with Autonomy’s digital building blocks with this statement, “The company’s technology enables customers to decipher information from multiple sources, giving it a world-leading role.” For me, the most interesting comment in the article was:
According to Dr Lynch, Autonomy is leading a revolution in the information technology industry. After 40 years of computers being able to understand only structured information that could be found in the rows and columns of a database, computers armed with Autonomy’s software can understand human-style information, such as phone conversations. That means, Dr Lynch argues, that Autonomy now has the world’s most advanced search engine for businesses, which can help companies to reveal the value in the masses of e-mails, phone calls and videos that form the majority of ways in which staff communicate with each other.
I think it will be interesting to see how its competitors respond. Oh, the article includes a biographical profile of Dr Michael Lynch. Not even Messrs. Brin and Page rate that type of coverage.
Stephen Arnold, May 11, 2009
Link Horror: The Thomas Crampton Affair
May 10, 2009
Link loss is not movie material. Careless “removal” of a server or its content can cause some pain. You can read about the personal annoyance (anguish, maybe?) expressed by journalist Thomas Crampton. His stories written for a fee have disappeared. The details are here. There is another angle that is not just annoying, it is expensive to rectify. Wikipedia linked to a Web site called IHT.com, the online presence of the International Herald Tribune, or what was the Web site. You can read about that issue here. Now the Wikipedia links are dead and the fix is going to require lots of volunteers or a script that can make the problem go away. Either way, this is an example of how organizations think about what’s good for themselves, or what those organizations perceive as the “right” approach, and the unexpected consequences of the decision. I see this type of uninformed decision making too frequently. The ability to look at an issue from an “overflight” position is seen as silly, too time consuming, or not part of the management process. I think the Thomas Crampton Affair might make a compelling video.
Stephen Arnold, May 10, 2009
Security: Search a Factor
May 10, 2009
Security of online information is critical to any company that operates on the Internet, from large corporations to medical institutions to the federal government. Remember the stolen laptop? Security online, especially when setting up a database of searchable, confidential material, is a herculean task, because if it’s online, someone can search and find it. Case in point, a headline from May 7: US Med Data Held Hostage by Hackers; Ransom: $10M. See the article at http://bit.ly/16IoZi. Hackers stole more than eight million records containing drug prescriptions, Social Security numbers, and driver’s license details from Virginia on April 30. It was reported that several layers of protection failed and allowed the hackers access. It’s not the first time something like this has happened. Data security online must be improved, or we’re all going to be facing a lot more fraud in the future.
Jessica Bratcher, May 10, 2009
SharePartXXL Taxonomy Component
May 10, 2009
Some azure chip consultants tout a taxonomy as the spike that will kill the werewolf of information retrieval. A number of vendors have recognized the hunger of organizations with disappointing search systems. I cast an eye over the offerings, and I have visited with developers of these systems. A large number of SharePoint taxonomy solutions exist in the Microsoft ecosystem.
SharePoint Reviews covers quite a few SharePoint add-ins. Jeremy Caney does a good job describing a product available from SharePartXXL. You can read “Taxonomy Extension by SharePartXXL Integrates Nicely with MOSS 2007” here. The product snaps into SharePoint and adds taxonomy management functions not included in SharePoint. Mr. Caney points to some shortcomings in the product. In my experience, there are only a few industrial strength taxonomy tools available that provide comprehensive control of term lists. Even fewer are able to generate ANSI standard taxonomies.
You can get information about SharePartXXL’s solutions here. These range in cost from about $1,500 to $3,500.
If you need the horsepower for managing ANSI standard term lists, taxonomies, and controlled vocabularies, you will want to take a look at the products available from Access Innovations here.
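For readers who have not wrestled with these tools, the snippet below is a minimal sketch of the kind of structure an ANSI/NISO Z39.19 style term list manages: preferred terms, “use for” synonyms, and broader/narrower relationships. It is an illustration only; the class and the sample terms are my own invention, not code from SharePartXXL or Access Innovations.

```python
# Minimal sketch of a controlled vocabulary entry: a preferred term,
# its non-preferred ("use for") synonyms, and its narrower terms.
# The terms below are invented for the example.
from dataclasses import dataclass, field

@dataclass
class Term:
    preferred: str                                  # preferred label
    use_for: list = field(default_factory=list)     # non-preferred synonyms
    narrower: list = field(default_factory=list)    # child Term objects

def print_hierarchy(term: Term, depth: int = 0) -> None:
    """Walk the taxonomy and print an indented term list."""
    print("  " * depth + term.preferred)
    for child in term.narrower:
        print_hierarchy(child, depth + 1)

# A hypothetical fragment of a term list
root = Term("Information retrieval", use_for=["Search"])
root.narrower.append(Term("Enterprise search", narrower=[Term("Faceted navigation")]))
root.narrower.append(Term("Taxonomy management", use_for=["Term list maintenance"]))

print_hierarchy(root)
```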
Stephen Arnold, May 10, 2009
YAGG Plagues Google Gmail
May 9, 2009
Short honk: The BBC reported here that Gmail suffered an outage. You can read the May 8, 2009 story here. The Beeb’s headline tells the tale: “Google Email Service Back Up After GFail.” For other Google glitches search this Web log for YAGG, yet another Google glitch.
Stephen Arnold, May 9, 2009
New Media, Old Media Spoofed
May 7, 2009
The story “Student’s Wikipedia Hoax Quote Used Worldwide in Newspaper Obituaries” here underscored for me the precarious nature of “information” in today’s world. The Irish Times reported that a fake quote in Wikipedia popped up in newspapers around the world. New media and old media both fell into the comfortable assumption that if it is online, information is correct, accurate, true, and well-formed.
At a conference yesterday, I spoke with a group of information professionals. The subject was the complexity of information. One of the people in the group said, “Most of the technical information goes right over my head. At work, people turn to me for answers.”
I don’t want to dip into epistemological waters. I can observe that the rising amount of digital information (right or wrong is irrelevant) poses some major challenges to individuals and organizations. The push for cost reduction fosters an environment suitable for short cuts.
Last Sunday, one of my publishers (Harry Collier, managing director, Infonortics Ltd.) and I were talking about the change in how large news organizations operate. He had worked for book and newspaper publishers earlier in his career, as I had. He noted that the days of investigative reporting seem to have passed.
I had to agree. The advent of online has made research something that takes place within the cheerful confines of the Web browser. Interviews I once conducted face to face now take place via email. Even the telephone has fallen from favor because it is difficult to catch a person when he or she is not busy.
A number of companies involved in content processing are experimenting with systems that can provide some clues to the “provenance” or “sentiment” of information. The idea is that tireless software can provide some guideposts one can use to determine if a statement is right or wrong, hot or cold, or some similar soft characteristic.
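These vendors do not publish how their systems work. Purely to illustrate the “guidepost” idea, the toy scorer below flags a statement as hot or cold with a hand-made word list and checks the source against a hypothetical roster of known domains. The word lists and domains are invented for the example; real content processing systems are far more elaborate.

```python
# Toy illustration of "guidepost" scoring: a hand-made sentiment lexicon plus a
# hypothetical list of known sources. Not any vendor's actual method.
POSITIVE = {"excellent", "useful", "accurate", "reliable"}
NEGATIVE = {"fake", "hoax", "wrong", "misleading"}
KNOWN_SOURCES = {"irishtimes.com", "bbc.co.uk"}   # assumption, not a real registry

def guideposts(text: str, source_domain: str) -> dict:
    words = {w.strip(".,!?").lower() for w in text.split()}
    sentiment = len(words & POSITIVE) - len(words & NEGATIVE)
    return {
        "sentiment": "hot" if sentiment > 0 else "cold" if sentiment < 0 else "neutral",
        "source_known": source_domain in KNOWN_SOURCES,
    }

print(guideposts("The quote turned out to be a hoax.", "example.com"))
# {'sentiment': 'cold', 'source_known': False}
```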
The quote story from the Irish Times highlights the pervasiveness of online short cuts. In this case, the quote is unlikely to do much lasting harm. Can the same be said of other information short cuts that are taken each day? Will one of these short cuts trigger a chain of unexpected events? Will the work processes that encourage short cuts create ever more chaotic information systems that act like a brake on performance? Who is spoofing whom? Maybe ourselves?
Stephen Arnold, May 7, 2009
Twitter Pumps Search
May 7, 2009
Newsfactor here and other Web news services posted stories about Twitter getting a dose of search steroids. You will want to read “Not-for-Sale Twitter Is Expanding Search Functionality” by Patricia Resende to get the details. Ms. Resende wrote:
Twitter Search will be used to crawl information from links by Twitters to analyze and then index the content for future use, Jayaram, a former vice president for search quality at Google, told Webware. Currently Twitter Search is only used to search words included in tweets, but not words in links. Along with its new crawling functionality, Twitter Search will also get a ranking system. When users do a search on trending topics — the top-10 topics people tweet about, which get their own link on the Twitter sidebar — Twitter will analyze the reputation of the tweet writer and rank search results partially based on that.
To me, this scoring will be an important step (a rough sketch of such a score appears after the list below). Here’s why:
- Clickstream metrics by individuals about topics, links, and words provide important clues to smart software
- Individuals with large numbers of followers provide “stakes in the sand” for making some type of subjective, value-centric calculation; for example, a person with more followers can be interpreted as an “authority”
- Individuals who post large numbers of results and have followers and topics add additional scoring dimensions for calculating “reputation” and other squishy notions.
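Twitter has not disclosed a ranking formula. As a rough sketch of how follower counts and posting volume might be folded into a reputation weighted relevance score, consider something like the following; the weights, the log damping, and the Tweet fields are assumptions made for illustration, not Twitter’s method.

```python
# Sketch only: one way follower counts and posting volume could feed a
# reputation-weighted relevance score. Weights and fields are assumptions.
import math
from dataclasses import dataclass

@dataclass
class Tweet:
    text: str
    author_followers: int
    author_posts: int
    text_match: float   # 0..1 relevance of the tweet text to the query

def reputation_score(tweet: Tweet) -> float:
    # Log damping keeps accounts with huge follower counts from dominating outright.
    authority = math.log10(1 + tweet.author_followers) / 7   # ~1.0 at 10M followers
    activity = math.log10(1 + tweet.author_posts) / 5        # ~1.0 at 100K posts
    return 0.6 * tweet.text_match + 0.3 * authority + 0.1 * activity

results = [
    Tweet("Waiting in line at the new bistro", 25_000, 4_000, 0.8),
    Tweet("Bistro line is long again today", 120, 300, 0.9),
]
for t in sorted(results, key=reputation_score, reverse=True):
    print(round(reputation_score(t), 3), t.text)
```

In this toy example the account with 25,000 followers outranks the account with 120 followers even though its text match is lower, which is the “authority” effect described in the list above.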
A number of commercial content processing companies are in the “reputation” and subjective scoring game, but Twitter is a free (for now) real time service with a large volume of posts. The combination makes Twitter a potential dark horse in the reputation analysis game. Believe me. That game has some high stakes. Nonsense about waiting in line at a restaurant becomes high value data when one can identify high score folks standing in line multiple times per week. You don’t have to be a rocket scientist to figure out that the restaurant is doing something right. The score may not be a Zagat type report, but it works pretty well for making certain types of marketing scans useful.
Twitter on steroids plus real time search. More than teen craziness, I assert.
Stephen Arnold, May 8, 2009
Google and Publishing: Some Metrics
May 7, 2009
The Guardian reported some metrics about Google and publishing. You will find the summary of a Googler’s speech at a publishing conference here. The article is “Google: We Give Publishers £3.3bn”. Highlights from the news story include:
- A quote attributed to Googler Matt Brittin, “We want to help publishers make money online”
- Google sends a billion clicks to publishers each month
- Google wants to “work with publishers to improve their digital revenues and help close the gap between print and online advertising returns”.
For me, the most interesting comment in the article was this passage:
He [Matt Brittin, Googler] said publishers should look to use technology to help their digital publications move at a greater pace and keep up with consumer demand, but that while it could help, Google could not offer all the necessary solutions.
The challenge that I see is that publishers think about technology in terms of putting color pictures in newspapers and slashing costs. Technology, as the term is used by Googlers, may mean a more sophisticated approach.
I don’t think the audience was able to see a path through the swamp. I wonder if any of those Google billions were jingling in the pockets of the attendees?
Stephen Arnold, May 7, 2009