Social Search: Manipulating for Money

January 9, 2009

Mike Elgan wrote “How China’s 50 Cent Army Could Wreck Web 2.0” here. The point of his article is that a person with money can hire Chinese computer users to insert comments into social networks. The infusion of posts would, in effect, distort the much-ballyhooed wisdom of crowds. Mr. Elgan does a good job of explaining how this army works and of pointing out the fragility of user-dependent Web 2.0 services. I think he strays from the tethering ring when he asserts that the Chinese “army” can undermine free speech, but otherwise he’s spot on.

However–and I know you relish my “howevers”–a few of my addled goose observations are now in order.

First, the “social network” revolution is not as zippy as most pundits assert. Mr. Elgan’s write up explains how the person with money can pay to make a specific issue, product, or person percolate upwards. Money can’t buy happiness, but it sure can buy visibility in a Web 2.0 service that depends on user inputs.
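A toy sketch makes the arithmetic concrete. It assumes a hypothetical service that ranks topics purely by how many posts mention them; real services blend many more signals. The topic names and post counts are invented for illustration.

```python
from collections import Counter


def rank_by_mentions(posts):
    """Rank topics by how many posts mention them (a naive 'wisdom of crowds' signal)."""
    counts = Counter(posts)
    # Sort by count, highest first; break ties alphabetically so the order is stable.
    return sorted(counts, key=lambda topic: (-counts[topic], topic))


# Hypothetical organic chatter about three products.
organic = ["camera A"] * 40 + ["camera B"] * 25 + ["camera C"] * 10
# Hypothetical paid posts: 50 posts at fifty cents each, or $25 in all.
paid = ["camera C"] * 50

print(rank_by_mentions(organic))         # ['camera A', 'camera B', 'camera C']
print(rank_by_mentions(organic + paid))  # ['camera C', 'camera A', 'camera B']
```

Twenty-five dollars moves the last-place product to the top because the ranking has no way to tell a paid post from an organic one.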

Second, social networks are more a marketing story than a technology innovation. Sure, MySpace.com and Facebook.com move well beyond discussion fora and individual Web pages. These sites have knitted together functions and surfed on young-at-heart users who need a way to connect in today’s Jetsons world. As the young-at-heart grow old and infirm, their use of network communication methods will persist, but these methods are extensions of older technologies, not sudden inventions.

Third, the implications of a technology cannot be accurately predicted. As a result, when an issue arises with a technology application or a suite of applications like social networks, the “fix” will be more technology. My concern with MySpace.com and Facebook.com stems not from what they do but from the new technologies these services will require to handle their problems. For example, what’s the fix for the Chinese “army” issue? Think more stringent controls. The casualty is not free speech. It is freedom.

Stephen Arnold, January 9, 2009

Lawyers and Metadata

January 8, 2009

Now the indexing world gets something to gnaw on. Automated indexing systems beat humans when measured by cost per item indexed, speed, and consistency. Automated indexing systems can be as good as a human for some types of content. Human indexing, by contrast, varies widely in quality. Software hits a sweet spot and doesn’t get significantly better or worse unless the content throws in a wrench. Now the opposite issue arises: not providing metadata. We can automate the creation of metadata, but it is early days in the world of automatic metadata scrubbing. I quacked happily when I thought, “I wonder who knows where their metadata are?”

Jim Calloway’s “Metadata–What Is It and What Are My Ethical Duties” here breathes new life into human indexing. What I find interesting is that lawyers charge by the hour. Human indexers are paid on piece-work schedules or given a flat yearly fee and maybe some benefit crumbs. The economics of human indexing is based on keeping the per-record cost as low as possible whilst one maintains the “quality” of the indexing. “Quality” in the commercial database world is often defined by a metric such as “four to six index terms per bibliographic record” or “16 records per hour with required fields completed”. You may have a more academic definition, but my examples come from the soon-to-be-marginalized world of human commercial database production.
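To make the piece-work definition of “quality” concrete, here is a minimal sketch of a checker for the two metrics quoted above. The record layout and the REQUIRED_FIELDS names are hypothetical; a real production shop would have its own schema and rules.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; every shop defines its own.
REQUIRED_FIELDS = ("title", "author", "source")


@dataclass
class Record:
    fields: dict
    index_terms: list = field(default_factory=list)


def record_ok(rec):
    """Four to six index terms and every required field filled in."""
    has_terms = 4 <= len(rec.index_terms) <= 6
    complete = all(rec.fields.get(name) for name in REQUIRED_FIELDS)
    return has_terms and complete


def shift_ok(records, hours):
    """At least 16 acceptable records per hour over the shift."""
    good = sum(record_ok(r) for r in records)
    return hours > 0 and good / hours >= 16


sample = Record({"title": "Brief", "author": "J. Doe", "source": "Example Journal"},
                ["metadata", "ethics", "law firms", "discovery"])
print(shift_ok([sample] * 32, hours=2))  # True: 32 acceptable records in 2 hours
```

Note what the checker does not measure: whether the terms are any good. That is the gap between a production metric and “quality” in the academic sense.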

The article defines metadata from a legal eagle’s point of view, of course. But the story gets interesting when Mr. Calloway cites a situation in which metadata became a legal issue. Where there is a legal issue, there is the risk of a fine, jail, or losing pride of place among the brood of legal eagles. Forget the compensation. Ego may be a bigger force in the legal eagle world. Mr. Calloway nicely hooks metadata with risk.

For me, the most important comment in this useful write up was:

In this writer’s view, the key is to avoid sending out documents with metadata that could disclose confidential information. Comparing metadata to a wrongly sent fax or e-mail is questionable and the idea that lawyers will be prohibited from examining metadata while parties, law enforcement officers and private detectives will be free to do so seems artificial at best. The Colorado rule that one must disclose receiving confidential information via metadata before acting on it seems to strike a rational balance. The best rule is for law firms to develop best practices internally to keep metadata from “escaping” in the first place.

I quite like “keep metadata from escaping in the first place”. To close, let me ask several questions:

  • Do you know why metadata are in the documents available for indexing on your Web site?
  • Do you know how value-added indexing in a dataspace can expand access to a document in an often unrelated context?
  • Do you know where metadata are in a document, in a Web page or other container housing the document, or in the dataspace created for the information objects?

If not, you will want to dig up this information yourself. Asking your attorney will result in a very large legal bill. One final question: Do you think Mr. Madoff knows about his metadata?
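For the do-it-yourself route, here is a minimal sketch. It assumes an Office Open XML file such as a .docx, whose core document properties (author, last modified by, title, dates) usually live in the ZIP member docProps/core.xml. It reads those properties and writes a copy with two person-identifying fields emptied. The choice of fields is an assumption, and the sketch does not touch comments, tracked changes, custom properties, or the other places metadata hide, so treat it as a starting point, not a scrubbing tool.

```python
import xml.etree.ElementTree as ET
import zipfile

CORE_PATH = "docProps/core.xml"  # usual location of core properties in a .docx
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # keep familiar prefixes when re-serializing

# Assumption: the fields a firm would blank; real policies cover far more.
PERSONAL_FIELDS = ("dc:creator", "cp:lastModifiedBy")


def read_core_properties(path):
    """Return {'prefix:name': text} for each element in core.xml."""
    reverse = {uri: prefix for prefix, uri in NAMESPACES.items()}
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read(CORE_PATH))
    props = {}
    for el in root:
        if el.tag.startswith("{"):
            uri, local = el.tag[1:].split("}", 1)
            name = f"{reverse.get(uri, uri)}:{local}"
        else:
            name = el.tag
        props[name] = (el.text or "").strip()
    return props


def scrub_docx(src, dst):
    """Copy src to dst, emptying PERSONAL_FIELDS in core.xml; other members are unchanged."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PATH:
                root = ET.fromstring(data)
                for field_name in PERSONAL_FIELDS:
                    for el in root.findall(field_name, NAMESPACES):
                        el.text = ""
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            zout.writestr(item, data)  # reuses the original member's name and compression
```

Run read_core_properties on a few documents from your own Web site before running anything else; the names in dc:creator and cp:lastModifiedBy are often the first surprise.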

Stephen Arnold, January 8, 2009

Google Joins EU Privacy Commission Advisory Group

December 30, 2008

Out-Law.com reported that Google’s privacy law expert has snagged a seat on a committee which will provide input to the European Commission about data protection. You can read the story here. For me the most important comment in the story was:

Google prompted a debate on retention when it announced it would no longer keep logs indefinitely, but would delete them after 18 months. Data protection authorities argued that logs should be kept for no longer than six months. Google eventually conceded that the EU’s Data Retention Directive did not apply to the information, and has said that it will now only keep records for nine months.

The thought that I had was, “Why keep them any longer than necessary?” The GOOG crunches the data in near real time, tokenizes it, and stuffs the outputs of its processes into its nifty data management system. Inside the GOOG, various systems and methods grind away, feeding outputs into other Google operations. The 18 months, the nine months, even the six months of retention are red herrings. The GOOG zooms through data, so chopping “months” down may be a negotiating tactic. In my experience with government and quasi-government advisory bodies, pertinent facts and solid technical knowledge can be as hard to find as a pig in the hollow who volunteers to become a Kentucky ham.
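A toy sketch of the point follows. It assumes a hypothetical pipeline (not Google’s) that folds each query’s tokens into a running aggregate the moment the query arrives. Purging raw entries older than the retention window leaves the derived counts untouched. The dates, queries, and 270-day window are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    when: datetime
    query: str


def ingest(entry, raw_log, term_counts):
    """Store the raw entry and, immediately, fold its tokens into a derived aggregate."""
    raw_log.append(entry)
    term_counts.update(entry.query.lower().split())


def purge(raw_log, now, retention):
    """Keep only raw entries younger than the retention window."""
    return [e for e in raw_log if now - e.when < retention]


raw_log, term_counts = [], Counter()
for when, query in [(datetime(2008, 1, 15), "kentucky ham"),
                    (datetime(2008, 6, 1), "data retention"),
                    (datetime(2008, 12, 15), "kentucky bourbon")]:
    ingest(LogEntry(when, query), raw_log, term_counts)

# Roughly a "nine month" window, applied on the date of this post.
raw_log = purge(raw_log, now=datetime(2008, 12, 30), retention=timedelta(days=270))
print(len(raw_log))        # 2: the January query is gone from the raw log
print(term_counts["ham"])  # 1: but its tokens live on in the aggregate
```

Shortening the window changes what happens to the raw log; it does nothing to outputs that were derived before the purge ran.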

Stephen Arnold, December 30, 2008
