Data.gov Revealed

July 22, 2009

Mashable ran an analysis by Jim Hendler called “What’s in Data.gov?” I must admit that I have not set aside the time necessary to figure out what this new government service offers. I am a bit jaded when it comes to government information. The Web accessible content is often interesting, but I find it less helpful than information that I have seen in the course of my projects for a number of countries’ governmental entities. In short, the good stuff is rarely online. What is online is often baffling to me because basic metadata such as the dates a document was created and last changed are often missing. Even the author is elusive. Locating a person who knows about a particular document can be an exercise in frustration. Mr. Hendler’s write up explains what is in Data.gov. For me, the most interesting comment was:

Not all of the datasets have a link to downloadable data because some offer only browseable data via their own websites. Others publish datasets in multiple formats. As of today, the online static files associated with the datasets are distributed as follows: 204 datasets offer a CSV format dump, 10 datasets offer an XML format dump, and 21 datasets offer an XLS format dump.

In short, a promising start but inconsistent, incomplete, and fragmented. Governments are not particularly skilled in electronic publishing. Progress is evident, however.
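
As a concrete illustration of what working with one of these CSV dumps involves, here is a minimal sketch, assuming a hypothetical file name and hypothetical metadata field names; the point is simply that each agency’s file has to be inspected by hand because the metadata varies:

```python
import csv

# Hypothetical file name; in practice each Data.gov dataset dump must be
# downloaded separately and its layout inspected by hand.
DUMP_FILE = "agency_dataset.csv"

with open(DUMP_FILE, newline="", encoding="utf-8") as handle:
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    row_count = sum(1 for _ in reader)

print("Columns:", columns)
print("Rows:", row_count)

# Check for the basic metadata the post notes is often absent. These field
# names are assumptions, not a documented Data.gov schema.
for field in ("date_created", "date_modified", "author"):
    print(field, "present" if field in columns else "missing")
```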

Stephen Arnold, July 22, 2009

Search Engine Types

July 21, 2009

The South African Web site IOLTechnology has a useful run down of its view of the types of Web search engines. I downloaded “Moving beyond Bing.” Useful.

Stephen Arnold, July 20, 2009

Google Travel

July 20, 2009

Short honk: Navigate to Google.com. Enter the query “LGA SFO” and you get a structured search box. Click and you get air ticket prices. Google has a number of vertical sectors in its tractor beam. Google wants to be a player in online travel information as well. The story in Hotel Marketing provides the basics and offers some links. I provide a run down of the vertical sectors my research has identified. Disruption ahead for those in other sectors. Google is picking up its pace in my opinion. How will travel publishers and information providers respond? Ignore, surf, fight, or dither?

Stephen Arnold, July 20, 2009

Brainware and Exalead: Name Magnetism

July 20, 2009

A rose by any other name may smell as sweet … not in the world of search companies!

Let me give you two examples of search company naming and point out the importance of eliminating confusion for those seeking information about a search and content processing system.

First, run a query on Google Video for Exalead. The system returned 17 pages of videos. I scanned the 170 links and did not spot a false drop. I like the name Exalead. The “exa” reminds me of exabytes of data, and with the volume of email I get as a result of my addled goose musings in this Web log, the association resonates. One thousand petabytes is an exabyte, so Exalead’s name connotes software that can handle large volumes of data. The “lead” evokes leadership. I think the founder of Exalead contributed to the company’s name. Whoever came up with “Exalead” deserves a happy quack. Herewith: Quack.

Exalead’s branding, therefore, is solid and strong in my opinion. Tossing the name “Exalead” into a metasearch engine like Navgle.com, I got a mash up of content from various sources, and again I did not spot a false hit. In fact, it is easy to isolate information about the Exalead search system. For Exalead, Twitter had spot on results. No false hits.

Second, run a query on Google Video for Brainware. Looking through the smaller hit list (roughly one third the number of hits for Exalead videos), I noticed several items of interest. (Please, run these queries yourself and draw your own conclusions.)

I noticed straightaway that another organization uses the name “brainware” as a conference name; that is, Montreal Brainware. Interesting. The conference was given in 2001 and did not seem to get traction. Some confusion, but it is difficult to confuse a conference with a software company. As I scanned the results list, I saw a link to a computer game, which seemed to be five months old but was a dead link, a link to a wiffle ball team video, a link to a health-related video, and more game references. I ran the “brainware” query on Navgle.com and left the hit list review with these points in mind:

  • Navgle returned a number of false drops for the query “brainware”, including a link to a childhood education program
  • One of the top Web results was a design company operating under the name “Brainware” at a .net domain, which means the search vendor Brainware did not snag other domains to help prevent such brand claim jumping
  • There were zero tweets about the search vendor. (Twitter is reviled by many, but it mirrors certain market conditions.)

Why is naming a search and content processing company important? In my opinion there are several reasons:

First, if a searcher is confused about “which Brainware”, it may be a marketing negative. Second, by not buying other domains, the search vendor loses control of the selected name. I was surprised at how many “brainwares” were in the wild. Did the search vendor consider that potential customers, faced with wiffle ball, might give up and look elsewhere for scanning and indexing technology? In my opinion, an unambiguous brand is important for search engine indexing robots, but obviously some people do not agree with my view.

My take: Exalead has a name that makes it quite easy for a potentially interested customer to find information about the firm’s search and content processing technology. No brand claim jumping. Even the single word query returns relevant results. Zero confusion in my goose pond.  In contrast, Brainware has a name that creates opportunities for confusion. Naming is a big deal for marketing, trademark protection, and getting a high ranking in Bing.com or Google.com result lists.

Google’s and Microsoft’s naming conventions are problematic in my view, but these outfits can cope due to their size and marketing horsepower. Smaller search vendors need to get the basics lined up like toy soldiers. Putting hurdles in front of a prospect does not seem like a good business tactic. In today’s business environment, getting the name-associated aspects of marketing nailed down is important because it can affect the perceived value of a company and its products. Just my opinion. Honk!

Stephen Arnold, July 20, 2009


Bing and Censorship

July 20, 2009

Short honk: A reader alerted me to the Bing.com filter that chops out certain content and creates a mini vertical search engine for the segmented content. The filter is now applied to X-rated content. You can read about the filter in Network World’s story “Bing Gets Porn domain to Filter Out Explicit Images and Videos”. There are a number of complicated issues in play. The present solution creates an interesting revenue generating opportunity for Bing.com. Will Microsoft exploit it? I wonder how different this type of filtering is from the Amazon filtering of certain content.

Stephen Arnold, July 20, 2009

Wall Street Journal Suggests Internet Is Dead

July 19, 2009

The addled goose is not certain if the story “The Internet Is Dead (As An Investment)” will be online without a charge when you click the link. Newspapers fascinate me. Some of their information is free; some transient; and some available for hard cash.

What I find useful to follow are stories that make it clear that certain business sectors are “dead”. At Heathrow on Friday, July 17, 2009, I received a free Daily Telegraph when I bought a nut and granola bar. I did not want a newspaper because my Boingo connection was alive. Even though the Daily Telegraph was a svelte bundle of paper, the news was old. Free “yesterday” was not compelling. The argument in James Altucher’s wealth column is that utilities like electricity and the Internet are linked in this way:

Electricity greatly improved our quality of life. But I’m not going to get excited about buying a basket of utility companies. Same for the Internet. Can’t live without it, but can’t live with it (in my portfolio).

I recall reading a business monograph, The Mind of the Strategist: The Art of Japanese Business by Kenichi Ohmae. The book is now more than a decade old, but I recall its case analysis of the bulk chemical business. I wonder if that discussion of an uninteresting, commodity business holds some truths for Mr. Altucher and newspapers thinking along the same lines as the Wall Street Journal. The Daily Telegraph may benefit as well. There were many discarded Telegraphs in the lounge at Heathrow. Online economics requires a recalibration of some business yardsticks. Is Internet investment dead like the company that hit the jackpot with bulk chemicals? Glittering generalities are useful, but they may reveal more about the beliefs of a newspaper’s editorial team than about the opportunities utilities and commodities represent.

Stephen Arnold, July 19, 2009

Digital Revision and Online

July 18, 2009

Amazon has whipped up a cloud computing thunderstorm. You can tackle this story by entering the word “Kindle” in almost any news search system. One interesting post is MG Siegler’s article for TechCrunch, “Amazon, Why Don’t You Come in Our Houses and Burn Our Books Too?” For me, the key passage was:

This remote deletion issue is an increasingly interesting one. Last year, Apple CEO Steve Jobs confirmed that the company has a remote “kill switch” to remove apps from your device if it thinks that is necessary. To the best of my knowledge, they have yet to use such functionality, and would only do so if there was a malicious app out there that was actually causing harm to iPhones. They have not even used it to kill some poor taste apps that were quickly removed from the App Store, like Baby Shaker.

The addled goose wants to remain above the fray. The core issues from his perspective are different. For instance, as online services roll up via buy outs and the failure of weaker competitive services, a “natural monopoly” emerges. One can see this in the 1980s in the growth of Dialog Information Services and LexisNexis as the big dogs in online search. Over time, options emerged and now there are a handful of “go to” services. As these big dogs respond to challenges and issues, the Amazon deletion event becomes more visible. In my opinion what’s at work is an organization that makes a situational decision and then discovers that its “natural monopoly position” creates unanticipated problems. The ability of some online services to make informed decisions increases after an event such as deleting information. The deletion may be nothing more than a pointer to an object. Metadata and its persistence are more important in some cases than the digital content itself.

The second issue is the increasing awareness users and customers have about utility type services. The customer sees the utility as benign, maybe making decisions in favor of the individual user. The Kindle deletion scenario makes clear that paying customers are not the concern of the organization. I know that the ripples from the deletion of content will not subside quickly. A single firm’s decision becomes a policy issue that is broader than the company’s business decision.

Now shift gears from digital objects that one can find on such sites as Project Gutenberg in Australia to other content. When online services consolidate, digital revisionism seems likely to become more widespread. Policy decisions in commercial entities pivot on money. The policy, therefore, does not consider an individual user.

I know that most government agencies don’t worry about me, paddling around my duck pond. The impact of a decision taken by an online organization seems to send shock waves that may not be on the radar of corporate executives.

The issue, in my opinion, is the blurring of a commercial entity’s decision made for its benefit with broader public policy issues. What happens when an online service becomes a virtual monopoly? Who will regulate the entity? Legal eagles will flock to this issue, but digital revisionism is not new. Digital revisionism now gains importance as more people rely on a commercial entity to deliver a utility service.

Stephen Arnold, July 18, 2009

InQuira IBM Knowledge Assessment

July 18, 2009

A happy quack to the ArnoldIT.com goose who forwarded the InQuira Knowledge Assessment Tool link from one of my test email addresses. InQuira, a company formed from two other firms in the content processing space, has morphed into a knowledge company. The firm’s natural language processing technology is under the hood, but the packaging has shifted to customer support and other sectors where search is an enabler, not the electromagnet.

The survey is designed to obtain information about my knowledge quotient. The URL for the survey is http://myknowledgeiq.com. The only hitch in the git-along is that the service seems to be timing out. You can try the survey assessment here. The system came back to life after a two-minute delay. My impressions as I worked through this Knowledge IQ test appear below:

Impressions as I Take the Test

InQuira uses some interesting nomenclature. For example, the company asks about customer service and a “centralized knowledge repository”. The choices include this filtering response:

Yes, individuals have personal knowledge repositories (e.g., email threads, folders, network shared drives), but there isn’t a shared repository.

I clicked this choice because distributed content seems to be the norm in my experience. Another interesting question concerns industry best practices; the implicit assumption is that a best practice exists. The survey probes for an indication of who creates content and who maintains that content once created. My hunch at this point in the Knowledge IQ test is that most respondents won’t have much of a system in place.

I suspect I will end up with a low Knowledge IQ because I am selecting what appear to me to be reasonable responses, no extremes or categoricals like “none” or “all”. I note that some questions have default selections already checked. Ideal for the curious survey taker who wants to get to the “final” report.

About midway through I get a question about the effectiveness of the test taker’s Web site. In my experience, most organizations offer so-so Web sites, so I will go with a middle-of-the-road assessment. I am now getting tired of the Knowledge IQ test. I just answered questions about customer feedback opportunities. My experience suggests that most companies “say” feedback is desirable. Acting on the feedback is often a tertiary concern, maybe of even lower priority.

My Report

The system is now generating my report. Here’s what I learned: my answers appear to put me in the middle of the radar chart. I have a blue diagram which gives me a personal Knowledge IQ.

inquira report 01


Kapow Technologies

July 17, 2009

With the rise of free real time search systems such as Scoopler, Connecta, and ITPints, established players may find themselves in the shadows. Most of the industrial strength real time content processing companies like Connotate and Relegence prefer to be out of the spotlight. The reason is that their customers are often publicity shy. When you are monitoring information to make a billion on Wall Street or to snag some bad guys before those folks can create a disruption, you want to be far from the Twitters.

A news release came to me about an outfit called Kapow Technologies. The company described itself this way:

Kapow Technologies provides Fortune 1000 companies with industry-leading technology for accessing, enriching, and serving real-time enterprise and public Web data. The company’s flagship Kapow Web Data Server powers solutions in Web and business intelligence, portal generation, SOA/WOA enablement, and CMS content migration. The visual programming and integrated development environment (IDE) technology enables business and technical decision-makers to create innovative business applications with no coding required. Kapow currently has more than 300 customers, including AT&T, Wells Fargo, Intel, DHL, Vodafone and Audi. The company is headquartered in Palo Alto, Calif. with additional offices in Denmark, Germany and the U.K.

I navigated to the company’s Web site out of curiosity and learned several interesting factoids:

First, the company is a “market leader” in open source intelligence. It has technology to create Web crawling “robots”. The technology can, according to the company, “deliver new Web data sources from inside and outside the agency that can’t be reached with traditional BI and ETL tools.” More information is here. Kapow’s system can perform screen scraping; that is, extracting information from a Web page via software robots.
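
Kapow sells a commercial, visual robot designer, so the following is not its API; it is only a rough sketch of what “screen scraping” means at the code level, using Python’s standard library and a placeholder URL:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Placeholder target; a real "robot" would also handle logins, forms, and
# pagination, which this sketch does not attempt.
TARGET_URL = "http://example.com/"

class LinkScraper(HTMLParser):
    """Collect the href attribute of every anchor tag on the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

raw_html = urlopen(TARGET_URL).read().decode("utf-8", errors="replace")
scraper = LinkScraper()
scraper.feed(raw_html)

for link in scraper.links:
    print(link)
```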

Second, the company offers what it calls a “portal generation” product. The idea is to build new portals or portlets without coding. The company said:

With Kapow’s technology, IT developers [can]: Avoid the burden of managing different security domains; eliminate the need to code new transactions; and bypass the need to create or access SOA interfaces, event-based bus architectures or proprietary application APIs.

Third, the company provides a system that handles content migration and transformation. With transformation an expensive line item in the information technology budget, managing these costs becomes more important each month in today’s economic environment. Kapow says here:

The module [shown below] acts much as an ETL tool, but performs the entire data extraction and transformation at the web GUI level. Kapow can load content directly into a destination application or into standard XML files for import by standard content importing tools. Therefore, any content can be migrated and synchronized to and between any web based CMS, CRM, Project Management or ERP system.

image

Kapow offers connections for a number of widely used content management systems, including Interwoven, Documentum, Vignette, and Oracle Stellent, among others.
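
To make the migration idea in the quotation above concrete, here is a minimal sketch of extracting records and writing them to a plain XML file that an import tool could read; the record fields and the output layout are my own assumptions, not Kapow’s format:

```python
import xml.etree.ElementTree as ET

# Hypothetical records; a real migration would pull these from the legacy
# system's web interface rather than hard-coding them.
records = [
    {"title": "Quarterly report", "author": "J. Smith", "body": "..."},
    {"title": "Press release", "author": "A. Jones", "body": "..."},
]

root = ET.Element("documents")
for record in records:
    doc = ET.SubElement(root, "document")
    for field, value in record.items():
        ET.SubElement(doc, field).text = value

# A standard XML file that a CMS content importing tool could consume.
ET.ElementTree(root).write(
    "migrated_content.xml", encoding="utf-8", xml_declaration=True
)
```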

Kapow includes a search function along with application programming interfaces, and a range of tools and utilities, including RoboSuite (a block diagram appears below):

image

Source: http://abss2.fiit.stuba.sk/TeamProject/2006/team05/doc/KapowTech.ppt


Big Data, Big Implications for Microsoft

July 17, 2009

In March 2009, my Overflight service picked up a brief post in the Google Research Web log called “The Unreasonable Effectiveness of Data.” The item mentioned that three Google wizards wrote an article in the IEEE Intelligent Systems journal called “The Unreasonable Effectiveness of Data.” You may be able to download a copy from this link.

On the surface this is a rehash of Google’s big data argument. The idea is that when you process large amounts of data with a zippy system using statistical and other mathematical methods, you get pretty good information. In a very simple way, you know what the odds are that something is in bounds or out of bounds, right or wrong, even good or bad. Murky human methods like judgment are useful, but with big data, you can get close to human judgment and be “right” most of the time.
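
As a toy illustration of that argument (my example, not the authors’), consider estimating the odds of the next word from nothing but counts over a corpus; with web-scale data instead of one sentence, estimates like this start to look like judgment:

```python
from collections import Counter

# A stand-in corpus; the authors' point rests on web-scale text, which this
# tiny example obviously is not.
corpus = (
    "the cat sat on the mat the dog sat on the rug "
    "the cat chased the dog and the dog chased the cat"
).split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def next_word_odds(previous, candidate):
    """Estimate P(candidate | previous) from raw counts alone."""
    if unigram_counts[previous] == 0:
        return 0.0
    return bigram_counts[(previous, candidate)] / unigram_counts[previous]

print(next_word_odds("the", "cat"))  # 0.375 -- common continuation
print(next_word_odds("the", "rug"))  # 0.125 -- rarer continuation
```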

When you read the IEEE write up, you will want to pay attention to the names of the three authors. These are not just smart guys; these are individuals who are having an impact on Google’s leapfrog technologies. There’s lots of talk about Bing.com and its semantic technology. These three Googlers are into semantics and quite a bit more. The names:

  • Alon Halevy, former Bell Labs researcher and the thinker answering, to some degree, the question “What’s after relational databases?”
  • Peter Norvig, the fellow who wrote the standard textbook on computational intelligence and smart software
  • Fernando Pereira, the former chair of Penn’s computer science department and the Andrew and Debra Rachleff Professor.

So what do these three Googlers offer in their five page “expert opinion” essay?

First, large data makes smart software smart. This is a reference to the Google approach to computational intelligence.

Second, big data can learn from rare events. Small data and human rules are not going to deliver the precision that one gets from algorithms and big data flows. In short, costs for getting software and systems smarter will not spiral out of control.

Third, the Semantic Web is a non-starter, so another method – semantic interpretation – may hold promise. By implication, if semantic interpretation works, Google gets better search results plus other benefits for users.

Conclusion: dataspaces.

Google is up front and clear when explaining what its researchers are doing to improve search and other knowledge-centric operations. What are the implications for Microsoft? Simple. The big data approach is not used in the Powerset method applied to Bing, in my opinion. Therefore, Microsoft has a cost control issue to resolve with its present approach to Web search. Just my opinion. Your mileage may vary.

Stephen Arnold, July 17, 2009
