Knewco: Community Tags
May 29, 2008
Peter Suber offers a clear, detailed post about a new approach to community tags. You can read his post “Combining OA, Wikis, Community Annotation, Semantic Processing, and Text Mining” here. Mr. Suber includes a link to a discussion of the idea in Genome Biology here.
What’s interesting to me is the specialist nature of the effort. Although anyone can tag, the focus is STM (scientific, technical, and medical). The idea is to create rich indexing for technical information. I think this is a good idea. I think there will be challenges because a small number of people do most of the work. Nevertheless, these types of projects are sorely needed.
The company responsible for the technology is Knewco, founded by several academics. You can learn more about the firm here. Knewco has developed some tag options that are interesting. I think the value will come from POW or “plain old words”.
Why do I care about this and what’s the wiki variant have to do with search? Well, a lot. First, technical information has long been in the hands of a small number of multi-national firms. If you want to search engineering or chemical information, you have to use specialist files and sometimes pay big, big online access charges. This type of project is one more example of the research community feeling its oats. Good for researchers and potentially threatening to the oligopolies in the STM information business.
Second, I like the idea that information innovation is coming from thinkers outside the traditional IR (information retrieval) community. When I go to conferences, there are 20-somethings who have an opportunity to lecture me on their major insight, Use For references. Okay, been there. Done that. Fresh thinking is important, and I am delighted that Knewco is trying pop ups, colors, and other bells and whistles that may point to some new directions in tagging.
Finally, the larger the body of publicly accessible tags, the better the next-generation systems will be. Google, as I point out in my new study due out in September 2008, is focused on making its software smarter. Humans play a role, but the GOOG knows the value of indexing, taxonomies, tags, and their brethren.
On the downside, I don’t like the company name “Knewco”. In fact, Knewco uses coinages for its different functions; for example, a “knowlet”. I hate having to memorize a neologism for something I call a cross reference. But that’s a personal preference. Check the company’s Web technology here.
Stephen Arnold, May 29, 2008
Good Enough Means Trouble for Commercial Database Publishers
May 28, 2008
I began work on my new Google monograph. (I’m loath to reveal any details because I just yesterday started work on this project.) I will be looking at Google’s data management inventions in an attempt to understand how Google is increasing its lead over search rivals like Microsoft and Yahoo while edging ever closer to providing data services to organizations choking on their digital information.
As part of that research, I came across several open source patent documents that explain how Google uses the outputs of several different models to determine a particular value. Last week a Googler saw my presentation which featured a Google illustrative output from a patent application and, in a Googley way, accused me of creating the graphic in Photoshop.
Sorry, chipper Googler, open source means that you can find this document yourself in Google if you know how to search. Google’s system is pretty useful for finding out information about Google even if Googlers don’t know how to use their own search system.
How does Google make it possible for my 86-year-old father to find information about the town in Brazil where we used to live and allow me to surface some of Google’s most closely-guarded secrets? These are questions worth considering. Most people focus on ad revenues and call it a day. Google’s a pretty slick operation, and ads are just part of the secret sauce’s ingredients.
Running Scenarios
In my experience, it’s far more common for a team to use a single model and then run a range of scenarios. The high and low scenario outputs are discarded, and the results are averaged. While not perfect, the approach yields a value which can be used as is or refined as more data become available to the system. Google’s twist is that different models generate an answer.
This diagram shows how Google’s policy of incremental “learnings” allows one or more algorithms to become more intelligent over time.
The output of each model is mathematically combined with other models’ outputs. As I read the Google engineers’ explanations, it appears that using multiple models generates “good enough” results, and it is possible, according to the patent document I am now analyzing, to replace models whose data are out of bounds.
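To make the contrast concrete, here is a minimal sketch of the two approaches. It is my own illustration, not Google's method: the patent document says only that model outputs are "mathematically combined", so the plain average below is an assumption, as are the function names, the toy models, and the fixed low and high bounds used to flag a model for replacement.

```python
from statistics import mean


def trimmed_scenario_average(outputs):
    """Single-model approach: discard the highest and lowest
    scenario outputs, then average what is left."""
    if len(outputs) < 3:
        raise ValueError("need at least three scenario outputs")
    ordered = sorted(outputs)
    return mean(ordered[1:-1])


def combine_models(models, x, low, high):
    """Multi-model approach (assumed combination rule: plain average).

    Each model produces its own answer for input x. Answers outside
    [low, high] are left out of the average, and the model's name is
    returned so it can be replaced later.
    """
    in_bounds = []
    flagged = []
    for name, model in models.items():
        value = model(x)
        if low <= value <= high:
            in_bounds.append(value)
        else:
            flagged.append(name)
    if not in_bounds:
        raise ValueError("every model was out of bounds")
    return mean(in_bounds), flagged


# Hypothetical data for illustration only.
scenarios = [10.0, 12.0, 11.0, 30.0, 2.0]
print(trimmed_scenario_average(scenarios))  # 11.0

toy_models = {
    "linear": lambda x: 2 * x,
    "offset": lambda x: x + 4,
    "broken": lambda x: x * 100,
}
value, flagged = combine_models(toy_models, 3.0, low=0.0, high=20.0)
print(value, flagged)  # 6.5 ['broken']
```

The point of the second function is that a misbehaving model does not poison the answer. It is set aside and named, which matches the idea of swapping out models whose data are out of bounds while the rest keep producing a "good enough" value.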
Lawyers: Mixed Opinions about Law and Online Giants
May 26, 2008
I’m no attorney (thank goodness). I don’t understand lawyers, lawmakers, or the pundits who explain what legal eagles do, don’t do, and won’t do.
Two news items caught my attention. Despite my feeling lousy, I decided to urge you to read both. The CNet story explains that Viacom is suing Google for one billion dollars. Google argues that it complies with applicable copyright laws. The old media company and the new media company are going to meet in court. (Someone told me that 95 percent of litigation is resolved before going to trial.) You can read this clear write up here.
The other story is from ZDNet Australia about a judge in that country who opines that “Google, Yahoo Make Lawmakers Impotent”. You can read it here.
The Australian judge offers the opinion that technology is changing too fast for the courts. Technology allows some companies to “beat the legal system”.
Lawyers, based on my limited experience, are not good technologists. My sample is small, but the attorneys whom I have known also have trouble with math. When I suggested that a Riemann zeta function is a useful procedure, I got a nervous chuckle. The sidekicks of blind justice were not sure whether I was kidding or serious.
In my first Google study “The Google Legacy” and in my second “Google Version 2.0” I argued that lawyers could kill Google. Both are available from Infonortics Ltd. in Tetbury, Glou.
I’m not sure if “kill” is the correct word. A legal process can suck money, management attention, and public perception at prodigious rates. A sufficiently bad run of luck in the courts could slap a weight jacket on the GOOG.
On the other hand, lawyers, if the Australian judge’s observation is somewhat accurate, might be their own bear trap. A lawyer trying to explain how algorithms and teenagers undermine a traditional media giant could confuse matters in an interesting way.
My view is that technology is not just outpacing the legal system. Technology is in the process of redefining some of the principles that are codified in many countries’ laws. The problem is analogous to the wrenching of the Roman legal system before Julius Caesar and the wild and crazy mess that followed his brief term in office. Roman law never adapted. One might point out that Italy’s present legal system is still pretty wacky. Nevertheless, the Italian technologists in Modena, Bologna, and Rome seem to be innovating without much friction from Italian courts.
Yahoo could be taken out by the courts. The company is in “transition”. Google, on the other hand, may have the resources to deal with lawyers who want to put technology in its place, snap a shock collar on Google, and keep the clueless traditional giants paying those fat, fat fees. In law, attorneys’ math is good enough to get those bills in the mail.
Stephen Arnold, May 27, 2008
Government High-Tech Investments: IN-Q-TEL
May 26, 2008
I received an email from a colleague new to the Federal sector. Her email included comments and links about US government funding of high technology companies. I was surprised because I assumed that most people knew of the IN-Q-TEL organization. As US government URLs go, IN-Q-TEL’s will baffle some people. First, the hyphens throw off some folks. Second, the group’s use of the dot-org domain throws off others.
In a nutshell, IN-Q-TEL makes clear what it does and why:
IN-Q-TEL identifies, adapts, and delivers innovative technology solutions to support the missions of the Central Intelligence Agency and the broader US intelligence community.
I’m not interested in whether IN-Q-TEL is doing a great job or a lousy job. I’m not concerned about its mission, its funding, or its management team.
What I find fascinating is the organization’s choice of companies in which to invest. I don’t know the budget range of IN-Q-TEL, but my sources tell me that the investments stick close to $1 million, sometimes more, sometimes less. You can read more about IN-Q-TEL at these links:
- The Wikipedia entry, and I am not vouching for the accuracy of this entry
- The CIA’s own description here
- KMWorld’s write up here. (I am a paid columnist for KMWorld, but I did not contribute to this story.)
The purpose of this feature is to provide a snapshot of the companies in which IN-Q-TEL has invested. I’ve identified more than 70 companies. This is too many to put in one posting, so I will break up the list and cover the period 2000 to 2003 here and do each subsequent year in additional Beyond Search postings.
In the period from 2000 to 2003, IN-Q-TEL invested in 25 companies. Keep in mind that I may have overlooked some in my research. If you know of a company I missed, please use the comment section of this Web log to update my information. These appear in the table below:
Usability: A Must Read from the WSJ
May 24, 2008
I’m too nerdy to read the Wall Street Journal every day. A few minutes with most stories, and I feel as if an MBA consultant were telling me that investment bankers are really nice people. I made an exception this morning, and I strongly urge you to navigate to the Journal’s “Business Technology” Web log here.
The post “Business Software Vendor Finds Business Software Impossible to Use” carries a date of May 21, 2008, but I just saw it. Nevertheless, this is an important story. The main point for me is this statement:
IFS, a business-software vendor, sent us [the Wall Street Journal] … the results of a survey earlier this year of more than 1,000 respondents. Its findings: “A full 60 percent of respondents said their enterprise software was somewhat difficult, very difficult or almost impossible to use. Only 9 percent characterized their applications as very easy to use.” The biggest time wasters, IFS found, were the need to search through complex navigation systems to find information and the need to learn how to use many programs that all worked differently.
Why is this important? First, the data substantiate my research, Sinequa’s data, and Jane McConnell’s information about enterprise search. High dissatisfaction rates and wasted time–these are the cripplers of some organizations’ efficiency and decision making.
I’m going to try to get my hands on the full study from IFS. The Web log post doesn’t tell me how to get a copy. The WSJ provides this link to a study summary here. I filled in the IFS form here, but as of 7 am Eastern on May 24, 2008, I don’t have a copy. I want one. If you come across the full report, let me know.
Usability, not technology, is the key to success it seems. Hasn’t this been Steve Jobs’s mantra for a long time? Good work, Vauhini Vara. A content quack to you from the Beyond Search goose.
Stephen Arnold, May 24, 2008
Enterprise Search Endnote
May 22, 2008
A surprised squawk from the Beyond Search goose. In the end note (the wrap up talk for the two day conference about enterprise search in New York City), the data bunny made a brief and long-awaited appearance. The introduction to the end note made a note that her name was henceforth ‘search bunny’. After the laughter subsided, I made these points to summarize the more than 36 presentations delivered over the 14 hours of the formal program.
The Data Bunny makes an appearance
First, this conference marked the first major meeting about enterprise search that discussed ways to improve usability of existing systems and move beyond key word retrieval. The point was that most enterprise search users don’t feel comfortable sticking words and phrases in a search box and then perusing lists of results to see if the magic answer has been delivered.
Second, research data from my work and other industry experts substantiate the need for alternative interfaces. Graphics, although tasty eye candy, are not the answer. Information must be provided in a manner that meets the needs of individual users and the work task at hand. Forcing a laundry list of results on every user is out of step with today’s information environment.
Third, interfaces can use mobile functionality such as that available from Coveo’s mobile mail search service. An interface can combine a search box with a list of topics and categories like the one on the Oracle Technology Network’s Web site which is based on Siderean Software’s technology. Google has disclosed in a patent application an interface that presents a dossier or report. Instead of a list of topics, the output includes facts about the topic. One feature is a hot link to a map showing the location of the subject. There is no one-size-fits-all solution.
To wrap up the conference, the audience was challenged to:
- Demand more from vendors. Passivity, which allows the vendor to lead the licensee, has to give way to the licensee getting the vendor to deliver the solution the organization needs to succeed.
- Recognize that Google’s more than 9,500 Google Search Appliance licensees are buying into the idea that complex search is expensive, expensive, and prone to problems. Simplicity, stability, and extensibility are more important than 1,001 meaningless features.
- Embrace the opportunity to take a clean sheet of paper and redefine search in terms of information access.
The novelty of typing in two words and getting some results has given way to a greater appetite for solutions that work in the context of a work task.
After the formal presentation, several of the more than 300 in the audience posed some questions. Here’s a summary of the questions that were asked more than one time:
Question: What’s the future of metadata and taxonomies?
Answer: Metadata is becoming more important. New types of metadata (who changed what in a document, and when? who were the recipients of the document?) and similar trans-meta types of tagging are the next wave in metadata. So, the work associated with metadata is in its infancy.
Question: Why do you say ‘Search is dead?’
Answer: That’s shorthand for saying, “Most users don’t want to be shackled to a search box.” Some users will want the search box as a primary means of access. A larger number of users want options for access. The search box, therefore, can no longer be the primary access vehicle in my opinion.
Question: Why did you mention only three companies?
Answer: I speak extemporaneously. I don’t read my talks. The examples–Coveo, Google, and Oracle–seemed relevant when I put together my examples 30 minutes before my remarks. I would like to have time to name many other vendors. I track the search and content processing technology of more than 50 organizations. In 20 minutes, there are hard limits on what I can do.
Question: Are you paid for these types of remarks?
Answer: I charge for everything. If organizations did not pay me, I would not be able to fund my research, pay for dog food, or buy airline tickets. As a consultant, my product is my time. Therefore, to use my time means that the conference organizer has to pay me.
Stephen Arnold, May 21, 2008
Payola Pony or Nerd Stallion: Who Will Win the Search Derby?
May 21, 2008
The software maker plans on Wednesday to launch a cash back program to those who buy things after using its search.
Silobreaker: When Intelligence Officers Solve Their Own Info Problems
May 20, 2008
“The Holy Grail”, one former intelligence officer told me, “is to walk in my office and have what I need on my desk, on the computer monitor, and on the screen of my secure telephone.” (You can recognize these whizzy mobile phones because some have an extra light and other features to make it hard for the bad guy to listen in on the call.)
I forget that most people in the online business don’t have experience working in intelligence, the military and law enforcement. When I see an allegedly “hot new semantic search system”, I often take a cursory look and then walk on by. The reason is that the idea of searching is not where the action is for serious intelligence.
If you do a search on Mother Google, you will find more than 300,000 references to the company. To give you a benchmark, if you search for this Web log, you get about 230,000 references with most of them to a search engine optimization company with the same name. The point is that certain services or resources, no matter how useful, are tough to find unless you know exactly what to enter in the search box.
Let me illustrate. Here’s a screen shot of a system that has been available for several years.
The query “semantic search” returned a main story, secondary items in smaller “newspaper” style boxes, an embedded live video from CeBIT, a bar chart about term frequency, and an “In Focus” section that provides the names of people and things the Silobreaker system identified as important. (If you look at the people in the “In Focus” box, you’ll see me (Stephen Arnold) identified despite my <230,000 Web log references in Google.)
Notice that Silobreaker’s default display is a report. The system delivers a synthesis of what’s important. There’s no result list. No single graphic gizmo floating in the browser without meaningful context. Silobreaker looks great but it contains a significant amount of go juice. Navigate here to explore the system yourself.
Silobreaker doesn’t do plain vanilla laundry lists. You can see a list of documents, but you see them in context; that is, a specific knowledge setting. You don’t have to ask, “What the heck does that mean?” Silobreaker presents the meaning of each item in a display.
Most of the search systems I see or get asked to review don’t do what I need done. I want to comment on a basic Silobreaker output and point out a few facts about the system. Once that housekeeping is done, I will make several observations in an effort to spark discussion about the sorry state of enterprise search and commercial business intelligence systems. For a reader who finds my criticism of the best that Silicon Valley has to offer offensive, stop reading now. If you want to see where the rubber meets the race track in the intelligence community, keep reading.
Microsoft to ‘Innovate and Disrupt in Search’–Again
May 19, 2008
My newsreader popped this info tart in front of me this morning: “Kevin Johnson’s Memo On Yahoo & Their Strategy”. The focus of Gigaom’s Web log post is a memo, allegedly by Kevin Johnson. By the time you read this, my pathetic posting will be very old news. You need to read the memo and determine for yourself if it’s the real deal.
I’m commenting because of a series of emails I exchanged this morning about Microsoft’s search strategy. Among the points I made to the eager journalist who was, as my mother used to say, an empty vessel:
- Microsoft is implementing reactions, not a strategy. The cause of these knee-jerk reactions: mostly the Google and a business model challenge. Cloud services are coming round the mountain, and Microsoft can hear the whistle blowing.
- Yahoo has some sharp people and a truck load of search systems–Inktomi, Stata Labs, AllTheWeb.com (provided by Fast Search & Transfer), Flickr’s system, Overture’s search, and more. I’ve been told the company is rushing to be more like Google, which is not perfect, obviously. But Yahoo is grossly heterogeneous, and Google is more homogeneous in architecture.
- Google keeps on grinding forward. In Israel a day ago, Mr. Brin referenced Google’s multi dimensional database progress. My sources tell me that it is not progress; it is a leap frog play.
So “innovate and disrupt in search” is going to boil down to tackling these problems, forcefully, squarely, and well.
First, how many search platforms will Microsoft support? SharePoint, whizzy technology from Microsoft Research, Fast Search & Transfer’s ESP, and the legacy systems that just won’t die. Each search platform is a money hog. Get too many of these critters chomping on the cash, and you will be one poor data farmer.
Second, if–and this is a big if–Microsoft cuts a deal with Yahoo, exactly how will two shot up World War I biplanes contend with Google’s F-35? Time is running out because the GOOG keeps gobbling market and mind share. It is the number one site on the Internet and the world’s top brand. Quite a one-two punch for piston powered aircraft to shoot down.
Third, Google’s business model is based on advertising. Google wants to diversify, and Mr. Brin’s comments in Israel a day ago suggest that he wants to put a rocket booster on Google Apps. Interest in cloud-based services continues to creep up, and Google is in a good position to innovate and disrupt in that sector. The company already is innovating and disrupting in search.
We’re watching a clash of cultures and business models. When Microsoft swizzled IBM in the 1980s, it was clever. Google’s not just clever; Google has the technical platform to redefine search and enterprise applications.
Mr. Johnson’s memo does little to convince me that Microsoft–with or without Yahoo–can do much to stop Googzilla from doing Googzilla-type things.
Stephen Arnold, May 19, 2008
Semantra and Conversational Analytics
May 15, 2008
Semantra asserts that it is a “pioneering developer of conversational analytics software”, or so it says in the news release a helpful person sent me.
The company’s “conversational analytics” application pushes “beyond key word search” because a user can use “common language commands to retrieve specific information from back end databases”. You can read the Semantra announcement here: www.semantra.com/library/Semantra%202.0%20GA%20FINAL.pdf
The lingo “common language commands” means natural language processing or NLP. A number of vendors have embraced this approach in order to [a] eliminate the need for a specialist to intermediate between an enduser with a question and the database with the answers and [b] allow faster interaction with a database. After all, in business intelligence, the idea is to get the information quickly. Calling up an SAS or SPSS analyst, having that person understand what’s needed, creating the queries, pulling down the data cube, and providing that chunk of info to a manager on a deadline is generally viewed as a problem.
What’s interesting about the Semantra approach is that its tool is designed for Microsoft Dynamics CRM. Microsoft’s push into CRM or customer relationship management has been erratic. To make the situation more interesting, Microsoft is working to move Dynamics (an unhappy amalgam of several products) into the Live.com or “cloud” environment. Semantra is hoping that Microsoft’s CRM offerings will generate even greater demand for third-party tools that tame the Dynamics beastie.
ArnoldIT.com analyzed the Dynamics product and technology late in 2007 and found that it was even more complex than Microsoft SharePoint Search. Given the multiple products that make up SharePoint Search, we were surprised to find that the Dynamics team had bested the SharePoint team on this important yardstick. The Dynamics product line up consists of Microsoft’s own technology, Axapta, Great Plains, Navision, and Solomon components. These are mixed-and-matched into a somewhat complex suite of products.
We wish Semantra great success with their system. There will be strong demand for a product that can simplify the Microsoft CRM system. You can get more information about Semantra at www.semantra.com. The splash page for Microsoft Dynamics is at www.microsoft.com/dynamics. If you are interested in the ArnoldIT.com analysis of the Dynamics suite, contact seaky2000 at Yahoo dot com. The report costs US$125 via online payment for a password protected PDF.
Stephen Arnold, May 15, 2008