Wikia Search: Social Search Is Blooming
June 4, 2008
I haven’t done much thinking about social search. Years ago, I saw a demonstration of Eurekster, now Eurekster Swicki, and I thought that having users suggest sites was interesting. As the Internet expanded, a small collection of recommended sites seemed likely to be useful. We built Point (Top 5% of the Internet) in 1993, eventually selling the property to CMGI’s Lycos unit. Social search was a variation on Point without the human editorial staff we relied upon 15 years ago.
Wikia: User-Modifiable Results
The big news in the last 24 hours is the sprucing up of the Wikia Search system. The venture is a result of Jimmy Wales’ creative nature. If you have not tried the system, navigate here and fire several queries at it. It’s much more comprehensive than the version I tested several months ago. I still like the happy cloud logo.
I ran the query “enterprise search” on the system. The first result was a pointer to Northern Light. The second result was a pointer to the enterprise search entry in Wikipedia. So far so good. What sets Wikia apart is that I can use an in-browser editing function to change a hit’s title. I can also move results up and down the page. I can see how that would be useful, but I save interesting hits to a folder. I then return to these saved files and conduct more in-depth investigations. So, the system generates results that are useful to me, contains a dollop of community functionality, and sports a larger index. Webware.com has a useful description of the service here.
Vivisimo’s Social Search
In New York at the Enterprise Search Summit, someone asked me, “Have you seen Vivisimo’s new social search system?” My answer was, “No, I don’t know much about it.” When I returned to my office, I found a link to Vivisimo’s explanation of social search. Vivisimo announced this function in October 2007, and I think that the catchphrase hooked some people at the New York show. You can read the announcement here.
The point that resonated with me is:
Enabling users to vote on, rate, tag, save and share content within the search interface is just the first step in creating a collaborative information-enriching environment. Velocity 6.0 allows users to add their own knowledge about information found via search directly into the search result itself in the form of free-text annotation.
In this context, social search means that I can add keywords or tags to an item processed by Vivisimo. The term is added to the index. If I provide that term to a colleague, the colleague can use it to retrieve the document. An interactive tagging feature is useful, but it is not the type of functionality that I use. Others may find the feature exactly what is needed to make behind-the-firewall search less frustrating.
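The tagging idea above is easy to picture with a toy inverted index. This is a minimal Python sketch, not Vivisimo’s Velocity implementation; the class name `TaggableIndex`, the document id, and the tag are hypothetical. It simply assumes that a user-supplied tag is stored exactly like a word from the document text, so the tag retrieves the document as soon as it is added.

```python
from collections import defaultdict


class TaggableIndex:
    """Toy inverted index mapping each term to the ids of documents containing it.

    Hypothetical sketch: user-supplied tags are stored exactly like words
    from the document text, so a tag is retrievable the moment it is added.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def add_tag(self, doc_id, tag):
        # The tag becomes one more index term pointing at the document.
        self.postings[tag.lower()].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty postings for unknown terms.
        return sorted(self.postings.get(term.lower(), set()))


index = TaggableIndex()
index.add_document("memo-17", "Quarterly budget review")
index.add_tag("memo-17", "proj-falcon")   # a colleague can now search for this tag
print(index.search("proj-falcon"))        # ['memo-17']
```

A production system would also record who added each tag, which matters for the spoofing concerns raised later in this post.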
Social search taps into the wisdom of crowds. Some crowds are calm, even thoughtful. Others can be a management opportunity.
Baynote
Today I received an email from a colleague asking, “Did you see the social search study published by Baynote, Inc.?” Once again, the answer was, “No, I don’t think so.” I clicked on a link and went through a registration process (easily spoofed) to download a PDF of the six-page report.
Baynote is a company specializing in “on demand recommendations and social search for Web sites.” You can explore the company’s Web site here. I didn’t read the verbiage on the Web page. I clicked in the search box and entered my favorite test query, beyond search. No joy. The three hits were to information about Baynote. (The phrase beyond search sent to Clusty.com delivers a nice link to this Web log, however.)
I clicked back to the PDF report and scanned it. The main idea I garnered from the white paper is:
Baynote combines a site’s existing search engine results with community wisdom to produce a set of optimized results that is proven to yield greater conversions, longer engagement, and improved satisfaction. Thus, Social Search can be thought of as a community layer on top of the site’s existing search engine. The original search results may be re-ordered in the process, and the augmented results may include additional results that weren’t originally produced by the search engine, but proven to be valuable to your Web site visitors. Because Baynote is delivered as SAAS (software as a service), it can be live on a Web site in as little as 30 days with little or no development, installation or configuration.
If you have an existing search system, you can use Baynote as an add-on. With minimal hassle, you can rank results using the Baynote algorithms, monitor user behavior to shape search results, generate See Also references, and merge results from different collections.
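To make the “community layer on top of the site’s existing search engine” idea concrete, here is a minimal Python sketch of one way such blending could work. It is not Baynote’s algorithm, which the white paper does not disclose. The linear engine score, the `weight` parameter, and the `community_scores` input (for example, normalized clicks) are all assumptions for illustration.

```python
def rerank(engine_results, community_scores, weight=0.5, extra=()):
    """Blend an engine's ranked list with a community signal.

    engine_results:   doc ids from the existing search engine, best first.
    community_scores: hypothetical dict of doc id -> score in [0, 1],
                      e.g. normalized clicks or votes.
    weight:           share of the final score given to the community signal.
    extra:            doc ids the community found useful that the engine missed.
    """
    n = len(engine_results)
    position = {doc: i for i, doc in enumerate(engine_results)}
    candidates = list(engine_results) + [d for d in extra if d not in position]

    def blended(doc):
        # Engine score falls linearly from 1.0 for the top hit; 0 if not returned.
        engine = 1.0 - position[doc] / n if doc in position else 0.0
        return (1 - weight) * engine + weight * community_scores.get(doc, 0.0)

    return sorted(candidates, key=blended, reverse=True)


engine = ["a", "b", "c"]
community = {"c": 1.0, "d": 0.9}
print(rerank(engine, community, weight=0.5, extra=["d"]))  # ['c', 'a', 'd', 'b']
```

With `weight=0` the engine’s original order comes back unchanged, with community-only additions appended at the end; raising the weight lets popular items climb and lets items the engine never returned enter the list.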
I’m going to update my mental inventory of search, adding social search to the list of search types that I lug around in my head.
Observations
I do have reservations about social anything. I’m 85 percent convinced that the Vivisimo and Baynote approaches have merit. But I want to end this short item with these observations:
- Social anything can be spoofed. When I visited Los Alamos National Labs, people with access to the facility fiddled with hard drives and other digital assets. If this stuff can happen at a security-conscious facility, imagine what a summer intern can do with social search in your organization.
- Users often have very good ideas about content. Other users have very bad ideas about content. When there are lots of clicks, then the likelihood of finding something useful edges up. The usefulness of Delicious and StumbleUpon is evidence of this. However, when there are comparatively few clicks, I’m inclined to exercise some extra caution. Tina in the mail room is a great person, but I’m not sure I trust her judgment on the emergency core cooling system schematics.
- The lightweight approach to tagging is not going to yield the type of information that a system like Tacit Software’s provides. If you want social, then take a look at Tacit’s Active Net system here.
- My hunch is that nearly invisible monitoring systems will yield more and higher-quality insights about information. In some of my work, I’ve had access to outputs of surveillance systems. The data are often quite useful and generally bias-free. Human systems have humanity’s fingerprints on the data, which can obscure some important items.
Social search can be quite useful. Its precepts work quite well in high-traffic environments. In more click-sparse environments, a different type of tool is required to ferret out the important people and information.
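One common way to build the “extra caution” for sparse clicks into a ranking is a Bayesian average, which pulls items with few votes toward a neutral prior. This Python sketch is an illustration of that general technique, not something Vivisimo or Baynote is known to use; the prior mean of 0.5 and prior weight of 20 are arbitrary assumptions.

```python
def smoothed_score(positive, total, prior_mean=0.5, prior_weight=20):
    """Bayesian-average estimate of how useful an item is.

    positive / total: e.g. helpful votes out of all votes, or clicks that
    led to a save out of all clicks (hypothetical signals).
    prior_mean / prior_weight: assumed defaults; the prior behaves like
    prior_weight imaginary votes averaging prior_mean.
    """
    return (positive + prior_mean * prior_weight) / (total + prior_weight)


# Three people out of three liked the schematic: raw ratio 1.0, smoothed ~0.57.
print(round(smoothed_score(3, 3), 2))
# 900 of 1,000 users liked a page: raw ratio 0.90, smoothed ~0.89.
print(round(smoothed_score(900, 1000), 2))
```

In a high-traffic setting the observed ratio dominates; in a click-sparse setting the prior dominates, so a handful of enthusiastic votes cannot push an item to the top on their own.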
Stephen Arnold, June 4, 2008
Changes in Store for Microsoft Live.com Search
June 4, 2008
Jessica Mintz filed an information-charged story on June 3, 2008, titled “Microsoft Exec Says Live Search Needs Image Fix.” Please read her article here. These AP pieces have wacky URLs and can be tough to find a day or two after the stories appear.
Her write-up of Kevin Johnson’s talk at a conference operated by Third Door Media contained several noteworthy points. Mr. Johnson is the president of Microsoft’s platforms and services division, and he is one of Microsoft’s top dogs in the search-and-retrieval sector.
The points that struck me as particularly important were:
- There is brand confusion. A fix or a change may be in the cards.
- Microsoft is working to convince stakeholders that it has a plan in the aftermath of the bolloxed Yahoo deal.
- Microsoft is focusing on “commercial intent queries”, which I think means queries from people looking to buy something.
What’s tough for me to convey in this short commentary is the tone of Mr. Johnson’s remarks. For some reason, I heard this highly paid wizard expressing himself with a tinge of frustration.
Google’s been chugging along for a decade, and the company shows few signs of losing steam. When old Google wizards become Xooglers, young wizards take a close look at Google as the equivalent of a cow being stamped “Grade A Prime”. Legal documents hurled at Google have done little to slow the GOOG.
With Microsoft’s Web search market share sliding, maybe I am reading emotion into Ms. Mintz’s summary of Mr. Johnson’s remarks. Check out the story and let me know if you agree or disagree. Try to locate this story using http://search.live.com. When I checked, the story wasn’t in the Live.com index. The GOOG had indexed it. I think Google makes an effort to index Microsoft-related stories. What do you think?
Stephen Arnold, June 4, 2008
How Much Info Is There? The Answer Is Coming
June 3, 2008
A happy quack to the colleague who sent me the link to this story: “Groundbreaking UC San Diego Research Study to Measure ‘How Much Information?’ Is in the World”. You can read this story here.
What hopped off the screen was this statement:
“We have designed this research as a partnership between industry and academics to take the next steps in understanding how to think about, measure, and understand the implications of dramatic growth in digital information,” said Professor Roger Bohn of UC San Diego, co-leader of the new program. “As the costs per byte of creating, storing, and moving data fall, the amounts rise exponentially. We know that overall information technology increases productivity and human welfare, but not all information is equally valuable.”
Wizards from many high-profile organizations will work to answer this question. In the meantime, I’ll keep upgrading my storage devices and parking data on cloud storage services. My data grows 2X each year. I wonder how much data my neighbor’s 14-year-old, a music video collector, stores. I’m certain he’ll provide hard data. Maybe it will be easier to ask his parents. Neither uses a computer. Also, I bet the folks in Brazil, China, India, and Thailand, among other data-centric countries, will be particularly forthcoming.
I’m looking forward to the results of this study.
Stephen Arnold, June 4, 2008
Coveo: Beyond a Billion Documents
June 3, 2008
Most licensees of enterprise search systems don’t know how many documents the system must index. Coveo can handle more than 1,000,000,000 documents.
Even fewer search system licensees know that many enterprise search systems have hard limits on how many documents they can index before choking, sometimes expiring without warning. For example, Microsoft SharePoint has a hard limit significantly below the Coveo billion-document target. Microsoft acquired Fast Search & Transfer, in part, to have a workaround for this scaling problem.
Coveo’s G2B Information Access solutions deliver security, relevant results, and very strong ease of use. You can “snap in” Coveo to SharePoint, Documentum, and IBM FileNet environments without custom coding. For more information, navigate to the Coveo Web site. A free trial is available.
Stephen Arnold, June 3, 2008
Search: Habits vs Environments
June 2, 2008
In 1980, when you launched the Dialog Information Service search function, the system dumped you into a database about education. From that starting point, you entered a file number. Experienced searchers memorized file numbers; type b 15 and you would be “in” the ABI / INFORM business information file. Type b 16 and you would be able to search PROMT, a collection of business data. Dialog never saw bulletin board systems or the Internet coming.
People fortunate enough to have the money and technical savvy could become online searchers. The technology was sufficiently clumsy and the entire process so unfamiliar to most people as to make online searching an arcane art. Searching in those early days was handled by an intermediary. When I first learned about online databases at Booz, Allen & Hamilton in 1976, the intermediary was the New York office’s boss. I would call the intermediary, explain what I needed, provide a project number, and pick up the outputs on weird thermal paper later that day. As clumsy and expensive as the process was, it was more efficient than doing research with paper journals, printed books, and the horrific microfilm.
By 1983, Dialog had found a market for its mainframe-based search system–librarians. Librarians had two characteristics that MBAs, lawyers, and folks trained in brochure making lacked. First, librarians chose a discipline that required an ability to think about categories. Librarians also understood the importance of having a standard way to identify authors, titles, and subjects.
Second, librarians had a budget to meet the needs of people described as “end users”. Some of my Booz, Allen colleagues would rush into our corporate library and demand, “Give me everything on ECCS!”
The approach taken by System Development Corporation (SDC Orbit), BRS (Bibliographic Retrieval Service), DataStar, and the handful of other online vendors was monetized in clever ways. First, a company would pay money to sign up to get a password. Second, the company would send the librarian to training programs. Most programs were free and taught tips and tricks to tame the naked command line. No graphical user interface.
You had to memorize command strings like this one: SS UD=9999 and CC=76? The system then spit out the most recent records about marketing. The key point is not the complexity. The point is that you had to form specific habits to make the system work. Make an error and the system would deliver nothing useful. Search and retrieval was part puzzle, part programming, and part memorization. At the time, I believed that these habits would be difficult to break. I think the vendors saw their users as hooked on online in the way a lifelong smoker is hooked on nicotine.
The vendors were wrong. The “habit” was not a habit. The systems were confining, hellishly expensive, and so complicated that change was hard for the vendors. Change for the people who knew how to search was easy. The automatic behavior that worked so well in 1980 began to erode when PCs became available. When the first browser became available, the old solid gold revenue streams started to slip. The intermediaries who controlled online were disintermediated. The stage was set for the Internet, lowest-common-denominator searching, and graphical interfaces. The Internet offered useful information for free. I have dozens of examples of online budgets slashed or eliminated because neither the vendor nor the information professional could explain the value of online information. A visible, direct cost with no proof of payback crippled the original online industry. Many of the companies continue to hang on today, but these firms are in a debilitating race. Weaker companies in the commercial database business will find survival more and more difficult.
The notion of online habits persists. There’s a view that once a user has learned one way to perform an online or digital task, it’s game over for competitors. That’s not true. New customer constituencies come into being, and the people skilled in complex, specialized systems can carve out a niche. But hockey stick growth and fat margins are increasingly unlikely for traditional information companies.
Google Tells Everyone: We Are Human
June 2, 2008
Techmeme has a link to the New York Times’ story “The Human Hands behind the Google Money Machine”. There’s also a link to the useful commentary by Henry Blodget, Silicon Valley Insider. By the time you read this, the comments and analyses of Google’s summer openness will be one of the day’s key stories.
Last week brought interviews and postings about the Google I/O conference for developers. The best summaries I’ve seen are by CNet’s Stephen Shankland: “We’re All Guinea Pigs in Google’s Search Experiment” and “Google Spotlights Data Center Inner Workings.” Anand Rajaraman provided a technically significant scoop about Google’s reluctance to rely exclusively on autonomous software. My post is here. The Datawocky piece is here. (I’ve heard that some Googlers call the Google infrastructure “the borg”.)
The flow of information is useful. As I thought about the stream of information, I forced myself to step back and ask, “Why now?” Google has never been particularly forthcoming, and its public-facing representatives “run the game plan”. If you haven’t heard that phrase, it means, “Stick to the script.” At conferences, I’ve watched Googlers who were thrust into a presentation at the last minute struggle through the script.
Here are my thoughts about this new direction:
- The Google sees an opportunity to position itself as a thoughtful leader. The emphasis on people shifts the discussion from monitoring clicks and algorithms to people who think about the implications of technology and market needs.
- The messages focus on what Google is doing. The examples say to me, “Hey, guys, we’re doing these things now.” For a competitor, presenting these activities as actions based on what’s already in place may be chilling. It raises the question, “What’s next?”
- Google is maturing, and its management is confident that messages for users, developers, advertisers, and competitors will increase Google’s presence in the market.
What do you think is behind this new transparency? It’s visible in Eric Schmidt’s remarks about mobile advertising, reported by Seeking Alpha, and in his earlier comment in the U.K. Telegraph that Google’s founders have grown up. You can read this story here and enjoy its now-obligatory picture of Messrs. Brin and Page lounging on some of Google’s signature fluffy furniture.
My take is that Google’s management is not behaving in a spontaneous manner. Just as a series of steps makes an algorithm work, this flood of information has my radar oscilloscope flickering. I think the mathematical logic so prized at Google is at work. I’m watching for signs of a big event in the Googlesphere. Semantic Web? Data management? Major buyout close to completion? Maybe.
Controlled transparency is a signal, not an end in itself.
Stephen Arnold, June 2, 2008
IBM: Watching Cloud Patterns
June 1, 2008
Last week, IBM announced a cloud-based, software as a service initiative. IBM has partners in this venture, which appears to focus on the insurance niche. The announcement appeared in a news release, and you can read it here.
IBM has teamed with Millbrook, Inc., whose core business is software integration for the insurance industry. Another party to the deal is Sapiens America Corp. Sapiens (whose corporate family tree is pretty complicated) is another specialist with a core competency in property and casualty.
IBM will use its Cognos 8 Business Intelligence system and the Sapiens Insight software. Both systems will make use of the Millbrook property and casualty model.
The idea is that small- and mid-sized insurance agencies will have access to industrial-strength business intelligence systems without any on-premises software. The three companies said in their release:
Business intelligence and predictive analytics tools are becoming the strategic mainstay of how service enterprises in general, and insurance carriers in particular, conduct their daily business. Companies that have near real-time ability to analyze the entirety of their captured business data and extract key performance indicators and accurate answers to “what-if” scenarios can be more responsive to a rapidly evolving business environment and can competitively maximize profitable operations while moving away from risky propositions.
The announcement struck me as a significant step for IBM. IBM has been a player in online and cloud-based services for quite a while. In the late 1990s, IBM Global Network ramped up as an Internet service provider, and IBM eventually sold that business to AT&T. IBM made some noise several years ago about its grid computing capability. Its AlphaWorks initiative has pushed cloud computing as well. Now IBM is testing the water for niche-focused SaaS (software as a service). IBM and its new pal Google are working cooperatively on an educational project to stimulate the flow of programmers with expertise in writing programs for distributed systems.
My thought is that this SaaS initiative warrants observation. On paper and in white board “what if” sessions, IBM could deploy a number of its software systems as cloud-based services. The question is, “What’s next in online services for IBM?” Will IBM, like Google, sit on the sidelines and watch Amazon.com, Salesforce.com, and other companies push this market forward?
Stephen Arnold, June 1, 2008
Related story from InfoWorld here.
IN-Q-TEL Investments: 2006 to April 2008
June 1, 2008
This table brings the summary of IN-Q-TEL investments up to date through April 2008. You can access the investments from 2000 to 2003 here. The investments from 2004 and 2005 are here.
IN-Q-TEL Investments: 2004-2005
May 31, 2008
I’m delighted with the response to my table and links of IN-Q-TEL’s investments up to 2002. If you want to review this information, click here. In this essay, I want to provide the list of companies receiving funding in the two-year period from 2004 to 2005. As one of the people reviewing my list pointed out, there are some companies associated with IN-Q-TEL that do not appear in my table. My source is the publicly accessible information on the IN-Q-TEL Web site. If you know of an investment that I have omitted, please use the comments section of this Web log to share your information. I appreciate the numerous suggestions to make the list more useful. There is a limit to what we have time to assemble for a no-cost information resource. Please tell me what you think would improve the utility of the list. If it’s lightweight, then I will consider altering the basic information in the table. The table appears after the jump.
Fast Financials: Three-Day-Old Fish Should Be Discounted
May 30, 2008
You may want to download the revised financials that are available today–May 30, 2008–on the Fast Search & Transfer Web site here. Information that I recall seeing on various Web sites is either no longer available or I lack the skills to locate the data. Mary-Jo Foley in her All about Microsoft Web log wrote a useful description of the implications of the deal when it was first announced. You can read this story here.
Some Fast Search corporate and general business information has been deleted because it was old or because it was deemed no longer of interest. Fortunately, I have a habit of downloading interesting documents when I first see them. Fast Search information is tough to locate using public Web sites for some reason. You can get these PDF documents directly from http://www.newsweb.no/index.jsp?messageId=209172. The explicit link from the Fast Search Web site with the pointer is here: http://www.fastsearch.com/news.aspx?m=329. Note: I am reluctant to post these documents because I am not certain of the Norwegian guidelines for this type of information.
A screen shot of the restated FY2007 data. I used this information plus the data in the FY2006 restated financials to make the table of numbers below.
A Walk Through
Fast Search’s top-line revenues for the period from 2004 to 2007 are now reported as increasing from $66.4 million in 2004 to $143.0 million in 2007. That’s a jump of 115.4 percent. In the search engine game, the increase is good, but it does not match Google’s performance with its Google Search Appliance in the same period. Google went from zero revenue in 2004 to an estimated $400 million in the same period. (Note: Google reported $188 million for its enterprise unit, but my estimate adds monies from its educational initiative, maps, and partner contributions in the form of sign-up fees, among other enterprise revenue flows.)
| Year | Revenue (Restated) | Original Revenue Statement |
|---|---|---|
| 2004 | $66,374,000 | $66,300,000 |
| 2005 | $98,069,000 | $100,300,000 |
| 2006 | $133,741,000 | $162,200,000 |
| 2007 | $142,979,000 | n.a. |
Nothing too dramatic in this rundown except the sharp decrease in the FY2006 numbers. But what’s $30 million in today’s loosey-goosey financial world? However, when I set the Fast Search restatements against revenue, I found the losses interesting.
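For readers who want to check the arithmetic, this short Python sketch recomputes the growth figure and the gaps between the original and restated revenue. It assumes the 2004 original figure, printed in the table as “$66.300,000”, is a typo for $66,300,000. The FY2006 gap works out to about $28.5 million, which the post rounds to $30 million.

```python
# Figures from the table above, in US dollars. The 2004 "original" figure is
# read as $66,300,000; the source prints it as "$66.300,000" (assumed typo).
restated = {2004: 66_374_000, 2005: 98_069_000, 2006: 133_741_000, 2007: 142_979_000}
original = {2004: 66_300_000, 2005: 100_300_000, 2006: 162_200_000}

# Growth in restated revenue from 2004 to 2007, as a percentage.
growth_pct = (restated[2007] - restated[2004]) / restated[2004] * 100
print(f"Restated growth 2004-2007: {growth_pct:.1f}%")  # 115.4%

# How much each originally reported figure exceeded the restated one.
overstatement = {year: original[year] - restated[year] for year in original}
for year, gap in overstatement.items():
    print(year, f"${gap:,}")
```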
A Warning Signal from Fast Search
I have a copy of the Fast Search & Transfer Mid Quarter Presentation by Joseph J. Lacson, dated December 2006. That document has some optimistic comments about Fast Search’s opportunities. The presentation is no longer available on the Fast Search Web site, but I have made a couple of screen shots from the presentation to give you a sense of what caught my attention. (Since the document is no longer available on the Web, you may want to skip my discussion of this information. I wish I could provide a link to the full document, but I don’t have permission to do that. I wrote to Fast Search’s PR department, but I haven’t heard anything from them.)