Microsoft Hosted Exchange Security and Archiving

July 6, 2008

I lost track of Microsoft’s 2005 acquisition, FrontBridge. The company, as you may recall, was a provider of comprehensive secure messaging services. FrontBridge’s “Total Message Management” services were designed to ensure the security, compliance and continuity of electronic messages. The system provided managed services for email and instant message archiving, spam filtering, virus scanning, encrypted email, policy enforcement and disaster recovery.

I had in my files a schematic that shows the FrontBridge architecture, which remains largely intact within the hosted Exchange service.

frontbridge diagram

With search vendors morphing into eDiscovery, you may want to update your links to Microsoft’s Exchange Hosted Services, where FrontBridge plays a role. You can find the start page here. At the time of the acquisition, I understood that FrontBridge would be a Microsoft subsidiary. By 2006, FrontBridge became part of the Exchange product. The original 2006 pricing for email and message filtering was pegged at $1.75 per user, per month; Archiving at $17.25 per user, per month with an unlimited retention period and 3.6 gigabytes of storage; Continuity at $2.50 per user, per month; and Encryption at $1.90 per user, per month. I am not sure how the pricing operates as Microsoft evolves its hosted services.
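To put the 2006 list prices in perspective, here is a minimal sketch of the arithmetic. The prices are the ones quoted above; the 500-seat organization and the function name are my own assumptions for illustration, and current Microsoft pricing may differ.

```python
# 2006 list prices quoted in the post, in US dollars per user per month.
PRICES = {
    "filtering": 1.75,
    "archiving": 17.25,
    "continuity": 2.50,
    "encryption": 1.90,
}


def monthly_cost(users, services=PRICES):
    """Return the total monthly cost for `users` seats across the given services."""
    per_user = sum(services.values())
    return round(users * per_user, 2)


if __name__ == "__main__":
    # Hypothetical 500-seat organization taking all four services.
    print(monthly_cost(500))
```

Archiving dominates the bill, which is worth keeping in mind when comparing the hosted service with a third-party offering.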

As you try to determine the value of licensing a third-party secure messaging service or use the hosted Exchange solution, you may find the diagram useful in getting your bearings.

Stephen Arnold, July 6, 2008

SurfRay AB Update

July 6, 2008

In the first two editions of Enterprise Search Report, I profiled Mondosoft, a Microsoft-centric search system. I gave it a favorable review. As with most Microsoft-centric products, performance can become an issue unless the system is properly resourced. By the time I started work on the third edition of ESR in 2006, I had heard rumors of some changes underway at the firm. By late 2007, Mondosoft became part of SurfRay, a Danish search conglomerate. I found the search system implemented for the Vatican quite interesting. Hit boosting and multi-lingual support added zest to what could have been a sinfully bad (no pun intended) search experience. You can try it here.

In 2004, Mondosoft caught my attention because it was one of the first search vendors to offer analytics for licensees. Mondosoft, when deployed in a SharePoint environment, brought much needed usage data into the SharePoint picture. Instead of flying blind, Mondosoft gave the system administrator useful information about user actions. With Mondosoft’s analytics, SharePoint sites could be tuned to improve the user’s experience. Microsoft talked about SharePoint user experience; Mondosoft delivered technology that addressed user experience.

Mondosoft then acquired Ontolica, a company that made better use of SharePoint metadata and generated other useful tags. With Ontolica 3.2 installed and properly resourced, a SharePoint administrator could provide a useful set of hot links related to the user’s query. Microsoft delivered a blunt instrument; Ontolica provided a precision tool.

SurfRay’s product line includes an advanced, multi-lingual search engine suite with three components: [a] MondoSearch, [b] BehaviorTracking, and [c] InformationManager; SurfRay’s Speed Index search and retrieval system; and Ontolica Search for SharePoint, providing business intelligence on information creation, search, retrieval and use. SurfRay also owns technology that can speed up searches of traditional relational database tables. In addition, SurfRay provides consulting services to its licensees. Plus, the company offers SurfRay XP search for Xerox’s multifunction document systems.

SurfRay/Mondosoft customers include Bosch, Burger King Corporation, Coleman, Hilton Hotels, Honeywell Process Solutions, Microsoft, Overnight Transportation, People’s Bank, Shell Oil, Siemens, SimCorp, The Swiss Army, TDC, The Vatican Holy See and United Technologies. SurfRay’s CEO and founder is Martin Veise. The president of the company is Steffen Saxil.

SurfRay has offices in New York, Stockholm, Bangkok and Copenhagen. You can learn more about the company here.

Stephen Arnold, July 6, 2008

SeeWhy: Real Time Business Intelligence without Search

July 6, 2008

SeeWhy came on my radar with its “no search” marketing angle. I poked around and was, at first, confused. The company appeared to occupy a no-man’s-land between search engine optimization and business intelligence that I avoid. A quick look revealed that the company has a business event system with some interesting twists.

Real Time and My Concern with the Phrase

“Real time” has been promoted from technical impossibility to buzz word. The general notion of “real time” among computer scientists is that simultaneity across linked systems is impossible outside of the bizarre world of high-energy physics. No matter how minute, latencies exist even if measured in picoseconds. But to a marketer, “real time” connotes a softer, gentler world far from the “batch oriented” or human-intermediated world familiar to most professionals.

Now, real time is coming to the enterprise. Exegy, based in St. Louis, Missouri, offers an appliance that can ingest content by the megabyte per second and spit out processed content without much latency. To achieve this, Exegy has done some hardware engineering, but the gizmo works. When you shift to “real time” in the types of server environments found in a trucking company or a consulting company where capital investment is mostly out of the question, “real time” is not in Exegy’s league.

Let me be clear: to deliver near real time content processing Exegy style, you need specialized infrastructure. The average Dell server is not able to deliver no matter how insistent Bill Trucking Company’s information technology consultant becomes.

A number of text and content processing companies are asserting that their systems operate in “real time”. They don’t. Against this background, let’s look at one interesting company. I will not comment on this firm’s emphasis on real time processing, preferring to provide some basic information about this single firm and then offering, as a wrap up, a handful of generalized observations.

SeeWhy Software: Operational Business Intelligence

SeeWhy is one of the first “open source” real time Business Intelligence platforms for the event driven enterprise. SeeWhy continuously analyzes and interprets streams of individual business events to alert you immediately to opportunities and risks and to enable everyday decisions to be automated.
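To make “continuously analyzes streams of events” concrete, here is a minimal sketch of one common pattern: a sliding time window that raises an alert when too many events arrive too quickly. This is not SeeWhy’s code or API. The class name, the threshold and the window length are assumptions for illustration only.

```python
from collections import deque


class WindowAlert:
    """Signal an alert when more than `threshold` events arrive within `window` seconds."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.times = deque()  # timestamps of events still inside the window

    def observe(self, timestamp):
        """Record one event; return True if the window now exceeds the threshold."""
        self.times.append(timestamp)
        # Drop events that have aged out of the window.
        while self.times and timestamp - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.threshold


if __name__ == "__main__":
    # Hypothetical stream: three events in 20 seconds, then a quiet period.
    alert = WindowAlert(threshold=2, window=60)
    for t in (0, 10, 20, 200):
        print(t, alert.observe(t))
```

Note that even this toy processes events one at a time as they arrive. Whether that counts as “real time” depends on how much latency the hardware and the plumbing add.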

basic idea

The marketing angle that snared my attention.

Incorporated in 2003 by BI industry veteran Charles Nicholls, SeeWhy is backed by several venture capital investors, including LogiSpring, Pentech Ventures, Delta Partners and a handful of private folks. SeeWhy is headquartered in Windsor, United Kingdom.

Charles Nicholls, founder and CEO, said here:

I began to ponder on the Business Intelligence industry with all its unfulfilled promise, often long on vision and short on delivery. The more that you challenge the status quo, the faster that you can see the opportunities to make the world a better place. It was this process that started me on a journey that led inevitably to create SeeWhy.

The basic premise of the company is summarized in this diagram from “In Search of Insight,” a 43 page document from Mr. Nicholls:

bi 2

The Web 2.0 Angle

You can download a monograph “In Search of Insight” about the company’s approach to business intelligence here, no annoying registration, thank you, SeeWhy.

Read more

Email Analysis

July 5, 2008

This summer I have been asked about email analysis on two different occasions. In order to respond to these requests, I had to grind through my archive of email-related information. I wrote about Clearwell Systems and its approach earlier this year. You can read this essay here.

I cannot reproduce the information my paying customers received. I can take a representative company–in this case, Stratify, a unit of Iron Mountain–and show you two different screen shots. These layouts and representations are the property of Stratify, and I am including them in this essay for two reasons:

  1. Stratify has been one of the early players in text analytics. First as Purple Yogi and then as Stratify, the company was engaged in the difficult missionary marketing needed to make non believers into believers.
  2. The company has gained some traction in the legal market, which, in the US, is a booming sector. The problems of the economy translate into a harvest of riches for some legal firms. Email is a big deal in discovery, and few have the resources to get a human to read all the baloney that zooms around an organization involved in a legal matter.

The Problem

You know the problem. Email was once ASCII shot between two people on Arpanet. Today email is the bane of the knowledge worker. The volume is high. The storage systems antiquated. The attachments madden the sane. The people using email forget that the messages live on different servers and can, in the process of discovery, be copied to a storage device and delivered to the attorney or attorneys who have to find something germane to the legal matter in the terabytes of digital data.

To summarize the challenges:

  • Email volume (lots of it, maybe a billion messages in a mid-sized organization every year)
  • Email attachments (tough to find the “right” one)
  • Email crashes (restores don’t always work, which you probably know first hand)
  • Email sent as if it were a one-time, secret communication
  • Email with recipients who, by definition, have some relationship.

For a lawyer, email is good and bad. It’s good if one finds a smoking gun or better yet a gun in the act of shooting. It’s bad if the bullets are coming at the opposing side’s legal eagles, worse if the bullet shoots a legal eagle out of the sky with a slug through the brain.

Ergo: email is a big, big deal in the information world of litigation.

The Solution

The fix is obvious–search. Actually to be precise, the conundrums of email invite text processing, text analytics, link analysis, relationship extraction, entity extraction, and other nifty methods.

The basics of email analysis are actually simple on the surface, more complicated under the hood and out of sight of non-technical types like lawyers: [a] copy email to a storage device that is fast, [b] tell email analysis program to index the email, [c] key word search or browse outputs, [d] make notes, print out email, and read individual documents of interest, [e] repeat taking care to bill for the time. (That’s the best part of email analysis. It’s quicker than manual methods, but the systems have to have a baby sitter. Those operating these systems can bill without working up too much of a mental headache. Automated processes do make some legal thinking less painful. The best part is billing for this less stressful time.)
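Steps [b] and [c] can be sketched in a few lines. What follows is a toy inverted index, assuming each message is a dictionary with a subject and a body. The sample messages and function names are hypothetical, and a commercial system such as Stratify’s does far more (attachments, de-duplication, entity extraction) than this sketch suggests.

```python
from collections import defaultdict
import re

# Hypothetical sample collection keyed by message id.
SAMPLE_MESSAGES = {
    1: {"subject": "Merger memo", "body": "Draft of the merger memo attached."},
    2: {"subject": "Lunch", "body": "Pizza on Friday?"},
    3: {"subject": "Re: merger", "body": "Please do not forward this."},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Map each term to the set of message ids whose subject or body contains it."""
    index = defaultdict(set)
    for msg_id, msg in messages.items():
        for term in tokenize(msg["subject"] + " " + msg["body"]):
            index[term].add(msg_id)
    return index


def search(index, query):
    """Return ids of messages containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_index(SAMPLE_MESSAGES)
    print(sorted(search(idx, "merger")))
    print(sorted(search(idx, "merger memo")))
```

The index is built once and queried many times, which is why the copy-then-index step comes before any searching.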

What do these systems show the user? The illustration below shows a Stratify search screen. Since I obtained this screen shot, Stratify has probably updated the interface. The main features are of interest. Take a look at what the Stratify system user sees when analyzing processed email:

stratify email analytics

Stratify’s email visualization

The principal features of this display are:

  1. Simplicity. You don’t want to confuse attorneys
  2. A picture showing people and their relationships as discerned by the system. Remember, an email can be sent to a person unrelated to a subject either by accident or for some other reason such as a “this is what I am doing” courtesy
  3. Links on the right hand panel to make it easy for the user to poke around by sender, topic, etc.
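The relationship picture in the second item boils down to counting who writes to whom. Here is a minimal sketch, assuming each message carries a sender and a list of recipients. The field names, the sample data and the two-message threshold are my assumptions, not Stratify’s method.

```python
from collections import Counter

# Hypothetical message headers.
SAMPLE_HEADERS = [
    {"from": "alice", "to": ["bob", "carol"]},
    {"from": "alice", "to": ["bob"]},
    {"from": "bob", "to": ["alice"]},
    {"from": "alice", "to": ["dave"]},  # one-off courtesy copy
]


def relationship_counts(messages):
    """Count how often each (sender, recipient) pair appears."""
    pairs = Counter()
    for msg in messages:
        for rcpt in msg["to"]:
            pairs[(msg["from"], rcpt)] += 1
    return pairs


def strong_links(pairs, min_messages=2):
    """Keep only pairs that exchanged at least `min_messages` messages."""
    return {pair: n for pair, n in pairs.items() if n >= min_messages}


if __name__ == "__main__":
    counts = relationship_counts(SAMPLE_HEADERS)
    print(strong_links(counts))
```

The threshold is a crude way to discount the accidental or courtesy recipient. A real system would weigh topic and timing as well.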

Let’s assume that the email is one part of a discovered collection of information. Stratify provides a richer interface. This one includes the bells and whistles that warrant the Stratify system price tag, which is in six figures in case you want to license the system.

Read more

Fast Cash, Faster Crash

July 4, 2008

On July 3, 2008, Erick Schonfeld summarized the continuing saga of Fast Search & Transfer’s fastest move ever. The story “Did the Enron of Norway Pull a Fast One on Microsoft? More Details about the Mess at Fast Search & Transfer” is here.

The story is quite thorough, according to my sources in Norway, and there is little I can add to the TechCrunch write up.

I would like to highlight one point, provide the links to my analysis of the Fast Search saga, and offer several observations about the nature of enterprise search. Before I start, take a look at this graphic because this is the wild bobsled ride that many vendors are queued to take:

bobsled fixed 01

Once a vendor starts down the sales bobsled run, it is tough to stop. The vendor has to ride to the bottom of the hill, hoping that he will not crash, risking serious injury and maybe death.

The Key Point for Me

After reading the TechCrunch essay, one segment gnawed at me; specifically:

…It [Microsoft’s paying $1.2 billion for Fast Search & Transfer] does point to a certain blindness on the part of Microsoft, or at least a willingness to look the other way, in its obsessive quest to become a player in search (see Yahoo and Powerset). It also raises questions about Fast’s underlying search technology. If Fast was having trouble closing deals for its products, how good can its technology really be?

Yes, this is the key question. The Fast Search & Transfer core technology was purpose built to index static Web sites. At the time Google started operations, AltaVista.com was an orphan, quickly losing its leadership position due to the voracious demand for resources that public Web search engines make. The mantra is “Feed me computing resources or I die”.

Fast Search offered a Web site called AllTheWeb.com, and it was pretty good. At the time of 9/11, the AllTheWeb.com news indexing system was among the first to have reasonably timely information. Fast Search made a fateful decision in 2002 which led to Fast Search & Transfer’s exiting the Web indexing business. Fast Search sold its Web indexing business to Overture for $70 million with more money promised if certain goals were achieved. Fast Search took the money and focused on enterprise search.

The decision, as I recall from my conversations with Fast Search & Transfer executives when I was involved in the Fast Search deployment for a government project, was that enterprise search was a great opportunity. Fast Search’s executives suggested to me that the company could move quickly to dominate the search market. At the time, there was little reason to doubt the confidence of the Fast Search team. A Fortune 50 was backing the Fast Search system in the government-wide indexing program. In the 2002-2003 time period, there were not too many systems that could demonstrate an index of 40 million documents. Even today, licensees of search systems do not grasp the hurdles that indexing large amounts of text puts in front of an organization. I have written extensively about this elsewhere, and I have little to add about the ignorance of search scaling that continues to plague organizations.

Read more

Google Working on Dynamic Runtime

July 3, 2008

A colleague called to my attention Microsoft wizard James Hamilton’s post about a possible Google initiative. You can read the full note here. For me the most interesting point in the note was:

…The popular speculation is that Google will be announcing a dynamic language runtime with support for Python, JavaScript, and Java. A language runtime running on both server-side and client-side with support for a broad range of client devices including mobile phones would be pretty interesting.

Why is this important? More flexibility for developers. Google’s programming innovations continue to percolate.

Stephen Arnold, July 3, 2008

Yahoo’s Semantic Search Still Available

July 3, 2008

In the firestorm of publicity burning through blogland, Yahoo’s semantic search system has been marginalized. I admit, the URL is not the easiest to remember: http://www.yr-bcn.es/demos/microsearch/. The moniker Microsearch seems to be intended to tell the astute user that Yahoo processes microformat information. A microformat is a Web-based data formatting approach that seeks to re-use existing content as metadata.

The site is labeled a demonstration, and the Yahoo logo is visible in a funereal black, which I quite like. The service is called Microsearch. The system supports RDFa marked-up pages plus some other semantic formats. Yahoo says:

Microsearch is a richer search experience combining traditional search results with metadata extracted from web [sic] pages. At the moment your Yahoo! Search is enriched in three ways: [a] by showing ‘smart’ snippets that summarize the metadata inside the page and allow to take action without actually visiting the page; [b] by showing map and timeline views that aggregate metadata from various pages, [c] by showing pages related to the current result.

I had to dig a bit to find the explicit connection with the Semantic Web, but the site offers a version of semantic search. Yahoo includes a link to the Semantic Web page at the World Wide Web consortium.
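For readers who have not looked at RDFa, the metadata lives in ordinary HTML attributes such as `property` and `content`. Here is a minimal sketch of pulling those pairs out of a page with Python’s standard `html.parser`. The sample page and class name are hypothetical, and this is a simplification of RDFa, not a description of how Microsearch extracts metadata.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with Dublin Core properties.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
  <h1 property="dc:title">Enterprise Search Notes</h1>
  <span property="dc:creator">A. Writer</span>
  <meta property="dc:date" content="2008-07-03" />
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from RDFa-style `property` attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property whose text value we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value is given explicitly in the `content` attribute.
            self.pairs.append((prop, attrs["content"]))
        else:
            # Value is the element's text, picked up in handle_data.
            self._current = prop

    def handle_data(self, data):
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None


def extract_properties(html):
    """Return the list of (property, value) pairs found in `html`."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    for prop, value in extract_properties(PAGE):
        print(prop, "=", value)
```

Once a search system has pairs like these, a “smart” snippet or a timeline view is a matter of presentation.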

Let’s look at the system. Yahoo provides some suggested queries, but I prefer my own.

My first query was “enterprise search”. The system returned the following result page:

ymicro ent search 01

The map was visually arresting, but it was irrelevant to the query and the result set. I looked at the results and was surprised to find Microsoft was the number two result. The other results were okay. The same query on Google returned more Microsoft links. My conclusion was that the “semantic” feature on Yahoo worked about as well as regular Google. The other conclusion I drew was that Microsoft is working hard to come up at the top of the results list for the word pair “enterprise search”. Too bad I don’t think of Microsoft and enterprise search as sector leaders.

My second query was for the phrase “Michael Lynch Autonomy”. Here’s what Microsearch displayed:

ymicro lynch

For this query, the map did not render. I assumed that the system would show me the location of Autonomy’s headquarters in the United Kingdom. Sigh. Microsearch is at version 1.4 on July 3, 2008, and whizzy features should be working. The results were stale. The top ranked hit was a 2006 interview. My recollection is that the Financial Times ran an essay by Mr. Lynch a few days ago. Alas, the system seems unable to factor time into its results ranking. News stories often carry time and date data, and News XML includes explicit tags for these data. I ran the same query on standard Google. Google returned the results set more quickly than Yahoo. Google’s results were poor. The first hit was to someone other than Autonomy’s Mike Lynch. The other hits were more stale than Yahoo’s. Autonomy may want to emulate Microsoft’s search engine optimization push.
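Factoring time into ranking need not be exotic. Here is a minimal sketch that multiplies a relevance score by an exponential decay based on a document’s age. The 30-day half-life, the scores and the sample results are assumptions for illustration. I have no knowledge of how Yahoo or Google weight dates.

```python
from datetime import date


def recency_weight(published, today, half_life_days=30):
    """Exponential decay: a document `half_life_days` old gets weight 0.5."""
    age = max((today - published).days, 0)
    return 0.5 ** (age / half_life_days)


def rerank(results, today, half_life_days=30):
    """Sort (title, relevance, published) tuples by relevance times recency weight."""
    return sorted(
        results,
        key=lambda r: r[1] * recency_weight(r[2], today, half_life_days),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical result set: an old but highly relevant hit and a fresh one.
    results = [
        ("2006 interview", 0.9, date(2006, 5, 1)),
        ("Recent essay", 0.6, date(2008, 6, 28)),
    ]
    for title, _, _ in rerank(results, today=date(2008, 7, 3)):
        print(title)
```

With these sample numbers the fresh essay moves ahead of the two-year-old interview, which is the behavior I expected from a system that has date metadata available.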

Observations

The semantic features of Microsearch did not appear front and center. The mapping function did not work. Yahoo performed as well as market leader Google. To be fair, Google’s results were not too good and Yahoo hit that benchmark.

Agree? Disagree? Let me know.

Stephen Arnold, July 3, 2008

Texas: A Clever Twist on Computer Consulting

July 2, 2008

Working as an expert witness, I was in a big shot Houston, Texas, law firm. One of the legal eagles had screwed up his laptop. He asked me if I could resolve the problem. I looked at the machine, checked the size of his Outlook PST file (the cause of the problem), did a little nerd magic, and pronounced the machine battle ready.

According to an essay posted at Institute for Justice: Litigating for Liberty, “Magnum PC? New Texas Law Limits Computer Repair to Licensed Private Investigators”, I would have been guilty of a crime. You can read the story here.

The most interesting point in the write up for me is:

The law also criminalizes consumers who knowingly use an unlicensed company to perform any repair that constitutes an investigation in the eyes of the government.  Consumers are subject to the same harsh penalties as the repair shops they use: criminal penalties of up to one year in jail and a $4,000 fine, and civil penalties of up to $10,000—just for having their computer repaired by an unlicensed technician.

So, not only was I a bad guy, the lawyer was a bad guy too. I am not sure if this is a hoax or if it is one more example of how interesting the legal system is. A number of scenarios are buzzing through my little mind now. I wonder if consultants working for Booz, Allen & Hamilton involved in systems work will have to be licensed. Somehow a consultant licensed as a private investigator and being paid to root through a client’s computer tickles my funny bone. Texas will need to clarify its consultant monitoring policies, I suppose. The State can’t allow an unlicensed technical SWAT team to fix a computer without the right paperwork.

Next time I am in Texas, I won’t fix your Macbook, Windows notebook, your AS/400–not even your mobile phone with email access. I wonder how much a private investigator’s license is in Texas? Will I have to pass a physical?

Stephen Arnold, July 2, 2008

Google and Capillary Action

July 2, 2008

I think it was Dr. Snow’s Biology 101 class in 1962 when I had to perform an experiment related to capillary action. Capillary action, as I recall, is the ability of a substance to draw another substance into it. My experiment involved a beaker of some foul smelling substance, a chunk of a mop, and a scale. I had to calculate how quickly the stinky stuff moved from the beaker into the mop. I did the experiment, got an A, and continued through life indifferent to this fundamental physical principle so essential to life.

InfoWorld, a great online publication compared to its last days as a failing print publication, has an important essay “Can Google Apps Move Up Market?” The author is Tom Kaneshige, and he does a good job of explaining that Google Apps, while not quite toy applications, are likely to face some resistance in organizations. The most important observation in his write up for me was:

Although Google Apps may carve out niches, it’s unlikely that basic applications in the cloud will play a major role in the way giants of industry conduct business. Imagine sensitive business documents being shared in the cloud without comprehensive enterprise controls. Not only is Google Apps not ready … companies aren’t either.

I don’t want to dispute the InfoWorld essay. I agree with most of its points.

However, I think one important observation may be germane. Google is working like a little beaver to get developers to create software for Google. Google is dating Salesforce.com. There’s the Android initiative. There’s the Google partner ecosystem cranking out scripts via the OneBox API. There’s the mapping crowd extending Google’s ubiquitous geospatial footprint. Developers are a longer term investment, but over a two or three year span, Google’s jejune developer program will have an impact.

Also, Google, as you probably are aware, is chomping on the wooden doors at colleges and universities. I was surprised when I met a person from Arizona State University who said to me in April 2008, “Google is all over the campus. It’s Gmail. It’s Google Calendar. It’s all Google all the time.” ASU is not alone. The GOOG has its snout into more than 300 major academic institutions. One deal is for 1.5 million students someplace in Australia that I wrote about here.

Google’s approach to the enterprise is a variant of capillary action. As these seemingly uncoordinated activities take place, time–not technology or aggressive salesmanship–will deliver for Google. Google is betting that as its most avid developers mature and its college users enter the work force, these folks will pull Google along. Why beat your head against a concrete wall as Mr. Ballmer did in one of his famous motivational presentations? Why not let capillary action pull Google Apps, the Google Search Appliance, and Google data management services into organizations? It’s easier and doesn’t create YouTube.com video moments.

Stephen Arnold, July 2, 2008

Answering Questions: Holy Grail or Wholly Frustrating

July 2, 2008

The cat is out of the bag. Microsoft has acquired Powerset for $100 million. You can read the official announcement here. The most important part of the announcement to me was:

We know today that roughly a third of searches don’t get answered on the first search and first click…These problems exist because search engines today primarily match words in a search to words on a webpage [sic]. We can solve these problems by working to understand the intent behind each search and the concepts and meaning embedded in a webpage [sic]. Doing so, we can innovate in the quality of the search results, in the flexibility with which searchers can phrase their queries, and in the search user experience. We will use knowledge extracted from webpages [sic] to improve the result descriptions and provide new tools to help customers search better.

I agree. The problem is that delivering on these results is akin to an archaeologist finding the Holy Grail. In my experience, delivering “answers” and “better results” can be wholly frustrating. Don’t believe me? Just take a look at what happened to AskJeeves.com or any of the other semantic / natural language search systems. In fact, doubt is not evident in the dozens of posts about this topic on Techmeme.com this morning.

So, I’m going to offer a different view. I think the same problems will haunt Microsoft as it works to integrate Powerset technology into its various Live.com offerings.

Answering Questions: Circa 1996

In the mid 1990s, Ask Jeeves differentiated itself from the search leaders with its ability to answer questions. Well, some questions. The system worked for this query which I dredged from my files:

What’s the weather in Chicago, Illinois?

At the time, the approach was billed as natural language processing. Google does not maintain comprehensive historical records in its public-facing index. But you can find some information about the original system here or in the Wikipedia entry here.

How did a start up in the mid-1990s answer a user’s questions online? Computers were slow by today’s standards and expensive. Programming was time consuming. There were no tools comparable to Python or Web services. Bandwidth was expensive, and modems chugged along south of 56 kilobits per second, eagerly slowing down in the course of a dial up session.

jeeves 1997

I have no inside knowledge about AskJeeves.com’s technology, but over the years, I have pieced together some information that allows me to characterize how AskJeeves.com delivered NLP (natural language processing) magic.

Humans.

AskJeeves.com compiled a list of frequently asked questions. Humans wrote answers. Programmers put data into database tables. Scripts parsed the user’s query and matched it to the answers in the tables. The real magic, from my point of view, was that AskJeeves.com updated the weather table, so when the system received my query “What is the weather in Chicago, Illinois?”, the system would pull the data from the weather table and display an answer. The system also showed links to weather sites in case the answer part was incorrect or not what the user wanted.

Over time, AskJeeves.com monitored what questions users asked and added these to the system.

What happened when the system received a query that could not be matched to a canned answer in a data table? The system picked the closest question to what the user asked and displayed that answer. So a question such as “What is the square of aleph zero plus N?” generated an answer along the lines of “The Cubs won the pennant in 1918” or some equally crazy answer.
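The canned-answer approach described above can be sketched in a few lines. Bear in mind that I have no inside knowledge of AskJeeves.com’s code. The answer table, the word-overlap scoring and the function names below are my assumptions, chosen only to show why an unmatched question still gets a confident, wrong answer.

```python
import re

# Hypothetical answer table. In the scheme described above, humans wrote the
# entries and a separate feed would refresh the weather row.
ANSWERS = {
    "what is the weather in chicago illinois": "Chicago: 72F, partly cloudy (sample data)",
    "who won the pennant in 1918": "The Cubs won the pennant in 1918.",
}


def normalize(text):
    """Lowercase, expand "what's", and split into alphanumeric words."""
    text = text.lower().replace("\u2019", "'").replace("what's", "what is")
    return re.findall(r"[a-z0-9]+", text)


def closest_question(query):
    """Return the stored question sharing the most words with the query (Jaccard overlap)."""
    q = set(normalize(query))

    def score(question):
        words = set(question.split())
        union = q | words
        return len(q & words) / len(union) if union else 0.0

    return max(ANSWERS, key=score)


def answer(query):
    """Always return some canned answer, however poor the match."""
    return ANSWERS[closest_question(query)]


if __name__ == "__main__":
    print(answer("What's the weather in Chicago, Illinois?"))
    # No good match exists, yet the system still answers.
    print(answer("What is the square of aleph zero plus N?"))
```

Because `answer` has no “I don’t know” branch, the second query gets whichever canned answer happens to share the most words with it. A minimum-score cutoff would fix that, at the price of admitting the system cannot answer.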

AskJeeves.com discovered several facts about its approach to natural language processing:

  1. Humans were expensive. AskJeeves.com burned cash. The company tried to apply its canned question answering system to customer support and ended up part of the Barry Diller empire. Humans can answer questions, but the expense of paying humans to craft templates, create answer tables, and code the system was too high then and remains cash hungry today.
  2. Humans asked questions but did not really mean what they asked. Humans are perverse. A question like “What’s a good bar in San Francisco?” can go off the rails in many ways. For example, what type of bar does the user require? Biker, rock, blue collar? What’s San Francisco? Mission, Sunset, or Powell Street? The problem with answering questions, then, is that humans often have a tough time formulating the right question.
  3. Information changes. The answer today may not be the answer tomorrow. A system, therefore, has to have some way of knowing what the “right” answer is in the moment. As it turns out, the notion of “real time”–that is, accurate information at this moment–is an interesting challenge. In terms of stock prices, the “now quote” costs money. The quote from yesterday’s closing bell is free. Not only is it tricky to keep the index fresh, to have current information may impose additional costs.

This mini-case sheds light on two challenges in natural language processing.

Read more
