Microsoft Cloud Economics
August 17, 2008
Richi Jennings is an independent consultant and writer specializing in email, spam, blogging, and Linux. His article “On Microsoft Online Services” is worth reading. You can find it here. His assertion is that Microsoft’s pricing for its online services will weaken the service. Mr. Jennings identifies a key problem: information technology managers’ lack of knowledge about what it costs to run machines and software on premises. He notes:
vendors would tell potential purchasers that they [the vendors] could provide the service for less money than it was currently costing to run it in-house, but when it came time to actually quote for the service, most IT managers simply didn’t believe it cost them that much.
The point is that basic knowledge of what enterprise software costs may be a factor in the success or failure of cloud services. He contrasts Microsoft’s online service pricing with Google’s. Google is less expensive. A happy quack to Mr. Jennings for this analysis.
Stephen Arnold, August 17, 2008
Wired Weighs in about Google and Privacy
August 16, 2008
Much of the information in Ryan Singel’s Wired article here has been floating around at conferences and in lunch conversations for almost a month. In “Google Privacy Practices Worse than ISP Snooping, AT&T Charges,” Mr. Singel pulls together threads about AT&T’s view of Google here. You will want to read the article. For me, the most interesting point was this quote from the reassembling Ma Bell:
AT&T does not at this time engage in practices that allow it to track a consumer’s search and browsing activities across multiple unrelated websites for the purpose [of] developing a profile of a particular consumer’s online behavior.
Permit me to offer several personal observations about the notion of monitoring by companies that intermediate digital flows:
- Monitoring can be defined narrowly or broadly. The fact is that monitoring is performed at multiple points by multiple parties. Without precise definitions, assertions about what an intermediary does or does not do are open to interpretation.
- Intermediaries want to know about users for the purpose of “owning” the customer. In the present environment, security and ad monitoring are “in addition to,” not “instead of,” a long-standing practice among intermediaries of obtaining information in order to “serve” customers better.
- Today any intermediary can use a variety of mechanisms to monitor, track, and use tracking data. These data can be fine-grained; that is, tied to a specific user with a stateful session. Alternatively, an anonymous user can be placed in one or more clusters and then be “refined” as more data arrive, as the sketch after this list suggests.
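To make that last point less abstract, here is a minimal Python sketch of coarse cluster assignment refined by incoming events. The cluster names, keywords, and data structures are my invention for illustration; they do not describe any particular intermediary’s data model.

```python
from collections import Counter, defaultdict

# Hypothetical interest clusters keyed by URL keywords. This illustrates only
# the idea that an anonymous visitor can be bucketed coarsely and then refined
# as more events arrive; it reflects no vendor's actual method.
CLUSTER_KEYWORDS = {
    "sports": ("espn", "score", "league"),
    "finance": ("stock", "quote", "bank"),
    "travel": ("flight", "hotel", "fare"),
}

profiles = defaultdict(Counter)  # anonymous cookie id -> cluster tally

def record_page_view(anon_id: str, url: str) -> None:
    """Tally cluster membership for an anonymous visitor from one page view."""
    for cluster, keywords in CLUSTER_KEYWORDS.items():
        if any(word in url.lower() for word in keywords):
            profiles[anon_id][cluster] += 1

def best_guess(anon_id: str) -> str:
    """Return the most likely cluster so far, or 'unknown' if no signal yet."""
    tally = profiles[anon_id]
    return tally.most_common(1)[0][0] if tally else "unknown"

# As more events arrive, the guess is "refined."
for url in ("http://example.com/stock-quote", "http://example.com/bank-rates"):
    record_page_view("cookie-123", url)
print(best_guess("cookie-123"))  # finance
```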
Wired has taken an important step. More information about the data models in use for usage data is needed. More information about the tracking and usage methods available to large intermediaries is also needed. Finally, with the emergence of “janitor” technology that can automatically clean up ambiguities, more information about this suggestive innovation is needed as well. I want more information, not just assertions.
Stephen Arnold, August 16, 2008
The Future of Search Layer Cake
August 14, 2008
Yesterday I contributed a short essay about the future of search. I thought I was being realistic for the readers of AltSearchEngines.com, a darn good Web log in my opinion. I also wanted to be friskier than the contributions from SearchEngineLand.com and Hakia.com. I’m not an academic, and I’m not in the search engine business. I do competitive technical analysis for a living. Search is a side interest, and prior to my writing the Enterprise Search Report, no one had taken a comprehensive look at a couple dozen of the major vendors. I now have profiles on 52 companies, and I’m adding a new one in the next few days. I don’t pay much attention to the university information retrieval community because I’m not smart enough to figure out the equations any more.
From the number of positive and negative responses that have flowed to me, I know I wasn’t clear about my focus on behind-the-firewall search and Google’s enterprise activities. This short post is designed to put my “layer cake” image into context. If you want to read the original essay on AltSearchEngines.com, click here. To refresh your memory, here’s the diagram, which in one form or another I have been using in my lectures for more than a decade. I’m a lousy teacher, and I make mistakes. But I have a wealth of hands-on experience, and I have the research under my belt from creating and maintaining the 52 profiles of companies that are engaged in commercial search, content processing, and text analytics.
I’ve been through many search revolutions, and this diagram explains how I perceive those innovations. Furthermore, the diagram makes clear a point that many people do not fully understand until the bills come in the mail. Over time search gets more expensive. A lot more expensive. The reason is that each “layer” is not necessarily a system from a single vendor. The layers show that an organization rarely rips and replaces existing search technology. So, no matter how lousy a system, there will be two or three or maybe a thousand people who love the old system. But there may be one person or 10,000 who want different functionality. The easy path for most organizations is to buy another search solution or buy an “add in” or “add on” that in theory brings the old system closer to the needs of new users or different business needs.
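To make the compounding-cost point concrete, here is a toy calculation with invented figures. The numbers are placeholders; the point is that each new layer adds license and maintenance costs on top of the layers already in place.

```python
# Toy illustration of the "layer cake" cost point with invented figures.
# Each new layer adds its own license and annual maintenance while the
# earlier layers keep running, so total spend compounds over time.
layers = [
    {"name": "original search engine", "license": 250_000, "annual_maintenance": 45_000},
    {"name": "taxonomy add-on",        "license": 120_000, "annual_maintenance": 20_000},
    {"name": "text analytics add-in",  "license": 180_000, "annual_maintenance": 30_000},
]

years_in_service = 5
total = sum(l["license"] + l["annual_maintenance"] * years_in_service for l in layers)
print(f"Cumulative spend over {years_in_service} years: ${total:,}")
# Cumulative spend over 5 years: $1,025,000
```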
Daticon’s Invenio
August 14, 2008
eDiscovery continues to bubble despite the lousy economy in North America. Several weeks ago we started the update procedure for our eDiscovery vendors. I made a mental note to post a short item about Daticon, a company supporting organizations engaged in electronic discovery. You can learn more about this company here. What interests me is the firm’s search technology, called Invenio. The technology is based on a neural network, and when I reviewed the system, some of its features reminded me of an outfit called Dolphin Search, but I may be wrong on this point. If Invenio is Dolphin Search, let me know.
Invenio is integrated with Daticon’s management tools. These tools are among the most fine grained I have seen. Once deployed, a manager can track most of the metrics associated with processing, reviewing, and screening email, documents, and other content associated with eDiscovery processes.
Here’s a representative display of system metrics.
There are similarities between Daticon’s approach and that of other eDiscovery specialists such as Stratify and Zantaz. Daticon bundles eDiscovery with a work flow, data capture, metrics, and a range of content processing functions.
The search and content processing system supports concept searching, duplicate detection and removal, email threading, non-text objects, and case management tools. Essentially this is a case management function that allows analysis of activities associated with a matter.
The company makes an interesting series of demonstrations available. I did not have to register to get walk throughs of the Invenio system. Try them yourself by clicking here.
Stephen Arnold, August 14, 2008
Autonomy Lands Fresh Content Tuna
August 13, 2008
Northcliffe Media, a unit of the Daily Mail and General Trust, publishes a score of newspapers, mostly in the UK. Circulation is nosing towards a million. Northcliffe also operates a little more than two dozen subscription and ad-supported weekly regional tabloids and produces 60 ad-supported free weekly shopper-type publications. The company also cranks out directories, runs a printing business, and is in the newsstand business. The company, whose tag line is “At the heart of all things local,” has international interests as well. Despite the categorical affirmative “all,” I don’t think Northcliffe operates in Harrod’s Creek, Kentucky. Perhaps it should?
Autonomy, the Cambridge-based search vendor, and Okana (an Autonomy partner) have landed the Northcliffe Media search business; thus, a big content tuna. Okana describes big clients like Northcliffe as “platinum,” not “tuna,” but I wanted to keep my metaphor consistent, and Okana rhymes with tuna.
Okana was Autonomy’s “Innovative Partner of the Year” in 2007. Okana says, “Based around proven architectures, Okana’s range of Autonomy appliance products provide instantly deployable and scaleable [sic] solutions for Information Discovery, Monitoring, Investigation, and Compliance.”
This description of Okana’s offering as an “appliance” was new information for me. Also, my research suggests that many Autonomy IDOL installations benefit from training the IDOL system. If training is required, then I ask, “What, if any, trade-off is required for an instant Autonomy IDOL deployment?” If anyone has details for me, please use the comments section of this Web log.
Autonomy’s IDOL (Intelligent Data Operating Layer) will index an initial 40 million documents and then process new content. The new system will use Autonomy’s “meaning-based computing” for research, information retrieval, and trend spotting.
You can read the complete Autonomy news release here. Once again Autonomy is making sales as word reaches me of competitors running into revenue problems. Autonomy’s a tuna catcher while others seem fated to return to port with empty holds.
Stephen Arnold, August 13, 2008
The Future of Search? It’s Here and Disappointing
August 13, 2008
AltSearchEngines.com–an excellent Web log and information retrieval news source–tapped the addled goose (me, Stephen E. Arnold) for some thoughts about the future of search. I’m no wizard, being a befuddled fowl, but I did offer several hundred words on the subject. I even contributed one of my semi-famous “layers” diagrams. These are important because each layer represents a slathering of computational crunching. The result is an incremental boost to the underlying search system’s precision, recall, interface outputs, and overall utility. The downside is that as the layers pile up, so do complexity and its girlfriend, costs. You can read the full essay and look at the diagram here. A happy quack to the AltSearchEngines.com team for [a] asking me to contribute an essay and [b] having the moxie to publish the goose feathers I generate. The message in my essay is no laughing matter. The future of search is here and in many ways, it is deeply disappointing and increasingly troubling to me. For an example, click here.
Stephen Arnold, August 13, 2008
MarkLogic: The Army’s New Information Access Platform
August 13, 2008
You probably know that the US Army has nicknames for its elite units. Screaming Eagle, Big Red One, and my favorite “Hell on Wheels.” Now some HUMINT, COMINT, and SIGINT brass may create a MarkLogic unit with its own flash. Based on the early reports I have, the MarkLogic system works.
Based in San Carlos (next to Google’s Postini unit, by the way), MarkLogic announced that the US Army Combined Arms Center (CAC) at Ft. Leavenworth, Kansas, has embraced MarkLogic Server. BCKS, shorthand for the Army’s Battle Command Knowledge System, will use this next-generation content processing and intelligence system for the Warrior Knowledge Base. Believe me, when someone wants to do you and your team harm, access to the most timely, on-point information is important. If Napoleon were based at Ft. Leavenworth today, he would have this unit report directly to him. Information, the famous general is reported to have said, is nine-tenths of any battle.
Ft. Leavenworth plays a pivotal role in the US Army’s commitment to capture, analyze, share, and make available information from a range of sources. MarkLogic’s technology, which has the Department of Defense Good Housekeeping Seal of Approval, delivers search, content management, and collaborative functions.
An unclassified sample display from the US Army’s BCKS system. Thanks to MarkLogic and the US Army for permission to use this image.
The system applies metadata based on the DOD Metadata Specification (DDMS). The content is managed automatically by applying metadata properties such as the ‘Valid Until’ date. The system follows the schema standard used by the DOD community. The MarkLogic Server manages the work flow until the file is transferred to archives or deleted by the content manager. MarkLogic points to savings in time and money. My sources tell me that the system can reduce the risk to service personnel. So, I’m going to editorialize and say, “The system saves lives.” More details about the BCKS are available here. Dot Mil content does move, so click today. I verified this link at 0719, August 13, 2008.
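For readers who want a picture of the ‘Valid Until’ housekeeping idea, here is a minimal sketch in Python. It is not MarkLogic Server code, and the field names and dispositions are hypothetical; it only illustrates routing expired content to archive or deletion.

```python
from datetime import date

# Hypothetical records with a DDMS-style "valid until" property. This is an
# illustration of the housekeeping idea only; it is not MarkLogic Server code
# and the field names are invented.
documents = [
    {"id": "doc-001", "valid_until": date(2008, 6, 30), "disposition": "archive"},
    {"id": "doc-002", "valid_until": date(2009, 1, 15), "disposition": "archive"},
    {"id": "doc-003", "valid_until": date(2008, 8, 1),  "disposition": "delete"},
]

def expired(doc, today=date(2008, 8, 13)):
    """True when the document's 'Valid Until' date has passed."""
    return doc["valid_until"] < today

for doc in documents:
    if expired(doc):
        action = "send to archive" if doc["disposition"] == "archive" else "delete"
        print(f"{doc['id']}: {action}")
    else:
        print(f"{doc['id']}: keep in active store")
```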
Is There a Mainframe in Your Future?
August 13, 2008
Brian Womack’s article “Big Iron Anything But Rusty For Mainframe Pioneer IBM” brought a tear to my eye. Writing in Investor’s Business Daily here, Mr. Womack says:
IBM says revenue for its mainframe business rose 32% in the second quarter compared with a year earlier, easily outpacing overall sales growth of 13%. A big driver was February’s launch of IBM’s next-generation mainframe line, the z10, its first big upgrade since 2004. IBM spent about $1.5 billion on the new line.
The core of the article is an interview with David Gelardi, a 52-year-old mainframer. I don’t want to spoil your fun. I love mainframers who explain why big iron is as trendy as Heidi Klum’s catchphrase, “One day you’re in. And one day you’re out.” For example, consider this comment by Mr. Gelardi:
If I take (1,500 Intel) servers . . . and put them on a single mainframe, I’ll have no performance problems whatsoever. But I’m taking all of that workload that was on 1,500 separate servers and consolidating them on one mainframe. While it may be a million-dollar machine and up, it’s actually cheaper than those 1,500 servers.
These are pretty compelling data. I wonder if Google is aware of what it might gain if it were to abandon its decade of effort with commodity servers. Google and IBM are best buddies now. Maybe IBM will convince the GOOG to change its ways? Is there a mainframe in your future?
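For the record, here is a back-of-the-envelope version of the consolidation arithmetic Mr. Gelardi implies. Every price in it is invented; real totals hinge on workload, software licensing, power, and staffing.

```python
# Back-of-the-envelope consolidation arithmetic with invented prices.
# The point is the shape of the comparison, not the actual figures.
servers = 1_500
cost_per_server = 2_500           # hardware only, hypothetical
power_and_space_per_server = 600  # per year, hypothetical
years = 3

commodity_total = servers * (cost_per_server + power_and_space_per_server * years)

mainframe_price = 1_500_000       # "a million-dollar machine and up"
mainframe_running_cost = 200_000  # per year, hypothetical
mainframe_total = mainframe_price + mainframe_running_cost * years

print(f"1,500 servers over {years} years: ${commodity_total:,}")
print(f"One mainframe over {years} years: ${mainframe_total:,}")
```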
Stephen Arnold, August 13, 2008
More Search without Search
August 13, 2008
Google wizard Stephen R. Lawrence and sub-wizard Omar Khan invented what I probably too simplistically characterize as a metadata vacuum cleaner. Useful for mobile devices, this addition to Google’s “search without search” arsenal is quite interesting to me. The invention is disclosed in US7,412,708, granted on August 12, 2008, with the title “Methods and Systems for Capturing Information.” If you are interested in how Google can deliver information before a user types a query or what type of data Google captures, you will want to read this 14-page document. Think email addresses and more.
The invention is not new, which is important. The GOOG is slow to integrate whizzy new monitoring technology into its public-facing systems. This invention was filed on March 31, 2004; figure nine to 12 months of work before that. I think this is an important chunk of Google’s metadata vacuum cleaner. I cover a number of these inventions in Google Version 2.0. I discussed one exemplary data model for usage tracking data in my for-money July-August column for KMWorld. I won’t rehash those documents in this Web log article. You can download a copy of the document from the good, old USPTO here. Study those syntax examples. That wonderful USPTO search engine is a treat to use.
What’s this invention do? Here’s the official legal eagle and engineer description:
Systems and methods for capturing information are described. In one embodiment, an event having an associated article is identified, article data associated with the article is identified, and a capture score for the event is determined based at least in part on article data. Article data can comprise, for example, one or a combination of a location of the article, a file-type of the article, and access data for the article. Event data associated with the event is compiled responsive at least in part to a comparison of the capture score and a threshold value.
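Translated into a toy Python sketch, the logic reads something like the following. The weights, field names, and threshold are my inventions for illustration, not anything disclosed by Google.

```python
# Toy reading of the US7,412,708 abstract: score an event from its article
# data, compare the score to a threshold, and compile event data only when
# the threshold is met. Weights and fields are invented for illustration.
CAPTURE_THRESHOLD = 0.5

def capture_score(article):
    """Combine location, file type, and access data into a single score."""
    score = 0.0
    if article.get("location") == "local_drive":
        score += 0.2
    if article.get("file_type") in ("email", "document"):
        score += 0.3
    score += min(article.get("access_count", 0), 10) * 0.05
    return score

def handle_event(event):
    """Compile event data only if the capture score clears the threshold."""
    article = event["article"]
    if capture_score(article) >= CAPTURE_THRESHOLD:
        return {"event_id": event["id"], "captured": True, "article": article}
    return {"event_id": event["id"], "captured": False}

event = {"id": 1, "article": {"location": "local_drive", "file_type": "email", "access_count": 4}}
print(handle_event(event))  # captured: True (score roughly 0.2 + 0.3 + 0.2 = 0.7)
```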
The GOOG’s Gmail plumbing may need some patch-ups, but once those pinhole leaks are soldered, US7,412,708 portends some remarkable predictive services. I can’t type on my mobile phone’s keyboard now. Google knows that I will be one of the people eager to let Google anticipate my needs. I wonder if there’s a link analysis routine running across those extracted metadata. I think I need to reread this patent document one more time. Join me?
Stephen Arnold, August 13, 2008
Data Centers: Part of the Cost Puzzle
August 11, 2008
The “commentary” is “Servers: Why Thrifty Isn’t Nifty,” which appears here. The “commentary” is by a wizard, Kenneth G. Brill, and he takes a strong stand on the topic of data center costs. The “commentary” is sponsored by SAP, an outfit that exercises servers to the max. Mr. Brill is the executive director of the highly regarded Uptime Institute in Santa Fe, New Mexico. Santa Fe is a high-tech haven. The Santa Fe Institute and numerous think tanks populate this city, a reasonable drive from LANL (Los Alamos National Laboratory). LANL is world famous for its security, as you may know. With chaos theory and technical Jedis in every nook and cranny of the city except the art galleries, I am most respectful of ideas from that fair city’s intelligentsia.
The hook for the “commentary” is a report called Revolutionizing Data Center Efficiency. The guts of the report are recommendations to chief information officers about data centers. With the shift to cloud computing, data centers are hotter than a Project Runway winner’s little black dress. For me the most interesting part of this “commentary” was this statement:
One of these recommendations is to dramatically improve cost knowledge within IT…The facility investment required to merely plug-in the blades was an unplanned $54 million. An additional unplanned $30 million was required to run the blades over three years. So what appeared to be a $22 million decision was really an enterprise decision of over $106 million.
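Those figures are easy to skim past, so here is the sum laid out as a snippet of Python. The numbers come straight from Mr. Brill’s example.

```python
# The figures from Mr. Brill's example, laid out as a simple sum.
server_purchase = 22_000_000       # the "visible" decision
facility_to_plug_in = 54_000_000   # unplanned facility investment
three_year_operating = 30_000_000  # unplanned cost to run the blades

total = server_purchase + facility_to_plug_in + three_year_operating
print(f"Apparent decision: ${server_purchase:,}")
print(f"Actual enterprise decision: ${total:,}")  # $106,000,000
```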
The “commentary” includes a table with data that back up his analysis. The data are useful but, as you will learn at the foot of this essay, offer only a partial glimpse of a more significant cost issue. You may want to read my modest essay about cost here.
What baffles me is the headline “Servers: Why Thrifty Isn’t Nifty.” Forbes’s editors are more in the know about language than I. I’m not sure about the use of the word thrifty because the “commentary” uses servers as an example of the cost analysis problem facing organizations when folks make assumptions without experience or adequate accounting methods and with a rat pack of 25-year-old MBAs calculating costs.
Let me make this simple: cost estimations usually have little connection to the actual expenditures required to make a data center work. This applies to the data centers themselves, the applications, or the add-ons that organizations layer on their information technology infrastructure.
Poor cost analysis can sink the ship.
Mr. Brill has done a fine job of pointing out one cost hockey stick curve. There are others. Until folks like the sponsors of Mr. Brill’s “commentary” spell out what’s needed to run bloated and inefficient enterprise applications, cost overruns will remain standard operating procedure in organizations.
Before I close this encomium to Santa Fe thinking, may I point out:
- Engineering data centers is not trivial
- Traditional methods don’t work particularly well or economically in a world of multi-core servers and petascale storage devices stuffed into poorly engineered facilities
- Buying high-end equipment increases costs because when one of those exotic gizmos dies, it is often tough to get a replacement or a fix quickly. The better approach is to view hardware like disposable napkins.
Which is better?
[a] Dirt-cheap hardware that delivers 4X to 15X the performance of exotic name-brand servers, or [b] really expensive hardware that both fails and runs slowly at an extremely high price? If you picked the disposable napkin approach, you are on the right track. Better engineering can do more than reduce the need for expensive, high-end data center gear. By moving routine tasks to the operating system, other savings can be found. Re-engineering cooling mechanisms can extend drive and power supply life and reduce power demands. There are other engineering options to exercise. Throwing money at a problem works if the money is “smart.” Stupid money just creates more overruns.
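One way to see the trade-off in the question above is cost per unit of performance. The prices in this sketch are invented; only the 4X to 15X performance range comes from the text.

```python
# Cost per unit of performance, using invented prices and the 4X-to-15X
# performance range mentioned above. The units are arbitrary; only the
# ratio matters for the comparison.
exotic_price, exotic_performance = 40_000, 1.0
cheap_price = 3_000

for multiple in (4, 15):
    cheap_cost_per_perf = cheap_price / (exotic_performance * multiple)
    exotic_cost_per_perf = exotic_price / exotic_performance
    print(f"{multiple}X case: commodity ${cheap_cost_per_perf:,.0f} vs "
          f"name brand ${exotic_cost_per_perf:,.0f} per performance unit")
```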
Mr. Brill’s “commentary” provides one view of data center costs, but I trust that he has the brand name versus generic costing in the report he references. If not, there’s always an opportunity in Santa Fe for opening an art gallery or joining LANL’s security team.
Stephen Arnold, August 11, 2008