SharePoint: One Day You Are In, One Day Out

September 25, 2009

The Google is flexing its muscles. Microsoft is reveling in a 10 percent share of the Web search market and trying to deal with Zune shortages. Windows 7 is approaching like a runaway freight train. SharePoint 2010 is locked and loaded. So what does the Google Labs team do to add to Microsoft’s task list? Google releases a dead easy way to move info from SharePoint to the Google platform. To make life interesting for Redmond, Google shakes the open source sauce bottle. You can read a useful description of the API in “Google Sites API Opens SharePoint Escape Route”. This important move complements the WAC attack I wrote about in KMWorld. WAC means Wave, Android, and Chrome. With Sites added, the acronym is now WACS. Ouch.

Stephen Arnold, September 25, 2009

Why SEO Experts Ruffle Goose Feathers

September 25, 2009

I am not an SEO enthusiast. In fact, I am worse than old fashioned. I am absolutely ossified. I believe that Web sites should be focused on solving a user’s problem. That means clear information, useful functions, and no “aren’t we smart” tricks to spoof my 88-year-old father. When I read “Experts Offer Search Marketing Tips to Quickly Boost Sales”, I knew that most of the observations were made in a sincere attempt to help people with Web sites that don’t pay the bills. I even agreed with some of the observations. A case in point was the suggestion to include user generated content on sites. But some of the recommendations and the approach taken in the article troubled me. I did not like the phrase “quickly boost sales.” Web site changes may produce some immediate pay off, but the notion that taking Action A will lead to instant cash (Outcome B) is misleading. For some sites, Action A may produce an unexpected event such as dropping in a Google results list. There are a couple of hundred factors in the Google PageRank algorithm, and with smart software doing the heavy lifting, not even Google’s wizards can figure out what may have caused an unexpected event. Even more annoying was the lack of qualification in the experts’ statements. If these folks knew exactly what to do to hit the Google home run, would they be giving lectures at SEO conferences, or would they be sitting at home watching the AdSense money roll in?

Stephen Arnold, September 25, 2009

Wave Drowns Internet Explorer

September 25, 2009

In my lingo, Google has marginalized Internet Explorer. One take on this dismissal of Microsoft’s technology appears in “Google Plug-In Makes IE8 10x Faster: Chrome Frame Instantly Boosts Microsoft Browser’s JavaScript Performance.”

I don’t want to walk down the worn path in the grove of grief that is Internet Explorer 3, 4, 5, 6, and 7. I cannot comment about IE 8 because I used it once on new machines only to download Chrome and Opera. Firefox has some weird memory and thrashing behavior that forced me to dump it. You can read about Chrome Frame finding IE6 and resolving its issues by turning IE6 into Chrome. But the big news for me was in the article “We Give Up on Internet Explorer Say Google Wave Team”, which appeared on IT Wire. The article stated:

Google assert, Internet Explorer has not kept up with recent developments in Web technology. The Google team claim Internet Explorer’s JavaScript performance is many times slower than that of Firefox 3.5, Google’s own Chrome browser and Apple’s Safari 4. Additionally, Google state Internet Explorer’s support for the HTML 5 standard is also far behind these browsers. The Google Wave team explain they have spent countless hours solely on improving the Google Wave experience within Internet Explorer but have decided to just cut their losses. By producing Chrome Frame future development effort can be expended solely in core development for all users, but without leaving Internet Explorer users behind.

In my opinion, Google’s action makes clear how Google’s engineers will deal with Microsoft’s technical inadequacies. Google, as I see it, will identify a Microsoft stumble and then write code that handles the problem the way medical workers deal with an Ebola patient. Exercise appropriate caution and use isolation to minimize risk to patient, other patients, medical staff, and family members.
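To make the isolation ward concrete: Chrome Frame is a plug-in that swaps in Chrome’s rendering and JavaScript engines when a page opts in with the documented meta tag. Here is a minimal sketch of such a page, served by a throwaway test server; the server and the page content are my own hypothetical illustration, not Google’s code:

```python
# Throwaway test server for a page that opts in to Chrome Frame.
# The meta tag is the documented opt-in; everything else here is a
# hypothetical illustration for discussion purposes.
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"""<!DOCTYPE html>
<html>
<head>
  <!-- With Chrome Frame installed, this tag tells IE to hand the page
       to the plug-in's rendering and JavaScript engines. -->
  <meta http-equiv="X-UA-Compatible" content="chrome=1">
  <title>Wave-ready page</title>
</head>
<body>Rendered by Chrome Frame when present; by IE otherwise.</body>
</html>"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()
```

The point of the design: IE stays on the machine, but the suspect patient no longer touches the page.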

This approach signals a change in the way Google will deal not just with Microsoft. It also tells me that Web page designers who work to spoof Google will face similar treatment. Companies that develop software to fiddle in unacceptable ways with functions Google has crafted will find themselves sitting in an isolation chamber with no electrical power.

How can Google take this type of action?

Easy. Its engineers perceive Google as the winner in online, search, and the platform services the company is dribbling out bit by bit. Few know that Google has thousands of these “grains of sand,” and even fewer know that with a little lime and ash Google can build some impressive computational structures.

The isolation ward tactic or IWT makes clear that Google’s management understands that the Rubicon has been crossed and the digital Caesar is marching to Rome. The folks in Rome were nervous 2,000 years ago, and the folks in the traditional computing world are nervous today. Caesar slaughtered a city as a goodbye message, and he ignored the “tradition” which kept generals and their armies well away from the seat of power. Google looks a lot like a digital Caesar to me.

I made this point in late 2008 at the poorly attended Enterprise Search Summit and got a truckload of crap from people who did not agree with my statement: “Google has won in search and the enterprise.” I stand by that statement one year later. Marginalization is proof of the soundness of my observation about Google’s strength.

At this time, Google is growing via capillary action. Each service diffuses into space, pulled by molecular forces that most choose not to measure or quantify. Yesterday at the briefing in which I participated at the National Press Club, I heard Somat Engineering’s president state that government agencies must develop methods that bridge the gap between their existing systems and Google’s services. That was echoed by a former CIA intelligence officer who is no friend of any major vendor. His view was practical and pragmatic. Google’s premier government partner—Adhere Solutions—listed a number of features and functions that can reduce costs and improve delivery of citizen services. The audience, which consisted of high ranking government officials, made clear in the Q&A session that Google’s presence was indeed a significant one.

Now Microsoft is finding itself pushed from the center of the stage. If Google can dodge potentially lethal legal and management bullets fired at the company, Microsoft may find itself watching the Google show from the balcony or on Turner Classic Movies from a trailer park in Hardin County, Kentucky.

Gentle reader, feel free to disagree. Just bring facts, not uninformed views and recycled punditry.

Stephen Arnold, September 24, 2009

Scaling SharePoint Could Be Easy

September 24, 2009

Back in the wonderful city of Washington, DC, I participated in a news briefing at the National Press Club today (September 23, 2009). The video summary of the presentations will be online next week. During the post-briefing discussion, the topic of scaling SharePoint came up. The person with whom I was speaking sent me a link when she returned to her office. I read “Plan for Software Boundaries (Office SharePoint Server)” and realized that this Microsoft Certified Professional was jumping through hoops created by careless system design. I don’t think the Google enterprise applications are perfect, but Google has eliminated the egregious engineering calisthenics that Microsoft SharePoint delivers as part of the standard software.

I can deal with procedures. What made me uncomfortable right off the bat was this segment in the TechNet document:

    • In most circumstances, to enhance the performance of Office SharePoint Server 2007, we discourage the use of content databases larger than 100 GB. If your design requires a database larger than 100 GB, follow the guidance below:
      • Use a single site collection for the data.
      • Use a differential backup solution, such as SQL Server 2005 or Microsoft System Center Data Protection Manager, rather than the built-in backup and recovery tools.
      • Test the server running SQL Server 2005 and the I/O subsystem before moving to a solution that depends on a 100 GB content database.
    • Whenever possible, we strongly advise that you split content from a site collection that is approaching 100 GB into a new site collection in a separate content database to avoid performance or manageability issues.

Why did I react strongly to these dot points? Easy. Most of the datasets with which we wrestle are big, orders of magnitude larger than 100 GB. Heck, this cheap netbook I am using to write this essay has a 120 GB solid state drive. My test corpus on my desktop computer weighs in at 500 GB. Creating 100 GB subsets is not hard, but in today’s petascale data environment, these chunks seem to reflect what I would call architectural limitations.
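To show what the 100 GB guidance pushes onto the administrator, here is a minimal sketch of the chunking chore: assign files to buckets capped at 100 GB, first-fit decreasing. The paths and the tooling are hypothetical; this is my illustration of the workaround, not a Microsoft utility:

```python
# Sketch of the chunking chore the TechNet guidance implies: partition
# a corpus into buckets no larger than 100 GB each (first-fit decreasing).
# Paths are hypothetical; this is not a SharePoint tool.
import os

LIMIT = 100 * 10**9  # the 100 GB cap from the guidance quoted above

def partition(paths):
    """Assign files to content-database-sized buckets."""
    buckets = []  # each entry: [bytes_used, [paths]]
    for path in sorted(paths, key=os.path.getsize, reverse=True):
        size = os.path.getsize(path)
        for bucket in buckets:
            if bucket[0] + size <= LIMIT:  # fits in an existing bucket
                bucket[0] += size
                bucket[1].append(path)
                break
        else:  # no bucket has room; open a new one
            buckets.append([size, [path]])
    return [b[1] for b in buckets]
```

My 500 GB test corpus would need at least five of these buckets before SharePoint would be comfortable, which is exactly the extra work I am complaining about.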

As I worked my way through the write up, I found numerous references to hard limits. One example was this statement from a table:

Office SharePoint Server 2007 supports 50 million documents per index server. This could be divided up into multiple content indexes based on the number of SSPs associated with an index server.

I like the “could be.” That type of guidance is useful, but my question is, “Why not address the problem instead of giving me the old ‘could be’?” We have found limits in the Google Search Appliance, but the fix is pretty easy and does not require any “could be” engineering. Just license another GSA and the system has been scaled. No caveats.

I hope that the Fast ESP enterprise search system tackles engineering issues, not interface (what Microsoft calls user experience). In order to provide information access, the system has to be able to process the data the organization needs to index. Asking my team to work around what seem to be low ceilings is extra work for us. The search system needs to make it easy to deliver what the users require. This document makes clear that the burden of making SharePoint search work falls on me and my team. Wrong. I want the system to lighten my load, not increase it with “could be” solutions.

Stephen Arnold, September 24, 2009

Data Transformation and the Problem of Fixes

September 24, 2009

I read “Fix Data before Warehousing It” by Marty Moseley and came away with the sense that some important information was omitted from the article. The essay was well written. My view is that the write up should have anchored the analysis in a bedrock of cost analysis.

Data shoved into a data warehouse are supposed to reduce costs. Stuffing inconsistent data into a warehouse does the opposite. My research as well as information I have heard suggests that data transformation (which includes normalization and the other “fixing tasks”) can consume up to one third of an information technology budget. Compliance is important. Access is important. But the cost of fixing data can be too high for many organizations. As a result, the data in the data warehouse are not clean. I prefer the word “broken” because that word makes explicit one point—the outputs from a data warehouse with broken data may be misleading or incorrect.
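To make the “fixing tasks” concrete, here is a minimal sketch of one such normalization pass before rows enter a warehouse. The field names and accepted date formats are hypothetical, and real transformation pipelines are far larger; the sketch only shows why the work is labor intensive when done at scale:

```python
# Sketch of a pre-load normalization pass: unify date formats and clean
# inconsistent name casing. Field names and formats are hypothetical.
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")

def normalize_date(raw):
    """Return an ISO date, or None for 'broken' data to flag for review."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # broken: do not load it and pretend it is clean

def normalize_row(row):
    return {
        "customer": " ".join(row["customer"].split()).title(),
        "order_date": normalize_date(row["order_date"]),
    }

print(normalize_row({"customer": "  aCME  corp ", "order_date": "09/24/2009"}))
# {'customer': 'Acme Corp', 'order_date': '2009-09-24'}
```

Multiply this by every field, every source system, and every format drift, and the one-third-of-budget figure starts to look plausible.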

The ComputerWorld article is prescriptive, but it does not come right out and nail the cost issue or the lousy outputs issue. I think that these two handmaidens of broken data deserve center stage. Until the specific consequences of broken data are identified and made clear to management, prescriptions won’t resolve what is a large and growing problem. In my world, the failure of traditional warehousing systems to enforce or provide transformation and normalization tools makes it easier for a disruptive data management system to overthrow the current data warehousing world order. Traditional databases and data warehousing systems allow broken data and, even worse, permit outputs from these broken data. Poor data management practices cannot be corrected by manual methods because of the brutal costs such remediation actions incur. My opinion is that data warehousing is reaching a critical point in its history.

Automated methods combined with smart software are part of the solution. The next generation data management systems can provide cost cutting features so that today’s market leaders very quickly become tomorrow’s market followers. Just my opinion.

Stephen Arnold, September 24, 2009

Guha’s Most Recent Patent: Enhanced Third Party Control

September 24, 2009

I am a big fan of Ramanathan Guha’s engineering. From his work on the Programmable Search Engine in 2007 to this most recent invention, he adds some zip to Google’s impressive arsenal of smart methods. You may want to take a look at US 7,593,939, filed in March 2007, a few weeks after his five PSE inventions went to the ever efficient USPTO. The invention, “Generating Specialized Search Results in Response to Patterned Queries,” is summarized in the abstract:

Third party content providers can specify parameters for generating specialized search results in response to queries matching specific patterns. In this way, a generic search website can be enhanced to provide specialized search results to subscribed users. In one embodiment, these specialized results appear on a given user’s result pages only when the user has subscribed to the enhancements from that particular content provider, so that users can tailor their search experience and see results that are more likely to be of interest to them. In other embodiments the specialized results are available to all users.
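A minimal sketch of the mechanism as I read the abstract: a provider registers a query pattern, and subscribed users see that provider’s specialized results when a query matches. The provider name, the pattern, and the matching rule below are my own hypothetical stand-ins, not Guha’s method:

```python
# Sketch of patterned-query matching per my reading of the abstract.
# Providers, patterns, and users are hypothetical illustrations.
import re

PROVIDERS = {
    "flight-tracker": {
        "pattern": re.compile(r"^[A-Z]{2}\s?\d{2,4}$"),  # e.g. "AA 1234"
        "subscribers": {"user_17"},
    },
}

def specialized_results(user_id, query):
    """Return specialized hits only when pattern and subscription match."""
    hits = []
    for name, spec in PROVIDERS.items():
        if spec["pattern"].match(query) and user_id in spec["subscribers"]:
            hits.append(f"[{name}] specialized answer for {query!r}")
    return hits

print(specialized_results("user_17", "AA 1234"))  # subscribed: one hit
print(specialized_results("user_99", "AA 1234"))  # not subscribed: []
```

The per-user subscription check is the interesting part: the same query yields different result pages for different users.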

What I find interesting is that this particular method nudges the ball forward for third party content providers so certain users can obtain information enhancements. The system makes use of Google’s “trust server,” answers questions, and generates a new type of top result for a query. The invention provides additional color for Dr. Guha’s semantic systems and methods, which nest comfortably within the broader dataspace inventions discussed at length in Google: The Digital Gutenberg. For a more detailed explanation of the invention, you can download the open source document from the USPTO or another US patent provider. When will Google make a “Go Guha” T-shirt available? Oh, for those of you new to my less-than-clear explanation of Google’s technology, you can find the context for this third party aspect of Google’s PSE and publishing / repurposing semantic system in my Google Version 2.0; just click on Arnold’s Google studies. This invention makes explicit the type of outputs a user may receive from the exemplary system referenced in this open source document. This invention is more substantive than “eye candy” user experience as defined by Microsoft and light years ahead of the Yahoo “interface” refresh I saw this morning. The Google pushes ahead in search technology as others chase.

Stephen Arnold, September 23, 2009

Coveo and Email Search

September 24, 2009

My two or three readers felt compelled to send me links to a number of Web write ups about Coveo’s email search system. I have tested the system and found it quite good, in fact, excellent. For forensic search of a single machine, I have been testing a “pocket search” product from Gaviri, and I find that quite effective as well. If you are not familiar with the challenges email search presents, you may want to take a look at one of the Coveo-centric news stories, which does quite a good job of explaining the challenge and the Coveo solution. The article is “Coveo Brings Enterprise Search Expertise to Email” by Chelsi Nakano. For me the key passage was:

There’s at least one happy customer to speak of: “Other solution providers require you to spend tens if not hundreds of thousands in service fees to customize the enterprise search solution and make enterprise search work for your employees,” said Trent Parkhill, VP, Director IT of Haley and Aldrich. “With Coveo […] enterprise search now meshes seamlessly with classification and email archiving to give us a full email management solution.”

Happy customers are more important to me than megabytes of marketing PDFs and reports from azure chip consultants who try too, too hard to explain a useful, functional system. More info is available directly from Coveo.

Stephen Arnold, September 24, 2009

Google Waves Build

September 24, 2009

I am a supporter of Wave. I wrote a column about Google WAC-ing the enterprise. W means Wave; A is Android; and C represents Chrome. I know that Google’s consumer focus is the pointy end of the Google WAC thrust, but more information about Wave is now splashing around my webbed feet here in rural Kentucky. You can take a look at some interesting screenshots plus commentary in “Google Wave Developer Preview: Screenshots.” Perhaps you will assert, “Hey, addled goose, this is not search.” I reply, “Oh, yes, it is.” The notion of eye candy is like lipstick on a pig. Wave is a new animal that will carry you part of the way into dataspace.

Stephen Arnold, September 24, 2009

Mobile News Aggregation

September 23, 2009

I wrote an essay about the impending implosion of CNN. The problem with traditional media boils down to cost control. Technology alone won’t keep these waterlogged outfits afloat. With demographics working against those 45 years of age and above, the shift from desktop computers to portable devices creates opportunities for some and the specter of greater marginalization for others. I saw a glimpse of the future when I looked at Broadersheet’s iPhone application. You can read about the service in “Broadersheet Launching ‘Intelligent News Aggregator’ iPhone App”. The app combines real time content with more “traditional” RSS content. The operative words for me are “intelligent” and “iPhone”. More information is available on the Broadersheet Web site. Software that learns and delivers information germane to my interests on a mobile device is not completely new, of course. The Broadersheet approach adds “time” options and a function that lets me add comments to stories. This is not convergence; the application makes clear the more genetic approach of blending DNA from related software functions.
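A minimal sketch of the blending as I understand it: merge real time items with RSS items, then apply a “time” window before display. The item structures, feeds, and timestamps are hypothetical illustrations, not Broadersheet’s code:

```python
# Sketch of blending real-time and RSS items with a time window.
# Items and timestamps are hypothetical; not Broadersheet's code.
from datetime import datetime, timedelta

realtime_items = [
    {"title": "Breaking: markets move", "ts": datetime(2009, 9, 23, 14, 5)},
]
rss_items = [
    {"title": "Morning analysis", "ts": datetime(2009, 9, 23, 8, 0)},
    {"title": "Last week's feature", "ts": datetime(2009, 9, 16, 9, 0)},
]

def blended(window_hours=24, now=datetime(2009, 9, 23, 15, 0)):
    """Merge both feeds, keep items inside the window, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    merged = [i for i in realtime_items + rss_items if i["ts"] >= cutoff]
    return sorted(merged, key=lambda i: i["ts"], reverse=True)

for item in blended():
    print(item["ts"], item["title"])  # the week-old feature is filtered out
```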

Stephen Arnold, September 23, 2009

Google News Yaggs

September 23, 2009

Short honk: Just passing along the allegation that Google News went down for one hour on Tuesday, September 22, 2009. The story “Google News Back Up after Outage” asserted that Google News went offline. The interest in cloud and blended cloud and on-premises computing continues to creep upwards. If the allegation is true, the problems at Google News are yet another Google glitch (a Yagg). That old time failover failed if the assertion is true.

Stephen Arnold, September 23, 2009
