Clearpace RainStor Supports Queries

June 11, 2009

A happy quack to the reader in Australia who alerted me to an outfit called Clearpace Software. According to the company’s Web site, Clearpace is

a software company that provides data archive store solutions for the long-term retention of structured data within the enterprise. Clearpace has become a pioneer in the database archiving market by providing archive stores that are the optimal destination for inactive data that has been removed from production systems. The Clearpace NParchive software enables organizations with large and growing data estates to massively reduce the cost and complexity of storing historical information while making archived data easily accessible for regulatory, legal and business purposes. Using NParchive, companies are able to store as much as 60x more historical information on commodity hardware.

The angle that interested me was that Clearpace includes a query tool with its system. The idea is that a Clearpace client can search the data in the Clearpace RainStor archive. Here’s what the company says about the RainStor cloud storage service:

RainStor is a cloud-based archiving service for simply, securely and cost-effectively preserving historical structured data. The RainStor archive service enables companies to send an unlimited amount of inactive data from databases or event logs to a hosted storage platform where it can be retained and searched on demand. RainStor compresses data by 40x before transferring encrypted data files to the cloud, providing rapid load times and reduced storage cost, while also supporting full SQL access to the archived data files using industry standard reporting tools and interfaces. RainStor is delivered on a Software-as-a-Service (SaaS) basis, leveraging cloud infrastructure. The RainStor cloud archive service requires no upfront investment offering a pay-as-you-use model based on the volume of raw data that is sent to the cloud. Rainstor is provided as a service by Clearpace Software.
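The 40x compression claim is plausible for the kind of repetitive structured records that fill event logs and transaction archives. As a rough illustration only (Clearpace’s actual method is proprietary; this sketch just uses Python’s standard zlib on synthetic, highly redundant rows), high ratios come almost for free when records share most of their bytes:

```python
import zlib

# Hypothetical, highly repetitive structured records, typical of
# event logs and transaction archives slated for cold storage.
rows = [f"2009-06-{d % 30 + 1:02d},store-{d % 20:03d},SALE,{d % 500}.00\n"
        for d in range(100_000)]
raw = "".join(rows).encode("utf-8")

compressed = zlib.compress(raw, level=9)
ratio = len(raw) / len(compressed)
print(f"raw: {len(raw):,} bytes, compressed: {len(compressed):,} bytes, "
      f"ratio: {ratio:.0f}x")
```

Real archive stores typically do better still by deduplicating values column by column before applying a general-purpose compressor.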

You can read the company’s news release here.

I don’t have too much information about the search function. My questions will focus on latency and the user interface.

Stay tuned.

Stephen Arnold, June 11, 2009

Every Cloud Has a Tin Lining

June 9, 2009

I found the article “Microsoft Exec Sees Lower Margins from ‘Cloud’” suggestive. You can read the article here (I hope), because this is the type of document that can come back and nibble at one’s ankles. The idea is that selling cloud services yields lower profit margins than selling shrink-wrapped software. The article reported:

Microsoft Corp’s chief software architect said on Thursday the profit margins on providing online services — broadly known as cloud computing — would likely yield a lower profit margin than the company’s existing software business. “The margins on services are not like the margins on software, so it (cloud computing) will increase our profit and it will increase our revenue, but you won’t have that margin,” said Ray Ozzie on Thursday at a Silicon Valley technology event.

Several observations:

  1. If Microsoft deploys a cloud-based enterprise search solution, the payback on the $1.2 billion purchase price, the engineering rework, and the marketing of the Fast ESP system may take longer to get into the black.
  2. Stakeholders looking for a jet boost to the MSFT share price get their feet placed in a bucket of ice water.
  3. If the MSFT assertion is accurate, cost control becomes a much more significant issue for MSFT going forward in a lousy economy.

Stephen Arnold, June 8, 2009

Clouds Part, Truth Peeks Out

June 8, 2009

Ned Batchelder’s “Real World Cloud Computing” is worth reading. You can find the article here. The information is a summary of key observations made by six startups’ top guns about cloud computing. I don’t want to spoil your enjoyment of the original post, but I would like to quote one passage to motivate you to read the article:

Here are five points about Amazon’s cloud services:

  • “Can’t send lots of emails, since you need a spam-white listed server
  • Disk I/O in the cloud is a lot slower than in real machines (“punishingly slow”).
  • Want a db server with 32Gb RAM? Amazon doesn’t offer it.
  • Want dual hardware load balancers? Amazon doesn’t have it.
  • PCI (credit card) compliance is a problem: use a 3rd-party cart or PayPal instead of doing it yourself in the cloud.”

Very useful.

Stephen Arnold, June 8, 2009

Exalead’s Vision for Enterprise Search

June 4, 2009

I had a long conversation with Exalead’s director of marketing, Eric Rogge. We covered a number of topics, but one of his comments seemed particularly prescient. Let me summarize my understanding of his view of the evolution of search and offer several comments.

First, Exalead is a company that provides a high performance content processing system. I profiled the company in the Enterprise Search Report, Beyond Search, and Successful Enterprise Search Management. Furthermore, I use the company’s search system for my intelligence service Overflight, which you can explore on the ArnoldIT.com Web site. Although I am no expert, I do know quite a bit about Exalead and how it enables my competitive intelligence work.

Second, let me summarize my understanding of Mr. Rogge’s view of what search and content processing may be in the next six to 12 months. The phrase that resonated with me was “Search Based Applications.” The idea, as I understand it, is to put search and content processing into a work process. The “finding” function meshes with specific tasks, enables them, and reduces the “friction” that makes finding information such an expensive, frustrating experience.

Mr. Rogge mentioned several examples of Exalead’s search-based applications approach. The company has a call center implementation and an online advertising implementation. He also described a talent management solution that combines search with traditional booking agency operations. The system manipulates image portfolios and allows the agency to eliminate steps and the paper that once was required.

The company’s rich media system handles digital asset management, an area of increasing importance. Keeping track of rich media objects in digital form requires a high-speed, easy-to-use system. Staff using a digital asset management system have quite different needs and skill levels. Due to the fast pace of most media companies, training is not possible. A photographer and a copyright specialist have to be able to use the system out of the box.

But the most interesting implementation of the SBA architecture was the company’s integration of the Exalead methods into a global logistics company. The system provides the information required to tell a client where a shipment is and when it will arrive. The Exalead system handles 5GB of structured data to track up to 1M shipments daily. Those using the system have a search box, topics and clients a click away, and automated reports that contain the most recent information. Updating of the information occurs multiple times each hour.
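Taken at face value, the quoted figures imply remarkably compact records. A quick back-of-the-envelope check (my arithmetic, not Exalead’s):

```python
# Rough sanity check on the quoted logistics figures:
# 5 GB of structured data tracking up to 1 million shipments daily.
data_bytes = 5 * 1024**3
shipments = 1_000_000
per_shipment_kb = data_bytes / shipments / 1024
print(f"~{per_shipment_kb:.0f} KB of structured data per shipment")  # → ~5 KB
```

A few kilobytes per shipment is ample for status, route, and timestamp fields, which suggests the 5GB figure covers the tracking data itself rather than attached documents or images.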

Finally, my view of his vision is quite positive. I know from my research that most people are not interested in search. What matters is getting the information required to perform a task. The notion of a search box that provides a way for the user to key a word or two and get an answer is desirable. But in most organizations, users of systems want the information to be “there”. That’s the reason that lists of topics or client names are important. After all, if a person looks up a particular item or entity several times a day, the system should just display that hot link. The notion of Web pages or displays that contain the results of a standing query is powerful. Users understand clicking on a link and seeing a “report” that mashes up information from various sources.
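The “standing query” idea behind those always-there topic and client links can be sketched in a few lines: a saved search is re-run against the index ahead of time, so the user just clicks a link and sees a pre-built report. This is an illustrative toy, not Exalead’s actual architecture; every name in it is hypothetical.

```python
# Minimal sketch of a standing query: a saved search whose results are
# refreshed ahead of the user's click. Illustrative only.

documents = []  # the "index": (doc_id, text) pairs

def ingest(doc_id, text):
    documents.append((doc_id, text))

def run_query(terms):
    # Return ids of documents containing every query term.
    terms = [t.lower() for t in terms]
    return [doc_id for doc_id, text in documents
            if all(t in text.lower() for t in terms)]

# Saved searches: report name -> query terms.
standing = {"acme shipments": ["acme", "shipment"]}
reports = {}

def refresh_reports():
    # In production this would run on a timer, several times per hour.
    for name, terms in standing.items():
        reports[name] = run_query(terms)

ingest(1, "ACME shipment 4417 departed Rotterdam")
ingest(2, "Globex invoice 9 overdue")
ingest(3, "ACME shipment 4418 arrived Memphis")
refresh_reports()

print(reports["acme shipments"])  # the user clicks a link, sees [1, 3]
```

The design point is that the query cost is paid on a schedule, not at click time, which is why the results feel as though the information is simply “there”.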

Exalead is winning enterprise deals in the US and Europe. My hunch is that the notion of the SBA will be one that makes intuitive sense to commercial enterprises, government agencies, and not-for-profit organizations. More important, the Exalead system works.

Stephen Arnold, June 5, 2009

Finding Info about Tsunami Named Google Wave

May 30, 2009

If you want to ride the Google Wave, you need to get up to speed. I found a couple of resources that may be useful to you. I don’t recommend the Google Web site or the Web log posts. These are breezy and are not as comprehensive as some of the third-party write-ups. I looked at a number of descriptions today. I would recommend that you read Ben Parr’s “Google Wave: A Complete Guide” here. Then you can sit back and check out the official video. You can find an easy link on Google Blogoscoped here or look at the Google Channel on YouTube. Once you have this information under your belt, head on over to my Overflight service here and read the posts about Wave on the Google Web logs. If you are into code instead of marketing frazzle, click here. I want to reiterate what I wrote earlier: the Wave swamped the new Microsoft Web surfer, Bing Kumo.

Stephen Arnold, May 30, 2009

More about Exalead and Its Miiget Technology

May 30, 2009

I mentioned Exalead’s Miiget not long ago. I received a couple of questions about the technology. To provide more color to that reference you may want to look at Same Story here. That company has licensed the Exalead technology. The announcement of the deal is here. The system provides content from the Same Story repository and from other sources. The system incorporates profiles so that information is tailored to the user. You can get more information about the Miiget technologies here.

Stephen Arnold, May 30, 2009

Cyberwarfare Attack Devices

May 26, 2009

If you worry about enterprise search, you won’t find much of interest in this Aviation Week story. The addled goose, on the other hand, sees the article “Network Attack Weapons Emerge” here by David Fulghum as a precursor of similar information initiatives in the business arena. Information is a strategic asset, and methods to locate, disrupt, intercept, and analyze those assets are going to become increasingly significant. The core of the Aviation Week story was this comment:

Devices to launch and control cyber, electronic and information attacks are being tested and refined by the U.S. military and industry in preparation for moving out of the laboratory and into the warfighter’s backpack.

Mr. Fulghum added:

“The Russians conducted a cyberattack that was well coordinated with what Russian troops were doing on the ground,” says a longtime specialist in military information operations. “It was obvious that someone conducting the cyber[war] was talking to those controlling the ground forces. They knew where the [cyber]talent was [in Russia], how to use it, and how to coordinate it. That sophisticated planning at different levels of cyberwarfare surprised a lot of people in the Defense Dept.,” he says. “It looked like a seamless, combined operation that coordinated the use of a range of cyberweapons from the sophisticated to the high school kids that thought it was cool to deface official web sites. The techniques they used everybody knows about. The issue was how effective they were as part of a combined operation.”

I found his description of the components of a cyberattack toolkit interesting:

The three major elements of a cyberattack system are its toolbox, planning and execution capabilities. The toolbox is put together by the hardware and software experts in any organization to address specific missions. They maintain the database of available capabilities.

Worth reading.

Stephen Arnold, May 26, 2009

Amazon to DC

May 25, 2009

With the Army embracing Windows Vista and the Google moving appliances, Amazon has, if this news report is accurate, decided to chow down at the Federal feed bag. TechFlash here reported that Amazon wants to hire a government-savvy manager. If you are tracking Amazon’s non-book activities, you will want to read Eric Engleman’s “Amazon Targets New Web Services Customer: Uncle Sam”. Mr. Engleman wrote:

There are certainly lots of technology possibilities emerging with the incoming Obama administration, including the president-elect’s proposal to digitize the nation’s health care records (Microsoft and Google have projects to put personal health records online). Is Amazon lining up to tap federal dollars?

The answer may be yes.

Stephen Arnold, May 25, 2009

Copyright and the Real Time Microblog Phenom

May 24, 2009

Liz Gannes’ “Copyright Meets a New Worthy Foe: The Real-Time Web” is an interesting article. You can find it on NewTeeVee.com here. Her point is that copyright, the Digital Millennium Copyright Act, and other bits and pieces of legal whoopdedoo struggle with real-time content from Twitter-like services. She wrote:

If you’re a copyright holder and you want to keep up with your pirated content flitting about the web — well, good luck. The way the DMCA is set up means you’re always chasing, and the real-time web is racing faster than ever before. Analytics services are only just emerging that will tell you where your views are coming from on a semi-real-time basis. That’s especially true for live video streaming sites such as Ustream and Justin.tv. Justin.tv, in particular, has come under fire by sports leagues for hosting camcorded streams of live game broadcasts. The company says it takes down streams whenever it is asked to. But the reality is, often the moment has passed.

In short, information flows move more quickly than existing business methods. An interesting illustration of this flow for video is Twiddeo here. Government officials have their work cut out for them with regard to ownership, copyright, and related issues.

But…

As I read this article, I thought about the problem Google has at this time with real-time content. Google’s indexing methods are simply not set up to handle near instantaneous indexing of content regardless of type. In fact, fresh search results on Google News are stale when one has been tracking “events” via a Twitter-like service.

As important is the “stepping back” function. On Google’s search results displays, how do I know what is moving in near real time; that is, what’s a breaking idea, trend, or Tweet? The answer is, “I don’t.” I can hack a solution with Google tools, but even then the speed of the flow is gated by Google’s existing indexing throughput. To illustrate the gap, run a query for American Idol on Google News and then run the query on Tweetmeme.com.

Two different slants, biased by time. In short, a copyright problem and a Google problem.

Stephen Arnold, May 24, 2009

Google, YouTube, and Digital Volume

May 22, 2009

Short honk: A year or so ago, I learned that Google received about one million new video objects per month. TechCrunch reported here that Google’s YouTube.com ingests about 20 hours of video every minute. I don’t know if this estimate is spot on, but it is clear that YouTube is amassing one of the world’s largest collections of rich media content in digital form. For me, the most interesting information in the write up was:

Back in 2007, shortly after Google bought the service, it was 6 hours of footage being uploaded every minute. As recently as January of this year, that number had grown to 15 hours, according to the YouTube blog. Now it’s 20 — soon it will be 24.

Lots of data means opportunity for the GOOG. I am looking forward to having the audio information searchable.
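The per-minute figures in the quote translate into striking daily volumes. A quick back-of-the-envelope calculation (my arithmetic on TechCrunch’s numbers):

```python
# Back-of-the-envelope: hours of video uploaded to YouTube per day,
# derived from the per-minute figures quoted in the TechCrunch piece.
for label, hours_per_minute in [("2007", 6), ("Jan 2009", 15), ("May 2009", 20)]:
    per_day = hours_per_minute * 60 * 24
    print(f"{label}: {per_day:,} hours of video uploaded per day")
# At 20 hours per minute, 28,800 hours -- more than three years of
# footage -- arrives every single day.
```

That scale is exactly why making the audio track searchable is such a large engineering problem, and such a large opportunity.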

Stephen Arnold, May 22, 2009
