Amazon: Insight into Search, Engineering, and Cloud Computing
April 28, 2011
In order to locate data, one must be able to search for it. If search does not work, data are lost. Seems obvious but one of the consequences of the Amazon cloud outage was that I had to think about the online big box store again. Amazon is, to me, a convenient way to get books and buy a gift or a replacement BlackBerry battery. Even when the A9 service was a priority, Amazon’s ability to make information findable was hit and miss.
Even today, I have a tough time thinking of Amazon as a giant, reliable, low cost information utility. I have difficulty finding lists of books “about” a subject. Sometimes I stumble upon this user created content; other times, I have no idea how to find this useful information. When I want a book, I don’t know how to NOT out books that will be published in the future and see only those that are available. I cannot find information about the credits I “earn” when I buy Kindle books or products using my Amazon credit card. The snail mail coupons I used to get have disappeared, and I don’t have a clue about “finding” this information.
Several years ago, we took a close look at how Amazon handled glitches. The information was not that different from other companies we had examined. However, one approach was interesting. When an outage took place, a small team was assembled to figure out what happened and to fix it. This approach has its upside such as speed and fluid problem solving. The downside, in my opinion, was that solutions could be ad hoc. In my view, the next time a problem cropped up, the Amazon approach I probed three years ago meant that the next problem solving team had to figure out what the previous team did. No big deal until the problem of figuring out everything consumed lots of time.
We are not using Amazon Web services. Call me old fashioned but I prefer to have data stored on local devices with appropriate backups on media in an off site location.
For another, unrelated project we ran a series of tests in 2010 on the take up of the phrase “cloud computing.” What we learned was that the actual traffic generated by the phrase “cloud computing” was far less than our client anticipated.
After a six month test, we concluded:
- There was a large amount of information about cloud computing from a bewildering range of vendors big and small
- The interest in cloud computing was less than in some other words and bound phrases we tested
- The information about cloud computing was a cloud of semantic fuzziness; that is, it was difficult to pin down specifics within the documents written about cloud computing.
What happens when you combine a retail store with a cloud computing service? You get an anchor point. Amazon becomes associated with certain words and phrases, but these may not have much meaning. Examples include acronyms such as S3 and EC2.
What happens when a company which has associated itself with this difficult to define subject has an outage? The problems of Amazon immediately diffuse across other products and services available in the cloud.
You can see an example of this semantic drift in “Amazon: Some Data Won’t Be Recovered after Cloud Outage.” The article points out that the Amazon “outage” has resulted in data that “won’t be recovered.” The problem is now one that Amazon and its customers must resolve.
Amazon’s close association with cloud computing has made the Amazon incident the defining case for the risks of cloud computing. Even worse, unrecoverable data cannot be found. Search and retrieval does little good if the data no longer exist. Services which depend on their customers locating information are effectively stranded. Those affected include “Quora, Sencha, Reddit, and FourSquare.”
So what?
This problem at Amazon provides some insight into the firm’s engineering approach. In a larger arena, the close association of Amazon with cloud computing has had a somewhat negative impact on the concept of cloud computing. To sum up:
- You can’t find information if it is not “there”
- Amazon’s engineering methods are interesting and may give some companies some additional analysis to perform
- The impact of the outage has created some pushback for other cloud computing vendors.
Will this be a defining moment for Amazon? Probably not, but it is an interesting moment. Non-recoverable is a disturbing notion to those who have to find a fact, entity, or a concept. Amazon has figured out some aspects of eCommerce. Other areas warrant additional investment which may be why Amazon’s costs are skyrocketing.
Stephen E Arnold, April 28, 2011
Freebie
Will Oracle Conspire to Cause Cloud to Pelt Hail?
April 7, 2011
Since Oracle’s large sales increase last quarter, driven in part by cloud computing demands, ReadWrite Cloud ponders: “Oracle Had a Killer Quarter- What Does That Mean for Open Source in the Cloud?”
Oracle’s success may be a hindrance for open source cloud applications. Because so many companies already use Oracle’s database management software, and because migrating from it can be costly and difficult, many choose to stick with Oracle rather than seeking choices.
Ed Boyajian, CEO of support and management tool provider EnterpriseDB, has hope for open source:
“‘We’re engaged with every major cloud provider today on how they can have an open source database alternative,’ he says. Boyajian says a lot of customers are tired of being locked into Oracle’s databases and are looking for an alternative.”
I hope he’s right. It would be a shame to see Oracle’s lock on our databases diminish the opportunity for more open source solutions.
Cynthia Murrell, April 7, 2011
Freebie unlike Oracle’s enterprise software and services
Recorded Future in the Spotlight: An Interview with Christopher Ahlberg
April 5, 2011
It is big news when In-Q-Tel, the investment arm of the US intelligence community, funds a company. It is really big news when Google funds a company. But when both of these tech-savvy organizations fund a company, Beyond Search has to take notice.
After some floundering around, ArnoldIT was able to secure a one-on-one interview with the founder of Recorded Future. The company is one of the next-generation cloud-centric analytics firms. What sets the company apart technically is, of course, the magnetism that pulled In-Q-Tel and Google to the Boston-based firm.
Mr. Ahlberg, one of the founders of Spotfire which was acquired by the hyper-smart TIBCO organization, has turned his attention to Web content and predictions. Using sophisticated numerical recipes, Recorded Future can make observations about trends. This is not fortune telling, but mathematics talking.
In my interview with Mr. Ahlberg, he said:
We set out to organize unstructured information at very large scale by events and time. A query might return a link to a document that says something like “Hu Jintao will tomorrow land in Paris for talks with Sarkozy” or “Apple will next week hold a product launch event in San Francisco.” We wanted to take this information and make insights available through stunning user experiences and application programming interfaces. Our idea was that an API would allow others to tap into the richness and potential of Internet content in a new way.
When I probed for an example, he told me:
What we do is to tag information very, very carefully. For example, we add metatags that make explicit when we locate an item of data. We tag when that datum was published. We tag when we analyzed that datum. We also tag when we find it, when it was published, when we analyzed it, and what actual time point (past, present, future) to which the datum refers. The time precision is quite important. Time makes it possible for end users and modelers to deal with this important attribute. At this stage in our technology’s capabilities, we’re not trying to claim that we can beat someone like Reuters or Bloomberg at delivering a piece of news the fastest. But if you’re interested in monitoring, for example, the co-incidence of an insider trade with a product recall we can probably beat most at that.
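To make the tagging idea concrete, here is a minimal sketch in Python. It is not Recorded Future's code or API. The class name, field names, event labels, and sample dates are all assumptions made up for illustration; only the four time tags (found, published, analyzed, and the time point referred to) and the insider trade and product recall pairing come from Mr. Ahlberg's remarks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class TaggedDatum:
    """One item of content carrying the four time tags described in the interview.

    All names here are hypothetical; they are not taken from any real API.
    """
    text: str
    kind: str               # illustrative event label, e.g. "insider_trade"
    found_at: datetime      # when the item was located
    published_at: datetime  # when the source published it
    analyzed_at: datetime   # when the item was processed
    refers_to: datetime     # the time point the statement is about


def temporal_status(datum: TaggedDatum, as_of: datetime) -> str:
    """Classify the referenced time point as past, present, or future (day granularity)."""
    if datum.refers_to.date() > as_of.date():
        return "future"
    if datum.refers_to.date() < as_of.date():
        return "past"
    return "present"


def co_incidences(data: List[TaggedDatum], kind_a: str, kind_b: str,
                  window: timedelta) -> List[Tuple[TaggedDatum, TaggedDatum]]:
    """Pair items of two kinds whose referenced time points fall within `window`."""
    firsts = [d for d in data if d.kind == kind_a]
    seconds = [d for d in data if d.kind == kind_b]
    return [(a, b) for a in firsts for b in seconds
            if abs(a.refers_to - b.refers_to) <= window]


# Sample data: every date below is invented for the example.
now = datetime(2011, 4, 5, 12, 0)
visit = TaggedDatum("Hu Jintao will tomorrow land in Paris for talks with Sarkozy",
                    "state_visit", now, datetime(2011, 4, 5, 9, 0), now,
                    datetime(2011, 4, 6, 9, 0))
trade = TaggedDatum("Executive sells shares", "insider_trade",
                    now, datetime(2011, 3, 29, 8, 0), now,
                    datetime(2011, 3, 28, 16, 0))
recall = TaggedDatum("Company recalls product", "product_recall",
                     now, datetime(2011, 3, 30, 10, 0), now,
                     datetime(2011, 3, 30, 10, 0))

pairs = co_incidences([visit, trade, recall], "insider_trade", "product_recall",
                      timedelta(days=3))
print(temporal_status(visit, now), len(pairs))
```

The point of keeping the four times separate is that a query can ask about what a document refers to rather than when it appeared. The co-incidence check compares only the referenced time points, so a story published late still pairs correctly with the event it describes.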
To read the full text of the interview with Mr. Ahlberg click here. The interview is part of the Search Wizards Speak collection of first person narratives about search and content processing. Available without charge on the ArnoldIT.com Web site, the more than 50 interviews comprise the largest repository of first hand explanations of “findability” available.
If you want your search or content processing company featured in this interview series, write seaky2000 at yahoo dot com.
Stephen E Arnold, April 5, 2011
Freebie
People and Big Data: Analytics for Mr and Ms Couch Potato
March 24, 2011
I have to admit that the idea of big data and the “people” was a concatenation new to me. I just read “Data Science Toolkit Brings Big Data Analysis to the People.” Let’s look at this snippet:
Data Science Toolkit offers OCR functionality to convert PDFs or scanned image files to text files, filter geographic locations from news articles and other types of unstructured data or find political district and neighborhood information for any given location. Data Science Toolkit is available as a web service online, but it can also be downloaded and run on an Amazon EC2 or VM virtual machine.
I live in Harrod’s Creek, Kentucky. The “people” in this metropolis of a couple of thousand people consist of folks who use the Internet to look at pictures, send email, and maybe check out some online information about the local basketball scene. The sophisticated data consumers mostly work in my office. I know from my good morning chats at the local filling station cum junk food outlet that I am skewing the demographics with my generalization about Internet usage. Close enough for horse shoes as my grandfather used to say.
I think the idea of “big data” is interesting. We publish a curated blog called Inteltrax that covers some of the interesting companies in the data fusion market. But if you think a $1.0 million enterprise search system appeals to a narrow readership, data fusion has the same magnetism. There are not any “people.” There are college graduates with mathematical expertise and a compelling need to process information. Here in Harrod’s Creek, the “people” are more likely to check email and then fire up the flat screen to watch hoops.
Maybe the observation about “people” is a variant of Potomac Fever; that is, those exposed to the craziness of power and money in Washington, DC, think that “everyone” has the same visceral reaction to political push ups. I once heard a person who worked in a think tank describe the firm’s discussions about client engagements as “drinking our own Kool-Aid.” Tastes great, but the Kool-Aid is not enjoyed with the same lip smacking elsewhere. When was the last time you guzzled pumpkin or red bean Kool-Aid?
My view:
- A useful service such as the one described in the write up looks a heck of a lot more magnetic than it may be. That’s the unsupported assertion about “people” when the reality is that a tiny percentage of savvy folks will get with the big data program as a Web service.
- The notion that “people” can manipulate big data and find a pot of gold at the end of the analytics rainbow is charming, but essentially incorrect. There are quite a few considerations to evaluate in the big data game. A shortcut can save time but also put the rental car in the ditch.
- Big data are the norm in many online operations. What is helpful to me is to explain that a tiny percentage of those with big data know what to do to squeeze nuggets from the log files.
Quite a story for me: I thought it was one of those PR, promo, search engine optimization type write ups. I then realized it was a Kool-Aid break after a lunch break in Silicon Valley where there is no Internet bubble. Absolutely not.
Stephen E Arnold, March 25, 2011
Freebie
Microsoft Sends Google a Relay Valentine
February 20, 2011
Navigate to the Microsoft Web page “Why Microsoft”. The story that caught my attention was “Dear Google.” The guts of the valentine is the full text of a letter from an outfit in Paducah, Kentucky. The point of the post is to provide a case example of a Google customer’s experience with Google enterprise applications. According to the letter, the Google cloud service had some issues. The writer of the letter said:
I tried contacting you several times but each one was unsuccessful. Oh, the unrequited love! I can’t stay in a relationship if I can’t get through to you – my business needs are important too. I’ve done everything I could. We spent 18 months trying to make it work. You said you were free but in the end, the time and frustration I experienced made me see how costly you truly are. I just can’t keep pretending anymore.
Heartfelt? Nasty? I don’t know. The customer support complaint is one I have heard before. The other problems? No clue. Interesting though. Kentucky. Who would have thought?
Stephen E Arnold, February 20, 2011
Freebie
Microsoft Has a Fan at Forbes
February 15, 2011
Our initial reaction to this write up was, public relations coup. The Wall Street Journal runs similar fluff on its online services as well. But the Forbes’ love fest with Microsoft warrants documenting.
Forbes’ “Why Microsoft Will Win the Small Business Cloud War” finds no fault with Redmond. The article lauds the company’s steady addition of key applications to the cloud, though products we’re used to, such as Word, Excel, and Outlook, aren’t scheduled to be there until later this year. Reasonable cost is another plus. According to Forbes, however, the real advantage Microsoft holds over its cloud competitors lies in brand familiarity:
“[Microsoft’s] products are used and liked by millions of small business people around the world. We don’t want to change. We don’t want to learn new products to do the same things we’re already doing. We just want to do things quicker and better. As long as Microsoft makes it easy for us to adapt to the cloud we’ll go along with them.”
Fair enough. However, one key component was left out of this analysis: search was not mentioned. We want to know: how effectively will we be able to find what we’re looking for in Microsoft’s cloud? We wonder, “Will Forbes’ staff be able to answer that question too?”
Cynthia Murrell, February 15, 2011
Freebie
Microsoft Israel and the Cloud
February 11, 2011
Stories in The Globes are often wonderful, but they do go dead. Navigate to “Microsoft Israel to Recruit 100 for R&D”. The Globes reports that Microsoft Israel president Moshe Lichtman is pumping up the company’s work in cloud computing. The story said:
“The Microsoft Israel R&D center has 600 R&D employees. ‘Cloud computing is at the center of our vision. About 70% of the center’s development activity is focused on cloud computing,’ Lichtman said. ‘This year, we will complete development of the first versions of 11 products, which will be launched on the global market.’”
The Israeli branch also created a free security product that has seen millions of downloads and dominates a portion of the global free products market. The company has also developed technologies for Bing Mobile. Another star product is a feature that enables check-in via Facebook, Foursquare, and Messenger. Users can also receive messages when they’re nearing a specific location, similar to a GPS. Like Google, Microsoft is expanding its footprint in Israel.
Whitney Grace, February 14, 2011
Freebie
Reading the Cloud
February 10, 2011
At the recent New England Database Summit held at MIT, a popular topic was the cloud revolution, and pundits’ efforts to paint a bright color on its grayish lining.
One speaker in particular, UMass Senior Researcher Emmanuel Cecchet, introduced a “system focused on dynamic provisioning of database resources in the cloud.” Named for the now noteworthy sheep, Dolly is database platform-agnostic and uses virtualization-based replication for efficiently spawning database replicas. The research, a joint venture between Cecchet, a colleague and two graduate students, identifies flaws in the way current databases engage cloud services. The group claims their creation will correct those issues; for example, by improving efficiency in the name of metered pricing.
Another area of interest in the cloud conversation covered at the conference was the increasing strain cloud computation places on databases. James Starkey, whose solution is an SQL based relational database to share the workload among varied clouds, is a former MySQL designer and founder of NimbusDB. Some interesting choices for new terms are tossed out there, all of which can be found in the linked presentation.
While versions from both presenters have been prepared for release, no date has been set, leaving the industry and users alike to speculate on the success of these endeavors. We’ve got the hype, now we just need the technology to back it up. Amazon is taking Oracle to the cloud. Salesforce is moving with Database.com. There is progress. Let’s hope that database Dolly is more robust than cloned Dolly.
Stephen E Arnold, February 10, 2011
Freebie
Synthesys Platform Beta Available
February 7, 2011
Digital Reasoning alerted us last week that a new beta program for the Synthesys Platform is available. Digital Reasoning, with its Synthesys platform, has emerged as one of the “leaders in complex, large scale unstructured data analytics.” We have interviewed the founder of Digital Reasoning in our Search Wizards Speak series. These interviews are available on ArnoldIT.com here and here. Digital Reasoning is one of the leaders in making next-generation analytics available via the cloud, on premises, and hybrid methods.
This platform version of Digital Reasoning’s software will provide beta users immediate API-level access to the firm’s analytics software and access to tools that will be added through the beta program.
Matthew Russell, vice president of engineering at Digital Reasoning said:
We are excited to introduce Synthesys Platform to the market. By allowing users to upload their data into the cloud for analysis, many more users will get the opportunity to experience next generation data analytics while exploring their own data.
Digital Reasoning Systems (www.digitalreasoning.com) solves the problem of information overload by providing the tools people need to understand relationships between entities in vast amounts of unstructured and structured data.
Digital Reasoning builds data analytic solutions based on a distinctive mathematical approach to understanding natural language. The value of Digital Reasoning is not only the ability to leverage an organization’s existing knowledge base, but also to reveal critical hidden information and relationships that may not have been apparent during manual or other automated analytic efforts. Synthesys is a registered trademark of Digital Reasoning Systems, Inc.
Digital Reasoning will be exhibiting at the upcoming Strata Conference on February 28 and March 1, 2011. For more information about Digital Reasoning, navigate to the company’s Web site at www.digitalreasoning.com.
Stephen E Arnold, February 7, 2011
Reading Clouds for the Future of Databases
February 5, 2011
At the recent New England Database Summit held at MIT, a popular topic was the always controversial Cloud and the industry’s attempts to color its lining.
One speaker in particular, UMass Senior Researcher Emmanuel Cecchet, introduced a “system focused on dynamic provisioning of database resources in the cloud.” Named for the now noteworthy sheep, Dolly is database platform-agnostic and uses virtualization-based replication for efficiently spawning database replicas. The research, a joint venture between Cecchet, a colleague and two graduate students, identifies flaws in the way current databases engage cloud services. The group claims their creation will correct those issues, e.g., by improving efficiency in the name of metered pricing.
Another area of interest in the cloud conversation covered at the conference was the increasing strain cloud computation places on databases. James Starkey, whose solution is an SQL based relational database to share the workload among varied clouds, is a former MySQL designer and founder of NimbusDB. Some interesting choices for new terms are tossed out there, all of which can be found in the linked presentation.
While versions from both presenters have been prepared for release, no date has been set, leaving the industry and users alike to speculate on the success of these endeavors. We’ve got the hype, now we just need the technology to back it up. We also want to see more information about search and retrieval. New cloud, old problems—only modest advancement.
Sarah Rogers, February 5, 2011
Freebie