Greenplum, Big Data, and an Open Source Card

February 13, 2011

Wichita Business Journal has a write-up called “EMC Greenplum Introduces Free Community Edition of ‘Big Data’ Tools for Developers and Data Scientists.” EMC Corporation, one of the world’s leaders in information infrastructure, is releasing a free edition of the EMC Greenplum Database. The product offers massively parallel processing, analytic algorithms, and data mining tools. The release was announced at the 2011 O’Reilly Strata Conference. The key point was:

“Building on earlier Greenplum “Big Data” breakthroughs, like the EMC Greenplum Data Computing Appliance, the new EMC Greenplum Community Edition removes the cost barrier to entry for big data power tools empowering large numbers of developers, data scientists, and other data professionals. This free set of tools enables the community to not only better understand their data, gain deeper insights and better visualize insights, but to also contribute and participate in the development of next-generation tools and solutions.”

The Community Edition is geared toward both first-time users and current Greenplum customers. First-time users get a business analytics environment to experiment with; current customers can easily upgrade their older versions. Our opinion? “EMC” and “free” make an odd pairing. We are curious about commercial companies jumping on the “community” bandwagon.

Whitney Grace, February 12, 2011

Reading the Cloud

February 10, 2011

At the recent New England Database Summit held at MIT, a popular topic was the cloud revolution, and pundits’ efforts to paint a bright lining on its gray clouds.

One speaker in particular, UMass Senior Researcher Emmanuel Cecchet, introduced a “system focused on dynamic provisioning of database resources in the cloud.” Named for the now noteworthy sheep, Dolly is database platform-agnostic and uses virtualization-based replication to spawn database replicas efficiently. The research, a joint effort among Cecchet, a colleague, and two graduate students, identifies flaws in the way current databases engage cloud services. The group claims their creation will correct those issues, for example by improving efficiency under metered pricing.

Another topic in the cloud conversation at the conference was the increasing strain cloud computation places on databases. James Starkey, a former MySQL designer and the founder of NimbusDB, proposes a SQL-based relational database that shares the workload among varied clouds. He tosses out some interesting new terms, all of which can be found in the linked presentation.

While versions from both presenters have been prepared for release, no date has been set, leaving the industry and users alike to speculate on the success of these endeavors.  We’ve got the hype, now we just need the technology to back it up. Amazon is taking Oracle to the cloud. Salesforce is moving with Database.com. There is progress. Let’s hope that database Dolly is more robust than cloned Dolly.

Stephen E Arnold, February 10, 2011

Freebie

Nuxeo Embraces PostgreSQL

February 10, 2011

Nuxeo, a software manufacturer specializing in Enterprise Content Management, recently criticized one popular open source database system and in the process praised another.

In the aptly titled post “Why Avoid MySQL?”, Nuxeo’s founder and its head of R&D jointly make the case against MySQL.  This is how the post opens: “Nuxeo can work with many databases (PostgreSQL, MySQL, Oracle, Microsoft SQL Server, and others could be added). But MySQL should be avoided if at all possible, because it has major deficiencies that Nuxeo cannot really work around.”  What follows is a bullet list of fourteen points reinforcing that claim.  The list cites dropped connections, row size limits, and poor full text configuration, among other issues.  The post concludes: “All these problems lead us to recommend not using MySQL in production, and using PostgreSQL instead which is a much nicer database engine.”

A quick jump to the PostgreSQL site will provide ample information for any reader who wants to compare.  A few differences I found: MySQL’s row size peaks at 64 KB, whereas PostgreSQL’s extends to 1.6 TB.  MySQL’s triggers do not fire on cascaded foreign key actions.  PostgreSQL’s, on the other hand, can be written in C and loaded as a library, which adds flexibility for extending capabilities.  The information page also includes links to testimonials and a list of awards spanning more than a decade.

Our take: Is Oracle’s approach to MySQL giving some folks an added incentive to look at PostgreSQL?

Stephen E Arnold, February 10, 2011

Freebie

Suggest.io Database

February 7, 2011

Here’s an interesting new idea: self-learning databases. Suggest.io is designed to track information and feedback from visitors to your Web site.

After you create a free account, Suggest.io builds a database that tracks search activity on your Web site. This database then makes suggestions, much as Google does, when something is typed into the search box.

“With Suggest.io you can find out more about your visitors. For example, you may figure out what your visitors are more interested in by means of our powerful statistic tool, that allows you to spot the top rated search request…”
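
How might this work under the hood? Suggest.io has not published its internals, so the following is only a minimal sketch of the general technique: log each query visitors type, then suggest the most frequent logged queries matching the typed prefix. The names are our own illustration, not the company’s code.

```python
from collections import Counter

class SuggestionIndex:
    """Toy search-suggestion index: logs site searches, suggests by prefix."""

    def __init__(self):
        self.query_counts = Counter()  # query string -> times seen

    def log_query(self, query):
        # Record a search typed into the site's search box.
        self.query_counts[query.strip().lower()] += 1

    def suggest(self, prefix, limit=5):
        # Return the most frequent logged queries starting with the prefix.
        prefix = prefix.strip().lower()
        matches = [(q, n) for q, n in self.query_counts.items()
                   if q.startswith(prefix)]
        matches.sort(key=lambda item: item[1], reverse=True)
        return [q for q, _ in matches[:limit]]

index = SuggestionIndex()
for q in ["database replication", "database design", "data mining",
          "database replication", "dashboard widgets"]:
    index.log_query(q)

print(index.suggest("data"))
# ['database replication', 'database design', 'data mining']
```

A production service would persist the log and likely use a trie or n-gram index for speed, but the feedback loop is the same: visitors’ past searches feed future suggestions.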

This sounds like a good add-on for any Web site, and it lets you be a little more like Google. Related suggestions in a search box are a handy tool to have. The graphic for the service may catch attention and give others a visual jolt.

Whitney Grace, February 7, 2011

Freebie

Reading Clouds for the Future of Databases

February 5, 2011

At the recent New England Database Summit held at MIT, a popular topic was the always controversial cloud and the industry’s attempts to color its lining.

One speaker in particular, UMass Senior Researcher Emmanuel Cecchet, introduced a “system focused on dynamic provisioning of database resources in the cloud.” Named for the now noteworthy sheep, Dolly is database platform-agnostic and uses virtualization-based replication to spawn database replicas efficiently. The research, a joint effort among Cecchet, a colleague, and two graduate students, identifies flaws in the way current databases engage cloud services. The group claims their creation will correct those issues, for example by improving efficiency under metered pricing.

Another topic in the cloud conversation at the conference was the increasing strain cloud computation places on databases. James Starkey, a former MySQL designer and the founder of NimbusDB, proposes a SQL-based relational database that shares the workload among varied clouds. He tosses out some interesting new terms, all of which can be found in the linked presentation.

While versions from both presenters have been prepared for release, no date has been set, leaving the industry and users alike to speculate on the success of these endeavors.  We’ve got the hype, now we just need the technology to back it up. We also want to see more information about search and retrieval. New cloud, old problems—only modest advancement.

Sarah Rogers, February 5, 2011

Freebie

US Census Counts with Endeca

February 4, 2011

Endeca has hit the metaphorical nail on the head with one of its latest endeavors. The US Census Bureau is now using Endeca Technologies’ business intelligence software, Endeca Latitude, to launch its new American FactFinder. We learned in “US Census Bureau launches New American FactFinder on Endeca”:

“American FactFinder makes more than 250 billion decennial census facts available and navigable to the average American, civil servants and skilled statisticians alike.”

After a preliminary rollout of American FactFinder, the Bureau challenged itself to redo the site to provide easier access for non-professional users. Endeca lets users search within a specified taxonomy and receive relevant results along with pointers on how to access them.
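
Endeca’s engine is proprietary, but the core of guided navigation, filtering records by taxonomy facets and counting the values still available in each remaining facet, fits in a few lines. A toy Python illustration with made-up census-style records, not the Bureau’s implementation:

```python
from collections import Counter

# Hypothetical records tagged with taxonomy facets.
records = [
    {"topic": "housing", "state": "KY", "year": 2000},
    {"topic": "housing", "state": "KY", "year": 2010},
    {"topic": "income",  "state": "KY", "year": 2000},
    {"topic": "income",  "state": "CA", "year": 2010},
]

def guided_filter(records, **facets):
    # Keep records matching every chosen facet value...
    hits = [r for r in records
            if all(r[key] == value for key, value in facets.items())]
    # ...then count the values still on offer for each remaining facet,
    # which is what drives the "narrow your results" links.
    remaining = {}
    for key in {k for r in hits for k in r} - set(facets):
        remaining[key] = Counter(r[key] for r in hits)
    return hits, remaining

hits, remaining = guided_filter(records, state="KY")
print(len(hits))           # 3 records match
print(remaining["topic"])  # Counter({'housing': 2, 'income': 1})
```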

I had the chance to play with American FactFinder and I must say, go, Census! Get that 2010 data into the system. Lots of changes since Year 2000. Point-and-click, canned PDF reports, and, oh, 2010 data. Did I mention that? Year 2000 data.

Leslie Radcliff, February 4, 2011

Freebie

Amazon and Its Cloudy Metrics

February 4, 2011

As computing based on shared resources (with the goal of channeling high performance calculation capabilities into consumer based applications) continues to gain popularity, curiosity about long range profitability and short term pest control grows more intense.  Since 2002, when it began developing cloud based services, including storage, Amazon has remained an important player.

Amazon Web Services has released figures to Data Center Knowledge showing that the number of “objects” its S3 service holds more than doubled over the last year, to 262 billion. The same entry goes on to state that the request rate has exceeded two hundred thousand per second.  Comparable growth has been observed in the launching of virtual servers through the Elastic Compute Cloud (EC2).

As recently as 2009 it seemed Amazon had little interest in cultivating a partner program, content to provide the infrastructure and let others develop applications.  However, as the cloud universe expands and Amazon remains at its center, the relationships that were inevitable given the physics of the new cosmos seem to be forged with a whimper rather than a bang.  Details are far from clear; at times it seems one has a better chance of catching sight of a passing comet.

Our view is that it would be more meaningful to report revenues and profit/loss. I can take a single email and decompose it into lots of objects. Without a definition of substance, what’s an object? What does 262 billion mean?
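
To make that point concrete, here is a minimal Python sketch, using a made-up message, of how one email splits into several MIME parts, each of which a storage service could count as its own “object”:

```python
import email

# A made-up email: one text body plus two attachments.
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=SEP\n"
    "\n"
    "--SEP\nContent-Type: text/plain\n\nQuarterly numbers attached.\n"
    "--SEP\nContent-Type: application/pdf\n\n%PDF-1.4 ...\n"
    "--SEP\nContent-Type: image/png\n\nPNG bytes ...\n"
    "--SEP--\n"
)

msg = email.message_from_string(raw)
# Each leaf MIME part could be stored under its own (hypothetical) key,
# e.g. "mail/msg-001/part-0", inflating the object count.
parts = [p for p in msg.walk() if not p.is_multipart()]
print(len(parts))  # 3 "objects" from a single message
```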

We would like to see more emphasis placed on search; for example, easy filtering of results for certain tags such as “best selling” or “available.” Just our narrow Harrod’s Creek view, sparked by the Amazon Oracle offer. How will one count Oracle metrics: data size, queries per second, index size, fairy dust, or money? We vote for money, not obfuscation.

Sarah Rogers, February 4, 2011

XML Carnage

January 31, 2011

We noted “Learning from our Mistakes: The Failure of OpenID, AtomPub, and XML on the Web.” What caught our attention was this statement:

So next time you’re evaluating a technology that is being much hyped by the web development blogosphere, take a look to see whether the fundamental assumptions that led to the creation of the technology actually generalize to your use case. An example that comes to mind that developers should consider doing with this sort of evaluation given the blogosphere hype is NoSQL.

The article points out that the enthusiasm for OpenID, AtomPub, and XML for “the Web” has cooled. What looks like the next big thing, I concluded, may not be.

What are the implications for search and content processing vendors?

For those who don’t know what the three technologies are or do, the answer is, “Not much.” Many vendors handle security, intakes, and formats via connectors. I wrote a for-fee column about the importance of connectors, filters, and code widgets that make one outfit’s proprietary or tricky file formats easily tappable and importable by another vendor’s system. I know that you have been following the i2 Ltd. and Palantir legal hassle closely. If you haven’t, you can get some color from the stories at www.inteltrax.com and in my for-fee columns.

But, if you are a vendor who has a big investment in one or more of these technologies, the loss of “enthusiasm”—if the source article is accurate—could mean higher costs. Here’s why:

  1. The marketing positioning and collateral will have to be adapted. Probably not a big deal in the pre-crash days, but now this is a cost and it can be a time sink. Not good when pressure for sales goes up each day. One vendor told me, “We’re really heads down.” No kidding. I don’t think it is work; I think it is survival. A marketing distraction is not a positive.
  2. Credibility with some customers may be eroded. If you beat a drum for one or more of these three technologies, the client assumes that everyone likes the rhythm. Articles suggesting that three “next big things” are really three-day-old brook trout may beg for air freshener.
  3. Partners who often just buy the software vendors’ pitches have invested. Now those investments may not have the type of value one associates with certification from Microsoft or the sheer staying power of a wild and crazy push by IBM or Oracle. If partners bail out, recovery can be difficult in some markets.

The article is worth reading and thinking about for its implications for search and content processing vendors. It might not ruffle your feathers; it could tear off a wing.

Stephen E Arnold, January 31, 2011

Freebie

Inexpensive Oracle Utility

January 22, 2011

We actually found an inexpensive Oracle export utility. You can download a free trial version here at Downseeker. The full version costs $49. This program might be worth a look if you need to export database query results to text files. Proceed with caution, though: the site triggers a strong warning from the Web of Trust service, despite its guarantee of a “100% Clean” application. Downloader beware.

Here is the description:

“Export Query to Text for Oracle Standard 1.06.42 is regarded as a convenient as well as simple to use tool which lets you export database query results to text files. This tool supports all modern versions of Oracle Server. Oracle Client and ODBC driver required.”

The utility runs on Windows 2000 and above.
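
For the do-it-yourself crowd, much the same job takes a few lines of Python with the cx_Oracle driver (which, like the utility, requires an Oracle client). A minimal sketch; the credentials, query, and file name are placeholders, not anything from the vendor’s tool:

```python
import csv
import cx_Oracle  # needs Oracle client libraries installed

# Placeholder connection details and query -- substitute your own.
conn = cx_Oracle.connect("scott", "tiger", "localhost/XE")
cursor = conn.cursor()
cursor.execute("SELECT * FROM employees")

with open("query_results.txt", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(col[0] for col in cursor.description)  # column names
    writer.writerows(cursor)  # the cursor iterates over result rows

cursor.close()
conn.close()
```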

Cynthia Murrell, January 22, 2011

Self-Service Business Intelligence: McDonaldization of Data

January 20, 2011

Drive into McDonald’s. Hear a recorded message about a special. Issue order. Get Big Mac and lots of questionable commercial food output. Leave.

Yep, business intelligence and “I’m loving it.”

No one pays much attention to the food production system and even less to what happens between the cow’s visit to the feedlot and the All American meal.

Ah, self-service. Convenience. Speed. Ease of use. Yes!

These days we pump our own gas (minus Oregon and NJ) and pour our own soft drinks.  We Google instead of asking the reference librarian (usually).

Is this the future of Business Intelligence?  “Soon Self-Service BI, SaaS to Dominate the Tech World” predicts that 2011 will bring an increase in self-service BI.  According to SiliconIndia News: “Numerous vendors, including IBM, SAP, Information Builders, Tibco Software, QlikTech, and Tableau Software, already offer [self-service BI] tools, and adoption will accelerate as more companies try to deliver BI capabilities to nontechnical users, business analysts, and others.”  The question becomes: Will the users know what the outputs mean?  With the SaaS part of the prediction I heartily agree, in BI and everywhere else.

What if the data are dirty? Malformed? Selected to present a particular view of the Big Mac world? How about that user experience?
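
Even a basic sanity pass catches a lot before the chart renders. A minimal Python sketch with made-up sales data, illustrating the checks a self-service tool happily skips:

```python
import pandas as pd

# A made-up feed of the sort a self-service BI tool might ingest.
df = pd.DataFrame({
    "region": ["East", "West", None, "East"],
    "units":  [120, -5, 300, 120],           # -5 looks suspicious
    "date":   ["2011-01-03", "2011-01-04", "bad-date", "2011-01-03"],
})

# Coerce dates; anything unparseable becomes NaT instead of failing later.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

problems = {
    "missing region":   int(df["region"].isna().sum()),
    "negative units":   int((df["units"] < 0).sum()),
    "unparseable date": int(df["date"].isna().sum()),
    "duplicate rows":   int(df.duplicated().sum()),
}
print(problems)  # the dashboard would plot this data without complaint
```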

Alice Wasielewski, January 20, 2011
