SAP: Lemons from Lemonade for Search Vendors

January 18, 2012

A couple of years ago I did a series of columns about SAP, the German software company which is imbued with the DNA of IBM and the more unpredictable genes of the "let 'er rip" approach to generating revenues. Change is difficult, and SAP interests me because the firm's machinations are the embodiment of the dislocations that old-style software vendors face in the cloudy world of Amazon, Google, and even old Big Blue herself, IBM. Keep in mind, one of SAP's strategic moves was to purchase Sybase.

HANA emerged two years ago as a solution to the woes of organizations struggling with big data, the need to make sense of those data, and the complexity which threatens to sink traditional enterprise applications. Consider SAP itself. The company owns Business Objects, once the leader in business analytics. Today I don't think of Business Objects, which may say more about my awareness than SAP's marketing. But I hear zero about Inxight Software, which performs entity extraction and other text operations, and I have heard little or nothing about TREX, SAP's information retrieval system. I lost track of the SAP investment in Endeca long before SAP's rival Oracle snagged the late-1990s technology to "enhance" its own struggling search solutions.

What is HANA?

According to an SAP friendly blog, SAP describes HANA in this way:

"HANA is the foundation and the core of all that we do now and going forward for existing products, new products and entirely new frontiers. We are transforming enterprise software with HANA, and we are transforming our entire product portfolio," Sikka said in a statement earlier this week announcing that SAP HANA is now generally available worldwide. "But HANA is more than a product," Sikka continued. "It is a new paradigm, an entirely new way to build applications. It is the basis for our own intellectual renewal internally at SAP—where we rethink how we design, build, deploy, service and sell products—and the basis for our customers' and partners' intellectual renewal—where we help customers rethink existing business problems and help them solve entirely new challenges using design-thinking." (Source: "The Top 10 Reasons SAP HANA Is Disrupting Larry Ellison's Grand Plans")

To me, HANA is a next generation database, and it now has to differentiate itself from the XML next generation database from the likes of MarkLogic, from Hadoop distributions such as Cloudera's, from other NoSQL solutions, and from the new and improved versions of data management systems from IBM, Microsoft, and even Amazon. Big job. Maybe an impossible job?

In December 2011, I snipped the write up “Can SAP be the #2 database vendor by 2015?” I found this passage particularly interesting:

Why doesn't SAP HANA have deeper market penetration? Put simply it is because SAP wanted it this way. Whilst HANA truly is a general-purpose database, SAP first announced it as an analytics appliance for the 1.0 release. They also priced it really high and didn't offer a discount – list pricing can be as high as €180,000 for a 64GB HANA "unit", depending on which version you require. And what's more, SAP sells solutions and HANA is a platform, so the global sales force doesn't quite know how to sell it in volume – yet. They didn't want to sell it in volume in any case because they wanted to introduce it slowly to market – building stability, references along the way and avoiding expensive and embarrassing global escalations. So by the end of 2011 we should expect $100-150m of HANA sales, which is 3-5% of SAP's total revenue. Not particularly significant, right? Well in September they released HANA as being supported for SAP's Business Warehouse software, which allows large-scale data warehouses. And this is where it gets interesting: there are 17,000 existing BW customers, and HANA would provide business benefit to all of them.

If you are interested in HANA, you can access SAP’s primer about the solution at this link.

In the midst of the HANA hype, Seeking Alpha’s “SAP Is No Longer The Leader It Once Was” stated in December 2011:

The current most promising innovation is SAP HANA, an appliance with columnar in-memory technology enabling fast processing and near real-time analytics. According to SAP, HANA has the potential to become the next-generation system architecture, removing the use of middleware and relational databases. However, the root causes of the downturn appear outside the perimeter of the company transformations: product development, continuous customer complaints, and the 20-year aging ERP that represents the core of the customer base seem to remain unchanged. Agile is probably not enough to address the long-term issues of product development. Most likely, Agile is not the solution to fifteen years of trying to get CRM right, or to making three platform mistakes in three on-demand initiatives (CRM on-demand in 2006, Business byDesign in 2007, and SaaS Enterprise in 2009).
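The operative phrase in that quote is "columnar in-memory." Here is a toy sketch, emphatically not SAP code, of why a column-oriented layout favors the analytic scans HANA is pitched at:

```python
# A toy illustration of why a columnar, in-memory layout favors analytics.
# A row store touches every field of every record to aggregate one field;
# a column store scans one contiguous array. The data here are invented.
rows = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APJ",  "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 310.0},
]

# Row-oriented aggregate: walk whole records to sum a single attribute.
total_row_store = sum(r["amount"] for r in rows)

# Column-oriented layout: each attribute lives in its own dense array.
columns = {key: [r[key] for r in rows] for key in rows[0]}
total_column_store = sum(columns["amount"])  # one sequential scan

assert total_row_store == total_column_store == 505.5
```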

The Seeking Alpha analysis then makes these machine-gun-like statements:

Is SAP getting it right? Here is a summary of the points to keep in mind to answer this question:

  • SAP R&D has yet to deliver its first truly successful product since 1992 (it could be HANA over time)
  • The core ERP that holds the customer base is outdated
  • There seem to be no plans to develop a modern replacement product
  • Development of a potential new ERP would take years
  • Sales have declined, stepping back by 3 to 4.5 years
  • SAP's leadership is questionable
  • According to Gartner, the revenue from relicensing R/3 to ERP 6.0 is ending
  • Customers and employees have lost trust
  • Executives have been leaving
  • On-demand is not making progress
  • The customer base is increasingly at risk
  • Analysts estimate that HANA could produce just 10% of the revenue by 2013

There is a gap between the buzz and the hard facts.

What does this mean for vendors who hitch their wagons to the SAP “star” as ISYS Search Software did with the announcement “ISYS Wins Software Deal with SAP”? Three points:

  1. Search vendors are looking at their technology and packaging it in ways to generate incremental revenue. ISYS, it appears, is in the connector game, competing with firms such as EntropySoft.
  2. SAP seems to be lagging further and further behind the NoSQL players, who are now facing headwinds despite early market leads. My example is MarkLogic, the XML database outfit.
  3. The broader market seems to be splitting into quite different segments. SAP is going to have difficulty in the IBM and Oracle space, and it is going to have trouble with the open source NoSQL crowd, which seems to prefer having Hadoop on its T-shirts to HANA.

SAP remains interesting, but it is now in some danger of further marginalization. And SAP still needs a search system.

Stephen E Arnold, January 18, 2012

Sponsored by Pandia.com

Mr. MapR: A Xoogler

January 8, 2012

Wired Enterprise gives us a glimpse into MapR, a company offering a new distribution of Apache Hadoop, in "Ex-Google Man Sells Search Genius to Rest of World." The ex-Googler in this case is M.C. Srivas, who was so impressed with Google's MapReduce platform that he decided to spread its concepts to the outside world. Writer Cade Metz explains,

In the summer of 2009, [Srivas] left the company to found a startup that takes the ideas behind Google’s top-secret infrastructure and delivers them to the average business. The company is called MapR, after Google’s MapReduce, and like so many other companies, Srivas and crew are selling a product based on Hadoop, an open source incarnation of Google’s GFS and MapReduce platforms.

Srivas had the chance to get in on the ground floor at Cloudera, but he was unhappy with that company's emphasis on support, services, and software add-ons. Instead, he wanted to directly address the core problems with the Hadoop platform. Shortly thereafter, MapR was born.
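For readers who have not met MapReduce, the canonical word-count example captures the programming model in a few lines. This is a sketch of the idea only, not MapR's or Google's code:

```python
# A minimal sketch of the MapReduce idea Hadoop implements: word count.
# Map emits (key, value) pairs, the framework shuffles them by key, and
# reduce combines the values for each key.
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group all values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Combine all values for one key into a final result.
    return key, sum(values)

documents = ["big data is big", "hadoop makes big data tractable"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 3, 'data': 2, 'is': 1, ...}
```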

The article details some of the Hadoop hitches that MapR is addressing. We admire the drive to get to the root of the problems, rather than surrender to the temptation of shortcuts.

Cynthia Murrell, January 8, 2012

Sponsored by Pandia.com

Digital Reasoning Connects with TeraDact

January 4, 2012

Big data analytics specialist Digital Reasoning has been a regular topic of discussion here at Beyond Search, most recently for securing Series B funding for a big data intelligence push.

Now, we would like to share an exciting new development in the quest to solve the big data problem in the news release “Digital Reasoning and TeraDact Partner to Automatically Remove Sensitive Information from Big Data.”

According to the article, TeraDact Solutions, a software tools and data integration solutions provider, has integrated its TeraDactor Information Identification and Presentation capabilities with Synthesys Cloud, Digital Reasoning's software-as-a-service data analytics solution.

The news story states:

In conjunction with Synthesys, TeraDactor can automatically assist in appropriately classifying information not recognized by the original data provider. TeraDactor allows participants to push and pull information without waiting for the declassification process, assuring that formerly classified documents may be released without unintended leakages.
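The news release does not explain how TeraDactor works under the hood. As a hedged illustration of the general task, here is a toy redaction pass; production systems rely on trained entity extractors, not a handful of regular expressions:

```python
# A back-of-the-envelope sketch of automated redaction, the kind of task
# TeraDactor performs at scale. These patterns are illustrative
# assumptions, not the vendor's method.
import re

PATTERNS = {
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text):
    # Replace each sensitive span with a bracketed label so the document
    # can be released without unintended leakage.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(redact("Contact J. Doe, SSN 123-45-6789, at jdoe@example.com."))
# Contact J. Doe, SSN [REDACTED SSN], at [REDACTED EMAIL].
```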

The innovative technology that TeraDact Solutions brings to Digital Reasoning’s table demonstrates the power of Synthesys as a cloud-based data analytics tool in building the next generation of Big Data analytic solutions. Kudos to the surging Digital Reasoning organization.

Jasmine Ashton, January 4, 2012

Sponsored by Pandia.com

Does the Union of Oracle and Endeca Merit the Hype?

January 3, 2012

The question of what to do with unstructured and semi-structured data rises to the surface again and again. Oracle's answer to the almost mystical question is to add Endeca MDEX into the soup pot. The article "Oracle: Endeca Inside?" on Ovum's website gives a lengthy explanation of how the new combo of Oracle and Endeca MDEX will take the world by storm – well, the unstructured and semi-structured world at least.

While the assertions in the article are quite impressive, we'd like to see some evidence to back them up. Much is said about how Oracle really doesn't need Endeca but will benefit from it nonetheless. The article justifies the addition of Endeca MDEX by saying,

MDEX adds an important capability to Oracle’s analytic portfolio. It allows Oracle to target business users and provide data exploration and lightweight analysis on semi-structured data. However, it also goes against Oracle’s overarching strategy of maintaining one database for almost all kinds of analytic workloads. Over the medium term, Ovum therefore welcomes MDEX being absorbed into core Oracle technologies and offered as a piece of its larger ‘engineered systems’ strategy.
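For those who have not used Endeca, MDEX is best known for faceted, guided navigation over semi-structured records. A toy sketch of the facet-counting idea, with records and fields invented for illustration:

```python
# A toy illustration of the faceted "data exploration" MDEX is known for:
# tally attribute values across semi-structured records so a business
# user can drill down. Nothing here is Endeca's actual engine.
from collections import Counter
from typing import Dict, List

records: List[Dict] = [
    {"type": "laptop", "brand": "Acme", "price_band": "$500-$1000"},
    {"type": "laptop", "brand": "Globex", "price_band": "$1000+"},
    {"type": "tablet", "brand": "Acme"},  # semi-structured: fields vary
]

def facet_counts(items: List[Dict], field: str) -> Counter:
    # Count each value of one facet, skipping records that lack the field.
    return Counter(r[field] for r in items if field in r)

for field in ("brand", "price_band"):
    print(field, dict(facet_counts(records, field)))
# brand {'Acme': 2, 'Globex': 1}
# price_band {'$500-$1000': 1, '$1000+': 1}
```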

Although we reluctantly agree with the overall concept Ovum introduces, that MDEX's absorption into the Oracle machine is probably not a bad idea, we still wrestle with the bigger question: Is performance an issue which can be resolved with Oracle hardware?

Our view is that Ovum may have only part of the puzzle in hand. Oracle and Endeca may not be a two-dimensional setup.

Catherine Lamsfuss, January 3, 2012

Sponsored by Pandia.com

Predictions on Big Data Miss the Real Big Trend

December 18, 2011

Athena, the goddess of wisdom, does not spend much time in Harrod's Creek, Kentucky. I don't think she's ever visited. However, I know that she is not hanging out at some of the "real journalists'" haunts. I zipped through "Big Data in 2012: Five Predictions". These are lists which are often assembled over a lunchtime chat or a meeting with quite a few editorial issues on the agenda. At year's end, the prediction lunch was a popular activity when I worked in New York City, which is different in mental zip from rural Kentucky.

The write up churns through some ideas that are evident when one skims blog posts or looks at the conference programs for “big data.” For example—are you sitting down?—the write up asserts: “Increased understanding of and demand for visualization.” There you go. I don’t know about you, but when I sit in on “intelligence” briefings in the government or business environment, I have been enjoying the sticky tarts of visualization for years. Nah, decades. Now visualization is a trend? Helpful, right?

Let me identify one trend which is, in my opinion, an actual big deal. Navigate to "The Maximal Information Coefficient." You will see a link and a good summary of a statistical method which allows a person to process "big data" in order to determine if there are gems within. More important, the potential gems pop out of a list of correlations. Why is this important? Without MIC-style methods, the only way to "know" what may be useful within big data was to run the full process. If you remember guys like Kolmogorov, the "we have to do it because it is already as small as it can be" issue is an annoying time consumer. To access the original paper, you will need to go to the AAAS and pay money.

The abstract for "Detecting Novel Associations in Large Data Sets" by David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti (Science, December 16, 2011) is:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R^2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

Stating a very interesting, although admittedly complex, numerical recipe in a simple way is difficult. I think this paragraph from "The Maximal Information Coefficient" does a very good job:

The authors [Reshef et al] go on to show that the MIC (which is based on "gridding" the correlation space at different resolutions, finding the grid partitioning with the largest mutual information at each resolution, normalizing the mutual information values, and choosing the maximum value among all considered resolutions as the MIC) fulfills this requirement, and works well when applied to several real world datasets. There is a MINE Website with more information and code on this algorithm, and a blog entry by Michael Mitzenmacher which might also link to more information on the paper in the future.
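The recipe is concrete enough to sketch. The following toy implementation is a simplification, not the authors' dynamic-programming search: it fixes equal-frequency bins where real MIC optimizes the grid placement at each resolution, so it only approximates the statistic.

```python
# A simplified sketch of the MIC recipe described above: grid the (x, y)
# plane at several resolutions, compute the mutual information of the
# induced cell counts, normalize by log(min(rows, cols)), and keep the
# maximum. Real MIC searches over grid placements; this uses fixed
# quantile bins, so it is an approximation for illustration only.
import numpy as np

def mutual_information(counts):
    # Mutual information of a 2-D contingency table, in nats.
    joint = counts / counts.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def approx_mic(x, y, max_cells=None):
    n = len(x)
    max_cells = max_cells or int(n ** 0.6)  # B(n) = n^0.6, per the paper
    best = 0.0
    for rows in range(2, max_cells):
        for cols in range(2, max_cells // rows + 1):
            # Equal-frequency bin edges stand in for the optimal grid.
            xe = np.quantile(x, np.linspace(0, 1, cols + 1))
            ye = np.quantile(y, np.linspace(0, 1, rows + 1))
            counts, _, _ = np.histogram2d(x, y, bins=[xe, ye])
            mi = mutual_information(counts) / np.log(min(rows, cols))
            best = max(best, mi)
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
print(approx_mic(x, x ** 2))                    # deterministic, nonlinear: high
print(approx_mic(x, rng.uniform(-1, 1, 500)))   # independent noise: low
```

The point of the example is the one the paper makes: a strong but nonlinear association such as y = x², which a Pearson correlation would score near zero, still earns a high MIC-style score.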

Another take on the MIC innovation appears in “Maximal Information Coefficient Teases Out Multiple Vast Data Sets”. Worth reading as well.

Forbes will definitely catch up with this trend in a few years. For now, methods such as MIC point the way to making "big data" a more practical part of decision making. Yep, a trend. Why? There's a lot of talk about "big data", but most organizations lack the expertise and the computational know-how to perform meaningful analyses. Similar methods are available from Digital Reasoning and the Google love child Recorded Future. Palantir is more into the make-pictures world of analytics. For me, MIC and related methods are not just a trend; they are the harbinger of processes which make big data useful, not a public relations, marketing, or PowerPoint chunk of baloney. Honk.

Stephen E Arnold, December 18, 2011

Sponsored by Pandia.com, a company located where high school graduates actually can do math.

Azure Chip Gartner Quadrantizes Archiving

December 17, 2011

Records management has been shown some love this month by information technology research and advisory company Gartner, Inc.

According to the December 8 Marketwire news release "Sonian Positioned in 2011 Magic Quadrant for Enterprise Information Archiving," Sonian, a cloud-powered archiving and search solutions company, announced that it has been positioned in the December 2011 Magic Quadrant for Enterprise Information Archiving.

Over 8,000 customers currently utilize Sonian's cloud-powered information archiving platform, which can be deployed in minutes. It enables organizations to address eDiscovery needs, achieve regulatory compliance, and reduce IT costs.

The Magic Quadrant report, which positions vendors based on their ability to execute and completeness of vision, noted:

The challenges organizations are facing with respect to the management of email data are increasingly being seen with file system data. Archiving products that can address both email and files generally provide efficiencies across these content types, versus taking a siloed approach to management. While cloud archiving can be very cost-effective, the prevailing sentiment to simply give the problem of managing this data to someone else seems to be one of the most common reasons organizations cite for selecting cloud or SaaS archiving.

The fact that Gartner is cheerleading information archiving is an excellent sign. With just a few more Corzine and Madoff incidents, archiving will be a hot topic in corporate boardrooms.

Jasmine Ashton, December 17, 2011

Sponsored by Pandia.com

Big Data a Bane of Small Businesses

December 8, 2011

Is anyone really surprised? "Big Data Strains Small-Business Bandwidth," announces InfoWorld. Apparently this is news to some folks. Since Thanksgiving, a time to celebrate unemployed English majors and failed azure chip search consultants, I have been involved in four separate meetings about big data. To be fair, each of these meetings talked about the perception of big data, not actually whipping around a couple of copies of the Internet or a year's worth of Twitter and Facebook gold ore.

InfoWorld is pretty excited about big data. We learned from the write up that some folks thought storage would be the biggest hurdle small businesses would face when wrangling large amounts of data. Not so, reports writer Matt Prigge. The article asserts:

Storage vendors seem to be doing a great job staying on top of the demand for ever larger data densities and software to allow you to make more efficient use of it (think dedupe and intelligent thin provisioning). But for the most part, you can’t say the same about the telcos and ISPs providing the wide area networks we’re using to acquire and share that data.

The problem is worst in rural areas, where expensive solutions like DS3, SONET, ATM, and Metro-Ethernet are simply not available. Many businesses turn to the cloud, but that won’t work for companies with certain conditions, like highly graphical work. Besides, you have to be very confident in your Internet service provider to rely on hosting services. The solution? Some companies just have to pack up and move (back) to the big city.
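Some back-of-the-envelope arithmetic shows why the wide area network, not storage, is the choke point. The line rates are standard figures; the one-terabyte payload is an assumption for illustration:

```python
# Time to move 1 TB over common WAN links, assuming ideal throughput
# (real-world transfers are slower). The payload size is an assumption.
TERABYTE_BITS = 1e12 * 8

links_mbps = {"T1": 1.544, "DS3": 44.736, "Metro-Ethernet (100 Mbps)": 100.0}

for name, mbps in links_mbps.items():
    seconds = TERABYTE_BITS / (mbps * 1e6)
    print(f"{name}: {seconds / 86400:.1f} days to move 1 TB")
# T1: 60.0 days; DS3: 2.1 days; Metro-Ethernet (100 Mbps): 0.9 days
```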

Yes, everything works well when there is unlimited bandwidth, unlimited technical resources, and unlimited infrastructure. The real world is different from the Ivory Tower, however. Talking about big data is different from processing, in an operational mode, real time flows of content from social systems, mobile phone usage reports, etc. But talk is cheap and easy. Big data is neither. Two observations:

  1. Big data discussions usually skip over the issue of latency. There are different definitions of real time in indexing big data. Defining terms is a useful first step.
  2. Most of the big data chatter is marketing. You, gentle reader, should know what marketing means: sizzle, not sirloin.

Cynthia Murrell, December 8, 2011

Sponsored by Pandia.com

DataExplorers and Why Financial Information Vendors Fear a Storm

December 4, 2011

I am still amused that my team predicted the management shift at Thomson Reuters weeks before the news broke. Alas, that 250-page analysis of the Thomson Reuters $13 billion a year operation is not public. Shame. However, one can get a sense of the weakening timbers in the publishing and information frigate in the Telegraph's story "DataExplorers Looks for £300m Buyer."

DataExplorers is a specialist research company. The firm gathers information about the securities lending activity of thousands of institutional funds. I am not familiar with the names of these exotic financial beasties. The aggregated data are subjected to the normal razzle dazzle of the aggregation-for-big-money crowd. The data are collected, normalized, and analyzed. The idea is that an MBA looking to snag an island can use the information to make a better deal. Not surprisingly, the market for these types of information is small; only a fraction of those in the financial services industry focus on this sector.

DataExplorer’s revenues reflect this concentration. According to the write up, the company generated less than £15 million in annual revenues in 2010 with a profit of about £3 million. The margin illustrates what can be accomplished with a niche market, tight cost controls, and managers from outfits like Thomson Reuters. That troubled outfit contributed the management team at DataExplorers.

Now here's the hook:

The company is for sale, according to the Telegraph which is a “real” journalistic outfit, for £300 million. That works out to a number that makes sense in the wild and crazy world of financial information; that is, 100 times earnings or 20 times revenue. The flaw, which I probably should not peg to just Thomson Reuters, has these facets:

  1. The global financial "challenge" means that there may be some pruning of information services in the financial world. Stated another way, MBAs will be fired, and their employers may buy fewer expensive services such as DataExplorers.
  2. If the financial crisis widens, the appeal of "short" information may lose a bit of its shine. Once a market tanks, what's the incentive for those brutalized by the sector's collapse to stick around?
  3. Thomson Reuters is pretty good at cost cutting. Innovating is not part of the usual package. This means that DataExplorers may be at the peak of its form, sea worthy for a one day cruise in good weather. Once a deal goes down, the new owners may have a tough time growing the business because marketing and research will require infusions of capital to keep the vessel from listing.

Net net: DataExplorers is an example of an information property which may be tough to get back into growth mode. The buyer will be confident that it knows how to squeeze more performance from a niche information product. And that assumption is what contributes to the woes of Thomson Reuters, Reed Elsevier, and many other high end professional content producers. Optimism is a great quality. Realism is too.

Stephen E Arnold, December 4, 2011

Sponsored by Pandia.com

France: Getting Open-Sourcey, Mais Oui

December 2, 2011

The H announces, “French government tenders for open source support.” This is an interesting shift; there are very high quality commercial software companies in France. We learned from the write up:

“The authorities are looking for a three-year support contract, worth two million euros and covering two-thirds of the country’s twenty-two ministries as well as the Court of Audit. According to Le Monde Informatique, this will include departments ranging from the Office of the Prime Minister and the Ministry of Justice and Freedom to the Ministry of Sports and Ministry of Culture and Communication.”

The list of software to be supported is extensive, including infrastructure, operating systems, desktop applications, and development environments. The ones that piqued our interest are the enterprise applications like Lucene, Alfresco, and Nuxeo and databases such as PostgreSQL and MySQL.

It will be interesting to see who the French government selects to cover all these open source bases.

Cynthia Murrell, December 2, 2011

Sponsored by Pandia.com

CouchDB Losing Ground

November 28, 2011

We understand that NoSQL, XML databases, and non-relational structures are the cat's pajamas. Oh, that was a 1920s reference. Sorry. We are not with it.

Has this once hot NoSQL database lost its luster? My NoSQL asks, "What Happened to CouchDB's Popularity?"

CouchDB from Apache is a document database server which subscribes to the NoSQL principle of schema-free existence. It is query-able and index-able, and is written in Erlang, a concurrency-oriented language.
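Part of CouchDB's original appeal is that it speaks plain HTTP and JSON. A minimal session, assuming a local CouchDB instance on its default port 5984 and the Python requests package:

```python
# CouchDB's interface is plain HTTP and JSON. This sketch assumes a
# CouchDB server running locally on the default port 5984.
import requests

BASE = "http://127.0.0.1:5984"

requests.put(f"{BASE}/articles")  # create a database; no schema required

# Documents are arbitrary JSON; fields can vary from document to document.
doc = {"title": "What Happened to CouchDB's Popularity?", "year": 2011}
resp = requests.put(f"{BASE}/articles/mynosql-post", json=doc)
print(resp.json())  # {'ok': True, 'id': 'mynosql-post', 'rev': '1-...'}

# Fetch it back; the server adds _id and _rev fields for versioning.
print(requests.get(f"{BASE}/articles/mynosql-post").json())
```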

Why is CouchDB’s acclaim slipping? The article asserts:

Ultimately when you enter the eco system and start digging, it is hard to figure out exactly what ‘CouchDB’ is, where to grab binaries and drivers for your platform. . . .You can figure it all out with some reading and digging, but you have to persist. It’s not like Mongo, you don’t just head to the official site, grab the official binary and install the official driver.

Ease of use does make a big difference. The blog also suggests there is confusion in the NoSQL community over the name CouchDB versus Couchbase, as well as CouchDB-related BigCouch and IrisCouch.

Perhaps it's time for Apache to rearrange the furniture? And what about AtomicPR's big radioactive plume for XM? I will sit on the couch and think about fallout.

Cynthia Murrell, November 28, 2011

Sponsored by Pandia.com
