Talend Updates Open Studio Applications
May 19, 2012
Talend’s Open Studio platform now adds master data management to its business intelligence and big data capabilities. The H Open describes the updates found in the most recent version in “Talend Updates Data Tools to 5.1.0.”
Based on the open source Eclipse platform, the Open Studio environment hosts Talend’s Data Integration, Big Data, Data Quality, Master Data Management, and Enterprise Service Bus (ESB) products. A user-friendly GUI allows users to define processes. The write up specifies that the updates give Open Studio:
“. . .enhanced XML mapping and support for XML documents in its SOAP, JMS, File and MOM components. A new component has also been added to help manage Kerberos security. Open Studio for Data Quality has been enhanced with new ways to apply an analysis on multiple files, and the ability to drill down through business rules to see the invalid, as well as valid, records selected by the rules.
“ESB and Open Studio for ESB appear to be the most revised of the products, with the release notes documenting improvements to the REST and SOAP services, an improved route builder, and improvements to the runtime system . . . . Open Studio for Master Data Management has seen enhancements in the development environment, with searching and filtering available as ways to view an entity, and in the web user interface with improvements in visual cues, easier image storage and resizable sliding panels.”
Talend ESB and Big Data are under the Apache 2.0 License. Open Studio for ESB, Data Integration, Data Quality, and MDM are under the GPLv2.
Talend is a leading open source vendor, providing middleware for both data management and application integration. The company was already a leader in open source data management when its 2010 acquisition of Sopera boosted its standing in the open source middleware market. The company takes pride in providing powerful and flexible open solutions for all sorts of organizations, great and small.
Cynthia Murrell, May 19, 2012
Sponsored by PolySpot
MapReduce: A Summary
May 19, 2012
Want to know about MapReduce? Here you go:
Remember. Think batch processing.
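The batch-processing idea behind MapReduce can be sketched in a few lines of Python. This is a toy illustration of the two phases, not Hadoop or Google’s implementation:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle/reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data", "big batch"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 1, 'batch': 1}
```

In a real cluster the map and reduce phases run in parallel across many machines on data batches, which is the point: high throughput, not low latency.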
Stephen E Arnold, May 19, 2012
Sponsored by IKANOW
MarkLogic: The Door Revolves
May 17, 2012
MarkLogic hit $55 or $60 million. Not good enough. Exit one CEO; enter an Autodesk exec. Hit $100 million. Not good enough. Enter a new CEO. Navigate to “Former senior Oracle exec Gary Bloom named CEO of Mark Logic.” The new CEO is either going to grow the outfit or get it sold, if I understand the write up correctly. Here’s a passage which caught my attention:
Gary Bloom has been named CEO of Mark Logic, which returns him to his database roots.
According to MarkLogic’s Web page, the company is:
an enterprise software company powering over 500 of the world’s most critical Big Data Applications with the first operational database technology capable of handling any data, at any volume, in any structure.
However, I can download a search road map. Hmmm. I thought search was dead. Well, big data search is where the action is. MarkLogic is pushing forward with its XML data management system.
Stephen E Arnold, May 17, 2012
Sponsored by HighGainBlog
IBM Asserts Its i Technology Can Handle XML
May 9, 2012
IBM asserts that DB2 can do big data, including XML, in IBM Systems Magazine’s “i Can Use XML in a Relational World.” Blogger and IBM employee Nick Lawrence writes:
“In this most recent round of announcements, IBM has included support for the XMLTABLE table function in SQL. XMLTABLE is designed to convert an XML document into a relational result set (rows and columns) using popular XPath expressions. This function has been referred to as the Swiss army knife for working with XML because it can help solve a wide variety of XML related problems.”
Lawrence recommends a good XMLTABLE tutorial, located in the SQL XML Reference in IBM’s Info Center. He also identifies and elaborates upon areas that he says could use more clarification. For example, he describes a way to create an XML response document by building the document “inside out.” I guess that’s a technical term?
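XMLTABLE itself is SQL, but the shredding it performs, XPath-style expressions turning an XML document into a relational result set, can be sketched in plain Python. This is an illustration of the idea, not IBM’s implementation; the element names and document are made up:

```python
import xml.etree.ElementTree as ET

doc = """<orders>
  <order id="1"><customer>Acme</customer><total>25.00</total></order>
  <order id="2"><customer>Globex</customer><total>99.50</total></order>
</orders>"""

def shred(xml_text):
    # Analogous to XMLTABLE: one row per node matched by the
    # row-generating expression, columns pulled out with further
    # path expressions relative to each row node.
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):       # row-generating expression
        rows.append({
            "id": order.get("id"),            # column from an attribute
            "customer": order.findtext("customer"),
            "total": float(order.findtext("total")),
        })
    return rows

for row in shred(doc):
    print(row)
```

The SQL version does the same work inside the database engine, so the rows can be joined, filtered, and aggregated like any other table.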
It’s a helpful piece if that’s the route you want to travel. However, it involves lots of code, lots of fiddling. A bit like mining asteroids, we think.
Our question: Why not use a NoSQL data management system? After all, big data is what those do best.
Cynthia Murrell, May 9, 2012
Sponsored by PolySpot
Oracle and SAP: The Milagro Database War
May 3, 2012
I received an email inducing me to read “Hana and Exalytics: SAP’s Hype Versus Oracle’s FUD.” The write up takes a serious, or at least semi-serious, look at the Milagro database war. If you are not familiar with the Milagro Beanfield War, you might find the write up a loose allegory of what’s happening between the traditional data management companies and the NoSQL farmers.

The Information Week write up does not talk about the real story, however. What we get is two giants of traditional enterprise software squabbling over which traditional data management system is most likely to keep the Fortune 1000, government agencies, and big educational institutions within the traditional enterprise software corral.
With regard to Oracle, the write up asserts:
Oracle’s Larry Ellison and Safra Catz have missed few opportunities to discredit Hana in recent months. But executive VP Thomas Kurian took the slams a level deeper on Friday with a one-hour Webinar clearly intended to sow seeds of fear, uncertainty and doubt in the minds of would-be Hana customers. The session was billed as an Exalytics seminar, but each point set up a contrast with Hana. Kurian claimed, among other things, that SAP’s product costs five times to 50 times more than Exalytics and that it doesn’t support SQL (relational) or MDX (multidimensional) query languages, requiring apps to be rewritten to run on the new database.
The Information Week write up reports:
SAP’s hype about these apps is getting a little ahead of deployed market reality. Both Hana and Oracle Exalytics can point to dramatic before-and-after differences in query speeds. (Even SAP grants that Exalytics can accelerate queries.) SAP says the real payoff from Hana will be in transforming business processes, not just accelerating queries. But we haven’t seen enough solid, real-world customer examples documenting transformed business competitiveness.
Datameer Has a New Analytics Toy
April 5, 2012
According to Marketwatch.com, Datameer, Inc., a provider of Apache Hadoop-based end-user analytics solutions, announced the release of Datameer 1.4 in “Datameer Releases a Major New Version of Analytics Platform.” Datameer 1.4 improves functionality in data management and in user and data security, and expands support for data source adaptors, Hadoop, Cloudera, and IBM. We learned:
“The new features in Datameer 1.4 demonstrate that Datameer is committed to delivering what customers want with an emphasis on quality and ease of use,” stated David Cornell, Software Development Manager at SophosLabs. “We are particularly excited to see support for partitioning which will dramatically enhance report generation performance.”
Datameer 1.4 was released to meet the growing demands of the company’s clients. Billing itself as the only Apache Hadoop analytics solution, Datameer builds solutions that give businesses linear scalability and cost-effectiveness to analyze, integrate, and visualize structured and unstructured data. Datameer relies on open source software and is working hard to make a name for itself in the business world.
The hook for this new release may be performance. Speed, more than fancy analytics, is becoming more important.
Whitney Grace, April 5, 2012
Sponsored by Pandia.com
Publishers Pose Threats to Text Mining Expansion
March 26, 2012
Text mining software is all the rage these days due to its ability to make significant connections by quickly scanning through thousands of documents. This software can recognize, extract, and index scientific information from vast amounts of plain text, allowing computers to read and organize a body of knowledge that is expanding too fast for any human to keep up with. However, Nature.com recently reported on some issues that have developed in this growing industry in the article “Trouble at the Text Mine.”
According to the article, text mining programmers Max Haeussler and Casey Bergman have run into trouble trying to get science publishers to agree to let them mine their content.
The article asserts:
Many publishers say that they will allow their subscribers to text-mine, subject to contract and the text-miners’ intentions, and point to a number of successful agreements. But like many early advocates of the technology, Haeussler and Bergman complain that publishers are failing to cope with requests, and so are holding up the progress of research. What is more, they point out, as text-mining expands, it will be impractical for individual academic teams to spend years each working out bilateral agreements with every publisher.
While some publishers are getting on board the text mining train, many are still trying to work out how to take advantage of the commercial value before signing on. Too bad it takes more than a degree in English to make text mining deliver useful results. Bummer.
Jasmine Ashton, March 26, 2012
Sponsored by Pandia.com
Big Data, Small Talent Pool
March 24, 2012
It may be big data’s biggest issue; Government Computer News asks “Big Data’s Big Question: Where Are the Data Scientists?” Writer Rutrell Yasin explains:
“Even as organizations are trying to define the role of those tasked with analyzing and managing the new phenomenon of big data, people capable of that job are already projected to be in short supply.
“The move from a network-centric to a data-rich environment requires a different skill set, John Marshall, CTO of the Directorate of Intelligence J2 with the Joint Chiefs of Staff, said March 6 during a forum on big data. . . .
“A recent study reported that shortages of qualified workers who understand the power of big data is estimated to be between 140,000 and 190,000 people by 2018, Marshall said.”
Students are beginning to exit college with data analytics and data mining skills, but there may not be enough to fill the gap, especially in the public sector. There are professionals who have developed the required subject matter, math, and programming skills, but most of them are content to retain their lucrative jobs in Silicon Valley or New York.
The article does note that the broad term “data scientist” is akin to “doctor,” in that there are specialists within the field. Michael Lazar, a former intelligence community member who is now a senior solutions architect with VMware, recommends that public sector organizations internally train their people to meet their unique data analysis and management needs.
Though the article focuses on government organizations, it is a relevant read for anyone interested in big data. Also, it suggests a potentially lucrative field for young people looking to build a career in a difficult economy.
Stephen E. Arnold, March 24, 2012
Sponsored by Pandia.com
More Data Concentration Ahead
March 18, 2012
TMCnet announces that “Smartphone Usage Eclipses ‘Dumbphone’ Usage, Fueling Unified Data Storage ‘Tipping Point.’” IceWEB, Inc., a provider of unified data storage appliances, came to this conclusion after reviewing this recent study from the Pew Research Center’s Internet & American Life Project which found that the majority of mobile phone users in the US are now smartphone users. This means a surge in demand for cloud-based unified storage. The write up quotes:
‘With nearly half of all adult Americans using smartphones to capture and share billions of storage-heavy pictures and video, all that media takes up more and more storage in the cloud,’ said Steven Toole, Chief Marketing Officer at IceWEB. ‘Unstructured data such as photos and video lends itself to IceWEB’s unified data storage appliances, where data centers hosting smartphone users’ media can easily and more cost effectively manage and scale as these trends continue.’
Unified storage is a harbinger of consolidation, which is good for search and for eDiscovery. It is easier to dig through fewer bins.
IceWEB boasts that it can provide quality, enterprise-level unified data storage solutions at hefty savings over the competition. They declare that their unified storage arrays save storage costs, space, and power. The company is headquartered in Washington, DC.
Cynthia Murrell, March 18, 2012
Sponsored by Pandia.com
Monty Program Releases Version of MariaDB
March 18, 2012
Attention, NoSQL fans.
Developers at Monty Program believe they’ve finally got the formula for their MariaDB project on the right track. In the article “MariaDB 5.3.5 Delivers Faster Subqueries” we get a better idea of its functional capabilities.
MariaDB 5.3.5 is the first stable release of the MariaDB 5.3 relational database series. Developers focused on improved performance (of course) as well as improved querying capabilities and functionality. They now feel the new query optimizer is ready for more widespread production use.
Subqueries in MariaDB are finally usable. The database can now execute IN subqueries as semi-joins, letting the join optimizer select one of five execution strategies. A subquery map shows which queries and optimizations are used in the different versions of the software.
One core optimization, table pullout, can replace a subquery with a join where appropriate. If the subquery is not a semi-join, MariaDB 5.3 falls back to other methods, including materializing the results of the subquery into a temporary table or the older IN-TO-EXISTS optimization, the only strategy carried over from earlier versions. There is also a subquery cache that reduces the number of times already-optimized subqueries are re-executed.
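The semi-join rewrite the optimizer performs can be illustrated with a tiny example. Here Python’s built-in sqlite3 stands in for MariaDB (the tables and data are made up), showing that an IN subquery and its join rewrite return the same rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO orders VALUES (1), (1), (3);
""")

# The IN subquery a user writes:
in_form = con.execute(
    "SELECT name FROM customers "
    "WHERE id IN (SELECT customer_id FROM orders) ORDER BY name"
).fetchall()

# The join the optimizer may rewrite it to. DISTINCT preserves
# semi-join semantics: one row per customer, however many orders match.
join_form = con.execute(
    "SELECT DISTINCT c.name FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
).fetchall()

print(in_form == join_form)  # True
```

The rewrite matters because the join form lets the optimizer reorder tables and pick among execution strategies, instead of re-running the subquery per outer row.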
It’s a definite step in the right direction as far as data management is concerned. Being able to map your queries and rely on smarter subquery handling lets you maximize the potential of your software and production capabilities. Good show.
Stephen E Arnold, March 18, 2012
Sponsored by Pandia.com

