Talend Updates Open Studio Applications
May 19, 2012
Talend’s Open Studio platform now adds master data management to its business intelligence and big data capabilities. The H Open describes the updates found in the most recent version in “Talend Updates Data Tools to 5.1.0.”
Based on the open source Eclipse platform, the Open Studio environment hosts Talend’s Data Integration, Big Data, Data Quality, Master Data Management, and Enterprise Service Bus (ESB) products. A user-friendly GUI allows users to define processes. The write up specifies that the updates give Open Studio:
“. . .enhanced XML mapping and support for XML documents in its SOAP, JMS, File and MOM components. A new component has also been added to help manage Kerberos security. Open Studio for Data Quality has been enhanced with new ways to apply an analysis on multiple files, and the ability to drill down through business rules to see the invalid, as well as valid, records selected by the rules.
“ESB and Open Studio for ESB appear to be the most revised of the products, with the release notes documenting improvements to the REST and SOAP services, an improved route builder, and improvements to the runtime system . . . . Open Studio for Master Data Management has seen enhancements in the development environment, with searching and filtering available as ways to view an entity, and in the web user interface with improvements in visual cues, easier image storage and resizable sliding panels.”
Talend ESB and Big Data are under the Apache 2.0 License. Open Studio for ESB, Data Integration, Data Quality, and MDM are under the GPLv2.
Talend is a leading open source vendor, providing middleware for both data management and application integration. The company was already a leader in open source data management when its 2010 acquisition of Sopera boosted its standing in the open source middleware market. The company takes pride in providing powerful and flexible open solutions for all sorts of organizations, great and small.
Cynthia Murrell, May 19, 2012
Sponsored by PolySpot
MapReduce: A Summary
May 19, 2012
Want to know about MapReduce? Here you go:
Remember. Think batch processing.
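The batch-processing idea behind MapReduce can be sketched in a few lines of Python. This is a toy illustration of the two phases, not Hadoop or Google’s implementation:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle/reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data", "big batch"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 1, 'batch': 1}
```

In a real cluster the map and reduce phases run in parallel across many machines on data batches, which is the point: high throughput, not low latency.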
Stephen E Arnold, May 19, 2012
Sponsored by IKANOW
MarkLogic: The Door Revolves
May 17, 2012
MarkLogic hit $55 or $60 million. Not good enough. Exit one CEO; enter an Autodesk exec. Hit $100 million. Not good enough. Enter a new CEO. Navigate to “Former senior Oracle exec Gary Bloom named CEO of Mark Logic.” The new CEO is either going to grow the outfit or get it sold, if I understand the write up correctly. Here’s a passage which caught my attention:
Gary Bloom has been named CEO of Mark Logic, which returns him to his database roots.
According to MarkLogic’s Web page, the company is:
an enterprise software company powering over 500 of the world’s most critical Big Data Applications with the first operational database technology capable of handling any data, at any volume, in any structure.
However, I can download a search road map. Hmmm. I thought search was dead. Well, big data search is where the action is. MarkLogic is pushing forward with its XML data management system.
Stephen E Arnold, May 17, 2012
Sponsored by HighGainBlog
IBM Asserts Its i Technology Can Handle XML
May 9, 2012
IBM asserts that DB2 can do big data, including XML, in IBM Systems Magazine’s “i Can Use XML in a Relational World.” Blogger and IBM employee Nick Lawrence writes:
“In this most recent round of announcements, IBM has included support for the XMLTABLE table function in SQL. XMLTABLE is designed to convert an XML document into a relational result set (rows and columns) using popular XPath expressions. This function has been referred to as the Swiss army knife for working with XML because it can help solve a wide variety of XML related problems.”
Lawrence recommends a good XMLTABLE tutorial, located in the SQL XML Reference in IBM’s Info Center. He also identifies and elaborates upon areas that he says could use more clarification. For example, he describes a way to create an XML response document by building the document “inside out.” I guess that’s a technical term?
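XMLTABLE itself is SQL, but the shredding it performs, XPath-style expressions turning an XML document into a relational result set, can be sketched in plain Python. This is an illustration of the idea, not IBM’s implementation; the element names and document are made up:

```python
import xml.etree.ElementTree as ET

doc = """<orders>
  <order id="1"><customer>Acme</customer><total>25.00</total></order>
  <order id="2"><customer>Globex</customer><total>99.50</total></order>
</orders>"""

def shred(xml_text):
    # Analogous to XMLTABLE: one row per node matched by the
    # row-generating expression, columns pulled out with further
    # path expressions relative to each row node.
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):       # row-generating expression
        rows.append({
            "id": order.get("id"),            # column from an attribute
            "customer": order.findtext("customer"),
            "total": float(order.findtext("total")),
        })
    return rows

for row in shred(doc):
    print(row)
```

The SQL version does the same work inside the database engine, so the rows can be joined, filtered, and aggregated like any other table.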
It’s a helpful piece if that’s the route you want to travel. However, it involves lots of code, lots of fiddling. A bit like mining asteroids, we think.
Our question: Why not use a NoSQL data management system? After all, big data is what those do best.
Cynthia Murrell, May 9, 2012
Sponsored by PolySpot
Oracle and SAP: The Milagro Database War
May 3, 2012
I received an email inducing me to read “Hana and Exalytics: SAP’s Hype Versus Oracle’s FUD.” The write up takes a serious, or at least semi-serious, look at the Milagro database war. If you are not familiar with the Milagro Beanfield War, you might find the write up a loose allegory of what’s happening between the traditional data management companies and the NoSQL farmers.

The Information Week write up does not talk about the real story, however. What we get is two giants of traditional enterprise software squabbling over which traditional data management system is most likely to keep the Fortune 1000, government agencies, and big educational institutions within the traditional enterprise software corral.
With regard to Oracle, the write up asserts:
Oracle’s Larry Ellison and Safra Catz have missed few opportunities to discredit Hana in recent months. But executive VP Thomas Kurian took the slams a level deeper on Friday with a one-hour Webinar clearly intended to sow seeds of fear, uncertainty and doubt in the minds of would-be Hana customers. The session was billed as an Exalytics seminar, but each point set up a contrast with Hana. Kurian claimed, among other things, that SAP’s product costs five times to 50 times more than Exalytics and that it doesn’t support SQL (relational) or MDX (multidimensional) query languages, requiring apps to be rewritten to run on the new database.
The Information Week write up reports:
SAP’s hype about these apps is getting a little ahead of deployed market reality. Both Hana and Oracle Exalytics can point to dramatic before-and-after differences in query speeds. (Even SAP grants that Exalytics can accelerate queries.) SAP says the real payoff from Hana will be in transforming business processes, not just accelerating queries. But we haven’t seen enough solid, real-world customer examples documenting transformed business competitiveness.
Datameer Has a New Analytics Toy
April 5, 2012
According to Marketwatch.com, Datameer, Inc., a provider of Apache Hadoop-based end-user analytics solutions, announced the release of Datameer 1.4 in “Datameer Releases a Major New Version of Analytics Platform.” Datameer 1.4 improves functionality in data management and in user and data security, and expands support for data source adaptors, Hadoop, Cloudera, and IBM. We learned:
“The new features in Datameer 1.4 demonstrate that Datameer is committed to delivering what customers want with an emphasis on quality and ease of use,” stated David Cornell, Software Development Manager at SophosLabs. “We are particularly excited to see support for partitioning which will dramatically enhance report generation performance.”
Datameer 1.4 was released to meet the growing demands of the company’s clients. Billing itself as the only Apache Hadoop analytics solution, Datameer builds solutions that give businesses linear scalability and cost-effectiveness to analyze, integrate, and visualize structured and unstructured data. Datameer relies on open source software and is working hard to make a name for itself in the business world.
The hook for this new release may be performance. Speed, more than fancy analytics, is becoming more important.
Whitney Grace, April 5, 2012
Sponsored by Pandia.com
Publishers Pose Threats to Text Mining Expansion
March 26, 2012
Text mining software is all the rage these days due to its ability to make significant connections by quickly scanning through thousands of documents. This software can recognize, extract, and index scientific information from vast amounts of plain text, allowing computers to read and organize a body of knowledge that is expanding too fast for any human to keep up with. However, Nature.com recently reported on some issues that have developed in this growing industry in the article “Trouble at the Text Mine.”
According to the article, text mining programmers Max Haeussler and Casey Bergman have run into trouble trying to get science publishers to agree to let them mine their content.
The article asserts:
Many publishers say that they will allow their subscribers to text-mine, subject to contract and the text-miners’ intentions, and point to a number of successful agreements. But like many early advocates of the technology, Haeussler and Bergman complain that publishers are failing to cope with requests, and so are holding up the progress of research. What is more, they point out, as text-mining expands, it will be impractical for individual academic teams to spend years each working out bilateral agreements with every publisher.
While some publishers are getting on board the text mining train, many are still trying to work out how to take advantage of the commercial value before signing on. Too bad it takes more than a degree in English to make text mining deliver useful results. Bummer.
Jasmine Ashton, March 26, 2012
Sponsored by Pandia.com
Big Data, Small Talent Pool
March 24, 2012
It may be big data’s biggest issue; Government Computer News asks “Big Data’s Big Question: Where Are the Data Scientists?” Writer Rutrell Yasin explains:
“Even as organizations are trying to define the role of those tasked with analyzing and managing the new phenomenon of big data, people capable of that job are already projected to be in short supply.
“The move from a network-centric to a data-rich environment requires a different skill set, John Marshall, CTO of the Directorate of Intelligence J2 with the Joint Chiefs of Staff, said March 6 during a forum on big data. . . .
“A recent study reported that shortages of qualified workers who understand the power of big data is estimated to be between 140,000 and 190,000 people by 2018, Marshall said.”
Students are beginning to exit college with data analytics and data mining skills, but there may not be enough to fill the gap, especially in the public sector. There are professionals who have developed the required subject matter, math, and programming skills, but most of them are content to retain their lucrative jobs in Silicon Valley or New York.
The article does note that the broad term “data scientist” is akin to “doctor,” in that there are specialists within the field. Michael Lazar, a former intelligence community member who is now a senior solutions architect with VMware, recommends that public sector organizations internally train their people to meet their unique data analysis and management needs.
Though the article focuses on government organizations, it is a relevant read for anyone interested in big data. Also, it suggests a potentially lucrative field for young people looking to build a career in a difficult economy.
Stephen E. Arnold, March 24, 2012
Sponsored by Pandia.com
More Data Concentration Ahead
March 18, 2012
TMCnet announces that “Smartphone Usage Eclipses ‘Dumbphone’ Usage, Fueling Unified Data Storage ‘Tipping Point.’” IceWEB, Inc., a provider of unified data storage appliances, came to this conclusion after reviewing this recent study from the Pew Research Center’s Internet & American Life Project which found that the majority of mobile phone users in the US are now smartphone users. This means a surge in demand for cloud-based unified storage. The write up quotes:
‘With nearly half of all adult Americans using smartphones to capture and share billions of storage-heavy pictures and video, all that media takes up more and more storage in the cloud,’ said Steven Toole, Chief Marketing Officer at IceWEB. ‘Unstructured data such as photos and video lends itself to IceWEB’s unified data storage appliances, where data centers hosting smartphone users’ media can easily and more cost effectively manage and scale as these trends continue.’
Unified storage is a harbinger of consolidation, which is good for search and for eDiscovery. It is easier to dig through fewer bins.
IceWEB boasts that it can provide quality, enterprise-level unified data storage solutions at hefty savings over the competition. They declare that their unified storage arrays save storage costs, space, and power. The company is headquartered in Washington, DC.
Cynthia Murrell, March 18, 2012
Sponsored by Pandia.com
Monty Program Releases Version of MariaDB
March 18, 2012
Attention, NoSQL fans.
Developers at Monty Program believe they’ve finally got the formula for their MariaDB project on the right track. In the article “MariaDB 5.3.5 Delivers Faster Subqueries” we get a better idea of its functional capabilities.
MariaDB 5.3.5 is the first stable release of the MariaDB 5.3 relational database series. Developers focused on improved performance (of course) as well as improved querying capabilities and functionality. They now feel the new query optimizer is ready for more widespread production use.
Subqueries in MariaDB are finally usable. The database can now execute IN subqueries as semi-joins, letting the join optimizer select one of five execution strategies. A subquery map shows which queries and optimizations are used in the different versions of the software.
One core optimization, table pullout, can replace a subquery with a join where appropriate. If the subquery is not a semi-join, MariaDB 5.3 falls back to other methods, including materializing the results of the subquery into a temporary table or the older IN-TO-EXISTS optimization, the only strategy carried over from earlier versions. There is also a subquery cache that reduces the number of times already-optimized subqueries are re-executed.
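The semi-join rewrite the optimizer performs can be illustrated with a tiny example. Here Python’s built-in sqlite3 stands in for MariaDB (the tables and data are made up), showing that an IN subquery and its join rewrite return the same rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO orders VALUES (1), (1), (3);
""")

# The IN subquery a user writes:
in_form = con.execute(
    "SELECT name FROM customers "
    "WHERE id IN (SELECT customer_id FROM orders) ORDER BY name"
).fetchall()

# The join the optimizer may rewrite it to. DISTINCT preserves
# semi-join semantics: one row per customer, however many orders match.
join_form = con.execute(
    "SELECT DISTINCT c.name FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
).fetchall()

print(in_form == join_form)  # True
```

The rewrite matters because the join form lets the optimizer reorder tables and pick among execution strategies, instead of re-running the subquery per outer row.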
It’s a definite step in the right direction as far as data management is concerned. Being able to map your queries and rely on smarter subquery handling lets you maximize the potential of your software and production capabilities. Good show.
Stephen E Arnold, March 18, 2012
Sponsored by Pandia.com

