Inventive Graduate Student Builds Breakthrough Database

April 30, 2013

For some folks, deadlines can lead to innovation. One graduate student’s efforts to speed up his research have resulted in the inspired, high-speed parallel database MapD, we learn from DataInformed’s encouraging piece, “Fast Database Emerges from MIT Class, GPUs and Student’s Invention.” Todd Mostak’s in-a-pinch breakthrough could soon help others in business as well as academia.

The informative article contains too many specifics to cover here, but I suggest checking it out. It should be fascinating reading for anyone interested in data management. I personally think the use of graphics processors designed for gaming is a stroke of genius. Or maybe desperation (the two can be closely related). Reporter Ian B. Murphy tells us:

“While taking a class on databases at MIT, Mostak built a new parallel database, called MapD, that allows him to crunch complex spatial and GIS data in milliseconds, using off-the-shelf gaming graphical processing units (GPU) like a rack of mini supercomputers. Mostak reports performance gains upwards of 70 times faster than CPU-based systems. . . .

“‘I had the realization that this had the potential to be majorly disruptive,’ Mostak said. ‘There have been all these little research pieces about this algorithm or that algorithm on the GPU, but I thought, “Somebody needs to make an end-to-end system.” I was shocked that it really hadn’t been done.'”

Well, sometimes it takes someone from outside a field to see what seems obvious in retrospect. Mostak’s undergraduate experience was in economics, anthropology, and math, and he was in Harvard’s Middle Eastern Studies program when he was compelled to develop MapD. A database class at MIT gave him the knowledge he needed to build this tool, which he created to help with the tweet-heavy, Arab Spring-related thesis he was working on.

MIT’s Computer Science and Artificial Intelligence Lab has now snapped up the innovator. Though some questioned hiring someone with such a lean computer-science education, Lab director Sam Madden knows that Mostak’s unconventional background only means he has a unique point of view. The nascent computer scientist has already shown he has the talent to make it in this field.

Though Mostak says he still has work ahead to perfect his system, he does plan to share MapD as an open source project in the near future. Is he concerned about opening his work to the public? Nope; he states:

“If worse comes to worst, and somebody steals the idea, or nobody likes it, then I have a million other things I want to do too, in my head. I don’t think you can be scared. Life is too short.”

That it is. I suspect we will be hearing more from this creative thinker in the years to come.

Cynthia Murrell, April 30, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Hadoop in Demand Yet Lacks Trained Professionals

April 24, 2013

Hadoop has been in the headlines lately for its major changes and for how it is being integrated into more organizations. PR NewsWire takes a look at the open source data platform and what is predicted for its future in “Global Hadoop Market 2012-2016-Lack Of Trained Professionals To Be A Major Challenge.” The article examines a recent TechNavio report that analyzes the global Hadoop market for 2012-2016. TechNavio predicts the Hadoop market will grow at a CAGR of 55.63%, mainly due to the rise in big data analytics and the availability of Hadoop-as-a-service offerings. While Hadoop is doing well in terms of technology and services, it faces a shortage of trained professionals who can do the work.
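To put that growth rate in perspective, here is a minimal sketch of the compound-growth arithmetic. It assumes the 55.63% rate compounds over four annual periods (2012 to 2016); the report summary does not spell out the base year or the market size, so the function names and the period count are illustrative assumptions, and only the multiplier is computed.

```python
def compound_growth_multiplier(cagr: float, periods: int) -> float:
    """Total growth factor after `periods` years at a constant annual rate."""
    return (1.0 + cagr) ** periods


def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Inverse calculation: the constant annual rate linking two values."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


CAGR = 0.5563   # rate quoted from the TechNavio report
PERIODS = 4     # assumption: four compounding years, 2012 -> 2016

multiplier = compound_growth_multiplier(CAGR, PERIODS)
print(f"Market size multiplier after {PERIODS} years: {multiplier:.2f}x")

# Round trip: recover the annual rate from the start and end values.
print(f"Implied CAGR: {implied_cagr(1.0, multiplier, PERIODS):.4f}")
```

Under those assumptions the market would end the period at nearly six times its starting size, which helps explain why a shortage of trained staff could become a bottleneck.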

TechNavio said:

“’The demand for cost-effective Hadoop-based big data solutions is driving this market. Organizations understand the importance of big data solutions, but installing and hiring new professionals to deploy them is a costly affair. As a result, organizations and decision makers are adopting Hadoop-as-a-service (HDaaS) solutions that provide cost-effective big data management and analytics. HDaaS solutions offer the necessary hardware, software, and services required to support big data management at low subscription fees.’”

Hadoop seemed to be on a straight shot to success, but it now faces a problem that might limit its growth and development. HDaaS does take care of part of the problem, but someone still has to work with the software. Will Hadoop innovation shift to where the proficient professionals are? We think it is a strong possibility.

Whitney Grace, April 24, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

DataStax Hosts Big Data Days

April 19, 2013

DataStax is a leader in NoSQL database solutions, particularly based on Cassandra. They have made recent headlines as DataStax, and others like them, are slowly chipping away at the historically overwhelming market share of Oracle. Now they are making headlines for hosting some upcoming professional development opportunities. Read more in the article, “DataStax Announces Big Data Days by the Bays — Hosts Cassandra Summit in San Francisco and Sponsors Bloomberg Next Big Thing Summit in Half Moon Bay.”

The article begins:

“DataStax, the company that powers the big data apps that transform business, today announced two major events taking place in the Bay Area during June, the Cassandra Summit 2013 and the Bloomberg Next Big Thing Summit. Big data is today’s defining technology trend, transforming industries ranging from retail and finance to media and health care. As a leading big data platform provider, DataStax is hosting the Cassandra Summit and sponsoring the Bloomberg Summit.”

But in addition to DataStax, many value-added open source leaders offer great customer service and training opportunities. LucidWorks is another known for setting the industry standard for development support as well as customer support and training.

Emily Rae Aldridge, April 19, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Oracle Sees Cuts at the Hands of Startup Rivals

April 18, 2013

Oracle is a company that has made a name for itself in information storage, primarily databases, and ranks third in the country as a software maker, behind only Microsoft and IBM. But the tables may be turning for Oracle. Read how in the article, “Oracle Is Bleeding At The Hands Of Database Rivals.”

The article sums up the issue:

“Something is seriously wrong in Larry Land. Oracle does not command absolute control like it once did. You can see this clearly with the earnings the company posted last week and the growth that startups like Datastax are witnessing as more customers seek alternative databases for online applications.”

Startups are indeed taking a chunk out of the proprietary vendor market. This is a trend not only in the world of content storage and management, but also in enterprise search. SharePoint is the solution that developers and users are least excited about. Instead, talk turns to the up-and-coming open source initiatives that are more scalable, efficient, intuitive, and cost-effective. Take LucidWorks, for instance. Not only does it provide open source-based enterprise search on par with any proprietary solution, but it also boasts award-winning support and training and the power of Apache Lucene/Solr. Most companies are seeing open source value-added software as a no-brainer solution to their information needs.

Emily Rae Aldridge, April 18, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Free Oracle XSQL Book Available for Download

April 12, 2013

We want to let you in on the chance to download a free book from the entity ebookquotessui at Taiwanese media site Pixnet. This Oracle XSQL book is older, published in 2003, but full of information that has not expired. Hey, the price is right! The description reads:

“Discover how to combine the power of SQL, XML, and XSLT to publish dynamic Web content using XSQL. XSQL isn’t just some razzle-dazzle technology. It allows you to easily leverage the most robust, mature, and usable technologies in the industry: SQL, HTML, HTTP, XML, Java, and the Oracle RDBMS. With an exciting first look at XSQL, this innovative book shows you how to bring all of these powerful technologies together in order to publish dynamic Web content. You’ll first find a comprehensive discussion of how XSQL relates to each of these technologies. Then you’ll learn how you can use XSQL to present your database data on the Web instantly. The numerous code examples will show you how to develop a complete application with just XSQL and XSLT.”

It goes on to promise a solid approach to building Web applications and services from Oracle database data. Tips on building custom action handlers are included, as is a section on using serializers to produce images and PDF files. A companion website provides all of the code examples used by the author.

Cynthia Murrell, April 12, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Newest Version of MongoDB Includes Text Search

April 6, 2013

Some welcome enhancements to MongoDB are included in the open-source database’s latest release, we learn from “MongoDB 2.4 Can Now Search Text,” posted at the H Open. The ability to search text indexes has been one of the most requested features, and the indexing supports 14 languages (or no language at all). The write-up supplies a handy link to a discussion of techniques for creating and searching text indexes.

The post describes a second feature of MongoDB 2.4, the hashed index and sharding:

“Hash-based sharding allows data and CPU load to be spread well between distributed database nodes in a simple to implement way. The developers recommend it for cases of randomly accessed documents or unpredictable access patterns. New Geospatial indexes with support for GeoJSON and spherical geometry allow for 2dsphere indexing; this, in turn, offers better spherical queries and can store points, lines and polygons.”
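The idea behind hash-based sharding can be illustrated with a toy sketch. This is not MongoDB's implementation: the `shard_for` helper, the choice of SHA-256, the simple modulo placement, and the made-up `user00000`-style keys are all assumptions for illustration only. The point is that hashing a key before placing it scatters even sequential keys roughly evenly across nodes.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard by hashing it (hypothetical helper).

    SHA-256 is used here only because it is always available in hashlib;
    it is not claimed to be the hash function MongoDB uses.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then pick a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4

# Sequential keys: with range-based placement these would cluster together,
# but hashing spreads them across all shards.
keys = [f"user{i:05d}" for i in range(10_000)]
counts = Counter(shard_for(k, NUM_SHARDS) for k in keys)

for shard in range(NUM_SHARDS):
    print(f"shard {shard}: {counts[shard]} documents")
```

The trade-off is that neighboring keys no longer live together, so range scans touch every shard. That fits the developers' recommendation of hashing for randomly accessed documents or unpredictable access patterns.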

There is also a new modular authentication system, though its availability is limited so far. The project has also added support for fixed-size arrays in documents, optimized counting performance in the execution engine, and added a working set size analyzer. See the article for more details, or see the release notes, which include upgrade instructions. The newest version can be downloaded here.

Cynthia Murrell, April 06, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

MongoDB Upgrades to Enterprise

March 28, 2013

MongoDB is a go-to in the NoSQL database realm. The product has steadily gained more and more followers for its ability to house large amounts of data across several computer servers. The company behind Mongo, 10gen, is upping the game and appealing to the broader (and harder to please) enterprise crowd. Read the full details in the Wired article, “NoSQL Database MongoDB Reaches Beyond Software Coders.”

The article states:

“But the company that develops Mongo — 10gen — is hoping to reach beyond the developers and into big businesses. On Tuesday, with this in mind, the company unveiled the ‘enterprise edition’ of the database that’s specifically designed for use in the business world. The version of the database includes a few tools you won’t find in the open source code. It’s an approach known as ‘open core’ — building proprietary features on an open source foundation.”

The open core model is a successful one. In fact, part of Mongo’s latest news is their partnership with LucidWorks, a company that pioneered the open core model. LucidWorks specializes in enterprise search through Apache Lucene and Solr. When the storage power of MongoDB’s NoSQL meets the search and discovery function of LucidWorks, enterprises are sure to find a winning combination.

Emily Rae Aldridge, March 28, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Dataset Management for Revelytix Loom and Cloudera Navigator

March 27, 2013

A surprising article from DBMS 2 (DataBase Management System Services) about dataset management includes an explanation of the new term, dataset. It was coined because Revelytix, a big data software company, seems to have had trouble with the older term for what it does: metadata management. That term is problematic because it could refer to several types of data. Dataset management describes both Revelytix and the recently released Cloudera Navigator. The author asserts:

“My idea for the term dataset is to connote more grandeur than would be implied by the term “table”, but less than one might assume for a whole “database”. I.e.:

A dataset contains all the information about something. This makes it a bigger deal than a mere table, which could be meaningless outside the context of a database.

But the totality of information in a “dataset” could be less comprehensive than what we’d expect in a whole “database”.”

Mid-tier consultants may try to use the new problem as a revenue lever. Products to look to are Cloudera Navigator, which is from a leading Hadoop company and starts with auditing, and Revelytix Loom, which already does lineage in addition to auditing and is the main product of a company that does metadata management.

Chelsea Kerwin, March 27, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

DataStax Bumps Up NoSQL with New Enterprise Edition

January 28, 2013

DataStax has increased the power of its NoSQL database by adding the latest Hadoop data muncher and Solr search. The Register covers all the new features in its article, “DataStax Cranks Up Facebook NoSQL to 3.0 with Enterprise Features.”

The article begins:

“DataStax, the company that was founded to take the Cassandra NoSQL data store created by Facebook commercial and therefore usable by mere enterprise data centers, is keeping to its cadence and is rolling up a new release of its DataStax Enterprise Edition. The company has also put out an update to its Community Edition, which is available for free and which does not include some of the proprietary integrations between the Cassandra data store and the Hadoop big data muncher and the Solr search engine that have been tweaked to run atop of Cassandra.”

Open source is the foundation of Cassandra and the enhancements to DataStax Enterprise 3.0 are also due to open source technology. Hadoop and Solr are both part of the Apache Foundation open source community. Solr is known as the best open source search option, serving as the foundation for many commercial search systems. One award-winner known for its Solr and Lucene foundation is LucidWorks. LucidWorks builds Big Data and enterprise search solutions on the strength of these trusted open source essentials.

Emily Rae Aldridge, January 28, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Elsevier Moves More Quickly than Its Competitors

January 24, 2013

Well, quite a surprise for a giant, traditional, print-centric outfit. Elsevier may be gearing up to put a head lock on ProQuest (high dollar databases) and Ebsco (databases with stunning names). Both of these competitors, along with outfits like Ovid, are trying to adapt to a world in which libraries have to decide between paying the electric bills and licensing six and seven figure online databases. Can these companies continue to grow and generate profits?

Some say, “Yes.” Others say, “Not a chance.”

I am indifferent to the plight of this market sector. Isn’t everything a modern person requires available on a free Web system? If not, do today’s researchers care? With made up results and marketing taking the place of thinking, I am okay with the Google type system. The students whom I know are even more fond of Google than I am.

However, Elsevier may be hip to the new direction in revenue generation. I read “Elsevier Acquires Knovel, Provider of Web-based Productivity Application for the Engineering Community.” Knovel had funding from what I think of as venture sherpas. The K2 task was to develop a different type of electronic information which served specific market needs and warranted real dough. Think thousands for a “content object,” not $0.50 an abstract.

Elsevier, modestly described as “a world-leading provider of scientific, technical and medical information products and services,” acquired Knovel. No misspelling. Just some cute word play which may be lost on some recent college graduates.

Knovel is an electronic publisher which recycles high value content and adds value. Here’s how Knovel describes its “live PDF” innovation:

Knovel is the leading online technical reference resource for 3 reasons. First, Knovel locates more potentially relevant answers in a collection. Second, Knovel is better at quickly narrowing the potential answers to those most relevant to your search. Third, Knovel has interactive tables and graphs to help engineers use and export relevant data, making Knovel so much more than just e-books. (See http://why.knovel.com/company/about-knovel.html)

The key point is number three. The PDF instance allows the reader to plug in data and get an output. Think a baby version of Mathematica for civil engineers and others of their ilk. The idea is that an engineering text can be interacted with.

Elsevier has fired blanks in the electronic publishing sector for years. I won’t wander through the history of Elsevier, but I would like to give the company a happy quack for what looks like a reasonably good move.

How will Elsevier’s competitors respond? Raising prices is one option. Talking about doing bold actions may be another. At some point, fresh thinking and reaching for new opportunities will be necessary. Otherwise, there will be one or two commercial database outfits, one or two database aggregators, and one or two sources of professional content. Once that consolidation takes place, the revenues will be under severe pressure from folks who get advertisers to foot the bill for online information.

In short, nice move, Elsevier. Time for the ProQuests and the Ebscos to do something which pumps up the top line in a meaningful way and returns a healthy profit to the companies’ stakeholders. Even rich people want a return.

Stephen E Arnold, January 24, 2013
