Modus Operandi Gets a Big Data Storage Contract
March 24, 2015
The US Missile Defense Agency has awarded Modus Operandi a government contract to develop an advanced data storage and retrieval system for the Ballistic Missile Defense System. Modus Operandi specializes in big data analytic solutions for national security and commercial organizations. The company shared the news in a press release on its Web site, “Modus Operandi Awarded Contract To Develop Advanced Data Storage And Retrieval System For The US Missile Defense Agency.”
The contract is a Phase I Small Business Innovation Research (SBIR) award, under which Modus Operandi will work on the BMDS Analytic Semantic System (BASS). BASS will replace the aging legacy system and bring it up to date with techniques already common in the social media, Internet, and intelligence communities.
“ ‘There has been a lot of work in the areas of big data and analytics across many domains, and we can now apply some of those newer technologies and techniques to traditional legacy systems such as what the MDA is using,’ said Dr. Eric Little, vice president and chief scientist, Modus Operandi. ‘This approach will provide an unprecedented set of capabilities for the MDA’s data analysts to explore enormous simulation datasets and gain a dramatically better understanding of what the data actually means.’ ”
It is worrisome that the missile defense system is relying on an old legacy system, but at least it is being upgraded now. Modus Operandi also sells Cyber OSINT, and the company is applying this technology in an interesting way for the government.
Whitney Grace, March 24, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Data and Marketing Come Together for a Story
March 23, 2015
An article on the Marketing Experiments Blog titled “Digital Analytics: How To Use Data To Tell Your Marketing Story” explains the primacy of the story in the world of data. The conveyance of the story, the article claims, should be a collaboration between the marketer and the analyst, with both players working together to create an engaging and data-supported story. The article suggests breaking this story into several parts, similar to the plot points you might study in a creative writing class: exposition, rising action, climax, denouement, and resolution. The article states,
“Nate [Silver] maintained throughout his speech that marketers need to be able to tell a story with data or it is useless. In order to use your data properly, you must know what the narrative should be…I see data reporting and interpretation as an art, very similar to storytelling. However, data analysts are too often siloed. We have to understand that no one writes in a bubble, and marketing teams should understand the value and perspective data can bring to a story.”
Silver, founder and editor in chief of FiveThirtyEight.com, is also quoted in the article from his talk at the Adobe Summit Digital Marketing Conference. He said, “Just because you can’t measure it, doesn’t mean it’s not important.” This is the back-to-basics approach that companies need to consider.
Chelsea Kerwin, March 23, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Apache Samza Revamps Databases
March 19, 2015
Databases have advanced far beyond the basic relational model. They need to be consistently managed and updated in real time to stay useful. Apache Samza, an Apache Software Foundation project, was built to handle this kind of asynchronous stream processing, and it was developed in conjunction with Apache Kafka.
If you are interested in learning how to use Apache Samza, the Confluent blog posted “Turning The Database Inside-Out With Apache Samza” by Martin Kleppmann. Kleppmann recorded a seminar he gave at Strange Loop 2014 that explains how the framework can improve many features of a database:
“This talk introduces Apache Samza, a distributed stream processing framework developed at LinkedIn. At first it looks like yet another tool for computing real-time analytics, but it’s more than that. Really it’s a surreptitious attempt to take the database architecture we know, and turn it inside out. At its core is a distributed, durable commit log, implemented by Apache Kafka. Layered on top are simple but powerful tools for joining streams and managing large amounts of data reliably.”
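Samza’s own API is Java, but the pattern Kleppmann describes, consuming a durable Kafka commit log, keeping derived state, and publishing the result as another stream, can be sketched in a few lines. The snippet below is a hypothetical Python illustration built on the kafka-python package with made-up topic names; it is not Samza’s actual interface:

```python
# A minimal sketch of the consume-transform-produce loop that Samza formalizes:
# read a durable Kafka commit log, maintain derived state, and publish the
# transformed stream to another topic. Assumes a local Kafka broker and the
# hypothetical topics "page-views" and "view-counts"; requires kafka-python.
import json
from collections import defaultdict

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "page-views",                       # hypothetical input topic
    bootstrap_servers="localhost:9092",
    group_id="view-counter",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

counts = defaultdict(int)               # derived state, rebuilt by replaying the log

for message in consumer:                # each record is one entry in the commit log
    page = message.value.get("page", "unknown")
    counts[page] += 1
    # Emit the updated materialized view downstream.
    producer.send("view-counts", {"page": page, "views": counts[page]})
```

Replaying the input topic from the beginning rebuilds the derived counts from scratch, which is the “database turned inside out” idea in miniature.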
Learning new ways to improve database features and functionality always improves your skill set. Apache software also forms the basis for many open source projects and startups. Kleppmann’s talk might give you a brand new idea or at least improve your database.
Whitney Grace, March 20, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
DataStax Buys Graph-Database Startup Aurelius
February 20, 2015
DataStax has purchased open-source graph-database company, Aurelius, we learn in “DataStax Grabs Aurelius in Graph Database Acqui-Hire” at TechCrunch. Aurelius’ eight engineers will reportedly be working at DataStax, delving right into a scalable graph component for the company’s Cassandra-based Enterprise database. This acquisition, DataStax declares, makes theirs the only database platform with graph, analytics, search, and in-memory in one package. Writer Ron Miller tells us:
“DataStax is the commercial face of the open source Apache Cassandra database. Aurelius was the commercial face of the Titan graph database.
“Matt Pfeil, co-founder and chief customer officer at DataStax, says customers have been asking about graph database functionality for some time. Up until now customers have been forced to build their own on top of the DataStax offering.
“‘This was something that was on our radar. As we started to ramp up, it made sense from corporate [standpoint] to buy it instead of build it.’ He added that getting the graph-database engineering expertise was a bonus. ‘There’s not a ton of graph database experts [out there],’ he said.
“This expertise is especially important as two of the five major DataStax key use cases — fraud detection and recommendation engines — involve a graph database.”
Though details of the deal have not been released, see the write-up for some words on the fit between these two companies. Founded on an open-source model, Aurelius was doing just fine on its own. Co-founder Matthias Bröcheler is excited, though, about what his team can do at DataStax. Bröcheler did note that the graph database’s open-source version, Titan, will live on. Aurelius is located in Oakland, California, and was just launched in 2014.
Headquartered in San Mateo, California, DataStax was founded in 2010. Their Cassandra-based software implementations are flexible and scalable. Clients range from young startups to Fortune 100 companies, including such notables as eBay, Netflix and HealthCare Anytime.
Cynthia Murrell, February 20, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
Apache Solr NoSQL Search Shines Solo
February 3, 2015
Apache Solr is an open source enterprise search engine that is used with relational databases and Hadoop. ZDNet’s article, “Why Apache Solr Search Is On The Rise And Why It’s Going Solo,” explores why its lesser-known use as a NoSQL store might explode in 2015.
At the beginning of 2014, most Solr deployments were using it in the old-fashioned way, but in 2015 fifty percent of the pipeline is using it as a first-class data store. Companies are upgrading their old file intranets to the enterprise cloud. They want the upgraded system to be searchable, and they are relying on Solr to get the job done.
Search is more complex than basic NoSQL storage and needs something more robust to handle the new data streams. Solr adds that extra level of capability, so users have access to all of their data and nothing is missing.
” ‘So when we talk about Solr, it’s all your data, all the time at scale. It’s not just a guess that we think is likely the right answer. ‘We’re going to go ahead and push this one forward’. We guarantee the quality of those results. In financial services and other areas where guarantees are important, that makes Solr attractive,’ [CEO Will Hayes of LucidWorks, Apache Solr’s commercial sponsor] said.”
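What does using Solr as a first-class data store look like in practice? Here is a minimal, hypothetical sketch using the pysolr client, an assumed “events” core, and made-up field names; it writes records straight into Solr and queries them back with a structured filter:

```python
# A minimal sketch of treating Solr as the data store itself rather than a
# bolt-on index: write the records to Solr, then query them back.
# Assumes a running Solr instance with a hypothetical "events" core whose
# schema accepts these fields; requires the pysolr package.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/events", timeout=10)

# Store records directly in Solr; commit immediately for the demo.
solr.add([
    {"id": "txn-1001", "account": "acme", "amount": 125.50, "status": "cleared"},
    {"id": "txn-1002", "account": "acme", "amount": 9800.00, "status": "flagged"},
], commit=True)

# Query the same store: a search clause plus a structured filter.
results = solr.search("status:flagged", **{"fq": "account:acme", "rows": 10})
for doc in results:
    print(doc["id"], doc["amount"])
```

In the “all your data, all the time” scenario Hayes describes, the index is the system of record, so every field gets stored and the guarantees rest on Solr rather than on a separate database behind it.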
It looks like anything is possible for LucidWorks in the coming year.
Whitney Grace, February 03, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
Basho: A Comeback?
January 18, 2015
I read “NoSQL Pioneer Basho Scores $25M to Attempt a Comeback.” In 2012, Basho looked like a player. Then the company lost traction. The all-too-familiar “staff changes” kicked in. Now the company has gobbled up another $25 million on top of the $32 million previously raised. My thought is that generating this much cash from a NoSQL system is a task I would not undertake. I do have a profile of Basho from when it was looking like a contender. I will hunt it down and post a version on the Xenky Vendor Profiles page. I will also put an item in Beyond Search in the next few days with a link to the free profile of the company.
Stephen E Arnold, January 18, 2015
On Commercial vs Open Source Databases
December 22, 2014
Perhaps we should not be surprised that MarkLogic’s Chet Hays urges caution before adopting an open-source data platform. His article, “Thoughts on How to Select Between COTS and Open Source” at Sys-Con Media, can be interpreted as a defense of his database company’s proprietary approach. (For those unfamiliar with the acronym, COTS stands for commercial off-the-shelf.) Hays urges buyers to look past initial cost and consider other factors in three areas: technical, cultural, and, yes, financial.
In the “technical” column, Hays asserts that whether a certain solution will meet an organization’s needs is more complex than a simple side-by-side comparison of features would suggest; we are advised to check the fine print. “Cultural” refers here to taking workers’ skill sets into consideration. Companies usually do this with their developers, Hays explains, but often overlook the needs of the folks in operational support, who might appreciate the more sophisticated tools built into a commercial product. (No mention is made of the middle ground, where we find third-party products designed to add such tools to Hadoop iterations.)
In his comments on financial impact, Hays basically declares: It’s complicated. He writes:
“Organizations need to look at the financial picture from a total-cost perspective, looking at the acquisition and development costs all the way through the operations, maintenance and eventual retirement of the system. In terms of development, the organization should understand the costs associated with using a COTS provided tool vs. an Open Source tool.
“[…] In some cases, the COTS tool will provide a significant productivity increase and allow for a quicker time to market. There will be situations where the COTS tool is so cumbersome to install and maintain that an Open Source tool would be the right choice.
“The other area already alluded to is the cost for operations and maintenance over the lifecycle of project. Organizations should take into consideration existing IT investments to understand where previous investments can be leveraged and the cost incurred to leverage these systems. Organizations should ask whether the performance of one or the other allow for a reduced hardware and deployment footprint, which would lead to lower costs.”
These are all good points, and organizations should indeed do this research before choosing a solution. Whether the results point to an open-source solution or to a commercial option depends entirely upon the company or institution.
Cynthia Murrell, December 22, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Amazon and Oracle: The Love Affair Ends
November 14, 2014
I recall turning in a report about Amazon’s use of Oracle as its core database. The client, a bank type operation, was delighted that zippy Amazon had the common sense to use a name brand database. For the bank types, recognizable names used to be indicators of wise technological decisions.
I read “Amazon: DROP DATABASE Oracle; INSERT Our New Fast Cheap MySQL Clone.” Assume the write up is spot on: Amazon and Oracle have fallen out of love, or at least Oracle can no longer count on beefy payments from Amazon for its decidedly old-school data management system. This comment becomes quite interesting to me:
“This old-world relational database software is very expensive,” Jassy [Amazon tech VP] said. “They’re proprietary. There’s a high level of lock-in. And they’ve got punitive licensing terms, not just allowing very little flexibility in moving to the cloud the way customers want, but also in the auditing and fining of their customers.”
Several thoughts flitted through my mind as I kept one eye on the Philae gizmo:
- Amazon’s move, if it proves successful, may allow Mr. Bezos to mount a more serious attack on the enterprise market. Bad news for Oracle and possibly good news for those who want to save some Oracle bucks and trim the number of Oracle DBAs on the payroll
- Encourage outfits that offer enterprise cloud solutions. Will Amazon snap up some of the enterprise services and put the squeeze on Google and Microsoft?
- Trigger another round of database wars. Confusion and marketing hype often add a bit of spice to the Codd fest
- Cause concern among the commercial, proprietary NoSQL outfits. Think of MarkLogic and its ilk trying to respond to an Amazon package designed to make a 20 something developer jump up and down.
Interesting move by the digital WalMart.
Stephen E Arnold, November 14, 2014
Google and Images: What Does Remove Mean?
October 4, 2014
I read “After Legal Threat, Google Says It Removed ‘Tens of Thousands’ of iCloud Hack Pics.” On the surface, the story is straightforward. A giant company gets a ringy dingy from attorneys. The giant company takes action. Legal eagles return to their nests.
However, a question zipped through my mind:
What does remove mean?
If one navigates to a metasearch engine like Devilfinder.com, one can run queries. A query often generates results with a hot link to the Google cache. Have other services constructed versions of the Google index to satisfy certain types of queries? Are there third parties that have content in Web mirrors? Is content removed from those versions? Does “remove” mean deleting the Fancy Dan pointers to content or purging the content from the actual Google or other data structures? (See my write ups in Google Version 2.0 and The Digital Gutenberg to get a glimpse of how certain content can be deconstructed and stored in various Google data structures.)
Does remove mean a sweep of Google Images? Again, are the objects themselves purged, or are only the pointers deleted?
Then I wondered what happens if Google suffers a catastrophic failure. Will the data and content objects be restored from a backup? Are those backups purged?
I learned in the write up:
The Hollywood Reporter on Thursday published a letter to Google from Hollywood lawyers representing “over a dozen” of the celebrity victims of last month’s leak of nude photos. The lawyers accused Google of failing to expeditiously remove the photos as it is required to do under the Digital Millennium Copyright Act. They also demanded that Google remove the images from Blogger and YouTube as well as suspend or terminate any offending accounts. The lawyers claimed that four weeks after sending the first DMCA takedown notice relating to the images, and filing over a dozen more since, the photos are still available on the Google sites.
What does “remove” mean?
Stephen E Arnold, October 4, 2014
Why Good Enough Is the New Norm in Search
September 29, 2014
Navigate to “Postgres Full Text Search Is Good Enough.” I first heard this argument at a German information technology conference a few years ago. The idea is surprisingly easy to understand. As long as a user can bang in a couple of key words, scan a result list, and locate information that the user finds helpful, the job is done. The search results may consist of flawed or manipulated information. The search results may be off point for the user’s query when evaluated by old-fashioned methods such as precision and recall. The user may be dumb and simply rely on whatever the user finds as accurate.
Whatever.
This write up explains the good enough approach in terms of PostgreSQL, a useful open source Codd type data management system. Please, note. I am not uncomfortable with good enough search. I understand that when the herd stampedes, it is not particularly easy to stop the run. Prudence suggests that one take cover.
Here’s the guts of the write up:
What do I mean by ‘good enough’? I mean a search engine with the following features:
- Stemming
- Ranking / Boost
- Support Multiple languages
- Fuzzy search for misspelling
- Accent support
Luckily PostgreSQL supports all these features.
The write up contains some useful code snippets for putting these search features to work within PostgreSQL, and the discussion of full text search is coherent and addresses a vast swath of content. Note that proprietary vendors have tilled acres of marketing earth and spread fertilizer to convert search into a mind-boggling range of functions.
Querying is covered as well, again with code snippets. (My teenage advisors said, “Very useful snippets.” Okay. Good.)
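For readers who do not click through, here is a minimal sketch of the kind of query the article is talking about, run from Python via psycopg2. The articles table, its columns, and the connection string are assumptions for illustration, not details lifted from the post:

```python
# A minimal sketch of "good enough" full text search in PostgreSQL:
# stemmed matching with plainto_tsquery plus ranking via ts_rank.
# The table, columns, and connection string are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=demo user=demo")  # hypothetical connection string
cur = conn.cursor()

query = "running databases"  # stemming also matches "run", "database", etc.
cur.execute(
    """
    SELECT id, title,
           ts_rank(to_tsvector('english', title || ' ' || body),
                   plainto_tsquery('english', %s)) AS rank
    FROM articles
    WHERE to_tsvector('english', title || ' ' || body)
          @@ plainto_tsquery('english', %s)
    ORDER BY rank DESC
    LIMIT 10;
    """,
    (query, query),
)
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```

The unaccent and pg_trgm extensions cover the accent and fuzzy-matching items on the list above, and in practice one would store a precomputed tsvector column with a GIN index rather than recomputing it per query.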
The write up concludes:
We have seen how to build a decent multi-language search engine based on a non-trivial document. This article is only an overview but it should give you enough background and examples to get you started with your own….Postgres is not as advanced as ElasticSearch and SOLR but these two are dedicated full-text search tools whereas full-text search is only a feature of PostgreSQL and a pretty good one
Reasonable observation. Worth reading.
If you are a vendor of proprietary search technology, note that there will be more individuals infused with the spirit of open source, not fewer. How many experts are there for proprietary systems? Fewer than the cadres of open source volk, I surmise.
Stephen E Arnold, September 29, 2014

