Modus Operandi Gets a Big Data Storage Contract
March 24, 2015
The US Missile Defense Agency has awarded Modus Operandi a government contract to develop an advanced data storage and retrieval system for the Ballistic Missile Defense System. Modus Operandi specializes in big data analytic solutions for national security and commercial organizations. The company shared the news in a press release on its Web site, “Modus Operandi Awarded Contract To Develop Advanced Data Storage And Retrieval System For The US Missile Defense Agency.”
The contract is a Phase I Small Business Innovation Research (SBIR) award, under which Modus Operandi will work on the BMDS Analytic Semantic System (BASS). BASS will replace the aging legacy system and bring it up to date with techniques already common in the social media, Internet, and intelligence communities.
“ ‘There has been a lot of work in the areas of big data and analytics across many domains, and we can now apply some of those newer technologies and techniques to traditional legacy systems such as what the MDA is using,’ said Dr. Eric Little, vice president and chief scientist, Modus Operandi. ‘This approach will provide an unprecedented set of capabilities for the MDA’s data analysts to explore enormous simulation datasets and gain a dramatically better understanding of what the data actually means.’ ”
It is worrisome that the missile defense system is relying on an old legacy system, but at least it is being upgraded now. Modus Operandi also sells Cyber OSINT, and the company is applying this technology in an interesting way for the government.
Whitney Grace, March 24, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Data and Marketing Come Together for a Story
March 23, 2015
An article on the Marketing Experiments Blog titled “Digital Analytics: How To Use Data To Tell Your Marketing Story” explains the primacy of the story in the world of data. The conveyance of the story, the article claims, should be a collaboration between the marketer and the analyst, with both players working together to create an engaging and data-supported story. The article suggests breaking this story into several parts, similar to the plot points you might study in a creative writing class: exposition, rising action, climax, denouement, and resolution. The article states,
“Nate [Silver] maintained throughout his speech that marketers need to be able to tell a story with data or it is useless. In order to use your data properly, you must know what the narrative should be…I see data reporting and interpretation as an art, very similar to storytelling. However, data analysts are too often siloed. We have to understand that no one writes in a bubble, and marketing teams should understand the value and perspective data can bring to a story.”
Silver, founder and editor in chief of FiveThirtyEight.com, is also quoted in the article from his talk at the Adobe Summit Digital Marketing Conference. He said, “Just because you can’t measure it, doesn’t mean it’s not important.” This is the back-to-basics approach that companies need to consider.
Chelsea Kerwin, March 23, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Apache Samza Revamps Databases
March 19, 2015
Databases have advanced far beyond the basic relational model. They need to be consistently managed and updated in real time to stay useful. Apache Samza, an Apache Software Foundation project, was built to handle this kind of asynchronous stream processing, and it was developed in conjunction with Apache Kafka.
If you are interested in learning how to use Apache Samza, the Confluent blog posted “Turning The Database Inside-Out With Apache Samza” by Martin Kleppmann. Kleppmann recorded a seminar he gave at Strange Loop 2014 that explains how the framework can improve many features of a database:
“This talk introduces Apache Samza, a distributed stream processing framework developed at LinkedIn. At first it looks like yet another tool for computing real-time analytics, but it’s more than that. Really it’s a surreptitious attempt to take the database architecture we know, and turn it inside out. At its core is a distributed, durable commit log, implemented by Apache Kafka. Layered on top are simple but powerful tools for joining streams and managing large amounts of data reliably.”
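Samza’s own API is Java, but the pattern Kleppmann describes, consuming a durable Kafka commit log, keeping derived state, and publishing the result as another stream, can be sketched in a few lines. The snippet below is a hypothetical Python illustration built on the kafka-python package with made-up topic names; it is not Samza’s actual interface:

```python
# A minimal sketch of the consume-transform-produce loop that Samza formalizes:
# read a durable Kafka commit log, maintain derived state, and publish the
# transformed stream to another topic. Assumes a local Kafka broker and the
# hypothetical topics "page-views" and "view-counts"; requires kafka-python.
import json
from collections import defaultdict

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "page-views",                       # hypothetical input topic
    bootstrap_servers="localhost:9092",
    group_id="view-counter",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

counts = defaultdict(int)               # derived state, rebuilt by replaying the log

for message in consumer:                # each record is one entry in the commit log
    page = message.value.get("page", "unknown")
    counts[page] += 1
    # Emit the updated materialized view downstream.
    producer.send("view-counts", {"page": page, "views": counts[page]})
```

Replaying the input topic from the beginning rebuilds the derived counts from scratch, which is the “database turned inside out” idea in miniature.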
Learning new ways to improve database features and functionality always improves your skill set. Apache software also forms the basis for many open source projects and startups. Kleppmann’s talk might give you a brand new idea or at least improve your database.
Whitney Grace, March 20, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
DataStax Buys Graph-Database Startup Aurelius
February 20, 2015
DataStax has purchased open-source graph-database company, Aurelius, we learn in “DataStax Grabs Aurelius in Graph Database Acqui-Hire” at TechCrunch. Aurelius’ eight engineers will reportedly be working at DataStax, delving right into a scalable graph component for the company’s Cassandra-based Enterprise database. This acquisition, DataStax declares, makes theirs the only database platform with graph, analytics, search, and in-memory in one package. Writer Ron Miller tells us:
“DataStax is the commercial face of the open source Apache Cassandra database. Aurelius was the commercial face of the Titan graph database.
“Matt Pfeil, co-founder and chief customer officer at DataStax, says customers have been asking about graph database functionality for some time. Up until now customers have been forced to build their own on top of the DataStax offering.
“‘This was something that was on our radar. As we started to ramp up, it made sense from corporate [standpoint] to buy it instead of build it.’ He added that getting the graph-database engineering expertise was a bonus. ‘There’s not a ton of graph database experts [out there],’ he said.
“This expertise is especially important as two of the five major DataStax key use cases — fraud detection and recommendation engines — involve a graph database.”
Though details of the deal have not been released, see the write-up for some words on the fit between these two companies. Founded on an open-source model, Aurelius was doing just fine on its own. Co-founder Matthias Bröcheler is excited, though, about what his team can do at DataStax. Bröcheler did note that the graph database’s open-source version, Titan, will live on. Aurelius is located in Oakland, California, and was just launched in 2014.
Headquartered in San Mateo, California, DataStax was founded in 2010. Their Cassandra-based software implementations are flexible and scalable. Clients range from young startups to Fortune 100 companies, including such notables as eBay, Netflix and HealthCare Anytime.
Cynthia Murrell, February 20, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
Apache Solr NoSQL Search Shines Solo
February 3, 2015
Apache Solr is an open source enterprise search engine that is used with relational databases and Hadoop. ZDNet’s article, “Why Apache Solr Search Is On The Rise And Why It’s Going Solo,” explores why its lesser-known use as a NoSQL store might explode in 2015.
At the beginning of 2014, most Solr deployments were using it in the old-fashioned way, but in 2015 fifty percent of the pipeline is using it as a first-class data store. Companies are upgrading their old file intranets to the enterprise cloud. They want the upgraded system to be searchable, and they are relying on Solr to get the job done.
Search is more complex than basic NoSQL storage and needs something more robust to handle the new data streams. Solr adds that extra level of capability, so users have access to all of their data and nothing is missing.
” ‘So when we talk about Solr, it’s all your data, all the time at scale. It’s not just a guess that we think is likely the right answer. ‘We’re going to go ahead and push this one forward’. We guarantee the quality of those results. In financial services and other areas where guarantees are important, that makes Solr attractive,’ [CEO Will Hayes of LucidWorks, Apache Solr’s commercial sponsor] said.”
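What does using Solr as a first-class data store look like in practice? Here is a minimal, hypothetical sketch using the pysolr client, an assumed “events” core, and made-up field names; it writes records straight into Solr and queries them back with a structured filter:

```python
# A minimal sketch of treating Solr as the data store itself rather than a
# bolt-on index: write the records to Solr, then query them back.
# Assumes a running Solr instance with a hypothetical "events" core whose
# schema accepts these fields; requires the pysolr package.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/events", timeout=10)

# Store records directly in Solr; commit immediately for the demo.
solr.add([
    {"id": "txn-1001", "account": "acme", "amount": 125.50, "status": "cleared"},
    {"id": "txn-1002", "account": "acme", "amount": 9800.00, "status": "flagged"},
], commit=True)

# Query the same store: a search clause plus a structured filter.
results = solr.search("status:flagged", **{"fq": "account:acme", "rows": 10})
for doc in results:
    print(doc["id"], doc["amount"])
```

In the “all your data, all the time” scenario Hayes describes, the index is the system of record, so every field gets stored and the guarantees rest on Solr rather than on a separate database behind it.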
It looks like anything is possible for LucidWorks in the coming year.
Whitney Grace, February 03, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
Basho: A Comeback?
January 18, 2015
I read “NoSQL Pioneer Basho Scores $25M to Attempt a Comeback.” In 2012, Basho looked like a player. Then the company lost traction. The all-too-familiar “staff changes” kicked in. Now the company has gobbled up another $25 million on top of the $32 million previously raised. My thought is that generating this much cash from a NoSQL system is a task I would not undertake. I do have a profile of Basho from when it was looking like a contender. I will hunt it down and post a version on the Xenky Vendor Profiles page. I will also put an item in Beyond Search in the next few days with a link to the free profile of the company.
Stephen E Arnold, January 18, 2015
On Commercial vs Open Source Databases
December 22, 2014
Perhaps we should not be surprised that MarkLogic’s Chet Hays urges caution before adopting an open-source data platform. His article, “Thoughts on How to Select Between COTS and Open Source” at Sys-Con Media, can be interpreted as a defense of his database company’s proprietary approach. (For those unfamiliar with the acronym, COTS stands for commercial off-the-shelf.) Hays urges buyers to look past initial cost and consider other factors in three areas: technical, cultural, and, yes, financial.
In the “technical” column, Hays asserts that whether a certain solution will meet an organization’s needs is more complex than a simple side-by-side comparison of features would suggest; we are advised to check the fine print. “Cultural” refers here to taking workers’ skill sets into consideration. Companies usually do this with their developers, Hays explains, but often overlook the needs of the folks in operational support, who might appreciate the more sophisticated tools built into a commercial product. (No mention is made of the middle ground, where we find third-party products designed to add such tools to Hadoop iterations.)
In his comments on financial impact, Hays basically declares: It’s complicated. He writes:
“Organizations need to look at the financial picture from a total-cost perspective, looking at the acquisition and development costs all the way through the operations, maintenance and eventual retirement of the system. In terms of development, the organization should understand the costs associated with using a COTS provided tool vs. an Open Source tool.
“[…] In some cases, the COTS tool will provide a significant productivity increase and allow for a quicker time to market. There will be situations where the COTS tool is so cumbersome to install and maintain that an Open Source tool would be the right choice.
“The other area already alluded to is the cost for operations and maintenance over the lifecycle of project. Organizations should take into consideration existing IT investments to understand where previous investments can be leveraged and the cost incurred to leverage these systems. Organizations should ask whether the performance of one or the other allow for a reduced hardware and deployment footprint, which would lead to lower costs.”
These are all good points, and organizations should indeed do this research before choosing a solution. Whether the results point to an open-source solution or to a commercial option depends entirely upon the company or institution.
Cynthia Murrell, December 22, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Amazon and Oracle: The Love Affair Ends
November 14, 2014
I recall turning in a report about Amazon’s use of Oracle as its core database. The client, a bank type operation, was delighted that zippy Amazon had the common sense to use a name brand database. For the bank types, recognizable names used to be indicators of wise technological decisions.
I read “Amazon: DROP DATABASE Oracle; INSERT Our New Fast Cheap MySQL Clone.” Assume the write up is spot on: Amazon and Oracle have fallen out of love, or at least Oracle can no longer count on beefy payments from Amazon for its decidedly old-school data management system. This comment becomes quite interesting to me:
“This old-world relational database software is very expensive,” Jassy [Amazon tech VP] said. “They’re proprietary. There’s a high level of lock-in. And they’ve got punitive licensing terms, not just allowing very little flexibility in moving to the cloud the way customers want, but also in the auditing and fining of their customers.”
Several thoughts flitted through my mind as I kept one eye on the Philae gizmo:
- Amazon’s move, if it proves successful, may allow Mr. Bezos to mount a more serious attack on the enterprise market. Bad news for Oracle and possibly good news for those who want to save some Oracle bucks and trim the number of Oracle DBAs on the payroll
- Encourage outfits that offer enterprise cloud solutions. Will Amazon snap up some of the enterprise services and put the squeeze on Google and Microsoft?
- Trigger another round of database wars. Confusion and marketing hype often add a bit of spice to the Codd fest
- Cause concern among the commercial, proprietary NoSQL outfits. Think of MarkLogic and its ilk trying to respond to an Amazon package designed to make a 20 something developer jump up and down.
Interesting move by the digital WalMart.
Stephen E Arnold, November 14, 2014
Google and Images: What Does Remove Mean?
October 4, 2014
I read “After Legal Threat, Google Says It Removed ‘Tens of Thousands’ of iCloud Hack Pics.” On the surface, the story is straightforward. A giant company gets a ringy dingy from attorneys. The giant company takes action. Legal eagles return to their nests.
However, a question zipped through my mind:
What does remove mean?
If one navigates to a metasearch engine like Devilfinder.com, one can run queries. A query often generates results with a hot link to the Google cache. Have other services constructed versions of the Google index to satisfy certain types of queries? Are there third parties that have content in Web mirrors? Is content removed from those versions? Does “remove” mean deleting the Fancy Dan pointers to content or purging the content from the actual Google or other data structures? (See my write ups in Google Version 2.0 and The Digital Gutenberg to get a glimpse of how certain content can be deconstructed and stored in various Google data structures.)
Does remove mean a sweep of Google Images? Again, are the objects themselves purged, or are only the pointers deleted?
Then I wondered what happens if Google suffers a catastrophic failure. Will the data and content objects be restored from a backup? Are those backups purged?
I learned in the write up:
The Hollywood Reporter on Thursday published a letter to Google from Hollywood lawyers representing “over a dozen” of the celebrity victims of last month’s leak of nude photos. The lawyers accused Google of failing to expeditiously remove the photos as it is required to do under the Digital Millennium Copyright Act. They also demanded that Google remove the images from Blogger and YouTube as well as suspend or terminate any offending accounts. The lawyers claimed that four weeks after sending the first DMCA takedown notice relating to the images, and filing over a dozen more since, the photos are still available on the Google sites.
What does “remove” mean?
Stephen E Arnold, October 4, 2014
Why Good Enough Is the New Norm in Search
September 29, 2014
Navigate to “Postgres Full Text Search Is Good Enough.” I first heard this argument at a German information technology conference a few years ago. The idea is surprisingly easy to understand. As long as a user can bang in a couple of key words, scan a result list, and locate information that the user finds helpful, the job is done. The search results may consist of flawed or manipulated information. The search results may be off point for the user’s query when evaluated by old-fashioned methods such as precision and recall. The user may be dumb and simply rely on whatever the user finds as accurate.
Whatever.
This write up explains the good enough approach in terms of PostgreSQL, a useful open source Codd type data management system. Please, note. I am not uncomfortable with good enough search. I understand that when the herd stampedes, it is not particularly easy to stop the run. Prudence suggests that one take cover.
Here’s the guts of the write up:
What do I mean by ‘good enough’? I mean a search engine with the following features:
- Stemming
- Ranking / Boost
- Support Multiple languages
- Fuzzy search for misspelling
- Accent support
Luckily PostgreSQL supports all these features.
The write up contains some useful code snippets for putting these search features to work within PostgreSQL, and the discussion of full text search is coherent and addresses a vast swath of content. Note that proprietary vendors have tilled acres of marketing earth and spread fertilizer to convert search into a mind-boggling range of functions.
Querying is covered as well, again with code snippets. (My teenage advisors said, “Very useful snippets.” Okay. Good.)
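For readers who do not click through, here is a minimal sketch of the kind of query the article is talking about, run from Python via psycopg2. The articles table, its columns, and the connection string are assumptions for illustration, not details lifted from the post:

```python
# A minimal sketch of "good enough" full text search in PostgreSQL:
# stemmed matching with plainto_tsquery plus ranking via ts_rank.
# The table, columns, and connection string are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=demo user=demo")  # hypothetical connection string
cur = conn.cursor()

query = "running databases"  # stemming also matches "run", "database", etc.
cur.execute(
    """
    SELECT id, title,
           ts_rank(to_tsvector('english', title || ' ' || body),
                   plainto_tsquery('english', %s)) AS rank
    FROM articles
    WHERE to_tsvector('english', title || ' ' || body)
          @@ plainto_tsquery('english', %s)
    ORDER BY rank DESC
    LIMIT 10;
    """,
    (query, query),
)
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```

The unaccent and pg_trgm extensions cover the accent and fuzzy-matching items on the list above, and in practice one would store a precomputed tsvector column with a GIN index rather than recomputing it per query.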
The write up concludes:
We have seen how to build a decent multi-language search engine based on a non-trivial document. This article is only an overview but it should give you enough background and examples to get you started with your own….Postgres is not as advanced as ElasticSearch and SOLR but these two are dedicated full-text search tools whereas full-text search is only a feature of PostgreSQL and a pretty good one
Reasonable observation. Worth reading.
If you are a vendor of proprietary search technology, note that there will be more individuals infused with the spirit of open source, not fewer. How many experts are there for proprietary systems? Fewer than the cadres of open source volk, I surmise.
Stephen E Arnold, September 29, 2014

