Prepare To Update Your Cassandra
June 2, 2015
It is time for an update to Apache’s headlining, open source, enterprise search software! The San Diego Times let us know that “DataStax Enterprise 4.7 Released” and it has a slew of updates set to make open source search enthusiasts drool. DataStax is a company that built itself around the open source Apache Cassandra software. The company specializes in enterprise applications for search and analytics.
The newest release of DataStax Enterprise 4.7 includes several updates to improve a user’s enterprise experience:
“…includes a production-certified version of Cassandra 2.1, and it adds enhanced enterprise search, analytics, security, in-memory, and database monitoring capabilities. These include a new certified version of Apache Solr and Live Indexing, a new DSE feature that makes data immediately available for search by leveraging Cassandra’s native ability to run across multiple data centers.”
The update also includes DataStax’s OpsCenter 5.2 for enhanced security and encryption. It can be used to store encryption keys on servers and to manage admin security.
The enhanced search capabilities are the real bragging points: fault-tolerant search operations (used to customize failed search responses), intelligent search query routing (queries are routed to the fastest machines in a cluster for the quickest response times), and extended search analytics (using Solr search syntax and Apache Spark, research and analytics tasks can run simultaneously).
DataStax Enterprise 4.7 improves enterprise search applications. It will probably pull in users trying to improve their big data plans. Has DataStax considered how its enterprise platform could be used for the cloud or on mobile computing?
Whitney Grace, June 2, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Legal Acquisition Ahead? For Sure, for Sure
June 1, 2015
I read, along with Reed Elsevier and Thomson Reuters executives, the article with this click attractive title: “Westlaw and Lexis Nexis: Are Your Days Numbered Yet?” The answer is, “Nope.” Legal eagles, the government, and Trump-wealthy online surfers will continue to use LexisNexis and Westlaw for a while.
The article references new online services which are trying to offer alternatives at a lower cost. Mentioned in the write up is Casetext at https://casetext.com. I expected to see a reference to Fastcase at http://www.fastcase.com and maybe a pointer to Fastcase buying LexisNexis’ Collier TopForm & File.
My hunch is that legal information is getting more difficult to locate. Verify this by looking for information related to the MIC, RAC, and ZPIC matters in US government Web sites like www.usa.gov and in the commercial legal information services. I found this research a challenge.
If Casetext has some oomph, I assume Reed and Thomson will acquire the company. Online legal research has a certain predictability inherent in its business model.
Stephen E Arnold, June 1, 2015
Elasticsearch and SQL Queries
May 30, 2015
Short honk: Want to use Elasticsearch for SQL queries? Now you can learn how. Navigate to “Elasticsearch-SQL.” Explanations, code samples, and a feature summary are available. LucidWorks (Really?), your serve.
Stephen E Arnold, May 30, 2015
JobSamurai Offers Alternative Job Search Method (Without the Search)
May 29, 2015
The article titled Take the Search Out of Job Hunting with JobSamurai on MakeUseOf describes the perks of using JobSamurai next time you are out of work. A lot of people rely on services like Craigslist, but anyone who has searched for a job there knows that a good portion of the listings are frauds, or just non-existent. The number of irrelevant posts is also high, and weeding through them all is time-consuming and frustrating. JobSamurai claims to have the answers, with a job website that minimizes the search factor. The article explains,
“JobSamurai uses your information to find jobs around the web that match your profile, then shows them to you as banner adverts on the websites you visit most often. They do this by leaving a tracking cookie in your web browser that sends data back to JobSamurai to notify them of where to display their content. It typically takes 10-15 days for their internal search engines to find all the jobs that match a candidate.”
While this means that users will need to exercise some patience before seeing results, it is balanced out by the absence of those terrible spam emails that job search websites love to litter your inbox with. JobSamurai promises to limit itself to one email every two months, which really seems like no emails at all.
Chelsea Kerwin, May 29, 2015
Generalizations about Big Data: Hail, the Mighty Hadoop
May 26, 2015
I read “A Big Data Cheat Sheet: What Executives Want to Know.” The hidden agenda in the write up is revealed with the juxtaposition of the source Social Media Today and the technology Hadoop.
Big Data is one of those buzzwords which now grates on me. When I hear it, I wonder what the outfit is pitching and how something as nebulous as Big Data is going to save someone’s bacon or, if one is a vegetarian, tofu.
This write up beats the Hadoop drum. Isn’t Hadoop one method for performing certain types of data management tasks and extracting results from those tasks? Hadoop is a tool, and like a router in the home workshop, a pretty feisty gizmo in the hands of a novice.
The article suggests that Hadoop is a federation system. Hadoop can be a federation system, but it can handle data from a single source; for example, log files. Federation is not magic; it requires work. In fact, federation may render the benefits of Hadoop secondary to the cost of the resources required to utilize Hadoop in an effective way.
There are other assertions as well; for example:
- Hadoop can archive “all data.” Hmmm. “All.” Does this sound a bit overblown?
- Hadoop is enterprise ready? Sure, if the enterprise has the resources to make appropriate use of Hadoop.
- Are data lakes and data warehouses the same? According to the write up, the data warehouse uses structured data and the data lake is just a big pool of disparate data. Queries across this type of “pool” can be exciting and expensive.
- The upsides and downsides of the data lake pivot on data management. Okay, that is definitely true. What is not explored is the cost of managing large volumes of data, their updates, and their manipulation. Queries can be expensive.
My point is that sweeping generalizations about a technology which is useful are not helpful. Firing buzzwords into the mushy brain of a person involved in social media can have some interesting consequences.
Hadoop is not magic. Hadoop requires specialized knowledge. Hadoop does not deliver, like the tooth fairy, a quarter under one’s pillow. If Hadoop were the answer to Big Data problems, why are so many Hadoop projects vulnerable to very common problems in configuration, memory handling, lousy performance, and problematic hives?
Social media experts are not likely to appreciate these challenges as they work to deal with large volumes of data, updates, and queries. Oh, are the outputs valid? Frankly some Hadoop projects never face that problem.
Stephen E Arnold, May 26, 2015
Sinequa and Systran Partner on Cyber Defense
May 20, 2015
Enterprise search firm Sinequa and translation tech outfit Systran are teaming up on security software. “Systran and Sinequa Combine in the Field of Cyber Defense,” announces ITRmanager.com. (The article is in French, but Google Translate is our friend.) The write-up explains:
“Sinequa and Systran have indeed decided to cooperate to develop a solution for detecting and processing of critical information in multiple languages and able to provide investigators with a panoramic view of a given subject. On one side Systran provides safe instant translation in over 45 languages, and the other Sinequa provides big data processing platform to analyze, categorize and retrieve relevant information in real time. The integration of the two solutions should thus facilitate the timely processing of structured and unstructured data from heterogeneous sources, internal and external (websites, audio transcripts, social media, etc.) and provide a clear and comprehensive view of a subject for investigators.”
Launched in 2002, Sinequa is a leader in the Enterprise Search field; the company boasts strong business analytics, but also emphasizes user-friendliness. Based in Paris, the firm maintains offices in Frankfurt, London, and New York City. Systran has a long history of providing innovative translation services to defense and security organizations around the world. The company’s headquarters are in Seoul, with other offices located in Daejeon, South Korea; Paris; and San Diego.
Cynthia Murrell, May 20, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Searching Bureaucracy
May 19, 2015
The rise of automatic document conversion could render vast amounts of data collected by government agencies useful. In their article, “Solving the Search Problem for Large-Scale Repositories,” GCN explains why this technology is a game-changer, and offers tips for a smooth conversion. Writer Mike Gross tells us:
“Traditional conversion methods require significant manual effort and are economically unfeasible, especially when agencies are often precluded from using offshore labor. Additionally, government conversion efforts can be restricted by document security and the number of people that require access. However, there have been recent advances in the technology that allow for fully automated, secure and scalable document conversion processes that make economically feasible what was considered impractical just a few years ago. In one particular case the cost of the automated process was less than one-tenth of the traditional process. Making content searchable, allowing for content to be reformatted and reorganized as needed, gives agencies tremendous opportunities to automate and improve processes, while at the same time improving workflow and providing previously unavailable metrics.”
The write-up describes several factors that could foil an attempt to implement such a system, and I suggest interested parties check out the whole article. Some examples include security and scalability, of course, as well as specialized format and delivery requirements, and non-textual elements. Gross also lists criteria to look for in a vendor; for instance, assess how well their products play with related software, like scanning and optical character recognition tools, and whether they will be able to keep up with the volumes of data at hand. If government agencies approach these automation advances with care and wisdom, instead of reflexively choosing the lowest bidder, our bureaucracies’ data systems may actually become efficient. (Hey, one can dream.)
Cynthia Murrell, May 19, 2015
Hadoop: Its Inventor Speaks
May 18, 2015
I must have my wires crossed about Hadoop. I thought other folks were the creators of what became Hadoop. I read “Where Next for Hadoop? An Interview with Co-Creator Doug Cutting” to get my memory refreshed. (Note: you may have to register or pay to view the full text of this interview.)
According to the article, Doug Cutting and Mike Cafarella cooked up Hadoop in 2005. Cutting now works at Cloudera, which, according to Crunchbase, is
an enterprise software company that provides Apache Hadoop-based software and training to data-driven enterprises.
You can find some objective analyses of the company and its technology at http://bit.ly/1desDEN. I use the term “objective” to mean written by mid tier consultants.
I highlighted this statement:
Hadoop is already much more versatile and user-friendly than it was in the early days and innovations such as Yarn, Impala and Spark as well as a hardening of the platform’s security have all made it more “enterprise ready” too…
To underscore the user friendliness of Hadoop I circled in high intensity pink:
Asked whether some IT people are so bowled over by the number and choice of big data tools that they neglect to think how they will use them, Cutting agrees that this can be the case, but says that as use cases grow this issue will diminish. “It’s in an early stage of maturity so that’s not unexpected, but I think over time people are going to think about the functionality you’ve got in the distribution. You could have a SQL engine for analytics queries. You’ve got a NoSQL engine for reporting queries,” he says. So are companies like Cloudera, which, thanks to support from the likes of Intel (see below) and its vast marketing budget, distracting the market from the bigger picture? “There is confusion but I think it’s mostly because people are new to it and do not have much experience,” Cutting says.
And a final snippet:
“Mostly I think this mantle of open and standard is deceptive. It is neither open in that everybody’s really invited on equal terms to play, nor is it a standard. It’s a minority of people out there.”
There are other comments about Hadoop. I will leave them to you. Easy to use, not confusing, and no problems with open and standard. There are many consulting firms thrilled with Hadoop. Snap it in and dig into data. Versatile too.
Stephen E Arnold, May 18, 2015
Popular and Problematic Hadoop
May 15, 2015
We love open source on principle, and Hadoop is indeed an open-source powerhouse. However, any organization considering a Hadoop system must understand how tricky implementation can be, despite the hype. A pair of writers at GCN asks and answers the question, “What’s Holding Back Hadoop?” The brief article reports on a recent survey of data management pros by data-researcher TDWI. Reporters Troy K. Schneider and Jonathan Lutton explain:
“Hadoop — the open-source, distributed programming framework that relies on parallel processing to store and analyze both structured and unstructured data — has been the talk of big data for several years now. And while a recent survey of IT, business intelligence and data warehousing leaders found that 60 percent will have Hadoop in production by 2016, deployment remains a daunting task. TDWI — which, like GCN, is owned by 1105 Media — polled data management professionals in both the public and private sector, who reported that staff expertise and the lack of a clear business case topped their list of barriers to implementation.”
The write-up supplies a couple of bar graphs of survey results, including the top obstacles to implementation and the primary benefits of going to the trouble. Strikingly, only six percent of respondents say there’s no Hadoop in their organizations’ foreseeable future. Though not covered in the GCN write-up, the full, 43-page report includes word on best practices and implementation trends; it can be downloaded here (registration required).
Cynthia Murrell, May 15, 2015
The Latest SharePoint News from Ignite
May 14, 2015
The Ignite conference in Chicago has answered many of the questions that SharePoint users have been curious about for months now. Among them was the question of release timing and features for the latest iteration of SharePoint. CMS Wire gives a rundown in their article, “What’s Up With SharePoint? #MSIgnite.”
The article sums up the biggest news:
“Microsoft will continue to enhance the core offerings in the on-premises edition. It will also continue to develop SharePoint Online and update it as quickly as the updates are available. A preview version of SharePoint 2016 will be made available later this summer, with a beta version expected by the end of the year . . . In an afternoon session entitled Evolution of SharePoint Overview and Roadmap, the duo gave a rough outline of Microsoft’s plans, albeit without precise delivery dates.”
Having had to push back delivery dates once already, Microsoft is likely hesitant to announce anything solid until development is final. As for the qualities of the new version, Microsoft is focusing on user experience, extensibility, and SharePoint management. The inclusion of user experience should be a welcome change for many. To stay in touch with developments as they become available, keep an eye on ArnoldIT.com, and particularly Stephen E. Arnold’s feed devoted to SharePoint. Arnold has made a lifelong career out of all things search, and he has a knack for distilling down the “need to know” facts to keep an organization on track.
Emily Rae Aldridge, May 14, 2015

