Top Trends for Cyber Security and Analytics in 2016
December 23, 2015
With the end of the year approaching, people try to predict what will happen in the New Year. For companies, this annual fortunetelling has real stakes: a correct read on 2016 can mean healthier profit margins and a stronger customer base. The IT industry has its own share of New Year soothsayers, and the Executive Biz blog shares possible trends in cyber security and data analytics for the coming year in “Booz Allen Cites Top Cyber, Analytics Trends In 2016; Bill Stewart Comments.”
Booz Allen Hamilton says that companies will want to merge analytical programs with security programs to receive data sets that show network vulnerabilities; these merged operations have been dubbed “fusion centers.”
“ ‘As cyber risk and advanced analytics demand increasing attention from the C-suite, we are about to enter a fundamentally different period,’ said Bill Stewart, executive vice president and leader of commercial cyber business at Booz Allen. ‘The dynamics will change… Skilled leaders will factor these changing dynamics into their planning, investments and operations.’”
There will also be increased risks coming from the Dark Web and from connected systems, such as cloud storage. Booz Allen also hints that companies will need skilled professionals who know how to manage cyber security risks and harness analytics. That suggestion is not new; it has been discussed since 2014. As threats from the Internet and vulnerabilities within systems have grown, so has the need for experts in these areas and for better programs to handle them. Booz Allen is restating the obvious. The bigger problem is that companies are often unaware of these risks and usually lack the budget to implement preemptive measures.
Whitney Grace, December 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Machine Learning Used to Decipher Lute Tablature
December 23, 2015
The Oxford Journal’s Early Music publication reveals a very specialized use of machine learning in, “Bring ‘Musicque into the Tableture’: Machine-Learning Models for Polyphonic Transcription of 16th-Century Lute Tablature” by musical researchers Reinier de Valk and Tillman Weyde. Note that this link will take you to the article’s abstract; to see the full piece, you’ll have to subscribe to the site. The abstract summarizes:
“A large corpus of music written in lute tablature, spanning some three-and-a-half centuries, has survived. This music has so far escaped systematic musicological research because of its notational format. Being a practical instruction for the player, tablature reveals very little of the polyphonic structure of the music it encodes—and is therefore relatively inaccessible to non-specialists. Automatic polyphonic transcription into modern music notation can help unlock the corpus to a larger audience and thus facilitate musicological research.
“In this study we present four variants of a machine-learning model for voice separation and duration reconstruction in 16th-century lute tablature. These models are intended to form the heart of an interactive system for automatic polyphonic transcription that can assist users in making editions tailored to their own preferences. Additionally, such models can provide new methods for analysing different aspects of polyphonic structure.”
The full article lays out the researchers’ modelling approaches and the advantages of each. They report their best model returns accuracy rates of 80 to 90 percent, so for modelers, it might be worth the $39 to check out the full article. We just think it’s nice to see machine learning used for such a unique and culturally valuable project.
Cynthia Murrell, December 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Importance of Google AI
December 23, 2015
According to Business Insider, we’ve all been overlooking something crucial about Google. Writer Lucinda Shen reports, “Top Internet Analyst: There Is One Thing About Google that Everyone Is Missing.” Shen cites an observation by prominent equity analyst Carlos Kirjner. She writes:
“According to Kirjner, that thing [that everyone else is missing] is AI at Google. ’Nobody is paying attention to that because it is not an issue that will play out in the next few quarters, but longer term it is a big, big opportunity for them,’ he said. ‘Google’s investments in artificial intelligence, above and beyond the use of machine learning to improve character, photo, video and sound classification, could be so revolutionary and transformational to the point of raising ethical questions.’
“Even if investors and analysts haven’t been closely monitoring Google’s developments in AI, the internet giant is devoted to the project. During the company’s third-quarter earnings call, CEO Sundar Pichai told investors the company planned to integrate AI more deeply within its core business.”
Google must be confident in its AI if it is deploying it across all its products, as reported. Shen recalls that the company made waves back in November, when it released the open-source AI platform TensorFlow. Is Google’s AI research about to take the world by storm?
Cynthia Murrell, December 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
When the Data Cannot Be Trusted
December 22, 2015
A post at Foreign Policy, “Cyber Spying Is Out, Cyber Lying Is In,” reveals that it may be more important now than ever before to check the source, facts, and provenance of digital information. Unfortunately, search and content processing systems do not do a great job of separating baloney from prime rib.
Journalist Elias Groll tells us that the experts are concerned about hacking’s new approach:
“In public appearances and congressional testimony in recent months, America’s top intelligence officials have repeatedly warned of what they describe as the next great threat in cyberspace: hackers not just stealing data but altering it, threatening military operations, key infrastructure, and broad swaths of corporate America. It’s the kind of attack they say would be difficult to detect and capable of seriously damaging public trust in the most basic aspects of both military systems and a broader economy in which tens of millions of people conduct financial and health-related transactions online….
“Drones could beam back images of an empty battlefield that is actually full of enemy fighters. Assembly robots could put together cars using dimensions that have been subtly altered, ruining the vehicles. Government personnel records could be modified by a foreign intelligence service to cast suspicion on a skilled operative.”
Though such attacks have not yet become commonplace, there are several examples to cite. Groll first points to the Stuxnet worm, which fooled Iranian engineers into thinking their centrifuges were a-okay when it had actually sabotaged them into over-pressurizing. (That was a little joint project by the U.S. and Israel.) See the article for more examples, real and hypothesized. Not all experts agree that this is a growing threat, but I, for one, am glad our intelligence agencies are treating it like one.
Cynthia Murrell, December 22, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Use the Sentiment Analysis, Luke
December 22, 2015
The newest Star Wars film is out in theaters, and any credible Star Wars geek has probably seen it at least twice. One theme that remains prevalent in the franchise is the mystical, galactic power of the Force, which gives the Jedi special abilities, such as reading a person’s mind. Computer Weekly says that data will be able to do the same thing in “Sentiment Analysis With Hadoop: 5 Steps To Becoming A Mind Reader.”
While the article title reads more like a pitch on how to become a psychic, sentiment analysis has been shown to predict a person’s actions, especially shopping habits. Sentiment analysis is a huge market for companies wanting to reach their shoppers on a more intimate level, predict trends before they happen, and connect with shoppers in real time. Apache Hadoop is a tool for harnessing that data, and Twitter is a common source.
The five steps: first, collect the data; second, label it to create a data dictionary with positive or negative annotations; third, run the analytics; fourth, run a beta phase; fifth, get the insights. While it sounds easy, the fourth step is going to be the biggest hassle:
“Remember that analytic tools that just look for positive or negative words can be entirely misleading if they miss important context. Typos, intentional misspellings, emoticons and jargon are just few additional obstacles in the task.
Computers also don’t understand sarcasm and irony and as a general rule are yet to develop a sense of humor. Too many of these and you will lose accuracy. It is probably best to address this point by fine-tuning your model.”
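The five steps might look something like this minimal sketch in Python. To be clear, the tweets and the tiny word dictionary below are invented for illustration; a real pipeline would run on Hadoop at scale with a proper lexicon and tagger:

```python
# Minimal sentiment-scoring sketch following the five steps:
# collect, label (data dictionary), analyze, beta-test, get insights.
# The tweets and word lists here are invented for illustration.

# 1. Collect data (stand-in for a Twitter feed).
tweets = [
    "love the new phone, great battery",
    "terrible service, never again",
    "not bad, actually pretty good",
]

# 2. Label: a tiny data dictionary of positive/negative words.
lexicon = {"love": 1, "great": 1, "good": 1,
           "terrible": -1, "never": -1, "bad": -1}

def score(text):
    # 3. Run analytics: sum word polarities.
    words = text.lower().split()
    total = sum(lexicon.get(w.strip(","), 0) for w in words)
    # 4. Beta-phase fix-up: handle simple negation ("not bad").
    if "not" in words:
        total += 2 * sum(-lexicon.get(words[i + 1].strip(","), 0)
                         for i, w in enumerate(words[:-1]) if w == "not")
    return total

# 5. Get the insights.
for t in tweets:
    print(t, "->", "positive" if score(t) > 0 else "negative")
```

The negation fix-up in step four hints at why that step is the hassle: every piece of context (sarcasm, typos, jargon) needs its own patch.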
The purpose of sentiment analysis is teaching software how to “think” like a human and understand all our illogical ways. (Hmm…that was a Star Trek reference, whoops!) Apache Hadoop might not have light sabers or help you find droids, but it can help you understand consumers’ spending habits. So how about, “These are the greenbacks you have been looking for.”
Whitney Grace, December 22, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Topology Is Finally on Top
December 21, 2015
Topology’s time has finally come, according to “The Unreasonable Usefulness of Imagining You Live in a Rubbery World,” shared by 3 Quarks Daily. The engaging article reminds us that the field of topology emphasizes connections over geometric factors like distance and direction. Think of a subway map as compared to a street map; or, as writer Jonathan Kujawa describes:
“Topologists ask a question which at first sounds ridiculous: ‘What can you say about the shape of an object if you have no concern for lengths, angles, areas, or volumes?’ They imagine a world where everything is made of silly putty. You can bend, stretch, and distort objects as much as you like. What is forbidden is cutting and gluing. Otherwise pretty much anything goes.”
Since the beginning, this perspective has been dismissed by many as purely academic. However, today’s era of networks and big data has boosted the field’s usefulness. The article observes:
“A remarkable new application of topology has emerged in the last few years. Gunnar Carlsson is a mathematician at Stanford who uses topology to extract meaningful information from large data sets. He and others invented a new field of mathematics called Topological data analysis. They use the tools of topology to wrangle huge data sets. In addition to the networks mentioned above, Big Data has given us Brobdinagian sized data sets in which, for example, we would like to be able to identify clusters. We might be able to visually identify clusters if the data points depend on only one or two variables so that they can be drawn in two or three dimensions.”
Kujawa goes on to note that one century-old tool of topology, homology, is being used to analyze real-world data, like the ways diabetes patients have responded to a specific medication. See the well-illustrated article for further discussion.
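The core idea behind the first step of topological data analysis (clusters as the connected components you get at a chosen distance scale) can be sketched in plain Python. The points and the scale `eps` below are invented for illustration:

```python
import math

# Toy illustration of the first idea in topological data analysis:
# clusters are the connected components you get when points closer
# than a chosen scale eps are considered "glued" together.

points = [(0.0, 0.0), (0.3, 0.1), (0.2, 0.4),   # cluster A
          (5.0, 5.0), (5.2, 4.9)]               # cluster B

def clusters(pts, eps):
    parent = list(range(len(pts)))          # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if math.dist(pts[i], pts[j]) <= eps:
                parent[find(i)] = find(j)   # glue nearby points

    groups = {}
    for i in range(len(pts)):
        groups.setdefault(find(i), []).append(pts[i])
    return list(groups.values())

print(len(clusters(points, eps=1.0)))  # two components at this scale
```

Varying `eps` and watching which clusters persist is, loosely, what the homology-based tools Kujawa mentions formalize.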
Cynthia Murrell, December 21, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Textio for Text Analysis
December 17, 2015
I read “Textio, A Startup That Analyzes Text Performance, Raises $8M.” The write up reported:
Textio recognizes more than 60,000 phrases with its predictive technology, Snyder [Textio’s CEO] said, and that data set is changing constantly as it continues to operate. It looks at how words are put together — such as how verb dense a phrase is — and at other syntax-related properties the document may have. All that put together results in a score for the document, based on how likely it is to succeed in whatever the writer set out to do.
The secret sauce bubbles in this passage:
it’s important that it [Textio] feels easy to use — hence the highlighting and dropdown boxes rather than readouts.
Textio’s Web site states:
From e-commerce to real estate to marketing content, Textio was founded on this simple vision: how you write changes who you reach, and using data, we can predict ahead of time how you’re going to do.
The company, according to Crunchbase, has received $9.5 million from Emergence Capital Partners and four other firms.
There are a number of companies offering text analysis, but Textio may be one of the few providing user-friendly tools to help people write and make sense of the nuances in résumés and similar corpora. Sophisticated analysis of text is available from a number of vendors.
It is encouraging to me that a sub function of information access is attracting attention as a stand-alone service. One of the company’s customers is Microsoft, a firm with home grown text solutions and technologies from Fast Search & Transfer and Powerset, among other sources. Microsoft’s interest in Textio underscores that text processing that works as one hopes is an unmet need.
Stephen E Arnold, December 17, 2015
Old School Mainframes Still Key to Big Data
December 17, 2015
According to ZDNet, “The Ultimate Answer to the Handling of Big Data: The Mainframe.” Believe it or not, a recent survey of 187 IT pros from Syncsort found the mainframe to be important to their big data strategy. IBM has even created a Hadoop-capable mainframe. Reporter Ken Hess lists some of the survey’s findings:
*More than two-thirds of respondents (69 percent) ranked the use of the mainframe for performing large-scale transaction processing as very important
*More than two-thirds (67.4 percent) of respondents also pointed to integration with other standalone computing platforms such as Linux, UNIX, or Windows as a key strength of mainframe
*While the majority (79 percent) analyze real-time transactional data from the mainframe with a tool that resides directly on the mainframe, respondents are also turning to platforms such as Splunk (11.8 percent), Hadoop (8.6 percent), and Spark (1.6 percent) to supplement their real-time data analysis […]
*82.9 percent and 83.4 percent of respondents cited security and availability as key strengths of the mainframe, respectively
*In a weighted calculation, respondents ranked security and compliance as their top areas to improve over the next 12 months, followed by CPU usage and related costs and meeting Service Level Agreements (SLAs)
*A separate weighted calculation showed that respondents felt their CIOs would rank all of the same areas in their top three to improve
Hess goes on to note that most of us probably utilize mainframes without thinking about it; whenever we pull cash out of an ATM, for example. The mainframe’s security and scalability remain unequaled, he writes, by any other platform or platform cluster yet devised. He links to a couple of resources besides the Syncsort survey that support this position: a white paper from IBM’s Big Data & Analytics Hub and a report from research firm Forrester.
Cynthia Murrell, December 17, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Distribution Ready Reference
December 16, 2015
Distributions are nifty. Some are easy, like the bell curve. Nice and symmetrical. Others are less regular. If you want to see what type of distribution your data generates, navigate to “Common Probability Distributions: The Data Scientist’s Crib Sheet.” Is it necessary to understand the mathematics underpinning each curve? If you are an MBA, the answer is, “No.” If you are more catholic in your approach, you can use these curves to poke into the underbelly of the numerical recipes. Nice write up. It does not include the Tracy Widom distribution, but the beta distribution may be close enough for MBA horseshoes.
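As a quick poke into that underbelly, Python’s standard library can sample from the beta distribution directly; the shape parameters below are arbitrary, chosen only to show the empirical mean landing on the closed-form value:

```python
import random

# Sample from a beta distribution and check its mean against the
# closed-form value alpha / (alpha + beta). Parameters are arbitrary.
random.seed(42)

alpha, beta = 2.0, 5.0
samples = [random.betavariate(alpha, beta) for _ in range(100_000)]

empirical_mean = sum(samples) / len(samples)
theoretical_mean = alpha / (alpha + beta)  # 2/7, about 0.286

print(round(empirical_mean, 3), round(theoretical_mean, 3))
```

Swapping in `random.gauss`, `random.expovariate`, and friends makes it easy to eyeball the other curves on the crib sheet the same way.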
Stephen E Arnold, December 16, 2015
Google Timeline Knows Where You Have Been
December 16, 2015
We understand that to get the most out of the Internet, we sacrifice a bit of privacy; but do we all understand how far-reaching that sacrifice can be? The Intercept reveals “How Law Enforcement Can Use Google Timeline to Track Your Every Move.” For those who were not aware, Google helpfully stores all the places you (or your devices) have traveled, down to longitude and latitude, in Timeline. Now, with an expansion launched in July 2015, that information goes back years, instead of just six months. Android users must actively turn this feature off to avoid being tracked.
The article cites a report titled “Google Timelines: Location Investigations Involving Android Devices.” Written by a law-enforcement trainer, the report is a tool for investigators. To be fair, the document does give a brief nod to privacy concerns; at the same time, it calls it “unfortunate” that Google allows users to easily delete entries in their Timelines. Reporter Jana Winter writes:
“The 15-page document includes what information its author, an expert in mobile phone investigations, found being stored in his own Timeline: historic location data — extremely specific data — dating back to 2009, the first year he owned a phone with an Android operating system. Those six years of data, he writes, show the kind of information that law enforcement investigators can now obtain from Google….
“The ability of law enforcement to obtain data stored with privacy companies is similar — whether it’s in Dropbox or iCloud. What’s different about Google Timeline, however, is that it potentially allows law enforcement to access a treasure trove of data about someone’s individual movement over the course of years.”
For its part, Google says it “responds to valid legal requests” but insists the bar is high; a simple subpoena has never been enough. That is some comfort, I suppose.
Cynthia Murrell, December 16, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph