Natural Language Processing App Gains Increased Vector Precision
March 1, 2016
For us, concepts have meaning in relationship to other concepts, while computers define concepts in terms of usage statistics. The post Sense2vec with spaCy and Gensim from spaCy’s blog offers a well-written outline of how natural language processing works, highlighting the team’s new Sense2vec app. The application is an upgrade of word2vec that produces more context-sensitive word vectors. The article describes how Sense2vec achieves this added precision,
“The idea behind sense2vec is super simple. If the problem is that duck as in waterfowl and duck as in crouch are different concepts, the straight-forward solution is to just have two entries, duckN and duckV. We’ve wanted to try this for some time. So when Trask et al (2015) published a nice set of experiments showing that the idea worked well, we were easy to convince.
We follow Trask et al in adding part-of-speech tags and named entity labels to the tokens. Additionally, we merge named entities and base noun phrases into single tokens, so that they receive a single vector.”
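For readers who want to see the recipe in concrete terms, here is a minimal sketch in Python using spaCy and Gensim. It follows the tag-and-merge idea described above, but the model name, the corpus file, and the exact API calls (current spaCy and Gensim releases, not the 2016-era versions) are our own illustrative assumptions, not the sense2vec code itself:

```python
# A minimal sketch of the sense2vec preprocessing idea. Model name, corpus
# file, and API details are illustrative assumptions.
import spacy
from gensim.models import Word2Vec

nlp = spacy.load("en_core_web_sm")  # placeholder model name

def sense_tokens(text):
    """Turn raw text into sense-tagged tokens, e.g. 'duck|NOUN' vs 'duck|VERB'."""
    doc = nlp(text)
    # Merge base noun phrases into single tokens so each phrase gets one vector.
    with doc.retokenize() as retokenizer:
        for chunk in list(doc.noun_chunks):
            retokenizer.merge(chunk)
    # Prefer the named entity label when present, otherwise the part of speech.
    return [
        f"{tok.text.lower().replace(' ', '_')}|{tok.ent_type_ or tok.pos_}"
        for tok in doc if not tok.is_space
    ]

# Stand-in for the Reddit 2015 corpus described in the post.
sentences = [sense_tokens(line) for line in open("reddit_comments_sample.txt")]
model = Word2Vec(sentences, vector_size=128, window=5, min_count=5, workers=4)
print(model.wv.most_similar("duck|NOUN"))  # neighbors of the waterfowl sense
```

Training on sense-tagged tokens means duck|NOUN and duck|VERB each get their own vector, which is the whole trick.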
Curious about the meta definition of natural language processing from spaCy, we queried “natural language processing” using Sense2vec. Its model was trained on every word posted to Reddit in 2015. While it is a feat for NLP to learn from a dataset drawn from a single platform, such as Reddit, what about processing that scours multiple data sources?
Megan Feil, March 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
IBM Continues to Brag About Watson, with Decreasing Transparency
February 29, 2016
A totally objective article sponsored by IBM on Your Story is titled How Cognitive Systems Like IBM Watson Are Changing the Way We Solve Problems. The article exists mainly to promote the cognitive computing capabilities most of us already know Watson possesses, and to raise awareness of a Hackathon event taking place in Bengaluru, India. The “article” endorses the event,
“Participants will have an unprecedented opportunity to collaborate, co-create and exchange ideas with one another and the world’s most forward-thinking cognitive experts. This half-day event will focus on sharing real-world applications of cognitive technologies, and allow attendees access to the next wave of innovations and applications through an interactive experience. The program will also include panel discussions and fireside chats between senior IBM executives and businesses that are already working with Watson.”
Since 2015, the “Watson for Oncology” program has involved Manipal Hospitals in Bengaluru, India. The program is the result of a partnership between IBM and Memorial Sloan Kettering Cancer Center in New York. Watson has now consumed almost 15 million pages of medical content from textbooks and journals in the hopes of providing rapid-fire support to hospital staffers when it comes to patient records and diagnoses. Perhaps if IBM put all of its efforts into Watson’s projects instead of creating inane web content to promote him as some sort of missionary, he could have already cured cancer. Or not.
Chelsea Kerwin, February 29, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Intel Identifies the Future of High Performance Computing. Surprise. It Is Itself
February 29, 2016
I make a feeble attempt to pay attention to innovations in high performance computing. The reason is that some mathematical procedures require lots of computing resources; for example, figuring out the interactions in a fusion plasma test. Think in terms of weeks of calculation. Bummer. Most folks believe that the cloud and other semi-magical marketing buzzwords have made supercomputers as fast as those in a sci-fi movie. Wrong, gentle reader. There are computational issues. Big O?
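To make the Big O point concrete, consider a toy pairwise-interaction kernel of the sort a plasma or N-body simulation repeats millions of times. The code below is our own illustration, not anyone’s production solver; it simply shows that doubling the particle count roughly quadruples the work:

```python
# A toy O(n^2) pairwise-interaction step. Purely illustrative.
import random
import time

def pairwise_step(positions):
    """Accumulate a dummy inverse-square interaction for every pair."""
    n = len(positions)
    forces = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(positions[i] - positions[j]) + 1e-9  # avoid divide-by-zero
            f = 1.0 / (d * d)
            forces[i] += f
            forces[j] -= f
    return forces

for n in (1_000, 2_000, 4_000):
    pts = [random.random() for _ in range(n)]
    start = time.perf_counter()
    pairwise_step(pts)
    # Runtime roughly quadruples each time n doubles: that is Big O at work.
    print(f"n={n}: {time.perf_counter() - start:.2f}s")
```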
I read with interest “The Future of High Performance Computing Has Arrived.” The write up does not do too much with the GPU methods, the brute force methods, or the “quantum” hopes and dreams.
Nope.
The write up makes its point with a nifty diagram bearing many Intel labels:
Intel is tightly integrating the technologies at both the component and system levels, to create a highly efficient and capable infrastructure. One of the outcomes of this level of integration is how it scales across both the node and the system. The result is that it essentially raises the center of gravity of the memory pyramid and makes it fatter, which will enable faster and more efficient data movement.
I like the mathy center of gravity lingo. It reminds me of the “no gravity” buzzword from 15 years ago.
Allegedly Moore’s Law is dead. Maybe? Maybe not? But as long as we are geared up with Von Neumann’s saddles and bits, Intel is going to ride that pony.
Gentle reader, we need much more computing horse power. Is it time to look for a different horse to ride? Intel does not agree.
Stephen E Arnold, February 29, 2016
New Tor Communication Software for Journalists and Sources Launches
February 29, 2016
A new one-to-one messaging tool for journalists has launched after two years in development. The article Ricochet uses power of the dark web to help journalists, sources dodge metadata laws from The Age describes this new darknet-based software. What sets Ricochet apart from other tools journalists use, such as Wickr, is that it routes communications through Tor rather than through a central server. Advocates acknowledge the risk of this Dark Web software being used for criminal activity, but they assert the aim is to give sources and whistleblowers an anonymous channel for securely releasing information to journalists without exposure. The article explains,
“Dr Dreyfus said that the benefits of making the software available would outweigh any risks that it could be used for malicious purposes such as cloaking criminal and terrorist operations. “You have to accept that there are tools, which on balance are a much greater good to society even though there’s a tiny possibility they could be used for something less good,” she said. Mr Gray argued that Ricochet was designed for one-to-one communications that would be less appealing to criminal and terrorist organisers that need many-to-many communications to carry out attacks and operations. Regardless, he said, the criminals and terrorists had so many encryption and anonymising technologies available to them that pointing fingers at any one of them was futile.”
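For the technically inclined, the serverless pattern works roughly like this: each user publishes a Tor onion (hidden) service, and contacts connect to it directly, so no intermediary ever holds the messages or the metadata. Here is a rough Python sketch using the real stem controller library; the port numbers and the bare-bones listener are illustrative assumptions on our part, not Ricochet’s actual implementation:

```python
# A rough sketch of the serverless, Tor-based pattern: publish an ephemeral
# onion service and let a contact connect to it directly. Requires a local
# Tor daemon with its control port enabled; details are illustrative.
import socket
from stem.control import Controller

LOCAL_PORT = 12345  # assumption: where our toy listener runs

with Controller.from_port(port=9051) as controller:  # Tor's control port
    controller.authenticate()
    # Ask Tor to publish a hidden service forwarding onion port 80 to us.
    service = controller.create_ephemeral_hidden_service(
        {80: LOCAL_PORT}, await_publication=True
    )
    print(f"Share this address with your source: {service.service_id}.onion")

    # Minimal listener: a contact connecting over Tor lands here, with no
    # central server (and no server logs) in between.
    srv = socket.socket()
    srv.bind(("127.0.0.1", LOCAL_PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    print("Peer says:", conn.recv(1024).decode(errors="replace"))
    conn.close()
```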
Demand for online anonymity is on the rise, as evidenced by the recent launch of several new Tor-based tools like Ricochet, alongside Wickr and consumer-oriented apps like Snapchat. The Dark Web’s user base appears to be growing and diversifying. Will public perception follow suit?
Megan Feil, February 29, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Startup Semantic Machines Scores Funding
February 26, 2016
A semantic startup looks poised for success with experienced executives and a hefty investment, we learn from “Artificial Intelligence Startup Semantic Machines Raises $12.3 Million” at VentureBeat. Backed by investors from Bain Capital Ventures and General Catalyst Partners, the enterprise focuses on deep learning and improved speech recognition. The write-up reveals:
“Last year, Semantic Machines named Larry Gillick as its chief technology officer. Gillick was previously chief speech scientist for Siri at Apple. Now Semantic Machines is looking to go further than Siri and other personal digital assistants currently on the market. ‘Semantic Machines is developing technology that goes beyond understanding commands, to understanding conversations,’ the startup says on its website. ‘Our Conversational AI represents a powerful new paradigm, enabling computers to communicate, collaborate, understand our goals, and accomplish tasks.’ The startup is building tools that third-party developers will be able to use.”
Launched in 2014, Semantic Machines is based in Newton, Massachusetts, with offices in Berkeley and Boston. The startup is also seeking to hire a few researchers and engineers, in case anyone is interested.
Cynthia Murrell, February 26, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
More Hacked US Voter Data Appears on the Dark Web
February 25, 2016
From HackRead comes a piece called More US Voters Data Circulating on the Dark Net, which points to the lack of protection surrounding data on US voters. The data was leaked on a Dark Web site called The Hell. No reports yet explain how the data was obtained. While no social security numbers or other highly sensitive information was released, the records include names, dates of birth, voter registration dates, voting records, political affiliations, and addresses. The article’s author explains the implications,
“However, it provides any professional hacker substantial information to initiate and plan a phishing attack in the next election which takes place in the US. Recent discoveries, news and speculations have exposed the role of nation-state actors and cyber criminals in planning, instigating and initiating hacking attacks aimed at maligning the upcoming US elections. While social media has emerged as one of the leading platforms adopted by politicians when they wish to spread a certain message or image, cyber criminals and non-state actors are also utilizing the online platform to plan and initiate their hacking attacks on the US election.”
As the article reminds us, this is not the first instance of voter records leaking. Such leaks call into question how this keeps happening and make us wonder what preventative measures, if any, are in place. The last thing public perception of voting needs is the notion that registering puts one at risk of cyber attacks. Aren’t there already enough barriers in place to keep individuals from voting?
Megan Feil, February 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Brown Dog Fetches Buried Data
February 25, 2016
Outdated file formats, particularly those with no metadata, are especially difficult to search and utilize. The National Science Foundation (NSF) reports on a new search engine designed to plumb the unstructured Web in “Brown Dog: A Search Engine for the Other 99 Percent (of Data).” With the help of a $10 million award from the NSF, a team at the University of Illinois-based National Center for Supercomputing Applications (NCSA) has developed two complementary services. Writer Aaron Dubrow explains:
“The first service, the Data Access Proxy (DAP), transforms unreadable files into readable ones by linking together a series of computing and translational operations behind the scenes. Similar to an Internet gateway, the configuration of the Data Access Proxy would be entered into a user’s machine settings and then forgotten. From then on, data requests over HTTP would first be examined by the proxy to determine if the native file format is readable on the client device. If not, the DAP would be called in the background to convert the file into the best possible format….
“The second tool, the Data Tilling Service (DTS), lets individuals search collections of data, possibly using an existing file to discover other similar files in the data. Once the machine and browser settings are configured, a search field will be appended to the browser where example files can be dropped in by the user. Doing so triggers the DTS to search the contents of all the files on a given site that are similar to the one provided by the user…. If the DTS encounters a file format it is unable to parse, it will use the Data Access Proxy to make the file accessible.”
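The Data Access Proxy’s trick of “linking together a series of computing and translational operations” amounts to path-finding over a graph of converters. Here is a hypothetical Python sketch of that planning step; the converter registry, tool names, and formats are invented for illustration and are not Brown Dog’s actual interfaces:

```python
# A hypothetical sketch of the Data Access Proxy's planning step: find a chain
# of converters that turns an unreadable format into one the client can open.
# The registry and tool names are invented for illustration.
CONVERTERS = {
    ("application/x-lotus-wordpro", "application/vnd.oasis.opendocument.text"): "lwp_to_odt",
    ("application/vnd.oasis.opendocument.text", "application/pdf"): "odt_to_pdf",
}
READABLE = {"application/pdf", "text/plain", "text/html"}

def plan_conversion(source_type, readable=READABLE):
    """Breadth-first search for the shortest converter chain to a readable type."""
    frontier = [(source_type, [])]
    seen = {source_type}
    while frontier:
        current, chain = frontier.pop(0)
        if current in readable:
            return chain
        for (src, dst), tool in CONVERTERS.items():
            if src == current and dst not in seen:
                seen.add(dst)
                frontier.append((dst, chain + [tool]))
    return None  # no path found; the file stays unreadable

print(plan_conversion("application/x-lotus-wordpro"))
# -> ['lwp_to_odt', 'odt_to_pdf']
```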
See the article for more on these services, which NCSA’s Kenton McHenry likens to a DNS for data. Brown Dog falls under NSF’s Data Infrastructure Building Blocks program, which supports development work that advances the field of data science.
Cynthia Murrell, February 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
No Evidence That Terrorists Are Using Bitcoin
February 23, 2016
If you were concerned that virtual currencies like Bitcoin are making things easier for the Islamic State (aka IS, ISIS, ISIL, or Daesh), you can rest easy, at least for now. The International Business Times reports, “Isis: Bitcoin Not Used by Daesh.” That is the conclusion of a Europol investigation performed after last November’s attacks in Paris. Though some had suggested the terrorists were being funded with cyber money, investigators found no evidence of it.
On the other hand, the organization’s communication networks are thriving online through the Dark Web and a variety of apps. Writer Alistair Charlton tells us:
Better known by European law enforcement is how terrorists like IS use social media to communicate. The report says: “The internet and social media are used for communication and the acquisition of goods (weapons, fake IDs) and services, made relatively safe for terrorists with the availability of secure and inherently encrypted appliances, such as WhatsApp, Skype and Viber. In Facebook, VKA and Twitter they join closed and hidden groups that can be accessed by invitation only, and use coded language.”
Use of Tor, the anonymising browser used to access the dark web where sites are hidden from search engines like Google, is also acknowledged by Europol. “The use of encryption and anonymising tools prevent conventional observation by security authorities. There is evidence of a level of technical knowledge available to religiously inspired terrorist groups, allowing them to make their use of the internet and social media invisible to intelligence and law enforcement agencies.”
Of course, like any valuable technology, anonymizing apps can be used for weal or woe; they benefit marginalized peoples trying to make their voices heard as much as they do terrorists. Besides, there is no going back to a disconnected world now. My question is whether terrorists have taken the suggestion, and are now working on a Bitcoin initiative. I suppose we will see, eventually.
Cynthia Murrell, February 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Study Determines Sad News for People Who Look on Facebook “Likes” as Friendship
February 23, 2016
The article on the Independent titled Facebook Friends Are Almost Entirely Fake, Study Finds illuminates the cold, cold world of Facebook. According to the study, out of the hundreds of “friends” accumulated on Facebook, typically only about four are true-blue buds. Most of them are not interested in your life or sympathetic to your problems, and 2% are actively trying to stab you in the back. I may have made up that last figure, but you get the picture. The article tells us,
“The average person studied had around 150 Facebook friends. But only about 14 of them would express sympathy in the event of anything going wrong. The average person said that only about 27 per cent of their Facebook friends were genuine. Those numbers are mostly similar to how friendships work in real life, the research said. But the huge number of supposed friends on a friend list means that people can be tricked into thinking that they might have more close friends.”
This is particularly bad news considering how Facebook has opened the gates to all populations, meaning that most people have family members on the site in addition to friends. Aunt Mary may have knit you a sweater for Christmas, but she really isn’t interested in your status update about running into your ex and his new girlfriend. If this article teaches us anything, it’s that you should look offline for your real relationships.
Chelsea Kerwin, February 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Pros and Cons of Data Silos When It Comes to Data Analysis and Management
February 22, 2016
The article on the Informatica Blog titled Data Silos Are the Death of Analytics. Here’s the Fix explores the often overlooked need for a thorough data management vision and strategy at any competitive business. The article plugs an eBook guide to data analytics, but it does go into some detail on the early stages of streamlining a data management approach, summarized by the advice to avoid data silos. The article explains,
“It’s vital to pursue a data management architecture that works across any type of data, BI tool, or storage technology. If the move to add Hadoop or NoSQL demands entirely different tools to manage the data, you’re at risk of creating another silo…When you’ve got different tools for your traditional data warehouse versus your cloud setup, and therefore different skill sets to hire for, train for, and maintain, you’re looking at a real mess.”
The suggestions for streamlined processes and analysis certainly make sense, but the article does not address the understandable motives behind data silos, such as power, control, and secrecy. Nor does it consider that in some cases a firm is required to create data silos to comply with a government contract. But it is a nice thought: one big collection of data, one comprehensive data strategy. Maybe.
Chelsea Kerwin, February 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph