Microsoft: On the Bandwagon Singing Me Too
November 30, 2016
In my dead tree copy of the November 21, 2016, New York Times (which just reported a modest drop in profits), I read a bit of fluff called “Microsoft Spends Big to Build a Computer Out of Science Fiction.” (If you have to pay to view the source, don’t honk at Beyond Search. Let your favorite national newspaper know directly.)
The main point of the PR piece was to make clear that Microsoft is not lagging behind the Alphabet Google thing in quantum computing. Also, Microsoft is not forking over a measly couple of hundred bucks. Nope, Microsoft is spending “big.” I learned from the write up:
There is a growing optimism in the tech world that quantum computers, super powerful devices that were once the stuff of science fiction, are possible — and may even be practical.
I think “spending” is a nice way to say “betting.”
I learned:
In the exotic world of quantum physics, Microsoft has set itself apart from its competitors by choosing a different path. The company’s approach is based on “braiding” particles known as anyons — which physicists describe as existing in just two dimensions — to form the building blocks of a supercomputer that would exploit the unusual physical properties of subatomic particles.
One problem. The Google D-Wave gizmos are not exactly ready for use in your mobile phone. The Microsoft approach is the anyon, and it is anyone’s guess whether the Microsofties can make the gizmo do something useful for opening Word or, like IBM, treat cancer or, like Google, “solve death.”
Where on the journey to the anyon is Microsoft? This sentence suggests that Microsoft is just about ready to start thinking about planning a trip down computing lane:
“Once we get the first qubit figured out, we have a road map that allows us to go to thousands of qubits in a rather straightforward way,” Mr. Holmdahl [a Microsoftie who has avoided termination] said.
Yep, get those qubits working and then one can solve problems in quantum physics or perhaps get Microsoft Word’s auto numbering system to work. Me too, me too. Do you hear the singing? I do.
Stephen E Arnold, November 30, 2016
Emphasize Data Suitability over Data Quantity
November 30, 2016
It seems obvious to us, but apparently, some folks need a reminder. Harvard Business Review proclaims, “You Don’t Need Big Data, You Need the Right Data.” Perhaps that distinction has gotten lost in the Big Data hype. Writer Maxwell Wessel points to Uber as an example. Though the company does collect a lot of data, the key is in which data it collects, and which it does not. Wessel explains:
In an era before we could summon a vehicle with the push of a button on our smartphones, humans required a thing called taxis. Taxis, while largely unconnected to the internet or any form of formal computer infrastructure, were actually the big data players in rider identification. Why? The taxi system required a network of eyeballs moving around the city scanning for human-shaped figures with their arms outstretched. While it wasn’t Intel and Hewlett-Packard infrastructure crunching the data, the amount of information processed to get the job done was massive. The fact that the computation happened inside of human brains doesn’t change the quantity of data captured and analyzed. Uber’s elegant solution was to stop running a biological anomaly detection algorithm on visual data — and just ask for the right data to get the job done. Who in the city needs a ride and where are they? That critical piece of information let the likes of Uber, Lyft, and Didi Chuxing revolutionize an industry.
In order for businesses to decide which data is worth their attention, the article suggests three guiding questions: “What decisions drive waste in your business?” “Which decisions could you automate to reduce waste?” (Example—Amazon’s pricing algorithms) and “What data would you need to do so?” (Example—Uber requires data on potential riders’ locations to efficiently send out drivers.) See the article for more notes on each of these guidelines.
Cynthia Murrell, November 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Wisdom from the First O’Reilly AI Conference
November 28, 2016
Forbes contributor Gil Press nicely correlates and summarizes the insights he found at September’s inaugural O’Reilly AI Conference, held in New York City, in his article, “12 Observations About Artificial Intelligence from the O’Reilly AI Conference.” He begins:
At the inaugural O’Reilly AI conference, 66 artificial intelligence practitioners and researchers from 39 organizations presented the current state-of-AI: From chatbots and deep learning to self-driving cars and emotion recognition to automating jobs and obstacles to AI progress to saving lives and new business opportunities. … Here’s a summary of what I heard there, embellished with a few references to recent AI news and commentary.
Here are Press’ 12 observations; check out the article for details on any that spark your interest: “AI is a black box—just like humans”; “AI is difficult”; “The AI driving driverless cars is going to make driving a hobby. Or maybe not”; “AI must consider culture and context”; “AI is not going to take all our jobs”; “AI is not going to kill us”; “AI isn’t magic and deep learning is a useful but limited tool”; “AI is Augmented Intelligence”; “AI changes how we interact with computers—and it needs a dose of empathy”; “AI should graduate from the Turing Test to smarter tests”; “AI according to Winston Churchill”; and “AI continues to be possibly hampered by a futile search for human-level intelligence while locked into a materialist paradigm.”
It is worth contemplating the point Press saved for last—are we even approaching this whole AI thing from the most productive angle? He ponders:
Is it possible that this paradigm—and the driving ambition at its core to play God and develop human-like machines—has led to the infamous ‘AI Winter’? And that continuing to adhere to it and refusing to consider ‘genuinely new ideas,’ out-of-the-dominant-paradigm ideas, will lead to yet another AI Winter? Maybe, just maybe, our minds are not computers and computers do not resemble our brains? And maybe, just maybe, if we finally abandon the futile pursuit of replicating ‘human-level AI’ in computers, we will find many additional–albeit ‘narrow’–applications of computers to enrich and improve our lives?
I think Press is on to something. Perhaps we should admit that anything approaching Rosie the Robot is still decades away (according to conference presenter Oren Etzioni). At this early date, we may do well to accept and applaud specialized AIs that do one thing very well but are completely ignorant of everything else. After all, our Roombas are unlikely to attempt conquering the world.
Cynthia Murrell, November 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Machine Learning Does Not Have All the Answers
November 25, 2016
Despite our broader knowledge, we still believe that if we press a few buttons and hit Enter, computers can do all the work for us. The advent of machine learning and artificial intelligence does not dispel this belief; instead, big data vendors rely on this image to sell their wares. Big data, though, has its weaknesses, and before you deploy a solution you should read Network World’s “6 Machine Learning Misunderstandings.”
Drawing on the expertise of Juniper Networks security intelligence software engineer Roman Sinayev, the article explains some of the pitfalls to avoid before implementing big data technology. It is important to take into consideration all the variables, including unexpected ones; otherwise, that one forgotten factor could wreak havoc on your system. Also, do not forget to actually understand the data you are analyzing and its origin. Pushing forward on a project without understanding the data background is a guaranteed fail.
Other practical advice is to build a test model and to add more data when the model does not deliver, but some advice that is new even to us is:
One type of algorithm that has recently been successful in practical applications is ensemble learning – a process by which multiple models combine to solve a computational intelligence problem. One example of ensemble learning is stacking simple classifiers like logistic regressions. These ensemble learning methods can improve predictive performance more than any of these classifiers individually.
Employing more than one algorithm? It makes sense and is practical advice; why did that not cross our minds? The rest of the advice offered is general stuff that can be applied to any project in any field; just change the lingo and the expert providing it.
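The combine-multiple-models idea in the quote can be shown with a toy sketch. Note the swap: the quote describes stacking logistic regressions, while the example below uses majority voting, a simpler cousin of stacking, and the three rule-based classifiers are invented for illustration. Each is wrong on a different slice of the data, so the vote outperforms any single one.

```python
def accuracy(clf, data):
    """Fraction of examples labeled correctly (the true label is always 1 here)."""
    return sum(clf(x) for x in data) / len(data)

def majority_vote(classifiers):
    """Combine classifiers into an ensemble via simple majority vote."""
    def ensemble(x):
        votes = sum(clf(x) for clf in classifiers)
        return 1 if 2 * votes > len(classifiers) else 0
    return ensemble

# Three weak classifiers, each wrong on a different third of the inputs.
c1 = lambda x: 0 if x % 3 == 0 else 1
c2 = lambda x: 0 if x % 3 == 1 else 1
c3 = lambda x: 0 if x % 3 == 2 else 1

data = list(range(9))  # every example's true label is 1
ensemble = majority_vote([c1, c2, c3])

# Each individual classifier is right only 2/3 of the time...
print([round(accuracy(c, data), 2) for c in (c1, c2, c3)])  # [0.67, 0.67, 0.67]
# ...but for every input, two of the three agree on the correct label.
print(accuracy(ensemble, data))  # 1.0
```

The mechanism is the same one that makes stacked classifiers pay off: errors that are uncorrelated across models cancel out in the combination.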
Whitney Grace, November 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Noble Quest Behind Semantic Search
November 25, 2016
A brief write-up at the ontotext blog, “The Knowledge Discovery Quest,” presents a noble vision of the search field. Philologist and blogger Teodora Petkova observed that semantic search is the key to bringing together data from different sources and exploring connections. She elaborates:
On a more practical note, semantic search is about efficient enterprise content usage. As one of the biggest losses of knowledge happens due to inefficient management and retrieval of information. The ability to search for meaning not for keywords brings us a step closer to efficient information management.
If semantic search had a separate icon from the one traditional search has it would have been a microscope. Why? Because semantic search is looking at content as if through the magnifying lens of a microscope. The technology helps us explore large amounts of systems and the connections between them. Sharpening our ability to join the dots, semantic search enhances the way we look for clues and compare correlations on our knowledge discovery quest.
At the bottom of the post is a slideshow on this “knowledge discovery quest.” Sure, it also serves to illustrate how ontotext could help, but we can’t blame them for drumming up business through their own blog. We actually appreciate the company’s approach to semantic search, and we’d be curious to see how they manage the intricacies of content conversion and normalization. Founded in 2000, ontotext is based in Bulgaria.
Cynthia Murrell, November 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Keeping Current with Elastic.co
November 24, 2016
Short honk. If you want to keep up with Elastic and Elasticsearch, the company’s “This Week in Elasticsearch and Apache Lucene” may be of interest. The weekly posting includes information about commits, releases, and training. Unlike the slightly crazed, revenue-challenged open source search vendors, Elastic.co provides factual information about the plumbing for the search and retrieval system. We found the “Ongoing Changes” section useful and interesting. The idea is that one can keep track of certain features, methods, and issues by scanning a list. The short description of an issue, for instance, includes a link to additional information. Highly recommended for those hooked on Elastic.co’s free and open source solution or the for-fee products and services the company offers.
Stephen E Arnold, November 24, 2016
Do Not Forget to Show Your Work
November 24, 2016
Showing work is a messy but necessary step to prove how one arrived at a solution. Most of the time it is never reviewed, but with big data, people wonder how computer algorithms arrive at their conclusions. Engadget explains that computers are being forced to prove their results in “MIT Makes Neural Networks Show Their Work.”
Understanding neural networks is extremely difficult, but MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a way to map the complex systems. CSAIL accomplished the task by splitting networks into two smaller modules: one extracts text segments and scores them according to their length and coherence, and the second predicts each segment’s subject and attempts to classify it. The mapping modules sound almost as complex as the actual neural networks. To alleviate the stress and add a giggle to their research, CSAIL had the modules analyze beer reviews:
For their test, the team used online reviews from a beer rating website and had their network attempt to rank beers on a 5-star scale based on the brew’s aroma, palate, and appearance, using the site’s written reviews. After training the system, the CSAIL team found that their neural network rated beers based on aroma and appearance the same way that humans did 95 and 96 percent of the time, respectively. On the more subjective field of “palate,” the network agreed with people 80 percent of the time.
One set of data is as good as another to test CSAIL’s network mapping tool. CSAIL hopes to fine-tune the machine learning project and use it in breast cancer research to analyze pathology data.
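The extract-then-predict split described above can be sketched in miniature. Everything here is invented for illustration: the sentence splitting, the tiny aspect lexicon, and the brevity-based scoring are hand-written stand-ins, whereas CSAIL’s actual modules are trained neural networks.

```python
import re

# Hypothetical aspect lexicon; a real system would learn these associations.
LEXICON = {
    "aroma": {"smell", "aroma", "scent"},
    "appearance": {"color", "hazy", "golden", "pours"},
    "palate": {"mouthfeel", "smooth", "crisp", "palate"},
}

def extract_segments(review):
    """Module 1: pull out sentences that mention an aspect,
    scoring shorter (more focused) sentences higher."""
    sentences = [s.strip() for s in re.split(r"[.!?]", review) if s.strip()]
    scored = [
        (1.0 / len(s.split()), s)
        for s in sentences
        if any(set(s.lower().split()) & vocab for vocab in LEXICON.values())
    ]
    return [s for _, s in sorted(scored, reverse=True)]

def predict_aspect(segment):
    """Module 2: classify which aspect the extracted segment discusses."""
    words = set(segment.lower().split())
    return max(LEXICON, key=lambda aspect: len(words & LEXICON[aspect]))

review = "This beer pours a hazy golden color. The smell is faint. I drank it fast."
segments = extract_segments(review)
print(segments[0], "->", predict_aspect(segments[0]))  # The smell is faint -> aroma
```

The point of the two-module design survives even in this toy: because the predictor only ever sees the extracted segment, the segment itself is the system’s visible justification for its answer.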
Whitney Grace, November 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Dawn of Blockchain Technology
November 24, 2016
Though blockchain technology currently powers Bitcoin and other cryptocurrencies, it might soon find takers in mainstream commercial activities.
Blockgeeks, in an in-depth guide titled “What Is Blockchain Technology? A Step-By-Step Guide for Beginners,” says:
The blockchain is an incorruptible digital ledger of economic transactions that can be programmed to record not just financial transactions but virtually everything of value.
Without getting into how the technology works, it would be interesting to know how and where the revolutionary technology can be utilized. Because it resists tampering by human intervention and is not centralized, blockchain has numerous applications in banking, remittances, the shared economy, crowdfunding, and more; the list is just endless.
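The “incorruptible ledger” idea boils down to each block carrying a cryptographic hash of its own contents plus the previous block’s hash, so altering any record breaks every link after it. A minimal sketch (a toy, not a real blockchain: there is no consensus, mining, or networking here):

```python
import hashlib
import json

def block_hash(data, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append_block(chain, data):
    """Add a block that links back to the current tip of the chain."""
    prev = chain[-1]["hash"] if chain else "0" * 64  # genesis block links to zeros
    chain.append({"data": data, "prev": prev, "hash": block_hash(data, prev)})

def verify(chain):
    """Recompute every hash and check each block links to its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["data"], block["prev"]):
            return False
    return True

ledger = []
append_block(ledger, "Alice pays Bob 5")
append_block(ledger, "Bob pays Carol 2")
print(verify(ledger))        # True
ledger[0]["data"] = "Alice pays Bob 500"  # tamper with history
print(verify(ledger))        # False: the stored hash no longer matches
```

Because every participant in a real blockchain holds a copy of the chain, a tamperer would have to rewrite and redistribute every subsequent block, which is the property the Blockgeeks quote is gesturing at.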
The technology will be especially helpful for people who transact over the Web, and as the article points out:
Goldman Sachs believes that blockchain technology holds great potential especially to optimize clearing and settlements, and could represent global savings of up to $6bn per year.
Governments and commercial establishments, however, are apprehensive about it, as blockchain might end their control over a multitude of things, precisely because blockchain never stores data in one location. This is also the reason Bitcoin has yet to gain full acceptance. But can a driving force like blockchain technology, which will empower the actual users, be stopped?
Vishal Ingole, November 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Writing That Is Never Read
November 23, 2016
It was inevitable in college that you were forced to write an essay. Writing an essay usually requires citing various sources from scholarly journals. As you perused the academic articles, the thought probably crossed your mind: who ever reads this stuff? Smithsonian Magazine tells us who in the article, “Academics Write Papers Arguing Over How Many People Read (And Cite) Their Papers.” In other words, themselves.
Academic articles are read mostly by their authors, journal editors, and students forced to cite them for assignments. In perfect scholarly fashion, many academics do not believe that their work has a limited scope. So what do they do? They write about it, and they have done so for twenty years.
Most academics are not surprised that most written works go unread. The common belief is that it is better to publish something rather than nothing and it could also be a requirement to keep their position. As they are prone to do, academics complain about the numbers and their accuracy:
It seems like this should be an easy question to answer: all you have to do is count the number of citations each paper has. But it’s harder than you might think. There are entire papers themselves dedicated to figuring out how to do this efficiently and accurately. The point of the 2007 paper wasn’t to assert that 50 percent of studies are unread. It was actually about citation analysis and the ways that the internet is letting academics see more accurately who is reading and citing their papers. “Since the turn of the century, dozens of databases such as Scopus and Google Scholar have appeared, which allow the citation patterns of academic papers to be studied with unprecedented speed and ease,” the paper’s authors wrote.
Academics always need something to argue about, no matter how minuscule the topic. This particular article concludes on the note that someone should get the numbers straight so academics can move on to another item to argue about. Going back to the original thought, the student forced to write an essay with citations probably also thought: the reason this stuff does not get read is that it is so boring.
Whitney Grace, November 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Exit Shakespeare, for He Had a Coauthor
November 22, 2016
Shakespeare is regarded as the greatest writer in the English language. Many studies, however, are devoted to the theory that he did not pen all of his plays and poems. Some attribute them to Francis Bacon, Edward de Vere, Christopher Marlowe, and others. Whether Shakespeare was a singular author or one of many, two facts remain: he was a dirty old man, and it could be said he plagiarized his ideas from other writers. Shall he still be regarded as the figurehead of English literature?
Philly.com takes up the Shakespeare authorship question in the article, “Penn Engineers Use Big Data To Show Shakespeare Had Coauthor On ‘Henry VI’ Plays.” Editors of a new edition of Shakespeare’s complete works listed Marlowe as a coauthor on the Henry VI plays due to a recent study at the University of Pennsylvania. Alejandro Ribeiro realized that his experience researching networks could be applied to the Shakespeare authorship question using big data.
Ribeiro learned that Henry VI was among the works for which scholars thought Shakespeare might have had a co-author, so he and lab members Santiago Segarra and Mark Eisen tackled the question with the tools of big data. Working with Shakespeare expert Gabriel Egan of De Montfort University in Leicester, England, they analyzed the proximity of certain target words in the playwright’s works, developing a statistical fingerprint that could be compared with those of other authors from his era.
Two other research groups reached the same conclusion using other analytical techniques. The results from all three studies were enough to convince Gary Taylor, lead general editor of the New Oxford Shakespeare, who decided to list Marlowe as a coauthor of Henry VI. More research has been conducted to identify other potential Shakespeare coauthors, and six more will also be credited in the New Oxford editions.
Ribeiro and his team created “word-adjacency networks” that uncovered patterns in the writing styles of Shakespeare and six other dramatists. They discovered that many scenes in Henry VI were not written in Shakespeare’s style, enough to establish a coauthor.
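The flavor of the method can be conveyed with a small sketch. To be clear about what is invented here: the Penn team built function-word adjacency networks and compared them with entropy-based measures, while the marker list, distance weighting, and L1 comparison below are simplified stand-ins chosen for brevity.

```python
from collections import defaultdict

MARKERS = ["the", "and", "to", "of", "a", "in"]  # tiny function-word list for illustration

def adjacency_profile(text, window=4):
    """Count marker-word pairs occurring within `window` tokens of each other,
    weighting closer pairs more, then normalize to a probability profile."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    counts = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok not in MARKERS:
            continue
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] in MARKERS:
                counts[(tok, tokens[j])] += 1.0 / (j - i)
    total = sum(counts.values()) or 1.0
    return {pair: c / total for pair, c in counts.items()}

def profile_distance(p, q):
    """L1 distance between two profiles: a crude stylistic fingerprint comparison."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

a = adjacency_profile("the king and the queen rode to the gates of the city")
b = adjacency_profile("a plague in the town and a fire in the port of a kingdom")
print(profile_distance(a, a))       # 0.0: identical texts share a fingerprint
print(profile_distance(a, b) > 0)   # True: different texts pattern differently
```

The intuition is that authors lean on function words unconsciously and consistently, so a scene whose marker-word profile sits far from Shakespeare’s fingerprint, and close to Marlowe’s, is evidence of a different hand.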
Some Shakespeare purists remain opposed to the theory that Shakespeare did not pen all of his plays, but big data analytics supports many of the theories that other academics have advanced for generations. The dirty old man was not alone as he wrote his ditties.
Whitney Grace, November 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

