Talk to Text: Problem. What Problem?
April 15, 2016
I marvel at the baloney I read about smart software. The most effective systems blend humans with sort of smart software. The interaction of the human with the artificial intelligence can speed some work processes. But right now, I am not sure that I want a smart software driven automobile to navigate near the bus on which I am riding. I don’t need smart automobile keys which don’t work when the temperature drops, do you? I am not keen on reading about the wonders of IBM Watson type systems when IBM struggles to generate revenue.
I read “Why Our Crazy-Smart AI Still Sucks at Transcribing Speech.” Frankly, I was surprised by the candor about the difficulty software has in figuring out human speech. I highlighted this passage:
“If you have people transcribe conversational speech over the telephone, the error rate is around 4 percent,” says Xuedong Huang, a senior scientist at Microsoft, whose Project Oxford has provided a public API for budding voice recognition entrepreneurs to play with. “If you put all the systems together—IBM and Google and Microsoft and all the best combined—amazingly the error rate will be around 8 percent.” Huang also estimates commercially available systems are probably closer to 12 percent. “This is not as good as humans,” Huang admits, “but it’s the best the speech community can do. It’s about as twice as bad as humans.”
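The quoted passage does not say how its "error rate" is calculated. Speech recognition results are usually reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. Assuming that is the metric Huang means, here is a minimal sketch of the calculation. The function name and the sample sentences are illustrative, not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: lowercase, whitespace-split words. Real evaluations
    normalize text (punctuation, numerals) far more carefully.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Two wrong words in a nine-word sentence already puts this toy transcript well above the 12 percent Huang cites for commercial systems, which gives a feel for what those percentages mean in practice.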
I suggest you read the article. My view is that speech recognition is just one area which requires more time, effort, research, and innovation.
The situation today is that as vendors struggle to prove their relevance and importance to investors, many companies are struggling to generate sustainable revenue. In case anyone has not noticed, Microsoft’s smart system Tay was a source of humor and outrage. IBM Watson spends more on marketing the wonders of its Lucene, acquired technology, and home brew confection than many companies earn in a year.
There are folks who insist that speech to text is not that hard. It may not be hard, but this one tiny niche in the search and content processing sector seems to be lagging. Hyperbole, assurance, and marketing depict one reality. The software often delivers a different one.
Who is the leader? The write up points out:
…most transcription start-ups seem to be mainly licensing Google’s API and going from there.
Yep, the Alphabet Google thing.
Stephen E Arnold, April 15, 2016
First Surface Web Map of the Dark Web
April 15, 2016
Interested in a glimpse of the Dark Web without downloading Tor and navigating it yourself? E-Forensics Magazine published Peeling back the onion part 1: Mapping the Dark Web by Stuart Peck, which shares an overview of services and content in this anonymity-oriented internet. A new map covering the contents of the Dark Web, the first one to do so, was launched recently by Intelliagg, a threat intelligence service and ZeroDayLab key partner. The write-up explains,
“But this brings me to my previous point why is this map so important? Until recently, it had been difficult to understand the relationships between hidden services, and more importantly the classification of these sites. As a security researcher, understanding hidden services, such as private chat forums and closed sites, and how these are used to plan and discuss potential campaigns, such as DDoS, Ransom Attacks, Kidnapping, Hacking, and Trading of Vulnerabilities and leaked data, is key to protecting our clients through proactive threat intelligence.”
Understanding the layout of an online ecosystem is an important first step for researchers or related business ventures. But what about a visualization showing how these web services are connected to functions, such as financial and other services, with brick-and-mortar establishments? It is also important to note that while this may be the first Surface Web map of the Dark Web, many navigational “maps” have existed on .onion sites for as long as users have been browsing on Tor.
Megan Feil, April 15, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Microsoft Azure Plans Offer Goldilocks and Three Bears Strategy to Find Perfect Fit
April 15, 2016
The article on eWeek titled Microsoft Debuts Azure Basic Search Tier relates the perks of the new plan from Microsoft, namely, that it is cheaper than the others. At $75 per month (and currently half off for the preview period, so get it while it’s hot!) the Basic Azure plan has lower capacity when it comes to indexing, but that is the intention. The completely Free plan enables indexing of 10,000 documents and allows for 50 megabytes of storage, while the new Basic plan goes up to a million documents. The more expensive Standard plan costs $250/month and provides for up to 180 million documents and 300 gigabytes of storage. The article explains,
“The new Basic tier is Microsoft’s response to customer demand for a more modest alternative to the Standard plans, said Liam Cavanagh, principal program manager of Microsoft Azure Search, in a March 2 announcement. “Basic is great for cases where you need the production-class characteristics of Standard but have lower capacity requirements,” he stated. Those production-class capabilities include dedicated partitions and service workloads (replicas), along with resource isolation and service-level agreement (SLA) guarantees, which are not offered in the Free tier.”
So just how efficient is Azure? Cavanagh stated that his team measured the indexing performance at 15,000 documents per minute (although he also stressed that this was with batches organized into groups of 1,000 documents). With this new plan, Microsoft continues to build out its cloud’s search capabilities.
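To put Cavanagh's figure in perspective, here is a quick back-of-the-envelope sketch. It assumes the measured 15,000 documents per minute holds steady for a whole indexing job, which the article does not claim, and the helper names are made up for illustration.

```python
import math

DOCS_PER_MINUTE = 15_000  # throughput reported by Cavanagh
BATCH_SIZE = 1_000        # batch size he said the measurement used


def batches_needed(doc_count: int) -> int:
    """Number of 1,000-document batches required to submit doc_count documents."""
    return math.ceil(doc_count / BATCH_SIZE)


def minutes_to_index(doc_count: int) -> float:
    """Estimated minutes to index doc_count documents at the reported rate."""
    return doc_count / DOCS_PER_MINUTE


# Document ceilings for the Free and Basic plans, as given in the article.
for plan, limit in [("Free", 10_000), ("Basic", 1_000_000)]:
    print(f"{plan}: {batches_needed(limit)} batches, "
          f"about {minutes_to_index(limit):.1f} minutes")
```

At that rate, filling the Basic plan's million-document ceiling would take 1,000 batches and a little over an hour.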
Chelsea Kerwin, April 15, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Free Book? Semantic Mining of Social Networks
April 14, 2016
I saw a reference to a 2015 book, Semantic Mining of Social Networks by Jie Tang and Juanzi Li. This volume consists of essays about things semantic. The book is published by Morgan & Claypool. The link I clicked did not return a bibliographic citation or a review. The link displayed the book, which appeared to be downloadable. If your engines are revved with the notion of semantic analysis, you may want to explore the volume yourself. I advocate purchasing monographs. Here’s the link I followed. Keep in mind that if the link 404s you, the fault is not mine.
Stephen E Arnold, April 14, 2016
eBay and Facebook: Different Spins in Online Sales
April 14, 2016
I noted two seemingly unrelated items about two different companies. Here are the two items:
- Russian Diplomat: ISIS Making $200 Million Selling Stolen Artifacts on eBay
- Weapons for Sale on Facebook in Libya
In our work on the “Dark Web Notebook,” we have examined a number of sites which purport to offer contraband or prohibited products. These sites have been accessible using special software.
What is interesting is that the difference between the Dark Web and the “regular” Web seems to be blurring.
If these two stories are accurate, questions about governance by the owners of the Web sites may be raised. Since we began working on this new study of online content, we have noted that the boundary separating the Web which billions use from the Web tailored to a smaller set of online users is growing more difficult to discern.
In itself, the boundary’s change is interesting.
Stephen E Arnold, April 14, 2016
The Force of the Dark Web May Not Need Sides
April 14, 2016
The name “Dark Web” has sensational language written all over it. Such a label calls for myth-busting articles to be published, such as the recent one from Infosecurity Magazine, The Dark Web — Is It All Bad?. This piece highlights the opinions of James Chappell, CTO and Co-founder of Digital Shadows, who argues that the way the Dark Web is portrayed in the media pigeonholes sites accessible by Tor as serving criminal purposes. Chappell is quoted,
“Looking at some of the press coverage you could be forgiven for thinking that the Dark Web is solely about criminality,” he told Infosecurity. “In reality, this is not the case and there are many legitimate uses alongside the criminal content that can be found on these services. Significantly – criminality is an internet-wide problem, rather than exclusively a problem limited to just the technologies that are labelled with the Dark Web.”
The author’s allusion to the divided Force of Star Wars, split between supposed “good” and “bad,” seems an appropriate analogy to the two sides of the internet. However, with a slightly more nuanced perspective, could it not be argued that Jedi practices, like those of the Sith, are also questionable? Binaries may be our preferred cultural tropes, as well as the building blocks of computer software programming, but let’s not forget the elements of variability: humans and time.
Megan Feil, April 14, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Tumblr Tumbles, Marking yet Another Poor Investment Decision by Yahoo
April 14, 2016
The article on VentureBeat titled As Tumblr’s Value Heads to Zero, a Look at Where It Ranks Among Yahoo’s 5 Worst Acquisition Deals pokes fun at Yahoo’s tendency to spend huge amounts of cash for companies only to watch them immediately fizzle. In the number one slot is Broadcast.com. Remember that? Me neither. But apparently Yahoo doled out almost $6B in 1999 to wade into the online content streaming game only to shut the company down after a few years. And thusly, we have Mark Cuban. Thanks Yahoo. The article goes on with the ranking,
“2. GeoCities: Yahoo paid $3.6 billion for this dandy that let people who knew nothing about the Web make web pages. Fortunately, this was also mostly shut down, and nearly all of its content vanished, saving most of us from a lot GIF-induced embarrassment. 3. Overture: Yahoo paid $1.63 billion in 2003 for this search engine firm after belatedly realizing that some upstart called Google was eating its lunch. Spoiler alert: Google won.”
The article suggests that Tumblr would slide into fourth place given the $1.1B price tag and two-year crash and burn. It also concedes that there are other ways of measuring this list, such as levels of hard to watch. By that metric, cheaper deals with more obvious mismanagement, like the social sites Flickr or Delicious, might take the cake.
Chelsea Kerwin, April 14, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Database Divide: SQL or NoSQL
April 13, 2016
I enjoy reading about technical issues which depend on use cases. When I read “Big Data And RDBMS: Can They Coexist?”, I thought about the premise, not the article. Information Week is one of those once high-flying dead tree outfits which have embraced digital. My hunch is that the juicy headline is designed less to speak to technical issues and more to the need to create some traffic.
In my case, it worked. I clicked. I read. I ignored because obviously specific methods exist because there are different problems to solve.
Here’s what I read after the lusted-after click:
Peaceful coexistence is turning out to be the norm, as the two technologies prove to be complementary, not exclusive. As much as casual observers would like to see big data technologies win the future, RDBMS (the basis for SQL and database systems such as Microsoft SQL Server, IBM DB2, Oracle, and MySQL) is going to stick around for a bit longer.
So this is news? In an organization, some types of use cases are appropriate for the row and column approach. Think Excel. Others are better addressed with a whizzy system like Cassandra or a similar data management tool.
The write up reported that Codd based systems are pretty useful for transactions. Yep, that is accurate for most transactional applications. But there are some situations better suited to different approaches. My hunch is that is why Palantir Technologies developed its data management middleware AtlasDB, but let’s not get caught in a specific approach.
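For readers who want to see why the row and column crowd owns transactions, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the balances are invented for illustration. Nothing here comes from the Information Week article. The point is the all-or-nothing behavior: if any step of a transfer fails, the whole transfer is rolled back.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move amount from src to dst, or change nothing at all."""
    # Used as a context manager, the connection commits when the block
    # succeeds and rolls back when the block raises an exception.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


conn = make_db()
transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "bob", "alice", 500)  # fails, so the debit is undone
except ValueError as err:
    print("rolled back:", err)
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

Many NoSQL systems trade away some of this guarantee for scale or flexibility, which is the use case argument in a nutshell.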
The write up points out that governance is a good idea. The context for governance is the SQL world, but my experience is that figuring out what to analyze and how to ensure “good enough” data quality is important for the NoSQL crowd as well.
I noted this statement from the wizard “Brown” who authored Data Mining for Dummies:
“Users are not always clear [RDBMS and big data] are different products,” Brown said. “The sales reps are steering them to whatever product they want [the users] to buy.”
Yep, sales. Writing about data can educate, entertain, or market.
In this case, the notion that two technologies themselves contend for attention does little to help one determine what method to use and when. Marketing triumphs.
Stephen E Arnold, April 13, 2016
The Story of Google and How It Remains Reliable
April 13, 2016
I noted that Google Books offers a preview of “Site Reliability Engineering: How Google Runs Production Systems” by a gaggle of Googlers. The book will soon be available from O’Reilly which has given its permission to Google to provide a preview of a book about Google written by Google. You can also find a “summary” of the book at this link. I am not sure who DanLuu is, but the individual “likes this book a lot.” I would, therefore, conclude that he is either a Googler, a Xoogler, or a Googler in waiting.
From the introduction available on Google Books, it seems that the authors are Googlers. The information appears to be an explanation of some of the innovations produced by the Google in the last 15 years, a lot of the philosophy of speed and efficiency, and a bit of Google cheerleading.
What does the book cover? Here’s a sampling of the subjects:
- A run down of Google’s philosophy of site reliability engineering
- The principles of SRE (eliminating boring manual work, simplicity, etc.)
- Practices (handling problems like cascading failure, data integrity). I would point out that Palantir moved beyond Google’s methods in its rework of Percolator to achieve greater reliability.
- Management (more of engineering practices than orchestrating humans)
- Conclusions (Google learns which suggests other organizations do not learn).
Each of these sections is chopped into smaller segments. In general, the writing is less academic than the approach in the technical papers which Googlers deliver at conferences.
You can order the book on Amazon too.
Stephen E Arnold, April 13, 2016
Battlefield Moves Online Forming Cyber Industrial Complex
April 13, 2016
Undoubtedly, in recent decades many processes and products have moved online. Warfare may not be exempt from this migration. Meet The Cyber-Industrial Complex: Private Contractors May Get $7B Windfall From Pentagon’s Cyberwar On ISIS, an article from International Business Times, tells us more. Defense Secretary Ashton Carter recently confirmed U.S. development of digital weapons and training of online soldiers. According to the article,
“Cyberwar threatens to cause havoc worldwide, but it could be good for the U.S. economy and a handful of publicly listed companies. Defense Secretary Ashton Carter, as part of a $582.7 billion budget request to fund his department through 2017, recently said nearly $7 billion of that will be allocated toward improving the military’s ability to develop and deploy offensive cyberweapons. That’s great news for a number of private contractors, who stand to benefit from the spending, and the highly skilled individuals they may end up hiring.”
The article explains that the U.S. has used these capabilities in the past, such as in the Kosovo war, but now the U.S. is publicly claiming these tools and tactics. It is an interesting leap to visualize what attacks will look like as they evolve on an online battlefield. Equally interesting is the article’s point about conflict being a business opportunity for some; it may also be true to say more problems, more money.
Megan Feil, April 13, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

