Facebook AI Explainer

June 10, 2016

Facebook posted a partial explanation of its artificial intelligence system. You can review the document “Introducing DeepText: Facebook’s Text Understanding Engine” and decide if Facebook or IBM is winning the smart software race. The Facebook document states:

In traditional NLP approaches, words are converted into a format that a computer algorithm can learn. The word “brother” might be assigned an integer ID such as 4598, while the word “bro” becomes another integer, like 986665. This representation requires each word to be seen with exact spellings in the training data to be understood. With deep learning, we can instead use “word embeddings,” a mathematical concept that preserves the semantic relationship among words. So, when calculated properly, we can see that the word embeddings of “brother” and “bro” are close in space. This type of representation allows us to capture the deeper semantic meaning of words. Using word embeddings, we can also understand the same semantics across multiple languages, despite differences in the surface form. As an example, for English and Spanish, “happy birthday” and “feliz cumpleaños” should be very close to each other in the common embedding space. By mapping words and phrases into a common embedding space, DeepText is capable of building models that are language-agnostic.
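The contrast in the quoted passage can be made concrete with a toy example. This is a minimal sketch, not DeepText's code: the three-dimensional vectors and the extra word "invoice" are invented for illustration, and cosine similarity is assumed here as the measure of "close in space," which the Facebook document does not specify. Only the integer IDs come from the quote.

```python
import math

# Integer IDs from the quoted passage: arbitrary labels, so the numeric gap
# between 4598 and 986665 says nothing about how related the words are.
word_ids = {"brother": 4598, "bro": 986665}

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# Real embeddings are learned from data and have far more dimensions.
toy_embeddings = {
    "brother": [0.90, 0.10, 0.30],
    "bro": [0.85, 0.15, 0.35],
    "invoice": [0.05, 0.95, 0.10],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, embeddings):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in embeddings if w != word)
    return max(
        others,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


print(nearest("bro", toy_embeddings))
```

With these made-up vectors, "bro" lands next to "brother" and far from "invoice," which is the behavior the quote attributes to properly calculated embeddings.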

Due to Facebook’s grip on the 18 to 35 demographic, its approach may have more commercial impact than the methods in use at other firms. Just ask IBM Watson.

Stephen E Arnold, June 10, 2016

Murdoch Wall Street Journal Factiva: Known Unknowns

June 10, 2016

That Donald Rumsfeld statement about known knowns, known unknowns, etc., is back. The Wall Street Journal ran an ad for Factiva. You remember Factiva. It is the Dow Jones Information Service repositioned and renamed a number of times over the last 15 or 20 years.

If you are into for-fee search, you will know about Factiva and its kissing cousins: LexisNexis (bring your legal client’s purchase order), CSA ProQuest Dialog (bring your library acquisition budget), and Ebsco (bring your credit card). For-fee information services serve the professional searcher market. Most people, including Gen X and Millennial researchers, are happy with Google. Objective results every time.

The for-fee services are still around. Public library and university fund raising programs help pay for access. Some queries returning zero useful results can cost $100 or more. Hey, you didn’t know, right?

In the June 2, 2016, Wall Street Journal, page A7 in my dead tree edition ran a full page ad for Factiva. The ad highlights a couple of pie charts. Here they are in a tough to read gray and blue motif. Users of commercial database services have really sharp eyes and don’t need high contrast text, right?

The first pie chart shows your life consumed with research. Notice how little time one has to eat lunch. Note what a tiny portion of one’s day is available for email, Facebook, talking with colleagues, making sales calls, printing, the youth soccer telephone tree.

image

Now look at the second chart.

image

Look at the many different tasks one can undertake in a single work day. One can, of course, “take lunch.” I eat lunch, but that’s because here in rural Kentucky, we “eat” a meal. We make decisions. Apparently in Factiva land one takes a meal and probably takes decisions.

Other tasks one can pursue when one has Factiva are:

  • Collaborate across departments
  • Advise colleagues
  • Stay on top of the news (Hey, it is part of that real journalism outfit owned by Mr. Murdoch. No bugging telephones, please.)
  • Create a company newsletter. (I assume this word is “blog”, a Snapchat, or a tweet, but I could be off base.)
  • Build powerful infographics. (Hmmm. I thought art types created infographics based on the data generated by a business intelligence system.)
  • Research. Yes via Factiva.

Now I know that I am really out of the flow. The diagram showing the difference between Baby Boomers and Millennials created by ace research analyst Mary Meeker reminded me of the gulf between my demographic and the zippy millennials.

image

Slide 51 from the Meeker, State of the Internet report.

The main point for me is that I possess zero of the attributes of millennials. I don’t earn to spend. I am retired. I conserve to pay for the old age home which I believe millennials call “opportunities for bingo.”

But the best part of the Factiva ad is the copy. I know words. Those nifty pie charts were the cat’s pajamas, weren’t they?

Here’s the guts of the message:

Spend your day working, not searching. Factiva’s reputable sources, flexible search and powerful insights provide access to thousands of quality, licensed, news and information sources in 28 languages. Know unknowns. [Emphasis added]

If Ms. Meeker is correct in her research, which draws on supporting information from Hillhouse Capital, dozens of what appear to be primary sources, and many hours of online searching of commercial and Web resources, messaging apps are where the future is. Oh, there are videos too, but the takeaway is that traditional methods of getting digital information are in the same spot newspapers were yesterday.

The ad warrants several questions:

  • Why does it have to be so darned big? Maybe small ads in the Wall Street Journal are ignored?
  • How many of the Wall Street Journal’s readers are information specialists trained in the use of commercial online services? Judging from the Special Libraries Association’s challenges, I would suggest that the ad would have made sense to the corporate information specialist working in 1986, not 2016.
  • What’s with the wonky pie charts? When I worked at a commercial database company, I don’t recall meeting any online users who spent the bulk of every day online. There were reference interviews (remember them, millennials?), culling the outputs from dot matrix printers, and planning search strategies before going online and whacking away.

Mr. Rumsfeld’s statement about knowns and unknowns emerged from his brush with the murky world of government related information. If he were to use Factiva today, would he have modified this famous statement:

There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.

Perhaps Factiva, like IBM Watson, is easier to describe than to turn into a lean, mean, money making machine? I would suggest that the answer for decades has been an unknown unknown.

Stephen E Arnold, June 10, 2016

The Unknown Future of Google Cloud Platform

June 10, 2016

While many may have the perception that Google dominates in many business sectors, a recently published graph tells a different story when it comes to cloud computing. Datamation released a story, Why Google Will Dominate Cloud Computing, which shows Google in fourth position. Amazon, Microsoft and IBM are above the search giant in cloud infrastructure services when looking at the fourth quarter market share and revenue growth for 2015. The article explains why Google appears to be struggling:

“Yet as impressive as its tech prowess is, GCP’s ability to cater to the prosaic needs of enterprise cloud customers has been limited, even fumbling. Google has always focused more on selling its own services rather than hosting legacy applications, but these legacy apps are the engine that drives business. Remarkably, GCP customers don’t get support for Oracle software, as they do on Amazon Web Services. Alas, catering to the needs of enterprise clients isn’t about deep genius – it’s about working with others. GCP has been like the high school student with straight A’s and perfect SAT scores that somehow doesn’t have too many friends.”

Despite the current situation, the article hypothesizes Google Cloud Platform may have an edge in the long term. This is quite a bold prediction. We wonder if Datamation may approach the goog to sell some ads. Probably not, as real journalists do not seek money, right?

Megan Feil, June 10, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Libraries Will Save the Internet

June 10, 2016

Libraries are more than a place to check out free DVDs and books and use a computer. Most people do not believe this, and if you try to tell them otherwise, their eyes glaze over and they start chanting “obsolete” under their breath. BoingBoing, however, makes the case in “How Libraries Can Save The Internet Of Things From The Web’s Centralized Fate”. Over the past twenty years, the Internet has become more centralized and content is increasingly reliant on proprietary sites, such as social media, Amazon, and Google.

Back in the old days, the greatest fear was that the government would take control of the Internet. The opposite has happened, with corporations consolidating the Internet. Decentralization is taking place, mostly to keep the Internet anonymous. Usually, these efforts are tied to the Dark Web. The next big thing in the Internet is “the Internet of things,” which will be mostly decentralized and can be protected if the groundwork is laid now. Libraries can protect decentralized systems, because:

“Libraries can support a decentralized system with both computing power and lobbying muscle. The fights libraries have pursued for a free, fair and open Internet infrastructure show that we’re players in the political arena, which is every bit as important as servers and bandwidth.  What would services built with library ethics and values look like? They’d look like libraries: Universal access to knowledge. Anonymity of information inquiry. A focus on literacy and on quality of information. A strong service commitment to ensure that they are available at every level of power and privilege.”

Libraries can teach people how to access services like Tor and disseminate the information to a greater extent than many other institutions within the community. While this is possible, in many ways it is not realistic. Many decentralized services are associated with the Dark Web, which is viewed in a negative light. Libraries also have limited budgets, and installing a program like this will need financing, which the library board might not want to provide. There is also the problem of finding someone to teach these services. Many libraries are staffed by librarians who are limited in their knowledge, although they can learn.

It is possible; it would just be hard.

Whitney Grace, June 10, 2016

Weakly Watson: Unusual Watson Applications

June 9, 2016

The change in leadership at IBM Watson is a bit like the weather. One does not know how the weekend will turn out. I read “5 Unusual Things You Can Do with IBM’s Watson.” I must admit that I have missed the full page ads with weird made up chemical symbols suggesting Watson’s combinatorial magic. I also have missed the “Watson cures cancer” write ups. I always wonder how that project is coming along.

In the unusual write up, I noted the five things; to wit:

  1. Create a “custom” order for granola.
  2. Shop for clothes.
  3. Find a bottle of wine. [Shades of Endeca’s long standing example!]
  4. Ask health questions. [When I worked at Ziff in the 1990s, we had a health reference center which performed the same trick. Libraries loved the system. Doctors, not so much.]
  5. Check into a Hilton and ask about bus routes. [Uber, anyone?]

My hunch is that IBM wants to make darned certain it is in the race for smart software. Okay, IBM Watson with its open source technology, home brew scripts, and acquired technology is really big in artificial intelligence. I give up already.

Custom granola? A slam dunk. Help me shop for clothes? My wife may have some thoughts about that. These five items comprise compelling use cases for someone, I assume. Oh, when I check into a hotel, I think Uber, not bus routes. Ever try to take a bus in Xian, China?

Stephen E Arnold, June 3, 2016

More Data to Fuel Debate About Malice on Tor

June 9, 2016

The debate about malicious content on Tor continues. Ars Technica published an article continuing the conversation about Tor and the claims made by a web security company that says 94 percent of the requests coming through the network are at least loosely malicious. The article CloudFlare: 94 percent of the Tor traffic we see is “per se malicious” reveals how CloudFlare is currently handling Tor traffic. The article states,

“Starting last month, CloudFlare began treating Tor users as their own “country” and now gives its customers four options of how to handle traffic coming from Tor. They can whitelist them, test Tor users using CAPTCHA or a JavaScript challenge, or blacklist Tor traffic. The blacklist option is only available for enterprise customers. As more websites react to the massive amount of harmful Web traffic coming through Tor, the challenge of balancing security with the needs of legitimate anonymous users will grow. The same network being used so effectively by those seeking to avoid censorship or repression has become a favorite of fraudsters and spammers.”

Even though the jury may still be out regarding the statistics reported about the volume of malicious traffic, several companies appear to want action sooner rather than later. Amazon Web Services, Best Buy and Macy’s are among several sites blocking a majority of Tor exit nodes. While a lot seems unclear, we can’t expect organizations to delay action.

Megan Feil, June 9, 2016

Palantir Technology Takes on Rogue Traders

June 9, 2016

Rogue trading has always been a problem for the stock market, but the more technology advances, the easier it becomes for rogue traders to take advantage. The good news is that security and compliance officers can use the same tools that rogue traders use in their schemes to stop them. CNBC aired the story “Tech Takes On Rogue Traders,” which explains how technology is being used to stop the bad guys. The report is described as:

“Colleen Graham, Chief Supervisory Officer at Signac, discusses Palantir and Credit Suisse’s joint technology initiative to crack down on rogue traders.”

Palantir technology is being used with Credit Suisse to track trader behavior data, trade data, risk data, and market data in order to see how a trader changes over time. Each individual trader is compared with others invested in similar stocks. Using a combination of all these data fields, unusual behavior is monitored to prevent rogue trading.
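The peer comparison described above can be illustrated with a simple statistical stand-in. This is a sketch under loud assumptions: the trader names, the daily trade counts, the `flag_outliers` helper, and the threshold of 3.0 are all hypothetical, and nothing here is claimed to be how Signac or Palantir actually score traders. It only shows the general idea of measuring one trader against the rest of a peer group.

```python
from statistics import mean, stdev

# Hypothetical daily trade counts for traders working similar stocks.
# The names and numbers are invented for illustration.
peer_trade_counts = {
    "trader_a": 42,
    "trader_b": 38,
    "trader_c": 45,
    "trader_d": 40,
    "trader_e": 160,
}


def flag_outliers(values_by_trader, threshold=3.0):
    """Flag traders whose value sits far from the rest of the peer group.

    Each trader is compared against the mean and sample standard deviation
    of the *other* traders, so one extreme value cannot mask itself.
    """
    flagged = []
    for trader, value in values_by_trader.items():
        peers = [v for t, v in values_by_trader.items() if t != trader]
        mu = mean(peers)
        sigma = stdev(peers)
        # Skip the comparison if every peer has the same value.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(trader)
    return flagged


print(flag_outliers(peer_trade_counts))
```

A real system would presumably blend many signals over time, and a flag like this is only a prompt for a human supervisor to look closer, which is where the false positives mentioned below come in.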

The biggest loss on Wall Street is rogue trading.  The data Signac gathers helps figure out how rogue trading happens and what causes it.  By using analytical software, compliance officers are able to learn from past crimes and teach the software to recognize similar patterns.  In turn, this allows them to prevent future crimes. While some false positives are generated, all of the captured data is public.  Supervisors and other people actually are supposed to read this data; Signac just does so at a more in-depth level.

Catching rogue traders helps keep Wall Street running more smoothly and even puts stockbrokers and other financial workers back to work.

Palantir scored a new deal from this venture.  The same technology used to monitor the Dark Web is used to capture rogue traders.

Whitney Grace, June 9, 2016

Google Wants China. Does China Want Google?

June 8, 2016

Years ago Google acted more like a nation state than a Silicon Valley outfit selling online advertising. China wanted Google to be — well, China supportive. Google wanted China to change. The dog was not wagged by that dog.

China is a large market. Google is locked in a perceptual battle with regulators, lawyers, and elected officials over its approach to business. Plus, Google has to face the reality of Amazon and Facebook. Both of these outfits are growing in market sectors which Google has not been able to put into its barn; namely, ecommerce and social media.

China, therefore, looks tempting.

I read “Google CEO: Open to returning to China.” I would assume so. China is struggling, but it does have quite a few eyeballs online. [Note that when you follow the link to the original, be careful where you click; otherwise USA Today sends you to an index page so you get to click on the Google story and see another video ad.]

The write up reports Google’s chief ad sales professional as saying:

“If we can do it in the right and thoughtful way, we are always open to it,” said Pichai at the Code conference here. “I care about serving consumers everywhere.”

I wonder if China wants Google in a “right and thoughtful way.”

Stephen E Arnold, June 8, 2016

Image Recognition: Think Tattoo Recognition

June 8, 2016

I know that some bad guys encourage their “assistants” to get facial tattoos. I am not personally into tattoos, but there are some who believe that one’s immune system is strengthened via the process. The prison tattoos I have seen, in pictures mind you, did not remind me of the clean room conditions in some semiconductor fabrication facilities. I am confident that ball point pen ink, improvised devices, and frequent hand washing are best practices.

I read “Tattoo Recognition Research Threatens Free Speech and Privacy.” The write up states:

government scientists are working with the FBI to develop tattoo recognition technology that police can use to learn as much as possible about people through their tattoos.

The write up points out that privacy is an issue.

My question:

If a person gets a facial tattoo, perhaps that individual wants others to notice it?

I have heard that some bad guys want their “assistants” to get facial tattoos. With a message about a specific group, it makes it difficult for an “assistant” to join another merry band of pranksters.

Stephen E Arnold, June 8, 2016

Enterprise Search Vendor Sinequa Partners with MapR

June 8, 2016

In the world of enterprise search and analytics, everyone wants in on the clients who have flocked to Hadoop for data storage. Virtual Strategy shared an article announcing Sinequa Collaborates With MapR to Power Real-Time Big Data Search and Analytics on Hadoop. A firm specializing in big data, Sinequa, has become certified with the MapR Converged Data Platform. The interoperation of Sinequa’s solutions with MapR will enable actionable information to be gleaned from data stored in Hadoop. We learned,

“By leveraging advanced natural language processing along with universal structured and unstructured data indexing, Sinequa’s platform enables customers to embark on ambitious Big Data projects, achieve critical in-depth content analytics and establish an extremely agile development environment for Search Based Applications (SBA). Global enterprises, including Airbus, AstraZeneca, Atos, Biogen, ENGIE, Total and Siemens have all trusted Sinequa for the guidance and collaboration to harness Big Data to find relevant insight to move business forward.”

Beyond all the enterprise search jargon in this article, the collaboration between Sinequa and MapR appears to offer an upgraded service to customers. As we all know at this point, unstructured data indexing is key to data intake. However, when it comes to output, technological solutions that can support informed business decisions will be unparalleled.

Megan Feil, June 8, 2016
