HP Autonomy: Why Do the Deal?
September 30, 2015
I read “If Ray Lane Hated HP’s Autonomy Move So Much, How Did It Happen?”
Darned good question. The article reviews information which suggests that HP’s chairman was uncomfortable with the tie up. Also, HP’s CFO gave the deal a thumbs down.
According to the article:
Reached for comment an HP spokesperson reiterated that the Autonomy buy had unanimous support from the board.
I assume this is the HP way.
Stephen E Arnold, September 30, 2015
The Many Applications of Predictive Analytics
September 29, 2015
The article on Computerworld titled Technology that Predicts Your Next Security Fail covers the current explosion in predictive analytics, the application of past occurrences to predict future occurrences. The article cites the example of the Kentucky Department of Revenue (DOR), which used predictive analytics to catch fraud. By providing SAS with six years of data, the DOR received a batch of new insights into fraud indicators such as similar filings from the same IP address. The article imparts words of wisdom from SANS Institute instructor Phil Hagen:
“Even the most sophisticated predictive analytics software requires human talent, though. For instance, once the Kentucky DOR tools (either the existing checklist or the SAS tool) suspect fraud, the tax return is forwarded to a human examiner for review. “Predictive analytics is only as good as the forethought you put into it and the questions you ask of it,” Hagen warns…. Also It’s imperative that data scientists, not security teams, drive the predictive analytics project.”
In addition to helping the IRS avoid major fails like the 2013 fraudulent refunds totaling $5.8 billion, predictive analytics has other applications. Perhaps most interesting is its use in protecting human assets in regions where kidnappings are common by detecting unrest and alerting organizations to lock their doors. But it is hard to see limitations for technology that so accurately reads the future.
Chelsea Kerwin, September 29, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google and YouTube Views: Relevance or Money?
September 24, 2015
I read “Google Charges Advertisers for Fake YouTube Video Views, Say Researchers.” My goodness, will criticism of Alphabet Google continue to escalate?
The trigger for the newspaper article’s story with the somewhat negative headline was an academic paper called “Understanding the Detection of Fake View Fraud in Video Content Portals.” The data presented in the journal by seven European wizards suggests that an Alphabet Google type company knows when a video is viewed by a software robot, not a credit card toting human.
“Fake view fraud” is a snappy phrase.
According to the Guardian newspaper write up about the technical paper:
The researchers’ paper says that while substantial effort has been devoted to understanding fraudulent activity in traditional online advertising such as search and banner ads, more recent forms such as video ads have received little attention. It adds that while YouTube’s system for detecting fake views significantly outperforms others, it may still be susceptible to simple attacks.
Is this a Volkswagen-type spoof? Instead of fiddling with fuel efficiency, certain online video portals are playing fast and loose with charging for video ads not displayed to a human with a PayPal account?
Years ago an outfit approached me with a proposition for a seminar about online advertising fraud. I declined. I am confident that the giant companies and their wizards in the ad biz possess business ethics which put the investment bankers to shame. I recall discussing systems and methods with a couple of with-it New Yorkers. The lunch topic was dynamically relaxing the threshold for displaying content in response to certain queries.
My comment pointed to ways to determine if a “relevant” ad was relevant to a higher percentage of user queries. I called this “query and ad matching relaxation.”
I did not include a discussion of “relaxation” in my 2003-2004 study Google Version 2.0, which is now out of print. The systems and methods disclosed in technical papers by researchers who ended up working for large online advertising companies were just more plumbing for smart software.
When an ad does not match a query, that’s the challenge of figuring out what’s relevant and what’s irrelevant.
My thought in 2003 when I started writing the book was that most content was essentially spoofed and sponsored. I wanted to focus on more interesting innovations like the use of game theory in online advertising interfaces and the clever notion of “janitors” which were software routines able to clean up “bad” or “incomplete” data.
As I recall, that New York City guy was definitely interested in the notion of tuning ad results to generate money for the ad distribution and not so much for the advertiser. For me, no interest in lecturing a group of ad execs about their business. These folks can figure out the ins and outs of their business without inputs from an old person in Kentucky.
Mobile and video access to digital content do pose some interesting challenges in the online advertising world. My hunch is that the Alphabet Google type outfits and the intrepid researchers will find common ground. If the meeting progresses smoothly, perhaps a T shirt or mouse pad will be offered to some of the participants?
I remain confident that allegations about slippery behavior in online advertising are baseless. Online advertising is making life better and better for users every day.
The experience of online advertising is thrilling. I am not sure the experience of receiving unwanted advertisements can be improved. Why read a Web page when one can view an overlay which obscures the desired content? Why work in a quiet office? Answer: It is simply easier to hear the auto play videos on many Web pages. Why puzzle over a search results page which blurs sponsored hits from relevant content? By definition, displayed information is relevant information, gentle reader. Do you have a problem with that?
Google, according to the article, will chat up the seven experts who reported on the alleged fraud. I am confident that the confusion in the perceptions of the researchers will be replaced with crystal clear thinking.
Online ad fraud? What a silly notion.
Stephen E Arnold, September 24, 2015
Microsoft Bing in Edge is Baidu: Confused?
September 24, 2015
I received an alert about Bing. I usually ignore these. The headline did not reference search. The article is billed as “Windows 10 in China.” I am not sure why I scanned the item, but I noted that the Microsoft blog post contained an interesting factoid about Bing search.
Here’s the passage I noted:
Together [Baidu and Microsoft], we will make it easy for Baidu customers to upgrade to Windows 10 and we will deliver a custom experience for customers in China, providing local browsing and search experiences. Baidu.com will become the default homepage and search for the Microsoft Edge browser in Windows 10.
I wondered if I understood the message. The Windows 10 browser, called Edge, will include a Web and local search function. The search is going to be provided by Baidu for “local browsing and search experiences.”
I find this interesting for two reasons. First, is Bing, assisted by a search wizard from Australia, now “funneling” queries to Baidu? Second, has Microsoft given up on the job of indexing Chinese language content?
I recall reading “About Microsoft Research Asia,” and learning that one of the goals for Microsoft’s expanding research activities in Asia was:
Search and online advertising takes Web search and online advertising to the next level by applying data-mining, machine-learning and knowledge-discovery techniques to information analysis, organization, retrieval and visualization.
Now the company is relying on a third party for search. Is this a signal that Bing is not up to the search and retrieval job in China?
Stephen E Arnold, September 24, 2015
Rundown on Legal Knowledge Management
September 24, 2015
One of the new legal buzzwords is knowledge management, and not just old-fashioned knowledge management, but the quick, efficient, and effective kind. Time is an expensive commodity for legal professionals, especially with the amount of data they have to sift through for cases. Mondaq explains the importance of knowledge management for law professionals in the article, “United States: A Brief Overview Of Legal Knowledge Management.”
Knowledge management started as an effort to create an effective process for managing, locating, and searching relevant files, but it quickly evolved into implementing a document management system. While knowledge management companies offered law practices decent document management software to tackle the data hill, an even bigger problem arose. The law practices needed a dedicated person to be the software expert:
“Consequently, KM emphasis had to shift from finding documents to finding experts. The expert could both identify useful documents and explain their context and use. Early expertise location efforts relied primarily on self-rating. These attempts almost always failed because lawyers would not participate and, if they did, they typically under- or over-rated themselves.”
The biggest problem law professionals face is that they might invest a small fortune in a document management license, but they do not know how to use the software or do not have the time to learn. It is a reminder that someone might have all the knowledge and best tools at their fingertips, but unless people know how to use and access it, the knowledge is useless.
Whitney Grace, September 24, 2015
Funding Granted for American Archive Search Project
September 23, 2015
Here’s an interesting project: we received an announcement about funding for Pop Up Archive: Search Your Sound. A joint effort of the WGBH Educational Foundation and the American Archive of Public Broadcasting, the venture’s goal is nothing less than to make almost 40,000 hours of Public Broadcasting media content easily accessible. The American Archive, now under the care of WGBH and the Library of Congress, has digitized that wealth of sound and video. Now, the details are in the metadata. The announcement reveals:
“As we’ve written before, metadata creation for media at scale benefits from both machine analysis and human correction. Pop Up Archive and WGBH are combining forces to do just that. Innovative features of the project include:
*Speech-to-text and audio analysis tools to transcribe and analyze almost 40,000 hours of digital audio from the American Archive of Public Broadcasting
*Open source web-based tools to improve transcripts and descriptive data by engaging the public in a crowdsourced, participatory cataloging project
*Creating and distributing data sets to provide a public database of audiovisual metadata for use by other projects.
“In addition to Pop Up Archive’s machine transcripts and automatic entity extraction (tagging), we’ll be conducting research in partnership with the HiPSTAS center at University of Texas at Austin to identify characteristics in audio beyond the words themselves. That could include emotional reactions like laughter and crying, speaker identities, and transitions between moods or segments.”
The project just received almost $900,000 in funding from the Institute of Museum and Library Services. This loot is on top of the grant received in 2013, from the Corporation for Public Broadcasting, that got the project started. But will it be enough money to develop a system that delivers on-point results? If not, we may be stuck with something clunky, something that resembles the old Autonomy Virage, Blinkx, Exalead video search, or Google YouTube search. Let us hope this worthy endeavor continues to attract funding so that, someday, anyone can reliably (and intuitively) find valuable Public Broadcasting content.
Cynthia Murrell, September 23, 2015
Exalead Gets a New Application
September 22, 2015
Exalead is Dassault Systèmes’ big data software targeted specifically at businesses. Exalead offers innovative data discovery and analytics solutions to manage information in real time across various servers and generate insightful reports to make better, faster decisions. It is the big data solution of choice for many businesses across various industries. The Exalead blog shares that “PricewaterhouseCoopers Is Launching Its Information Management Application, Based on Exalead CloudView.”
PricewaterhouseCoopers (PwC) analyzed the amount of time users spent trying to locate, organize, and disseminate information. When users spend that time on information management, they lose two valuable resources: time and money. PwC designed Pulse, a search and information tool, as a solution to the problem.
“The EXALEAD CloudView software solution from Dassault Systèmes facilitates the rapid search and use of sources of structured and unstructured information. In existence since 2007, this enterprise information management concept was integrated initially in other software applications. Since it was reworked as EXALEAD CloudView, the configuration of the queries has become easier and they are processed much faster. Furthermore, the results of the searches are more precise, significantly reducing the number of duplicates and the time wasted managing them. PwC has deliberately decided to roll out Pulse on an international scale gradually, in order to generate plenty of enthusiasm amongst users. A business case is prepared for each country on the basis of its needs, the benefits and the potential savings. PwC also intends to make the content in Pulse accessible by other internal systems (e.g., the project workspaces), to integrate the sources, and to make the search function even smarter.”
Pulse is supposed to cut costs so that the resources can be reinvested in more fruitful avenues. One interesting aspect to note is that PwC did not build the Pulse upgrade; Exalead provided the plumbing.
Whitney Grace, September 22, 2015
New Search System for Comparing Companies
September 22, 2015
There is a new tool out to help companies compile information on their competitors: RivalSeek. This brainchild of entrepreneur Richard Brevig seeks to combat an issue he encountered when he turned to Google while researching the market for a different project: Google’s “personalized search” filters keep users from viewing the whole landscape of any particular field. Frustration led Brevig to develop some tools of his own, which he realized might appeal to others. The site’s homepage explains simply:
“Find your competitors that Google can’t. RivalSeek’s competitor search engine looks past filter bubbles, finding competitors you’ve never heard of.”
More information can be found in Brevig’s brief introductory video on YouTube. There’s also this “quick demo,” which can be found on YouTube or playing quietly on RivalSeek’s home page. While the tool is still in Beta, Brevig is confident enough in its usefulness to charge $29 a month for access. You can find an example success story, for the Dollar Shave Club, at the company’s blog.
This is a great idea. While Google’s filter bubbles can be convenient, it is clear that confirmation bias is not their only hazard. Perhaps Brevig would be interested in expanding this tool into other areas, like science, literature, or sociology. Just a suggestion.
Cynthia Murrell, September 22, 2015
Redundant Dark Data
September 21, 2015
Have you heard the one about how dark data hides within an organization’s servers and holds potential business insights? Wait, you did not? Then where have you been for the past three years? Datameer posted an SEO-heavy post on its blog called, “Shine Light On Dark Data.” The post features the same redundant song and dance about how dark data retained on servers holds valuable customer trends and business patterns that can put organizations ahead of the competition.
One new fact is presented: IDC reports that 90% of digital data is dark. That is a very interesting fact and spurs information specialists to action to get a big data plan in place, but then we are fed this tired explanation:
“This dark data may come in the form of machine or sensor logs that when analyzed help predict vacated real estate or customer time zones that may help businesses pinpoint when customers in a specific region prefer to engage with brands. While the value of these insights are very significant, setting foot into the world of dark data that is unstructured, untagged and untapped is daunting for both IT and business users.”
The post ends on some less than thorough advice to create an implementation plan. There are other guides on the Internet that better prepare a person to create a big data action guide. The post’s only purpose is to serve as a search engine bumper for Datameer. While Datameer is one of the leading big data software providers, one would think they wouldn’t post a “dark data definition” post this late in the game.
Whitney Grace, September 21, 2015
The Semantic Web Has Arrived
September 20, 2015
Short honk: If you want evidence of the impact of the semantic Web, you will find “What Happened to the Semantic Web?” useful. The author captures 10 examples of the semantic Web in action. I highlighted this passage in the narrative accompanying the screenshots:
there is no question that the Web already has a population of HTML documents that include semantically-enriched islands of structured data. This new generation of documents creates a new Web dimension in which links are no longer seen solely as document addresses, but can function as unambiguous names for anything, while also enabling the construction of controlled natural language sentences for encoding and decoding information [data in context] — comprehensible by both humans and machines (bots).
Structured data will probably play a large part in the new walled gardens now under construction.
The conclusion will thrill the search engine optimization folks who want to decide what is relevant to a user’s query; to wit:
A final note — The live demonstrations in this post demonstrate a fundamental fact: the addition of semantically-rich structured data islands to documents already being published on the Web is what modern SEO (Search Engine Optimization) is all about. Resistance is futile, so just get with the program — fast!
Be happy.
Stephen E Arnold, September 20, 2015