Explaining Big Data Mythology
May 14, 2015
Mythologies usually develop over the course of centuries, but big data has only been around for (arguably) a couple of decades, at least in its modern incarnation. Recently big data has received enough media attention and product development for the Internet to create a big data mythology. The Globe and Mail set out to dispel some of the bigger myths in the article, “Unearthing Big Myths About Big Data.”
The article focuses on Prof. Joerg Niessing’s big data expertise and how he explains the truth behind many of the biggest big data myths. One of the biggest points Niessing wants people to understand is that gathering data does not equal dollar signs; you have to be active with data:
“You must take control, starting with developing a strategic outlook in which you will determine how to use the data at your disposal effectively. ‘That’s where a lot of companies struggle. They do not have a strategic approach. They don’t understand what they want to learn and get lost in the data,’ he said in an interview. So before rushing into data mining, step back and figure out which customer segments and what aspects of their behavior you most want to learn about.”
Niessing says that big data is not really big, but made up of many diverse data points. Big data also does not have all the answers; instead, it provides ambiguous results that need to be interpreted. Have the questions you want answered in hand before gathering data. Also, not all of the data returned is good. Some of it is actually garbage and cannot be used for a project. Several other myths are uncovered, but the truth remains that having a strategic big data plan in place is the best way to make the most of big data.
Whitney Grace, May 14, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Philosophy of Semantic Search
May 13, 2015
The article Taking Advantage of Semantic Search NOW: Understanding Semiotics, Signs, & Schema on Lunametrics delves into semantics on a philosophical and linguistic level as well as in regard to business. The author goes through the emergence of semantic search, beginning with Ray Kurzweil’s interest in machines learning meaning as opposed to simpler keyword search. To help readers fully grasp this concept, the author provides a brief refresher on Saussure’s semiotics.
“a Sign is comprised of a signifier, or the name of a thing, and the signified, what that thing represents… Say you sell iPad accessories. “iPad case” is your signifier, or keyword in search marketing speak. We’ve abused the signifier to the utmost over the years, stuffing it onto pages, calculating its density with text tools, jamming it into title tags, in part because we were speaking to robot who read at a 3-year-old level.”
In order to create meaning, we must go beyond even just the addition of a price tag and picture to create a sign. The article suggests the need for schema: the addition of some indication of whom and what the thing is for. The author, Michael Bartholow, has a background in linguistics, marketing, and search engine optimization. His article ends with the question of when linguists, philosophers, and humanists will be invited into the conversation with businesses, perhaps making him a true visionary in a field populated by data engineers with tunnel vision.
Chelsea Kerwin, May 13, 2015
CyberOSINT Videos
May 12, 2015
Xenky.com has posted a single page which provides one-click access to the three CyberOSINT videos. The videos provide highlights of Stephen E Arnold’s new monograph about next generation information access. You can explore the videos, which run a total of 30 minutes, on the Xenky site. One viewer said, “This has really opened my eyes. Thank you.”
Kenny Toth, May 12, 2015
Another Google Challenger with Two Angles: Anonymity and Charity
May 12, 2015
I read another “Google killer” write up today. Hope springs eternal I know. The article is “New Search Engine from Waterfox Founder Aims to Take a Punch at Google.” The idea is that the search engine will offer “users absolute privacy online.” I thought that the combination of an alias, a VPN, and the Tor bundle delivered at least some privacy online.
The idea is that the new Storm search system will deliver anonymized search and pay some money derived from the system to non profit outfits. The article reports:
The aim is to tempt millions of users away from Google and create substantial revenues for worthy organizations. Up to £20 could be generated from each active user per year for charitable causes, the company claims.
The article points out that Storm has some competition:
Most of the successful entrants offer users the ability to search the web privately and securely, hiding their data from brands and data crunchers online. DuckDuckGo, which brands itself as a champion of privacy rights, has now been included on Apple’s internet browser Safari. Qwant, StartPage, and Ixquick are also vying for market share in the private browsing space.
My question, “What metasearch engines do you use regularly?”
Did I hear, “None”?
Another question, “Are you certain the queries are anonymous?”
Did I hear someone say, “I don’t know”?
Exactly.
Stephen E Arnold, May 12, 2015
HP Autonomy Dust Up: Details, Details
May 11, 2015
I read belatedly yet another analysis of the HP lawsuit against Autonomy: “Details of HP Lawsuit against Autonomy Executives.” The write up reports that HP is taking “direct legal action against Lynch.” There is nothing like a personal legal action to keep the legal eagles circling in search of money.
The HP position is that Lynch (the founder of Autonomy) and Sushovan Hussain (former Autonomy CFO) overstated Autonomy’s growth and profits. My reaction is “Yeah, but didn’t you guys review the numbers before you wrote a check for $7 or $8 billion?”
Details, details.
The article states:
The acquisition has been seen as a disaster for HP since the tech giant was forced to write down $8.8 billion from the deal in 2012. The $5.1 billion legal claim is one of the largest ever brought against an individual in Britain. HP bases the claim on a $4.6 billion charge linked to the alleged financial misconduct, roughly $400 million connected to shares given to Lynch and Hussain and a further $100 million loss associated with Autonomy that was suspected of being caused by the former executives’ activities, according to the British court documents.
HP may not be a tech leader or even a C student in acquisition analyses, but it is the leader in the magnitude of the claim it is making against Dr. Lynch. If he is found guilty of selling something to HP, which analyzed the deal and then decided to buy the company, he will have to pay $5.1 billion.
I don’t have a dog in this fight. But it seems to me that HP reviewed Qatalyst Partners’ financial presentation about Autonomy. Then HP analyzed the numbers. Then HP involved third parties in the review of the numbers. Then HP decided to buy Autonomy. Then HP bought the company. Then HP found that Autonomy is not exactly a product like a tube of Colgate Total toothpaste. Then HP fired, forced, or tasered Lynch and others out of the HP carpet land. Then HP tried to convert the technology into some sort of cloud based toolkit. And finally HP decided to go after Dr. Lynch. You don’t have to like him, but he is a bit of a celebrity in the Silicon Fen, holds an Order of the British Empire, and he is quite intelligent, maybe brilliant, and in my experience, not into dorks, fools, goof balls, losers, or dopey managers. Your mileage may vary, of course.
I am sufficiently experienced to know that when a buyer wants a product, service, or company, craving (nay, lust and craziness) kicks in. “Yo, we’re 17 years old again. Let’s do it,” scream the adrenaline charged experts. “This is a slam dunk. We can take Autonomy waaaay beyond the place it is today. Rah, rah, rah. Get ‘em, team.”
Autonomy’s management and its advisors know that PowerPoint dust can close deals. The blend of blood frenzy and the feeling of power one gets when taking ownership of a new La Ferrari is what business is about, dog. Smiles and PowerPointing from Autonomy played a part, but HP made the decision and wrote the check. Caveat emptor is good advice.
Frankly I see HP as the ideal candidate for a marvelous business school case. The HP Autonomy story is better than the Yahoo track record of blunders and blind luck. The management of HP believed it could do something that has never ever ever been done: generate billions of dollars in new revenue quickly from search. Google generates billions from advertising. Autonomy generated hundreds of millions in revenues from the licensing of dozens of products. HP got its wires crossed in reasoning which does not line up with the history of the search and content processing industry.
Billions do not flow from content processing and search technology. Investors can pump big money into a content processing company like Palantir. Will these investors get their money back? Don’t know. But to spend billions for a search and content processing company and then project that a $600 million or $800 million per year outfit would produce a gusher of billions is a big, but quite incorrect, thought.
Never has happened. Never will. It took Autonomy 15 years, good management, intelligent acquisitions, and lots of adaptation to hit the $600-$700 million plus in annual revenue it generated. Only energy drinking MBAs with Excel fever can convert 15 years and multiple revenue streams from dozens of quite different products into one giant multi billion dollar business in a couple of years. The scale is out of whack. When I visited the store in Manhattan with the big crazy pencil and the other giant products I could see the difference between my pencil and the big pencil. HP, I assume, would see the two pencils as identical. HP, if it purchased a big pencil, would sue the shop in Manhattan because the big pencil would not fit into a Panasonic desktop pencil sharpener. Scale of thinking, accuracy of perception—They matter to me. HP? Hmm.
This is not bad business on HP’s part. This is not flawed acquisition analysis on HP’s part. This is not HP’s inability to ask the right questions. This is medieval lunacy with managers dancing on the grass under a full moon. Isn’t HP that company which has floundered, investigated its own Board of Directors, chased good managers from one office in Silicon Valley into the arms of a competitor based on the old Sea World property? Maybe. Maybe HP is a fully stocked fishing pond, not a water deficient stream in Palo Alto?
My personal view is that HP has itself, its Board of Directors, and its advisors to blame. I find it very difficult to believe that Dr. Lynch, talented as he is, could spoof HP’s Board, HP’s financial professionals, HP’s advisors, HP’s lawyer, and HP’s Meg Whitman. Hey, the guy is talented, but he is not Houdini.
Well, we have a show, gentle reader. We have a really big show. Where is Ed Sullivan when we need an announcer?
Stephen E Arnold, May 11, 2015
Show Business and Enterprise Search
May 11, 2015
Short Honk: I read “In Our Increasingly Automated and Global Economy, Every Business Is Becoming Just a Little Bit Like Show Business.” Quite a Google-ized string of words. The write up asserts that work will be skilled contractors coming together when there is a project, money, and a need for specialists. This is—wait for it—the Hollywood model.
I think the author is sort of right. For certain types of work, hiring specialists makes sense. When an employee needs a hip replacement, few companies want to have the requisite specialists on staff.
The article asserts:
Our economy is in the midst of a grand shift toward the Hollywood model.
The author adds:
It’s a surprisingly good system for many workers too, in particular those with highly sought after skills.
The future will be
a new era of the human-robot partnership, in which robots can be told what to do without the use of difficult programming languages.
Sounds fantastic as long as one has in demand skills and can market her or his skills to generate awareness of an individual’s capabilities. (Too bad for those without skills and lacking in visibility. Tough luck.)
My interest is search.
If there is a tech sector where the Hollywood model should be visible, it is enterprise search. The experts come together, implement a system, and users become really happy with their new information retrieval system.
Unfortunately the data I have gathered suggests that anywhere from 55 to 75 percent of a search system’s users are unhappy. The folks in information technology departments have become gun shy when it comes to search. The folks who manage enterprise search solutions live a life of quiet desperation. In many search intolerant organizations, the question is not whether the person managing search will be RIFed; it is when.
The generalizations about the outputs of a Hollywood style approach to staffing don’t make much sense to me on a practical level for television and motion pictures. I find the outputs’ quality and value at odds with the products themselves.
The fact that a handful of specialists contribute their skills to a product via services that look good, appeal to the young in mind, and tap into the rich repository of comic book literature is evidence that the Hollywood model does not work for me.
Enterprise search has embraced the Hollywood model and tossed in superstars like the Google Search Appliance as well. How is that working? From my experience, search remains a problem no matter what the experts say or do.
Maybe the Hollywood model works but only in a superficial way. But those who are unemployed can watch the TV or go to the motion pictures. That’s value.
For enterprise search, more than a buzzword and a management catchphrase are needed to deliver a usable system. I hear the song now, “Another opening, another show…”
Stephen E Arnold, May 11, 2015
Elasticsearch Transparent about Failed Jepsen Tests
May 11, 2015
The article on Aphyr titled Call Me Maybe: Elasticsearch 1.5.0 demonstrates the ongoing tendency for Elasticsearch to lose data during network partitions. The author goes through several scenarios and finds that users can lose documents if nodes crash, a primary pauses, or a network partitions into two intersecting components or into two discrete components. The article explains,
“My recommendations for Elasticsearch users are unchanged: store your data in a database with better safety guarantees, and continuously upsert every document from that database into Elasticsearch. If your search engine is missing a few documents for a day, it’s not a big deal; they’ll be reinserted on the next run and appear in subsequent searches. Not using Elasticsearch as a system of record also insulates you from having to worry about ES downtime during elections.”
The article praises Elasticsearch for its internal approach to documenting the problems, and especially for the page it opened in September going into detail on resiliency. The page answers users’ questions about what it meant that the ticket was closed. The page states pretty clearly that ES failed its Jepsen tests. The article exhorts other vendors to follow a similar regimen of supplying such information to users.
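The recommendation quoted above (keep the data in a safer system of record and repeatedly upsert every document into the search engine) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite stands in for the "database with better safety guarantees," and the `FakeSearchIndex` class, the `resync` function, and the `documents` table are hypothetical names invented here, not part of Elasticsearch or of the Aphyr article.

```python
import sqlite3


class FakeSearchIndex:
    """Hypothetical stand-in for a search engine; holds documents keyed by id."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, body):
        # Insert the document, or overwrite it if the id already exists.
        self.docs[doc_id] = body


def resync(conn, index):
    """Copy every row from the system of record into the search index."""
    count = 0
    for doc_id, body in conn.execute("SELECT id, body FROM documents"):
        index.upsert(doc_id, body)
        count += 1
    return count


# The database is the system of record (in-memory SQLite for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)", [(1, "alpha"), (2, "beta")]
)

index = FakeSearchIndex()
resync(conn, index)

# Simulate the search engine losing a document, for example during a partition.
del index.docs[2]

# The next scheduled run reinserts whatever went missing.
resync(conn, index)
```

Because each run rewrites every document, a record the index dropped simply reappears on the next pass. A real deployment would run the resync on a schedule and call the search engine's own client in place of the stand-in class.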
Chelsea Kerwin, May 11, 2015
Making Queries of PostgreSQL Data Easy
May 9, 2015
If you query PostgreSQL tables, you may find yourself making nice with a script herder. Tired of that intermediated approach? Navigate to Slinky. You will want to watch the demo in Internet Explorer because I encountered flakiness in Firefox and Mozilla. You enter what you want in a search box, pick the table, and the system spits out the SQL query. Punch a button and you get a data table. Looked good and worked for us.
Stephen E Arnold, May 8, 2015
Search Left Out of the Collaborative Economy Honeycomb
May 8, 2015
I must admit that I knew very little about the collaborative economy. I used AirBnB one time and worried about my little test. I survived. I rode in an Uber car one time because my son is an aficionado. I am okay with the subway and walking. I ignore apps which allegedly make my life better, faster, and more expensive.
I saw a post which pointed me to the Chief Digital Officer Summit and that pointed me to a page with an amazing honeycomb graphic. The title is “Collaborative Economy Honeycomb 2: Watch It Grow.”
The hexagons are okay, but the bulk of the write up is a listing of companies which manifest the characteristics of a collaborative honeycomb outfit.
Most of the companies were unfamiliar to me. I did recognize the names of a couple of the honeycombers; for example, Khan Academy, Etsy, eBay (ah, delightful eBay), Craigslist, Freelancer, the Crypto currencies (yep, my Dark Web work illuminated this hexagon in the honeycomb for me), and Indiegogo (I met the founder at a function in Manhattan).
But the other 150 companies in the list were news to me.
But what caused me to perk up and pay attention was one factoid:
There were zero search, content processing, or next generation information access companies in the list.
I formed a hypothesis which will probably give indigestion to the individuals and financial services firms pumping money into search and content processing companies. Here it is:
The wave of innovation captured in the wonky honeycomb is moving forward with search as an item on a checklist. The finding functions of these outfits boil down to social media buzz and niche marketing. Information access is application centric, not search centric.
If I am correct, why would honeycomb companies in collaboration mode want to pump money into a proprietary keyword search system? Why not use open source software and put effort into features for the app crowd?
Net net: Generating big money from organic license deals may be very difficult if the honeycomb analysis is on the beam. How hard will it be to sell a high priced search system to the companies identified in this analysis? I think that the task might be difficult and time consuming.
The good news is that the list of companies provides outfits like Attivio, BA Insight, Coveo, Recommind, Smartlogic, and other information retrieval firms with some ducks at which to shoot. How many ducks will fall in a fusillade of marketing?
One hopes that the search sharpshooters prevail.
Stephen E Arnold, May 8, 2015
Blur Private Search Promises to Hide User Identities from Google
May 8, 2015
We advise you to not take this advice: ReadWrite purports to tell us “How to Blur your Search Tracks on Google.” The article profiles Blur Private Search from privacy company Abine, a shield service that works to hide your identity from Google’s prying databases. The tool does this by setting each user up with a fake, cookie-free identity for each search. Writer Yael Grauer tells us:
“Private Search provides a new made-up identity for each individual search. It then funnels the request through an SSL tunnel, so that the search is encrypted—even Abine can’t see what you’re searching for. And every phrase or topic you search appears as if it is unconnected to previous searches, since each query is sent through Abine’s server with an entirely different IP address (which is yet another avenue by which websites can track people).
“Your search requests are modified before leaving your browser in a way that breaks the identity connection between your searches and the rest of your tabs. That means you can keep your YouTube tab open with all of your videos, and stay logged into Gmail, all without allowing Google to link your search queries with your account (and identity).”
At this time, the tool runs only in Firefox, and they have not yet implemented the in-results visuals that let you know it is working. Those problems will be fixed, but the bigger issue lies in trying to hide the tracks of anything typed into Google. Even the folks at Abine admit that people with something to hide that could put them in actual danger (Chinese dissidents, for example) would be better off going through Tor. There are other engines that don’t track in the first place, too. At the same time, it is true that Google’s functionality is unmatched, so users must weigh their priorities; one might use a non-tracking tool for anything financial, health, or uprising-related, for example, and Google for everything else. Just a suggestion.
Boston-based Abine bills itself as “the online privacy company,” and its goal is to bring user-friendly security to anyone who goes online. Its other products include DoNotTrackMe, MaskMe, and DeleteMe. The company was founded in 2008.
Cynthia Murrell, May 8, 2015