Data Mining Algorithms Explained
May 18, 2015
In plain English too. Navigate to “Top 10 Data Mining Algorithms in Plain English.” When you fire up an enterprise content processing system, the algorithms beneath the user experience layer are chestnuts. Universities do a good job of teaching students about some reliable methods to perform data operations. In fact, the universities do such a good job that most content processing systems include almost the same old chestnuts in their solutions. The decision to use some or all of the top 10 data mining algorithms has some interesting consequences, but you will have to attend one of my lectures about the weaknesses of these numerical recipes to get some details.
The write up is worth a read. The article includes a link to information which underscores the ubiquitous nature of these methods. This is the Xindong Wu et al. write up “Top 10 Algorithms in Data Mining.” Our research reveals that dependence on these methods is more widespread now than it was seven years ago when the paper first appeared.
The implication then and now is that content processing systems are more alike than different. The use of similar methods means that the differences among some systems are essentially cosmetic. There is a flub in the paper. I am confident that you, gentle reader, will spot it easily.
Now to the “made simple” write up. The article explains quite clearly the what and why of 10 widely used methods. The article also identifies some of the weaknesses of each method. If there is a weakness, do you think it can be exploited? This is a question worth considering, I suggest.
Example: What is a weakness of k-means?
Two key weaknesses of k-means are its sensitivity to outliers, and its sensitivity to the initial choice of centroids. One final thing to keep in mind is k-means is designed to operate on continuous data — you’ll need to do some tricks to get it to work on discrete data.
Note the key word “tricks.” When one deals with math, the way to solve problems is to be clever. It follows that some of the differences among content processing systems boil down to the cleverness of the folks working on a particular implementation. Think back to your high school math class. Was there a student who just spit out an answer and then said, “It’s obvious”? Well, that’s the type of cleverness I am referencing.
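The two weaknesses named in the quoted passage are easy to see on toy data. What follows is a minimal sketch, not the article's code: a one-dimensional version of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name `kmeans_1d` and the sample numbers are made up for illustration.

```python
def kmeans_1d(points, centroids, iterations=20):
    """Plain k-means on a list of numbers, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its old centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


# Same data, two different starting guesses, two different answers.
data = [0, 1, 5, 6, 10, 11]
from_left = kmeans_1d(data, [0.0, 1.0])
from_right = kmeans_1d(data, [10.0, 11.0])
print(from_left, from_right)

# One outlier is enough to capture a centroid and merge the two real groups.
clean = kmeans_1d([1, 2, 3, 10, 11, 12], [1.0, 12.0])
with_outlier = kmeans_1d([1, 2, 3, 10, 11, 12, 100], [1.0, 12.0])
print(clean, with_outlier)
```

Production implementations usually soften the first problem by running several random starts and keeping the best result. That is one of the "tricks" which separates one implementation from another.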
The author does not dig too deeply into PageRank, but it too has some flaws. An easy way to identify one is to attend a search engine optimization conference. One flaw turbocharges these events.
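The post does not name the flaw. One widely discussed candidate, and it is my assumption that this is the one meant, is that link counts can be inflated on purpose. The sketch below is a simplified textbook PageRank (power iteration with a damping factor of 0.85), not Google's production method. The page names and the five-page "farm" are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of page -> outlinks."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# A small honest web in which nobody links to "target".
honest = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "target": ["a"]}
baseline = pagerank(honest)

# The same web plus five throwaway pages that exist only to link to "target".
farmed = dict(honest)
for i in range(5):
    farmed["farm%d" % i] = ["target"]
boosted = pagerank(farmed)

print(baseline["target"], boosted["target"])
```

In this toy graph the target page's share of the total rank rises once the farm pages appear, even though nothing about the page itself changed.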
My relative Vladimir Arnold, whom some of the Arnolds called Vlad the Annoyer, would have liked the paper. So do I. The write up is a keeper. Plus there is a video, perfect for the folks whose attention span is better than a goldfish’s.
Stephen E Arnold, May 18, 2015
Exit Governance. Enter DMP.
May 17, 2015
A DMP is a data management platform. I think in terms of databases. I find that software does not do a particularly reliable job “managing data.” Software can run processes, write log files, and perform other functions. But management, based on my experience at Booz, Allen & Hamilton, requires humans. Talking about analytics from Big Data and implementing a platform to perform management are apples and house paint in my mind.
Intrigued by the reference, I downloaded a document available upon registration from Infinitive. You can find the company’s Web site at www.infinitive.com. The white paper maps out 10 ways a data management platform can help me.
I was not familiar with Infinitive. According to the firm’s Web site, Infinitive is:
A Different Kind of Consultancy. Results-driven and client-centric. Fun, focused and flexible. Highly engaged and easy to work with. Those are the qualities that make Infinitive a different kind of consultancy. And they’re the pillars of our unique culture. Headquartered in the Washington, D.C. area, Infinitive specializes in digital ad solutions, business transformation, customer & audience intelligence and enterprise risk management. Leveraging best practices in process engineering, change management and program management, we design and deliver custom solutions for leading organizations in communications, media and entertainment, financial services and educational services. For our clients, the results include quantifiable performance improvement and tangible bottom-line value in addressing their most pressing challenges and fulfilling their top-priority objectives.
What is a data management platform?
The white paper, or two-page document, identifies these benefits of a DMP. I was hoping for an explanation of the “platform,” but let’s look at the payoffs from the platform.
The company points out that a DMP makes ad money go farther. Big Data become actionable. A DMP provides a foundation for analytics. The DMP “ensures the quality and accessibility of customer and audience intelligence data.” The DMP can harmonize data. A DMP allows me to “adapt traditional CRM strategies and technology to incorporate new customer behavior.” I can create new customer and audience “segments.” The DMP becomes the central nervous system for my company. And the DMP protects privacy.
That is a bundle of benefits. But what is the platform provided by a consulting company, especially one that is “fun”? I was not able to locate details about the platform. The company appears to be a firm focused on advertising.
The Web site includes a page about the DMP at this link. The information is buzzword heavy and fact free. My view is that the DMP is a marketing hook. The implied technology is consulting services. That’s okay, but I find the approach representative of marketing billable time, not delivering a platform with the remarkable and perhaps unattainable benefits suggested in the white paper.
The approach must work. The company’s Web site points out this message:
Not a platform, however.
Stephen E Arnold, May 17, 2015
Developing an NLP Semantic Search
May 15, 2015
Can you imagine a natural language processing semantic search engine? It would be a lovely tool to use in your daily routines and would make research a bit easier. If you are working on such a project and are making progress, keep at that startup because this is a lucrative field at the moment. Over at Stack Overflow, an entrepreneurial spirit is trying to develop a “Semantic Search With NLP And Elasticsearch”:
“I am experimenting with Elasticsearch as a search server and my task is to build a “semantic” search functionality. From a short text phrase like “I have a burst pipe” the system should infer that the user is searching for a plumber and return all plumbers indexed in Elasticsearch.
Can that be done directly in a search server like Elasticsearch or do I have to use a natural language processing (NLP) tool like e.g. Maui Indexer. What is the exact terminology for my task at hand, text classification? Though the given text is very short as it is a search phrase.”
Given that this question was asked about three years ago, a lot has been done not only with Elasticsearch, but also NLP. Search is moving towards a more organic experience, but accuracy is often muddled by different factors. These include the quality of the technology, classification, taxonomies, ads in results, and even keywords (still!).
NLP semantic search is closer now than it was three years ago, but technology companies would invest a lot of money in a startup that can bridge the gap between natural language and machine learning.
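The task in the quoted question, turning a short phrase into a service category, can be sketched without any search server at all. This is a deliberately naive illustration, not an Elasticsearch feature and not the approach the Stack Overflow poster settled on: it counts overlaps between the query words and a hand-built vocabulary. The `CATEGORY_HINTS` table and the `infer_category` function are hypothetical names invented for this example.

```python
import re

# Hypothetical hand-built vocabulary: trade -> words that hint at it.
CATEGORY_HINTS = {
    "plumber": {"pipe", "leak", "drain", "faucet", "burst"},
    "electrician": {"outlet", "wiring", "fuse", "breaker", "socket"},
    "locksmith": {"lock", "key", "locked"},
}


def infer_category(query):
    """Return the trade whose hint words overlap the query most, or None."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    scores = {trade: len(tokens & hints)
              for trade, hints in CATEGORY_HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(infer_category("I have a burst pipe"))
print(infer_category("My fuse box keeps tripping the breaker"))
print(infer_category("hello there"))
```

The inferred category would then become the term sent to the search server. The hard part, and the reason money chases this problem, is replacing the hand-built table with something that copes with phrases nobody anticipated.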
Whitney Grace, May 15, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Latest SharePoint News from Ignite
May 14, 2015
The Ignite conference in Chicago has answered many of the questions that SharePoint users have been curious about for months now. Among them was the question of release timing and features for the latest iteration of SharePoint. CMS Wire gives a rundown in their article, “What’s Up With SharePoint? #MSIgnite.”
The article sums up the biggest news:
“Microsoft will continue to enhance the core offerings in the on-premises edition. It will also continue to develop SharePoint Online and update it as quickly as the updates are available. A preview version of SharePoint 2016 will be made available later this summer, with a beta version expected by the end of the year . . . In an afternoon session entitled Evolution of SharePoint Overview and Roadmap, the duo gave a rough outline of Microsoft’s plans, albeit without precise delivery dates.”
Having had to push back delivery dates once already, Microsoft is likely hesitant to announce anything solid until development is final. As for the qualities of the new version, Microsoft is focusing on user experience, extensibility, and SharePoint management. The inclusion of user experience should be a welcome change for many. To stay in touch with developments as they become available, keep an eye on ArnoldIT.com, and particularly its feed devoted to SharePoint. Stephen E. Arnold has made a lifelong career out of all things search, and he has a knack for distilling down the “need to know” facts to keep an organization on track.
Emily Rae Aldridge, May 14, 2015
Blur Private Search Promises to Hide User Identities from Google
May 8, 2015
We advise you to not take this advice: ReadWrite purports to tell us “How to Blur your Search Tracks on Google.” The article profiles Blur Private Search from privacy company Abine, a shield service that works to hide your identity from Google’s prying databases. The tool does this by setting each user up with a fake, cookie-free identity for each search. Writer Yael Grauer tells us:
“Private Search provides a new made-up identity for each individual search. It then funnels the request through an SSL tunnel, so that the search is encrypted—even Abine can’t see what you’re searching for. And every phrase or topic you search appears as if it is unconnected to previous searches, since each query is sent through Abine’s server with an entirely different IP address (which is yet another avenue by which websites can track people).
“Your search requests are modified before leaving your browser in a way that breaks the identity connection between your searches and the rest of your tabs. That means you can keep your YouTube tab open with all of your videos, and stay logged into Gmail, all without allowing Google to link your search queries with your account (and identity).”
At this time, the tool runs only in Firefox, and the company has not yet implemented the in-results visuals that let you know it is working. Those problems will be fixed, but the bigger issue lies in trying to hide the tracks of anything typed into Google. Even the folks at Abine admit that people with something to hide that could put them in actual danger (Chinese dissidents, for example) would be better off going through Tor. There are other engines that don’t track in the first place, too. At the same time, it is true that Google’s functionality is unmatched, so users must weigh their priorities; one might use a non-tracking tool for anything financial, health, or uprising-related, for example, and Google for everything else. Just a suggestion.
Boston-based Abine bills itself as “the online privacy company,” and its goal is to bring user-friendly security to anyone who goes online. Its other products include DoNotTrackMe, MaskMe, and DeleteMe. The company was founded in 2008.
Cynthia Murrell, May 8, 2015
The Dichotomy of SharePoint Migration
May 7, 2015
SharePoint Online gets good reviews, but only from critics and those who are utilizing SharePoint for the first time. Those who are sitting on huge on-premises installations are dreading the move and biding their time. It is definitely an issue stemming from trying to be all things to all people. Search Content Management covers the issue in their article, “Migrating to SharePoint Online is a Tale of Two Realities.”
The article begins:
“Microsoft is paving the way for a future that is all about cloud computing and mobility, but it may have to drag some SharePoint users there kicking and screaming. SharePoint enables document sharing, editing, version control and other collaboration features by creating a central location in which to share and save files. But SharePoint users aren’t ready — or enthused about — migrating to . . . SharePoint Online. According to a Radicati Group survey, only 23% of respondents have deployed SharePoint Online, compared with 77% that have on-premises SharePoint 2013.”
If you need to keep up with how SharePoint Online may affect your organization’s installation, or the best ways to adapt, keep an eye on ArnoldIT.com. Stephen E. Arnold is a longtime leader in search and distills the latest tips, tricks, and news on his dedicated SharePoint feed. SharePoint Online is definitely the future of SharePoint, but it cannot afford to get there at the cost of its past users.
Emily Rae Aldridge, May 7, 2015
RichRelevance Promises Complete Omnichannel Personalization
May 7, 2015
The MarketWatch article “RichRelevance Extends Its Partner Ecosystem to Support True Omnichannel Personalization” considers the consequences of San Francisco-based RichRelevance’s recent announcement that it will amp up partner support in order to improve the continuity of the customer experience across “web, mobile, call center and store.” The article explains what is meant by omnichannel personalization and why it is so important:
“Personalization has emerged as the most important strategic imperative for global businesses,” said Eduardo Sanchez, CEO of RichRelevance. “Our partner ecosystem provides our customers with a unique resource to support the implementation of different components of the Relevance Cloud in their business, as well as customize personalization according to the highly specific demands of their own businesses and consumer base.” “Gartner predicts that 89% of companies plan to compete primarily on the basis of the customer experience by 2016…”
The Relevance Cloud is available for RichRelevance partners and includes such core capabilities as pre-built personalization apps for recommendations and search, the Open Innovation Platform for Build, and Relevance in Store for the reported 90% of sales that occur in-store. The announcement ensures that the collaboration RichRelevance emphasizes with its partners will span all areas of customer engagement.
Chelsea Kerwin, May 7, 2015
Yahoo and Microsoft Announce Search Partnership Reboot
May 7, 2015
It seems that Microsoft and Yahoo are friends again, at least for the time being. Search Engine Watch announces, “Yahoo and Microsoft Amend Search Agreement.” The two companies have been trying to partner on search for the past six years, but it has not always gone smoothly. Writer Emily Alford tells us what will be different this time around:
“First, Yahoo will have greater freedom to explore other search platforms. In the past, Yahoo was rumored to be seeking a partnership with Google, and under the new terms, Microsoft and Yahoo’s partnership will no longer be exclusive for mobile and desktop. Under the new agreement, Yahoo will continue to serve Bing ads on desktop and mobile, as well as use Bing search results for the majority of its desktop search traffic, though the exact number was undisclosed.
“Microsoft and Yahoo are also making changes to the way that ads are served. Microsoft will now maintain control of the Bing ads salesforce, while Yahoo will take full control of its Gemini ads salesforce, which will leave Bing free to serve its own ads side by side with Yahoo search results.”
Yahoo CEO Marissa Mayer painted a hopeful picture in a prepared statement. She and Microsoft CEO Satya Nadella have been working together, she reports, to revamp the search deal. She is “very excited to explore” the fresh possibilities. Will the happy relationship hold up this time around?
Cynthia Murrell, May 7, 2015
Google Cloud Bigtable: The Real Hadoop de Doop?
May 6, 2015
Navigate to “Announcing Google Cloud Bigtable: The same database that powers Google Search, Gmail and Analytics is now available on Google Cloud Platform.” I learned:
we are excited to introduce Google Cloud Bigtable – a fully managed, high-performance, extremely scalable NoSQL database service accessible through the industry-standard, open-source Apache HBase API. Under the hood, this new service is powered by Bigtable, the same database that drives nearly all of Google’s largest applications.
In the list of benefits Google offers, one caught my attention:
Over the past 10+ years, Bigtable has driven Google’s most critical applications. In addition, the HBase API is a industry-standard interface for combined operational and analytical workloads.
The question becomes, “Is this the real Hadoop?” Another question: “Is Google using decade-old technology for its ‘most critical applications’?” I answer, “Nope. I think there is newer, whizzier software in use.”
Stephen E Arnold, May 6, 2015
Annual Ranking of Legal Sector Puts Omnivere at the Top
May 6, 2015
The article titled “Omnivere Voted Best National End-To-End Ediscovery, Managed Ediscovery & Litigation Support, and Data & Technology Provider in 2015 Best of the National Law Journal” on Blackbird discusses the ranking and what it means. This is an annual ranking in which readers of The National Law Journal & Legal Times cast ballots based on their experiences with their own legal services. OmniVere won this year’s legal sector “best in show.” The article states:
“In less than a year, OmniVere has established itself as a trailblazer in the next wave of data and technology consulting, eDiscovery services and litigation support. In creating an in-house team of expert, veteran data consultants, including former senior leadership from FTI, Navigant Consulting, Integreon, Recommind, Xerox and Berkeley Research Group, OmniVere is well positioned to deliver a range of products and services on a global playing field.”
OmniVere was launched in May 2014 and rapidly grew into one of the biggest and most sought-after companies for its work in litigation support and discovery management. Erik Post, OmniVere President, is quoted in the article celebrating the win and the overall success of the company. He suggests that in spite of the new brand, the work and abilities of the staff are “resonating across the country.”
Chelsea Kerwin, May 6, 2015

