Enterprise Search: A Problem of Relevance to the Users
January 23, 2015
I enjoy email from those who read my for-fee columns. I received an interesting comment from Australia about desktop search.
In a nutshell, the writer read one of my analyses of software intended for a single user looking for information on his local hard drives. The bigger the hard drives, the greater the likelihood that the user will operate in squirrel mode. The idea is that it is easier to save everything because “you never know.” Right, one doesn’t.
Here’s the passage I found interesting:
My concern is that with the very volatile environment where I saw my last mini OpenVMS environment now virtually consigned to the near-legacy basket and many other viable engines disappearing from Desktop search that there is another look required at the current computing environment.
I referred this person to Gaviri Search, which I use to examine email, and Effective File Search, which is useful for looking in specific directories. These suggestions sidestepped the larger issue:
There is no fast, easy to use, stable, and helpful way to look for information on a couple of terabytes of local storage. The files are a mixed bag: Excels, PowerPoints, image and text embedded PDFs, proprietary file formats like Framemaker, images, music, etc.
Such was the problem in the old days, and such is the problem today. I don’t have a quick and easy fix. But this is a single user problem, not an enterprise scale problem.
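A minimal sketch shows why even the "easy" part of local search is not easy. Assuming a Python standard library build with SQLite FTS5 enabled (an assumption; not every build includes it), plain text can be indexed quickly, but every other format in the mixed bag (PDF, PowerPoint, FrameMaker) needs its own extractor, which is exactly where desktop search products struggle:

```python
import os
import sqlite3

def build_index(root, db_path="files.db"):
    """Walk a directory tree and index file paths plus the bodies of
    plain text files in an SQLite FTS5 table. Binary and proprietary
    formats are indexed by path only; extracting their text would
    require a per-format converter, which this sketch omits."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path, body)")
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            body = ""
            if name.lower().endswith((".txt", ".md", ".csv")):
                try:
                    with open(path, errors="ignore") as fh:
                        body = fh.read(100_000)  # cap the per-file read
                except OSError:
                    pass  # unreadable file: index the path alone
            con.execute("INSERT INTO docs VALUES (?, ?)", (path, body))
    con.commit()
    return con

def search(con, query):
    """Return paths whose indexed text matches the FTS query."""
    return [row[0] for row in
            con.execute("SELECT path FROM docs WHERE docs MATCH ?", (query,))]
```

Even this toy version illustrates the trade-off: indexing is straightforward; coverage of heterogeneous formats is the hard, unglamorous part.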
An hour after I read the email about my column, I received one of those frequent LinkedIn updates. The title of the thread to which LinkedIn wished to call my attention was/is: “What would you guess is behind a drop in query activity?”
I was enticed by the word “guess.” Most assume that the specialist discussion threads on LinkedIn attract the birds with the brightest plumage, not the YouTube commenter crowd.
I navigated to the provided link, which may require that you become a member of LinkedIn and then appeal for admission to the colorful feather discussion for “Enterprise Search Professionals.”
The situation is that a company’s enterprise search engine is not being used by its authorized users. There was a shopping list of ideas for generating traffic to the search system. The low usage matters because the company spent money, invested human resources, and assumed that a new search system would deliver a benefit the accountants could quantify.
What was fascinating was the response of the LinkedIn enterprise search professionals. The suggestions for improving the enterprise search engine included:
- A request for more information about usage. (Interesting, but the operative fact is that traffic is low and evident to the expert initiating the thread.)
- A thought that the user interface and “global navigation” might be an issue.
- The idea that an “external factor” was the cause of the traffic drop. (Intriguing because I would include the search for a personal search system described in the email about my desktop search column as an “external factor.” The employee looking for a personal search solution was making lone wolf noises to me.)
- A former English major’s insight that traffic drops when quality declines. I was hoping for a quote from a guy like Aristotle who said, “Quality is not an act, it is a habit.” The expert referenced “social software.”
- My tongue in cheek suggestion that the search system required search engine optimization. The question sparked Sturm und Drang about enterprise search as something different from the crass Web site marketing hoopla.
- A comment about the need for users to understand the vocabulary required to get information from an index of content and “search friendly” pages. (I am not sure what a search friendly page is, however. Is it what an employee creates, an interface, or a canned, training wheels “report”?)
Let’s step back. The email about desktop search and this collection of statements about lack of usage strike me as different sides of the same information access coin.
Enterprise Search Lags Behind: Actionable Interfaces, Not Lists, Needed
January 22, 2015
I was reviewing the France24.com item “Paris Attacks: Tracing Shadowy Terrorist Links.” I came across this graphic:
Several information-access thoughts crossed my mind.
First, France24 presented information that looks like a simplification of the outputs generated by a system like IBM’s i2. (Note: I was an advisor to i2 before its sale to IBM.) i2 is an NGIA, or next generation information access, system which dates from the 1990s. The notion that crossed my mind is that this relationship diagram presents information in a more useful way than a list of links. After 30 years, I wondered, “Why haven’t traditional enterprise search systems shifted from lists to more useful information access interfaces?” Many vendors have, and the enterprise search vendors that stick to the stone club approach are missing what seems to be a quite obvious approach to information access.
A Google results list with one ad, two Wikipedia items, pictures, and redundant dictionary links. Try this query: “IBM Mainframe.” Not too helpful unless one is looking for information to use in a high school research paper.
Second, the use of this i2-type diagram, now widely emulated from Fast Search centric outfits like Attivio to high flying venture backed outfits like Palantir, permits one click access to relevant information. The idea is that a click on a hot spot—a plus in the diagram—presents additional information. I suppose one could suggest that the approach is just a form of faceting or “Guided Navigation,” which is Endeca’s very own phrase. I think the differences are more substantive. (I discuss these in my new monograph CyberOSINT.)
Third, no time is required to figure out what’s important. i2 and some other NGIA systems present what’s important, identify key data points, and explain what is known and what is fuzzy. Who wants to scan, click, read, copy, paste, and figure out what is relevant and what is not? I don’t for many of my information needs. The issue of “time spent searching” is an artifact of the era when Boolean reigned supreme. NGIA systems automatically generate indexes that permit alternatives to a high school term paper’s approach to research.
Little wonder that the participants in enterprise search discussion groups gnaw bones that have been chewed for more than 50 years. There is no easy solution to the hurdles that search boxes and lists of results present to many users of online systems.
France24 gets it. When will the search vendors dressed in animal skins and carrying stone tools figure out that the world has changed? Demographics, access devices, and information have moved on.
Most enterprise search vendors deliver systems that could be exhibited in the Smithsonian next to the Daystrom 046 Little Gypsy mainframe and the IBM punch card machine.
Stephen E Arnold, January 22, 2015
Microsoft, Text Analytics, and Writing
January 21, 2015
I read the marvelously named “Microsoft Acquires Text Analysis Startup Equivio, Plans to Integrate Machine Learning Tech into Office 365: Equivio Zoom In. Find Out.”
Taking a deep breath, I read the article. Here’s what I deduced: Word and presumably PowerPoint will get some new features:
While Office 365 offers e-discovery and information governance capabilities, Equivio develops machine learning technologies for both, meaning an integration is expected to make them “even more intelligent and easy to use.” Microsoft says the move is in line with helping its customers tackle “the legal and compliance challenges inherent in managing large quantities of email and documents.”
The Fast Search & Transfer technology is not working out? The dozens of SharePoint content enhancers are not doing their job? The grammar checker is not doing its job?
What is different is that Word is getting more machine learning:
Equivio uses machine learning to let users explore large, unstructured sets of data. The startup’s technology leverages advanced text analytics to perform multi-dimensional analyses of data collections, intelligently sort documents into themes, group near-duplicates, and isolate unique data.
Like Microsoft’s exciting adaptive menus, the new system will learn what the user wants.
Is this a next generation information access system? Is Microsoft nosing into Recorded Future territory?
Nope, but the desire to convert what the user does into metadata seems to percolate in the Microsoft innovation coffee pot.
If Microsoft pulls off this shotgun marriage, I think more pressure will be put on outfits like Content Analyst and Smartlogic.
Stephen E Arnold, January 21, 2015
NGIA Palantir Worth Almost As Much As Uber and Xiaomi
January 18, 2015
Short honk: Value is in the eye of the beholder. I am reminded of this each time I see an oddball automobile sell for six figures at a Barrett-Jackson auction.
Navigate to “Palantir Raising More Money After Tagged With $15 Billion Valuation.” Keep in mind that you may have to pay to view the article, or you can check out the apparently free link to source data at http://bit.ly/KKOAw1.
The key point is that Palantir is an NGIA system. On the surface, it appears to have more “value” than Hewlett Packard’s Autonomy or the other content processing companies in the hunt for staggering revenues.
Stephen E Arnold, January 18, 2015
Zaizi: Search and Content Consulting
January 13, 2015
I received a call about Zaizi and the company’s search and content services. The firm’s Web site is at www.zaizi.com. Based on the information in my files, the company appears to be open source centric and an integrator of Lucene/Solr solutions.
What’s interesting is that the company has embraced Mondeca/Smartlogic jargon; for example, content intelligence. I find the phrase interesting and an improvement over the Semantic Web lingo.
The idea is that via indexing, one can find and make use of content objects. I am okay with this concept; however, what’s being sold is indexing, entity extraction, and classification of content.
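For readers unfamiliar with the jargon, "entity extraction" can be illustrated with a deliberately naive sketch. Real systems use trained statistical or linguistic models; this regex heuristic is only an assumption for illustration, not what Zaizi or anyone else ships:

```python
import re

def extract_entities(text: str) -> list:
    """Naive entity extraction: pull out capitalized multi-word
    phrases, which often correspond to people, places, and
    organizations. A production system would use a trained model;
    this heuristic exists only to show what the term means."""
    pattern = r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b"
    return re.findall(pattern, text)
```

The gap between this toy and a model that resolves "IBM", "Big Blue", and "International Business Machines" to the same entity is, roughly, the gap the content intelligence vendors are selling across.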
The issue facing Zaizi and the other content intelligence vendors is that “some” content intelligence and slightly “smarter” information access is not likely to generate the big bucks needed to compete.
Firms like BAE and Leidos as well as the Google/In-Tel-Q backed Recorded Future offer considerably more than indexing. The need is to process automatically, analyze automatically, and generate outputs automatically. The outputs are automatically shaped to meet the needs of one or more human consumers or one or more systems.
Think in terms of taking outputs of a next generation information access system and inputting the “discoveries” or “key items” into another system. The idea is that action can be taken automatically or provided to a human who can make a low risk, high probability decision quickly.
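The routing idea above can be sketched in a few lines. Everything here, the names and the 0.9 threshold alike, is an illustrative assumption, not any vendor's actual interface:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    """A 'discovery' emitted by an upstream analysis stage."""
    summary: str
    confidence: float  # 0.0 to 1.0, assigned by the analysis stage

def route(finding: Finding,
          act: Callable[[Finding], None],
          review: Callable[[Finding], None],
          threshold: float = 0.9) -> str:
    """Send a high-confidence finding to an automated action;
    hand anything ambiguous to a human for a quick decision."""
    if finding.confidence >= threshold:
        act(finding)   # low risk, high probability: act automatically
        return "automated"
    review(finding)    # uncertain: queue for a human analyst
    return "human"
```

The design point is that the human sees only the residue the automation cannot settle, which is the inverse of the query-and-scan model.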
The notion that a 20 something is going to slog through facets, keyword search, and the mind numbing scan results-open documents-look for info approach is decidedly old fashioned.
You can learn more about what the next big thing in information access is by perusing CyberOSINT: Next Generation Information Access at www.xenky.com/cyberosint.
Stephen E Arnold, January 14, 2015
Palantir and 2014 Funding
January 5, 2015
I read an article that confused me. Navigate to “Palantir Secures First $60M Chunk of Projected $400M Round as Market Asks, ‘Who?’”
This sentence suggests that Palantir wants to go public. What do you think?
But although it would clearly find no trouble catching the market’s attention, the company is in rush to take on the pressure of public trading The secretive nature of its clientele and an apparent desire to prioritize long-term strategy over short-term returns are the primary considerations behind that approach, but what facilitates it is the ease with which Palantir has managed to draw private investors so far.
I wonder if this article means “no” rush. I wonder if this article is software generated.
Here’s another interesting passage:
The document [cited by Techcrunch?] doesn’t specify the source of the capital or what Palantir intends to spend it on, but based on the claim in NYT report that it wasn’t profitable as of May, the money will probably go primarily toward fueling operations. The paper also noted that most of the estimated billion dollars that the company raked in this year came from private sector customers, which provides a hint as to the areas where the funding will be invested, namely the development of its enterprise-oriented Gotham offering.
I have my own views about Palantir which are summarized in the forthcoming CyberOSINT: Next Generation Information Access monograph. (If you want to order a copy, write benkent2020 at yahoo dot com. The book is available to law enforcement, security, and intelligence professionals.)
The statement “isn’t profitable” is fascinating if true.
Stephen E Arnold, January 5, 2015
A Big NGIA Year Ahead
January 1, 2015
The New Year is upon us. We will be posting a page on Xenky where you can request a copy of CyberOSINT: Next Generation Information Access, a link to the seminar, which is limited to law enforcement and intelligence professionals only, and some supplementary information that will allow my Beyond Search blog to shift from dead end enterprise search to the hottest topics in information access.
If you want information about CyberOSINT: Next Generation Information Access, you can send an email to benkent2020 at yahoo dot com. We will send you a one pager about the study. To purchase the book, you must be an active member of the armed forces, a working law enforcement professional, or an individual working for one of the recognized intelligence agencies we support; for example, a NATO member’s intelligence operation.
Stephen E Arnold, January 1, 2015
SAP Hana Search 2014
December 25, 2014
Years ago I wrote an analysis of TREX. At the time, SAP search asserted a wide range of functionality. I found the system interesting, but primarily of use to die hard SAP licensees. SAP was and still is focused on structured data. The wild and crazy heterogeneous information generated by social media, intercept systems, geo-centric gizmos, and humans blasting terabytes of digital images cheek by jowl with satellite imagery is not the playground of the SAP technology.
If you want to get a sense of what SAP is delivering, check out “SAP Hana’s Built-In Search Engine.” My take on the explanation is that it is quite similar to what Fast Search & Transfer proposed for the pre-sale changes to ESP. The built-in system is not one thing. The SAP explainer points out:
A standalone “engine” is not enough, however. That’s why SAP HANA also includes the Info Access “InA” toolkit for HTML5. The InA toolkit is a set of HTML5 templates and UI controls which you can use to configure a modern, highly interactive UI running in a browser. No code – just configuration.
To make matters slightly more confusing, I read “Google Like Enterprise Search Powered by SAP Hana.” I am not sure what “Google like” means. Google provides its ageing and expensive Google Search Appliance. But like Google Earth, I am not sure how long the GSA will remain on the Google product punch list. Furthermore, the GSA is a bit of a time capsule. Its features and functions have not kept pace with next generation information access technologies. Google invested in Recorded Future a couple of years ago and as far as I know, none of the high value Recorded Future functions are part of the GSA. Google also delivers its Web indexing service. Does “Google like” refer to the GSA, Google’s cloud indexing of Web sites, or the forward looking Recorded Future technology?
The Google angle seems to relate to Fiori search. Based on the screenshots, it appears that Fiori presents SAP’s structured data in a report format. Years ago we used a product called Monarch to deliver this type of information to a client.
My hypothesis is that SAP wants to generate more buzz about its search technology. The company has moved on from TREX, positioned Hana search as a Fast Search emulation, and created Fiori to generate reports from SAP’s structured data management system.
For now, I will keep SAP in my “maybe next year” folder. I am not sure what SAP information access systems deliver beyond basic keyword search, some clustering, and report outputs. SAP at some point may have to embrace open source search solutions. If SAP has maintained its commitment to open source, perhaps these search technologies already are open source. I would find that reassuring.
Regardless of what SAP is providing licensees, it is clear that the basic features and functions of next generation information access systems are not part of the present line up of products. Like other IBM-inspired companies, SAP will watch the future rush forward with SAP search receding in tomorrow’s rear view mirror. Calling a system “Google like” is not helpful, nor does it suggest that SAP is aware of NGIA systems. Some of SAP’s customers will be licensing these systems in order to move beyond what is a variation of query, scan results, open documents, read documents, and hunt for useful information. Organizations require more sophisticated information access services. The models crafted in the 1990s are, in my opinion, commoditized. Higher value NGIA operations are the future.
Stephen E Arnold, December 25, 2014
Coveoed Up with End of Week Marketing
December 22, 2014
I am the target of inbound marketing bombardments. I used to look forward to Autonomy’s conceptual inducements. In fact, in my opinion, the all-time champ in enterprise search marketing is Autonomy. HP now owns the company, and the marketing has fizzled in my opinion. I am in some far off place, and I sifted through emails, various alerts, and information dumped in my Overflight system.
I must howl, “Uncle.” I have been covered up or Coveo-ed up.
Coveo is the Canadian enterprise search company that began life as a hard drive search program and then morphed into a Microsoft-centric solution. With some timely venture funding, the company has amped up its marketing. The investors have flown to Australia to lecture about search. Australia, as you may know, is the breeding ground for the TeraText system, which is a darned important enterprise application. Out of the Australia research petri dish emerged Funnelback. There was YourAmigo, and some innovations that keep the lights on in the Google offices in the land down under.
Coveo sent me email asking if my Google search appliance was delivering. Well, the GSA does exactly what it was designed to do in the early 2000s. I am not sure I want it to do anything anymore. Here’s part of the Coveo message to me:
Hi,
Is your Search Appliance failing you? Is it giving you irrelevant search results, or unable to search all of your systems? It’s time you considered upgrading to the only enterprise search platform that:
- Securely indexes all of your on-premise and cloud-based source systems
- Provides easy-to-tune relevance and actionable analytics
- Delivers unified search to any application and device your teams use
If I read this correctly, I don’t need a GSA, an Index Engines, a Maxxcat, or an EPI Thunderstone. I can just pop Coveo into my shop and search my heart out.
How do I know?
Easy. The mid tier consulting firm Gartner has identified Coveo as “the most visionary leader” in enterprise search. I am not sure about the methods of non-blue chip consulting firms. I assume they are objective and on a par with the work of McKinsey, Bain, Booz, Allen, and Boston Consulting Group. I have heard that some mid tier firms take a slightly different approach to their analyses. I know first hand that one mid tier firm recycled my research and sold my work on Amazon without my permission. I don’t recall that happening when I worked at Booz, Allen, though. We paid third parties, entered into signed agreements, and were upfront about who knew what. Times change, of course.
Another message this weekend told me that Coveo had identified five major trends that—wait for it—“increase employee and customer proficiency in 2015.” I don’t mean to be more stupid than the others residing in my hollow in rural Kentucky, but what the heck is “customer proficiency”? What body of evidence supports these fascinating “trends”?
The trends are remarkable for me. I just completed CyberOSINT: Next Generation Information Access. The monograph will be available in early 2015 to active law enforcement, security, and intelligence professionals. If you qualify and want to get a copy, send an email to benkent2020 at yahoo dot com. I was curious to see if the outlook my research team assembled from our 12 months of research into the future of information access matched to Coveo’s trends.
The short answer is, “Not even close.”
Coveo focuses on “the ecosystem of record.” CyberOSINT focuses on automated collection and analytics. An “ecosystem of record” sounds like records management. In 2015 organizations need intelligence automatically discovered in third party, proprietary, and open source content, both historical and real time.
Coveo identifies “upskilling the end users.” In our work, the focus is on delivering to either a human or another system outputs that permit informed action. In many organizations, end users are being replaced by increasingly intelligent systems. That trend seems significant in the software delivered by the NGIA vendors whose technology we analyzed. (NGIA is shorthand for next generation information access.)
Coveo is concerned about a “competent customer.” That’s okay, but isn’t that about cost reduction? The idea is to get rid of expensive call center humans and replace them with NGIA systems. Our research suggests that automated systems are the future, or did I just point that out in the “upskilling” comment?
Coveo is mobile first. No disagreement there. The only hitch in the git along is that when one embraces mobile, there are some significant interface issues and predictive operations become more important. Therefore, in the NGIA arena, predictive outputs are where the trend runway lights are leading.
Coveo is confident that cloud indexes and their security will be solved. That is reassuring. However, the cloud as well as on premises solutions, including hybrid solutions, have to adopt predictive technology that automatically deals with certain threats, malware, violations, and internal staff propensities. The trend, therefore, is for OSINT centric systems that hook into operational and intel related functions as well as performing external scans from perimeter security devices.
What I find fascinating is that in the absence of effective marketing from vendors of traditional keyword search, providers of old school information access are embracing some concepts and themes that are orthogonal to a very significant trend in information access.
Coveo is obviously trying hard, experimenting with mid tier consulting firm endorsements, hitting the rubber chicken circuit, and cranking out truly stunning metaphors like the “customer proficiency” assertion.
The challenge for traditional keyword search firms is that NGIA systems have relegated traditional information access approaches to utility and commodity status. If one wants search, Elasticsearch works pretty well. NGIA systems deliver a different class of information access. NGIA vendors’ solutions are not perfect, but they are a welcome advance over the now four decades old approach to finding important items of information without the Model T approach of scanning a results list, opening and browsing possibly relevant documents, and then hunting for the item of information needed to answer an important question.
The trend, therefore, is NGIA. And it is an important shift to solutions whose cost can be measured. I wish Mike Lynch was driving the Autonomy marketing team again. I miss the “Black Hole of Information,” the “Portal in a Box,” and the Digital Reasoning Engine approach. Regardless of what one thinks about Autonomy, the company was a prescient marketer. If the Lynch infused Autonomy were around today, the moniker “NGIA” would be one that might capture Autonomy’s marketing love.
Stephen E Arnold, December 23, 2014
Cyber OSINT Surprise: Digital Reasoning
December 19, 2014
I read “Machine Learning Can Help Sift Open Source Intelligence.” I found one familiar name, Basis Technologies. I found one established vendor, Opera Solutions, and I noted one company that has a content processing system. In the run up to the February 19, 2015, Cyber OSINT conference, Basis Technologies pointed out that it was not really into cyber OSINT, at least not as of this writing. Opera Solutions is interesting and was on the list of 20 firms to invite. We filled the 12 slots quickly. Some deserving companies could not be included. Then there is Digital Reasoning, an outfit in Nashville, Tennessee.
The write up says:
The company’s cognitive computing platform, dubbed Synthesys, scans unstructured open source data to highlight relevant people, places, organizations, events and other facts. It relies on natural language processing along with what the company calls “entity and fact extraction.” Applying “key indicators” and a framework, the platform is intended to automate the process of deriving intelligence from open source data, the company claims. The platform then attempts to assemble and organize relevant unstructured data using similarity algorithms, categorization and “entity resolution.”
The idea that unifies these three companies appears to be fancy math; that is, the use of statistical procedures to resolve issues associated with content processing.
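To make "fancy math" concrete: one textbook statistical procedure for the near-duplicate grouping the quoted passage mentions is Jaccard similarity over word shingles. This is an illustration of the general technique, almost certainly not Digital Reasoning's actual method, and the 0.5 threshold is an arbitrary assumption:

```python
def shingles(text: str, k: int = 3) -> set:
    """Break text into overlapping k-word phrases (shingles)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_near_duplicates(docs, threshold=0.5):
    """Greedy single-pass clustering: each document joins the first
    existing group whose representative is similar enough, else it
    starts a new group. Returns lists of document indices."""
    groups = []  # list of (representative_shingles, [doc indices])
    for idx, text in enumerate(docs):
        sh = shingles(text)
        for rep, members in groups:
            if jaccard(sh, rep) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((sh, [idx]))
    return [members for _rep, members in groups]
```

The point of the example is the limitation: shingle overlap catches edited copies of the same memo, but saying the same thing in different words defeats it, which is where the hybrid statistical-plus-linguistic systems discussed below come in.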
The only hitch in the git along is that the companies that appear to be making the quickest strides in cyber OSINT use hybrid approaches. The idea is that statistical systems and methods are used. These are supplemented with various linguistic systems and methods.
The distinction is to me important. In the February 2015 seminar, a full picture of the features and functions associated with content processing in English and other languages is explored. There are profiles of appliance vendors tapping OSINT to head off threats. But the focus of the talks is on the use of advanced approaches that provide system users with an integrated approach to open source information.
The article is good public relations/content marketing. The article does not highlight the rapid progress the companies participating in the seminar are making. Yesterday’s leaders are today’s marketing challenge. Tomorrow’s front runners are focused on delivering to their clients solutions that break new ground.
For information about the seminar, which is restricted to working law enforcement and intelligence professionals and to place an order for my new monograph “CyberOSINT: Next Generation Information Access,” write benkent2020 at yahoo dot com.
Stephen E Arnold, December 19, 2014