My team and I are working on a new project. With our Overflight system, we have an archive of memorable and not so memorable factoids about search and content processing. One of the goslings who was actually working yesterday asked me, “Do you recall this presentation?”
The presentation was “Implementing Semantic Search in the Enterprise,” created in 2009, which works out to six years ago. I did not recall the presentation. But the title evoked an image in my mind like this:

I asked, “How is this germane to our present project?”
The reply the gosling quacked was, “Semantic search means taxonomy.” The gosling enjoined me to examine this impressive looking diagram:

Okay.
I don’t want a document. I don’t want formatted content. I don’t want unformatted content. I want on point results I can use. To illustrate the gap between dumping a document on my lap and presenting something useful, look at this visualization from Geofeedia:

The idea is that a person can draw a shape on a map, see the real time content flowing via mobile devices, and look at a particular object. There are search tools and other utilities. The user of this Geofeedia technology examines information in a manner that does not produce a document to read. Sure, a user can read a tweet, but the focus is on understanding information, regardless of type, in a particular context in real time. There is a classification system operating in the plumbing of this system, but the key point is the functionality, not the fact that a consulting firm specializing in taxonomies is making a taxonomy the Alpha and the Omega of an information access system.
The deck starts with the premise that semantic search pivots on a taxonomy. The idea is that a “categorization scheme” makes it possible to index a document even though the words in the document may not be the words in the taxonomy.

For me, the slide deck’s argument was off kilter. The mixing up of a term list and semantic search is the evidence of a Rube Goldberg approach to a quite important task: Accessing needed information in a useful, actionable way. Frankly, I think that dumping buzzwords into slide decks creates more confusion when focus and accuracy are essential.
At lunch the goslings and I flipped through the PowerPoint deck which is available via LinkedIn Slideshare. You may have to register to view the PowerPoint deck. I am never clear about what is viewable, what’s downloadable, and what’s on Slideshare. LinkedIn has its real estate, publishing, and personnel businesses to which to attend, so search and retrieval is obviously not a priority. The entire experience was superficially amusing but on a more profound level quite disturbing. No wonder enterprise search implementations careen in a swamp of cost overruns and angry users.
Now creating taxonomies or what I call controlled term lists can be a darned exciting process. If one goes the human route, there are discussions about what term maps to what word or phrase. Think buzz group and discussion group and online collaboration. What terms go with what other terms? In the good old days, these term lists were crafted by subject matter and indexing specialists. For example, the guts of the ABI/INFORM classification coding terms originated in the 1981-1982 period and were the product of more than 14 individuals, one advisor (the now deceased Betty Eddison), and the begrudging assistance of the Courier Journal’s information technology department which performed analyses of the index terms and key words in the ABI/INFORM database. The classification system was reasonably good, and it was licensed by the Royal Bank of Canada, IBM, and some other savvy outfits for their own indexing projects.
As you might know, investing two years in human and some machine inputs was an expensive proposition. It was the initial step in the reindexing of the ABI/INFORM database, which at the time was one of the go to sources of high value business and management information culled from more than 800 publications worldwide.
The only problem I have with the slide deck’s making a taxonomy a key concept is that one cannot craft a taxonomy without knowing what one is indexing. For example, you have a flow of content through and into an organization. In a business engaged in the manufacture of laboratory equipment, there will be a wide range of information. There will be unstructured information like Word documents prepared by wild eyed marketing associates. There will be legal documents artfully copied and pasted together from boiler plate. There will be images of the products themselves. There will be databases containing the names of customers, prospects, suppliers, and consultants. There will be information that employees download from the Internet or tote into the organization on a storage device.
The key concept of a taxonomy has to be anchored in reality, not an external term list like those which used to be provided by Oracle for certain vertical markets. In short, the time and cost of processing these items of information so that confidentiality is not breached is likely to make the organization’s accountant sit up and take notice.
Today many vendors assert that their systems can intelligently, automatically, and rapidly develop a taxonomy for an organization. I suggest you read the fine print. Even the whizziest taxonomy generator is going to require some baby sitting. To get a sense of what is required, track down an experienced licensee of the Autonomy IDOL system. There is a training period which requires a cohesive corpus of representative source material. Sorry, no images or videos accepted but the existing image and video metadata can be processed. Once the system is trained, then it is run against a test set of content. The results are examined by a human who knows what he or she is doing, and then the system is tuned. After the smart system runs for a few days, the human inspects and calibrates. The idea is that as content flows through the system and periodic tweaks are made, the system becomes smarter. In reality, indexing drift creeps in. In effect, the smart software never strays too far from the human subject matter experts riding herd on algorithms.
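The inspect-and-calibrate loop described above can be sketched as a simple agreement check. This is a minimal illustration, not the Autonomy IDOL workflow: the function names, the sample tags, and the 0.9 threshold are all assumptions made for the example. The idea is that a human re-tags a sample of documents, and the system is flagged for tuning when its tags drift too far from the human’s.

```python
def agreement(machine_tags, human_tags):
    """Fraction of human-reviewed documents where the machine's tags match exactly."""
    reviewed = machine_tags.keys() & human_tags.keys()
    if not reviewed:
        raise ValueError("no documents were reviewed by a human")
    matches = sum(set(machine_tags[d]) == set(human_tags[d]) for d in reviewed)
    return matches / len(reviewed)


def needs_retuning(machine_tags, human_tags, threshold=0.9):
    """True when agreement falls below the (assumed) acceptable threshold."""
    return agreement(machine_tags, human_tags) < threshold


# Hypothetical sample: the machine mislabels doc2, so agreement is 2 of 3.
machine = {"doc1": ["pumps"], "doc2": ["valves"], "doc3": ["pumps", "safety"]}
human = {"doc1": ["pumps"], "doc2": ["seals"], "doc3": ["safety", "pumps"]}

if needs_retuning(machine, human):
    print("Indexing drift detected; schedule a tuning pass.")
```

Exact-match agreement is a deliberately strict measure. A real deployment would more likely track precision and recall per term, but the point stands: someone has to supply the human column.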
The problem exists even when there is a relatively stable core of technical terminology. The content problem of a lab gear manufacturer is many times greater than the problem of a company focusing on a specific branch of engineering, science, technology, or medicine. Indexing Halliburton nuclear energy information is trivial when compared to indexing more generalized business content like that found in ABI/INFORM or the typical services organization today.
I agree that a controlled term list is important. One cannot easily resolve entities unless there is a combination of automated processes and look up lists. An example is figuring out if a reference to I.B.M., Big Blue, or Armonk is a reference to the much loved marketers of Watson. Now handle a transliterated name like Anwar al-Awlaki and its variants. This type of indexing is quite important. Get it wrong and one cannot find information germane to a query. When one is investigating aliases used by bad actors, an error can become a bad day for some folks.
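A minimal sketch of the “automated processes plus look up lists” combination might look like the following. The alias table is hypothetical and tiny, and the fuzzy step uses Python’s standard `difflib` as a stand-in for whatever matching a production system would use. Exact look-ups handle known aliases; the fuzzy fallback catches transliteration variants that are not in the list.

```python
import difflib
import re

# Hypothetical look-up list: canonical entity -> known aliases.
ALIASES = {
    "IBM": ["I.B.M.", "Big Blue", "International Business Machines"],
    "Anwar al-Awlaki": ["Anwar al-Aulaqi", "Anwar Awlaki"],
}


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Reverse index: normalized alias -> canonical name.
LOOKUP = {}
for canonical, aliases in ALIASES.items():
    for name in [canonical, *aliases]:
        LOOKUP[normalize(name)] = canonical


def resolve(mention, cutoff=0.85):
    """Return the canonical entity for a mention, or None if nothing is close."""
    key = normalize(mention)
    if key in LOOKUP:  # exact hit in the look-up list
        return LOOKUP[key]
    # Fuzzy fallback for spelling and transliteration variants.
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None


for mention in ["I.B.M.", "big blue", "Anwar al-Awlaqi", "Microsoft"]:
    print(mention, "->", resolve(mention))
```

A place name like Armonk is left out of the table on purpose: resolving it to a company requires context, which is exactly where a look-up list alone falls short. The cutoff is a judgment call too. Set it too low and unrelated names collapse into one entity, which is the bad day for some folks mentioned above.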
The remainder of the slide deck rides the taxonomy pony into the sunset. When one looks at the information created 72 months ago, it is easy for me to understand why enterprise search and content processing has become an “oh, my goodness” problem in many organizations. I think that a mid sized company would grind to a halt if it needed a controlled vocabulary which matched today’s content flows.
My take away from the slide deck is easy to summarize: The lesson is that putting the cart before the horse won’t get enterprise search where it must go to retain credibility and deliver utility.
Stephen E Arnold, May 9, 2015
It seems that Microsoft and Yahoo are friends again, at least for the time being. Search Engine Watch announces, “Yahoo and Microsoft Amend Search Agreement.” The two companies have been trying to partner on search for the past six years, but it has not always gone smoothly. Writer Emily Alford tells us what will be different this time around:
“First, Yahoo will have greater freedom to explore other search platforms. In the past, Yahoo was rumored to be seeking a partnership with Google, and under the new terms, Microsoft and Yahoo’s partnership will no longer be exclusive for mobile and desktop. Under the new agreement, Yahoo will continue to serve Bing ads on desktop and mobile, as well as use Bing search results for the majority of its desktop search traffic, though the exact number was undisclosed.
“Microsoft and Yahoo are also making changes to the way that ads are served. Microsoft will now maintain control of the Bing ads salesforce, while Yahoo will take full control of its Gemini ads salesforce, which will leave Bing free to serve its own ads side by side with Yahoo search results.”
Yahoo CEO Marissa Mayer painted a hopeful picture in a prepared statement. She and Microsoft CEO Satya Nadella have been working together, she reports, to revamp the search deal. She is “very excited to explore” the fresh possibilities. Will the happy relationship hold up this time around?
Cynthia Murrell, May 7, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
A new enterprise search system startup is leveraging the SAP HANA Cloud Platform, we learn from “EnterpriseJungle Tames Enterprise Search” at SAP’s News Center. The company states that its goal is to make collaboration easier and more effective with a feature it is calling “deep people search.” Writer Susan Galer cites EnterpriseJungle Principal James Sinclair when she tells us:
“Using advanced algorithms to analyze data from internal and external sources, including SAP Jam, SuccessFactors, wikis, and LinkedIn, the applications help companies understand the make-up of its workforce and connect people quickly….
“Who Can Help Me is a pre-populated search tool allowing employees to find internal experts by skills, location, project requirements and other criteria which companies can also configure, if needed. The Enterprise Q&A tool lets employees enter any text into the search bar, and find experts internally or outside company walls. Most companies use the prepackaged EnterpriseJungle solutions as is for Human Resources (HR), recruitment, sales and other departments. However, Sinclair said companies can easily modify search queries to meet any organization’s unique needs.”
EnterpriseJungle users manage their company’s data through SAP’s Lumira dashboard. Galer shares Sinclair’s example of one company in Germany, which used EnterpriseJungle to match employees to appropriate new positions when it made a whopping 3,000 jobs obsolete. Though the software is now designed primarily for HR and data-management departments, Sinclair hopes the collaboration tool will permeate the entire enterprise.
Cynthia Murrell, May 4, 2015
Funnelback, as I have mentioned, has lost some of its marketing oomph. I think some staff shuffling took place. The company is now stepping up its effort to remain visible in a darned tough, crowded, and struggling market sector: Enterprise search.
I read “How Do You Solve a Problem Like Enterprise Search?” My answer is and has been, “One does not. One solves specific information access problems.” The wreckage of Convera, Delphes, Entopia, Fast Search, et al is evidence that enterprise search is a sticky wicket. The howls of pain on the LinkedIn forums and the odd collection of content in the Paper.Li round up about enterprise search make the challenges quite visible.
Read the article. Here are two points I found interesting.
According to the founder of Funnelback:
A holistic enterprise search solution should include:
- Bird’s-eye view metrics of all content, showing where it’s stored (e.g. web vs. enterprise vs. social media), how much exists in each repository, how old it is, missing metadata, poor quality titles, duplication, accessibility metrics, and the link graph. This provides information managers with a means to prioritize organizational investment in managing information, and thereby enhancing search effectiveness.
- Intelligent guidance on how to make content more visible/findable. Search engines generally attempt to hide the internals of their ranking systems and this makes it difficult for customers to learn how to make content more findable. An enterprise search engine should use its internal ranking knowledge to show content authors why pages rank the way they do and provide guidance on how to increase each page’s findability.
- The ability to surface and promote content based on user context with simple rules such as “User is in Department A”, “User is located in New Zealand”, “User is in the finance industry”, “User works for LexisNexis”. These rules can then be overlaid to form more sophisticated rules, without the need to create rules for every distinct possibility. Funnelback goes even further by allowing these rules to be applied to anonymous users by looking up their IP address in an internal database and inferring information based on the organization that owns the IP address.
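The rule overlay described in the last bullet can be sketched as small predicates that combine. This is an illustration only, not Funnelback’s implementation: the `attr_is` and `all_of` helpers, the `Promotion` class, and the user attributes are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Callable

User = dict  # hypothetical user context, e.g. {"department": "A", "country": "NZ"}
Rule = Callable[[User], bool]


def attr_is(key, value) -> Rule:
    """Simple rule: the user's attribute equals a given value."""
    return lambda user: user.get(key) == value


def all_of(*rules) -> Rule:
    """Overlay simple rules into a more specific one."""
    return lambda user: all(rule(user) for rule in rules)


@dataclass
class Promotion:
    rule: Rule
    content_id: str


def promoted_content(user, promotions):
    """Return the content ids whose rule matches this user."""
    return [p.content_id for p in promotions if p.rule(user)]


in_dept_a = attr_is("department", "A")
in_nz = attr_is("country", "NZ")

promotions = [
    Promotion(in_dept_a, "dept-a-handbook"),
    Promotion(in_nz, "nz-holiday-calendar"),
    Promotion(all_of(in_dept_a, in_nz), "dept-a-nz-office-notice"),
]

print(promoted_content({"department": "A", "country": "NZ"}, promotions))
print(promoted_content({"department": "B", "country": "NZ"}, promotions))
```

Because the combined rule is built from the two simple ones, nobody has to enumerate every department and country pairing by hand. Note that the sketch says nothing about access controls, which still have to be enforced separately.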
These are darned interesting “shoulds.” The problems of access controls, contractual and regulatory constraints, and the human practice of creating silos of information are tough nuts to crack. “Shoulds” are easy. Delivering is tough, and Funnelback is neither more nor less well equipped than open source or proprietary information retrieval solutions.
The second point illustrates the flawed logic that many champions of enterprise search as a grand solution make. Here’s the passage:
The first question every organization should ask is: Who are the stakeholders affecting the success of our organization and what information do they need to maximize our success?
At a more practical level, this includes questions like:
- What are the personas in our organization? (i.e. the archetypes that represent the different roles)
- What information do they need in order to maximize productivity and make better decisions?
- What are our customer personas?
- What information do they need in order to maximize engagement and have a positive customer experience?
Without asking these questions, organizations sometimes assume that searching everything with a single query (access controls permitting) is the answer. Sometimes it is the answer, but it can be a more complicated and costly exercise than necessary. For example, do users want to use an enterprise search tool to search their own email, or would they prefer to use the search on their mail client?
Sorry, Funnelback. Asking the questions is the first step. The work is to answer the questions and then use that information to tailor a solution that does not anger the users, lead to litigation, or just not work.
Today’s flagship enterprise search vendors seem to include Coveo, dtSearch, Elastic, Funnelback, and a handful of other firms with low profiles. The present crisis in information access has been created by the actions of previous industry leaders in enterprise search.
The fix is to focus on solving a problem for a specific group of users. Lawyers have specialized search tools. Chemists have specialized search tools. Regular employees have Google and whatever findability solution is available within specific applications.
Want to get in a pickle? Sell a clueless senior executive a solution that solves the information access challenges for the entire organization. Didn’t work for STAIRS and won’t work for today’s systems.
The history of search is a painful one. There are options, but these are next generation systems, not yesterday’s systems wrapped with shoulds.
Stephen E Arnold, May 1, 2015
Navigate to http://hits1k.com/fast-enterprise-search/. Here’s the Web page I see:

The spelling of the word “fast” with caps and the reference to artificial intelligence are interesting. Is this company surfing on the “reputation” of the late, much loved, and not forgotten Fast Search & Transfer? Believe it or not the company is integrating
Yes, it is a reseller of EMC and Microsoft explaining search using Microsoft Surface. The video and the marketing suggest that enterprise search is a most interesting discipline. I watched the video, and I would give the solution a whirl if I had a Surface and some interest in standing up and looking down as I dragged stuff from place to place.
Don’t try this app on your mobile phone.
Stephen E Arnold, May 1, 2015
Written by Stephen E. Arnold · Filed Under Enterprise search, News | Comments Off on FAST Marketing: Will It Work?
Enterprise search is limited by how well users tag their content and by the preloaded taxonomies. According to TechTarget’s Search Content Management blog, text analytics might be the key to turning around poor enterprise search performance: “How Analytics Engines Could Finally-Relieve Enterprise Pain.” Text analytics turns out to be only part of the solution. Someone had the brilliant idea to apply text analytics to classification issues in enterprise search, shifting search from reactive to user input to proactive about search queries.
In general, analytics search engines work like this:
“The first is that analytics engines don’t create two buckets of content, where the goal is to identify documents that are deemed responsive. Instead, analytics engines identify documents that fall into each category and apply the respective metadata tags to the documents. Second, people don’t use these engines to search for content. The engines apply metadata to documents to allow search engines to find the correct information when people search for it. Text analytics provides the correct metadata to finally make search work within the enterprise.”
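The quoted description, in which an engine assigns every matching category as a metadata tag and a separate search step then uses those tags, can be sketched in a few lines. This is an assumption-laden toy: the category term lists are invented, and matching on keywords stands in for whatever statistical classification a commercial analytics engine performs.

```python
import re

# Hypothetical category definitions: category -> indicative terms.
CATEGORY_TERMS = {
    "contracts": {"agreement", "indemnify", "licensee"},
    "lab-safety": {"centrifuge", "fume", "solvent"},
}


def tag_document(text, min_hits=1):
    """Return every category whose terms appear in the text (multi-label, not two buckets)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        category
        for category, terms in CATEGORY_TERMS.items()
        if len(words & terms) >= min_hits
    )


def search_by_tag(index, tag):
    """The search step: find documents by the metadata the tagger applied."""
    return sorted(doc_id for doc_id, tags in index.items() if tag in tags)


documents = {
    "memo-1": "The licensee shall indemnify the vendor under this agreement.",
    "sop-7": "Store solvent under the fume hood before using the centrifuge.",
    "note-3": "The agreement covers disposal of solvent waste.",
}

# Tagging happens ahead of time; searching happens later against the tags.
index = {doc_id: tag_document(text) for doc_id, text in documents.items()}
print(index)
print(search_by_tag(index, "contracts"))
```

The third document lands in both categories, which is the “each category” behavior the passage describes. Whoever writes the term lists is, of course, a human.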
Supposedly, they are fixing the tagging issue by removing the biggest cause of error: humans. Microsoft caught on to how much profit this could generate, so it purchased Equivio in 2014 and integrated the FAST Search platform into SharePoint. Since Microsoft is doing it, every other tech company will copy and paste its actions in time. Enterprise search is full of faults, but it has improved greatly. Big data trends have improved search quality, but tagging continues to be an issue. Text analytics search engines will probably be the newest big data field for development. Hint for developers: work on an analytics search product, launch it, and then it might be bought out.
Whitney Grace, May 1, 2015
In 2014, we noted that Scientel’s Norman Kutemperor was a leader in NoSQL data management. We learned that Scientel is beating the drum for an integrated, user friendly content management, search, and analytics system. Kutemperor has been described as the father of NoSQL.
According to “Scientel Releases EZContent Content Management and Search System for the Small Enterprise,”
Scientel’s EZContent™ Content Management and Search system operating under GENSONIX® NoSQL DB is an advanced ECM solution for “Big Data” content for the smaller enterprise. Scientel’s EZContent is derived from Scientel’s primary Enterprise Content Management & Search System (ECMS). It is the ideal, most cost-effective, and simple to operate tool for organizing, managing, and retrieving your Big Data contents at all organizational levels. Powerful, yet comprehensive and fun to use, it can start small and is highly scalable. The system can be configured for various system requirements. This makes it ideal for use in small offices/organizations as well as medium and large enterprises.
The company asserts that it has a search system which displays an information object thumbnail. The user drags a document to the system. EZContent processes 40 different file types, including images and video clips. Kutemperor explained the search system this way:
With ECMS, we are able to move the contents of that CD into our ECMS system, and all 100+ people can access that all at the same time. They can also do searches from within what we call textual documents – PDFs, Microsoft documents mostly are all textual documents, whereas clips, videos and pictures are not. By being able to search inside the textual document, we can actually locate what we are looking for and get to the right page that we want to read. Content management is a very valuable tool for all of us, and it is a very helpful tool for all organizations, whether it is non-profit or profit, commercial, corporate, scientific, medical, city government, small businesses or large enterprises. Everybody needs it and now can have it cost-effectively. The basic offering that we can start with is a very small appliance that is turnkey and virtually maintenance free. It is easily installed into the network and pretty much goes to work without having to do too much in the way of setup. For larger organizations, we offer appliances that can scale to very large configurations, that can store very large numbers of documents efficiently, and that are able to locate these documents rapidly.
According to CIOReview:
Scientel’s Gensonix DB is an all in one SQL. Gensonix based solutions can take the place of SQL, NoSQL and storage systems and can process large data sets in real time. Its massive core based parallel solutions deliver performance in range with in memory systems. thus performance of Gensonix on Scientel LDWA hardware matches the performance of in memory systems and with higher reliability.
In 2014 the database was described as “polymorphic.” One explanation is:
Polymorphism is the ability of an entity to behave like more than 1 of its counter parts given a set of circumstances or criteria; or, the provision of a single interface (a shared boundary across which separate components of a computer system exchange information) to entities of different types. In other words, in a polymorphic DB, you can use a relational approach when that is appropriate, hierarchical when that is, and so on. No one paradigm is fully implemented, but the DB uses enough of the features/capabilities needed to provide a reasonable solution to a problem.
These are envelope stretching assertions. The Manta entry for the company reports, perhaps erroneously, that the company has three employees. Another Manta entry asserts that the company employs five to nine people and has revenues of $1.0 million to $2.5 million. For more information about Scientel, navigate to the company’s Web site at www.scientel.com.
Should MarkLogic and other vendors offering similar products up their game? Worth monitoring this Swiss Army knife approach to information access.
Stephen E Arnold, April 25, 2015
Sponsored by CyberOSINT: Next Generation Information Access
Searching is an essential function for basic Internet use and it is a vital function in enterprise systems. While searching on the Internet with a search engine might not seem like a security risk, the comparable action on enterprise search could be potentially dangerous. Security Enterprises points out the potential security risks in the article, “SearchBlox Vulnerabilities Underscore Importance Of Updating Enterprise Search Tools.”
Recently the Carnegie Mellon Software Engineering Institute CERT Division compiled a list of the security risks in SearchBlox’s software. They included ways for hackers to view private information and upload files, along with cross-site scripting (XSS) and cross-site request forgery flaws. Enterprise security developers can learn from SearchBlox’s vulnerabilities by checking for the same weaknesses in their own systems and repairing them before a hacker discovers the leak.
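For readers unfamiliar with how a search box becomes an XSS vector, here is a minimal sketch of the general class of flaw and its usual remedy. It is not SearchBlox code and says nothing about how that product was patched; the function names are hypothetical. The weakness arises when a page echoes the user’s query back without escaping it.

```python
import html


def render_heading_unsafe(query):
    """Echoes the query verbatim: any markup in it becomes part of the page."""
    return f"<h1>Results for: {query}</h1>"


def render_heading_safe(query):
    """Escapes the query so the browser displays it as text instead of running it."""
    return f"<h1>Results for: {html.escape(query, quote=True)}</h1>"


malicious = "<script>alert(1)</script>"
print(render_heading_unsafe(malicious))
print(render_heading_safe(malicious))
```

Output escaping addresses only the reflected XSS case. Cross-site request forgery and unauthorized file upload need their own defenses, such as per-request tokens and server-side checks on what may be uploaded.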
The problem, however, might come from within an organization rather than out:
“Of all the possible threats, the ability for cybercriminals to conduct XSS attacks from within the product’s default search box is likely the most concerning, Threatpost reported. On the other hand, anyone trying to take advantage of such SearchBlox vulnerabilities would need to be an authenticated user, though there is no shortage of stories about insider threats within the enterprise.”
The article implies that SearchBlox’s vulnerabilities arose in the day-to-day activities that keep an organization running. Using SearchBlox as an example, other organizations with enterprise systems can learn where their own products need patches so the same issues don’t happen to them. The takeaway: most hackers are probably insiders who look for holes in ordinary, everyday routines.
Whitney Grace, April 23, 2015
The article titled Microsoft Beefs Up Office 365’s Delve, Aims To Complete Its Rollout By May on Computerworld discusses the improvements to the enterprise search and discovery app Delve. Delve was built for Office 365’s Office Graph machine learning engine, and helps create and analyze detailed data on users by linking to content through card icons. The article states,
“Based on what it learns about the user’s work, it determines which files, colleagues, documents and data are most relevant and important at any given point, and displays links to them in a graphically rich, card-based dashboard. Delve provides this assistance in real time, so that users can prioritize their work and find the information they need as they participate in whatever work projects and tasks they’re involved in.”
This means that Delve can figure that a user’s upcoming meeting will be about a particular topic with particular colleagues, and then collect information that is relevant in a timely manner for display in the dashboard. Microsoft is currently working to make Delve capable of analyzing email content within Exchange Online attachments. Yammer actions will also be performable in the near future from the Delve interface. It can also, of course, be used more traditionally as a search engine, but Microsoft has big plans for more dynamic and innovative capabilities.
Chelsea Kerwin, April 20, 2014
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com