Structured Search: New York Style
October 10, 2016
A brief and interesting piece of search-related content marketing, the white paper “InnovationQ Plus Search Engine Technology,” attracted my attention. What’s interesting is that the IEEE is apparently in the search engine content marketing game. The example I have in front of me is from a company doing business as IP.com.
What does InnovationQ Plus do to deliver on-point results? The write up says:
This engine is powered by IP.com’s patented neural network machine learning technology that improves searcher productivity and alleviates the difficult task of identifying and selecting countless keywords/synonyms to combine into Boolean syntax. Simply cut and paste abstracts, summaries, claims, etc. and this state-of-the art system matches queries to documents based on meaning rather than keywords. The result is a search that delivers a complete result set with less noise and fewer false positives. Ensure you don’t miss critical documents in your search and analysis by using a semantic engine that finds documents that other tools do not.
The use of snippets of text as the raw material for a behind-the-scenes query generator reminds me of the original DR-LINK method, among others. Perhaps there is some Syracuse University “old school” search DNA in the InnovationQ Plus approach? Perhaps the TextWise system has manifested itself as a “new” approach to patent and STEM (scientific, technology, engineering, and medical) online searching? Perhaps Manning & Napier’s interest in information access has inspired a new generation of search capabilities?
My hunch is, “Yep.”
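The quoted claim of matching “based on meaning rather than keywords” is the classic latent semantic indexing pitch that dates back to the DR-LINK era. A minimal sketch of the idea, using a rank-reduced term-document matrix over invented toy documents (the real InnovationQ Plus engine is proprietary and certainly far more elaborate):

```python
import numpy as np

def build_vocab(docs):
    """Map each unique word across the documents to a column index."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def vectorize(text, vocab):
    """Bag-of-words count vector; out-of-vocabulary words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1
    return v

def lsa_rank(docs, query, k=2):
    """Rank docs against a pasted snippet via a rank-k SVD of the
    term-document matrix, i.e. latent semantic analysis."""
    vocab = build_vocab(docs)
    A = np.stack([vectorize(d, vocab) for d in docs], axis=1)  # terms x docs
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k "semantic" space
    q = vectorize(query, vocab)
    sims = Ak.T @ q / (np.linalg.norm(Ak, axis=0) * np.linalg.norm(q) + 1e-12)
    return list(np.argsort(-sims))               # best match first

docs = ["neural network machine learning search",
        "patent claims boolean keyword syntax",
        "cheese recipes from wisconsin"]
ranking = lsa_rank(docs, "machine learning with neural models")
```

The appeal for a patent searcher is that an entire abstract becomes the query, sparing the Boolean synonym chase the white paper complains about.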
If you don’t have a handy snippet encapsulating your search topic, just fill in the query form. Google offers a similar “fill in the blanks” approach even though only a tiny percentage of those looking for information on Google use advanced search. You can locate the Google advanced search form at this link.
Part of the “innovation” is the use of fielded search. Fielded search is useful. It was the go-to method for locating information in the late 1960s. The method fell out of favor with the Sillycon Valley crowd when the idea of talking to one’s mobile phone became the synonym for good enough search.
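A fielded query is easy to picture: the searcher constrains named fields instead of scanning free text. A toy sketch (the field names and records below are invented for illustration):

```python
# Toy fielded search: each record carries explicit named fields, and a
# query constrains specific fields rather than matching free text.
records = [
    {"title": "Neural search appliance", "inventor": "Smith", "year": 2014},
    {"title": "Boolean query parser",    "inventor": "Jones", "year": 2011},
    {"title": "Semantic patent ranker",  "inventor": "Smith", "year": 2016},
]

def fielded_search(rows, **constraints):
    """Return rows matching every field constraint exactly."""
    return [r for r in rows
            if all(r.get(field) == value for field, value in constraints.items())]

hits = fielded_search(records, inventor="Smith", year=2016)
```

The precision comes at a price: someone has to populate those fields, which is exactly the editorial discipline the late-1960s systems assumed and the good-enough crowd abandoned.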
To access the white paper, navigate the IEEE registration page and fill out the form at this link.
From my vantage point, structured search with “more like this” functions is a good way to search for information. There is a caveat. The person doing the looking has to know what he or she needs to know.
Good enough search takes a different approach. The systems try to figure out what the searcher needs to know and then deliver it. The person looking for information is not required to do much thinking.
The InnovationQ Plus approach shifts the burden from smart software to smart searchers.
Good enough search is winning the battle. In fact, some Sillycon Valley folks, far from upstate New York, have embraced good enough search with both hands. Why use words at all? There are emojis, smart software systems predicting what the user wants to know, and Snapchat-infused image-based methods.
The challenge will be to find a way to bridge the gap between the Sillycon Valley good enough methods and the more traditional structured search methods.
IEEE seems to agree as long as the vendor “participates” in a suitable IEEE publishing program.
Stephen E Arnold, October 10, 2016
Crimping: Is the Method Used for Text Processing?
October 4, 2016
I read an article I found quite thought provoking. “Why Companies Make Their Products Worse” explains that reducing costs allows a manufacturer to expand the market for a product. The idea is that more people will buy a product if it is less expensive than a more sophisticated version of the product. The example which I highlighted in eyeshade green explained that IBM introduced an expensive printer in the 1980s. Then IBM manufactured a different version of the printer using cheaper labor. The folks from Big Blue added electronic components to make the cheaper printer slower. The result was a lower cost printer that was “worse” than the original.
Perhaps enterprise search and content processing is a hybrid of two or more creatures?
The write up explained that this approach to degrading a product to make more money has a name—crimping. The concept creates “product sabotage”; that is, intentionally degrading a product for business reasons.
The comments to the article offer additional examples and one helpful person with the handle Dadpolice stated:
The examples you give are accurate, but these aren’t relics of the past. They are incredibly common strategies that chip makers still use today.
I understand the hardware or tangible product application of this idea. I began to think about the use of the tactic by text processing vendors.
The Google Search Appliance may have been a product subject to crimping. As I recall, the most economical GSA was less than $2000, a price which was relatively easy to justify in many organizations. Over the years, the low cost option disappeared and the prices for the Google Search Appliances soared to Autonomy- and Fast Search-levels.
Other vendors introduced search and content processing systems, but the prices remained lofty. Search and content processing in an organization never seemed to get less expensive when one considered the resources required, the license fees, the “customer” support, the upgrades, and the engineering for customization and optimization.
My hypothesis is that enterprise content processing does not yield compelling examples like the IBM printer example.
Perhaps the adoption rate for open source content processing reflects a pent-up demand for “crimping”? Perhaps some clever graduate student will take the initiative to examine content processing product prices? Licensees spend heavily on sophisticated systems like those available from outfits such as IBM and Palantir Technologies. The money comes from the engineering and what I call “soft” charges; that is, training, customer support, and engineering and consulting services.
At the other end of the content processing spectrum are open source solutions. The middle ground between free or low cost systems and high end solutions does not have too many examples. I am confident there are some, but I could identify only Funnelback, dtSearch, and a handful of other outfits.
Perhaps “crimping” is not a universal principle? On the other hand, perhaps content processing is an example of technical software with its own idiosyncrasies.
Content processing products, I believe, become worse over time. The reason is not “crimping.” The trajectory of lousiness comes from:
- Layering features on keyword retrieval in hopes of generating keen buyer interest
- Adding features to justify price increases
- Increasing system complexity so the licensee is less able to fiddle with the system
- Refusing to admit that content processing is a core component of many other types of software, so “finding information” has become a standard component of other applications.
If content processing is idiosyncratic, that might explain why investors pour money into content processing companies which have little chance to generate sufficient revenue to pay off investors, generate a profit, and build a sustainable business. Enterprise search and content processing vendors seem to be in a state of reinventing or reimagining themselves. Guitar makers just pursue cost cutting and expand their market. It is not so easy for content processing companies.
Stephen E Arnold, October 4, 2016
Five Years in Enterprise Search: 2011 to 2016
October 4, 2016
Before I shifted from worker bee to Kentucky dirt farmer, I attended a presentation in which a wizard from Findwise explained enterprise search in 2011. In my notes, I jotted down the companies the maven mentioned (love that alliteration) in his remarks:
- Attivio
- Autonomy
- Coveo
- Endeca
- Exalead
- Fabasoft
- IBM
- ISYS Search
- Microsoft
- Sinequa
- Vivisimo
There were nodding heads as the guru listed the key functions of enterprise search systems in 2011. My notes contained these items:
- Federation model
- Indexing and connectivity
- Interface flexibility
- Management and analysis
- Mobile support
- Platform readiness
- Relevance model
- Security
- Semantics and text analytics
- Social and collaborative features
I recall that I was confused about the source of the information in the analysis. At the time, the murky family tree seemed important. Five years later, I am less interested in who sired which child than in the interesting historical nuggets in this simple list and its collection of pretty fuzzy and downright crazy characteristics of search. I am not too sure what “analysis” and “analytics” mean. The notion that an index is required is okay, but the blending of indexing and “connectivity” seems a wonky way of referencing file filters or a network connection. With the Harvard Business Review pointing out that collaboration is a bit of a problem, it is an interesting footnote to acknowledge that a buzzword can grow into a time sink.
There are some notable omissions; for example, open source search options do not appear in the list. That’s interesting because Attivio was, as I heard at the time, poking its toe into open source search. IBM was a fan of Lucene five years ago. Today the IBM marketing machine beats the Watson drum, but inside the Big Blue system resides that free and open source Lucene. I assume that the gurus and the mavens working on this list ignored open source because what consulting revenue results from free stuff? What happened to Oracle? In 2011, Oracle still believed in Secure Enterprise Search only to recant with purchases of Endeca, InQuira, and RightNow. There are other glitches in the list, but let’s move on.
Attensity: A Big 404 in Text Analytics
October 1, 2016
Search vendors can save their business by embracing text analytics. Sounds like a wise statement, right? I would point out that our routine check of search and content processing companies turned up an inspiring Web page for Attensity, the Xerox PARC love child and once hot big dog in text analysis.
Attensity joins a long list of search-related companies which have had to reinvent themselves.
The company pulled in $90 million from a “mystery investor” in 2014. A pundit tweeted about the company in 2015.
In February 2016, Attensity morphed into Sematell GmbH, a company offering interaction solutions.
I mention this arabesque because it underscores:
- No single add on to enterprise search will “save” an information access company
- Enterprise search has become a utility function. Witness the shift to cloud based services like SearchBlox, appliances like Maxxcat, and open source options. Who will go out on a limb for a proprietary utility when open source variants are available and improving?
- Pundits who champion a company often have skin in the game. Self-appointed experts for cognitive computing, predictive analytics, or semantic link analysis are tooting a horn without other instruments.
Attensity is a candidate to join the enterprise search Hall of Fame. In the shrine are Delphes, Entopia, et al. I anticipate more members, and I have a short list of “who is next” taped on my watch wall.
Stephen E Arnold, October 1, 2016
Lexmark Upgrades Its Enterprise Search
September 30, 2016
Enterprise search has taken a back seat to news about Google’s next endeavor and the next big thing in big data. Enterprise search may have taken a back seat in my news feed, but it is still a major component in enterprise systems. You can even speculate that without a search function, enterprise systems are useless.
Lexmark, one of the largest suppliers of printers and business solutions in the country, understands the importance of enterprise search. This is why it recently updated the description of Perceptive Enterprise Search in the system’s technical specifications:
Perceptive Enterprise Search is a suite of enterprise applications that offer a choice of options for high performance search and mobile information access. The technical specifications in this document are specific to Perceptive Enterprise Search version 10.6…
A required amount of memory and disk space is provided. You must meet these requirements to support your Perceptive Enterprise Search system. These requirements specifically list the needs of Perceptive Enterprise Search and do not include any amount of memory or disk space you require for the operating system, environment, or other software that runs on the same machine.
Some technical specifications also provide recommendations. While requirements define the minimum system required to run Perceptive Enterprise Search, the recommended specifications serve as suggestions to improve the performance of your system. For maximum performance, review your specific environment, network, and platform capabilities and analyze your planned business usage of the system. Your specific system may require additional resources above these recommendations.
It is pretty standard fare when it comes to technical specifications, in other words, not that interesting but necessary to make the enterprise system work correctly.
Whitney Grace, September 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
HonkinNews for September 27, 2016, Now Available
September 27, 2016
This week’s HonkinNews video tackles Yahoo’s data breach. Stephen E Arnold reveals that Beyond Search thinks Yahoo is a hoot and tags the company Yahoot. Plus, HonkinNews suggests that Oliver Stone may want to do a follow up to Snowden. The new film could be “Marissa: Purple Sun Down.” Other stories include Hewlett Packard Enterprise’s opportunity to see the light with Dr. Michael Lynch’s Luminance. The video explains puppy bias and comments on Harvard’s interest in sugar and fat. You can view the seven minute video at https://youtu.be/64rJdlj4Lew.
Kenny Toth, September 27, 2016
HonkinNews for September 20, 2016 Available
September 20, 2016
Stories in the Beyond Search weekly video news program “HonkinNews” include LinkedIn’s censorship of a former CIA professional’s post about the 2016 election. Documentum, founded in 1990, has moved to the frozen wilds of Canada. A Microsoft and Nvidia sponsored online beauty contest may have embraced algorithmic bias. Google can write a customer’s ad automatically and may be able to alter users’ thoughts and actions. Which vendors of intelligence-centric software may be shown the door to the retirement home? The September 20, 2016, edition of “HonkinNews,” filmed with old-fashioned technology in the wilds of rural Kentucky, is online at this link.
Kenny Toth, September 20, 2016
HonkinNews, September 13, 2016 Now Available
September 13, 2016
Interested in having your polynomials probed? The Beyond Search weekly news explains this preventive action. In this week’s program you will learn about Google’s new enterprise search solution. Palantir is taking legal action against an investor in the company. IBM Watson helps out at the US Open. Catch up on the search, online, and content processing news that makes the enterprise procurement teams squirm. Dive in with Springboard and Pool Party. To view the video, click this link.
Kenny Toth, September 13, 2016
HonkinNews Video Available for August 23, 2016
August 23, 2016
After several tests, the fourth HonkinNews video is available on YouTube. You can view the six minute video at https://youtu.be/AIYdu54p2Mg. The HonkinNews highlights a half dozen stories from the previous week’s Beyond Search stream. The commentary adds a tiny twist to most of the stories. We know that search and content processing are not the core interests of the millennials. We don’t expect to attract much of a following from teens or from “real” search experts. Nevertheless, we will continue with the weekly news program because Google has an appetite for videos. We will continue with the backwoods theme and the 16 mm black and white film. We think it adds a high tech look to endless recycling of search and content jargon which fuels information access today.
Kenny Toth, August 23, 2016
Rocket Software Enterprise Search
August 22, 2016
My recollection is that the search plumbing for Rocket Software Enterprise Search is AeroText. If you are not familiar with AeroText, the system was for a number of years a property of Lockheed Martin. But times change. Rocket Software purchased AeroText in 2008. The news release about the deal stated:
The AeroText product suite provides a fast, agile information extraction system for developing knowledge-based content analysis applications. The technology excels at developing a core understanding of content contained within unstructured text, such as emails and documents, as well as an ability to automatically reconcile information cited across multiple documents. Such a capability makes it suited for a variety of applications, from counter-terrorism and law enforcement to business intelligence and enterprise content management. AeroText was originally developed by Lockheed Martin and is often integrated into other solutions. AeroText solutions provide both information extraction and link analysis capabilities by converting unstructured information into structured information.
Is this information important? Well, to those who want to use open source search solutions, nah. To companies wanting a proprietary search system with a defense pedigree, yes.
If you want Rocket Software’s description of one of its uses of the AeroText technology, you can download the white paper “How Enterprise Search Enhances Enterprise Intelligence” at this link. You will have to register and be careful not to hit the “return” key. Don’t care? Well, prepare to enter the information a second time.
AeroText used to require human-tweaked rules, a human-built classification scheme, and content in XML. Each of these attributes is characteristic of a traditional approach to information retrieval.
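The extract-then-link pipeline the Rocket Software release describes can be sketched crudely. The pattern below (capitalized multi-word runs as “entities,” same-document co-occurrence as “links”) is a toy stand-in for illustration only; AeroText’s actual rules, taxonomy, and XML workflow are proprietary:

```python
import re
from itertools import combinations

def extract_entities(text):
    # Naive entity spotting: runs of two or more capitalized words.
    # A rules-based system like AeroText uses far richer patterns.
    return re.findall(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b", text)

def cooccurrence_links(docs):
    # Link analysis sketch: pair up entities mentioned in the same
    # document, turning unstructured text into structured edges.
    links = set()
    for doc in docs:
        entities = sorted(set(extract_entities(doc)))
        links.update(combinations(entities, 2))
    return links

docs = ["Lockheed Martin sold the product line to Rocket Software in 2008."]
links = cooccurrence_links(docs)
```

Even this toy version shows why the output suits link-analysis tools: the structured pairs can feed straight into a graph for counter-terrorism or business intelligence work.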
Stephen E Arnold, August 22, 2016