Exploratory Search Trends Debated
December 11, 2013
An article titled The Changing Face of Exploratory Search on LinkedIn presents the current trends in search. Exploratory search is distinct from navigational search: a navigational searcher knows what she expects to get in terms of results, while an exploratory searcher might know the search criteria but not how many results will meet those criteria, if any. The article claims that while navigational search exploded in the last fifteen years, exploratory search is still nascent.
The trends highlighted in the article include:
1.) Entity-oriented search. Search has moved beyond words as mere strings of text and increasingly focuses on entities that represent people, places, organizations, and topics.
2.) Knowledge graphs. Search is starting to leverage the network of relationships among entities: Google has its knowledge graph; Microsoft has Satori; and networks like LinkedIn and Facebook are fundamentally social graphs of entities.
3.) Search assistance. Google popularized search suggestions nearly a decade ago, using its knowledge of common queries to reduce effort on the part of searchers.
The article goes on to explain what will happen when faceted search (a mixture of entity-oriented and knowledge graph searches) expands, allowing for precision searches. The final step is faceted search combining with search assistance to mold something akin to Facebook’s graph search. The article touts these trends as new, but they sound awfully familiar. Didn’t Inktomi and Endeca approach search in this way in the late 1990s? Perhaps this is just old wine in a new semantic bottle.
Chelsea Kerwin, December 11, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Verity 2005 Profile Now Available
December 10, 2013
If you have found the “frozen” enterprise search vendor profiles interesting, you may want to check out the Verity 2005 profile. From 1988 to 2005, Verity was one of the leading providers of information retrieval solutions. Verity was purchased by Autonomy in late 2005, and since that deal closed, the Verity brand has been less and less visible. I learned in November 2013 that some young search mavens are unfamiliar with the Verity system. Would you believe that one of the people who had huge Verity gaps in his knowledge works for the company that owns Autonomy? Perhaps my free profiles will help the new wave of search experts appreciate the past, the sameness of systems, and the predictable boom and bust cycles of the enterprise search market.
The profile provides a snapshot of Verity, its innovations, and its marketing trajectory during the firm’s salad days. The company moved in on a market sector carved out by the now almost-forgotten Fulcrum Technologies. Verity moved through the now-standard trajectory of government sales and some big deals, OEM licensing and partnerships, shifting from search to allegedly higher-value concepts like “knowledge,” acquisitions to get a grip on certain market sectors, and then to its sale to Autonomy, arguably the big fish in the enterprise search pond in 2005.
You can access the index page for the free profiles at http://xenky.com/vendor-profiles/.
Please, remember the caveats that were ignored by one poobah last week. You can correct, comment upon, and criticize the “frozen” draft of a report I prepared for a client years ago. Please, use the comments section of the Beyond Search blog. I am not too interested in parental email, smarm, or “wow, that’s great” inputs. A Beyond Search editor will make sure the comments are in bounds, but no direct inputs to the Xenky.com site are supported at this time.
Next up? Fulcrum Technologies. Believe it or not, the firm’s technology is still in use today. When was that technology rolled out? You will have to wait for the next free, frozen profile if you do not know. (I had forgotten until we selected a draft report to post.)
Stephen E Arnold, December 10, 2013
Search and Crowdsourcing: Verbase via Hong Kong
December 6, 2013
Short honk: You may wonder what a crowd sourced search engine is. If you poke around the mainstream Web indexes like Google and Bing, there are some tantalizing clues. Blekko and DuckDuckGo have used the word “crowd sourced” to entice users. With a bit more digging you may come across a search engine from Verbase. The news release, issued in October 2013, explains the notion in this way:
Powered by human intelligence, Verbase delivers more direct results for text-based searches, and enables users to add comments and original content to search results. Verbase is currently receiving over 50,000 unique monthly visitors to its site.
Google is mostly algorithms, most of the time. Rumors of humans tinkering with the giant’s findability system drift around, but Google likes nests of numerical recipes. Humans are, well, human, slow, and often prone to playing volleyball and sleeping.
A Verbase results screen for the query “Fulcrum Technologies Ful/Text”.
The Verbase approach uses three methods:
- A search box that offers category filters. (These look like the Blekko “slash” functions.)
- What the company calls an “automatic user ranking algorithm” that considers “engagement.” (Perhaps this means clicking and the time spent in a results list?)
- A “micro content” function that allows a user to create content. (Does this echo Vivisimo’s approach on steroids?)
According to the news release:
Founded by serial entrepreneur Antoine Sorel Neron, Verbase is a semantic search engine powered by human intelligence that relieves user frustration associated with spam, advertising, and irrelevant search results.
Several observations:
First, Google’s utility is not what it used to be. Search is not about precision and recall. Search is the source of money that funds synthetic biology investments and systems that are tuned to deliver brand advertising. Verbase is one company willing to point out that Google generates results that are sometimes less than useful to online searchers.
Second, Verbase is, like many other Web search companies, hitting some hot buttons to generate interest; for example, crowd sourcing. This is a good idea if methods exist to cope with the issues associated with uncontrolled indexing and content.
Third, the location of the company appears to be Hong Kong. Is this one more example of the center of technology starting to tip somewhere other than the longitude of Highway 101?
The system is worth a look. My test queries returned useful results. The graphic approach reminded me of Exalead’s Web search system from three or four years ago. I noted that the system handled an oddball product name, “Ful/Text,” reasonably well. Some competitors’ systems insisted that I really wanted “full text.” Wrong.
Verbase brought a smile to my face by returning results that I judged “relevant.” Worth a test drive.
Stephen E Arnold, December 6, 2013
A Search Library for Python
December 6, 2013
Python is one of the many programming languages available. Programmers rely on existing libraries and open source code to help them create new projects. Bitbucket points us to “Whoosh-Python Search Library,” which appears to be a powerful open source solution to your search woes.
The article states:
“Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. Programmers can use it to easily add search functionality to their applications and websites. Every part of how Whoosh works can be extended or replaced to meet your needs exactly.”
What can Whoosh do? It has fielded indexing, fast indexing and retrieval, a powerful query language, the only production quality pure Python spell-checker, pluggable scoring algorithms, and a Pythonic API. Whoosh was built for situations where the programmer needs to avoid building native libraries, wants a research platform, wants one deeply integrated search solution, or wants an easy-to-use interface.
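To make “fielded indexing” concrete, here is a toy sketch in plain Python. It is not Whoosh and does not use Whoosh’s API; the class name, method names, and sample documents are all invented for illustration. It only shows the general idea of keeping a separate set of postings for each field so that a query can target the title or the body.

```python
from collections import defaultdict
import re


class TinyFieldedIndex:
    """Toy fielded inverted index. Illustrative only, not Whoosh's API."""

    def __init__(self):
        # Maps (field, term) -> set of document ids containing that term.
        self._postings = defaultdict(set)

    def add_document(self, doc_id, **fields):
        # Index each field separately so a query can target a single field.
        for field, text in fields.items():
            for term in re.findall(r"\w+", text.lower()):
                self._postings[(field, term)].add(doc_id)

    def search(self, field, query):
        # Return ids of documents whose field contains every query term.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self._postings.get((field, terms[0]), set()))
        for term in terms[1:]:
            result &= self._postings.get((field, term), set())
        return result


# Hypothetical sample documents.
index = TinyFieldedIndex()
index.add_document(1, title="Verity profile", body="enterprise search vendor history")
index.add_document(2, title="Search library", body="full-text indexing in pure Python")

print(index.search("title", "search"))  # matches only in the title field
print(index.search("body", "search"))   # matches only in the body field
```

A real library layers tokenization options, scoring, and on-disk storage on top of this basic structure.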
Whoosh started out as a search solution for proprietary software. Matt Chaput designed it for Side Effects Software Inc.’s animation software Houdini. Side Effects Software allowed Chaput to release the library to the open source community and many Python programmers probably consider it an early Christmas gift.
Whitney Grace, December 06, 2013
Search Innovations: Revisiting the Past
December 5, 2013
I printed out an article from ReadWrite Web five or six years ago. The story was “Top 17 Search Innovations Outside of Google.” I suggest that anyone tracking Yahoo’s decision to jump back into search or struggling with the dearth of Web search options may want to read this article. I think the list, prepared in May 2007, is a useful reminder of the lack of progress in search.
Let me highlight five of the innovations. These are “breakthroughs” that various search vendors and satraps have explained as the “next big thing.” Well, maybe.
- Personalization. The idea that the user does not see a list of results that are believed to be objective and relevant to the query is fascinating. When vendors filter information, vendors control the information agenda. Quite an innovation. I thought something similar happened in other spheres of interest years ago.
- Algorithm improvement. I like the idea that search has broken free of the algorithms that have been in use since the early days of SDC, SMART, and STAIRS. If the “improvement” erodes precision and recall, is that a good thing? If “improvement” means computational efficiency to reduce costs, is that a better thing?
- Parametric search. Yep, structured query language queries. What’s new? The fact that fewer professionals want to hassle with figuring out a query is fresher than the method itself.
- Semantic search. Does a user understand the upside and downside of semantic search? Do marketers? Oh, yeah.
- Results visualization. Hollywood style outputs have helped Palantir raise lots of money. Does a user know what a visualization “means”? Not too often.
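For readers who have not met the two measures named in the algorithm item above, a minimal sketch follows. The document identifiers are made up, and the function is a textbook definition of precision and recall, not any vendor’s method.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the system returns four documents, three are truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

An “improvement” that returns more documents can raise recall while lowering precision, which is why the two are usually reported together.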
The point is that the ReadWrite list makes clear that no significant progress in search has been made in the last five or six years. Am I missing progress?
To get some details about the dead end for search and content processing, check out the vendor case studies at www.xenky.com/vendor-profiles. The similarity among systems, features, and methods is interesting.
Stephen E Arnold, December 5, 2013
TEMIS Gets Another Client
December 5, 2013
Good news for TEMIS, everyone! According to The Sacramento Bee in the article, “OECD Chooses TEMIS To Semantically Structure Its Knowledge And Information Management Process,” TEMIS has a new and very big client. The OECD is the Organization for Economic Cooperation and Development, and it has selected TEMIS’s semantic content enrichment solution Luxid. OECD has started a new project called the Knowledge and Information Management (KIM) Program to create a framework for managing and delivering information as well as improving its accessibility and presentation. The OECD collects and analyzes data for over thirty-four member governments and over one hundred countries to help them sustain economic growth, boost employment, and raise the standard of living. The KIM Program will be a portal for the organization’s information and will hopefully improve findability and search.
What will TEMIS do? The article explains:
“In this context, the OECD has chosen TEMIS’s flagship Luxid® Content Enrichment Platform to address all Semantic Enrichment stages of the KIM framework. Luxid® will help OECD to consistently enrich document metadata in alignment with its taxonomies and ontologies, providing a genuinely semantic integration layer across heterogeneous document storage and content management components. This semantic layer will both enable new search and browsing methods and improved relevance and accuracy of search results, as well as progressively build an integrated map of OECD knowledge.”
Glad to see that enriched search and findability are not dead yet. Metadata still has its place, folks. How else will the big data people be able to find their new insights if metadata is not used?
Whitney Grace, December 05, 2013
Tips to Customize Your Bing Start Search on Windows 8.1
December 5, 2013
The story on LifeHacker titled How to Configure or Disable Bing Web Search in Windows 8.1 explains a step-by-step process to shut down or adjust Bing Search. The article responds to some complaints about Windows 8.1 search being slow and frustrating. The latest version of Windows is set up so that any search from the Start screen will yield web results.
The article explains:
“You can either turn off Bing search completely, or simply tweak settings like whether to give personalized results using your location or turning off safe search. To do any of these things, here’s where to go: Open the charms menu (place your cursor in the top right or bottom right corner) and select “Settings.” Click “Change PC settings.” Click “Search and apps.” Click “Search” in the side bar if it’s not already selected. Disable or change any options you choose.”
Some may find the search option useful and time-saving, but for others the web search option is unnecessary. Depending on the connection speed, this might be a very frustrating option. For a more thorough tutorial laying out all of the options for customizing your Start search, read How to Customize or Disable Search with Bing in Windows 8.1.
Chelsea Kerwin, December 05, 2013
Yahoo and Search: Innovation or PR?
December 4, 2013
I read “You Are the Query: Yahoo’s Bold Quest to Reinvent Search.” The write up explains that “search” is important to Yahoo. The buzzwords personalization and categorization make an appearance. There is no definition of “search.” So the story suggests that the new direction may be a “feed”, a stream of information. The passage I noted is:
So what is Yahoo building? To wit, the company is working on a new “personalization platform,” according to the LinkedIn profile of one Yahoo senior director. Cris Luiz Pierry, the director who headed up Yahoo’s now-shuttered Flipboard clone Livestand, writes that he is heading up a “stealth project,” and that he is “building the best content discovery and recommendation engine on the Web, across all of our regions.” Pierry also has an in-the-weeds search background, with experience in core Web search, ranking algorithms, and e-commerce software — which may come in handy when dealing with monetization.
A stealth search project. Didn’t Fulcrum Technologies operate in this way between 1983 and its run up to a much needed initial public offering in the early 1990s? Wasn’t the newcomer SRCH2 in stealth mode earlier in 2013?
The hook to the new approach may be nestled within this comment in the article:
That search experience would likely be layered on top of another company’s Web crawler, like Microsoft’s Bing, which took over those operations for Yahoo in 2010, as part of a 10-year deal. (More on that later.) Beginning in 2008.
Indexing the Web is an expensive proposition. No commercial publisher can afford it. Google is able to pull it off via its Yahoo-inspired ad model. Yandex is struggling to find monetization methods that allow it to keep its indexes fresh. But other Web indexers have had to cut back on coverage. Exalead’s Web index is thin gruel. Blekko has lost its usefulness for me. In fact, looking for information is now more difficult than it has been for a number of years.
Another interesting comment in the article jumped off the screen for me; to wit:
“We firmly believe that the Search Product of tomorrow will not be anything alike [sic] the product that we are used to today,” says the job description for the search architect. The posting also name-checks Search Direct, Yahoo’s version of Google Instant, as the “first step” in changing the landscape of search. After testing out a few queries on Yahoo’s home page, the feature, which looks up queries without requiring the user to hit “search,” looks to be dormant.
The write up concludes with this speculative paragraph:
Some theories: The company could be planning a Bing exit strategy for 2015 or earlier, and look to partner with another Web crawler, aka Google. Some reports have said Mayer has been cozying up to her former company on that front. Or Yahoo could be rebuilding its own core search capabilities, though that’s the unlikeliest of scenarios because that would be a nightmare for the company’s margins. Or Yahoo could even be beefing up its team just enough to gain more authority within the Bing partnership, in case it wanted to advise Bing on what to do on the back end.
What I find interesting is that the term “search” is not really defined in this write up or in most of the information I see that addresses findability. I am not sure what “search” means for Yahoo. The company has a history of listing sites by categories. Then the company indexed Web sites. Then the company used other vendors’ results. What’s next? I am not sure.
Observations? I have a few:
First, anyone looking for specific information has a tough job on their hands today. In a conversation with two experts in information retrieval, both mentioned that finding historical information via Web search systems was getting more difficult.
Second, queries run by different researchers return different results. The notion of comparative searching is tricky.
Third, with library funding shrinking, access to commercial databases is dwindling. For example, in Kentucky, patrons cannot locate a company news release from the 1980s using public library services.
The article about Yahoo is less about search and more about public relations. Is Yahoo or any vendor able to do something “new” in search? Without defining the term “search,” does it matter to the current generation of experts?
Personally I don’t want to influence a query. I want to locate information that is germane to a query that I craft and submit to an information retrieval system. Then I want to review results lists for relevant content and I want to read that information, analyze the high value information, synthesize it, and move on about my business.
I want to control the query. I don’t want personalization, feeds, or predictive analytics clouding the process. Does “search” mean thinking or taking what a company wants to provide to advance its own agenda?
Stephen E Arnold, December 4, 2013
Bing Continues Making Changes to Shopping Search Experience
December 4, 2013
An article titled Bing Sunsets Shopping Search, Integrates Directly Into Web Results on Search Engine Watch offers some insights into Bing’s attempts to improve its shopping experience. Bing announced in August that they are working to improve shopping results and more recently that they are retiring the “dedicated shopping experience” in favor of a user intent model.
The article explains:
“Using Bing Snapshot technology, certain search queries will return snapshots of various products in the right side column. Clicking on these products will produce a different result set of vendor sites that sell that particular product. Those results will also contain a carousel of similar products or models directly under the search box. Reviews, product specs are also included as snapshot information in the sidebar, as are prices from various vendors who purchase Bing ads.”
Bing is working to gain on Amazon, the company to beat worldwide when it comes to online shopping. Bing’s user intent plan is shaped around logical connections between queries and product comparisons. Bing is trying to move away from keywords and toward understanding what the user really wants. The integration of shopping results into the main experience is meant to make shopping searches more efficient.
Chelsea Kerwin, December 04, 2013
ThisPlusThat for Smarter Searches
December 2, 2013
Leave it to an astrophysicist to make search smarter. One of the fellows over at the Insight Data Science Fellows Program, Christopher Moody, describes how his search engine uses word vectors to produce more accurate search results in “ThisPlusThat.me: a Search Engine that Lets You ‘Add’ Words as Vectors.” The scientist says he was inspired by the possibilities presented by Google’s new vectoring algorithm, word2vec. He explains:
“What [Google] doesn’t do is understand the relationships between words and understand the similarities or dissimilarities. That’s where ThisPlusThat.me comes in–a search site I built to experiment with the word2vec algorithm recently released by Google. word2vec allows you to add and subtract concepts as if they were vectors, and get out sensible, and interesting results. I applied it to the Wikipedia corpus, and in doing so, tried creating an interactive search site that would allow users to put word2vec through its paces.”
Moody supplies several examples of his project in action. The first and most elementary: querying “King – Man + Woman” leads to “Queen.” Since the algorithm was trained using Wikipedia’s vast collection of data, Moody explains, it has “a pretty good grasp of not only common words like ‘smart’ or ‘American’ but also loads of human concepts and real world objects, allowing us to manipulate proper nouns.” You can try ThisPlusThat.me for yourself.
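The “King – Man + Woman” arithmetic can be sketched with a handful of hand-made vectors. The three-dimensional values below are invented for illustration; they are not real word2vec output, and this is not Moody’s code. The sketch adds and subtracts the vectors, then picks the remaining word whose vector points in the most similar direction (cosine similarity), skipping the query words themselves.

```python
import math

# Hand-made toy vectors (assumed values, NOT real word2vec embeddings).
# Roughly: [royalty, maleness, femaleness].
VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.1],
    "apple":  [0.1, 0.1, 0.1],
}


def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(positive, negative, vectors=VECTORS):
    # Add the "positive" word vectors and subtract the "negative" ones.
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    # Return the closest word that was not part of the query itself.
    exclude = set(positive) | set(negative)
    candidates = (w for w in vectors if w not in exclude)
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy(positive=["king", "woman"], negative=["man"]))
```

With vectors learned from a large corpus the same arithmetic runs over hundreds of dimensions and a very large vocabulary, which is where the speed-up tools mentioned in the article come in.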
Moody explains how he approached word2vec’s huge dimensional vector table using Hadoop’s Map functions. To speed computation, he tried a number of tools: NumPy, Cython, Numba, and Numexpr. Near the end of the article, Moody shares links to his code and notebook experiments. The write-up is worth a look for anyone interested in the development of natural language algorithms.
Cynthia Murrell, December 02, 2013