GIF Images Get Easy Button Thanks to Google
April 3, 2013
Is there anything Google isn’t affiliated with these days? I didn’t think so. Wired reports, in its article “Google Image Search: Now With More GIF Action,” that the information powerhouse is now turning its sights on the graphics interchange format (GIF).
“On Tuesday, Google announced via Google+ that Image Search now has an “Animated” filter. That means that if you’re only searching for animated magic, you need never be bothered with a still image again. Finally that search for Jennifer Lawrence GIFs from the Academy Awards just got a whole lot easier.”
GIFs have been around since 1987 and have become the go-to format for short animations on the Web. The feature is still being worked out, but for now, when you search in Google Images, you can open the drop-down menu under Search Tools and simply click on “Animated.”
It doesn’t seem like a significant change to the Google lineup, but the addition does take a consumer-first approach. If Google is the only place you can filter your content to find the exact information you want, well, Google then becomes the go-to.
Leslie Radcliff, April 3, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Medical Search Engine Identifies Rare Diseases
April 3, 2013
Specialized search engines are often used to locate subject-specific professional or academic information in databases. Certain professions shy away from the open Web for fear of retrieving poor-quality information. However, a new project is proving that quality medical information can be retrieved from the open Web. Read more about FindZebra.com in the article, “New Medical Search Engine Quickly IDs Rare Diseases.”
The article states:
“In medical school, students are taught to concentrate on more common diseases, not ‘zebras’–slang for a surprising diagnosis. Now, the zebras have taken to the web at FindZebra.com, a new search engine for medical professionals which navigates the web quickly to identify rare and genetic diseases. Researchers . . . sought out to assess how well web search engines, such as Google, work for diagnostic queries, and what contributes to web research success or failure. The results determined that FindZebra outperformed Google Search. The authors concluded that a specialized search engine can improve online diagnostic quality without a loss of ease of use that popular search engines possess.”
It seems that quality results can be retrieved easily. This is the ultimate aim of search: quick, effective, and easy. LucidWorks aims to achieve the same goal in the much more difficult environment of enterprise search. Their expertise combines solid open source infrastructure, built on Lucene/Solr, with award-winning customer support.
Emily Rae Aldridge, April 3, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Specialized Search Engine Helps Diagnose Rare Diseases
April 3, 2013
A recent piece from the MIT Technology Review that examines “The Rare Disease Search Engine That Outperforms Google” compares apples with oranges. The real takeaway is much bigger than a swipe at Google—that technical innovation is being used to help humanity.
Rare diseases are notoriously difficult to diagnose, and medical professionals have been using an Internet search engine, usually Google, to help with the process for years. Of course, Google was not designed for that use, so researchers have created a tailor-made engine to streamline this difficult but essential task. The article informs us:
“Radu Dragusin at the Technical University of Denmark and a few pals unveil an alternative. These guys have set up a bespoke search engine dedicated to the diagnosis of rare diseases called FindZebra, a name based on the common medical slang [“zebra”] for a rare disease. After comparing the results from this engine against the same searches on Google, they show that it is significantly better at returning relevant results.”
Is this supposed to be a surprise? Google does ads, not rare diseases. Ah well, the important thing is that doctors have a powerful new tool to help folks with diseases that stoutly defy accurate identification. How did the team from the Technical University of Denmark do it? The write-up goes on to say:
“The magic sauce in FindZebra is the index it uses to hunt for results. These guys have created this index by crawling a specially selected set of curated databases on rare diseases. . . . They then use the open source information retrieval tool Indri to search this index via a website with a conventional search engine interface. The result is FindZebra.”
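The recipe described above—crawl a small, curated set of trusted sources, index them, and search only that index—can be sketched in a few lines. The real system uses the Indri retrieval toolkit over crawled rare-disease databases; the tiny document set, tokenizer, and TF-IDF scoring below are simplified, hypothetical stand-ins meant only to illustrate why restricting the index to curated content changes the results.

```python
# Sketch of FindZebra's approach: index a small curated corpus and rank
# documents by summed TF-IDF of the query terms. The document snippets
# are invented; the real project crawls curated rare-disease databases
# and queries them with Indri rather than this toy scorer.
import math
from collections import Counter

CURATED_DOCS = {
    "doc1": "fever joint pain rash rare autoinflammatory syndrome",
    "doc2": "common cold cough sore throat runny nose",
    "doc3": "progressive muscle weakness rare genetic myopathy",
}

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """Map each term to the documents containing it, with term counts."""
    index = {}
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index.setdefault(term, {})[doc_id] = count
    return index

def search(query, index, n_docs):
    """Rank documents by summed tf * idf over the query terms."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]

index = build_index(CURATED_DOCS)
results = search("rare joint pain rash", index, len(CURATED_DOCS))
```

Because every indexed document comes from a vetted source, even a crude ranker like this one cannot surface the low-quality pages that plague diagnostic queries on a general Web engine.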
Though the zebra engine is still an in-progress research project, the team has made it publicly available at www.findzebra.com. Medical professionals can already use the innovation to help patients who might otherwise be doomed to years of painful frustration. Hooray, progress!
Cynthia Murrell, April 03, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Ghost Sites Generate Traffic But Not Much Else
April 1, 2013
Thanks to low rates of turnover and clickthroughs, the often unreliable world of ad exchanges, and technology that promises sky-high viewership it just cannot deliver, the rise of digital tricksters is at an all-time high.
You’ve heard of a ghost writer, well, “Meet the Most Suspect Publishers On the Web: The Rise of Ghost Sites, Where Traffic is Huge but People are Few.”
“Increasingly, digital agencies and buy-side technology firms are seeing massive traffic and audience spikes from groups of Web publishers few people have ever heard of. These sites—billed as legitimate media properties—are built to look authentic on the surface, with generic, nonalarm-sounding content. But after digging deeper, it becomes evident that very little of these sites’ audiences are real people. Yet big name advertisers are spending millions trying to reach engaged users on these properties.”
That is right: companies like DigiMogul and Alphabird are getting advertisers to pay to leave an impression on viewers who may or may not exist. The problem is that the lack of actual humans visiting and working on these sites produces pretty lousy search results. Yet with bots driving up traffic, big names like BMW, Pillsbury, and JetBlue are clamoring to throw money at these companies in an effort to reach “consumers.”
Sounds a little backward to us.
Leslie Radcliff, April 3, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Soutron and EBSCO Join Forces
April 1, 2013
Could the library be a gold mine just waiting to be tapped for its financial resources? The Examiner article “Soutron and EBSCO Enter Partnership Agreement” covers the technology partnership that Soutron Global and EBSCO forged. With this new partnership, Soutron Global will begin to integrate EBSCO Discovery Service with Soutron’s Library and Knowledge Management system. The collaboration will provide clients with a single integrated search environment for research and information resources. Tony Saadat, President and CEO of Soutron Global, made the following statement.
“This partnership means that libraries, knowledge management centers, and information resource portals can ensure optimal access to knowledge assets, physical resources, and digital resources, thus ensuring optimal exploitation of resources.”
EBSCO Publishing is the company behind EBSCOhost, a fee-based online research service. A variety of libraries, including educational, medical, and public institutions, use EBSCO services. EBSCO Discovery Service (EDS) is pitched as providing deeper indexing and full-text searching than competing discovery services. Graham Beastall, UK Managing Director, had the following to say regarding the collaboration.
“Soutron is very excited to be working with EBSCO on what we regard as a key initiative to develop access to digital and physical resources in an organization. It will allow us to offer customers using Soutron additional opportunities to maximize use of their collection through EDS single search indexing technologies. Our goal is to make life easier for end users and for library managers.”
I never really thought of library catalogs as a path to financial security, but could they be the next technology gold mine? Looking at the big picture, I think the answer is no. Most libraries already work on a limited budget, and it is unlikely that they will suddenly get additional funds. With its proven technology, EBSCO should focus on acquiring library cataloging and services companies for an extra boost. “Might as well be all or nothing.”
April Holmes, April 01, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Promise Best Practices: Encouraging Theoretical Innovation in Search
March 29, 2013
The photo below shows the goodies I got for giving my talk at Cebit in March 2013. I was hoping for a fat honorarium, expenses, and a dinner. I got a blue bag, a pen, a notepad, a 3.72 gigabyte thumb drive, and numerous long walks. The questionable hotel in which I stayed had no shuttle. Hitchhiking looked quite dangerous. Taxis were as rare as an educated person in Harrod’s Creek, and I was in the same city as Leibniz Universität. Despite my precarious health, I hoofed it to the venue, which was eerily deserted. I think only 40 percent of the available space was used by Cebit this year. The hall in which I found myself reminded me of an abandoned subway stop in Manhattan with fewer signs.
The PPromise goodies. Stuffed in my bag were hard copies of various PPromise documents. The bulkiest of these in terms of paper were also on the 3.72 gigabyte thumb drive. Redundancy is a virtue, I think.
Finally on March 23, 2013, I got around to snapping the photo of the freebies from the PPromise session and reading a monograph with this moniker:
Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation. FP7 ICT 20094.3, Intelligent Information Management. Deliverable 2.3 Best Practices Report.
The acronym should be “PPromise,” not “Promise.” The double “P” makes searching for the group’s information much easier in my opinion.
If one takes the first letter of “Promise Participative Research Laboratory for Multimedia and Multilingual Information Systems Evaluation” one gets PPromise. I suppose the single “P” was an editorial decision. I personally like “PP” but I live in a rural backwater where my neighbors shoot squirrels with automatic weapons and some folks manufacture and drink moonshine. Some people in other places shoot knowledge blanks and talk about moonshine. That’s what makes search experts and their analyses so darned interesting.
To point out the vagaries of information retrieval, my search for a publicly accessible version of the PPromise document returned a somewhat surprising result.
A couple more queries did the trick. You can get a copy of the document without the blue bag, the pen, the notepad, the 3.72 gigabyte thumb drive, and the long walk at http://www.promise-noe.eu/documents/10156/086010bb-0d3f-46ef-946f-f0bbeef305e8.
So what’s in the Best Practices Report? Straightaway you might not know that the focus of the whole PPromise project is search and retrieval. Indexing, anyone?
Let me explain what PPromise is or was, dive into the best practices report, and then wrap up with some observations about governments in general and enterprise search in particular.
Search Evaluation in the Wild
March 26, 2013
If you are struggling with search, you may be calling your search engine optimization advisor. I responded to a query from an SEO expert who needed information about enterprise search. His clients, as I understood the question, were seeking guidance from a person with expertise in spoofing the indexing and relevance algorithms used by public Web search vendors. (The discussion appeared in the Search-Based Applications (SBA) and Enterprise Search group on LinkedIn. Note that you may need to be a member of LinkedIn to view the archived discussion.)
The whole notion of turning search into marketing has interested me for a number of years. Our modern technology environment creates a need for faux information. The idea, as Jacques Ellul pointed out in Propaganda, is that modern man needs something to fill a void.
How can search deliver easy, comfortable, and good enough results? Easy. Don’t let the user formulate a query. A happy quack to Resistance Quotes.
It therefore makes perfect sense that a customer who is buying relevance on a page of free Web results would expect an SEO expert to provide similar functionality for enterprise search. Not surprisingly, the notion of controlling search results via an externality like keyword stuffing or content flooding is a logical way to approach enterprise search.
Precision, recall, hard metrics about indexing time, and the other impedimenta of the traditional information retrieval expert are secondary to results. Like metrics about Web traffic, a number is better than no number; if the number’s flaws are not understood, it is still better than nothing. In fact, the entire approach to search as marketing is based on results which are good enough. One can see the consequences of this thinking when one runs a query on Bing or on systems which permit users’ comments to influence relevancy. Vivisimo activated this type of value adding years ago, and it remains a good example of trying to make search useful. The laundry list of results which forces the user to work through the documents and determine what is useful is gone. If a document has internal votes of excellence, that document is the “right” one. Instead of precision and recall, modern systems deliver “good enough” results. The user sees one top hit and assumes the system has made more informed decisions.
There is a downside to the good enough approach to search, which delivers a concrete result that, like Web traffic statistics, looks so solid, so meaningful. That downside is that the user consumes information which may not be accurate, germane, or timely. In the quest for better search, good enough trumps the mentally exhausting methods of the traditional precision and recall crowd.
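For readers who have not wrestled with the traditional crowd’s impedimenta, the two core metrics are easy to state. Given a list of retrieved documents and a set of human relevance judgments (both invented for this sketch), precision asks what fraction of the returned items were useful, and recall asks what fraction of the useful items were returned.

```python
# Minimal illustration of precision and recall. The document IDs and the
# relevance judgments are hypothetical; real evaluations use large,
# hand-labeled test collections.

def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|; recall = hits / |relevant|,
    where hits counts retrieved documents judged relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d2", "d3", "d4"]   # what the engine returned
relevant = {"d1", "d3", "d7"}          # what a human judged useful

p, r = precision_recall(retrieved, relevant)
# p = 2/4 = 0.5, r = 2/3
```

A single top hit with community upvotes tells the user nothing about recall, which is exactly the information the “good enough” approach quietly discards.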
To get a better feel for the implications of this “good enough” line of thinking, you may find useful the September 2012 “deliverable” from Promise (whose acronym, in my opinion, should be spelled PPromise), “Tutorial on Evaluation in the Wild.” The abstract for the document does not emphasize the “good enough” angle, stating:
The methodology estimates the user perception based on a wide range of criteria that cover four categories, namely indexing, document matching, the quality of the search results and the user interface of the system. The criteria are established best practices in the information retrieval domain as well as advancements for user search experience. For each criterion a test script has been defined that contains step-by-step instructions, a scoring schema and adaptations for the three PROMISE use case domains.
The idea is that by running what strike me as subjective data collection from users of systems, an organization can gain insight into the search system’s “performance” and “all aspects of his or her behavior.” (The “all” is a bit problematic to me.)
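The mechanics behind the abstract’s methodology reduce to a simple aggregation: each of the four categories holds scored criteria, and per-category averages summarize the “user perception.” The category names below follow the quoted abstract; the criterion scores are invented, which is precisely the subjectivity I am pointing at.

```python
# Sketch of criterion-based scoring in the PROMISE style: criteria are
# scored against a schema and rolled up per category. The numeric scores
# here are made up; in practice they come from evaluators running the
# step-by-step test scripts.

CRITERION_SCORES = {
    "indexing": [4, 5, 3],
    "document matching": [2, 3],
    "search result quality": [3, 4, 4],
    "user interface": [5, 4],
}

def category_averages(scores):
    """Average the criterion scores within each category."""
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}

report = category_averages(CRITERION_SCORES)
```

The tidy averages look objective, but every input number is a human judgment, which is why claims about capturing “all” of a user’s behavior deserve skepticism.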
Taming Unstructured Information
March 25, 2013
Right now, as you read this, your company’s data are piling up. Scarier yet, most don’t have a way to structure all that precious information, so it goes to waste. Thankfully, clarity is on the way as we found in a recent Paradigma Labs story, “Unstructured Information Extraction: A Sample Case with a Unitex-Manager.”
The article lays out the problem:
There is a lot of information in today’s companies flowing from one computer to another like e-mails, documents, many kinds of files and, of course, the webs the employees surf through. These electronic documents probably contain part of the core knowledge of the company or, at least, very useful information which besides of being easily readable by humans is unstructured and impossible to be processes automatically using computers. The amount of unstructured information in enterprises is around 80% [1] to 85% [2] nowadays, and such a situation is a disadvantage…
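Turning that flood of free text into something a computer can process is, at its simplest, a matter of pulling structured fields out of prose. The Paradigma Labs piece uses Unitex grammars for this; the regular-expression sketch below is a deliberately simplified stand-in, and the sample document and extracted fields are invented.

```python
# Minimal sketch of structuring unstructured text: extract email addresses
# and ISO-format dates from free-form company documents. A real pipeline
# (e.g., one built on Unitex grammars) handles far richer patterns; this
# regex version only illustrates the idea, and the sample text is made up.
import re

SAMPLE = """Meeting moved to 2013-03-25. Contact jane.doe@example.com
or the ops list at ops@example.org for the revised agenda."""

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

def extract(text):
    """Return a structured record pulled from raw text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

record = extract(SAMPLE)
```

Once fields like these are in a structured record, the 80 to 85 percent of enterprise content the quote describes stops being opaque to downstream analytics.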
This has been an elephant in the room for many preparing to start squeezing help from their data. Unstructured data can derail good intentions by being impossible to sort out. Thankfully, there are companies with experience in structuring the unstructured and then forming useful analytic insights from this information. One of our favorites is the international firm Sinequa, which boasts an incredible two-plus decades in the business.
Patrick Roland, March 25, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search.
Government Initiatives and Search: A Make-Work Project or Innovation Driver?
March 25, 2013
I don’t want to pick on government funding of research into search and retrieval. My goodness, questioning the payoffs from government-funded research into information retrieval would bring down the wrath of the Greek gods. Canada, the European Community, the US government, Japan, and dozens of other nation states have poured funds into search.
In the US, a look at the projects underway at the Center for Intelligent Information Retrieval reveals a wide range of investigations. Three of the projects have National Science Foundation support: Connecting the ephemeral and archival information networks, Transforming long queries, and Mining a million scanned books. These are interesting topics and the activity is paralleled in other agencies and in other countries.
Is fundamental research into search high-level busy work? Researchers are busy, but the results are not having a significant impact on most users, who struggle with modern systems’ usability, relevance, and accuracy.
In 2007 I read “Meeting of the MINDS: An Information Retrieval Research Agenda.” The report was sponsored by various US government agencies. The points made in the report, like the University of Massachusetts’ current research run-down, were excellent. The 2007 influences remain timely six years later. The questions about commercial search engines, if anything, are unanswered. The challenges of heterogeneous data also remain. The section on information analysis and organization, which today is associated with analytics and visualization-centric systems, could be reprinted with virtually no changes. I cite one example, now 72 months young, for your consideration:
We believe the next generation of IR systems will have to provide specific tools for information transformation and user-information manipulation. Tools for information transformation in real time in response to a query will include, for example, (a) clustering of documents or document passages to identify both an information group and also the document or set of passages that is representative of the group; (b) linking retrieved items in timelines that reflect the precedence or pseudo-causal relations among related items; (c) highlighting the implicit social networks among the entities (individuals) in retrieved material;
and (d) summarizing and arranging the responses in useful rhetorical presentations, such as giving the gist of the “for” vs. the “against” arguments in a set of responses on the question of whether surgery is recommended for very early-stage breast cancer. Tools for information manipulation will include, for example, interfaces that help a person visualize and explore the information that is thematically related to the query. In general, the system will have to support the user both actively, as when the user designates a specific information transformation (e.g., an arrangement of data along a timeline), and also passively, as when the system recognizes that the user is engaged in a particular task (e.g., writing a report on a competing business). The selection of information to retrieve, the organization of results, and how the results are displayed to the user all are part of the new model of relevance.
In Europe, there are similar programs. Examples range from Europa’s sprawling ambitions to Future Internet activities. There is Promise. There are data forums, health competence initiatives, and “impact”. See, for example, Impact. I documented Japan’s activities in the 1990s in my monograph Investing in an Information Infrastructure, which is now out of print. A quick look at Japan’s economic situation and its role in search and retrieval reveals that modest progress has been made.
Stepping back, the larger question is, “What has been the direct benefit of these government initiatives in search and retrieval?”
On one hand, a number of projects and companies have been kept afloat due to the funds injected into them. In-Q-Tel has supported dozens of commercial enterprises, and most of them remain somewhat narrowly focused solution providers. Their work has been suggestive, but none has achieved the breathtaking heights of Facebook or Twitter. (Search is a tiny part of these two firms, of course, but the government funding has not had a comparable winner in my opinion.) The benefit has been employment, publications like the one cited above, and opportunities for researchers to work in a community.
On the other hand, the fungible benefits have been modest. As the economic situation in the US, Europe, and Japan has worsened, search has not kept pace. The success story is Google, which has used search to sell advertising. I suppose that is an innovation, but it is not one which resulted from government funding. The Autonomy, Endeca, Fast Search-type of payoff has been surprising. Money has been made by individuals, but the technology has created a number of waves. The Hewlett Packard Autonomy dust-up is an example. Endeca is a unit of Oracle and is becoming more of a utility than a technology game changer. Fast Search has largely contracted and has, like Endeca, become a component.
Some observations are warranted.
First, search and retrieval is a subject of intense interest. However, progress in information retrieval is advancing only slowly in my opinion. I think there are fundamental issues which researchers have not been able to resolve. If anything, search is more complicated today than it was when the MINDS agenda cited above was published. The question is, “Is search perhaps more difficult than finding the Higgs boson?” If so, more funding for search and retrieval investigations is needed. The problem is that the US, Europe, and Japan are operating at a deficit. Priorities must come into play.
Second, the narrow focus of research, while useful, may generate insights which affect the margins of larger information retrieval questions. For example, modern systems can be spoofed. Modern systems generate strong user antipathy more than half the time because they are too hard to use or don’t answer the user’s question. The problem is that the systems output information which is quite likely incorrect or not useful. Search may contribute to poor decisions, not improve decisions. The notion that one is better off using more traditional methods of research is something not discussed by some of the professionals engaged in inventing, studying, or selling search technology.
Third, search has fragmented into a mind boggling number of disciplines and sub-disciplines. Examples range from Coveo (a company which has ingested millions in venture funding and support from the province of Québec) which is sometimes a customer support system and sometimes a search system to Palantir (a recipient of venture funding and US government funding) which outputs charts and graphs, relegating search to a utility function.
Net net: I am not advocating the position that search is unimportant. Information retrieval is very important. One cannot perform some work today unless one can locate a specific digital item in many cases.
The point is that money is being spent, energies invested, and initiatives launched without accountability. When programs go off the rails, these programs need to be redirected or, in some cases, terminated.
What’s going on is that information about search produced in 2007 is as fresh today as it was 72 months ago. That’s not a sign of progress. That’s a sign that very little progress is evident. The government initiatives have benefits in terms of making jobs and funding some start ups. I am not sure that the benefits affect a broader base of people.
With deficit financing the new normal, I think accountability is needed. Do we need some conferences? Do we need giveaways like pens and bags? Do we need academic research projects running without oversight? Do we need to fund initiatives which generate Hollywood type outputs? Do we need more search systems which cannot detect semantically shaped or incorrect outputs?
Time for change is upon us.
Stephen E Arnold, March 25, 2013
It Is Movie Search Time
March 25, 2013
Google, Bing, and DuckDuckGo are the primary search engines users turn to for locating information. One of the problems, even with advanced search options, is sifting through the search results. Any search expert will tell you that if the desired information is not on the first or second page of results, users move on. Does this call for specialization in search engines? It just might for a subject as all-encompassing as movies. MoreFlicks searches the popular video streaming Web sites Hulu, Netflix, Vudu, Fox, Crackle, and BBC iPlayer for movies and TV shows.
It takes a page out of Google’s book by displaying basic facts about a movie or show: summary, genre, and release date, along with where it can be viewed online. Search results can be sorted by genre, most popular, new arrivals, and what is soon expiring. It will come in handy when you are searching for an obscure title. One downside is that it only browses legal channels; YouTube has been given the boot from these results. MoreFlicks is a niche search engine, possibly the lovechild of Google and IMDB, but how long it stays around depends on content relevance, or until Google snaps it up. Zeus eating Athena, anyone?
Whitney Grace, March 25, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search