Facebook, Search, and the Real World

February 16, 2013

I think there is or was a television program about the “real world.” I am hazy on this, but I perceive “reality television” as a semi scripted, low cost way to fill the gaping maw of 24×7 programming at a bargain basement price. In fact, when anyone suggests that something is “real” I take a second look. This applies to “real stories”, “real life examples”, and “real consulting insights”. In today’s world, the notion of “real” is slippery. I think of Plato, Hollywood special effects, and marketing baloney.

I read “You’re Not Gonna Like It: Facebook’s New Search Struggles with the Real World.” The title caught my attention because of its use of the familiar “you,” the word “gonna”, the inclusion of “search”, and the phrase “real world.” In a horse race there is a big payday from picking win, place, and show. Here the headline snags the top four spots in the social media World Cup.

The article points out that some of the features of Facebook search need to be rethought. That’s a fair statement. The product is a beta and represents the first somewhat edible fruit of the marriage between the Facebook crowd, the injected Googlers, and the post IPO attention of the kind and loving stakeholders.

Facebook has to produce revenues, keep its costs under control, and cope with a number of darned exciting issues. These include the mandatory registration Google has slapped on Google Plus and the awareness by some Facebookers that there may be something else to do with the time invested in posting information about one’s comings and goings.

Here’s the passage I noted:

Facebook launched Graph Search at a big press event at its Menlo Park, CA headquarters almost exactly one month ago. CEO Mark Zuckerberg delivered a large part of the event keynote himself, highlighting the feature as one of three “pillars of Facebook” alongside the News Feed and Timeline. Graph Search is supposed to help you gather friends for a Twin Peaks marathon, find photos taken in London on your last trip, and see which sushi places are most popular among your friends. After a month of testing Graph Search, I’ve found that it’s fantastic at finding people and photos, but not so good at finding anything else.

Is any search system able to do more than one or two things well? Google does the ad thing. Lexis does the legal laundry list thing. Chemical Abstracts does the structure thing. Sure, these systems purport to provide more functions than a bucket of Swiss Army knives.

But the reality of search and information retrieval is that each system has a strength. Each system has gaps, blind spots, and stuff that just does not work as the users expect.

The write up identifies some of Facebook’s notable gaps; for example, dirty data. Don’t most Facebook users perceive content in Facebook as accurate?

Net net: Facebook social search is a beta. What changes are coming? Wait and see.

Stephen E Arnold, February 16, 2013

Solr Unleashed Offered by LucidWorks

February 15, 2013

LucidWorks is a company offering commercial support, consulting, training, and value-added software to the open source Apache Lucene and Solr technologies. LucidWorks not only builds upon trusted open source technologies, it supports open source technology by employing committers. They also offer professional training on the open source components, even for those who are not interested in their LucidWorks Search or LucidWorks Big Data solutions. One such training opportunity is Solr Unleashed.

Read about upcoming classes:

“Having consulted with clients on Lucene and Solr for the better part of a decade, we’ve seen the same mistakes made over and over again: applications built on shaky foundations, stretched to the breaking point. In this two day class, learn from the experts about how to do it right and make sure your apps are rock solid, scalable, and produce relevant results. Also check the course outline.”

Register early for a discount on the two-day class. Opportunities are available stateside, as well as in Europe. Developers are the primary audience for the sessions, but system administrators can benefit as well. For more opportunities and to stay in the loop, contact the LucidWorks University team.

Emily Rae Aldridge, February 15, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Sinequa France: Update 2013

February 14, 2013

My research team was winnowing our archive of information about European search vendors. Since Martin White’s article for eContent in 2011, a number of changes have swept through the search and content processing sector. Some changes were significant; for example, HP’s stunning acquisition of Autonomy. Others were more modest; for example, the steady progress of such companies as Sinequa and Spotter, among others.

The European technical grip on search is getting stronger. Google is the dominant player in Web search. But in enterprise content processing, some European firms are moving more rapidly than their North American or Pacific Rim counterparts.

The Sinequa tag cloud. See http://www.sinequa.com/en/page/solutions/category-1.aspx

One interesting example is Sinequa, based in Paris. The company, like other French technology firms, has a staff of capable engineers and managers. However, unlike some other companies, Sinequa has continued to establish a track record as a company innovating in technology and capturing some important accounts; for example, Siemens, the German industrial powerhouse.

Sinequa’s approach is to emphasize that enterprise search has moved to unified information access. A number of companies make similar claims. Sinequa has established that its technology can deliver the type of one-stop access to structured and unstructured content that almost every vendor claims to deliver. You can get a useful overview of the architecture of the Sinequa platform at http://www.sinequa.com/en/page/product/product.aspx.

A relatively recent addition to the Sinequa.com Web site is a set of case analysis videos. I find case examples extremely useful. The presentation of this type of information in rich media format makes it easier for me to get a sense of the value of the solution a vendor delivers. I found the Mercer video particularly interesting. You can find these testimonials at http://www.sinequa.com/en/page/clients/clients-video.aspx.

The trajectory of European search, content processing, and analytics vendors is difficult to plot in today’s uncertain economic climate. Sinequa warrants a close look for organizations seeking an integrated approach to their content assets. For more information about Sinequa’s current activities, tap into the firm’s blog at http://blog.sinequa.com/

Stephen E Arnold, February 14, 2013

Sponsored by EMRxNow, the information service which tracks automated indexing of electronic medical records

Behind the Scenes at DuckDuckGo

February 14, 2013

High Scalability gives us an in-depth look at the burgeoning DuckDuckGo derived from an interview with the site’s founder, Gabriel Weinberg, in “DuckDuckGo Architecture—1 Million Deep Searches a Day and Growing.” Writer Todd Hoff notes that the Duck is proudly famous for (or famously proud of) refusing to collect data on their users. Though it is understandable that Weinberg emphasizes that popular stance, Hoff is more interested in the mechanics behind the service. He writes:

“What I found most compelling is DDG’s strong vision of a crowdsourced network of plugins giving broader search coverage by tying an army of vertical data suppliers into their search framework. For example, there’s a specialized Lego plugin for searching against a complete Lego database. Use the name of a spice in your search query, for example, and DDG will recognize it and may trigger a deeper search against a highly tuned recipe database. Many different plugins can be triggered on each search and it’s all handled in real-time.

“Can’t searching the Open Web provide all this data? No really. This is structured data with semantics. Not an HTML page. You need a search engine that’s capable of categorizing, mapping, merging, filtering, prioritizing, searching, formatting, and disambiguating richer data sets and you can’t do that with a keyword search.”

He’s right. I do turn to DuckDuckGo for such deep searches, but I often go back to Google if I need a broader one. It is good to have a variety of tools. All else being equal, I do prefer the Duck’s privacy policy.

That bragging point, however, comes at a cost. Like other Web search engines, DuckDuckGo is ad-supported, but their key policy makes it impossible to take advantage of the most lucrative source of revenue—the targeted ad. Our view is that 2013 is about revenue, not about bits and bytes, or about popularity. We hope our fellow water-fowl makes it through okay.

Do check out Hoff’s article if you are interested in the mechanics behind DuckDuckGo. It is chock-full of detailed information.
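For readers who want a feel for the plugin idea Hoff describes, here is a minimal sketch in Python. It is not DuckDuckGo's code: the trigger word lists, the plugin names, and the `triggered_plugins` and `run_query` helpers are all hypothetical, invented only to show how a query term such as a spice name could fire a deeper, specialized search while other plugins stay quiet.

```python
import re

# Hypothetical trigger vocabulary; the real plugin registry is not described in the post.
SPICES = {"cumin", "paprika", "saffron"}


def recipe_plugin(query):
    # Stand-in for a deeper search against a tuned recipe database.
    return f"recipe results for '{query}'"


def lego_plugin(query):
    # Stand-in for a search against a Lego database.
    return f"Lego database results for '{query}'"


# Each entry pairs a plugin name with a trigger test and a handler.
PLUGINS = [
    ("recipes", lambda words: bool(words & SPICES), recipe_plugin),
    ("lego", lambda words: "lego" in words, lego_plugin),
]


def query_words(query):
    # Lowercase the query and split it into alphabetic words.
    return set(re.findall(r"[a-z]+", query.lower()))


def triggered_plugins(query):
    # Names of every plugin whose trigger matches the query, in registry order.
    words = query_words(query)
    return [name for name, trigger, _ in PLUGINS if trigger(words)]


def run_query(query):
    # Run each triggered plugin and collect its answer by plugin name.
    words = query_words(query)
    return {name: handler(query) for name, trigger, handler in PLUGINS if trigger(words)}
```

Several plugins can fire on one query, which matches the article's point that many plugins may be triggered on each search; a query with no trigger words falls through to ordinary results.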

Cynthia Murrell, February 14, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

SOLR Relevancy Tuning from Search Technologies

February 11, 2013

Search Technologies introduced “Solr Lucene Relevancy Tuning.” Search Technologies will supply services to improve the relevancy of results within an existing Solr/Lucene implementation. If the service works as advertised, this could be a boon to many organizations awash with extraneous data. The announcement explains:

“This engagement will provide powerful relevancy ranking improvements in an existing Solr installation. This includes setting up a basic system for relevancy evaluation, based on a set of sample queries, so that improvements can be quantitatively measured. Additions to the default relevancy formula in Solr Lucene can dramatically improve search results, solving many of the most thorny relevancy problems including:

  • Reducing the impact of peripheral content (sidebars, ads, tangential discussions, etc.)
  • Automatically handling word phrases in a flexible manner, reducing the need to use complex query constructions to obtain good search results.”
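The announcement does not say how the relevancy evaluation is built. As one common way to measure improvement over a set of sample queries, here is a minimal precision-at-k sketch in Python. The function names, the query labels, and the document IDs are assumptions made up for illustration; they are not taken from Search Technologies or from Solr.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    # Fraction of the top k results that were judged relevant.
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision(run, judgments, k=10):
    # Average precision-at-k over every judged sample query.
    scores = [precision_at_k(run.get(q, []), rel, k) for q, rel in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sample queries with hand-judged relevant document IDs.
judgments = {"q1": {"a", "b"}, "q2": {"x"}}

# Hypothetical rankings before and after a tuning change.
baseline = {"q1": ["c", "a", "d", "b"], "q2": ["y", "z", "x", "w"]}
tuned = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z", "w"]}
```

Comparing `mean_precision(baseline, judgments, k=2)` with `mean_precision(tuned, judgments, k=2)` gives a single number per configuration, which is the kind of quantitative before-and-after check the announcement alludes to.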

The Search Technologies solution changes the default Solr/Lucene functionality, which can overemphasize document size and term frequency. Search Technologies’ new Parameterized Document Similarity Function provides more control over these formulas through configurable parameters. The company’s Gradient Proximity Boost operator eliminates the need to tweak Solr/Lucene’s default “hard window,” the term-proximity parameters which can trigger a document boost. The method does this by measuring the density and completeness of terms across each document, gradually boosting documents in which terms cluster.
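The difference between a “hard window” and a gradual boost is easier to see in a toy example. The Python sketch below is not the Gradient Proximity Boost operator, whose formula is not given in the announcement; the decay curve, the parameter names, and the default values are assumptions chosen only to contrast an all-or-nothing cutoff with a boost that fades as terms move apart.

```python
def min_distance(tokens, term_a, term_b):
    # Smallest gap, in token positions, between any occurrence of the two terms.
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def hard_window_boost(distance, window=5, boost=2.0):
    # All-or-nothing: full boost inside the window, none outside it.
    if distance is None:
        return 1.0
    return boost if distance <= window else 1.0


def gradual_boost(distance, max_boost=2.0, scale=5.0):
    # Assumed decay curve: full boost for adjacent terms, fading toward 1.0.
    if distance is None:
        return 1.0
    return 1.0 + (max_boost - 1.0) / (1.0 + max(distance - 1, 0) / scale)
```

With these assumed defaults, a pair of terms six positions apart gets no boost from the hard window but still gets a partial boost from the gradual version, so there is no single cutoff value left to tune.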

The post identifies the expected engagement tasks and deliverables associated with this software. The only prerequisite listed is the presence of a working Solr/Lucene system with already-indexed documents. The firm promises ongoing maintenance and support services, including an optional round-the-clock support package.

Founded in 2005, Search Technologies bills itself as the largest (independent) IT services company dedicated to search-engine implementation, consulting, and managed services. Staffed with veterans of the search field, the company prides itself on innovation. Search Technologies is headquartered in Herndon, Virginia, and maintains two other U.S. offices as well as locations in Berkshire, U.K., and San Jose, Costa Rica.

Ken Toth, February 11, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Yahoo Back On Search

February 11, 2013

Before Google came into the spotlight, Yahoo used to have a series of commercials where its subjects were put in hilarious situations they wanted to get out of. By using Yahoo search, they were able to find a solution. At the end of every commercial a yodeler yodeled “Ya-ho-oo!” Everybody was “yahooing” and everyone thought Yahoo was number one. They were wrong. Computerworld reports that Yahoo wants to snatch the crown in “Yahoo To Focus On Search—And Google.”

Marissa Mayer, the Yahoo CEO, plans on taking on Google in Internet search. She became the CEO after a successful career at Google, but Yahoo pulled her in to save its floundering tail. Mayer, more than anyone else, knows what it means to take on the search giant. Yahoo needs to do something very new and very bold to have the smallest glimmer of hope in competing. Mayer will focus on building technology to improve search results and to extend the reach to desktop/mobile device users.

“’There’s a lot more potential here,’ Mayer said. ‘Overall, search is a key area of investment for us. All the innovations in search are going to happen at the user interface level going forward. We need to invest in those features, both for desktop and mobile [devices]. I think both ultimately will be key plays for us.’”

The new strategy does not call for the end of the Yahoo/Microsoft partnership; Mayer instead hopes Bing will help Yahoo. In 2010, Yahoo ditched its own search engine for Bing. In order to even make a dent in the market, Yahoo needs to grasp onto something that Google misses. Yahoo stinks and needs help. A former Googler is pulled in to help. Talk about knowing thy enemy.

Whitney Grace, February 11, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

From Jeopardy to Cancer Treatment: An IBM Story

February 10, 2013

I read “IBM Supercomputer Watson to Help in Cancer Treatment.” I am burned out on the assertions of search, content processing, and analytics vendors. The algorithms predict, deliver actionable information, and answer tough questions. Okay, I will just believe these statements. Most of the folks with whom I interact either believe these statements or do not really care.

Watson, as you may know, takes open source goodness, layers on a knowledge base, and wraps the confection in layers of smart software. I am simplifying, but the reality is irrelevant given the marketing need.

Here’s the passage I noted:

A year ago, a team at Memorial Sloan-Kettering started working with an IBM and a WellPoint team to train Watson to help doctors choose therapies for breast and lung cancer patients. They continue to share their knowledge and expertise in oncology and information technology, beginning with hundreds of lung cancers, the aim being to help Watson learn as much as possible about cancer care and how oncologists use medical data, as well as their experiences in personalized cancer therapies. During this period, doctors and technology experts have spent thousands of hours helping Watson learn how to process, analyze and interpret the meaning of sophisticated clinical data using natural language processing; the aim being to achieve better health care quality and efficiency.

There you go. For the dozens of companies working to create next generation information retrieval systems which are affordable, actually work, and can be deployed without legions of engineers—game over. IBM Watson has won the search battle. Now for the optimists who continue to pump money into decade old search companies which have modest revenue growth, kiss those bucks goodbye. For the PhD students working on the revolutionary system which promises to transform findability, get a job at Kentucky Fried Chicken. And Google? Well, IBM knows your limits so stick to selling ads.

IBM is doing it all:

Manoj Saxena, IBM General Manager, Watson Solutions, said:

“IBM’s work with WellPoint and Memorial Sloan-Kettering Cancer Center represents a landmark collaboration in how technology and evidence based medicine can transform the way in which health care is practiced. breakthrough capabilities bring forward the first in a series of Watson-based technologies, which exemplifies the value of applying big data and analytics and cognitive computing to tackle the industry’s most pressing challenges.”

How different is Watson from the HP Autonomy, Recommind, or even the DR LINK technology? Well, maybe the open source angle is the same. But IBM needs to do more than make assertions and buy analytics companies as the company recycles open source technology, in my opinion. I thought IBM was a consulting firm? Here I am wrong again. Watson probably “knew” that after hours of training, tuning, and talking. But in the back of my mind, I ask, “What if those training data are inapplicable to the problem at hand? What if the journal articles are fiddled by tenure seekers or even pharmaceutical outfits or institutions trying to maximize insurance payouts or careless record keeping by medical staff?” Nah, irrelevant questions. IBM has this smart system nailed. Search solved. What’s next, IBM?

Stephen E Arnold, February 10, 2013

Google: Objective Indexing and a Possible Weak Spot

February 6, 2013

A reader sent me a link to “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy, and Tempting.” I am not sure how easy and tempting the process of getting a fake scholarly paper into the Google index is, but the information provided is food for thought. Worth a look, particularly if you are a fan of traditional methods for building a corpus and delivering on point results which the researcher can trust. The notion of “ethics” is an interesting addition to a paper which focuses on fake or misleading research.

Stephen E Arnold, February 7, 2013

Independent News in Eastern Europe Bolstered by Solr

February 6, 2013

Independent news agencies have a hard time escaping the tight grasp of government in restricted countries. In the nation of Georgia, the non-profit Sourcefabric has developed Newscoop based on open source software. Open source is not only contributing to profitable business, but to political and ideological freedom. CMS Wire has a full story in, “Newscoop CMS 4.1 Integrates Solr for Search, GeoLocation Tools.”

But our interest here is in the technology, and how Newscoop has been boosted by the power of Solr. The article states:

“The enhanced search functionality in 4.1 is made possible by an integration with Solr, an open source search project out of the Apache Lucene effort, and it is designed to facilitate the ability of site visitors to find relevant content on the news site or in connected blogs. Solr features full-text search, hit highlighting, database integration, auto-suggestion and advanced ranking.”

Many software and enterprise solutions also find their strength in the solid base of Apache Lucene and Solr, the two most trusted names in the Apache open source community. One such solution is LucidWorks. While LucidWorks’ ultimate aim is in efficient enterprise search, its commonality with Newscoop is its sturdy and reliable infrastructure.

Emily Rae Aldridge, February 6, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Repercussions of Facebook Graph Search

February 6, 2013

As with the arrival of most new things, no one is quite sure what the results of Facebook’s venture into search will be. Forbes investigates the possibilities in, “Facebook Graph Search is a Disruptive Minefield of Unintended Consequences.” It is good to see we are not the only ones who think this development could shake up the search terrain.

Journalist Anthony Wing Kosner begins by noting that Graph Search is not something users have requested, but rather a marketing initiative. For the feature to work, users will have to help by continuing to populate Facebook with data in the form of likes, check-ins, photos, and profile info. Somehow, I don’t think that’s a big hurdle, even if some users do get spooked by the very real search-related privacy concerns. More tricky, perhaps, is convincing users they want to narrow their searches from the World Wide Web to their own Facebook network.

Kosner writes:

“I think Graph Search is indeed important, but the results of Facebook’s search for increased relevance may be both more and less than it intends. Its users may find the utility of searching their own social graph to be hit-or-miss, but they also may find themselves feeling much more exposed in the searches of others than they ever intended to be. Rather than phrase this negatively, however, I want to try to identify the potentially explosive issues, land mines if you will, that Facebook will encounter in its path to build out its third pillar and suggest what it needs to do to avoid or diffuse them.”

Not surprisingly, the main suggestion is to make it easier for users to protect their privacy. The current process can be cumbersome, and not even a Zuckerberg can be certain the results will be as expected. With Graph Search in particular, the inability of algorithms to understand irony or a love of randomness, both hallmarks of today’s youth culture, can result in acute misrepresentation of someone’s views. Sometimes this could simply be amusing, but other times, it could cause real damage. And you might never know.

If you are concerned about these issues (and if you or someone you love uses Facebook, you should be), check out this detailed article. I suppose we will just have to wait and see where the chips fall, while helping spread the word—be careful out there.

Cynthia Murrell, February 06, 2013

Sponsored by ArnoldIT.com, developer of Augmentext
