Going Beyond Google for Searches
September 6, 2010
There are search engines specially tailored to provide specific information. We found an IBNlive photo gallery displaying “Top 10 search Engines beyond Google” that may not be compelling replacements for Google, but are quite interesting.
You may have heard about Answers.com, one of the most reliable Q&A websites, or Healthbase.com, which delivers results for health-related queries. There are more innovative search engines like WolframAlpha.com, which aims to make all systematic knowledge immediately computable and accessible to everyone, or the fashion search engine EMPORA.com, which brings together thousands of stores and brands. You can search for music on Guruji.com and track major BitTorrent websites on isoHunt.com. GazoPa performs remarkable ‘similar image search’, where you can draw instead of typing the words, and Pipl.com provides the most comprehensive people search on the web. Yummly.com is fascinating for people who love to cook and search for recipes, and Like.com boasts of being the “first true visual image search” platform, now acquired by Google.
Leena Singh, September 6, 2010
Freebie
Exclusive Podcast Interview: David Fishman
August 31, 2010
I did a follow-up interview with David Fishman, Lucid Imagination’s vice president of marketing. In this 10-minute discussion, the topic is the technical plumbing of Lucene/Solr. In the podcast, Mr. Fishman describes the free converter SolrCELL, the faceting capabilities of the Lucene/Solr system, and the Carrot-2 clustering software. If you want to know how Lucene/Solr can help meet your enterprise’s search-and-retrieval needs, you will want to navigate to ArnoldIT.com and listen to the podcast. If you want to learn more about Lucid Imagination, navigate to the firm’s Web site. The Lucene Revolution Conference will be held in about six weeks. More information is available at www.lucenerevolution.com.
Stuart Schram IV, August 31, 2010
Sponsored post.
YouTube Factoids
August 28, 2010
Short honk: YouTube.com factoids designed for easy consumption. No effort required. Point your browser thingy at “Things You Didn’t Know about YouTube.” Snarf such facts as YouTube.com loses money, that NigaHiga is its most popular channel, and Google uses rickrolls.
Stephen E Arnold, August 28, 2010
Freebie
Exclusive Interview: Satish Gannu, Cisco Systems Inc.
August 24, 2010
I made my way to San Jose, California, to find out about Cisco Systems and its rich media initiatives. Once I located Cisco Way, the company’s influence in the heart of Silicon Valley, I knew I would be able to connect with Satish Gannu, a director of engineering in Cisco’s Media Experience and Analytics Business Unit. Mr. Gannu leads the development team responsible for Cisco Pulse, a method for harnessing the collective expertise of an organization’s workforce. The idea is to apply next generation technology to the work place in order to make it quick and easy for employees to find the people and information they need to get their work done “in an instant.”
I had heard that Mr. Gannu is exploring the impact of video proliferation in the enterprise. Rich media require industrial-strength, smart network devices and software, both business sectors in which Cisco is one of the world’s leading vendors. I met with Mr. Gannu in Cisco’s Building 17 cafeteria (appropriate because Mr. Gannu has worked at Cisco for 17 years). Before tackling rich media, he served as Director of Engineering in Cisco’s Security Technology Group. I did some poking around with my Overflight intelligence system and picked up signals that he is responsible for media transcoding, a technology that can bring some vendors’ network devices to their knees. Cisco’s high performance systems handle rich media. Mr. Gannu spearheads Cisco’s search and speech-to-text activities. He is giving a spotlight presentation at the October 7-8, 2010, Lucene Revolution Conference in Boston, Massachusetts. The conference is sponsored by Lucid Imagination.
Satish Gannu, Director of Engineering, Cisco Systems Inc.
The full text of my interview with Mr. Gannu appears below:
Thanks for taking the time to talk with me.
No problem.
I think of Cisco as a vendor of sophisticated networking and infrastructure systems and software. Why is Cisco interested in search?
We set off to do the Pulse project in order to turn people’s communications into a mechanism for finding the right people in your company. For finding people, we asked how do people communicate what they know? People communicate what they know through documents: a web page, an email, a Word document, or a PDF, and now, video. Video is big for Cisco.
Videos are difficult to consume or even find. The question we wanted to answer was, “Could we build a business-savvy recommendation engine?” We wanted to develop a way to learn from user behavior and then recommend videos to people, not just in an organization but in other settings as well. We wanted to make videos more available for people to consume. Video is the next big thing in digital information, coming from YouTube to the enterprise world. In many ways, video represents a paradigm shift. Video content takes a lot of storage space. We think that video is also difficult to consume, difficult to find. In search, we’ve always worked from a document-based view. We are now expanding the idea of a document from text to rich media. We want to make video findable, browseable, and searchable. Obviously the network infrastructure must be up to the task. So rich media is a total indexing and search challenge.
Is there a publicly-accessible source of information about Cisco’s Pulse project?
Yes. I will email you the link and you may insert it in this interview. [Click here for the Pulse information.]
No problem. Are you using open source search technology at Cisco?
Yes, we believe a lot in the wisdom of the crowds. The idea that a community and some of the best minds can work together to develop and enhance search technology is appealing to us. We also like the principle that we should not invent something that is already available.
I know you acquired Jabber. Is it open source?
Yes, in late 2008 Cisco bought the company called Jabber. The engineers had developed a presence and messaging protocol and software. Cisco is also active in the Open Social Platform.
Would you briefly describe Open Social?
Sure. “Open Social” is a platform with a set of APIs developed by a community of social networking developers and vendors to structure and expose social data over the network, at opensocial.org. We’ve adopted Open Social to expose the social data interfaces in our product for use by our customers, leveraging both the standardization and the innovation of this process to make corporate data available within organizations in a predictable, easy-to-use platform.
Why are you interested in Lucene/Solr?
We talked to multiple companies, and we decided that Lucene and Solr were the best search options. As I said, we didn’t want to reinvent the wheel. We looked at available Lucene builds. We read the books. Then we started working with Lucid. Our hands-on testing actually validated the software. We learned how mature it is. The road map for things which are coming up was important to us.
What do you mean?
Well, we had some specific ideas in mind. For example, we wanted to do certain extensions on top of basic Lucene. With the road map, open source gives us an opportunity to do our own intellectual property on top of Lucene/Solr.
Like video?
Yes, but I don’t want to get into too much detail. Lucene for video search is different. With rich media sources we worry about how to transcribe it, and then we have to get into how the system can implement relevancy and things like that.
One assumption we made is that people speak at a rate of two to three words per second. So when we were doing tagging, we could calculate the length of the transcript and size of the document.
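As a rough illustration of that arithmetic, the sketch below estimates how many words a transcript might contain from a recording's running time. The function name and the choice of Python are assumptions made for illustration only; the interview gives the speaking rate but not Cisco's actual calculation.

```python
def estimate_transcript_words(duration_seconds, low_wps=2.0, high_wps=3.0):
    """Return a (min_words, max_words) estimate for a spoken recording.

    The default rates are the two to three words per second quoted in the
    interview; everything else here is a hypothetical illustration.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    min_words = int(duration_seconds * low_wps)
    max_words = int(duration_seconds * high_wps)
    return min_words, max_words


# A 10-minute video is 600 seconds long.
low, high = estimate_transcript_words(600)
print(f"Expected transcript length: {low} to {high} words")
```

For a 10-minute video, this gives a range of 1,200 to 1,800 words, which is the kind of figure a tagging step could use to size the resulting document.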
That’s helpful. What are the primary benefits of using Lucene/Solr?
One of our particular interests is figuring out how we can make it easy for people in an organization to find a person with expertise or information in a particular field. At Cisco, then, how our systems help users find people with specific expertise is core to our product.
So open source gives us the advantage of understanding what the software is doing. Then we can build on top of those capabilities. That’s how we determine which one to choose.
Does the Lucene/Solr community provide useful developments?
Yes, that’s the wisdom of the crowds. In fact, the community is one of the reasons open source is thriving. In my opinion, the community is a big positive for us. In our group, we use Open Social too. At Cisco, we are part of the enterprise Open Social consortium, and we play an active role in it. We also publish an open source API.
I encourage my team to be active participants in that and contribute. Many at Cisco are contributing certain extensions. We have added these on top of Open Social. We are giving our perspective to the community from our Pulse learnings. We are doing the same type of things for Lucene/Solr.
My view is that if useful open source code is out there, everyone can make the best utilization of it. And if a developer is using open source, there is the opportunity for making some enhancement on top of the existing code. It is possible to create your own intellectual property around open source too.
How has Lucid Imagination contributed to your success in working with Solr/Lucene?
We are not Lucene experts. We needed to know what is possible, what is not, and what the caveats are. The insight we got from consulting with Lucid Imagination helped open our eyes to the possibilities. That clinical knowledge is essential.
What have you learned about open source?
That’s a good question. Open source doesn’t always come for free. We need to keep that in mind. One can get open source software. Like other software, one needs to maintain it and keep it up to date.
Where does Lucid fit in?
Without Lucid, we would have to send an email to the community and wait for somebody to respond. Now I ping Lucid.
Can you give me an example?
Of course. If I have 20,000 users, I can have 100 million terms in one shard. If I need to scale this to 100,000 users and put up five shards, how do I handle these shards so that each is localized? What is the method for determining relevancy of hits in a result set? I get technical input from Lucid on these types of issues.
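The figures in that example can be worked through with some back-of-envelope arithmetic. The sketch below assumes that term counts grow linearly with users (about 5,000 terms per user, from 100 million terms across 20,000 users) and uses a hypothetical hash-based routing function to keep each user's data on a single shard. Neither assumption is described in the interview as Cisco's or Lucid's approach, and the function names are invented for illustration.

```python
import zlib

# Derived from the interview's figures: 100 million terms for 20,000 users.
TERMS_PER_USER = 100_000_000 // 20_000  # 5,000 terms per user (assumed linear)


def terms_per_shard(num_users, num_shards, terms_per_user=TERMS_PER_USER):
    """Estimate the terms each shard holds if users are spread evenly."""
    total_terms = num_users * terms_per_user
    # Ceiling division so a partly filled shard is still counted in full.
    return -(-total_terms // num_shards)


def shard_for_user(user_id, num_shards):
    """Hypothetical routing: map a user ID to one shard with a stable hash.

    zlib.crc32 is deterministic across runs, unlike Python's built-in hash()
    for strings, so a given user always lands on the same shard.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


print(terms_per_shard(20_000, 1))      # the single-shard starting point
print(terms_per_shard(100_000, 5))     # the scaled-out case
print(shard_for_user("user-42", 5))    # which shard a sample user maps to
```

Under these assumptions, 100,000 users spread over five shards works out to about 100 million terms per shard, the same load as the original single shard.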
When someone asks you why you don’t use a commercial search solution, what do you tell them?
I get this question a lot. In my opinion, the commercial search systems are often in a black box. We occasionally want to use this type of system. In fact, we do have a couple of other related products which use commercial search technologies.
But for us, analysis of context is the core. Context is what the search is about. And when we looked at the code, we realized that how we use this functionality is central to our work. How we find people is one example of what we need. We need an open system. For a central function, the code cannot be a black box. Open source meets our need.
Thank you. How can a reader contact you?
My email is sgannu at cisco dot com.
Stephen E Arnold, August 24, 2010
Sponsored post
Another Business Sector Resists the Google
August 20, 2010
When it attempts to get into a new industry, Google’s reputation precedes it by a long mile. Its latest conquest, the television market, is not going as smoothly as the search giant would like. A recent Broadband Reports article, “Google TV Running into Stubborn Broadcasters,” showcased the problems it is running into. The proposition seems logical, marrying the internet with broadcast and cable television to create a monolithic entertainment giant. But not everyone sees it that way. The LA Times put it best, stating: “The prospect of Google getting into television frightens many in Hollywood, who worry that Silicon Valley will upend the entertainment industry just like the Internet ravaged the music and newspaper industries.”
The world (big shock!) is not here for Google’s cherry picking. We’re going to keep an eye on this battle to see if the search behemoth backs down or just chops down the whole tree.
Pat Roland, August 20, 2010
Arnold / Oldham Podcast on Process Monitoring
August 2, 2010
Dr. Tyra Oldham, president of LAND CC, an engineering services firm, spoke with Stephen E Arnold in an ArnoldIT.com podcast about process monitoring. The topics covered included manufacturing, business, and software processes. The need for monitoring in real time is going up because the cost of a failure can be catastrophic.
Dr. Tyra Oldham, founder and president of LAND Construct. Dr. Oldham holds an MBA with a focus on information technology management.
Dr. Oldham and Stephen Arnold discuss these ideas and touch upon the innovative software available from IGear, a company that is redefining monitoring for production and manufacturing operations. You can listen to the podcast via the ArnoldIT.com Podcast page at http://arnoldit.com/podcasts/. The program runs 15 minutes. Information about Dr. Oldham is here.
Ken Toth, August 2, 2010
Sponsored post
A Search Hum-Dinger
July 28, 2010
We all knew the day would come when we could hum an unknown melody and our computer or phone would name that tune. What is surprising is how this wondrous little gadget will populate its data. Technology Review recently broke down the frontier Tunebot is trying to conquer in the article, “Query-by-Humming Musical Search Engine Launched”. Tunebot essentially allows users to hum their song and the possible results appear, but the designers had to find a way to reconcile a catalog of millions of songs with a user’s inability to match key, notes, timbre and other factors. The ingenious answer, which the article considered “an elegant solution to the problem of melody recognition,” was to have a karaoke contest where users populate all the possible hums themselves. Tunebot’s brilliant collision of search technology and online community makes this upstart a program to watch.
Pat Roland, July 28, 2010
Freebie
Quote to Note: YouTube and Cold, Hard Cash
July 27, 2010
Here’s a quote that caught my attention. The source is “Google Exec Speaks Of YouTube, Imminent Profitability.” The Googler making the statement is allegedly Nikesh Arora, a Google executive and super smart person. Did you update your business controlled vocabulary to make Google a related term for “wizard”? I did. Anyway, here’s the alleged quote:
“YouTube is on the verge of imminent profitability.”
Okay. I think I understand. There’s the payout price for YouTube.com. The litigation. The marketing costs. The expense of making YouTube into a quality video service. Bandwidth. Staff. Yep, imminent with interest.
Stephen E Arnold, July 27, 2010
Freebie.
CMS Vendors Face Old Age, Maybe Need HGH?
July 20, 2010
Content management systems and CMS consultants are an interesting mix. On the lower digit end of the CMS spectrum are the lightweight content management systems. Four years ago, the capabilities of even the vaunted Google’s Blogger.com, which seems frozen in time to me, were like Lance Armstrong’s 2010 Tour de France.
On the end of the spectrum where the big numbers are round, the industrial strength records management systems were found. The addled goose honks about IBM, but when properly configured, IBM’s FileNet can perform some nifty CMS tricks.
So the CMS spectrum ran from the citizen journalism functions to the mad scientist mode. The consultants followed suit. I don’t recall getting spam from IBM about FileNet. Sure, IBM – like any $100 billion outfit – has its weak moments, but shoving FileNet at the addled goose has never happened. Probably won’t even happen opine I.
The reason is that when you move to the double digit end of the CMS spectrum you enter a world where a document error can shut down a nuclear power plant after a US government inspection or a really friendly CEO gets to spend time with prisoners in the “yard.” The vast majority of CMS consultants trample around in the lightweight end of the CMS market.
The problem is that the lightweight systems are now looking more sophisticated, and some venture firms and corporations are taking a hard look at these former wimps.
Don’t believe me? Navigate to “Squarespace Gets $38M to Compete With WordPress and Six Apart”. The write up calls attention to three outfits with CMS that can do interesting things and seem to be growing as my son did when he was in the third grade. Every day he needed a new pair of sneakers with the French chicken on them. Le Coq Sportif for those who are not into suburban Maryland fashions. I noted this passage in the write up:
The size of the investment that Squarespace has managed to attract from Accel and Index indicates that these investors see the potential to take the company’s software and services beyond simple blogging and into the broader world of content-management systems. Although some media companies have been experimenting with open-source software such as Drupal and Joomla for web publishing, both of these are fairly complex to manage, and a hosted solution could appeal to publishers such as the Telegraph Group, which is already using a number of cloud-based services.
Squarespace is quite interesting. The company makes it dead simple to create a blog, a photo gallery, even a complete Web site. The user can drag and drop. Sure, Squarespace allows coders to fiddle, but the company seems to draw the line with some potentially interesting live database action from its pages. Aside from that prudent step, Squarespace is a CMS for the person or company frustrated with a traditional CMS.
Is the Squarespace system right for managing nuclear power plant records? Probably, but I wouldn’t use the system for that purpose. Nor would I rely on Squarespace for information likely to be probed for effective safeguards against spoliation. For other work, Squarespace looks mighty tasty as it is.
What will happen with $38 million? Traditional content management vendors may want to pay some attention to the fun loving folks at this outfit. Also, the CMS consultants may find themselves having to work much harder to get those high-paying, wild and crazy CMS product reviews. Squarespace makes it dead simple to play with the system any time, for free, for a couple of weeks.
Times are a’changin’ in CMS and CMS consulting I conclude.
Stephen E Arnold, July 20, 2010
Freebie
YouTube Plays Nice with Music Copyrights
July 8, 2010
Must read despite the wooden shoes title: “Add Licensed Legal Copyright Music to YouTube Videos for Free with AudioSwap”. The write up explains a new service from Google. Here’s the key passage:
AudioSwap is an initiative from YouTube, owned by Google, that allows video uploaders to automatically and easily add music to YouTube videos, for free, without cost nor payment. AudioSwap, as its name implied, also allows users to swap and change the existing copyright infringed music on the video with all rights cleared audio tracks. AudioSwap contains extensive list of songs which are provided by Friendly Music. AudioSwap is probably an effort by Google to avoid many video clips been removed due to audio tracks that violate copyright. As the royalty fees for a mainstream commercial and popular songs are astronomical, so don’t expect to find your many favorite classic love songs or hits on the free catalog.
If you want to know the nitty gritty, navigate to the original.
Stephen E Arnold, July 8, 2010
Freebie

