Exclusive Interview: Satish Gannu, Cisco Systems Inc.

August 24, 2010

I made my way to San Jose, California, to find out about Cisco Systems and its rich media initiatives. Once I located Cisco Way, a sign of the company’s influence in the heart of Silicon Valley, I knew I would be able to connect with Satish Gannu, a director of engineering in Cisco’s Media Experience and Analytics Business Unit. Mr. Gannu leads the development team responsible for Cisco Pulse, a method for harnessing the collective expertise of an organization’s workforce. The idea is to apply next-generation technology to the workplace in order to make it quick and easy for employees to find the people and information they need to get their work done “in an instant.”

I had heard that Mr. Gannu is exploring the impact of video proliferation in the enterprise. Rich media require industrial-strength, smart network devices and software, both business sectors in which Cisco is one of the world’s leading vendors. I met with Mr. Gannu in Cisco’s Building 17 cafeteria (appropriate because Mr. Gannu has worked at Cisco for 17 years). Before tackling rich media, he served as Director of Engineering in Cisco’s Security Technology Group. I did some poking around with my Overflight intelligence system and picked up signals that he is responsible for media transcoding, a technology that can bring some vendors’ network devices to their knees. Cisco’s high performance systems handle rich media. Mr. Gannu spearheads Cisco’s search and speech-to-text activities. He is giving a spotlight presentation at the October 7-8, 2010, Lucene Revolution Conference in Boston, Massachusetts. The conference is sponsored by Lucid Imagination.

Satish Gannu, Director of Engineering, Cisco Systems Inc.

The full text of my interview with Mr. Gannu appears below:

Thanks for taking the time to talk with me.

No problem.

I think of Cisco as a vendor of sophisticated networking and infrastructure systems and software. Why is Cisco interested in search?

We set off to do the Pulse project in order to turn people’s communications into a mechanism for finding the right people in your company. For finding people, we asked, “How do people communicate what they know?” People communicate what they know through documents: a web page, an email, a Word document, or a PDF, and now, video. Video is big for Cisco.

Videos are difficult to consume or even find. The question we wanted to answer was, “Could we build a business-savvy recommendation engine?” We wanted to develop a way to learn from user behavior and then recommend videos to people, not just in an organization but in other settings as well. We wanted to make videos more available for people to consume. Video is the next big thing in digital information, moving from YouTube into the enterprise world. In many ways, video represents a paradigm shift. Video content takes a lot of storage space. We think that video is also difficult to consume, difficult to find. In search, we’ve always worked from a document-based view. We are now expanding the idea of a document from text to rich media. We want to make video findable, browseable, and searchable. Obviously the network infrastructure must be up to the task. So rich media is a total indexing and search challenge.

Is there a publicly-accessible source of information about Cisco’s Pulse project?

Yes. I will email you the link and you may insert it in this interview. [Click here for the Pulse information.]

No problem. Are you using open source search technology at Cisco?

Yes, we believe a lot in the wisdom of the crowds. The idea that a community and some of the best minds can work together to develop and enhance search technology is appealing to us. We also like the principle that we should not invent something that is already available.

I know you acquired Jabber. Is it open source?

Yes, in late 2008 Cisco bought the company called Jabber. The engineers had developed a presence and messaging protocol and software. Cisco is also active in the Open Social Platform.

Would you briefly describe Open Social?

Sure. “Open Social” is a platform with a set of APIs developed by a community of social networking developers and vendors to structure and expose social data over the network, at opensocial.org. We’ve adopted Open Social to expose the social data interfaces in our product for use by our customers, leveraging both the standardization and the innovation of this process to make corporate data available within organizations in a predictable, easy-to-use platform.

Why are you interested in Lucene/Solr?

We talked to multiple companies, and we decided that Lucene and Solr were the best search options. As I said, we didn’t want to reinvent the wheel. We looked at available Lucene builds. We read the books. Then we started working with Lucid. Our hands-on testing actually validated the software. We learned how mature it is. The road map for things which are coming up was important to us.

What do you mean?

Well, we had some specific ideas in mind. For example, we wanted to do certain extensions on top of basic Lucene. With the road map, open source gives us an opportunity to build our own intellectual property on top of Lucene/Solr.

Like video?

Yes, but I don’t want to get into too much detail. Lucene for video search is different. With rich media sources we worry about how to transcribe it, and then we have to get into how the system can implement relevancy and things like that.

One assumption we made is that people speak at a rate of two to three words per second. So when we were doing tagging, we could calculate the length of the transcript and size of the document.
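A minimal sketch of how that speaking-rate assumption could translate into numbers. The function names are hypothetical and are not drawn from Cisco Pulse or Lucene/Solr; only the two-to-three-words-per-second figure comes from the interview.

```python
# Assumed speaking rate from the interview: two to three words per second.
WORDS_PER_SECOND_LOW = 2
WORDS_PER_SECOND_HIGH = 3


def estimate_transcript_words(duration_seconds):
    """Return a (low, high) estimate of the word count for a recording."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return (duration_seconds * WORDS_PER_SECOND_LOW,
            duration_seconds * WORDS_PER_SECOND_HIGH)


def estimate_offset_seconds(word_index):
    """Return the (earliest, latest) time at which a given word was likely spoken."""
    if word_index < 0:
        raise ValueError("word_index must be non-negative")
    # Fast speakers reach the word sooner, slow speakers later.
    return (word_index / WORDS_PER_SECOND_HIGH,
            word_index / WORDS_PER_SECOND_LOW)


if __name__ == "__main__":
    low, high = estimate_transcript_words(10 * 60)  # a ten-minute video
    print(f"Estimated transcript length: {low} to {high} words")
```

Under this assumption a ten-minute video yields a transcript of roughly 1,200 to 1,800 words, which gives a rough document size before any speech-to-text processing runs.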

That’s helpful. What are the primary benefits of using Lucene/Solr?

One of our particular interests is figuring out how we can make it easy for people in an organization to find a person with expertise or information in a particular field. At Cisco, then, how our systems help users find people with specific expertise is core to our product.

So open source gives us the advantage of understanding what the software is doing. Then we can build on top of those capabilities. That’s how we determine which one to choose.

Does the Lucene/Solr community provide useful developments?

Yes, that’s the wisdom of the crowds. In fact, the community is one of the reasons open source is thriving. In my opinion, the community is a big positive for us. In our group, we use Open Social too. At Cisco, we are part of the enterprise Open Social consortium, and we play an active role in it. We also publish an open source API.

I encourage my team to be active participants in that and contribute. Many at Cisco are contributing certain extensions. We have added these on top of Open Social. We are giving our perspective to the community from our Pulse learnings. We are doing the same type of things for Lucene/Solr.

My view is that if useful open source code is out there, everyone can make the best utilization of it.  And if a developer is using open source, there is the opportunity for making some enhancement on top of the existing code. It is possible to create your own intellectual property around open source too.

How has Lucid Imagination contributed to your success in working with Solr/Lucene?

We are not Lucene experts. We needed to know what is possible, what is not possible, and what the caveats are. The insight we got from consulting with Lucid Imagination helped open our eyes to the possibilities. That clinical knowledge is essential.

What have you learned about open source?

That’s a good question. Open source doesn’t always come for free.  We need to keep that in mind. One can get open source software. Like other software, one needs to maintain it and keep it up to date.

Where does Lucid fit in?

Without Lucid, we would have to send an email to the community and wait for somebody to respond. Now I ping Lucid.

Can you give me an example?

Of course. If I have 20,000 users, I can have 100 million terms in one shard. If I need to scale this to 100,000 users and put up five shards, how do I handle these shards so that each is localized? What is the method for determining relevancy of hits in a result set? I get technical input from Lucid on these types of issues.
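The arithmetic behind that example can be sketched as follows. This is a back-of-the-envelope calculation that assumes term volume grows linearly with the number of users, which the interview does not state; the function names are hypothetical and are not part of Solr.

```python
import math


def terms_per_user(total_terms, users):
    """Average number of indexed terms contributed per user."""
    return total_terms / users


def shards_needed(users, per_user_terms, shard_capacity_terms):
    """Number of shards required if each shard holds at most shard_capacity_terms."""
    return math.ceil(users * per_user_terms / shard_capacity_terms)


if __name__ == "__main__":
    # Figures from the interview: 20,000 users fill one 100-million-term shard.
    per_user = terms_per_user(100_000_000, 20_000)
    # Assumption: term volume scales linearly when the user base grows to 100,000.
    shards = shards_needed(100_000, per_user, 100_000_000)
    print(f"{per_user:.0f} terms per user, {shards} shards for 100,000 users")
```

With these figures each user accounts for about 5,000 terms, so 100,000 users produce roughly 500 million terms, which matches the five shards mentioned above. How to route users to shards and merge relevancy scores across them is the part the calculation does not address.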

When someone asks you why you don’t use a commercial search solution, what do you tell them?

I get this question a lot. In my opinion, the commercial search systems are often a black box. We occasionally want to use this type of system. In fact, we do have a couple of other related products which use commercial search technologies.

But for us, analysis of context is the core. Context is what the search is about. When we looked at the code, we realized that how we use this functionality is central to our work. How we find people is one example of what we need. We need an open system. For a central function, the code cannot be a black box. Open source meets our need.

Thank you. How can a reader contact you?

My email is sgannu at cisco dot com.

Stephen E Arnold, August 24, 2010

Sponsored post

Another Business Sector Resists the Google

August 20, 2010

When it attempts to get into a new industry, Google’s reputation precedes it by a long mile. Its latest conquest, the television market, is not going as smoothly as the search giant would like. A recent Broadband Reports article, “Google TV Running into Stubborn Broadcasters,” showcased the problems the company is running into. The proposition seems logical: marry the Internet with broadcast and cable television to create a monolithic entertainment giant. But not everyone sees it that way. The LA Times put it best, stating: “The prospect of Google getting into television frightens many in Hollywood, who worry that Silicon Valley will upend the entertainment industry just like the Internet ravaged the music and newspaper industries.”

The world (big shock!) is not here for Google’s cherry picking. We’re going to keep an eye on this battle to see if the search behemoth backs down or just chops down the whole tree.

Pat Roland, August 20, 2010

Arnold / Oldham Podcast on Process Monitoring

August 2, 2010

Dr. Tyra Oldham, president of LAND CC, an engineering services firm, spoke with Stephen E Arnold in an ArnoldIT.com podcast about process monitoring. The topics covered included manufacturing, business, and software processes. The need for monitoring in real time is going up because the cost of a failure can be catastrophic.

Dr. Tyra Oldham, founder and president of LAND Construct. Dr. Oldham holds an MBA with a focus on information technology management.

Dr. Oldham and Stephen Arnold discuss these ideas and touch upon the innovative software available from IGear, a company that is redefining monitoring for production and manufacturing operations. You can listen to the podcast via the ArnoldIT.com Podcast page at http://arnoldit.com/podcasts/. The program runs 15 minutes. Information about Dr. Oldham is here.

Ken Toth, August 2, 2010

Sponsored post

A Search Hum-Dinger

July 28, 2010

We all knew the day would come when we could hum an unknown melody and our computer or phone would name that tune. What is surprising is how this wondrous little gadget will populate its data. Technology Review recently broke down the frontier Tunebot is trying to conquer in the article “Query-by-Humming Musical Search Engine Launched.” Tunebot essentially allows users to hum their song and then shows the possible results, but the designers had to find a way to populate a database covering millions of songs while allowing for a user’s inability to match key, notes, timbre and other factors. The ingenious answer, which the article considered “an elegant solution to the problem of melody recognition,” was to have a karaoke contest where users populate all the possible hums themselves. Tunebot’s brilliant collision of search technology and online community makes this upstart a program to watch.

Pat Roland, July 28, 2010

Freebie

Quote to Note: YouTube and Cold, Hard Cash

July 27, 2010

Here’s a quote that caught my attention. The source is “Google Exec Speaks Of YouTube, Imminent Profitability.” The Googler making the statement is allegedly Nikesh Arora, a Google executive and super smart person. Did you update your business controlled vocabulary to make Google a related term for “wizard”? I did. Anyway, here’s the alleged quote:

“YouTube is on the verge of imminent profitability.”

Okay. I think I understand. There’s the payout price for YouTube.com. The litigation. The marketing costs. The expense of making YouTube into a quality video service. Bandwidth. Staff. Yep, imminent with interest.

Stephen E Arnold, July 27, 2010

Freebie.

CMS Vendors Face Old Age, Maybe Need HGH?

July 20, 2010

Content management systems and CMS consultants are an interesting mix. On the lower digit end of the CMS spectrum are the lightweight content management systems. Four years ago, the capabilities of even the vaunted Google’s Blogger.com, which seems frozen in time to me, were like Lance Armstrong’s 2010 Tour de France.

On the end of the spectrum where the big numbers are round, the industrial strength records management systems were found. The addled goose honks about IBM, but when properly configured, IBM’s FileNet can perform some nifty CMS tricks.

So the CMS spectrum ran from the citizen journalism functions to the mad scientist mode. The consultants followed suit. I don’t recall getting spam from IBM about FileNet. Sure, IBM – like any $100 billion outfit – has its weak moments, but shoving FileNet at the addled goose has never happened. Probably won’t even happen opine I.

The reason is that when you move to the double digit end of the CMS spectrum you enter a world where a document error can shut down a nuclear power plant after a US government inspection or a really friendly CEO gets to spend time with prisoners in the “yard.” The vast majority of CMS consultants trample around in the lightweight end of the CMS market.

The problem is that the lightweight systems are now looking more sophisticated, and some venture firms and corporations are taking a hard look at these former wimps.

Don’t believe me? Navigate to “Squarespace Gets $38M to Compete With WordPress and Six Apart”. The write up calls attention to three outfits with CMS that can do interesting things and seem to be growing as my son did when he was in the third grade. Every day he needed a new pair of sneakers with the French chicken on them. Le Coq Sportif for those who are not into suburban Maryland fashions. I noted this passage in the write up:

The size of the investment that Squarespace has managed to attract from Accel and Index indicates that these investors see the potential to take the company’s software and services beyond simple blogging and into the broader world of content-management systems. Although some media companies have been experimenting with open-source software such as Drupal and Joomla for web publishing, both of these are fairly complex to manage, and a hosted solution could appeal to publishers such as the Telegraph Group, which is already using a number of cloud-based services.

Squarespace is quite interesting. The company makes it dead simple to create a blog, a photo gallery, even a complete Web site. The user can drag and drop. Sure, SquareSpace allows coders to fiddle, but the company seems to draw the line with some potentially interesting live database action from its pages. Aside from that prudent step, SquareSpace is a CMS for the person or company frustrated with a traditional CMS.

Is the SquareSpace system right for managing nuclear power plant records? Probably, but I wouldn’t use the system for that purpose. Nor would I rely on SquareSpace for information likely to be probed for effective safeguards against spoliation. For other work, SquareSpace looks mighty tasty as it is.

What will happen with $38 million? Traditional content management vendors may want to pay some attention to the fun loving folks at this outfit. Also, the CMS consultants may find themselves having to work much harder to get those high-paying, wild and crazy CMS product reviews. SquareSpace makes it dead simple to play with the system any time, for free, for a couple of weeks.

Times are a’changin’ in CMS and CMS consulting I conclude.

Stephen E Arnold, July 20, 2010

Freebie

YouTube Plays Nice with Music Copyrights

July 8, 2010

Must read despite the wooden shoes title: “Add Licensed Legal Copyright Music to YouTube Videos for Free with AudioSwap”. The write up explains a new service from Google. Here’s the key passage:

AudioSwap is an initiative from YouTube, owned by Google, that allows video uploaders to automatically and easily add music to YouTube videos, for free, without cost nor payment. AudioSwap, as its name implied, also allows users to swap and change the existing copyright infringed music on the video with all rights cleared audio tracks. AudioSwap contains extensive list of songs which are provided by Friendly Music. AudioSwap is probably an effort by Google to avoid many video clips been removed due to audio tracks that violate copyright. As the royalty fees for a mainstream commercial and popular songs are astronomical, so don’t expect to find your many favorite classic love songs or hits on the free catalog.

If you want to know the nitty gritty, navigate to the original.

Stephen E Arnold, July 8, 2010

Freebie

Why Is Google Reminding Us of Its Dominance in Online Video?

July 1, 2010

Fresh from its 800 score on the DMCA test, Google seems to be eager to point out how much video it pumps. Navigate to “YouTube Streams 14.6B Videos, 100 Videos Viewed per User.” The numbers are interesting, but frankly I am not sure how the Google flow compares to an outfit like Insight or Time Warner on their systems. Maybe it is apples and oranges because Insight and Time Warner use different technology and make money from their video services. According to the write up, “YouTube accounted for 43.1 percent of all videos viewed online, comScore said June 23.” The write up includes some comparative information:

When people weren’t watching videos on YouTube, they were watching them on Hulu (1.2 billion videos), Microsoft Sites (642 million videos) or Vevo (430 million videos) and YouTube court buddy Viacom Digital (347 million videos).

Perhaps the key metric for me is the amount of money Google is making from YouTube. The data in the article are similar to Amazon’s method of explaining how successful its cloud services are. Where are the revenue and cost data?

Stephen E Arnold, July 1, 2010

Freebie

OpenText Nstein: Confusing Information Surfaces

June 28, 2010

Update, June 29, 2010:

Quite a flurry of comments from OpenText about this post. This citation turned up in my newsreader and I could not figure it out. In fact, I pointed out that the article was confusing and probably an error in a content management system. Nevertheless, I think that vendors of content management systems need to make certain that their date and time stamp functions are operating correctly. If a crash forces a system restore, I think it is useful to put clear date markers on restored documents. If this tagging is not applied or is in some way flawed, newsreaders snap up content and happily shovel it to people like me with a current date and time stamp. My suggestion is to work with the source of the write up. I don’t do “news”; I point to sources that are available in open source. My opinions are clearly marked. In this particular article, I point out that when glitches like this occur, competitors can point to the write up and raise questions about clarity. I reproduced the content and provided a link to the source. I did not create the 2003 gobbledegook; I just alerted my two or three readers to the issue. The problem originated with an outfit doing publishing as Asset Management Software. No date but BuddyPress, identified with the source article, might be the outfit with which OpenText wishes to speak. Or, in the language of the source article I used: “New guided navigation module: navigation NretrieverNretriever is a powerful tool for research, which brings a direct connection with the search experience for end users.” Confusing in my opinion. Also, note the date in the url, gentle reader: http://asset-management-software.bloghubpage.com/2010/06/12/asset-management-software-nstein-introduces-version-3-0-of-its-award-winning-content-management-platform-nserver-suite/. I put the date in bold.

This sure seems like a current date to me.

The point is that content management vendors deliver products that can be used to generate data that lacks useful metadata and produce pages that spiders and addled geese see as “current.” When a vendor is in the content management business, perhaps looking at the cause and not the effect is a useful exercise?

Original Post: June 28, 2010 below:

Two companies that strike me as pioneers in moving beyond search are Autonomy and OpenText. I don’t want to take sides. In the last two or three years, the firms have been pursuing somewhat similar strategies. Both have pushed from search into specialized markets such as eDiscovery. Both have information retrieval technologies gathered from acquisitions. Both are no longer properly classified, in my opinion, as search and retrieval specialists. The companies offer a wide range of information services. Both have blown past first Microsoft Fast and then Endeca. OpenText snapped up the gasping Nstein for something like $0.65 on the dollar. Under the broad wing span of OpenText, Nstein has rolled out Version 3.0 of what it calls “its award-winning content management platform.” You can get more details in the write up “Asset Management Software: Nstein Introduces Version 3.0 of Its Award Winning Content Management Platform Nserver Suite.” Quite a title and probably good spider food. But I don’t know what Nstein is * really * delivering. Customers may not know either.

For me I found this passage quite interesting:

nStein Technologies Inc…,  a global leader in unstructured content management solutions, today announced at the annual conference of the Special Library Association (SLA) version 3.0 of its award-winning content management platform, September nserver concept nStein extraction, categorization, organization provides production began, seals and restart guided navigation modules.

Must be a glitch in the content management system.

I also noted:

  • The use of the phrase “guided navigation”. Endeca has been closely associated with facets, and “guided navigation” may catch that company’s attention.
  • A reference to Nretriever as a “feeding technology.” The word “Nretriever” suggests a query and a results list to me, not a feed or stream of content. Maybe the writer wanted me to think of an alert pushed to me via email?
  • A description of Ncategorizer that “includes the improvement of classification.” I am not sure if the product improves a previous Nstein system or improves the performance delivered by a competitor’s system.

The write up includes some links to information for me to read. Two links date from 2002 and 2003 and not from the post acquisition period in which I have an interest. The third link is more current but I did not see any mention of Nstein. The other links are circular; that is, pointing back to the article that caught my attention.

I am baffled. I am not sure if this is a legitimate write up about OpenText / Nstein or an error due to a flawed editorial system or process.

With promotional announcements like this one, Autonomy is almost certain to lick its chops and begin to think about taking a bite out of OpenText / Nstein’s marketing messages.

Stephen E Arnold, June 28, 2010

Freebie

Podcast Interview with Paul Doscher, Part 3: Exalead and User Experience

June 28, 2010

Exalead’s Paul Doscher talks about Exalead and user experience, sometimes shortened to “UX,” on the June 28, 2010, ArnoldIT Beyond Search podcast. Exalead, now part of the large French software and services company Dassault, is entering a new phase of growth. (You can read about this tie up in “Exalead Acquired by Dassault” and “Exalead and Dassault Tie Up, Users Benefit.”)

In this podcast, Mr. Doscher talks about Exalead’s technical approach to enabling licensees to use a wide range of graphical user interfaces and display conventions. The Exalead user experience approach makes it possible to support iPhone-type interfaces and presentations tailored to the needs of a particular user or workgroup.

You can listen to the podcast on the ArnoldIT.com Web site. More information about Exalead is available from www.exalead.com. The ArnoldIT podcast series extends the Search Wizards Speak series of interviews beyond text into rich media. Watch this blog for announcements about other rich media programs from the professionals who move information retrieval beyond search.

Stephen E Arnold, June 28, 2010

Sponsored by Stephen E. Arnold
