Interview with Rahul Agarwalla, Uchida Spectrum
April 28, 2011
Introduction
After breaking my foot and cancelling my trips for April and May 2011, I spoke with Rahul Agarwalla, one of the individuals whom I wanted to meet at the upcoming Lucene Revolution Conference. For more information about this important open source search event, click the banner at the top of the blog page or navigate to www.lucenerevolution.com.
Rahul Agarwalla, Uchida Spectrum Inc.
Rahul Agarwalla heads international business for Uchida Spectrum Inc, Japan. Previously he has built and exited two content/technology ventures including Matrix Information, the pioneer of digital content syndication in India. He has over 14 years of experience with various search technologies like Verity, Fast ESP (enterprise search platform) and Solr/Lucene.
Mr. Agarwalla is the founder of Uchida Spectrum, a key factor in Japan’s search and content processing market. USI provides SMART InSight, a search application used by many Fortune 500 companies for specialized industry applications like R&D and quality assurance for manufacturing, claims and customer management etc.
Originally SMART InSight was based on Microsoft Fast Search technology. SMART InSight is a search application that integrates and analyzes enterprise information. The solution is used by such organizations as Canon and Moody’s. Uchida Spectrum is working with Lucid Imagination across Asia as the Strategic Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support services to clients.
Mr. Agarwalla’s firm has become something of a specialist in moving organizations from the ageing Fast Search & Transfer search platform to the newer open source solutions available today. I spoke with him on April 27, 2011. The full text of the interview appears below.
The Interview
What’s the principal business focus of Uchida Spectrum?
Uchida Spectrum is one of the leaders in the Japan search market. It all started in 2002, when we saw opportunity at the intersection of software and information. That was the inspiration to launch the search business. Our product, SMART InSight, is a search application that integrates information from across the enterprise in easy-to-navigate cross department information chains, and adds visual summaries that add value through contextual metadata and analytics.
Instead of focusing on enterprise search as a horizontal solution, we found companies placed great value on Line of Business applications. SMART InSight is used for specific applications; for example, quality assurance, research and development, product development, claims and customer management. Those interested in our solutions are large multi-nationals across various sectors, such as electronics, automotive, chemicals, finance and engineering, as well as smaller organizations.
When did you become interested in text and content processing?
During my MBA, I won a competition for my paper on the power of information and the Internet. One of the judges, S. Swaminathan, pushed me towards the emerging internet/information sector. I went on to set up India’s first digital content syndication business where we had to deal with massive amounts of content from our more than 200 content partners. Doing this manually was just not scalable. Therefore, we embraced content and text processing technologies. We were partly inspired by the Dialog Information Service, which was once part of Lockheed. In 1998, Dialog had more content than all of the World Wide Web. I think the number I heard was 12 terabytes of data. The massive growth in information flows since then means today the challenges of extracting, normalizing and adding metadata are common to all businesses.
Uchida has been investing in search and content processing for several years, and has recently moved from embedding Fast Search’s technology to Open Source Solr and Lucene. What’s been the big payoff from your work with Lucid Imagination?
The tie up has been very positive.
Our product, SMART InSight, uses search to integrate and retrieve information — so scalability and reliability, at reasonable cost, are critical factors. Lucene/Solr has delivered this in spades. The amount of data we can index on a server and the ability to scale in a linear fashion are unmatched. For instance, in one project we found a 10x improvement due to lower cost of ownership combined with higher performance.
As the quantity of data grows unabated, customers are extremely concerned about cost of future expansion. Working with Lucid Imagination we are able to meet such technical and business challenges and build a future proof foundation.
Are you doing research and development too?
Absolutely.
Our research and development efforts are focused on dealing with massive disparate data. MNC [Multi-National Corporations] have to deal with multiple languages, complex security rules, and different data formats and structures.
Integrating this data in a user-friendly manner involves much more than conventional content normalization. We need to understand the meaning of the information and its context. Presenting various types of information in an intuitive interface and quickly is our second challenge. Here we are looking at constantly improving our mash up and RIA [Rich Internet Application] widgets technology.
Many vendors argue that mash ups and data fusion are “the way information retrieval will work going forward”. What’s your view of the structured – unstructured data approach?
Information fusion is fundamental to cognition. For example, if you have a stuffy nose, the food you eat can be tasteless. We are better off because we use all our senses together. Information retrieval needs to be adapted to our way of working and not the other way around.
Islands of information have come into being due to the historical approach of information technology. Search gives us a paradigm to overcome this. For instance, customer surveys, warranty claims and quality testing should all have a relationship with product and part data to analyze defects and impact. The massive automobile recalls in 2009 are symptomatic of not connecting the dots. Companies today are much better equipped for product improvement and quick failure detection to the extent they have integrated car performance data (from the car’s onboard systems) with other data sources. The same holds true when we look at voice of customer data or 360 degree customer management.
Based on the experience of our customers, we strongly believe search has more value when you add more data sources. The key is to enable users to understand and explore the inter-relationships between different datasets in an intuitive manner.
Without divulging your firm’s methods, will you characterize a typical use case for your firm’s content processing, tagging, and search and retrieval capabilities?
SMART InSight enables customers to integrate multiple data sets from one or more departments, to easily navigate across these datasets and to analyze massive data sets using charts and tables driven by search. The application interface is determined by the data and by what kind of discovery and analysis the customer needs – somewhat similar to a BI system.
Can you give me an example?
Sure. I want to use an automotive industry use case.
We fed data from the American agency NHTSA (National Highway Traffic Safety Administration) into SMART InSight. We shared this prototype with some of the large Japanese auto firms. Their analysts found trends and common issues using the standard charts and other features.
We then integrated customers’ datasets with the NHTSA data to build a powerful analysis application. One of its key features is what we call the “Data Chain”.
What’s a “data chain”?
A data chain uses fields with similar meaning (for example, component category or VIN [Vehicle Identification Number]) to create data driven inter-linkages. These linkages allow you to navigate dynamically across the data sets. So users starting from part failure were able to ‘data chain’ to performance data for the affected cars, related claims, NHTSA reports etc. Engineers were thus able to comprehend the problem in a comprehensive way, what I call a 360 degree view. Analysts were then able with a mouse click to drill down to causes and solutions. The analysts could also comment on and tag the data to create a shared context and highlight trends and issues for their co-workers. All the tagging and other usage and sharing information is part of an automated learning loop, which constantly improves the search relevancy and makes users more productive.
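Editor’s note: the “data chain” idea described above can be pictured with a small sketch. This is not Uchida Spectrum’s implementation, which the interview does not describe. The data set names, field names and records below are invented for illustration, and the only technique shown is a lookup table keyed on shared field values such as component category or VIN.

```python
# Hypothetical records from three data sets; all names and values are
# invented for illustration and do not come from SMART InSight.
DATASETS = {
    "part_failures": [
        {"id": "F1", "component": "brakes", "vin": "VIN001"},
        {"id": "F2", "component": "airbag", "vin": "VIN002"},
    ],
    "warranty_claims": [
        {"id": "C1", "vin": "VIN001", "cost": 420},
        {"id": "C2", "vin": "VIN003", "cost": 95},
    ],
    "nhtsa_reports": [
        {"id": "N1", "component": "brakes", "summary": "pedal goes soft"},
    ],
}

# Fields assumed to carry the same meaning in every data set.
LINK_FIELDS = ("component", "vin")


def build_chain_index(datasets, link_fields):
    """Map (field, value) to the list of (data set name, record id) using it."""
    index = {}
    for name, records in datasets.items():
        for record in records:
            for field in link_fields:
                if field in record:
                    key = (field, record[field])
                    index.setdefault(key, []).append((name, record["id"]))
    return index


def data_chain(record, source, index, link_fields):
    """Return records in other data sets that share a link-field value."""
    related = set()
    for field in link_fields:
        if field in record:
            for name, rec_id in index.get((field, record[field]), []):
                if name != source:
                    related.add((name, rec_id))
    return sorted(related)


index = build_chain_index(DATASETS, LINK_FIELDS)
start = DATASETS["part_failures"][0]  # the brake failure on VIN001
linked = data_chain(start, "part_failures", index, LINK_FIELDS)
```

Starting from the brake failure, `linked` holds the NHTSA report that shares the component category and the warranty claim that shares the VIN. A production system would presumably resolve these links with search queries rather than an in-memory table.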
Services are often among the highest margin offerings an organization can offer. Is the need to sell consulting altering the simplicity of the installation, configuration, tuning, and customization of search applications?
Information, even if accessed on an iPad, is a complex challenge.
While we provide services, our business model is built around providing the product and related services like maintenance & support. We want our customers and partners to be able to manage the technology, as they best know how to maximize the value. We advise on best practices, help them overcome technical hurdles and provide support to ensure risk is minimized.
What’s the upside?
The key benefits depend upon each offering. First, we have a product which delivers upon installation a rich feature set in a reliable and scalable product. This enables customers to build solutions that address their use cases by focusing on the business logic without worrying about the technology.
Next our approach includes maintenance and support. We know that customers want support in order to reduce risk and ensure a successful experience during the life of the solution.
Finally, we help our clients create an internal team, which can manage and expand the search solution in tight synchronization with evolving business requirements.
How does an information retrieval engagement move through its life cycle?
It usually starts with customers asking us to help them understand how information retrieval can play a part in meeting their business challenges. We then get the customer’s sample data and wrap SMART InSight around it.
The approach involves some data analysis to integrate the information and building a few sample screens using our Ajax portal interface. Once users play with the data using the sample screens, they can imagine how best to analyze the information and what kind of application UI is required. We recommend this approach, as customers do not need to first create requirements specifications. Customers and users find it is much easier to change and improve the interface working from this kind of prototype than it is to start from a blank page.
The final implementation focuses on helping customers tune our widget library and pages to build the required application UI. Once this is underway, we then map the data to correctly configure the content processing and index schema.
In our projects, ground up development is minimal as our feature set includes content ingestion, search, portal and collaboration features. Post implementation moves to training the customer team and helping them maintain and enhance the solution through support services.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume (scaling) challenge?
We serve two market segments: An Intranet within the licensee’s enterprise, and Internet or outside the firewall information.
In the enterprise market, update frequency is relatively lower except while dealing with transactional databases. We have implemented customer solutions with over 100 million records from 10 data sources. There were no latency issues.
Things get more challenging in the Internet segment. We are currently dealing with a project in China where not only does the data have over 300 facets, its volume and update frequency are both amongst the largest in the world. Having the expertise brought by Lucid Imagination becomes critical in such situations. Together, Lucid Imagination and Uchida Spectrum are helping this customer architect a large scale system by optimizing queries and the schema with multiple indexing and search nodes.
Another challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content “push”, report generation, and personalization within a workflow?
That’s a great question.
My colleagues and I believe that the “Right information, right person, right time” is the critical need of many of our customers. SMART InSight offers sophisticated alerting to achieve this. Multi-parameter rule driven alerts can be sent out in real time. We also offer daily or weekly digests for other information needs. The value of alerting increases as we add more data sources into the index. Users are then able to monitor and track all relevant information flows.
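Editor’s note: a “multi-parameter rule driven alert” can be pictured as a set of per-field conditions that must all hold for a newly indexed item. The sketch below is an assumption about the general shape of such a rule, not a description of SMART InSight. The `AlertRule` class, the field names and the e-mail addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AlertRule:
    """A hypothetical alert rule: every condition must hold for a match."""
    name: str
    recipient: str
    conditions: Dict[str, Callable[[object], bool]]

    def matches(self, doc: dict) -> bool:
        # A document missing a required field never matches the rule.
        return all(
            field in doc and test(doc[field])
            for field, test in self.conditions.items()
        )


def alerts_for(doc: dict, rules: List[AlertRule]) -> List[Tuple[str, str]]:
    """Return (recipient, rule name) pairs for every rule the document matches."""
    return [(rule.recipient, rule.name) for rule in rules if rule.matches(doc)]


RULES = [
    AlertRule(
        "brake claims over 300",
        "qa-team@example.com",
        {"component": lambda v: v == "brakes", "cost": lambda v: v > 300},
    ),
    AlertRule(
        "any airbag item",
        "safety@example.com",
        {"component": lambda v: v == "airbag"},
    ),
]

new_doc = {"id": "C1", "component": "brakes", "cost": 420}
pending = alerts_for(new_doc, RULES)
```

Running the rules at index time would give real-time alerts. Collecting the same matches and sending them once a day or once a week would give the digests mentioned above.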
There has been a surge in interest in putting “everything” in a repository and then manipulating the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can “disappear” or become unavailable. What’s your view of the repository versus non-repository approach to content processing?
As you know, Solr/Lucene creates an index of the information and does not store the actual information. With this approach one of the significant advantages we experience is flexibility. Typically, repository solutions are developed following strict waterfall methodology with stable requirement specifications. We think this approach may be a bit out of step with today’s rapidly evolving information climate. By comparison we can be far more flexible, for example, by using dynamic fields in Solr/Lucene and readily changing the ranking algorithm.
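Editor’s note: Solr’s dynamic fields let a document carry fields that were not named in advance, provided the field name matches a pattern declared in the schema. The sketch below assumes a schema that declares the suffix patterns `*_s`, `*_i`, `*_f` and `*_b`, as Solr’s example schemas commonly do. A given installation may use different patterns, and the `to_dynamic_doc` helper is hypothetical.

```python
# Suffixes assumed to be declared as dynamicField patterns in the schema.
# These follow a common Solr example-schema convention; verify them
# against the schema actually deployed.
SUFFIX_BY_TYPE = {bool: "_b", int: "_i", float: "_f", str: "_s"}


def to_dynamic_doc(record, id_field="id"):
    """Rename each field with a type suffix so it matches a dynamic field."""
    doc = {id_field: record[id_field]}
    for key, value in record.items():
        if key == id_field or value is None:
            continue
        # Exact type lookup keeps bool from being treated as int.
        suffix = SUFFIX_BY_TYPE.get(type(value))
        if suffix is None:
            raise TypeError(f"no dynamic field suffix for {key!r}")
        doc[key + suffix] = value
    return doc


claim = {"id": "C1", "component": "brakes", "cost": 420, "recall": True}
solr_doc = to_dynamic_doc(claim)
```

Here `solr_doc` carries `component_s`, `cost_i` and `recall_b`. A new attribute in the source data then needs no schema change, which is the kind of flexibility the answer above contrasts with fixed requirement specifications.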
We use connectors bundled with LucidWorks Enterprise to pull the data from databases and other content repositories. In some cases, our system integration partners or we may build a custom connector. The LucidWorks Enterprise connector framework we get from Lucid Imagination makes this much easier.
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What’s your firm’s approach to presenting “outputs” for end user reuse or for mobile access? Is there native support in Solr/Lucene or via Lucid Imagination for results formats?
What and how much information to put on the screen is always a challenge; SMART InSight resolves the clutter problem in two ways.
First, visualization, when used correctly, is extremely powerful. For this reason, our solution implementation focuses on designing the right application UI. We have built up a great deal of experience over multiple projects and are able to guide customers to design screens for experts and different ones for the simple user.
Second, we also enable users to build their own UI by selecting widgets much like iGoogle or My Yahoo. Thus a user who prefers graphs can add chart widgets and manipulate what should be the X and Y axes. We use LucidWorks Enterprise features like faceting and scoring to build accurate charts. Control over the widgets and what content fields users would like to see enables fully personalized information consumption.
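Editor’s note: to show how facet counts can drive a chart, here is a small sketch. It assumes a Solr JSON response in the default flat layout, where each facet field is a list that alternates value and count. The sample response and the `facet_to_series` helper are invented for illustration and say nothing about how the SMART InSight widgets are actually built.

```python
def facet_to_series(response, field, top_n=None):
    """Split a flat Solr facet list into X labels and Y counts for a chart."""
    flat = response["facet_counts"]["facet_fields"][field]
    # The flat layout alternates value, count, value, count, ...
    pairs = list(zip(flat[0::2], flat[1::2]))
    if top_n is not None:
        pairs = pairs[:top_n]
    labels = [label for label, _ in pairs]
    counts = [count for _, count in pairs]
    return labels, counts


# Invented sample in the shape of a Solr facet response.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "component": ["brakes", 12, "airbag", 7, "steering", 3],
        }
    }
}

x_axis, y_axis = facet_to_series(sample_response, "component", top_n=2)
```

The two lists can be handed to any charting widget. Letting the user choose the facet field amounts to letting the user choose the X axis.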
I am on the fence about the merging of retrieval within other applications. What’s your take on the “new” method that some people describe as “search enabled applications”? Autonomy and Endeca each have work flow components as part of their search platforms. What’s Uchida Spectrum’s capability in workflow or similar enterprise embedding of search?
SMART InSight is both “search enabled” and “database enabled”. I wonder if any vendor uses the term “database enabled application”. The point of the “search enabled” jargon is that search is a relatively newer technology than databases. As technology becomes embedded into our lives it is no longer noticed.
Search is much more than a search box and a set of results. I think some of the work being done here by Autonomy and Endeca is commendable. The question is whether they can deliver value at a reasonable price point and thus cater to more customer segments. In this context, we are using the Lucene/Solr open technology as the foundation because we are able to deliver high return on investment with a flexible and scalable solution.
We believe this will expand the market for search and thus, hopefully, make the phrase “search enabled application” redundant.
I see you will be speaking at the forthcoming Lucene Revolution conference. What are the key trends you expect to see materialize there?
One of the key debates is search versus database. Lucene Revolution will inform this debate by showcasing how more and more large firms are choosing search. This impacts the perception of search as an enterprise ready technology. As a snowball effect, I see search augmenting databases in many applications. Companies will then need to build search expertise much the same way they have database architects and developers. I believe Lucid Imagination will play a central role in making this happen.
Lucene Revolution brings higher cohesiveness to the Lucene/Solr movement and makes visible its size. Its disruptive innovation and open source model poses a strong challenge for the established commercial vendors. The mainstreaming of the interest in Lucene/Solr means these players need to fashion a cogent strategy response. Might this trigger realignment within the search industry – mergers, diversification or focus on niche markets?
Our priority is expanding the search story in relatively under-penetrated markets like China & India. The large IT pool especially in India offers an opportunity to expand the Lucene/Solr movement. Today these engineers have developed the habit of only using databases in their solution architecture – and as the adage goes “if you have a hammer, everything looks like a nail”. We need to train them on search so it is a default part of their solution toolkit. This becomes imperative as China and India will be at the center of the Internet due to the size of their fast growing online populations and rising income levels.
ArnoldIT Comment
If you are seeking a resource to assist your organization in moving from Fast Search’s ageing technology to the Lucene/Solr platform, you will want to speak with Uchida Spectrum. You can get more information about Uchida Spectrum at the Lucene Revolution Conference and from the firm’s Web site at http://www.spectrum.co.jp/.
Stephen E Arnold, April 28, 2011
Interview courtesy of Lucid Imagination
Agarwalla of Uchida Spectrum Discusses Open Source Search
April 28, 2011
If you are tracking the maturation of open source search, you will want to read the exclusive interview with Rahul Agarwalla of Uchida Spectrum. Uchida Spectrum provides products and services in enterprise search and content processing. Mr. Agarwalla has built a number of successful Internet businesses. In addition, he is an expert in Fast ESP and other search systems, including Lucene/Solr.
He said:
Uchida Spectrum is one of the leaders in the Japan search market. It all started in 2002, when we saw opportunity at the intersection of software and information. That was the inspiration to launch the search business. Our product, SMART InSight, is a search application that integrates information from across the enterprise in easy-to-navigate cross department information chains, and adds visual summaries that add value through contextual metadata and analytics.
On the subject of his product SMART InSight he said:
Our product, SMART InSight, uses search to integrate and retrieve information — so scalability and reliability, at reasonable cost, are critical factors. Lucene/Solr has delivered this in spades. The amount of data we can index on a server and the ability to scale in a linear fashion are unmatched. For instance, in one project we found a 10x improvement due to lower cost of ownership combined with higher performance.
You can get more information about Uchida Spectrum at www.spectrum.co.jp. The full text of the interview is available at http://wp.me/pf6p2-4vg. Mr. Agarwalla will be speaking at the Lucene Revolution in San Francisco at the end of May 2011.
Stephen E Arnold, April 28, 2011
Freebie
Deloitte on Top Tech Trends: Where Is Search?
April 27, 2011
Editor’s Note: This is an article written by Iain Fletcher, vice president of Search Technologies. We found his comments about a recent study authored by a top notch team at one of the world’s leading consulting firms interesting and thought provoking.
My colleagues and I were in a client meeting and had a break. One of the documents available to us was Deloitte’s report “Technology Trends 2011. The Natural Convergence of Business and IT.” The report looked interesting and we were able to download a copy of this report from the Deloitte Web site without a fee and without registering.
We found this passage particularly interesting:
… important developments are underway this year, adding compelling new dimensions to the decision process. We recommend taking a fresh look at each (Re)Emerging Enabler to see how it can apply to you in the near term, and whether new investments make sense. Disruptive Deployments require a more creative lens.
We thought the Deloitte approach of identifying enablers such as visualization and security was useful. The report then put the future in perspective by describing disruptive technologies. Among these were analytics, social computing, and mobile solutions. What struck us as interesting was the peppering of “search” throughout the book. There was no pivot point for findability. In our work, we have learned that there is an urgent need to process structured and unstructured information, making it easy for employees to locate needed information in an efficient way, and coping with the problems of “big data”.
I spoke with my colleagues at Search Technologies, which is one of the largest independent search application implementation companies. We agree with most of the Deloitte trends. My take away from our discussion was that unstructured data quality was a key issue for both search across an enterprise and for the identified emerging trend of information visualization. Visualization is an increasingly important part of business intelligence and relies on the quality of the input data. Poor data in means ill-informed decisions out, whether via search or any other means.
First, in today’s financial climate, organizations need to reduce costs. In our experience, having employees hunt for information is expensive and inefficient. Cost control is important. As important is the need to improve the efficiency of information retrieval. With search and content processing embedded in work flows, we see search and content processing as a foundation, not an add on or a spice in a consulting engagement.
Second, the merger of business processes and information access extends to the integration of different software systems. There are many buzzwords in use to describe what most senior managers intuitively know; namely, it is easier to make sense of disparate data if the information is presented in a context. Visualization, as Deloitte noted, is an enabler. However, the plumbing and the configuration of the output systems are as important as the attractive graphics. Third, young university graduates do not understand why “silos” of information force them to use multiple enterprise systems and findability solutions. Deloitte did not emphasize the generational divide that we find in some of our engagements. As today’s recent college graduates move upwards and outwards in their careers, their impact will be significant.
For more information about our firm’s approach to technical, engineering, and business consulting, visit www.searchtechnologies.com
Iain Fletcher, April 27, 2011
Search Technologies
Oracle Text Installation Help
April 25, 2011
We ran a query for IBM OmniFind and Oracle Text on Google and found that Beyond Search is one of the sources for information about how to configure these systems and obtain documentation about specific methods. We were hoping that IBM and Oracle would occupy the top spot, but if Harrod’s Creek is the go to place, we are okay with that.
As we have said, Oracle Text is a handy tool for building text queries and document classification applications. For some reason, directions for manually installing Oracle Text are somewhat elusive. That being said, Beyond Search has some good news.
So, if you’ve ever wondered how to “Install Oracle Text on Oracle Database 11gR2”, this blog post is for you. The link contains the code snippets required to install the text, install the language and verify. Appropriate explanations are included.
We’d say it is definitely worth tucking into your Oracle Text tips folder for safekeeping.
Sarah Rogers, April 25, 2011
Freebie
Is IBM Reshaping Its Approach to Enterprise Search?
April 25, 2011
IBM is a mysterious and baffling outfit to me. One day I get a call from eager IBMers panting to find out what I know about the vendors in enterprise search, content processing, and semantics. Then weeks, maybe months go by, before an IBM person emails me a message like “We’ve been really busy” or “We don’t have a very big budget but maybe you could talk for free”. The classic IBM input I had this year is from a person who agreed to participate in a Search Wizards Speak interview via email. Months after the deadline, I was told an excuse similar to those I heard when I was a freshman in college and a classmate was explaining that his mother and dog died on the same day.
A better search or a more complex guitar? Source: http://www.heirloomradio.com/history.htm
Imagine my surprise when I received a link to a story from Yomiuri Online. “Natural Language Analysis Software, IBM Japan” contained what may be a compass reading about IBM’s enterprise search strategy. In a nutshell, IBM may be hooking together a content analytics component with the Lucene based OmniFind Enterprise Edition 9.1. Instead of offering what I can download from Apache or Lucid Imagination, IBM has grafted on text analytics.
The product becomes available on April 26, 2011, in Japan. IBM Content Analytics with Enterprise Search mashes up text mining software and information retrieval software. For good measure, IBM includes natural language analysis technology.
The other shocker, if the person translating the article was accurate, is that IBM will compete aggressively on price. I am not sure how IBM prices its products in Japan, but the software could, for all practical purposes, be free. IBM makes its money on hardware and services, with services becoming increasingly important in my opinion.
The product will handle social content, the unstructured data that plagues customer service operations, and email, among other source and file types. The system classifies content and outputs analytics, which may mean anything from a simple frequency count to a more elaborate SPSS type of function. If prices are indeed low, my hunch is that the SPSS type horsepower will not be present in full royal wedding regalia.
Some questions:
- Will this approach make IBM a bigger contender in enterprise search? No. IBM may be trying to carve a new niche for itself but Autonomy and Exalead are already there.
- Will this play explain the role of Watson or what IBM is doing with the dozens of analytics companies it has acquired? No.
- Is this a new trend in enterprise search? No.
- Will IBM continue to make sales to organizations who want to “go IBM”? Yep.
Vendors have been trying to distance themselves from the word “search” for years. In a sense, IBM is just late to the party. But with its financial resources and clout, tardiness may not matter.
Stephen E Arnold, April 25, 2011
Freebie unlike IBM professional services or a technical roll for a FRU.
Protected: SharePoint Content and Editing Tips
April 25, 2011
PolySpot Names David Fischer Head of Research and Development
April 24, 2011
We learned last week that PolySpot, a vendor of search and content processing systems, named David Fischer to the post of director of research and development. As the firm’s chief technical officer, he will be responsible for the definition and implementation of technology policy solutions. Prior to joining PolySpot, Mr. Fischer worked at Apple and Business Objects. He is a graduate of ENST.
According to the company’s official announcement:
PolySpot is at a crucial stage in its development and is working to bring fundamental changes to its platform in order to continue to be a player in the search market and access to information ahead of its time and its competitors.
For more information about PolySpot, navigate to www.polyspot.com.
Stephen E Arnold, April 24, 2011
Freebie
Autonomy Financials via a Mid Tier Consultant
April 23, 2011
In my email this morning was a short item that pointed me to Autonomy’s first quarter 2011 financial results. I took a quick look at the top line revenues, multiplied by four and concluded:
- Autonomy has a better than even chance of breaking $1 billion in revenue before the end of its current fiscal year
- While Autonomy was growing and rolling out new products and services, including an interesting medical and health product, other vendors of search were floundering (Google), giving away search as part of bundles and other deals (Microsoft, Oracle), struggling to be findable by potential customers (Thunderstone, a search vendor whose name is now used by a band and a game), or repositioning themselves to be something other than a vendor of enterprise search (Brainware for scanning, Coveo for customer support).
- Autonomy was reporting growth in its various lines of business at a decent rate; 28 percent organic growth if I read the report correctly.
The story was ignored by most of the financial wizards who monitor search for the bottom tier and mid tier consulting firms. I read one “analysis” from an outfit called Gerson Lehrman Group which was written by a single individual but presented with a royal “we”. What struck me was that individuals seem happy pontificating about search, financials, and a darned complex technology using sentences that remind me of the rhetoric for the royal wedding. Wedding coverage has more substance than analyses of enterprise search I think.
In my new landscape of search study for Pandia.com, I analyze Autonomy, finding enough bone and gristle to fill 13 pages with technical goodies, comments, and critical evaluation of a company that blew past Convera, Delphes, Endeca, Entopia, Fast Search & Transfer, Powerset, Radar Networks, and a bunch of others.
If you want a free run down on what Autonomy has been doing in the last two years, just do the query “Autonomy” in the search box on the splash page of this blog or click this link. We changed our search results display to make it easier for users to get a sense of search vendor activities. For the more timely information, click this link for my free Overflight “what’s happening” report.
Stephen E Arnold, April 23, 2011
Freebie unlike low and mid tier consulting services
Enterprise Search Reaches Out to Video
April 22, 2011
Probably many of us are familiar with video in the workplace, but with limited applications like training. The next step is finding more ways to make video work for us, as stated in “Searching for Value: Overcoming the Challenges of Video in the Enterprise”.
The referenced article focuses on the obstacles associated with implementing video as a vehicle for knowledge sharing. Namely, the amount of bandwidth required to process videos can become a nightmare for ill-prepared companies and in turn disrupt other services. Even more importantly:
“A few reasons why video poses challenges go beyond bandwidth, but confront issues associated with ownership, archival and business value. Companies serious about video need to consider a few necessary additions to their search infrastructure…”
The author recommends these additions include enterprise search technology, digital asset management and hosted video solutions.
There are a number of outfits who have already been successfully solving these problems. Exalead’s Voxalead and Autonomy’s Virage systems can both process video, making it searchable and providing an expansive toolset to the user. Even Cisco recently announced including video search capabilities in its TelePresence package. So no need to reinvent the wheel on this one; jumping on the latest corporate trend can be easier than ever before. Or should I say more robust?
Sarah Rogers, April 22, 2011
Freebie
Protected: Microsoft SharePoint a Swiss Army Knife? Almost.
April 22, 2011


