An Interview with Seth Grimes
At the April 2010 Boston Search Engine Meeting, I engaged Seth Grimes in a brief conversation that was the trigger for this interview. Mr. Grimes is an analytics strategy consultant. He is founding chair of the Sentiment Analysis Symposium and of the Text Analytics Summit, contributing editor at Intelligent Enterprise magazine, and text analytics channel expert at the Business Intelligence Network. Seth founded Washington DC-based Alta Plana Corporation in 1997. He consults, writes, and speaks on information-systems strategy, data management and analysis systems, industry trends, and emerging analytical technologies.
The full text of our conversation about his Universal search system and the future of search and retrieval appears below.
What was the magnet that pulled you toward text processing?
My first text processing project was in '96-'97, when I worked for a Web-development firm. We stored text in an object-relational database system, Illustra, with a text "data blade" that added text types and indexes and search functions to the DBMS, which we used to drive dynamically generated Web applications for clients that included National Geographic, FedEx, and Radisson Hotels: pioneering stuff in the early days of the commercial Web. I quit that job in 1997 to go out on my own. I named my company Alta Plana, a name that (I figured) would seem familiar to potential customers and others in IT because Alta Vista was then the leading Web-search engine.
But really my heart is in data analysis, and my current involvement with text dates to 2002-3, when I first saw the feasibility of applying information-extraction technology to extend BI and data mining initiatives for knowledge discovery in text. So where in the mid-late '90s I was doing integrated data-text query-search, a few years later I saw the potential and promise of doing integrated data-text analysis. And that's what I've been doing and promoting since, as an analytics-strategy consultant, industry analyst, and writer.
What's the problem with basic search?
The problem with basic search is basic results: somewhat sense-less hit lists of documents with too-often dubious relevance. Fortunately, Web search is no longer so basic. Google, Bing, and a host of specialized engines have steadily (and often quietly) incorporated a succession of semantic indexing and query capabilities that have transformed them from search engines into, effectively, information-access tools.
Enterprise search is in some ways more advanced than Web search and in other ways more primitive. Enterprise search has to deal with security concerns, varied notions of relevance, application and portal integration -- but unlike Google and Bing and other Web-search engines (which, by the way, are used far more often than "enterprise-search" engines for enterprise purposes), for end users enterprise-search systems are revenue sinks rather than sources, which perhaps creates innovation and uptake lag.
So it's the enterprise-search business case that's a difficult problem to crack: Enterprises should spend money on tools that deliver lesser enterprise value at higher cost than Web-search engines deliver?!
Martin White and I conducted some research for our book Successful Enterprise Search Management (Galatea, 2009), and we discovered that most organizations have five or more search systems. What does this say about the present situation in getting employees the information each requires to complete a task?
I've in the past characterized search as evidence of a failure of design. If information were correctly and adequately categorized and organized and made accessible, we wouldn't need search, would we? I've retreated from that view as I've seen search evolve into information access, into technology that not only finds but also organizes results from sources the user likely-as-not didn't know about. Yet I'd call my statement still largely true when it comes to the enterprise's own data holdings: Search is necessitated by a failure of design. Do a better job organizing information as it's created or acquired, and also, by the way, stop allowing application vendors to bring in siloed search applications, and the in-organization situation will improve.
Among the technologies getting attention are systems that apply index terms and classification tags that place a document in a context or in an enriched index. The buzzwords for this type of process range from metatagging to ontological indexing. What's your view of rich indexing?
I think that rich indexing is a good thing. Yet indexes can be rigid, prejudging user (searcher) intent, and they may not accommodate in-flowing information and real-time needs, what I'd call "situational search." These are reasons to build intelligence into search query-processing rather than just into better indexing.
The Semantic Web contributed some important methods, including a focus on both content and document structure. Google, Microsoft, and Yahoo, as well as vendors focused on the enterprise, describe their systems as having "semantic" functions. What's your view of semantic technology?
No one should confuse semantic technologies in general with one of their applications, the Semantic Web. Google and the other major search engines have great semantic technologies – take, as a simple example, their ability to recognize "map Massachusetts" or "38+5" as queries and deliver appropriate answers, also Google and Bing health maps – that have nothing, zero, to do with the Semantic Web. Google and the others are already delivering on the "Web of Data" that the Semantic Web aspires to be, even while the Semantic Web has been and remains a parallel, incomplete, never-up-to-date subset of the World Wide Web and the databases accessible through it.
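The query recognition described above ("map Massachusetts" or "38+5" treated as requests rather than keyword strings) can be illustrated with a minimal Python sketch. It is an illustration only, not how Google or any other engine works: the function name classify_query, the intent labels, and the two hand-written rules are all assumptions made for this example.

```python
import ast
import operator
import re

# Hypothetical rule-based intent detection; production engines use far richer signals.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_arithmetic(node):
    """Evaluate a parsed expression that contains only numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval_arithmetic(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _eval_arithmetic(node.left)
        right = _eval_arithmetic(node.right)
        return _OPS[type(node.op)](left, right)
    raise ValueError("not simple arithmetic")


def classify_query(query):
    """Return an (intent, payload) pair for a raw query string."""
    text = query.strip()

    # Rule 1: the query looks like arithmetic, so answer it directly.
    if re.fullmatch(r"[\d\s.+\-*/()]+", text) and re.search(r"\d", text):
        try:
            return ("calculator", _eval_arithmetic(ast.parse(text, mode="eval")))
        except (SyntaxError, ValueError, ZeroDivisionError):
            pass  # fall through to the other rules

    # Rule 2: "map <place>" asks for a map, not for documents containing "map".
    match = re.fullmatch(r"map\s+(.+)", text, flags=re.IGNORECASE)
    if match:
        return ("map", match.group(1))

    # Default: ordinary keyword search.
    return ("keyword", text)


if __name__ == "__main__":
    for q in ("38+5", "map Massachusetts", "enterprise search"):
        print(q, "->", classify_query(q))
```

The point of the sketch is that the intelligence sits in query processing: the same index can serve all three queries, and only the interpretation of the query changes what the user gets back.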
As for semantic technology: I spend a lot of my time working with and around it in the form of text analytics (or text mining, if you prefer), algorithms, software, and processes that discover meaning and relationships in natural-language sources. This stuff is here-and-now, delivering business value for a broad array of applications, from customer-experience to drug discovery, in every major industry and government sector.
In the last year, I have noticed that search has been in some cases embedded in other enterprise functions. Google enhances its enterprise Apps and search is just "there". Document management, customer support, and back office enterprise systems are offering more robust search systems. What do you make of this growing interest in embedding search?
Search should be embedded where it can enhance a user's ability to carry out a business function. Embedding, in particular, allows the user's focus to remain on the business function rather than a supporting task.
Stand alone search and content processing systems can be expensive. In your experience, what can an organization do to get the functionality and control the costs for indexing?
Stand alone search and content-processing systems can also be inexpensive. Costs stay under control when the use of these systems is fully supported by requirements analysis – well, that's an IT truism – and when they're well integrated into a rational enterprise computing architecture.
Now I admit that I've just made a couple of high-concept statements. As a practical suggestion: There's great free, open source software out there, for instance, Apache Lucene/Solr for search, GATE for information extraction (supporting semantic indexing), Carrot2 for results clustering, and so on. There are lots more options.
Enhanced text processing can be computationally demanding. What technical innovations have you noticed that deliver enhanced content processing without requiring significant investments in computing infrastructure?
Cloud computing, obviously: hosted or delivered "as a service."
New vendors are entering the market on a weekly, maybe daily basis. I just wrote about FirstRain, Vic Consult, and FBLite, to name just three new players. Yet most of the 300 search and content processing companies I track seem to be struggling to get market traction. Are search and content processing technologies on a gerbil track, lots of activity but no progress?
I like FirstRain, which gathers and organizes information about corporations and employees. I've never heard of the other two you cite, which is a data point. I figure you get noticed if you have a "value proposition" (are folks still using that jargon?) and make the right people aware of it. So I might not be the right person for those latter two, or ...
I'm definitely a "Let a Hundred Flowers Bloom" guy (although not otherwise a Maoist), and I see a crowded solutions market as vibrant and full of promise.
A number of vendors have shown me very fancy interfaces. Are today's info literate online users ignoring information provenance and accuracy for ease and convenience?
The better of the new, or built-out, interfaces keep information in the forefront and put all the new options on the sidelines, available when wanted. Google does this. And those options aren't eye candy. They allow the user to explore the result set in order to get more out of it. You can be sure, by the way, that Google and the others are studying how you use them in order to improve them.
Sure, of course we often ignore information provenance. That's why we, collectively, listen when corporate pitches are delivered by sports and other entertainment figures. Do Accenture, Nike, American Express, AT&T, Gatorade, et al. offer better or lesser products because of Tiger Woods' (former) paid endorsements? We make lots of decisions based on criteria that may seem irrational or unjustified.
What are the hot trends in search for the next 12 to 24 months?
You hit the #1 trends a few questions back: Semantics and rich indexing are important. I mentioned another: Results structuring based on inferred user intent, the use of semantics to transform search into information access.
Where can people get more information?
Web searches, of course! If you want to follow the stuff I put out as a writer and analyst, follow me on Twitter at @sethgrimes (http://twitter.com/sethgrimes) and visit my on-line business card at http://sethgrimes.com.
We recommend that you learn more about Mr. Grimes and his firm’s services. He is a frequent contributor to professional publications and a popular lecturer at conferences.
Stephen E. Arnold, June 9, 2010