Interview: Forensic Logic CTO, Ronald Mayer
May 20, 2011
Introduction
Ronald Mayer has spent his career with technology start-ups in a number of fields ranging from medical devices to digital video to law enforcement software. Ron has also been involved in Open Source for decades, with code that has been incorporated into the LAME MP3 library, the PostgreSQL database, and the PostGIS geospatial extension. His most recent speaking engagement was a presentation on a broader aspect of this system to the SD Forum's Emerging Tech SIG, titled "Fighting Crime: Information Choke Points & New Software Solutions." His Lucene Revolution talk is at http://lucenerevolution.org/2011/sessions-day-2#highly-mayer.
Ronald Mayer, Forensic Logic
The Interview
When did you become interested in text and content processing?
I've been involved in crime analysis with Forensic Logic for the past eight years. It quickly became apparent that while a lot of law enforcement information is kept in structured database fields, the richer information is often in text narratives, Word documents on officers' desktops, or internal email lists. Police officers are all too familiar with the long structured search forms, built on top of relational databases, for looking things up in their systems. There are adequate text-search utilities for searching the narratives in their various systems one at a time, and separate text-search utilities for searching their mailing lists. But what they really need is something as simple as Google that works well on all the information they're interested in: both structured and unstructured content, and both their internal documents and ones from other sources. So we set out to build one.
What is it about Lucene/Solr that most interests you, particularly as it relates to some of the unique complexity law enforcement search poses?
The flexibility of Lucene and Solr is what really attracted me. There are many factors that contribute to how relevant a search result is to a law enforcement user. Obviously traditional text-search factors like keyword density and exact phrase matches matter. How long ago an incident occurred is important (a recent similar crime is more interesting than a long-ago similar crime). And location is important too. Most police officers are likely to be more interested in crimes that happen in their jurisdiction or neighboring ones. However, a state agent focused on alcoholic beverage licenses may want to search for incidents from anywhere in the state but may be most interested in ones that are at or near bars. The quality of the data makes things interesting too. Victims often have vague descriptions of offenders, and suspects lie. We try to program our system so that a search for "a tall thin teen male" will match an incident mentioning "a 6'3″ 150lb 17-year-old boy." There's been a steady emergence of information technology in law enforcement, such as New York City's CompStat.
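One plausible way to approximate the description matching mentioned above in Solr, sketched here purely for illustration (the field names, thresholds, and schema are assumptions, not Forensic Logic's actual implementation), is to derive descriptive tokens from the structured attributes at index time so that a free-text query like "tall thin teen male" lands on the same terms as a record describing a 6'3″, 150 lb, 17-year-old boy:

    # Illustrative sketch only: derive search-friendly descriptor tokens from
    # structured offender attributes at index time. Field names and thresholds
    # are assumptions.

    def descriptor_tokens(height_in, weight_lb, age, sex):
        tokens = [sex.lower()]
        if height_in >= 74:          # roughly 6'2" and up reads as "tall"
            tokens.append("tall")
        elif height_in <= 64:
            tokens.append("short")
        if weight_lb <= 150:
            tokens.append("thin")
        elif weight_lb >= 230:
            tokens.append("heavy")
        if 13 <= age <= 19:
            tokens.append("teen")
        return tokens

    def to_solr_doc(record_id, narrative, height_in, weight_lb, age, sex):
        # The derived tokens are indexed in a plain text field alongside the
        # narrative, so ordinary keyword search covers both structured and
        # unstructured data.
        return {
            "id": record_id,
            "narrative_txt": narrative,
            "descriptors_txt": " ".join(
                descriptor_tokens(height_in, weight_lb, age, sex)),
        }

    print(to_solr_doc("inc-42", "Suspect fled on foot ...", 75, 150, 17, "Male"))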
What are the major issues in this realm, from an information retrieval processing perspective?
We've had meetings with the NYPD's CompStat group, and they have inspired a number of features in our software, including powering the CompStat reports for some of our customers. One of the biggest issues in law enforcement data today is bringing together data from different sources and making sense of it. These sources could be different systems within a single agency, such as records management systems, CAD (Computer Aided Dispatch) systems, and internal agency email lists; groups of cities sharing data with each other; or federal agencies sharing data with state and local agencies.
Is this a matter of finding new information of interest in law enforcement and security? Or is it about integrating the information that’s already there? Put differently, is it about connecting the dots you already have, or finding new dots in new places?
Both. Much of the work we're doing is connecting dots between data from two different agencies, or from two different software systems within a single agency. But we're also indexing a number of non-obvious sources as well. One interesting example is a person who was recently found in our software; one of the better documents describing a gang he's potentially associated with was a Wikipedia page about one of his relatives.
You’ve contributed to Lucene/Solr. How has the community aspect of open source helped you do your job better, and how do you think it has helped other people as well?
It's a bit early to say I've contributed. While I posted my patch to the project's issue tracking Web site, last I checked it hadn't been integrated yet. A couple of users have mentioned, to me and on the mailing lists, that they are using it and would like to see it merged. The community help has been incredible. One example is when we started a project to build a minimal, simple user interface to let novice users find agency documents. We noticed the University of Virginia/Stanford/etc.'s Project Blacklight, which is a beautiful library search product built on Solr/Lucene. Our needs for one of our products weren't too different, just an internal collection of documents with a few additional facets. With that as a starting point we had a working prototype in a few man-days of work, and a product in a few months.
What are some new or different uses you would like to see evolve within search?
It would be interesting if search could be aware of which adjectives go with which nouns. For example, a phrase like
‘a tall white male with brown hair and blue eyes and
a short asian female with black hair and brown eyes’
should be a very close match to a document that says
‘blue eyed brown haired tall white male; brown eyed
black haired short asian female’
Solr's edismax "pf2" and "pf3" parameters can do quite a good job at this by considering the distance between words, but note that in the latter document the "brown eyes" clause is nearer to the male than the female, so there's some room for improvement. I'd like to see some improved spatial features as well. Right now we use a single location in a document to help sort how relevant it might be to a user (incidents close to a user's agency are often more interesting than ones halfway across the country). But some documents may be highly relevant in multiple locations, like a drug trafficking ring operating between Dallas and Oakland.
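For readers unfamiliar with those parameters: pf2 and pf3 boost documents in which adjacent pairs and triples of the query words appear close together. A minimal sketch of such a request follows; the Solr URL, core name, field names, and boost values are illustrative assumptions, not Forensic Logic's configuration.

    # Minimal sketch of a Solr edismax request using pf2/pf3 phrase boosting.
    # URL, core name, field names, and boosts are illustrative assumptions.
    import urllib.parse
    import urllib.request

    params = {
        "q": "tall white male brown hair blue eyes",
        "defType": "edismax",
        "qf": "narrative_txt descriptors_txt",  # fields matched word by word
        "pf2": "narrative_txt^5",    # boost adjacent word pairs, e.g. "brown hair"
        "pf3": "narrative_txt^10",   # boost adjacent word triples
        "wt": "json",
    }
    url = ("http://localhost:8983/solr/incidents/select?"
           + urllib.parse.urlencode(params))
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode("utf-8"))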
When someone asks you why you don’t use a commercial search solution, what do you tell them?
I tell them that, where appropriate, we also use commercial search solutions. For our analysis and reporting product, which works mostly with structured data, we use a commercial text search solution because it integrates well with the relational tables that also filter results for such reporting. The place where Solr/Lucene's flexibility really shines for us is in our product that brings structured, semi-structured, and totally unstructured data together.
What are the benefits to a commercial organization or a government agency when working with your firm? How does an engagement for Forensic Logic move through its life cycle?
Our software powers the Law Enforcement Analysis Portal (LEAP) project, which is a software-as-a-service platform for law enforcement tools, not unlike what Salesforce.com is for sales software. The project started in Texas and has recently expanded to include agencies from other states and the federal government. Rather than engaging us directly, a government agency engages with the LEAP Advisory Board, which is a group of chiefs of police, sheriffs, and state and federal law enforcement officials. We provide some of the domain-specific software, while other partners such as Sungard manage some operations and other software and hardware vendors provide their support. The benefits to government agencies working with us are similar to the benefits of an enterprise working with Salesforce.com: leading-edge tools without having to buy expensive equipment and software and manage it internally.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content and the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume (scaling) challenge? What is the latency for index updates? Can law enforcement and public security agencies use this technology to deal with updates from high-throughput sources like Twitter? Or is the signal-to-noise ratio too weak to make it worth the effort?
In most cases, when a record is updated in an agency's records management system, the change is pushed to our system within a few minutes. For some agencies, mostly those with older mainframe-based systems, the integration is a nightly batch job. We don't yet handle high-throughput sources like Twitter. License plate readers on freeways are probably the highest-throughput data source we're integrating today. But we strongly believe it is worth the effort to handle high-throughput sources like Twitter, and that it's our software's job to deal with the signal-to-noise challenges you mentioned and to present more signal than noise to the end user.
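As a rough illustration of that push-style integration (a sketch under assumed names, not the actual LEAP pipeline), a connector can post a changed record to Solr with a commitWithin window so it becomes searchable within minutes:

    # Rough sketch: push an updated records-management entry to Solr and ask
    # for it to become searchable within five minutes (commitWithin is in
    # milliseconds). URL, core name, and field names are assumptions.
    import json
    import urllib.request

    doc = {
        "id": "agency123-incident-9876",
        "narrative_txt": "Updated narrative after a supplemental report ...",
        "last_modified_dt": "2011-05-20T14:30:00Z",
    }
    url = "http://localhost:8983/solr/incidents/update?commitWithin=300000"
    req = urllib.request.Request(
        url,
        data=json.dumps([doc]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))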
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem for those in stressful operational situations. What's your firm's approach to presenting "outputs" for end user reuse or for mobile access? Is there native support in Lucid Imagination for results formats?
Visualization is very important to law enforcement, with crime mapping and reporting being very common needs. We have a number of visualization tools built into our software, like interactive crime maps, heat maps, charts, time lines, and link diagrams, and we also expose XML Web services to let our customers integrate their own visualization tools. Some of our products were designed with mobile access in mind. Others have such complex user interfaces that you really want a keyboard.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are "cut off" from access to more robust systems. How do you see the computing world over the next 12 to 18 months?
I think the move to mobile devices is *especially* true in law enforcement. For decades most officers have "searched" their systems by using the radio they carry to verbally ask for information about people and property. It's a natural transition for them to do this on a phone or iPad instead. Similarly, their data entry is often done first on paper in the field and then re-entered into computers. One agency we work with will be getting iPads for each of their officers to replace both of those. We agree that serious computing infrastructures are needed, but our customers don't want to manage those themselves. Better if a SaaS vendor manages a robust system, and what better devices than iPads and phones to access it. That said, for some kinds of analysis a powerful workstation is useful, so good SaaS vendors will provide Web services so customers can pull whatever data they need into their other applications.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business? How will your company respond?
Entity extraction from text documents is improving all the time, so soon we'll be able to distinguish whether a paragraph mentioning "Tom Green" is talking about a person or the county in Texas. For certain types of data we integrate, XML standards for information sharing such as the National Information Exchange Model are finally gaining momentum. As more software vendors support it, it will become easier to inter-operate with other systems. Rich-media processing, like facial recognition, license plate reading, and OCR, is making new media types searchable and analyzable as well.
I note that you’re speaking at the Lucene Revolution conference. What effect is open source search having in your space? I note that the term ‘open source intelligence’ doesn’t really overlap with ‘open source software’. What do you think the public sector can learn from the world of open source search applications, and vice versa?
Many of the better tools are open source tools. In addition to Lucene/Solr, I'd note that the PostGIS extension to the PostgreSQL database is ahead of the commercial geospatial implementations in some ways. That said, there are excellent commercial tools too. We're not fanatic either way. Open Source Intelligence is important as well, and we're working with universities to bring some of the research they collect on organized crime and gangs into our system. Regarding learning experiences? I think the big lesson is that easy collaboration is a very powerful tool, whether it's sharing source code or sharing documents and data.
Lucene/Solr seems to have matured significantly in recent years, achieving a following large and sophisticated enough to merit a national conference dedicated to the open source projects, Lucene Revolution. What advice do you have for people who are interested in adopting open source search, but don’t know where to begin?
If they're interested, one of the easiest ways to begin is to just try it. On Linux you can probably install it with your OS's standard package manager, with a command like "apt-get install solr-jetty" or similar. If they have a particular need in mind, they might want to check whether someone has already built a Lucene/Solr-powered application similar to what they need. For example, we wanted a searchable index for a set of publications and documents, and Project Blacklight gave us a huge head start.
David Fishman, May 20, 2011
Post sponsored by Lucid Imagination. Posted by Stephen E Arnold
SharePoint: In a Tuxedo and Ready for the Big Time
May 20, 2011
We at Search Technologies read the short news article called “SharePoint Naked” in Beyond Search on May 18, 2011. We found the write up somewhat amusing, but we also think that the comments about SharePoint as a development platform were at odds with our experience.
First, please, point your browser to the MSDN Developer Team Blog and the story “SharePoint 2010 Development Platform Stack.” The diagram presents the major building blocks of the SharePoint system.
This type of diagram presents what my college psychology professor called the gestalt. These types of broad views serve the same purpose as a city map. One has to know where the major features are, what roadways lead into and out of the city, and a range of other high level information.
The Microsoft blog diagram serves this function for a professional working with SharePoint. In fact, I doubt that a busy financial officer would look at this road map. Financial people monitor other types of information. The CFO works in one city and the SharePoint developer in another.
Both use maps, just different ones. Second, we think this diagram is extremely useful. It identifies the relationship among key components of the SharePoint development stack.
I found the inclusion of Windows Server 2008 and SharePoint Server 2010 as "book ends" insightful. Between these digital bookends, the focus on SharePoint Foundation 2010 was useful, clear, and complete. Third, the number of components in an enterprise system does not automatically mean increased costs.
Microsoft is doing an outstanding job of providing “snap in” components, tools, and documentation. In our experience, Search Technologies’ engineers can move from concept to operational status in a short span of time.
The foregoing does not mean that SharePoint is easier or harder than any other enterprise software. SharePoint is a robust system, which when appropriately configured and provisioned, can deliver outstanding return on investment and an excellent user experience.
Encouragingly for us, we're finding that SharePoint adoptees, especially the big ones, get the importance of great search functionality as a foundation of productivity across the application spectrum. Encouragingly for Microsoft, which paid $1.2 billion for a Norwegian search company a couple of years ago, Fast Search for SharePoint fits the bill very nicely. We currently have a dozen organizations using our Fast Search for SharePoint proof of concept service.
Iain Fletcher, May 20, 2011
Search Technologies
More from IBM Watson: More PR That Is
May 19, 2011
IBM keeps flogging Watson, which seems to be Lucene wrapped with IBM goodness. We have reported on the apparent shift in search strategy at IBM; to wit, search now embraces content analytics. Many vendors are trying to spit shine worn toe cap oxfords in an effort to make search into a money machine. Good luck with that.
Network World tells us that “Watson Teaches ‘Big Analytics.’” Ah, more Watson hyperbole.
Skillful big analytics is necessary to make use of big data, of course, and in most cases speed is also a factor. Watson demonstrated proficiency at both with its Jeopardy win. Now, IBM hopes to use those abilities in enterprise products. As well they should; the need for such tools is expanding rapidly.
“Businesses successfully utilizing big analytics can take this process of knowledge discovery even further, identifying questions, exploring the answers and asking new questions based on those answers. This iterative quality of data analysis, rather than incremental exploration, can lead to a deeper understanding of business and markets, and begin to answer questions never before considered.”
Yep, we think we get it: Big data and a robust big analytic product are increasingly necessary to stay competitive. What we want to know, though, is this: when is all this going to change Web or Internet search? When will the Watson product be “a product”? Enough PR. That’s easy. How about a useful service we can test and compare to other systems?
Cynthia Murrell, May 19, 2011
Freebie
The SharePoint Skeleton Exposed
May 19, 2011
Short honk: I absolutely love diagrams that explain SharePoint. First, end users do not want to look at this diagram. Second, chief financial officers must be distracted so that knowledge of this diagram does not reach their eyes. Consultants, certified SharePoint professionals, and assorted SharePoint experts: you folks can wallow in this diagram all day long.
Here’s the “SharePoint 2010 Development Platform Stack.”
Elegant, clear, and inter-dependencies galore. Now what happens when you toss in Fast Search, its hundreds of configuration settings, and the bits and pieces needed to make Fast Search the lean, mean retrieval machine of your dreams? Well, you get to spend lots of time, brain cycles, and money to get everything humming right along.
Stephen E Arnold, May 18, 2011
Freebie unlike faux SharePoint expertise
Search: An Information Retrieval Fukushima?
May 18, 2011
Information about the scale of the horrific nuclear disaster in Japan at the Fukushima Daiichi nuclear complex is now becoming more widely known.
Expertise and Smoothing
My interest in the event is the engineering of a necklace of old-style reactors and the problems the LOCA (loss of coolant accident) triggered. The nagging thought I had was that today's nuclear engineers understood the issues with the reactor design, the placement of the spent fuel pool, and the risks posed by an earthquake. After my years in the nuclear industry, I am quite confident that engineers articulated these issues. However, technical information gets "smoothed" and simplified. The complexities of nuclear power generation are well known, at least in engineering schools. Nuclear engineers are often viewed as odd ducks by the civil engineers and mechanical engineers. A nuclear engineer has to do the regular engineering work of calculating loads and looking up data in hefty tomes. But the nukes also need grounding in chemistry, physics, and math, lots of math. Then the engineer who wants to become a certified, professional nuclear engineer has some other hoops to jump through. I won't bore you with the details, but the end result of the process produces people who can explain a particular process and its impacts clearly.
Does your search experience emit signs of troubles within?
The problem is that art history majors, journalists, failed Web masters, and even Harvard and Wharton MBAs get bored quickly. The details of a particular nuclear process make zero sense to someone more comfortable commenting on the color of Mona Lisa's gown. So "smoothing" takes place. The ridges and outcrops of scientific and statistical knowledge get simplified. Once a complex situation has been smoothed, the need for hard expertise is diminished. With these simplifications, the liberal arts crowd can "reason" about risks, costs, upsides, and downsides.
A nuclear fallout map. The effect of a search meltdown extends far beyond the boundaries of a single user's actions. Flawed search and retrieval has major consequences, many of which cannot be predicted with high confidence.
Everything works in an acceptable or okay manner until there is a LOCA or some other problem like a stuck valve or a crack in a pipe in a radioactive area of the reactor. Quickly the complexities, risks, and costs of the “smoothed problem” reveal the fissures and crags of reality.
Web search and enterprise search are now experiencing what I call a Fukushima event. After years of contentment with finding information, suddenly the dashboards are blinking yellow and red. Users are unable to find the information needed to do their jobs, or to do something as basic as locate a colleague's telephone number or office location. I have separated Web search and enterprise search in my professional work.
I want to depart for a moment and consider the two "species" of search as a single process before the ideas slip away from me. I know that Web search processes publicly accessible content, has the luxury of ignoring servers with high latency, and filters content to create an index that meets the vendors' needs, not the users' needs. I know that enterprise search must handle diverse content types, must cope with security and access controls, and must perform more functions than one of those two-inch-wide Swiss Army knives on sale at the airport in Geneva. I understand. My concern is broader in this write up. Please, bear with me.
New Landscape of Enterprise Search Details Available
May 18, 2011
Stephen E Arnold’s new report about enterprise search will be shipping in two weeks. The New Landscape of Enterprise Search: A Critical Review of the Market and Search Systems provides a fresh perspective on a fascinating enterprise application.
The centerpiece of the report is a set of new analyses of six vendors' search and retrieval systems.
Unlike the "pay to play" analyses from industry consultants and self-appointed "experts," Mr. Arnold's approach is based on his work developing search systems and researching search systems to support inquiries into systems' performance and features.
The report focuses on the broad changes which have roiled the enterprise search and content processing market. Unlike his first "encyclopedia" of search systems and his study of value-added indexing systems, this new report takes an unvarnished look at the business and financial factors that make enterprise search a challenge. Mr. Arnold then uses a historical base to analyze the upsides and downsides of six vendors' search solutions. He puts each firm's particular technical characteristics in sharp relief. A reader gains a richer understanding of what makes a particular vendor's system best suited for specific information access applications.
Other features of the report include:
- Diagrams of system architecture and screen shots of exemplary implementations
- Lists of resellers and partners of the profiled vendors
- A comprehensive glossary which attempts to cut through the jargon and marketing baloney which impedes communication about search and retrieval
- A ready-reference table for more than 20 vendors’ enterprise search solutions
- An “outlook” section which offers candid observations about the attrition and financial health of the hundreds of companies offering search solutions.
More information about the report is available at http://goo.gl/0vSql. You may reserve your copy by writing seaky2000 @ yahoo dot com. Full ordering information and pricing will be available in the near future.
Donald C Anderson, May 18, 2011
Post paid for by Stephen E Arnold
RedMonk and Open Source
May 18, 2011
If you have worked with traditional consulting firms, you know that "open" is not part of the standard method. At RedMonk, open is a pivot point. The company provides a range of services to organizations worldwide that need intelligence about open source software. RedMonk has emerged as one of the leaders in the open source community, providing traditional advisory services as well as specialized capabilities tailored to the fast-growing open source sector.
You can read an exclusive interview with Stephen O'Grady, one of the founders of RedMonk. In a wide-ranging discussion with Stephen E Arnold, publisher of Beyond Search, Mr. O'Grady talks about open source technology and its impact on traditional commercial, proprietary software.
In response to a question about the business implications of open source software, he said:
As with every other market with credible open source alternatives, the commercial landscape of search has unquestionably been impacted. Contrary to some of the more aggressive or doom crying assertions, open source does not preclude success for closed source products. It does, however, force vendors of proprietary solutions to compete more effectively. We talk about open source being like a personal trainer for commercial vendors in that respect; they can’t get lazy or complacent with open source alternatives readily available.
He continued:
Besides pushing commercial vendors to improve their technology, open source generally makes pricing more competitive, and search is no exception here. Closed source alternatives remain successful, but even if an organization does not want to use open source, search customers would be foolish not to use the proverbial Amdahl mug as leverage in negotiations.
You can read the complete interview with Mr. O’Grady at http://wp.me/pf6p2-4A2. He will be participating in the Lucene Revolution Conference as well.
Don C. Anderson, May 18, 2011
Freebie
Exclusive Interview: Stephen O’Grady, RedMonk
May 18, 2011
Introduction
The open source movement is expanding, and it is increasingly difficult for commercial software vendors to ignore. Some large firms have embraced open source. If you license IBM OmniFind with Content Analytics, you get open source plus proprietary software. Oracle has opted for a different path, electing to acquire high-profile open source solutions such as MySQL and buying companies with a heritage of open source. Sun Microsystems is now part of Oracle, and Oracle has become an organization of influence with regard to Java. Google is open source, or at least Google asserts that it is. Other firms have built engineering and consulting services around open source. A good example is Lucid Imagination, a firm that provides one-click downloads of Lucene/Solr along with value-add software and consulting for open source search. The company also operates a successful conference series and has developed specialized systems and methods to handle scaling, big data, and other common search challenges.
I wanted to get a different view of the open source movement in general and to probe the narrower business applications of open source technology. Fortunately I was able to talk with Stephen O'Grady, the co-founder and Principal Analyst of RedMonk, a boutique industry analyst firm focused on developers. Founded in 2002, RedMonk provides strategic advisory services to some of the most successful technology firms in the world. Stephen's focus is on infrastructure software such as programming languages, operating systems, and databases, with a special focus on open source and big data. Before setting up RedMonk, Stephen worked as an analyst at Illuminata. Prior to joining Illuminata, Stephen served in various senior capacities with large systems integration firms like Keane and consultancies like Blue Hammock. Regularly cited in publications such as the New York Times, NPR, the Boston Globe, and the Wall Street Journal, and a popular speaker and moderator on the conference circuit, Stephen is well respected throughout the industry for his advice and opinion.
The full text of my interview with him on May 16, 2011 appears below.
The Interview
Thanks for making time to speak with me.
No problem.
Let me ask a basic question. What’s a RedMonk?
That’s my favorite question. We are a different type of consultancy. We like to say we are “not your parents’ industry analyst firm.” We set up RedMonk in 2002.
Right. As I recall, you take a view of industry analysts and mid-tier consulting firms similar to mine.
Yes, pretty similar. We suggest that the industry analysis business has become a "protection racket… undoubtedly a profitable business arrangement, but ultimately neither sustainable nor ethical." In fact, we make our content open and accessible in most cases. We work under yearly retained subscriptions with clients.
Over the last nine years we have been able to serve everyone from big household names to a large number of startups. We deliver consulting hours, press services, and a variety of other value adds.
Quite a few firms say that. What’s your key difference?
We are practical.
First, RedMonk is focused on developers, whom we consider to be the new "kingmakers" in technology. If you think about it, most of the adoption we've seen in the last ten years has been bottom up. Our core thesis is that technology adoption is increasingly a bottom-up proposition, as demonstrated by Linux, Apache, MySQL, PHP, Firefox, and Eclipse. Each is successful because these solutions have been built from the ground floor, often in grassroots fashion.
Second, we are "practitioner-focused" rather than "buyer-focused".
Third, we are squarely in the big data space. The database market was considered saturated, but it has exploded with new tools and projects. A majority of these are open source, and thus developer-friendly. We are right at the epicenter of that shift.
Do you do commissioned research?
No, we don’t do commissioned research of any kind. We just don’t see it as high value, even if the research is valid.
How has the commercial landscape of search specifically, and data infrastructure generally, been impacted – for better or for worse – by open source?
As with every other market with credible open source alternatives, the commercial landscape of search has unquestionably been impacted. Contrary to some of the more aggressive or doom crying assertions, open source does not preclude success for closed source products. It does, however, force vendors of proprietary solutions to compete more effectively. We talk about open source being like a personal trainer for commercial vendors in that respect; they can’t get lazy or complacent with open source alternatives readily available.
Isn’t there an impact on pricing?
Great point.
Besides pushing commercial vendors to improve their technology, open source generally makes pricing more competitive, and search is no exception here. Closed source alternatives remain successful, but even if an organization does not want to use open source, search customers would be foolish not to use the proverbial Amdahl mug as leverage in negotiations.
When the software is available for free, what are customers paying for?
Revenue models around open source businesses vary, but the most common is service and support. The software, in other words, is free, and what customers pay for is help with installation and integration, or the ability to pick up the phone when something breaks.
A customer may also be paying for updates, whereby vendors backport fixes or patches to older software versions. Broadly then, the majority of commercial open source users are paying for peace of mind. Customers want the same assurances they get from traditional commercial software vendors. They want to know that there will be someone to help when bugs inevitably appear: open source vendors provide that level of support and assurance.
What’s the payoff to the open source user?
That’s my second favorite question.
The advantages to this model from the customer perspective are multiple, but perhaps the most important is what Simon Phipps once observed: users can pay at the point of value, rather than at acquisition. Just a few years ago, if you had a project to complete, you'd invite vendors in to do a bake-off. They would try to prove to you, in an hour or two of demos, that their software could do the job well enough for you to pay to get it.
This is like an end run, right?
In general, but we believe open source software inverts the typical commercial software process. You download the software for free, employ it as you see fit and determine whether it works or not. If it does, you can engage a commercial vendor for support. If it doesn’t, you’re not out the cost of a license. This shift has been transformative in how vendors interact with their customers, whether they’re selling open source software or not.
The general complexion of software infrastructure appears to be changing. Relational databases, once the only choice, are now becoming one option among many. Where does search fit in, and how do customers determine which pieces fit which needs?
The data infrastructure space is indeed exploding. In the space of eighteen months we've gone from "relational databases are the solution to every data problem" to, seemingly, a different persistence mechanism per workload.
As for how customers put the pieces together, the important thing is to work backwards from need. For example, customers that have search needs should, unsurprisingly, look at search tools like Solr. But the versatility of search makes it useful in a variety of other contexts; AT&T for example uses it for Web page composition.
What’s driving the adoption of search? Is it simply a function of data growth, as the headlines seem to imply, or is there more going on?
Certainly data growth is a major factor. Every year there’s a new chart asserting things like we’re going to produce more information in the next year than in all of recorded history, but the important part is that it’s true. We are all–every one of us–generating massive amounts of information. How do you extract, then, the proverbial needle from the haystack? Search is one of the most effective mechanisms for this.
Just as important, however, has been the recognition amongst even conservative IT shops that the database does not need to be the solution to every problem. Search, like a variety of other non-relational tools, is far more of a first class citizen today than it was just a few short years ago.
What is the most important impact effective search can have on an organization?
That’s a very tough question. I would say that one of the most important impacts search can have is that a good answer to one question will generate the next question. Whether it’s a customer searching your Web site for the latest Android handset or your internal analyst looking for last quarter’s sales figures, it’s crucial to get the right answer quickly if you ever want them to ask a second.
If your search fails and they don't ask a second question, you'll either have lost a potential customer or your analyst will be making decisions without last quarter's sales figures. Neither is a good outcome.
Looking at the market ahead, what trends do you see impacting the market in the next year or two? What should customers be aware of with respect to their data infrastructure?
There are a great many trends that will affect search, but two of the most interesting from my view will be the increasing contextual intelligence of search and the accelerating integration of search into other applications. Far from being just a dumb search engine, Solr increasingly has an awareness of what specifically it is searching and, in some cases, how to leverage and manipulate that content, whether it's JSON or numeric fields. This broadens the role that search can play, because it's no longer strictly about retrieval.
And integration?
Okay, as for integration, data centers are increasingly heterogeneous, with databases deployed alongside MapReduce implementations, key-value stores and document databases.
Search fills an important role, which is why we’re increasingly seeing it not simply pointed at a repository to index, but leveraged in conjunction with tools like Hadoop.
What kind of threat does Oracle's lawsuit against Google over Java pose to open source? How does it compare to the SCO controversy with Linux some years back?
In my view, Oracle's ongoing litigation against Google over Java-related intellectual property has profound implications not only for the participants, but also for the open source community as a whole.
The real concern is that the litigation, particularly if it is successful, could have chilling effects on Java usage and adoption. The SCO comparison is somewhat different, in that Oracle's suit targets a reimplementation of the platform in Android rather than the Java platform itself, whereas SCO was threatening Linux itself rather than a less widely adopted derivative.
While users of both Java and MySQL should be aware of the litigation, realistically the implications for them, if any, are very long term. No one is going to abandon Java-based open source projects, for example, based on the outcome of Oracle's suit.
It seems like everyone who is anyone in the software world has an open source strategy, even through to Microsoft's embrace of PHP. Should information technology executives and decision makers, who were once suspicious of open source, be suspicious of software vendors without a solid open source strategy?
With the possible exception of packaged applications, open source is a fact of life in most infrastructure software markets. Adoption is accelerating, the volume of options is growing, and – frequently – the commercial open source products are lower cost. So it is no surprise that vendors might feel threatened by open source.
But even if they choose not to sell open source software, as many do not, those without a solid open source interoperability and partnership story will be disadvantaged in a marketplace that sees open source playing crucial roles at every layer of the data center. Like it or not, that is the context in which commercial vendors are competing. Put more simply, if you’re building for a market of all closed source products, that’s not that large a market. In such cases, then, I would certainly have some hard questions for vendors who lack an open source strategy.
Where can a reader get more information about RedMonk?
Please, visit our Web site at www.redmonk.com.
ArnoldIT Comment
RedMonk’s approach to professional services is refreshing and a harbinger of change in the consulting sector. But more importantly, the information in this interview makes clear that open source solutions and open source search technology are part of the disruption that is shaking the foundation of traditional computing. Vendors without an open source strategy are likely to face both customer and price pressure. Open source is no longer a marginalized option. Companies from Twitter to Cisco Systems to Skype, now a unit of Microsoft, rely on open source technology. RedMonk is the voice of this new wave of technical opportunity.
Stephen E Arnold, May 18, 2011