An Attensity About Face?
February 3, 2010
Update, February 3, 2010, 9 pm Eastern. A person suggested that this administrative move is designed to get around one or more procurement guidelines. Sounds reasonable but if the marketing push were ringing the cash register, would such a shift be necessary?–Stephen E Arnold
I learned that the Attensity Group has set up a business unit to sell to the Federal government. I thought Attensity’s roots were in selling to the Federal government and that the company’s diversification into marketing was a way to break free of the stereotypical vendor dependent on US government projects. Guess I was wrong again.
A reader sent me a link to this January 28, 2010, news release “Attensity Government Systems Launches as a Wholly Owned US Subsidiary of Attensity Group.” I noted this passage in the news release:
AGS offers a unique combination of the world’s leading semantic technologies: Attensity Group’s full offering of semantic engines and applications along with Inxight technologies from SAP BusinessObjects. Government agencies can now leverage — for the first time – the powerful capabilities enabled by the combination of Inxight’s multi-lingual advanced entity and event extraction with that of Attensity Group’s patented Exhaustive Extraction. Exhaustive Extraction automatically identifies and transforms the facts, opinions, requests, trends and trouble spots in unstructured text into structured, actionable intelligence and then connects it back to entities – people, places and things. This new combined solution provides researchers with the deepest and broadest capabilities for identifying issues hidden in mountains of unstructured data — inside emails, letters, social media sites, passenger manifests, websites, and more.
In my experience, this is a hybrid play. Along with consulting and engineering services, Attensity will make its proprietary solutions available.
According to Attensity, AGS, short for Attensity Government Systems:
provides semantic technologies and software applications that enable government agencies to quickly find, understand, and use information trapped in unstructured text to drive critical decision-making. AGS solutions pre-integrate nouns (entities) together with verbs, combining leading semantic technologies, such as Inxight ThingFinder, with Attensity’s unique exhaustive extraction and other semantic language capabilities. This creates a unique capability to see important relationships, create link analysis charts, easily integrate with other software packages, and connect the dots in near real-time when time is of the essence. The comprehensive suite of commercial off-the-shelf applications includes intelligence analysis, social media monitoring, voice of the citizen, automated communications response and routing, and the industry’s most extensive suite of semantic extraction technologies. With installations in intelligence, defense and civilian agencies, Attensity enables organizations to better track trends, identify patterns, detect anomalies, reduce threats, and seize opportunities faster.
I did a quick check of my files on Inxight. A similar functionality may be part of the Powerset technology that Microsoft acquired. My hunch is that Attensity wants to go after government contracts with a broader offering than its own deep extraction technology. The play makes sense, but I wonder if it will confuse the ad execs who use Attensity technology for quite different purposes than some US government agencies.
Will Attensity be a front runner in this about face, or will the company build out other specialized business units? I can see a customer support unit coming from a vendor, maybe Attensity, maybe not? The bottom line is that search and content processing vendors are scrambling in order to avoid what some business school egg heads call “commoditization.”
Stephen E Arnold, February 3, 2010
No one paid me to write about vendors selling to the US government. I will report this to the US government, maybe the GAO just to show that I am intrinsically responsible.
Inside Search: Raymond Bentinck of Exalead, Part 1
February 3, 2010
Editor’s introduction: Raymond Bentinck (who now works at Exalead) and I have discussed—maybe argued about–search and content processing every month or so for several years. He has deep experience in enterprise software, including stints at Verity, IBM and Oracle.
Our chosen field of intellectual combat for this conversation was a restaurant in Florida. On January 26, 2010, he and I engaged in a discussion of the woes that one-size-fits-all search vendors now face. In Europe, some customers want a single company like SAP to provide a full service solution. But SAP has met strong financial resistance due to the costs of this type of approach. In North America, some pundits have pointed out that the explosion of vendors offering bargain basement eDiscovery and customer support versions of their search and content processing technology represents a new frontier in search. Other consultants tout the open source search solutions. Still others push appliances or search toasters. The text of our most recent discussion appears below:
Raymond, have you been keeping up with the consultants who are pointing out that search is now the equivalent of a discount store like Wal*Mart or Tesco?
I’ve kept abreast, with some amusement, of consultants who say that search is a commodity. If you think of search as being the ability to search over a company’s intranet without any security requirements and simply bring back some results with no context to the user’s query, then they could be right.
But no sensible consultant would ever describe as a commodity the ability to provide query and precise results over billions of up-to-the-minute records, including the ability to analyze the effectiveness of a company’s mission critical operations.
Right, what I call azure chip consultants.
That’s a telling phrase. I think consultants often confuse the clients. This adds to the complexity of the decision process in my opinion.
But let me jump back to this point: Exalead is delivering information solutions for our customers today. The solution uses sophisticated data and content processing methods. Exalead’s approach demonstrates just how far search has progressed in the past few years. I think that the Exalead approach delivers the business intelligence layer to perform analytics on how to improve business moving forward, increase operational efficiency, reduce costs and improve margins.
Isn’t Exalead moving beyond traditional search and retrieval?
Yes and no. If you think of search as retrieval of information, at this basic level it does not really matter whether the data are structured or unstructured. In fact, even at Verity we delivered embryonic solutions around CRM for financial services or conflict checking in legal. However, the legacy search engines, in my opinion, are not capable of delivering solutions for the mainstream because of their lack of functionality and their complexity. Exalead is a new generation of solution that has been designed from the ground up to deliver these capabilities. These sorts of mission critical business applications go under the heading of what I call Search Based Applications.
Can you give me an example of a search enabled application?
Certainly. One of Exalead’s clients, for example, replaced a traditional solution provided by Business Objects / SAP and Oracle. There were significant savings in license fees because this customer no longer needed the aging Business Objects system. Other savings resulted from trimming the number of Oracle licenses needed to run the older business intelligence system. The Exalead solution is now used by thousands more users who require no training. Exalead also slashed the latency in the system response time by a factor of 100. A query that once took 60 seconds to process and display, now processes in less than a second on a fully utilized hardware infrastructure. In addition, our solution delivered more functionality, halved the production costs, but importantly queries the up-to-the-minute data, not data that were hours or days out of date.
Are you saying that the commodity or open source solutions lack the engineering fire power of the Exalead system?
Yes. Even in traditional enterprise search type solutions, I do not see the word “commodity” used by our clients. Let me give you another example. You seem skeptical.
No, I am not skeptical. I saw a demonstration of the new Exalead system in December 2009, and I was impressed with the low latency and the way in which the system delivered answers, not a list of results.
Right. One of our recent new clients has a user base world-wide in excess of a hundred and fifty thousand and uses search over most of this global firm’s content repositories. The firm is now replacing its legacy enterprise search product, Verity K2.
Wow, Verity dates from the mid to late 1980s. I did not know that big name outfits were still using this technology. Can you give me some details?
I can tell you that this Exalead client was previously a flagship implementation for Verity for many years. This client is swapping out Autonomy / Verity for Exalead because the aging search solution was exceptionally hard to manage. In addition, the aging system was expensive to customize. The client’s engineers could not see how to utilize it to meet new and demanding information retrieval requirements moving forward. A final problem was the time required to fiddle with the Autonomy / Verity system to get it to deliver what the users needed. The long development times created staff frustration.
After several months of intense technical evaluations around the world with all the leading search vendors, they chose Exalead. I do not think that they would have undertaken this expensive and time consuming exercise if they thought that search was a commodity problem.
I saw a demonstration of Exalead’s indexing method for video. Is that in production now?
Yes. Exalead has made a demonstration available on our Labs site at http://voxaleadnews.labs.exalead.com/ .
This solution indexes radio and video news from around the world in several languages. In addition to this, we extract in real-time relevant entities from the news items such as people, organizations and locations.
We offer what I call New Media search solutions. Exalead is demonstrating with customers such as Rightmove in the UK that we are able to provide next generation information management solutions. When I say “next generation” I mean that Exalead delivers advantageous semantic capabilities and operational benefits. Even after doing this, the Exalead solution reduced costs by 80 percent.
There is a revolution going on around search which has led well informed and respected analysts such as Sue Feldman from IDC to state that: “The next generation of information work will be search based.” You know Sue don’t you?
Yes, I have worked with her and also done some work for her at IDC.
In my opinion, the consultants who still state that search is a commodity are out of touch with what is gaining traction in savvy firms. Exalead has had a record year, and our growth in the midst of the economic downturn has been stronger than in previous years.
In your opinion, why are some consultants ignoring the search-based application revolution?
I think this is one of your key points. Many of the people advising enterprises about search lack the hands-on experience to know what the pitfalls are that will create problems for some of the traditional solutions. Let’s face it. Many of the flagship systems date from the mid 1990s. Exalead is a newer code base, and it was engineered to scale, be agile, and be easy to integrate with existing enterprise systems.
Can you expand on this idea? I am not sure we are on the same page?
Sure, we recently attended a business intelligence and data warehouse conference. All the traditional business intelligence vendors were there. Putting search in BI is a very hot topic within organizations at the moment.
In reality organizations want business intelligence solutions that a professional can use with no user training. Users want to be presented with data in a way that makes sense for them. Few want to do huge amounts of design work upfront that second guesses the questions that users want to ask. Traditional BI systems are not agile. As a result, when the business changes, an ever expanding army of programmers is required to re-engineer the solution. The idea is to deploy a system in weeks or months, not months or years. BI systems have to be able to extract structured data from unstructured content in order to perform both quantitative and qualitative analysis. BI systems have to be flexible in order to meet the needs of a user. BI systems have to be able to work with ever growing volumes of data. Stale data is just not acceptable which means the systems must be able to process new data quickly.
How much BI experience have you tallied?
I have worked in business intelligence for many years. What struck me at this conference was how little the messaging of the traditional vendors has changed and, more importantly, how ill suited they are to meet the above requirements. The limitations that organizations face around business intelligence are driven not by the limitations of the company’s vision but more by the limitations of traditional technologies. In a world where it is a challenge for many organizations to meet simple requirements around query and reporting against operational data without huge investments, you know that there are major issues with traditional technologies. The ability to meet these and many more requirements is Exalead’s advantage in business intelligence.
What’s your view of this trend that a customer can buy a one size fits all or a very narrow solution from the same vendor?
A customer can buy a one size fits all solution but only if the vendor has a one size fits all product. An appliance is not a one size fits all solution. The appliance becomes a spider in the center of a Web of customized code. An open source search solution is a box of components, a bit like the old Fast Search & Transfer technology. The licensee either assembles the solution or pays a lot of money for engineers to build the solution.
Don’t some vendors let marketing promise the world and then hope the engineers can code what’s just been sold?
Absolutely.
Some vendors have solutions that were designed to be easy to deploy for simple needs but customers hit the wall when they start to expand their requirements or push the product into other areas. Other vendors have more advanced capabilities but they take a huge amount of resources to deploy and lots of difficult customization, often with limited success. These more complex solutions tend not to be widely implemented outside of the core initial requirement.
At Exalead, it is striking how usage of an Exalead-enabled solution jumps. Many traditional information systems seem to turn off large segments of the user population in an organization.
What’s the angle for Exalead?
Our platform is unique in that the same core works on a single laptop for desktop search and scales to millions of users and billions of documents on, for example, our showcase Web search site. It is used by new media companies to provide next generation search based applications, by organizations to provide internal and external search, and in ever increasing numbers by organizations to build agile solutions that retrieve mission critical data from operational databases through to business intelligence, data warehouses, and master data management.
I disagree. How can a single vendor handle the rigors of a foreign language search system with a system that lacks the technical support to deliver on what the marketing folks promise?
One of the frustrating things when I worked for some software vendors was that some prospective clients could not tell whether a capability in the product was reality or just an overblown marketing claim. Some vendors have made, and still make, some unbelievable claims around the capabilities of their products. As people’s knowledge has not been as great around search as, say, traditional databases or business intelligence solutions, these claims have too often been taken at face value by customers and some analysts.
Why should I believe Exalead?
First, you know me, and you know that I focus on demonstrable evidence of the capabilities of a system.
Second, one of the refreshing things about Exalead is that our marketing is very conservative. Our marketing team never claims something that has not either come from an actual customer’s implementation or been passed directly by our engineers as a capability that the solution can and does deliver. It seems quite obvious but this is not how many marketing departments operate in the industry which has in the past been dominated by “snake-oil” marketing.
This doesn’t of course mean that we promise to deliver less than our competitors. It simply means that we have the proven technology to match our promises.
This is the end of Part 1 of the interview with Mr. Bentinck, Exalead. Part 2 appears on February 5, 2010.
Stephen E Arnold, February 3, 2010
I wrote this post without any compensation. However, Mr. Bentinck, who lives in a far off land, offered to buy me haggis, and I refused this tasty bribe. Ah, lungs! I will report the lack of payment to the National Institutes of Health, an outfit concerned about alveoli.
Exclusive Interview: Digital Reasoning
February 2, 2010
Tim Estes, the youthful founder and chief technologist of Digital Reasoning, a search and content processing company based in Tennessee, reveals the technology that is driving the company’s growth. Mr. Estes, a graduate of the University of Virginia, tackled the problem of information overload with a fresh approach. You can learn about Digital Reasoning’s approach, which delivers a system that “deeply, conceptually searches within unstructured data, analyzes it and presents dynamic visual results with minimal human intervention. It reads everything, forgets nothing and gets smarter as you use it.”
Mr. Estes explained:
Digital Reasoning’s core product offering is called “Synthesys.” It is designed to take an enterprise from disparate data silos (both structured and unstructured), ingest and understand the data at an entity level (down to the “who, what, and wheres” that are mentioned inside of documents), make it searchable, linkable, and provide back key statistics (BI type functionality). It can work in an online/real-time type fashion given its performance capabilities. Synthesys is unique because it does a really good job at entity resolution directly from unstructured data. Having the name “Umar Farouk Abdul Mutallab” misspelled somewhere in the data is not a big deal for us – because we create concepts based on the patterns of usage in the data and that’s pretty hard to hide. It is necessarily true that a word grounds its meaning to the things in the data that are of the same pattern of usage. If it wasn’t the case no receiving agent could understand it. We’ve figured out how to reverse engineer that mental process of “grounding” a word. So you can have Abdulmutallab ten different ways and it doesn’t matter. If the evidence links in any statistically significant way – we pull it together.
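The “patterns of usage” idea can be illustrated with a toy example. What follows is a minimal sketch of generic distributional similarity, not Synthesys or Digital Reasoning’s method, which the interview does not spell out. The function names, the made-up surnames, and the sample sentences are all assumptions for illustration. The point is only that two spellings which keep turning up in the same contexts look alike statistically, even when the strings differ.

```python
from collections import Counter
from math import sqrt
from typing import Iterable


def context_vector(mention: str, sentences: Iterable[str]) -> Counter:
    """Count the words that co-occur with `mention` in the given sentences."""
    target = mention.lower()
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical data: two spellings of one surname used in the same context,
# and an unrelated surname used in a different one.
SENTENCES = [
    "jonsson boarded flight 12 in oslo",
    "johnsen boarded flight 12 in oslo",
    "miller ordered coffee in boston",
]

same_person = cosine(context_vector("jonsson", SENTENCES),
                     context_vector("johnsen", SENTENCES))
different_person = cosine(context_vector("jonsson", SENTENCES),
                          context_vector("miller", SENTENCES))
```

In this toy data the two spellings share every context word, so their similarity is far higher than the similarity between either spelling and the unrelated name. A production system would need much more than this: tokenization, weighting, and a principled threshold for what counts as statistically significant.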
You can read the full-text of this exclusive interview with Tim Estes on the ArnoldIT.com site in the Search Wizards Speak series. You can get more information about Digital Reasoning from the company’s Web site.
The Search Wizards Speak series provides the largest collection of free, detailed information about major enterprise search systems. Why pay the azure-chip consultants for sponsored listings, write ups prepared by consultants with little or no hands on experience, and services that “sell” advertorials? You hear in the developers’, founders’, and CEOs’ own words what a system does and how it solves content-related problems.
Stephen E Arnold, February 2, 2010
No one paid me to write about my own Web site. I will report this charitable act to the head of the Red Cross.
More Mainframe Woes; IBM STAIRS Where Are You?
February 1, 2010
“The Great Mainframe Shakeout”, which appeared in Ecommerce Times, had some harsh words for mainframes. For example: “IT budget planners are using the strident economic environment to force a harder look at alternatives to inflexible and hard-to-manage legacy systems, especially as enterprises seek to cut their total and long-term IT operations spending.” The write up seems to run counter to the PR that IBM has been generating about the importance of mainframes, their economic payoff, and their uptake in Namibia. The article includes a link to a podcast. Some gems from this article in my opinion were:
- An analyst recently looked me in the face and said, “People want to get off the mainframe. They understand now that the costs associated with it are just not supportable and are not necessary.”
- By not needing to touch the mainframe code or the business rules, we were able to complete this project in a period of six months, from beginning to end. The user tells us that they are saving over $1 million today in avoiding the large costs associated with mainframe software, as well as maintenance and depreciation on the mainframe environment. …
- Just within the past few months, there was a survey by AFCOM, a group that represents data-center workers. It indicated that, over the next two years, 46 percent of the mainframe users said that they’re considering replacing one or more of their mainframes.
These quotes come from different participants in the podcast, so listen to the audio before recycling their message.
Not a peep about the STAIRS and Search Master users? A platform switch means a new system will be needed. I received a New Year’s message from Exalead yesterday. That email explained that Exalead could handle industrial strength applications. Maybe Exalead can help those abandoning the mainframe for new, more economical methods? Why not walk the STAIRS to CloudView?
Stephen E Arnold, February 2, 2010
No one paid me to write about mainframes. I love mainframes. That’s why I will report not getting paid to the Pentagon, an outfit with some big iron according to the DC rumor mill.
Squiz Funnelback Releases New Version
February 1, 2010
The Australian firm Squiz Funnelback has released Version 9 of its enterprise search system. The article “Funnelback Version 9 Released – Includes New Reporting System, Pattern Detection and Numerous Feature Improvements” said:
Funnelback 9 includes new features and functionality improvements which enhance the performance and usability of its Internet and Enterprise search solutions, and enable organizations to use pattern based strategy to expose search query trends which could impact their business.
The new version includes a Reports Dashboard, a Pattern Detection and Alerting System, faster contextual navigation, improved spelling suggestions, and near duplicate detection.
You can get more information from the Funnelback Web site at http://www.funnelback.com. Squiz is an Australian slang term for have a look at something; for example, open source solutions.
Stephen E Arnold, February 1, 2010
No one paid me to write this article. I will report this fact to the Australian embassy when I am next in Washington, DC.
H5 and Its Classifier for Information Governance
February 1, 2010
I came across H5’s technology in the course of a medical information project. A while later, we encountered the firm’s technology in a government information application. My recollection is that the firm’s software works fine. I read the news item “H5 Introduces H5 EDGE Classifier for Enterprise Information Governance” because I did not understand the relationship between text metatagging and “information governance.”
Source: H5.
I learned that the acronym EDGE is short hand for “Electronic Data Governance Engine.” After reading further, I learned:
“Many organizations have invested in enterprise search, archiving, and other technologies in order to better manage information, but the return on these investments cannot be fully realized unless organizations can determine what to retain and what to discard with considerable certainty,” said Nicolas Economou, H5’s CEO. “If companies don’t have a method that assures principled, accurate, document-by-document decisions on information, they’re leaving themselves open to substantial risk and cost. Because the H5 EDGE Classifier provides a proven means to achieve accurate assessments, our clients benefit from significant and measurable reductions in data volume and in associated costs and risks.”
The firm’s PDF brochure said:
H5 EDGE Classifier is a document classification application that seamlessly integrates with and runs on top of organizations’ existing search and classification technology to more accurately cull, filter, and classify e-mail and other electronic documents to meet information governance goals.
I remain unclear about the meaning of the phrase “information governance.” If you are looking for a content processing system that classifies, take a look at H5. If you are in the hunt for an information governance engine, H5 may be your system of choice.
Stephen E Arnold, February 1, 2010
To the US government: I was not paid to admit my lack of understanding when it comes to “information governance”, which seems to be a bit of eDiscovery, a dash of records management, and a whole lot of buzzwording.
Fighting over Services: License Fees Faltering, Consulting Fees the Future?
January 30, 2010
I read the UK newspaper Independent’s “Oracle Claims Firm Stole Its Intellectual Property.” The byline says “Reuters”, which is okay with me. The point of the story is that Oracle is taking an outfit in the consulting and services business to the legal wrecking yard. The key passage in the write up in my opinion was:
Oracle Corp has filed a suit against a little known rival that provides low-cost software maintenance services, in a case similar to one that Oracle is fighting against rival SAP AG. The lawsuit, filed in US district court in Nevada on Monday, alleges that privately held Rimini Street stole copyrighted material using the online access codes of Oracle customers.
Oracle is a firm whose pricing model, when I was a wee lad, hinged on pegging software license fees to hardware. The more hardware one threw at an Oracle implementation, the more the licensee had to pay. There were options, which have increased the service choices, but the main game was license fees.
The shift that is evident in the big enterprise software world is what I watched IBM do when Microsoft figured out how to catapult a clunky PC opportunity into a $70 billion empire. IBM has been forced to become a consulting firm. Now I know that IBM sells mainframes and mounts hearty public relations campaigns to convince me that mainframes are exactly what I need to run my business. But the main event is services and consulting or what I call “soft work”.
Oracle is facing the same problem even though the cause of Oracle’s woes is deeper than a bad deal with a teenager which changed IBM decades ago. This dust up strikes me as interesting for three reasons:
First, I think this Oracle battle over companies providing “soft work” related to Oracle products and software is motivated by Oracle’s desire to get high margin business. Upstarts and interlopers are in a space of interest to Oracle.
Second, the intellectual property angle is quite important in my opinion. What constitutes know how about a complex enterprise software system? Are the scripts that people post on a forum hosted by a vendor something that a third party could use?
Third, as Oracle chops staff from Sun Microsystems, I wonder if this adds more brainpower to the third party consulting firms. For example, I think I have heard talks by Sun engineers who suggested that open source software and commodity machines were cheap and fast enough. The expertise in Sun hardware’s limitations might be ideal for some service and consulting firms to exploit. How will these bits and pieces of expertise be managed?
In a broader sense, the Oracle actions are harbingers of what other enterprise software vendors will be forced to do. With disruption of traditional enterprise software business models continuing, companies will have to ramp up their services business. Growth via acquisition only works under certain conditions. Services is a vital component of revenue growth if a firm is to survive or avoid takeover.
In the search business, I expect to see more content processing vendors chasing services as well.
Stephen E Arnold, January 30, 2010
A freebie. I shall report this sad state of affairs to the Prospect, Kentucky mayor, a person who has not been able to build a bridge for two years. A consulting firm is assisting I believe. This is why services are a big business in the post crash America.
CMS and Karma
January 27, 2010
I found “SharePoint vs Alfresco vs Nuxeo” fascinating. I was unfamiliar with the “karma” scoring method, but it was a novelty. The write up takes a look at the SharePoint Swiss Army knife of content processing functionality and compares SharePoint to two open source content management systems. SharePoint gets low marks, and the reasons are ones which resonate with me. For example, SharePoint is not a standalone system. Microsoft has engineered its Swiss Army knife to require its own digital butcher, a baker, and a candlestick maker. For example, the reviewer identifies seven or eight other Microsoft products needed to make SharePoint work its magic.
The review gives Alfresco high marks. Nuxeo comes in second. The hitch with any software for content management is that most of the systems are complicated, frustrating to users and system administrators alike, and prone to sluggish performance.
Unfortunately the reviewer does not pay much attention to search. That is understandable. A CMS is designed to make content. Finding a piece of content comes along later. With CMS systems frustrating some users, a lousy search system is not a big surprise.
I liked the write up. I don’t like CMS whether proprietary or open source. I enjoy search projects that require the goslings and me to help users locate content objects. Fascinating to watch licensees struggle with these complex systems that make it difficult and expensive to find their own information.
Fascinating. An appropriate word.
Stephen E Arnold, January 27, 2010
A freebie. No one paid me to reference this interesting review. I will report this to GSA, an outfit with deep experience with content management systems.
Most Fantastic Microsoft Fast Interview Ever
January 27, 2010
I don’t know much about Fierce Media, but I do like the word “fierce”. I read a story / interview produced by Fierce. The write up was “One on One with Jared Spataro of Microsoft.” I noted several interesting (almost unusual) points in the article. Let me highlight each, and urge you to read the interview in its entirety:
- The head of Microsoft Fast enterprise search worked for a “leading content management vendor”.
- “SharePoint 2010 will ship with a fantastic search experience.”
- “FAST Search for SharePoint customers will get a great general productivity search experience that is integrated with SharePoint out-of-the-box, but they’ll also be able to use the advanced capabilities of the platform to build and deploy sophisticated search-enabled applications.”
- Our top-tier search solutions are all built on the FAST Search core, and over time FAST will become a common foundation for all of our products
- A great enterprise search system needs to connect to everything and be accessible from everywhere. Out-of-the-box integration with SharePoint provides immediate value for many customers, but we’ve designed our products so that they can be embedded in any user experience and can index content living in any location.
That’s enough. My observations:
- CMS experience is not exactly standard preparatory work for enterprise search in my experience. Most CMS don’t work very well. Well, maybe there will be some transference?
- I love the word “fantastic”. I used it in the headline for this write up.
- I love the word “great”. I love the phrase “out of the box” for a toolkit. I love the “common foundation for all of our products.” That’s a categorical affirmative, and I find that “fantastic” and “great”, just not accurate.
- I love “top tier”. The adjectival phrase sounds so top-tier. But nothing can beat using Fast as “a common foundation for all our products.” Two categorical affirmatives for a search engine. Wow.
- I love the repetition of “great”. Very poetic. I love “everywhere.” Another categorical. I love “any location.” Another categorical.
Sounds fantastic and great. Oh, “all”, of course.
Stephen E Arnold, January 29, 2010
No one paid me to write about this article, its word choice, and its penchant for categorical affirmatives. I am not sure which US government agency has responsibility for logic. Maybe the GAO? I will report a freebie to that fine group. The “A” means “accountability”, not accounting.
Palantir Describes Lucene Searching with a Twist
January 27, 2010
If you do work in law enforcement, financial services, or intelligence (business or governmental), chances are high that you know about Palantir. The firm provides sophisticated data analysis and analytics tools for industrial-strength information jobs.
In August 2009 and October 2009, the company published a discussion of its approach to search and retrieval. I had occasion to update my file about Palantir technology, and I reviewed these two write ups. Both appeared in the Palantir Web log, and I thought that the information was relevant to some of the issues I am working on in 2010.
The first article is “Palantir: Search with a Twist (Part One: Memory Efficiency).” In that write up, the company points out that it uses the “venerable Java search engine Lucene.” Ah, open source, I thought. Palantir’s engineers encountered some limitations in Lucene and needed to work around these. The article explains that Palantir addressed Lucene’s approach to accumulating search results with a priority queue, streaming through results and inserting into the queue, and returning the set of results in the priority queue. The first article provides a useful summary of the Palantir method.
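For readers who have not bumped into the pattern, here is a minimal sketch of the general priority-queue approach to collecting search results: stream through scored hits, keep only the best k in a bounded heap, and return what is left. This is not Lucene’s code or Palantir’s workaround. The function name `top_k` and the sample hits are assumptions for illustration only.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k(scored_hits: Iterable[Tuple[float, str]], k: int) -> List[Tuple[float, str]]:
    """Return the k highest-scoring (score, doc_id) pairs, best first.

    A min-heap of at most k items is kept, so memory stays proportional
    to k no matter how many hits stream past.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []
    for score, doc_id in scored_hits:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new hit beats the weakest one currently kept: swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)


# Hypothetical hits, as (relevance score, document id) pairs.
hits = [(0.2, "doc-a"), (0.9, "doc-b"), (0.5, "doc-c"), (0.7, "doc-d")]
best_two = top_k(hits, 2)
```

The appeal of the design is that the queue never holds more than k entries. The cost is that every hit must still be scored and streamed through, which is where memory and speed trade-offs of the kind the Palantir article discusses tend to appear.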
The second article is “Palantir: Search with a Twist (Part Two: Real-Time Indexing and Security).” This write up explains two approaches Palantir explored to deal with what the company calls “leaking information; namely that there’s data on this object that the user making the query is not privy to.” The write up says:
Given this problem, there are two approaches one can take: [1] Store all the information needed to decide which labels are visible to the user running the query and then use only the visible labels when calculating the relevance of a match. Note that is a pretty expensive operation. [2] Don’t use the length of match to compute relevance. Note that skipping a relevance calculation is, obviously, a very cheap thing do. Which do we do? Both.
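The two approaches in the quoted passage can be sketched in a few lines. This is a rough illustration under stated assumptions, not Palantir’s implementation: the `Label` class, the group-based visibility check, the substring matching, and the length-normalized score are all hypothetical stand-ins that I made up to show the trade-off.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Label:
    """A piece of text on an object plus the groups allowed to see it (hypothetical model)."""
    text: str
    allowed_groups: FrozenSet[str]


def visible_labels(labels: Iterable[Label], user_groups: AbstractSet[str]) -> List[Label]:
    """Keep only the labels that the querying user is allowed to see."""
    return [label for label in labels if label.allowed_groups & user_groups]


def score_visible_only(labels: Iterable[Label], user_groups: AbstractSet[str], query: str) -> float:
    """Approach 1: length-aware relevance computed from visible labels only (the costly path)."""
    q = query.lower()
    if not q:
        return 0.0
    best = 0.0
    for label in visible_labels(labels, user_groups):
        if q in label.text.lower():
            # Fraction of the label covered by the query: longer labels score lower.
            best = max(best, len(q) / len(label.text))
    return best


def score_without_length(labels: Iterable[Label], query: str) -> float:
    """Approach 2: a flat match / no-match score that ignores label length (the cheap path)."""
    q = query.lower()
    if not q:
        return 0.0
    return 1.0 if any(q in label.text.lower() for label in labels) else 0.0


# Hypothetical object with one widely visible label and one restricted label.
labels = [
    Label("Acme Corp", frozenset({"analysts"})),
    Label("Acme Corp restricted holdings", frozenset({"admins"})),
]
analyst_score = score_visible_only(labels, {"analysts"}, "acme")
outsider_score = score_visible_only(labels, {"interns"}, "acme")
flat_score = score_without_length(labels, "acme")
```

In the first function the score depends only on text the user may see, so ranking cannot hint at the length of a hidden label, but a visibility check runs for every label. The second function skips that work by not letting length influence the score at all. How Palantir combines the two is explained in its own article, not here.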
I recommend that anyone wrestling with Lucene take a look at these two articles. A third installment has been promised, but I have not yet seen it.
Stephen E Arnold, January 27, 2010
A free search engine warrants a free post. No one paid me to write this. I will report this sad fact to the Department of Labor.

