Business Intelligence: Optimism and Palantir
June 28, 2010
Business intelligence is in the news. Memex, the low profile UK outfit, sold to SAS. Kroll, another low profile operation, became part of Altegrity, another organization with modest visibility among the vast sea of online experts. Now Palantir snags $90 million, as I learned in “Palantir: the Next Billion Dollar Company Raises $90 Million.” In the post financial meltdown world, a lot of money is looking for a place that can grow more money. The information systems developed for serious intelligence analysis seem to be a better bet than funding another Web search company.
Palantir has some ardent fans in the US defense and intelligence communities. I like the system as well. What is fascinating to me is that smart money believes that there is gold in them there analytics and visualizations. I don’t doubt for a New York minute that some large commercial organizations can do a better job of figuring out the nuances in their petabytes of data with Palantir-type tools. But Palantir is not exactly Word or Excel.
The system requires an understanding of such nettlesome points as source data, analytic methods, and, yikes, programmatic thinking. The outputs from Palantir are almost good enough for General Stanley McChrystal to get another job. I have seen snippets of some really stunning presentations featuring Palantir outputs. You can take a gander (no pun intended by the addled goose) at some examples on the Palantir Web site.
Palantir is an open platform; that is, a licensee with some hefty coinage in their knapsack can use Palantir to tackle the messy problem of data transformation and federation. The approach features dynamic ontologies, which means that humans don’t have to do as much heavy lifting as required by some of the other vendors’ systems. A licensee will want to have a tame rocket scientist around to deal with the internals of pXML, the XML variant used to make Palantir walk and talk.
You can poke around at these links, which may go dark at any moment, of course: https://devzone.palantirtech.com/ and https://www.palantirtech.com/.
Several observations:
- The system is expensive and requires headcount to operate in a way that will deliver satisfactory results under real world conditions
- Extensibility is excellent, but this work is not for a desk jockey, no matter how confident that person is in his undergraduate history degree and Harvard MBA
- The approach is industrial strength which means that appropriate resources must be available to deal with data acquisition, system tuning, and programming the nifty little extras that are required to make next generation business intelligence systems smarter than a grizzled sergeant with a purple heart.
Can Palantir become a billion dollar outfit? Well, there is always the opportunity to pump in money, increase the marketing, and sell the company to a larger organization with Stone Age business intelligence systems. If Oracle wanted to get serious about XML, Palantir might be worth a look. I can name some other candidates for making the investors’ day, but I will leave those to your imagination. Will you run your business on a Palantir system in the next month or two? Probably not.
Stephen E Arnold, June 27, 2010
Freebie
OpenText Nstein: Confusing Information Surfaces
June 28, 2010
Update, June 29, 2010:
Quite a flurry of comments from OpenText about this post. This citation turned up in my newsreader, and I could not figure it out. In fact, I pointed out that the article was confusing and probably the result of an error in a content management system. Nevertheless, I think that vendors of content management systems need to make certain that their date and time stamp functions are operating correctly. If a crash forces a system restore, I think it is useful to put clear date markers on restored documents. If this tagging is not applied or is in some way flawed, newsreaders snap up the content and happily shovel it to people like me with a current date and time stamp.
My suggestion is to work with the source of the write up. I don’t do “news”; I point to sources that are available in open source. My opinions are clearly marked. In this particular article, I point out that when glitches like this occur, competitors can point to the write up and raise questions about clarity. I reproduced the content and provided a link to the source. I did not create the 2003 gobbledegook; I just alerted my two or three readers to the issue. The problem originated with an outfit publishing as Asset Management Software. No date appears, but BuddyPress, identified with the source article, might be the outfit with which OpenText wishes to speak. Or, in the language of the source article I used: “New guided navigation module: navigation NretrieverNretriever is a powerful tool for research, which brings a direct connection with the search experience for end users.” Confusing in my opinion. Also, note the date in the url, gentle reader: http://asset-management-software.bloghubpage.com/2010/06/12/asset-management-software-nstein-introduces-version-3-0-of-its-award-winning-content-management-platform-nserver-suite/. The date segment is 2010/06/12.
This sure seems like a current date to me.
The point is that content management vendors deliver products that can generate data lacking useful metadata and produce pages that spiders and addled geese see as “current.” When a vendor is in the content management business, perhaps looking at the cause and not the effect is a useful exercise?
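To make the cause concrete, here is a minimal sketch of how a newsreader might choose a date for a fetched page. It assumes a hypothetical function, resolve_item_date, and a simple precedence order: a date declared in the page’s own metadata, then a /YYYY/MM/DD/ segment in the URL, then the fetch date. This is not OpenText’s, Nstein’s, or any particular newsreader’s logic; it only shows why a missing or broken date stamp falls through to “current.”

```python
import re
from datetime import date
from typing import Optional, Tuple

# A /YYYY/MM/DD/ path segment, a common blog permalink convention.
URL_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def resolve_item_date(url: str, declared: Optional[str], fetched: date) -> Tuple[date, str]:
    """Return (date, source) for a fetched page (hypothetical helper).

    declared: an ISO 8601 date string from the page's own metadata, or None.
    fetched:  the day the newsreader retrieved the page.
    """
    # 1. Trust an explicit, well-formed date stamp from the publisher.
    if declared:
        try:
            return date.fromisoformat(declared[:10]), "metadata"
        except ValueError:
            pass  # malformed stamp: try the next source
    # 2. Fall back to a date embedded in the URL path.
    match = URL_DATE.search(url)
    if match:
        year, month, day = (int(g) for g in match.groups())
        try:
            return date(year, month, day), "url"
        except ValueError:
            pass  # e.g. /2010/13/45/ is not a real date
    # 3. Last resort: the fetch date. This is the branch that makes
    #    restored or undated content look "current".
    return fetched, "fetch_time"


if __name__ == "__main__":
    url = ("http://asset-management-software.bloghubpage.com/2010/06/12/"
           "asset-management-software-nstein-introduces-version-3-0-of-its-"
           "award-winning-content-management-platform-nserver-suite/")
    print(resolve_item_date(url, None, date(2010, 6, 28)))
    # (datetime.date(2010, 6, 12), 'url')
    print(resolve_item_date("http://example.com/page", None, date(2010, 6, 28)))
    # (datetime.date(2010, 6, 28), 'fetch_time')
```

A page that carries neither a usable stamp nor a dated URL lands in the third branch, which is how a restored 2003 document can show up in a newsreader looking like June 2010 news.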
Original post, June 28, 2010:
Two companies that strike me as pioneers in moving beyond search are Autonomy and OpenText. I don’t want to take sides. In the last two or three years, the firms have been pursuing somewhat similar strategies. Both have pushed from search into specialized markets such as eDiscovery. Both have information retrieval technologies gathered from acquisitions. Neither, in my opinion, is properly classified any longer as a search and retrieval specialist. The companies offer a wide range of information services. Both have blown past first Microsoft Fast and then Endeca. OpenText snapped up the gasping Nstein for something like $0.65 on the dollar. Under the broad wingspan of OpenText, Nstein has rolled out Version 3.0 of what it calls “its award-winning content management platform.” You can get more details in the write up “Asset Management Software: Nstein Introduces Version 3.0 of Its Award Winning Content Management Platform Nserver Suite.” Quite a title and probably good spider food. But I don’t know what Nstein is really delivering. Customers may not know either.
I found this passage quite interesting:
nStein Technologies Inc…, a global leader in unstructured content management solutions, today announced at the annual conference of the Special Library Association (SLA) version 3.0 of its award-winning content management platform, September nserver concept nStein extraction, categorization, organization provides production began, seals and restart guided navigation modules.
Must be a glitch in the content management system.
I also noted:
- The use of the phrase “guided navigation.” Endeca has been closely associated with facets, and “guided navigation” may catch that company’s attention
- A reference to Nretriever as a “feeding technology.” The word “Nretriever” suggests a query and a results list to me, not a feed or stream of content. Maybe the writer wanted me to think of an alert pushed to me via email?
- A description of Ncategorizer that “includes the improvement of classification.” I am not sure if the product improves a previous Nstein system or improves the performance delivered by a competitor’s system.
The write up includes some links to information for me to read. Two links date from 2002 and 2003 and not from the post acquisition period in which I have an interest. The third link is more current but I did not see any mention of Nstein. The other links are circular; that is, pointing back to the article that caught my attention.
I am baffled. I am not sure if this is a legitimate write up about OpenText / Nstein or an error due to a flawed editorial system or process.
With promotional announcements like this one, Autonomy is almost certain to lick its chops and begin to think about taking a bite out of OpenText / Nstein’s marketing messages.
Stephen E Arnold, June 28, 2010
Freebie
Podcast Interview with Paul Doscher, Part 3: Exalead and User Experience
June 28, 2010
Exalead’s Paul Doscher talks about Exalead and user experience, sometimes shortened to “UX,” on the June 28, 2010, ArnoldIT Beyond Search podcast. Exalead, now part of the large French software and services company Dassault, is entering a new phase of growth. (You can read about this tie up in “Exalead Acquired by Dassault” and “Exalead and Dassault Tie Up, Users Benefit.”)
In this podcast, Mr. Doscher talks about Exalead’s technical approach to enabling licensees to use a wide range of graphical user interfaces and display conventions. The Exalead user experience approach makes it possible to support iPhone-type interfaces and presentations tailored to the needs of a particular user or workgroup.
You can listen to the podcast on the ArnoldIT.com Web site. More information about Exalead is available from www.exalead.com. The ArnoldIT podcast series extends the Search Wizards Speak series of interviews beyond text into rich media. Watch this blog for announcements about other rich media programs from the professionals who move information retrieval beyond search.
Stephen E Arnold, June 28, 2010
Sponsored by Stephen E. Arnold
EMC Beefs Up Its Content Processing
June 27, 2010
Data collection agency EMC, http://www.emc.com, has moved to build a platform for expanding business in the future, thanks to a recent partnership inked with low-profile legal discovery company Applied Discovery. Rumor has it that EMC learned about search via a marriage and divorce with the Fast Search & Transfer technology. The most recent move is to create a comprehensive service by blending SourceOne eDiscovery-Kazeon with the case discovery review power of Applied Discovery’s process and review engine. EMC started out as a large storage vendor, and it bought Kazeon. Will the result be a complete solution for indexing and searching large data stores? EMC hopes this is the findability fix.
Patrick Roland, June 27, 2010
Freebie
Search Vendors Try New Sales Hooks
June 25, 2010
Forget the surveys that companies run to make clear the problems in information access. Anyone who looks for information today knows that pinpointing information to answer a business question is not exactly bulletproof. Recommind, once a vendor anchored in the legal market, stretched its wings into the enterprise. My recollection is that some of the company’s technology reminded me of Autonomy’s original approach. Now Recommind seems to be pushing into a different space, one that combines indexing, risk management, some MBA speak, and a dash of legal lingo. Navigate to “Disconnect Between Legal and IT Getting Worse, Recommind Survey Reveals.”
In my experience, information technology organizations are definitely disconnected from most of the corporate functions. I don’t think IT is at fault. IT departments are trying to protect themselves from what I call “requests from the clueless.” I know business managers are under pressure. CFOs are wild eyed in their efforts to cut costs and maximize returns. The top executives are scrambling to find ways to buy their private island, get a new BMW, and create a life without BP scale risks, bloggers, and 20somethings who want to make their bones on the corpses of today’s market leaders. Many managers see a demo or chat with pals at the country club and come to the office on Monday with requests that are essentially impossible for an IT department to meet with available resources.
What does the Recommind survey purport to tell me? That IT and legal eagles are operating on different wavelengths. Do I need a survey to tell me this? I don’t even operate on the same wavelength as my two attorneys, and I pay these guys to try and help me. For me, here’s a quote that reveals more about client management and vendors than about IT departments:
At a time when e-Discovery and regulatory issues are gaining momentum, these results don’t exactly instill confidence across the enterprise.
Here’s my view of the situation:
- Certain vendors of search technology have to find a way to make sales to keep the money pipe full. The options are to market like the devil or go to Satan’s spawn and get more funding. Which path would you take? I vote for marketing. I think these types of surveys are marketing efforts, and when the results are released, I know the survey sponsor views the data as a way to generate sales leads.
- Obviously plain vanilla search is not a hot ticket. I think I was one of the first people to explain that search was dead in my Searcher article for Barbara Quint four or five years ago. No search vendor is going to bridge the gap between IT and the many over stressed units in an organization. Successful vendors find ways to solve problems, not tackle the management tensions that are human centric organizational issues.
- The new lingo does not convince me that content processing software can address deeper issues with management and governance.
You may have a different view, so read the survey results. Many search vendors have marketed themselves into a corner. Now organizations have to find solutions to information access problems. I don’t think there is much margin for error. Sure, some assert the economy is improving. That’s wonderful. But the glory days of search marketing are behind us, and I think more than catch phrases, house surveys, sponsored white papers, and fawning azure chip consultants will be needed.
Here’s my checklist for starters:
- Demonstrations that solve a problem
- Clear statements of what a findability-centric software system can and cannot do
- Avoidance of MBA crazy talk, jargon, unsupported assertions, and faux case analyses
- Partnerships that give a prospect confidence that the system can be made to work at a reasonable cost in a reasonable period of time
- Focus on solutions. Search and content processing vendors are not blue chip management consultants, never will be, and probably cannot afford the ministrations of Bain, Booz, Boston Consulting, or McKinsey. Therefore, they have little firsthand information about what is required to tackle management challenges in an organization.
Many search vendors are scrambling for a new sales hook. What approach will work? No clue have I.
Stephen E Arnold, June 25, 2010
Freebie
Real Time Search Systems, Part 2
June 22, 2010
Editor’s note: This post tiptoes through the tulips. In this instance, tulips is a synonym for industrial strength content processing systems that can be licensed by commercial entities, governmental organizations, or individuals who want to become a baby Fuld or Kroll. Achieving this type of azure chip transcendence means that you will be a hit at the local bingo parlor when you share your insights with your table mates.
Industrial Strength Tools
The free services don’t provide the user with much in the way of post processing horsepower. Another weakness of free services is that the average user deals with what each system spits out in response to a click or a query. The industrial strength systems provide such functions as:
A system or method for “plugging” in different streams of content. Examples range from electronic mail in the wonderful Microsoft Exchange Server to proprietary content stuffed into a clunky content management system. These connectors are a big deal because without different inputs of content, a real time search engine does not have the wood to burn in the fire box.
Each system provides or supports some type of software circuit board. The idea is that the content moves from the connectors over the circuits on the circuit board to its destination. Acquired content must be processed, so its first destination is a system or systems that extract data, generate metadata, and, in the case of Google, figure out the context of the message. The result is an index that contains index terms, metadata, and often such extras as a representation of the source message, precalculated values, and new information constructs.
Applications or “hooks” that make it possible for another software program to tap into the generated values and processed content to create an output. Now the outputs can vary widely. Another software system may just look up an item. Another software application might glue together different items from the index and content representation. The user sees a report, a display on a mobile phone, or maybe a mashup which allows the human to “recognize” or “spot” what’s needed. No searching required.
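The three functions described above (connectors, a circuit board that routes content through processing into an index, and hooks that other programs call) can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not any vendor’s architecture: the connectors are hypothetical generators standing in for Exchange or content management connectors, the circuit board is a plain loop, the processing stage only extracts lowercase terms and a word count, and the hook is a hypothetical function that reads the index to build a per-source report.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Item:
    source: str
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical connectors: each yields raw items from one content stream.
def mail_connector() -> Iterable[Item]:
    yield Item("mail", "Board golf outing budget approved")
    yield Item("mail", "Company picnic moved to Friday")


def cms_connector() -> Iterable[Item]:
    yield Item("cms", "Draft contract terms for supplier review")


def enrich(item: Item) -> Item:
    """Processing stage: extract index terms and attach simple metadata."""
    terms = [t.strip(".,").lower() for t in item.text.split()]
    item.metadata["terms"] = sorted(set(terms))
    item.metadata["word_count"] = len(terms)
    return item


class Index:
    """A tiny inverted index: term -> ids of items containing it."""

    def __init__(self) -> None:
        self.items: List[Item] = []
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, item: Item) -> None:
        item_id = len(self.items)
        self.items.append(item)
        for term in item.metadata["terms"]:
            self.postings[term].add(item_id)


def run_pipeline(connectors: Iterable[Callable[[], Iterable[Item]]]) -> Index:
    """The circuit board: move every item from connectors through processing into the index."""
    index = Index()
    for connector in connectors:
        for raw in connector():
            index.add(enrich(raw))
    return index


def report_hook(index: Index, term: str) -> Dict[str, List[str]]:
    """A hook another program might call: group matching items by source."""
    report: Dict[str, List[str]] = defaultdict(list)
    for item_id in sorted(index.postings.get(term.lower(), set())):
        item = index.items[item_id]
        report[item.source].append(item.text)
    return dict(report)


index = run_pipeline([mail_connector, cms_connector])
print(report_hook(index, "golf"))  # {'mail': ['Board golf outing budget approved']}
```

Adding a connector or a second hook leaves the other stages untouched, which is roughly what the circuit board metaphor suggests. The user of the report never types a query; the hook does the looking up.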
The Vendors
In my two lectures, I mentioned different outfits in each talk. I have rolled up the vendors in the list below. My suggestion is to do some research about each of these companies. I provide “additional color” on the technologies each vendor licenses, but that information is not going to find its way into a free blog posting. Problem? Read the About information available from the tab at the top of this page.
- Exalead http://www.exalead.com Robust system which handles structured and unstructured data. Outputs may be piped to other enterprise software, a report, or a peripatetic worker with a mobile phone in Starbucks.
- Fetch Technologies http://fetch.com Developed initially for certain interesting government information needs, Fetch can be customized with its graphical programming method to perform some quite useful analyses.
- JackBe http://www.jackbe.com Developed initially for certain interesting government information needs, JackBe can be licensed to process a wide range of content.
- Silobreaker http://www.silobreaker.com Developed initially for certain interesting government information needs, Silobreaker can output reports that are as good as the roll ups crafted by a trained intelligence professional.
What do these systems do in “real time”? Each of them, when properly resourced, can ingest flows of data and unstructured content, assign metadata, and output alerts, reports, or Google-style search results within minutes of the content becoming known to the system.
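The alerting half of that description can be illustrated with standing queries: instead of waiting for a user to search, each item is checked against saved watch lists the moment it arrives. This is a minimal sketch, assuming hypothetical watch lists keyed by name and an arbitrary five-minute latency budget; it matches bare terms only and is not how any of the vendors listed above implements alerts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass
class Alert:
    watch: str
    text: str
    lag: timedelta  # time from publication to alert
    late: bool      # True if the lag exceeded the budget


def match_on_arrival(text: str, published: datetime, arrived: datetime,
                     watches: Dict[str, Set[str]],
                     budget: timedelta = timedelta(minutes=5)) -> List[Alert]:
    """Check one newly ingested item against every standing watch list."""
    terms = {t.strip(".,").lower() for t in text.split()}
    lag = arrived - published
    return [Alert(name, text, lag, lag > budget)
            for name, wanted in watches.items()
            if wanted & terms]  # any overlapping term fires the watch


watches = {"vendors": {"exalead", "jackbe", "silobreaker"},
           "legal": {"ediscovery", "subpoena"}}
alerts = match_on_arrival("Exalead adds mobile interface.",
                          published=datetime(2010, 6, 22, 9, 0),
                          arrived=datetime(2010, 6, 22, 9, 3),
                          watches=watches)
for a in alerts:
    print(a.watch, a.lag, a.late)  # vendors 0:03:00 False
```

The work happens at ingest time, so the “within minutes” figure is set by how fast content reaches the matcher, not by how often a user remembers to search.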
HMV Shifting to Endeca
June 21, 2010
The HMV Web site is, according to the rumor mill, shifting to a new search platform. Years ago, I heard that the UK music retailer relied on Dieselpoint’s system. In a brief blog post by RossBoardman.com with the title “HMV Adds Endeca Search Engine to Its Web Site”, I learned that a shift may be underway. I have not been able to confirm this change, but moving from one search system to another is almost as common as seventh graders changing their minds. If the rumor is true, Endeca’s strategy of partnering with a wide range of integrators and resellers may be bearing some fruit. You can learn more about Endeca at www.endeca.com. I hope the pink color yields to a color that is closer to the hues of an addled goose.
Stephen E Arnold, June 19, 2010
Freebie
Oracle and SAP Chase Big Data Rainbow
June 19, 2010
“Oracle, SAP Working on Exadata Support” struck me as interesting for three reasons. First, if you are managing one of these super scale storage gizmos, you have some challenges on your hands. Second, Oracle and SAP know that lessees of these storage devices have those problems and aim to cash in on the situation. And, third, the outfit that figures out how to make these gizmos work will have bragging rights in the hyper-expensive enterprise storage market.
High stakes indeed.
Why are big data a problem? Many reasons. The obvious one is that big data take “time” to transform, manipulate, and crunch. The good news is that the problem can be solved by buying more Exadata database machines. Better yet, SAP wants you to buy a Sybase gizmo. The bad news is that adding machines creates more management hassles for the engineers. The less obvious one is that an Exadata gizmo is not one of the slick No SQL solutions that rely on lower cost methods. The good news is that a Fortune 100 company may not trust No SQL or may know little about a No SQL solution. The bad news is that today’s Fortune 100 company could become tomorrow’s employment graveyard. Quiet places, graveyards. The spat between the two companies is not interesting to me. Squabbles that most people don’t understand are good for the azure chip crowd and bloggers. Regular folks, not so much.
The most interesting comment in the write up was:
SAP is now supporting Oracle Database 11g R2, for applications that use SAP kernel 6.40, 7.x and beyond. The companies’ practice has been to delay certifying Oracle’s database releases until the second iteration, a process that minimizes upgrade chores for customers.
This is a service-for-a-fee game. I am not sure the needs of the customer are front and center. The gizmos, the complexity, and the support are the main event.
Stephen E Arnold, June 19, 2010
Freebie
Some Tips for Oracle Text Users
June 18, 2010
Lucky you! Your task is to use Oracle Text to search content in your Oracle Database. Oracle provides some documentation, and you can spend many pleasant hours chasing down the documentation and specific information you need to make Oracle Text do your bidding. If you are short on time, point your browser at “Setting Up Oracle Database’s Oracle Text” by Steven Callan. Mr. Callan does a good job of explaining how to get from A to B. The most significant parts of the write up, in my opinion, are the examples. You can learn where Oracle tucks the search system when you install the Oracle database. He provides specific information for figuring out the schema of the table. Of particular value is the code example for using different index types. Recommended. If you want to crack the performance and index update problems, you may have to do some digging on your own.
Stephen E Arnold, June 18, 2010
Freebie
Cut That Security Budget, Says Azure Chip Consultancy
June 17, 2010
Now I don’t know about you, but when one fires up a modern day search and content processing system, the licensee has to have its security system in World Cup form. Active Directory is a popular method. Some search systems put their moist noses in the air, sniff the Active Directory settings, ingest them, and happily index content. Then when a user runs a query, the search system respects the Active Directory security settings. The idea is that a user with certain permissions can see only the content to which that person has access. Goof up the security and permissions and you have addled geese looking at golf club contributions, drafts of documents related to some hush hush matter, or personal information about that last visit to the local doctor.
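The security trimming described above can be sketched in a few lines. This is a minimal illustration, assuming each document’s access list was captured at index time and that a hypothetical USER_GROUPS table stands in for group memberships read from a directory such as Active Directory; it does not use any Active Directory API. The key design choice is deny by default: a user with no known groups sees nothing.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Doc:
    title: str
    allowed_groups: FrozenSet[str]  # access list captured at index time


# Hypothetical stand-in for memberships pulled from a directory service.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"staff", "board"},
    "bob": {"staff"},
}

DOCS = [
    Doc("Company picnic schedule", frozenset({"staff"})),
    Doc("Board golf outing expenses", frozenset({"board"})),
    Doc("Employee medical leave notes", frozenset({"hr"})),
]


def search(docs: List[Doc], user: str, query: str) -> List[str]:
    """Match titles, then drop any hit the user's groups may not see."""
    groups = USER_GROUPS.get(user, set())  # unknown user: no groups, no results
    q = query.lower()
    hits = [d for d in docs if q in d.title.lower()]
    return [d.title for d in hits if d.allowed_groups & groups]


print(search(DOCS, "alice", "golf"))  # ['Board golf outing expenses']
print(search(DOCS, "bob", "golf"))    # []
```

Goof up USER_GROUPS or the stored access lists, and the trimming step cheerfully hands the golf outing expenses to the wrong person, which is exactly the exposure a thin security budget invites.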
I read “Enterprises Advised to Reduce IT Security Budgets” and wondered if the headline were a typographical error. Nope. The azure chip outfit Gartner apparently recommends “a three percent cut as economic situation improves.” What? The economy is improving, so cutting a security budget is a recommendation? What about exposing those contract terms to eyes not authorized to see them? What happens if medical information seeps into search results when an employee is looking for information about the company picnic? What happens when the financial details of the Board of Directors’ golf outing find their way into the hands of a committee working on reduction in force issues?
You should navigate to this article and read it for yourself. For me the most interesting comment in the write up was:
Vic Wheatman, a research director at Gartner, explained that the average percentage of IT spending on security in 2010 is five per cent, down from six per cent last year. “In 2009, in the face of a significant IT spending downturn, security spending grew slightly as a percentage of the IT budget, while many other IT spending areas were gutted,” he added. “With the economic situation projected to improve in 2010, organizations are ramping up investments in other spending areas faster than they are for IT security.”
I am not sure I am comfortable with this recommendation or the analysis itself. But I am an addled goose. The crazy stuff I write is a direct result of drinking mine run off effluent. The news story may be the flight of fancy of an azure chip marketing person.
For me, I will keep my spending for security at its pre crash level, thank you. The risk of creating a more costly problem by chopping security spending is too high for my operation. Your mileage may differ. In that case, rely on the “real” consultants at the azure chip outfits. My unsolicited opinion: Avoid Harrod’s Creek.
Stephen E Arnold, June 17, 2010
Freebie.

