Enterprise Search: Baloney Six Ways, like Herring
December 21, 2010
When my team and I discussed my write up about the shift of some vendors from search to business intelligence, quite a bit of discussion ensued.
The idea that a struggling vendor of search, most often an outfit with older technology, commonly “reinvents” itself as a purveyor of business intelligence systems evoked some strong reactions.
One side of the argument was that an established set of methods for indexing unstructured content could be extended. The words used to describe this digital alchemy were Web services, connectors, widgets, and federated content. Now these are or were useful terms. But what happens is that the synthetic nature of English makes it easy to use familiar sounding words in a way to perform an end run around the casual listener’s mental filters. It is just not polite to ask a vendor to define a phrase like business intelligence. The way people react is to nod in a knowing manner and say “for sure” or “I’ve got it.”
Have you taken steps to see through the baloney passed off as enterprise search, business intelligence, and knowledge management?
The other side of the argument was that companies are no longer willing to pay big money for key word retrieval. The information challenge requires a rethink of what information is available within and to an organization. Then a system developed to “unlock the nuggets” in that treasure trove is needed. This side of the argument points to the use of systems developed for certain government agencies. The idea is that a person wanting to know which supplier delivers the components with the fewest defects needs an entirely different type of system. I understand this side of the argument. I am not sure that I agree, but I have heard this case so often that the USB with the MP3 of the business intelligence sound file just runs.
As we approach 2011, I think a different way to look at the information access options is needed. To that end, I have created a tabular representation of information access. I call the table and its content “The Baloney Scorecard, 2011.”
SAS Juices Up Text Mining
December 20, 2010
SAS has updated their Predictive Analytics & Data Mining page, and of particular note is their updated version of SAS Text Miner, which can be used to grasp trends from unstructured text without the user having to be familiar with the contents.
Text Miner “provides complete views and meaningful insights within an integrated predictive modeling environment. Automating manual comprehension of the textual data sources, incorporating interactive drill-down reporting, and delivering algorithms for rigorous advanced analyses make it possible to grasp future trends and act on new opportunities more efficiently and with less risk.” The 4.2 version includes a high-performance search capability, enhanced spell-check, processing of multiple topics for each document, and new text parsing, topic, and filter nodes.
The difference between SAS Text Miner and any other text mining solution is that SAS has the best data mining algorithms and the simplest interface for managing and importing data, and SAS integrates its text mining capabilities into its data mining solution better than anyone else.
Alice Wasielewski, December 20, 2010
Salesforce and Chatter
December 20, 2010
I was surprised when Salesforce hired a former journalist to push collaboration and attention to Salesforce staff and clients. I was interested in ZDNet’s alliterative touch in Chatter Free in “With Free Version, Salesforce Chatter Changes Collaboration, Communication Processes for Companies.” Chatter Free premiered at Salesforce’s Dreamforce conference. It is a free social messaging platform that can be used by any employee within a company. It has features similar to Twitter and Facebook and endeavors to be used in atypical environments. More than 60,000 professional companies use Chatter at the moment.
“What was especially interesting was to hear the customers talk about the significance of Chatter when it comes to collaboration within the organization. Some spoke of launching Chatter as an experimental service and getting immediate feedback from employees who were worried that the experiment would end. Others suggested that an employee without access to Chatter is like having an employee without access to e-mail – unable to communicate and collaborate in real-time.”
Employees have found they are e-mailing less and Chatter has increased work productivity. To many, Chatter is becoming the preferred method of communication in business organizations. However, information sharing in the Intel community may be moving in the opposite direction. Just a reminder that point of view is important. And what’s that old adage, “Loose lips sink ships”? What could leak from an enterprise social cloud service? Nothing.
Whitney Grace, December 20, 2010
Freebie
Arnold Comments about Exalead
December 20, 2010
A couple of times a year, I make a swing through Europe. I visit vendors, get demos, and talk with engineers about the future of search. In Paris on November 30, 2010, I answered questions about my views of Exalead. As you know, Exalead is a unit of Dassault Systèmes, one of the most sophisticated engineering firms in the world. You can get my view of Exalead by navigating to this link. Here’s an example of the observations I made:
“Exalead delivers applications that fit seamlessly and smoothly into customer workflows,” said Arnold. “When I spoke with Exalead customers I heard only: ‘This system works,’ ‘It’s easy to use,’ ‘It’s stable,’ and ‘I don’t have to chase around.’”
In the interview, I point out that Exalead’s engineering makes it possible to embed search and information access in applications. Instead of using key words to unlock the information in a traditional search and retrieval system, Exalead makes the needed information available within existing work flows and applications. Access extends across a full range of content types and devices, including smart phones.
I have tracked Exalead for a number of years, and it continues to distinguish itself in information access by going “Beyond Search.” Here at Beyond Search we use the Exalead platform for our Overflight service.
Stephen E Arnold, December 20, 2010
The Exalead engineering team bought me lunch, a plus in Paris. Too bad about the snow and ice, though.
Netvibes Dashboards and Search
December 19, 2010
The San Francisco Gate gives us another story about dashboards: “Introducing Netvibes Dashboard Intelligence Solutions: Business Intelligence Reinvented for the Real-Time Web.” Netvibes has invented the Dashboard Intelligence solution, a dashboard programmed with features, including SmartTagging, to collect, interpret, and organize real time information for businesses. Netvibes’s advertising declares that the dashboard will save time, generate usable, current data, and keep businesses abreast of all social media information. The SmartTagging feature is how most of these actions will be accomplished.
“SmartTagging can capture hidden value generated by an infinite number of everyday work activities. Users won’t need to learn any complex new tools–they will soon be able to simply click and tag anything they access online with their personal sentiment and share their expertise with the entire organization.”
SmartTagging will then distribute this information to other personnel, who then can comment, and their additions will be sent out. This creates a cyclical process, augmented by new, real time information that keeps being fed into the system. I wonder if there will be any repeated information or whether systems will get overloaded. Conclusion: do these dashboards actually make information access easier or harder in your opinion? Or, do dashboards provide a better user experience with the data pre-processed and ready to consume without critical thinking?
Whitney Grace, December 19, 2010
Freebie
IBM Chases Predictive Analytics Opportunities
December 18, 2010
IBM was once a top technology provider but over the last few years it seems to have lost its oomph, and maybe even entered a decline.
According to the Thomas Net News article “New IBM Predictive Analytics Software Personalizes Customer Relationship Strategies,” IBM seems to be trying to bounce back with its new predictive analytics software. IBM attempts to get involved in the social media world and promises that with its SPSS Modeler “users can uncover and analyze information from social media sources, such as social networks and blogs and then merge that with internal data for accurate insight and predictive intelligence.”
More importantly, companies could then use the data to better understand their customer fan base as well as for marketing and product development direction. Data analytics providers and the social media world are flourishing and it seems that IBM is trying to enter the game. However, it’s likely that IBM will be benched and forced to watch from the sidelines.
At the same time, SAS appears to be ramping up its effort in this sector as well. The battle of the statistics superstars is underway. Maybe a cable TV reality show here, gentle reader?
April Holmes, December 18, 2010
Freebie
Merging of Lucene Solr Reported
December 17, 2010
A reader sent me a link to “Lucene and Solr Development Merged.” We are working to track down the details, but I wanted to capture the news item. In addition to the development merger, the write up references Riak Search. Here is the passage that caught my attention:
With merged dev, there is now a single set of committers across both projects. Everyone in both communities can now drive releases – so when Solr releases, Lucene will also release – easing concerns about releasing Solr on a development version of Lucene. So now, Solr will always be on the latest trunk version of Lucene and code can be easily shared between projects – Lucene will likely benefit from Analyzers and QueryParsers that were only available to Solr users in the past. Lucene will also benefit from greater test coverage, as now you can make a single change in Lucene and run tests for both projects – getting immediate feedback on the change by testing an application that extensively uses the Lucene libraries. Both projects will also gain from a wider development community, as this change will foster more cross pollination between Lucene and Solr devs (now just Lucene/Solr devs).
Riak Search is described in “Riak 0.13, Featuring Riak Search” and “Riak Search and Riak Full Text Indexing”.
The primary information appears on the Riak Web site in a Web page titled “Riak Search.”
Riak Search uses Lucene and features “a Solr like API on top.” According to the Basho blog’s article “Riak 0.13 Released”:
At a very high level, Search works like this: when a bucket in Riak has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search. You can then find and retrieve your Riak objects using the objects’ values. The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak MapReduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.
The story “Riak 0.13 Released” provides additional information, including explicit links to download Riak 0.13 and Riak Search for a variety of platforms.
At first glance, Riak Search makes search and retrieval available to NoSQL data stores like the Basho Riak open source scalable data store.
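To make the flow in the Basho passage concrete, here is a toy, in-memory sketch of the idea: a bucket gets a pre-commit hook, stored objects are indexed as a side effect, and a query returns bucket/key pairs. This is not Riak's client API or its indexing code. The class ToyStore and its methods enable_search, put, and search are names we made up for illustration, and the tokenizer is a plain whitespace split.

```python
from collections import defaultdict


class ToyStore:
    """In-memory stand-in for a key/value store with per-bucket pre-commit hooks.

    Hypothetical illustration only; none of these names come from Riak.
    """

    def __init__(self):
        self.data = {}                   # (bucket, key) -> stored text value
        self.hooks = defaultdict(list)   # bucket -> hooks run before each write
        self.index = defaultdict(set)    # term -> set of (bucket, key) pairs

    def enable_search(self, bucket):
        # Rough analogue of installing a search pre-commit hook on a bucket.
        self.hooks[bucket].append(self._index_object)

    def _index_object(self, bucket, key, old, new):
        # Drop postings for the previous value so updates do not leave stale hits.
        if old is not None:
            for term in old.lower().split():
                self.index[term].discard((bucket, key))
        # Naive whitespace tokenizer; a real analyzer does far more.
        for term in new.lower().split():
            self.index[term].add((bucket, key))

    def put(self, bucket, key, value):
        old = self.data.get((bucket, key))
        for hook in self.hooks[bucket]:
            hook(bucket, key, old, value)
        self.data[(bucket, key)] = value

    def search(self, bucket, term):
        # Return matching bucket/key pairs, sorted for stable output.
        hits = self.index.get(term.lower(), set())
        return sorted(pair for pair in hits if pair[0] == bucket)


store = ToyStore()
store.enable_search("docs")
store.put("docs", "a", "Lucene and Solr merged")
store.put("docs", "b", "Riak Search has a Solr like API")
store.put("logs", "c", "Solr restarted")  # bucket not enabled, so not indexed

results = store.search("docs", "solr")
# The pairs can be handed to a later processing step, here a simple value lookup.
values = [store.data[pair] for pair in results]
```

The sketch says nothing about how the real system distributes or scales its index, which is exactly the part that needs testing on a suitable corpus.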
A number of questions require some further data collection and consideration:
- Will other NoSQL implementations “bundle” or “snap in” a search component?
- What are the technical considerations of this approach to search in NoSQL data stores?
- Are there any performance or scaling issues to consider?
The blending of the Lucene Solr merging story with the Riak Search information caught us by surprise. Time to flip through the Rolodex to see whom we can call for more information. If a reader has additional insight on these two items, please, use the comments section of the blog to make the information available to the other two readers of Beyond Search.
We did a bit of sleuthing and wanted to pass along that Riak may be using some of the Lucene/Solr analyzers. One view is that the indexing and search code may not be Lucene based. The implication is that scaling and performance may be an issue. Faceting and grouping may also be an issue. Without digging too deeply into the innards of Riak Search, we suggest you do some testing on a suitable data set or corpus.
We located some information about Solr as NoSQL. You can find that information on the Lucid Imagination Web site at this link.
Stephen E Arnold, December 17, 2010
Freebie
Why Big Name Enterprise Search Is So Costly
December 17, 2010
“How Much Time Out of Your Day Does IBM Waste?” is about IBM’s WebSphere Application Server and related components. The author does a good job of explaining how undocumented dependencies and bugs suck up his work time. Of course, a company that relies on IBM technology has made a business decision that probably had little to do with the challenges the firm’s technical professionals must face on an ongoing basis.
Here’s a passage from the write up that caught my attention:
The sad thing is that RAD is nothing but Eclipse, weighted down with IBM plugins and I love Eclipse. The latest release, Helios, is one of the nicest IDEs that you can use and it is totally free. It does everything RAD can do and it leaves a lighter foot print…
If this is an accurate statement, it shines a bright light on IBM’s use of open source technology. I am not sure I enjoy what the light shows me. Like many other 66 year old geese, I prefer the stage illusion of a well-oiled machine and its bullet proof engineering. Reality is often different from the marketing collateral I suppose.
The other passage I downloaded to my IBM file was this one:
No one ever calculates the lost productivity when they consider IBM products and really no one looks at the amount of money spent either. There are plenty of open source solutions that are faster, easier to configure and support is a Google away. My preference is to use Tomcat. Since every sane developer pretty much uses Spring anyway, Tomcat is the perfect choice and it is easy to support and maintain. JBoss is another great choice if you must have more J2EE container features, but again, by using Spring, they are mostly unnecessary.
Lost productivity. That means money. And when chief financial officers look to reduce costs, will the beancounter’s eyeballs focus on the expenses (both direct and indirect) that some large vendors’ software imposes? I know the answer is, “It depends.”
And enterprise search?
OmniFind 9.x is based on open source technology. I did this mental calculation: What’s the cost of direct and indirect engineering associated with a full IBM-centric search system? I ran through the costs of the hardware, field replaceable units, engineering support, and maintenance for WebSphere, OmniFind, and training for the bits and pieces. How much?
A lot. What got me thinking was that IBM is using open source to generate revenue for its high margin businesses like consulting, engineering support, and maintenance.
The point of the Jeviathon article was that the author wanted to use other, lower cost tools, but the IBM commitment locks in certain technical challenges and, of course, the revenue for IBM from services.
After reading Jeviathon’s article, I formed a different impression of IBM’s commitment to open source. Thinking about Oracle’s stance on open source, I concluded that open source may be a stalking horse. If big name search vendors follow in IBM’s footsteps, the deployments have built in costs that may be difficult to control.
Big time search solutions are expensive because they are designed to generate a revenue stream for the vendor. No problem with that, of course. I like the idea of open source software providing the base and then the vendor wrapping the solution in Velcro so the hooks dig in and keep the money flowing from the client to IBM. Would IBM take such actions to generate revenue? I don’t know, and it is an interesting hypothesis to consider.
Stephen E Arnold, December 17, 2010
Freebie
Funnelback Feature List Slideshow
December 17, 2010
We’ve unearthed a slideshare.net document worth mentioning: the Funnelback Enterprise Search Features list.
Acquired by the open source software services company Squiz in 2009, Funnelback is an Australian-based enterprise search engine and services company with a client list including universities, government agencies and large corporations spanning three continents. In Funnelback’s own words:
“Our technology is used to search information across the breadth of an organization. We offer externally hosted search solutions as well as in-house server installed solutions and consultancy services. We search across websites, intranets, portals, databases, fileshares and many other data sources. Our feature rich, high powered, customizable, search engine allows organizations to find accurate information quickly and easily.”
For a concise overview of what Funnelback offers, visit the link above to the four page features list. Whether you are interested in the particulars of its search features, query language, results and reporting, or security, among other categories, it’s all organized and detailed right there.
Sarah Rogers, December 17, 2010
Freebie
SharePoint White Papers Categorized
December 17, 2010
I know you need some reading over the New Year’s holiday. I just learned that Bill Baer, a “master for SharePoint”, has published “Categorized Index of SharePoint 2010 White Papers.” I don’t know too much about building categories. My miserable attempts ended with the ABI/INFORM controlled term project and limped through several taxonomy projects for the late and lamented Ziff Communications Corp. Certainly my skills are nothing compared to the 20 somethings now pitching ontological skills that would make a monk in Mont St Michel’s scriptorium weep.
I did notice that there was a top level category for search called “Search.” There is one white paper in the category: “Search Topology Operations in SharePoint Server 2010.” The fastest category was the collection of white papers about making SharePoint run like an Olympic sprinter. Search is a tidy little category.
We all know that search—particularly with the tools Microsoft provides—is no big deal. If it were a challenging function, I would anticipate more white papers. Now that performance topic warrants lots of words.
Is SharePoint search trivial? Is performance the big problem? Interesting page. All those white papers about performance. There is a message there.
Stephen E Arnold, December 17, 2010
Freebie just like the white papers