Exclusive Interview with Steve Cohen, Basis Technology
September 21, 2010
The Lucene Revolution is a few weeks away. One of the featured speakers is Steve Cohen, the chief operating officer of Basis Technology. Long a leader in language technology, Basis Technology has ridden a rocket ship of growth in the last few years.
Steve Cohen, COO, Basis Technology
I spoke with Steve about his firm and its view of open source search technology on Monday, September 20, 2010. The full text of the interview appears below:
Why are you interested in open source search?
The open source search movement has brought great search technology to a much wider audience. The growing Lucene and Solr community provides us with a sophisticated set of potential customers, who understand the difference that high quality linguistics can make. Historically we have sold to commercial search engine customers, and now we’re able to connect with – and support – individual organizations who are implementing Solr for documents in many languages. This also provides us with the opportunity to get one step closer to the end user, which is where we get our best feedback.
What is your take on the community aspect of open source search?
Of course, open source only works if there is an active and diverse community. This is why the Apache Foundation has stringent rules regarding the community before they will accept a project. “Search” has migrated over the past 15 years from an adjunct capability plugged onto the side of database-based systems to a foundation around which high performance software can be created. This means that many products and organizations now depend on a great search core technology. Because they depend on it they need to support and improve it, which is what we see happening.
What’s your take on the commercial interest in open source?
Our take, as a mostly commercial software company, is that we absolutely want to embrace and support the open source community – we employ Apache committers and open source maintainers for non-Apache projects – while providing (selling) technology that enhances the open source products. We also plan to convert some of our core technology to open source projects over time.
What’s your view on the Oracle-Google Java legal matter with regard to open source search?
The embedded Java situation is unique and I don’t think it applies to open source search technology. We’re not completely surprised, however, that Oracle would have a different opinion of how to manage an open source portfolio than Sun did. For the community at large this is probably not a good thing.
What are the primary benefits of using open source search?
I’ll tell you what we hear from customers and users: the primary benefits are avoiding vendor lock-in and gaining flexibility. There have been many changes in the commercial vendor landscape over the fifteen years we’ve been in this business, and customers feel they’ve been hurt by changes in ownership and by whole products and companies disappearing. Search, as we said earlier, is a core component that directly affects user experience, so customizing it and tuning its performance for their application is key. Customers want all of the other usual things as well: good price, high performance, support, etc.
When someone asks you why you don’t use a commercial search solution, what do you tell them?
We do partner with commercial search vendors as well, so we like to present the benefits of each approach and let the customer decide.
What about integration? That’s a killer for many vendors in my experience.
Our exposure to integration is on the “back end” of Lucene and Solr. Our technology plugs in to provide linguistic capabilities. Since we deliver a reliable connector between our technology and the search engine, this hasn’t been much of a problem.
How does open source search fit into Basis’ product/service offerings?
Our product, Rosette, is a text analysis toolkit that plugs into search tools like Solr (or the Lucene index engine) to help make search work well in many languages. Rosette prepares tokens for the search index by segmenting the text (which is not easy in some languages, like Chinese and Japanese) and by using linguistic rules to normalize the terms to enhance recall. It also provides enhanced search and navigation capabilities like entity extraction and fuzzy name matching.
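The interview does not describe Rosette’s interfaces, so what follows is only a minimal, hypothetical Python sketch of the two steps Steve mentions, normalization and segmentation, as they might run in an analysis chain before text reaches an index. It swaps in a textbook technique, dictionary-based forward maximum matching, with a toy lexicon (`LEXICON`) and made-up function names (`normalize`, `segment`, `analyze`). None of these come from Basis Technology, Lucene, or Solr.

```python
import unicodedata

# Hypothetical toy lexicon; production segmenters use large dictionaries plus statistical models.
LEXICON = {"北京", "大学", "北京大学", "学生", "搜索", "引擎"}
MAX_LEN = max(len(word) for word in LEXICON)


def normalize(text: str) -> str:
    """NFKC folds full-width forms (e.g. 'ＳＯＬＲ' -> 'SOLR'); casefold() then lowercases."""
    return unicodedata.normalize("NFKC", text).casefold()


def segment(text: str) -> list[str]:
    """Forward maximum matching for CJK text; runs of ASCII letters/digits stay whole words."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isascii() and ch.isalnum():
            # Consume the whole ASCII word instead of splitting it into characters.
            j = i
            while j < len(text) and text[j].isascii() and text[j].isalnum():
                j += 1
            tokens.append(text[i:j])
            i = j
        elif not ch.isalnum():
            # Drop punctuation and symbols; they are not indexed in this sketch.
            i += 1
        else:
            # Take the longest lexicon entry starting at i, else fall back to one character.
            match = ch
            for n in range(min(MAX_LEN, len(text) - i), 1, -1):
                if text[i:i + n] in LEXICON:
                    match = text[i:i + n]
                    break
            tokens.append(match)
            i += len(match)
    return tokens


def analyze(text: str) -> list[str]:
    """Normalize first, then segment, producing the terms that would go into an index."""
    return segment(normalize(text))
```

For example, `analyze("北京大学学生用ＳＯＬＲ搜索")` yields `["北京大学", "学生", "用", "solr", "搜索"]`: the full-width Latin letters fold to plain lowercase `solr`, and the longest-match rule keeps 北京大学 (Peking University) as one token rather than splitting it into 北京 and 大学.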
How do people reach you?
Our Web site, at www.basistech.com, contains details on our various products and services, or people can write to info@basistech.com or call +1-617-386-2090.
Stephen E Arnold, September 21, 2010
Sponsored post
IBM: Catching Up or Moving Ahead?
September 21, 2010
As IBM inches toward complete automation in content classification and management, we noticed that the IBM InfoSphere Classification Module is now renamed “Classification Module,” which “automates the organization of unstructured content by analyzing full text of documents and emails.” IBM’s Enterprise Content Management (ECM) now uses context-sensitive and rules-based classification and categorization that not only saves time and money, but also “shifts the categorization burden away from your valuable staff resources and helps you make better informed data management decision.”
IBM also seems to be repositioning itself toward government, as seen from its “Advance Classification” experiences, which place heavy emphasis on government applications. Displayed on its site are case studies and a white paper that deal with the Army’s use and with the federal government’s records management challenges. Among other podcasts and videos are details about Classification Module, automated document classification, content management software, and complaint information management. Check them out if you are into Big Blue.
Is IBM ahead of the curve, behind the curve, or confused by the curve? Clustering, categorization, and entity extraction have become standard features of most search systems and next generation enterprise content processing products, from giants like Autonomy and Exalead to more specialized vendors such as Megaputer. The IBM open source initiative is also interesting, and we think we see either competitive jiggling or catching up in action.
Harleena Singh, September 21, 2010
Freebie
Fujitsu and Libraries: A Bit of a Surprise
September 20, 2010
Fujitsu has taken charge in the cloud. As part of its global cloud strategy, it recently launched a software-as-a-service (SaaS) solution for library administration aimed at Japan’s municipal public libraries. The Japan Today article “Fujitsu to Start Services for Libraries Using Cloud Computing” states that “the services will enable libraries to manage information on lending books to users without their own computer systems.”
Fujitsu estimates that running the libraries’ ICT system environment in Fujitsu’s datacenters can save the libraries about 30 percent on their ICT costs over a period of five years. The article says that because library employees are relieved of “the responsibilities for maintaining and operating the ICT system, the library can operate more efficiently.” With Fujitsu planning to create regional library centers, and its rival NEC Corp also planning to begin similar services, this looks to us like a different and potentially predatory move against the beleaguered library vendors.
Harleena Singh, September 20, 2010
Open Text Growth
September 20, 2010
It is growing time for Open Text. The company, the undisputed largest stand-alone global ECM vendor, lacks attractive organic growth but makes up for it with cash flow from partnership agreements, mergers and acquisitions, and new products. The Financial Post article “New M&A could help fuel growth at Open Text” reports that the Waterloo-based tech company reported earnings last month, and that National Bank Financial also forecasts good growth for Open Text as demand for ECM solutions accelerates out of the downturn.
The company’s key to success has been its partnerships with SAP and Microsoft, and it now looks forward to a new partnership with Oracle; these partnerships bring in 40% of Open Text’s total license revenues. Now Open Text is back in the hunt for acquisitions. National Bank Financial has tagged it with an Outperform rating, supports the company’s policies, and believes they will drive Open Text’s growth.
Harleena Singh, September 20, 2010
Social Media and Attensity: Pushing Forward
September 15, 2010
The social media world has become deeply rooted in the business world. The ZDNet article “Social Rising Stars: Maria Ogneva on Scaling Social Media” gives Attensity’s Social Media Director, Maria Ogneva, a chance to discuss the importance of knowing what is going on in the social media world. According to the article, social media is “heading straight into mass-market adoption, with no signs of slowing down.” It is the “#1 activity on the web,” and with so much influence at stake, companies must find ways to listen to their customers’ online chatter and respond properly. Companies need tools that filter customer comments, route them to the appropriate department, and support a prompt response. With so many different social media outlets, companies must decide which ones to emphasize and what staffing they need to handle customer issues properly. The social media world has become like a gossip column where customer comments, especially negative ones, spread like wildfire and can have lasting effects.
April Holmes, September 15, 2010
Freebie
Exalead Anchors US International Trade Commission
September 14, 2010
Drowning in a sea of data, one government agency recently had a life preserver tossed its way from one of the search industry’s best and brightest. “U.S. International Trade Commission Selects Exalead CloudView as Primary Search Engine” said:
USITC end user site surveys indicated that people couldn’t easily find the information they were looking for on http://www.usitc.gov via its search function. In 2009, USITC decided to replace its previous search software and reviewed a number of other enterprise search options for a solution that met its needs, was easy to administer, and fit its budget.
The United States International Trade Commission (USITC) (http://www.usitc.gov) handles issues of global and domestic trade with its quasi-judicial authority. In doing so, the agency collects massive quantities of data that both employees and visitors to its site needed to access.
The solution was Exalead CloudView, which “uses advanced semantic technologies to bring structure, meaning and accessibility to previously unused or under-used data in the new hybrid enterprise and Web information cloud.” For the USITC, CloudView aimed to provide two specific functions. First, it gave outside users access to over 40,000 documents, including PDFs, spreadsheets, Word documents, and more. Second, CloudView gave employees the ability to search file systems, folders, and data repositories that previously had to be searched manually, a time-consuming process.
The result is a highly efficient enterprise and web combination that improves the agency’s ability to monitor trade around the globe. The new system increased the range of available information, boosted performance and provided much-needed speed and simplicity to the Web site.
This is a big win not only for Exalead but also for the USITC.
Stephen E Arnold, September 14, 2010
Freebie
IBM and Its Fall 2010 Marketing Angle
September 14, 2010
I read “IBM’s Big Push to Steal Sales from Rivals.” If the write-up is accurate, IBM is slashing prices or buying market share. I am not sure how to position the tactic. Here’s what Business Week says:
Starting this month, Oracle and HP customers that switch to IBM’s latest package of servers, software, and storage, priced at upwards of $75,000, will get trade-in credit and can defer all payments until next year, interest-free. Big Blue also will help finance the cost of taking out a client’s old equipment and transferring the data over to its Power7 system. IBM, which managed to steal 500 customers away from competitors last year, hit that mark in just six months in 2010, says Jeff Howard, the marketing director for Power7. Now it’s hoping the sweetened financing will help keep the momentum going.
How will other companies respond? I anticipate bundling, price cutting, special offers, and quite a bit of love and attention to procurement teams.
What are the implications?
First, I think the companies affected by these tactics, if the write-up is accurate, will be second and third tier enterprise vendors.
Second, the bundling will put further pressure on some specialist providers of search, content processing, and business intelligence. It takes deep pockets to buy market share Big Blue-style.
Third, I think customers may take a closer look at products that may not be free. Deals from giants like IBM often come with an Iron Maiden, a thumbscrew, and foot chains. There is no free lunch in the rough and tumble world of enterprise software.
Stephen E Arnold, September 14, 2010
Freebie
SharePoint Dual Feature Bonanza
September 13, 2010
Microsoft SharePoint 2010 has more social and search features, which are intertwined to create an enticing platform for users. This and more is revealed in the Able Blue blog post “The MOSS Show Interview”, which takes you to The MOSS show site’s interview “Enterprise Social (and Search) in SharePoint 2010”.
In the two-part podcast interview, Matthew McDermott, a Microsoft SharePoint expert and MVP, talks about the improved social features in SharePoint 2010, such as My Sites, Activity Feed, tagging, rating, managed metadata, taxonomies, and folksonomies. Matthew also talks about the importance of having a search strategy and of leveraging search applications by making search actionable and refined.
SharePoint 2010 can help create a knowledge base that delivers benefits over a long period and can be shared among users. Matthew points out, “What makes SharePoint 2010 special is its ability to gather feedback from people participating in the content consumption,” which enhances the value of the content, making it more important to the enterprise. This is enterprise social, which gains more relevance if “made more findable by tagging and using proper metadata.”
Matthew explains that SharePoint 2010 adds strong enterprise social capabilities and makes it easier to integrate third-party applications outside the firewall, such as LinkedIn, Facebook, and Twitter. These social tools can be used to create business value. SharePoint 2010 allows both internal and external URLs in the browser to be tagged, and it collects all of the tagged URLs on a tag profile page.
The managed metadata store of SharePoint 2010 allows people to create a central repository of data through service applications. There is also a feature to make the data translatable into multiple languages, and even to block the use of certain tags for various reasons. “Activity feed is a feature through which you get your news or tips of the day by just following the tags,” Matthew reports, “and you get the ability to consume content around the organization.” He believes that this helps employees connect with each other, nurtures cooperation, and makes them more productive by improving the culture of the workplace.
The beauty of SharePoint 2010, according to Matthew McDermott, is that users can decide for themselves how the data is governed, and thus get complete control of this powerful enterprise social platform with its highly developed search techniques.
Now, how expensive is it to maintain a proprietary system that requires hands-on fiddling to make it work as advertised? The answer to this question is not in the movie. Maybe the sequel?
Harleena Singh, September 13, 2010
Freebie
SwiftRiver: Open Source Pushes into the Intel Space
September 13, 2010
If you are one of the social netizens, you know it isn’t easy to keep track of, manage, and organize the hundreds of Twitter streams, Facebook updates, blog posts, RSS feeds, or SMS messages that you keep getting. Do not feel helpless: SwiftRiver, a free, open source intelligence-gathering platform for managing real-time data streams, comes to your aid. The platform consists of a number of products and technologies, and its goal is to aggregate information from multiple media channels and add context to it using semantic analysis.
SwiftRiver can also be used as a search tool, for email filtering, to monitor numerous blogs, and verify real-time data from various channels. It offers, “Several advanced tools (social graph mining, natural language processing, locations servers, and twitter analytics) for free use via the open API platform Swift Web Services.” According to the parent site Swiftly.org, “This free tool is especially for organizations who need to sort their data by authority and accuracy, as opposed to popularity.” SwiftRiver has the ability to act quickly on massive amounts of data, a feat critical for emergency response groups, election monitors, media, and others.
There are multiple Swift Rivers. You want the one at http://swift.ushahidi.com or http://swiftly.org/.
Ushahidi, the company behind this initiative, claims, “The SwiftRiver platform offers organizations an easy way to combine natural language/artificial intelligence process, data-mining for SMS and Twitter, and verification algorithms for different sources of information.” Elaborating further, it states, “SwiftRiver is unique in that there is no singular ‘SwiftRiver’ application. Rather, there are many, that combine plug-ins, APIs, and themes in different ways that are optimized for workflows.”
Presently SwiftRiver uses the Sweeper App, the Kohana MVC UI, the distributed reputation system RiverID, and SwiftWebServices (SWS) as the API platform. The beauty here is that SwiftRiver is just the core, and it can have any UI, App, or API. It also has an intuitive and customizable dashboard, and the “users of WordPress and Drupal can add features like auto-tagging and more using Swift Web Services.” While you may download SwiftRiver and run it on your web server, SWS is a hosted cloud service, and does not need to be downloaded and installed.
Harleena Singh, September 13, 2010
Freebie
RSS Readers Dead? And What about the Info Flows?
September 13, 2010
Ask.com is an unlikely service to become a harbinger of change in content. Some folks don’t agree with this statement. For example, read “The Death Of The RSS Reader.” The main idea is that:
There have been predictions since at least 2006, when Pluck shut its RSS reader down that “consumer RSS readers” were a dead market, because, as ReadWriteWeb wrote then, they were “rapidly becoming commodities,” as RSS reading capabilities were integrated into other products like e-mail applications and browsers. And, indeed, a number of consumer-oriented RSS readers, including News Alloy, Rojo, and News Gator, shut down in recent years.
The reason is that users are turning to social services like Facebook and Twitter to keep up with what’s hot, important, newsy, and relevant.
An autumn forest. Death or respite before rebirth?
I don’t dispute that for many folks the sound of the RSS boom has faded. However, several factors help me understand why the RSS reader has lost its appeal for most Web users. Our work suggests these factors are operating:
- RSS setup and management cause the same problems that the original Pointcast, Backweb, and Desktop Data created. There is too much for the average user to do and then too much ongoing maintenance required to keep the services useful.
- The RSS stream outputs a lot of baloney along with the occasional chunk of sirloin. We have coded our own system to manage information on the topics that interest the goose. Most folks don’t want this type of control. After some experience with RSS, my hunch is that many users find RSS readers too much work and just abandon them. End users and consumers are not too keen on doing repetitive work that keeps them from kicking back and playing Farmville or keeping track of their friends.
- The volume of information in itself is one part of the problem. The high value content moves around, so plugging into a blog today is no guarantee that the content source will be consistent, on topic, or rich with information tomorrow. We have learned that lack of follow-through by content creators is an issue. Publishers know how to make content. Dabblers don’t. The problem is that publishers can’t generate big money, so their enthusiasm seems to come and go. Individuals are just individuals, and a sick child can cause a blog writer to find better uses for any available time.

