Exclusive Interview: Mike Horowitz, Fetch Technologies

July 20, 2010

Savvy content processing vendors have found business opportunities where others did not. One example is Fetch Technologies, based in El Segundo, California. The company was founded by professors at the University of Southern California’s Information Sciences Institute. Since the firm’s doors opened in the late 1990s, Fetch has developed a solid clientele and a reputation for cracking some of the most challenging problems in information processing. You can read an in-depth explanation of the Fetch system in the Search Wizards Speak’s interview with Mike Horowitz.

The Fetch solution uses artificial intelligence and machine learning to intelligently navigate and extract specific data from user specified Web sites. Users create “Web agents” that accurately and precisely extract specific data from Web pages. Fetch agents are unique in that they can navigate through form fields on Web sites, allowing access to data in the Deep Web, which search engines generally miss.

You can learn more about the company and its capabilities in an exclusive interview with Mike Horowitz, Fetch’s chief product officer. Mr. Horowitz joined Fetch after a stint at Google.

In the lengthy discussion with Mr. Horowitz, he told me about the firm’s product line up:

Fetch currently offers Fetch Live Access as an enterprise software solution or as a fully hosted SaaS option. All of our clients have one thing in common, and that is their awareness of data opportunities on the Web. The Internet is a growing source of business-critical information, with data embedded in millions of different Web sites – product information and prices, people data, news, blogs, events, and more – being published each minute. Fetch technology allows organizations to access this dynamic data source by connecting directly to Web sites and extracting the precise data they need, turning Web sites into data sources.

The company’s systems and methods make use of proprietary numerical recipes. Licensees, however, can program the Fetch system using the firm’s innovative drag-and-drop programming tools. One of the interesting insights Mr. Horowitz gave me is that Fetch’s technology can be configured and deployed quickly. This agility is one reason why the firm has such a strong following in the business and military intelligence markets.

He said:

Fetch allows users to access the data they need for reports, mashups, competitive insight, whatever. The exponential growth of the Internet has produced a near-limitless set of raw and constantly changing data, on almost any subject, but the lack of consistent markup and data access has limited its availability and effectiveness. The rise of data APIs and the success of Google Maps has shown that there is an insatiable appetite for the recombination and usage of this data, but we are only at the early stages of this trend.

The interview provides useful insights into Fetch and includes Mr. Horowitz’s views about the major trends in information retrieval for the last half of 2010 and early 2011.

Now, go Fetch.

Stephen E Arnold, July 20, 2010

Freebie. I wanted money, but Mr. Horowitz provided exclusive screen shots for my lecture at the Special Library Association in June and then my briefings in Madrid for the Department of State. Sigh. No dough, but I learned a lot.

Oracle and Its Silence May Dissolve OpenSolaris Board

July 19, 2010

Oracle and open source might have more difficulty than Mel Gibson and his significant other.

There’s a report of trouble brewing over Oracle keeping things close to the vest about the future of OpenSolaris. The project, which is an open-source version of Sun’s Solaris distribution of Unix, has run into snags since Oracle bought Sun in February.

Frustrated by the lack of open discourse from Oracle, the OpenSolaris Governing Board (OGB) has issued an ultimatum. The accompanying article, “OpenSolaris Board May Quit Over Oracle’s Silence,” reports that the board wants Oracle to appoint a liaison by August 16. The significance is that if Oracle misses that deadline and the board quits, control of OpenSolaris would be returned to Oracle.

Oracle might not worry too much about these things. It is a commercial outfit and, all in all, walkouts do attract attention. Still, at least one board member understands how damaging these things can be to a firm’s reputation, according to comments made by John Plocher.

Will there be a reconciliation? In Hollywood, the publicists decide. The addled goose is happy with the Lucid Imagination approach.

Rob Starr, July 19, 2010

Freebie

Search and Fax from Oracle Purchasing: Easy as Pie

July 17, 2010

Every once in a while I realize why Oracle database administrators are thrilled with their jobs. I don’t use facsimiles too often any more. I still have a fax machine, and I use it to communicate with one of my three legal eagles. None of these folks is into electronic mail, SMS, or taking telephone calls. A post on the San Francisco Blogger Community reminded me of the hula hoops that Oracle administrators must keep swinging to accomplish simple tasks.

Assume you have run a query on the Oracle table and you have the information required in one of those plain Jane Oracle text reports. You have to fax that report to 20 field offices. Do you code the fax numbers into the Brother MFC 8820D or do you write a custom script, pass the Oracle output to the Omtool Genifax system and sit back thinking about your next vacation?

Oracle certified professionals may go with the vacation method because of the complexity of the process required to perform a quite simple task. Navigate to “How Would I transfer the Info from a PO in Oracle to a Text File That Genifax Can Pick Up and Fax Out?”

The key is an “indispensable report to a content record.” Any questions about why a CFO’s effort to replace an Oracle database centric system with a lower cost, more programmer friendly system runs into trouble? Thought not.

The Oracle administrator, when it comes to retrieving a record and faxing it, has more power than the top bean counter. That’s why NoSQL vendors face significant push back when pitching alternatives to the decades old Codd database, in my opinion. From my vantage point in the goose pond, the cost of using certain traditional enterprise information systems may force abrupt, non linear change.

For some organizations, the shift will come too late in the game to have a material impact, I fear.

Stephen E Arnold, July 17, 2010

US Search Start Ups May Struggle for Funding

July 17, 2010

“Venture Capitalists Not Finding Funding Either” may mean good news for pharmacies selling Pepto-Bismol but bad news for search start ups. The write up said:

According to Thomson Reuters and the NVCA today, thirty-eight U.S. venture capital firms raised $1.9 billion in the second quarter of 2010, down 49 percent compared to Q1 this year, when 38 funds raised $3.7 billion. Thomson Reuters and the NVCA said that the quarter is the lowest–based on dollar commitments–since the third quarter of 2003.

I have heard that a number of search and content processing vendors are gasping for air. There’s an outfit in Chicago looking for funds or a buyer. There’s a vendor out west sweating bullets. There have been some rumors of trouble at one high profile outfit.

Without friendly VCs looking to fund the next Google, search start ups may struggle for funding.

Stephen E Arnold, July 17, 2010

Lucene Revolution Conference Details

July 15, 2010

The Beyond Search team received an interesting news release from a reader in San Francisco. We think the information reveals the momentum that is building for open source search. Here’s the story as we received it:

San Mateo, Calif. – July 14, 2010 – Lucid Imagination, the commercial company for Apache Lucene and Solr open source search technologies, is pleased to announce speakers for Lucene Revolution, the first-ever conference in the US devoted to open source search. The conference will take place October 7-8, 2010 at the Hyatt Harborside, Boston, Massachusetts. Lucene Revolution is a groundbreaking event that drives broad participation in open source enterprise search, creating opportunities for developers, technologists and business leaders to explore the disruptive new benefits that open source enterprise search makes possible, in a fresh, energetic and forward thinking format.

The diverse and widespread adoption of Lucene/Solr for enterprise search applications is reflected by the broad range of speakers at the event, such as:

  • Cisco Systems: Satish Gannu
  • eHarmony: Joshua Tuberville
  • LinkedIn: John Wang
  • Sears: David Oliver
  • The McClatchy Company: Martin Streicher
  • The Smithsonian: Ching-Hsien Wang
  • Twitter: Michael Busch

Conference speakers represent a cross-section of Lucene/Solr adoption – including new media, ecommerce, embedded search applications, content management, social media, and security and intelligence – spanning the broad spectrum of production-class enterprise search implementations, all of whom leverage the power and economics of Lucene/Solr innovation.

Other industry thought leaders participating and sharing their insights into open source enterprise search include Hadley Reynolds (Research Director, Search & Digital Marketplace Technologies, IDC) and Stephen E. Arnold (Beyond Search; Managing Partner, ArnoldIT).

Over the two days of the conference there are over 30 sessions scheduled in a variety of different formats: technical presentations, use cases, panel discussions, and Q&A sessions. In addition there will be an “un-conference” the evening of October 7, where attendees can present lightning talks and take part in hands-on community coding efforts.

Registration for Lucene Revolution is now open at: http://www.lucenerevolution.com/register. A full list of speakers, along with a complete conference agenda, is available at http://www.lucenerevolution.com/agenda.

If you are not familiar with Lucid, here’s a snapshot:

Lucid Imagination is the commercial company dedicated to Apache Lucene technology. The company provides value-added software, documentation, commercial-grade support, training, high-level consulting, and free certified distributions, for Lucene and Solr. Lucid Imagination’s goal is to serve as a central resource for the entire Lucene community and search marketplace, to make enterprise search application developers more productive. Customers include AT&T, Sears, Ford, Verizon, Elsevier, Zappos, The Motley Fool, Macy’s, Cisco, HP, The Guardian and many other household names. Lucid Imagination is a privately held venture-funded company. Investors include Granite Ventures, Walden International, In-Q-Tel and Shasta Ventures. To learn more please visit www.lucidimagination.com.

Goslings Constance Ard and Dr. Tyra Oldham will be attending. Should be useful. Certainly more timely than the plethora of SharePoint and gasping one-size-fits-all programs. Honk.

Stephen E Arnold, July 15, 2010

Sponsored post.

Ontoprise Bids to Stay on Top of Semantic Web Technologies

July 15, 2010

Ontoprise GmbH from Germany is looking to increase its already impressive line of Semantic Web infrastructure products with OntoBroker 6.0 and OntoStudio 3.0.

The company is constantly looking to improve its web services, and to that end it has developed ways to interface existing technologies with its OntoBroker web services. Along with an overall promise to improve performance, there are a few specific areas that the company has highlighted, including:

  • A collaboration server that has extended rights management
  • Ontology optimizing tools that are integrated
  • Improved handling of very large ontologies

Ontoprise has been an industry leader in reliable and technologically advanced semantic Web products, delivering key elements for upcoming advancements in the semantic Web.

Rob Starr, July 16, 2010

Freebie

Lucene Revolution Preview: Otis Gospodnetic, Sematext

July 13, 2010

The Lucene Revolution Conference is shaping up. Among the presenters are open source developers representing a wide range of organizations. One of the speakers is Otis Gospodnetic, Sematext’s founder. Mr. Gospodnetic is also the author of Lucene in Action with co-authors Erik Hatcher and Michael McCandless. His firm implements open source search, natural language processing, and text analytics technology in the enterprise. His team focuses on the design and development of scalable, high-performance search and solutions.

I spoke with Mr. Gospodnetic earlier this week. Here are the highlights of our conversation:

Why are you interested in Lucene/Solr?

I’ve always been interested in information gathering, information extraction, search, and related areas. I think that’s because I feel that information gathering, extraction, and searching are precursors for gaining knowledge, and knowledge has always been a hobby of mine. If I look back at all my professional experience, everything I ever built had a strong search component. This is why I was happy when I stumbled upon Lucene around 2000 and why I immediately joined the project, even before it was an Apache project, and why I’ve been using Lucene ever since.

What is your take on the community aspect of Lucene/Solr?

The community around Lucene and Solr is as real and as alive and active as it can be. It’s very knowledgeable and quick to help. I’ve been a part of it for around 10 years now, and have witnessed the community grow, as well as its knowledge breadth and depth increase.

When it comes to the Lucene/Solr community, the quote I like to give comes from the former Netflix search guy:

I posted, went to get a sandwich, and came back to see two answers. The change works, and I can get the fix into production today. This list is magic.

Both user and development communities are so strong and active that it’s becoming really hard for people to keep up with the volume of output these communities produce. Earlier this year we started publishing monthly Lucene and Solr Digest blog posts. These posts are for people who want to keep up with (or keep an eye on) Lucene and Solr, but don’t have the time to read some 60+ non-trivial-to-read email messages these communities produce every day. See http://blog.sematext.com/ or http://twitter.com/sematext . I hope we are not going through the trouble of getting this published every month just because of some mythical community!

Commercial companies are playing what I call the “open source card.” Won’t that confuse people?

Judging from the demand, I’d say this is not confusing to people. On the contrary, I get the feeling they like the open-source/commercial blend. Plus, there is precedent – commercial support for open-source software has been around for many years now: MySQL and Red Hat have been doing this for years. Not only is this not confusing, it is welcomed. Some people and organizations love and can rely on the community support. Others prefer paid support. At Sematext we do both – some of us participate on Lucene/Solr mailing lists helping as much as we can via that channel. We also publish the already mentioned monthly Lucene and Solr Digest posts that summarize the new and interesting developments from those two projects, and we offer paid tech support and other types of services for Lucene, Solr, Hadoop, and other related technologies.

What are the primary benefits of using Lucene/Solr?

Let me highlight the points my work has driven home as pivotal.

First, there is the notion of TCO or total cost of ownership. TCO is *much* lower. There are no license fees, no limitations on the index size, query rates, number of servers, etc.

Second, Lucene/Solr offer flexibility. If you don’t like how something works in Lucene/Solr, you can change it today and deploy it tomorrow. If your use case is good, the community will adopt it and you won’t have to maintain your customized, forked Lucene/Solr version.

Third, quality. Lucene and Solr are mature. They’ve been worked on by many smart people 24/7 around the world for more than 10 years. These people work on Lucene/Solr because that is their passion, not because they are paid to do so, except for the lucky few who also get paid to work on what they love. Lucene and Solr can do a lot – they have lots of features, they are reliable, they are still being worked on and are improved on a daily basis.

And, finally, agility: You need search? You can have something working today. You don’t have to go through budget approvals, through long sales and negotiation cycles, you don’t have to go through wine and dine dates that just create delays that ultimately increase your costs.

When someone asks you why you don’t use a commercial search solution, what do you tell them?

I tell them to wake up. It’s 2010. There are alternatives. Cheaper. Faster. Better. I tell them to read the answers to the previous questions. When I see how much some (all?) of the commercial search solutions cost and I compare that to what we at Sematext can do for a customer for that sort of money… I recently happened to see a quote from one well-known commercial search vendor and my jaw dropped. Well, not really, because I know they charge an arm and a leg for their software, but when you think about how many kids you can put through college for that kind of money.

Let me also quote something that came up recently in a thread titled “Arguments in Favor of Lucene over Commercial Competition”.

In my initial foray into Lucene several years ago, by the time I’d sent a support request to the vendor of a commercial product and received an answer telling me that I hadn’t included the correct license info and I’d have to provide it before they could talk to me, I’d found Lucene, downloaded it, indexed some of our data and run searches against it. Not to mention that rather than waiting for days to get a response from the commercial vendor, my questions on the Lucene user’s list were answered within a very few hours. With grace and tolerance for my ignorance.

How do people reach you?

Sematext is at http://sematext.com/ and that is the best way to reach the professional me. Our blog and the Digest posts mentioned earlier are at http://blog.sematext.com/ . We are also at http://twitter.com/sematext if you prefer us in 140 char bites.

Will you elaborate on these points in your Lucene Revolution lecture?

Absolutely. Looking forward to the conference and hearing the great speakers. I understand Cisco is giving a talk too.

Stephen E Arnold, July 13, 2010

Post sponsored by Lucid Imagination and the Lucene Revolution Conference.

Brainware and Paper

July 12, 2010

I used to work with Harvey Poppel. You, gentle reader, will not remember Harvey, who invented Harvey Balls. He was the Booz, Allen guy who coined the phrase “the paperless office.” Like many of the BAH clan in the 1970s, there were some smart, prescient dudes sprinting up and down the staircase between floors 25 and 25 at 245 Park Avenue South. Harvey was into the digitization of memos, reports, presentations, and other hard copy effluvia.

Problem. Paper remains popular. The paperless office is not yet a reality even though another New Yorker, Alan Siegel, worked long and hard on the paperwork reduction stuff for years. Woody Horton tried his hand at this goal. I remember being in a meeting in June 2010 when the notion of a paperless operation floated from the blather.

That’s 35 or 40 years, right? Harvey had a good idea.

The reason this is important is that the search vendor Brainware has discovered a source of business hooking up with outfits converting paper into digital information. Several other search companies are nosing around this market sector. I don’t want to sneeze when I get too close to converting paper into searchable ASCII, however.

You can read about Brainware’s deal with OPEX. The story is “Brainware and OPEX Partner to Deliver Scan to Post Automation.” The write up says:

By implementing a combined OPEX and Brainware Distiller solution, companies can streamline the entire document processing cycle, including reducing the tedious and expensive steps of removing the documents from envelopes or file folders, prepping those documents for scanning, and then actually performing the scanning operation. The net result of this solution is that related documents, envelopes and transactions can be kept together and handled a minimum number of times, allowing these items to be quickly and easily routed and processed through companies’ Accounts Payable, Accounts Receivable, Customer Support, Legal departments, and others — all while delivering the unparalleled touchless pass rates, instant visibility, reduced cycle times, and error reduction for which Distiller is known.

Google’s in this business too. Too bad for Harvey. As smart as he was, he missed his call for a paperless office. Converting hard copy to searchable ASCII may not be exciting but it is revenue to Brainware and a source of legal thrills for Google.

Stephen E Arnold, July 12, 2010

Freebie

Autonomy, CA, and Enterprise Message Manager

July 12, 2010

Your weekly dose of Autonomy goodness follows:

Message Manager, the popular enterprise search engine, just boosted its capabilities when it was snapped up by Autonomy Corporation. Red Orbit announced this leap forward in a recent article, “Autonomy Announces Availability of Idol-Based CA Message Manager,” and showcased the ways IDOL, Autonomy’s meaning based search platform, will enhance Message Manager.

“The integration of Autonomy IDOL into Message Manager brings advanced automation to information governance tasks based on IDOL’s ability to understand the meaning of information,” the article says. “This significantly reduces the levels of manual effort for classifying, monitoring and managing large and growing volumes of data.”

In addition, current Message Manager users will receive an upgrade of sorts, including, “access to more than 400 connectors and over 1,000 file types, including text, audio and video.”

We see more and more mergers like this, which clearly point toward the growing power of search.

Pat Roland, July 12, 2010

Freebie but the goose wants some stale bread for writing so much news about the Cambridge kids.

Publishing Help is on the Way?

July 11, 2010

Publishing remains one of the toughest industries to be a part of, but one aspect of this work just got a lot easier thanks to OpenPublish. This Calais-powered publishing suite comes as a direct result of the pairing of Thomson Reuters with Phase2 Technology. The result is a Drupal compatible program aimed at helping small and medium-sized publishers reduce costs and get more bang from offline content.

OpenPublish supports a variety of content, ranging from articles to blogs, and even includes content monetization tools. In addition, it helps build an online presence with functionality for email forwarding, social bookmarking, RSS feeds, and reader comments.

This is an exciting development for any publisher because, with rising production costs and competition from other media, this industry needs help catching up. With many commercial content management systems crashing on the rocks, this solution may be worth a close look. The open source magnetism may be a plus too.

Pat Roland, July 11, 2010

Freebie
