GooNews: Google Dooms Some Commercial Database Publishers

September 9, 2008

I have been mired in family business about 90 miles south of Chicago. I was unfortunately unable to add my two cents to the Web wave of comments about Google’s scanning newspapers. Anyone remember University Microfilms, the outfit that put newspapers on–yuck–microfilm?  Techmeme and Megite have dozens of posts about Google scanning newspapers, and I doubt that my telling you that Google is supplementing its book scanning activities will add much to your day.

My angle on this announcement by Google here is rotated about six degrees off the Web buzz.

First, you can kiss most commercial database publishers good-bye as great investments. Customers are tired of paying through the nose for “real” databases. The idea is that Google makes “toy” databases. Wrong. Google is collecting information and making it available with a business model that allows searching for free. Google’s business model is a big earth mover grinding down traditional media. Most traditional media mavens hear crunching but have not connected the noise with the footfalls of the GOOG.

Second, you can ignore those Monday Night Football ads from Thomson Reuters. There were more buzz words about intelligent information and professionals than I could process. Advertising is not going to sell search queries that cost anywhere from $5 to $500 per query. Yes, $500. Fire up Derwent. Hunt for Google patents. Poke around for prior art and let me know how much you pay to search and save your results. Google Patents may not be perfect, but access is free. Ads on Monday Night Football won’t sell searches on WestLaw–ever.

Third, the yip yap of competitors, advertisers, and Google critics won’t make a single iota of difference to what Google is doing. I have been documenting for clients and for readers of my monographs that Google is a supra national enterprise. So tell me, “Who is going to regulate Google?” One wealthy wizard screamed at me when I hinted that Google could fold its tent and move to another country without much downtime. When I suggested Russia and mentioned Mr. Brin’s interest in going into space, the wealthy wizard foamed at the mouth. I think he threw a pencil at me. If GooNews wipes out companies in the archived news business, to whom does one complain?

In short, GooNews is the start of a new era at Google. I dubbed the company Googzilla in 2005. No one paid much attention. Bet those folks at ProQuest and Newsbank are perking up now. Agree? Disagree? Help me learn. Just bring facts.

Stephen Arnold, September 9, 2008

Chrome: What It Isn’t

September 8, 2008

The writing is a bit salty, but I found Ted Dziuba’s post at http://teddziuba.com/2008/09/a-web-os-are-you-dense.html quite interesting. Mr. Dziuba asserts that Chrome is not a Web operating system. I agree. The most interesting comment in his article “A Web OS. Are You Dense?” was:

The “Web Operating System” just highlights how much journalists don’t know about computers.

I must admit that I enjoyed this post. A happy quack to Mr. Dziuba. More, please. You have an invitation to contribute to my wimpy Web log as well.

Stephen Arnold, September 8, 2008

Traditional Media Figures Out that Sci-Tech Publishing Is Threatened

September 8, 2008

But do the sci-tech publishers know what’s happening? Navigate to http://www.msnbc.msn.com/id/26512717/ and read “Era of Scientific Secrecy Ends.” The article, written by Robin Lloyd, provides a round up of information changes that will roil the world of sci-tech publishing. Ms. Lloyd does not focus on publishing, but her analysis makes clear that “open science” will bring further changes of sci-tech information access. What’s amazing is that this article comes from two companies struggling to keep pace with similar changes. When I read this good article, I thought of the captain and his officers on the Titanic when the ship was sinking. Insight came too late.

Stephen Arnold, September 8, 2008

Intel: LTU Talks Up Next Generation Processors

September 8, 2008

Update: September 8, 2008, 8:12 am Eastern

More about the Intel quad push is at http://www.yourdesktopinnovation.com/

Original Post

Another item about Intel and search. LTU offers an image processing system that law enforcement professionals find useful in certain matters. But LTU’s technology needs processing horsepower. The company had a deal to embed its image classification technology in a consumer video device, but that was slow out of the gates. The reason, according to my sources, was performance.

At the Intel Developer Forum, LTU showed its image processing system running on Intel’s zippy i7 processor. I can’t keep the names straight anymore, but this processor features more cores on die and more cache plus speed ups for computationally intensive applications such as image and content processing.

The crowd loved the demonstration, which should make Intel happy. Search vendors need a way to crank up the performance of their systems. Throwing hardware at search bottlenecks may not be the really smart way to solve problems, but it is one that does not require the search vendors to tackle harder problems such as input output and clunky code in their search systems.

I think Endeca will follow in LTU’s foot steps. Intel is poking around the periphery of search, and the company is going to have to take positive action if it wants to do more than sell chips. My hunch is that smart devices with search and content processing functions on board might be an avenue Intel might investigate.

LTU, in case you are not familiar with the company, is French. The company was founded in 1999 by software and engineering wizards. LTU Technologies provides multimedia content control solutions. Its patented technology is used by the French Gendarmerie Nationale and the Italian state police; agencies investigating traffic in cultural goods and stolen objects (OCBC of the French National Police); as well as commercial media organizations such as Corbis and Meredith Corporation. You can get more information about the company at http://www.ltutech.com.

Stephen Arnold, September 8, 2008

Oracle Teams with ekiwi

September 8, 2008

ekiwi, based in Provo, Utah, has formed a relationship with Oracle. The company was founded in 2002. It focuses on Web based data extraction. The firm’s Screen-Scraper technology is, the news release asserts, “platform-independent and designed to integrate with virtually any existing information technology system.”

The company describes Screen Scraper this way here:

It consists of a proxy server that allows the contents of HTTP and HTTPS requests to be viewed, and an engine that can be configured to extract information from Web sites using special patterns and regular expressions. It handles authentication, redirects, and cookies, and contains an embedded scripting engine that allows extracted data to be manipulated, written out to a file, or inserted into a database. It can be used with PHP, .NET, ColdFusion, Java, or any COM-friendly language such as Visual Basic or Active Server Pages.
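The pattern-based extraction described in that passage can be illustrated with a minimal Python sketch. This is not ekiwi’s code or API. The HTML fragment, the pattern, and the function name `extract_phones` are all hypothetical, chosen only to show how a regular expression with named groups turns page markup into records that could then be written to a file or a database.

```python
import re

# Hypothetical HTML fragment standing in for a fetched product page.
SAMPLE_HTML = """
<div class="phone"><span class="model">Alpha 100</span><span class="price">$49.99</span></div>
<div class="phone"><span class="model">Beta 200</span><span class="price">$129.00</span></div>
"""

# Extraction pattern (an assumption for this sample markup):
# one named group per field to pull out of the page.
PHONE_PATTERN = re.compile(
    r'<span class="model">(?P<model>[^<]+)</span>'
    r'<span class="price">\$(?P<price>[\d.]+)</span>'
)


def extract_phones(html):
    """Return one dict per pattern match found in the HTML."""
    return [
        {"model": m.group("model"), "price": float(m.group("price"))}
        for m in PHONE_PATTERN.finditer(html)
    ]


records = extract_phones(SAMPLE_HTML)
for record in records:
    print(record["model"], record["price"])
```

A pattern like this is tied to the markup of one site, which is presumably why a commercial tool wraps the same idea in a proxy, a scripting engine, and handling for authentication, redirects, and cookies.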

Oracle’s revenues are in the $18 to $20 billion range. ekiwi’s revenues may be more modest. Oracle, however, has turned to ekiwi for screen scraping technology to enhance the content acquisition capabilities of Oracle’s flagship enterprise search system, Secure Enterprise Search 10g or SES10g. In May 2008, one of Oracle’s senior executives told me that SES10g was a key player in the enterprise search arena and SES10g sold because it was secure. Security, I recall being told, was the key differentiation.

This deal suggests that SES10g has to turn to up-and-coming screen scraping vendors to expand the capabilities of SES10g. I’m still puzzling over this deal, but that’s clearly my inability to understand the sophisticated management thinking that fuels SES10g to its lofty position among the search and content processing vendors.

The news release makes it clear that ekiwi can access content from the “deep Web”. This buzzword means to me dynamic, database-driven sites. Google has its “deep Web” technologies which may be in part described in its five Programmable Search Engine patents, published by the USPTO as patent applications in February 2007.

ekiwi, which offers a very useful Web log here, is:

…a member of the Oracle PartnerNetwork, has worked with Oracle to develop an adaptor that integrates ekiwi’s Screen Scraper with Oracle Secure Enterprise Search to help significantly expand the amount of enterprise content that can be searched while maintaining existing information access and authorization policies. The Oracle Secure Enterprise Search product provides a secure, easy-to-use enterprise search platform that connects to a broad range of enterprise applications and data sources.

The release continues:

The two technologies have already been coupled in a number of cases that demonstrate their ability to work together. In one instance cell phones from many of the major providers were crawled by Screen-Scraper and indexed by Oracle Secure Enterprise Search. A user shopping for cell phones is then able to search, filter, and browse from a single location the various cell phone models by attributes such as price, form factor, and manufacturer. In yet another case, Screen-Scraper was used to extract forum postings from various photography aficionado web sites. This information was then made available through Oracle Secure Enterprise Search, which made it easy to conduct internal marketing analysis on recently released cameras.

I did some poking around and came up short after a quick look at my files and running a couple of Web searches. Information is located, according to the news story about the deal, here. The url is http://www.screen-scraper.com/ss4ses/. The link redirected me to http://www.w3.org/Protocols/. The company’s Web site is at http://www.screen-scraper.com, and it looks like this on September 7, 2008, at 8 pm Eastern:

[Image: Screen-Scraper splash page]

I am delighted that SES10g can acquire Web-based content in dynamic systems. I remain confused about the functions included with SES10g. My understanding was that SES10g was easily extensible, compatible with Oracle Applications, Fusion, and other Oracle technologies. If this were true, SES10g’s ability to pull content from databased services should be trivial for the firm’s engineering team. I was hoping for an upgrade to SES10g, but that seems not to be in the cards at this time. Scraping Web pages seems to be a higher priority than getting a new release out the door. What’s your understanding of Oracle’s enterprise search strategy? I’m confused. Help me out, please.

Stephen Arnold, September 8, 2008

New Beyond Search White Paper: Coveo G2B for Mobile Email Search

September 8, 2008

The Beyond Search research team prepared a white paper about Coveo’s new G2B for Email product. You can download a copy from us here or from Coveo here. Coveo’s system works across different mobile devices, requires no third-party viewers, delivers low-latency access when searching, evidenced no rendering issues, and provided access to contacts and attachments as well as the text in an email. When compared to email search solutions from Google, Microsoft, and Yahoo, Coveo’s new service proved more robust and functional. Beyond Search identified 13 features that set G2B apart. These include a graphical administrative interface, comprehensive usage reports, and real time indexing of email. The Beyond Search research team (Stephen Arnold, Stuart Schram, Jessica Bratcher, and Anthony Safina) concluded that Coveo established a new benchmark for mobile email search. For more information about Coveo, navigate to www.coveo.com. Pricing information is available from Coveo.

Stephen Arnold, September 5, 2008

Life before Google: History from Gen X

September 7, 2008

When I am in the UK, I enjoy reading the London papers. The Guardian often runs interesting and quirky stories. My newsreader delivered to me “Life before Google” by Kevin Anderson who was in college in the 1990s. Ah. Gen X history. I dived right in, and you may want to read this article here. After a chronological run down of Web search (happily ignoring the pre-Web search systems), Mr. Anderson wrote:

Using the almost 250 year-old theories British mathematician and Presbyterian minister Thomas Bayes, Page and Brin developed an algorithm to analyse the links to a site, helping to predict what sites were relevant to search terms.

This is a comment that is almost certain to catch the attention of Autonomy, the British vendor that has claimed Bayesian methods as its core technology.

Then Mr. Anderson added:

Google hasn’t solved search. There is still the so-called dark web, or deep web – terabytes of data that aren’t searchable or indexed.

Mr. Anderson, despite his keen Gen X intellect, overlooked Google’s Programmable Search Engine inventions and this query on Google: air schedule LGA SFO. The result displayed is:

[Image: Google results for air schedule LGA SFO]

What you are looking at is a “deep Web” search result. Mr. Anderson also overlooked the results for Baltimore condo.

The results displayed when I ran this search on September 6, 2008, at 7:10 pm Eastern were:

[Image: Google results for Baltimore condo]

Yep, another “deep Web” search.

What’s the problem with Gen X research for Mr. Anderson’s article? I think for this article it was shallow. Much of the analysis of Google is superficial, incomplete, and misleading in my opinion. Agree or disagree? Help me learn.

Stephen Arnold, September 7, 2008

WordLogic, Codima: Entering the Search War

September 6, 2008

WordLogic (Vancouver, BC) and Codima (Edmonton, AB) have teamed in a joint venture to develop Web search technology. Not much information is available on the tie up. Mediacaster Magazine has a short announcement of the deal here. WordLogic has carved a path for itself in mobile device interfaces. Codima is a VoIP specialist. More information about this company is here. Mobile search is attracting interest from Google and Yahoo. Coveo, another Canadian outfit, has a mobile email search service that looks very solid. As more information becomes available about the WordLogic and Codima play, I will pass the information along.

Stephen Arnold, September 6, 2008

TinEye: Image Search

September 5, 2008

A happy quack to the reader who tipped me about TinEye, a search system that purports to do for images what the GOOG did for text. The story about TinEye that I saw appeared in the UK computer news service PCPro.co.uk. The story “Visual Search Engine Is Photographer’s Best Friend” is here. The visual search engine was developed by Idée, based in Toronto. The company says:

TinEye is the first image search engine on the web to use image identification technology. Given an image to search for, TinEye tells you where and how that image appears all over the web—even if it has been modified.

The image index contains about one billion images. Search options include uploading an image for the system to pattern match, entering an image url, or using a plug in for Firefox or Internet Explorer.

Search results are displayed graphically. You can explore the images with a mouse click. One interface appears below:

[Image: TinEye search results interface]

The technology powering the service is Espion. I couldn’t locate a public demonstration of the service. You can request a demonstration of the system here. Toronto is becoming a hot bed of search activity. Arikus and Sprylogics both operate there. OpenText has an office. Coveo is present. I will add this outfit to my list of Canadian vendors.

Stephen Arnold, September 5, 2008

Blossom Search for Web Logs

September 5, 2008

Over the summer, several people have inquired about the search system I use for my WordPress Web log. Well, it’s not the default WordPress engine. Since I wrote the first edition of Enterprise Search Report (CMSWatch.com), I have had developers providing me with search and content processing technology. We’ve tested more than 50 search systems in the last year alone. After quite a bit of testing, I decided upon the Blossom Software search engine. This system received high marks in my reports about search and content processing. You can learn more about the Blossom system by navigating to www.blossom.com. Founded by a former Bell Laboratories’ scientist, Dr. Alan Feuer, Blossom search works quickly and unobtrusively to index content of Web sites, behind-the-firewall, and hybrid collections.

You can try the system by navigating to the home page for this Web log here and entering the search phrase in quotes “search imperative” and you will get this result:

[Image: Blossom search results for “search imperative”]

When you run this query, you will see that the search terms are highlighted in red. The bound phrase is easily spotted. The key words in context snippet makes it easy to determine if I want to read the full article or just the extract.

Most Web log content baffles some search engines. For example, recent posts may not appear. The reason is that the index updating cycle is sluggish. Blossom indexes my Web site on a daily basis, but you can specify the update cycle appropriate to your users’ needs and your content. I update the site at midnight of each day, so a daily update allows me to find the most recent posts when I arrive at my desk in the morning.

The data management system for WordPress is a bit tricky. Our tests of various search engines identified three issues that came up when third-party systems were launched at my WordPress Web log:

  1. Some older posts were not indexed. The issue appeared to be the way in which WordPress handles the older material within its data management system.
  2. Certain posts could not be located. The posts were indexed, but the default OR for phrase searching displayed too many results. With more than 700 posts on this site, the precision of the query processing system was not too helpful to me.
  3. Current posts were not indexed. Our tests revealed several issues. The content was indexed, but the indexes did not refresh. The cause appeared to be a result of the traffic to the site. Another likely issue was WordPress’ native data management set up.
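The second issue in the list, a default OR swamping a phrase query, is easy to see in a small Python sketch. This is an illustration only, not the query logic of Blossom or of any system we tested. The function names and the three sample posts are hypothetical, and the tokenizer is a bare whitespace split.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def matches_or(query, doc):
    """True if the document contains ANY query term (default OR behavior)."""
    doc_terms = set(tokenize(doc))
    return any(term in doc_terms for term in tokenize(query))


def matches_phrase(query, doc):
    """True only if the query terms appear adjacent and in order."""
    q = tokenize(query)
    d = tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical post titles standing in for an indexed Web log.
POSTS = [
    "the search imperative for enterprise systems",
    "google search news",
    "a moral imperative for publishers",
]

or_hits = [p for p in POSTS if matches_or("search imperative", p)]
phrase_hits = [p for p in POSTS if matches_phrase("search imperative", p)]

print(len(or_hits), "hits with OR;", len(phrase_hits), "hit with the bound phrase")
```

With only three sample posts the OR query already returns every one of them, while the bound phrase returns the single post that contains the words side by side. Scale that to 700 posts and the precision problem is plain.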

As we worked on figuring out search for Web logs, two other issues became evident. The first was redundant hits, since there are multiple paths to the same content. The second was incorrect time stamps, since all of the content is generated dynamically. Blossom has figured out a way to make sense of the dates in Web log posts, a good thing from my point of view.
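One common way to suppress redundant hits is to fingerprint the content of each hit and keep only the first URL that produces a given fingerprint. The Python sketch below shows that general technique. How Blossom actually handles duplicates is not described here, so treat this as an assumption-laden illustration: the URLs, the sample text, and the function names are all hypothetical.

```python
import hashlib


def content_key(text):
    """Fingerprint the text after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def dedupe(hits):
    """Keep the first (url, body) pair seen for each distinct body."""
    seen = set()
    unique = []
    for url, body in hits:
        key = content_key(body)
        if key not in seen:
            seen.add(key)
            unique.append((url, body))
    return unique


# Hypothetical hits: the first two URLs are different paths to one post.
HITS = [
    ("/2008/09/05/blossom-search/", "Blossom search for Web logs"),
    ("/category/search/blossom-search/", "Blossom  search for web logs"),
    ("/2008/09/05/tineye/", "TinEye image search"),
]

unique_hits = dedupe(HITS)
for url, _ in unique_hits:
    print(url)
```

An exact fingerprint only catches copies that are identical after normalization. Pages that wrap the same post in different navigation or sidebars would need the post body isolated first, or a fuzzier comparison.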

The Blossom engine operates for my Web log as a cloud service; that is, there is no on premises installation of the Blossom system. An on premises system is available. My preference is to have the search and query processing handled by Blossom in its data centers. These deliver low latency response and feature fail over, redundancy, and distributed processing.

The glitches we identified to Blossom proved to be no big deal for Dr. Feuer. He made adjustments to the Blossom crawler to finesse the issues with WordPress’ data management system. The indexing cycle does not choke my available bandwidth. The indexing process is light weight and has not made a significant impact on my bandwidth usage. In fact, traffic to the Web log continues to rise, and the Blossom demand for bandwidth has remained constant.

We have implemented this system on a site run by a former intelligence officer, which is not publicly accessible. The reason I mention this is that some cloud based search systems cannot conform to the security requirements of Web sites with classified content and their log in and authentication procedures.

The ArnoldIT.com site, which is the place for my presentations and occasional writings, is also indexed and searched with the Blossom engine. You can try some queries at http://www.arnoldit.com/sitemap.html. Keep in mind that the material on this Web site may be lengthy. ArnoldIT.com is an archive and digital brochure for my consulting services. Several of my books, which are now out of print, are available on this Web site as well.

Pricing for the Blossom service starts at about $10 per month. If you want to use the Blossom system for enterprise search, a custom price quote will be provided by Dr. Feuer.

If you want to use the Blossom hosted search system on your Web site, for your Web log, or your organization, you can contact either me or Dr. Alan Feuer by emailing or phoning:

  • Stephen Arnold seaky2000 at yahoo dot com or 502 228 1966.
  • Dr. Alan Feuer arf at blossom dot com

Dr. Feuer has posted a landing page for readers of “Beyond Search”. If you sign up for the Blossom.com Web log search service, “Beyond Search” gets a modest commission. We use this money to buy bunny rabbit ears and paté. I like my logo, but I love my paté.

Click here for the Web log search order form landing page.

If you mention Beyond Search, a discount applies to bloggers who sign up for the Blossom service. A happy quack to the folks at Blossom.com for an excellent, reasonably priced, efficient search and retrieval system.

Stephen Arnold, September 5, 2008
