New Beyond Search White Paper: Coveo G2B for Mobile Email Search
September 8, 2008
The Beyond Search research team prepared a white paper about Coveo’s new G2B for Email product. You can download a copy from us here or from Coveo here. Coveo’s system works across different mobile devices, requires no third-party viewers, delivers low-latency access when searching, exhibits no rendering issues, and provides access to contacts and attachments as well as the text of an email. When compared with email search solutions from Google, Microsoft, and Yahoo, Coveo’s new service proved more robust and functional. Beyond Search identified 13 features that set G2B apart, including a graphical administrative interface, comprehensive usage reports, and real-time indexing of email. The Beyond Search research team (Stephen Arnold, Stuart Schram, Jessica Bratcher, and Anthony Safina) concluded that Coveo has established a new benchmark for mobile email search. For more information about Coveo, navigate to www.coveo.com. Pricing information is available from Coveo.
Stephen Arnold, September 5, 2008
Life before Google: History from Gen X
September 7, 2008
When I am in the UK, I enjoy reading the London papers. The Guardian often runs interesting and quirky stories. My newsreader delivered to me “Life before Google” by Kevin Anderson, who was in college in the 1990s. Ah. Gen X history. I dived right in, and you may want to read this article here. After a chronological rundown of Web search (happily ignoring the pre-Web search systems), Mr. Anderson wrote:
Using the almost 250 year-old theories of British mathematician and Presbyterian minister Thomas Bayes, Page and Brin developed an algorithm to analyse the links to a site, helping to predict what sites were relevant to search terms.
This is a comment that is almost certain to catch the attention of Autonomy, the British vendor that has claimed Bayesian methods as its core technology.
Then Mr. Anderson added:
Google hasn’t solved search. There is still the so-called dark web, or deep web – terabytes of data that aren’t searchable or indexed.
Mr. Anderson, despite his keen Gen X intellect, overlooked Google’s Programmable Search Engine inventions and this query on Google: air schedule LGA SFO. The result displayed is a “deep Web” search result. Mr. Anderson also overlooked the results for the query Baltimore condo.
When I ran this search on September 6, 2008, at 7:10 pm Eastern, the results were, yep, another “deep Web” search.
What’s the problem with the Gen X research in Mr. Anderson’s article? I think it is shallow. Much of the analysis of Google is superficial, incomplete, and misleading in my opinion. Agree or disagree? Help me learn.
Stephen Arnold, September 7, 2008
WordLogic, Codima: Entering the Search War
September 6, 2008
WordLogic (Vancouver, BC) and Codima (Edmonton, AB) have teamed in a joint venture to develop Web search technology. Not much information is available on the tie-up. Mediacaster Magazine has a short announcement of the deal here. WordLogic has carved a path for itself in mobile device interfaces. Codima is a VoIP specialist. More information about this company is here. Mobile search is attracting interest from Google and Yahoo. Coveo, another Canadian outfit, has a mobile email search service that looks very solid. As more information becomes available about the WordLogic and Codima play, I will pass the information along.
Stephen Arnold, September 6, 2008
TinEye: Image Search
September 5, 2008
A happy quack to the reader who tipped me about TinEye, a search system that purports to do for images what the GOOG did for text. The story about TinEye that I saw appeared in the UK computer news service PCPro.co.uk. The story “Visual Search Engine Is Photographer’s Best Friend” is here. The visual search engine was developed by Idée, based in Toronto. The company says:
TinEye is the first image search engine on the web to use image identification technology. Given an image to search for, TinEye tells you where and how that image appears all over the web—even if it has been modified.
The image index contains about one billion images. Search options include uploading an image for the system to pattern match, entering an image URL, or using a plug-in for Firefox or Internet Explorer.
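Idée has not disclosed how its matching works. One common technique in this class is perceptual hashing, which produces a compact fingerprint that survives minor modifications such as uniform brightening. Here is a minimal sketch of an average hash on a hypothetical 4x4 grayscale image (the pixel values are invented for illustration):

```python
def average_hash(pixels):
    """Compute a simple average hash: one bit per pixel,
    set when the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of differing bits between two hashes;
    small distances suggest the same underlying image."""
    return sum(x != y for x, y in zip(a, b))

# A 4x4 grayscale "image" and a uniformly brightened copy.
original = [[10, 200, 30, 220],
            [15, 210, 25, 230],
            [12, 205, 28, 225],
            [11, 215, 22, 235]]
modified = [[value + 5 for value in row] for row in original]

h1, h2 = average_hash(original), average_hash(modified)
print(hamming(h1, h2))  # 0 -- uniform brightening leaves the hash unchanged
```

Because each bit records a pixel’s relation to the image mean rather than its absolute value, a global brightness shift does not change the fingerprint, which is the sort of robustness a service like TinEye would need.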
Search results are displayed graphically. You can explore the images with a mouse click.
The technology powering the service is Espion. I couldn’t locate a public demonstration of the service. You can request a demonstration of the system here. Toronto is becoming a hotbed of search activity. Arikus and Sprylogics both operate there. OpenText has an office. Coveo is present. I will add this outfit to my list of Canadian vendors.
Stephen Arnold, September 5, 2008
Blossom Search for Web Logs
September 5, 2008
Over the summer, several people have inquired about the search system I use for my WordPress Web log. Well, it’s not the default WordPress engine. Since I wrote the first edition of Enterprise Search Report (CMSWatch.com), I have had developers providing me with search and content processing technology. We’ve tested more than 50 search systems in the last year alone. After quite a bit of testing, I decided upon the Blossom Software search engine. This system received high marks in my reports about search and content processing. You can learn more about the Blossom system by navigating to www.blossom.com. Founded by a former Bell Laboratories’ scientist, Dr. Alan Feuer, Blossom search works quickly and unobtrusively to index content of Web sites, behind-the-firewall, and hybrid collections.
You can try the system by navigating to the home page for this Web log here and entering the phrase, in quotes, “search imperative”.
When you run this query, you will see that the search terms are highlighted in red. The bound phrase is easily spotted. The key words in context snippet makes it easy to determine if I want to read the full article or just the extract.
Most Web log content baffles some search engines. For example, recent posts may not appear. The reason is that the index updating cycle is sluggish. Blossom indexes my Web site on a daily basis, but you can specify the update cycle appropriate to your users’ needs and your content. I update the site at midnight of each day, so a daily update allows me to find the most recent posts when I arrive at my desk in the morning.
The data management system for WordPress is a bit tricky. Our tests of various search engines identified three issues that came up when third-party systems were launched at my WordPress Web log:
- Some older posts were not indexed. The issue appeared to be the way in which WordPress handles the older material within its data management system.
- Certain posts could not be located. The posts were indexed, but the default OR for phrase searching displayed too many results. With more than 700 posts on this site, the precision of the query processing system was not too helpful to me.
- Current posts were not indexed. Our tests revealed several issues. The content was indexed, but the indexes did not refresh. The cause appeared to be a result of the traffic to the site. Another likely issue was WordPress’ native data management set up.
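The precision problem in the second bullet is easy to demonstrate. The sketch below contrasts a default-OR query (match any term) with an exact phrase match; the post titles are invented for illustration, but the behavior is the general one, not a claim about any specific engine's internals:

```python
posts = [
    "the search imperative drives enterprise adoption",
    "search vendors compete on relevance",
    "an imperative for IT managers",
    "beyond search and content processing",
]

query = "search imperative"

# Default OR: a post matches if it contains ANY query term.
or_hits = [p for p in posts if any(t in p.split() for t in query.split())]

# Phrase match: a post matches only if the exact bound phrase appears.
phrase_hits = [p for p in posts if query in p]

print(len(or_hits))      # 4 -- every post matches at least one term
print(len(phrase_hits))  # 1 -- only the post with the bound phrase
```

Scale the four posts up to 700 and the OR default buries the one relevant hit, which is why phrase-aware query processing matters for a Web log of this size.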
As we worked on figuring out search for Web logs, two other issues became evident: redundant hits (since there are multiple paths to the same content) and incorrect time stamps (since all of the content is generated dynamically). Blossom has figured out a way to make sense of the dates in Web log posts, a good thing from my point of view.
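Blossom has not published how it handles the redundant-hit problem. A common approach is to reduce each URL to a canonical key before counting it as a distinct hit; this sketch, with made-up example URLs, shows the idea of collapsing the multiple paths WordPress generates for one post:

```python
from urllib.parse import urlparse

def canonical_key(url):
    """Reduce a URL to a dedup key: host plus path, ignoring query
    strings and trailing slashes that give several routes to one post."""
    parts = urlparse(url.lower())
    return parts.netloc + parts.path.rstrip("/")

hits = [
    "http://example.com/2008/09/chrome-terms/",
    "http://example.com/2008/09/chrome-terms",
    "http://example.com/2008/09/chrome-terms/?utm_source=feed",
    "http://example.com/2008/09/other-post/",
]

# Keep the first URL seen for each canonical key.
seen, unique = set(), []
for url in hits:
    key = canonical_key(url)
    if key not in seen:
        seen.add(key)
        unique.append(url)

print(len(unique))  # 2 -- the three chrome-terms variants collapse to one
```

Real crawlers also normalize things like session IDs and default index pages, but the host-plus-path reduction above is the core of the technique.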
The Blossom engine operates for my Web log as a cloud service; that is, there is no on-premises installation of the Blossom system. An on-premises system is available. My preference is to have the search and query processing handled by Blossom in its data centers. These deliver low-latency response and feature failover, redundancy, and distributed processing.
The glitches we reported to Blossom proved to be no big deal for Dr. Feuer. He adjusted the Blossom crawler to finesse the issues with WordPress’ data management system. The indexing process is lightweight and does not choke my available bandwidth. In fact, traffic to the Web log continues to rise, and Blossom’s demand for bandwidth has remained constant.
We have implemented this system on a site run by a former intelligence officer, which is not publicly accessible. The reason I mention this is that some cloud based search systems cannot conform to the security requirements of Web sites with classified content and their log in and authentication procedures.
The ArnoldIT.com site, which is the place for my presentations and occasional writings, is also indexed and searched with the Blossom engine. You can try some queries at http://www.arnoldit.com/sitemap.html. Keep in mind that the material on this Web site may be lengthy. ArnoldIT.com is an archive and digital brochure for my consulting services. Several of my books, which are now out of print, are available on this Web site as well.
Pricing for the Blossom service starts at about $10 per month. If you want to use the Blossom system for enterprise search, a custom price quote will be provided by Dr. Feuer.
If you want to use the Blossom hosted search system on your Web site, for your Web log, or your organization, you can contact either me or Dr. Alan Feuer by emailing or phoning:
- Stephen Arnold seaky2000 at yahoo dot com or 502 228 1966.
- Dr. Alan Feuer arf at blossom dot com
Dr. Feuer has posted a landing page for readers of “Beyond Search”. If you sign up for the Blossom.com Web log search service, “Beyond Search” gets a modest commission. We use this money to buy bunny rabbit ears and pâté. I like my logo, but I love my pâté.
Click here for the Web log search order form landing page.
If you mention Beyond Search, a discount applies to bloggers who sign up for the Blossom service. A happy quack to the folks at Blossom.com for an excellent, reasonably priced, efficient search and retrieval system.
Stephen Arnold, September 5, 2008
Google on Chrome: What We Meant Really… No, Really
September 4, 2008
You must read Matt Cutts’s “Google Does Not Want Rights to Things You Do Using Chrome”. First, click here to read the original clause about content and rights. Now read the September 3, 2008, post about what Google *really* meant to say here. I may be an addled goose in rural Kentucky, but I think the original statements in clause 11.1 expressed quite clearly Google’s mind set.
It sure seems to me that the two Google statements, the original clause 11.1 and Mr. Cutts’s statements, are at odds with one another. In large companies this type of “slip betwixt cup and lip” occurs frequently. What struck me as interesting about Google is that it is acting with what I call, for lack of a better term, “nerd imperialism”.
What troubles me is the mounting evidence in my files that Google can do pretty much what it wants. Mr. Cutts’s writing is a little like those textbooks that explain history to suit the needs of the school district or the publisher.
Google may house its lawyers one mile from Shoreline headquarters, but I surmise that Google’s legal eagles wrote exactly what Google management wanted. Further, I surmise that Google needs Chrome to obtain more “context” information from Chrome users. I am speculating, but I think the language of the original clause was reviewed, vetted, and massaged to produce the quite clear statements in the original version of clause 11.1.
When the firestorm flared, Google felt the heat and rushed backwards to safety. The fix? Easy. Googzilla rewrote history in my opinion. The problem is that the original clause 11.1 showed the intent of Google. That clause 11.1 did not appear by magic from the Google singularity. Lawyers drafted it; Google management okayed the original clause 11.1. I can almost hear a snorting chuckle from Googzilla, but that’s my post heart attack imagination and seems amusing to me. (I was a math club member, and I understand mathy humor but not as well as a “real” Googler, of course.)
If you have attended my lecture on Google’s container invention or read my KMWorld feature about Google’s data model for user data, are you able to see a theme? For me, the core idea of the original clause 11.1 was to capture more data about “information.” Juicy meta information like who wrote what, who sent what to whom, and who published which fact where and when. These data are available in a dataspace managed by a dataspace support platform or DSSP which Google may be building.
Google wants this metadata to clean up the messiness of ambiguity in information. Better and more data means that predictive algorithms work with more informed thresholds. To reduce entropy in the information it possesses, you need more, better, and different information, lots of information. For more on usage tracking and Google’s technology, you can find some color in my 2005 The Google Legacy and my 2007 Google Version 2.0. If you are an IDC search research customer, you can read more about dataspaces in IDC report 213562. These reports cost money, and you will have to contact my publishers to buy copies. (No, I don’t give these away to be a kind and friendly former math club member. Giggle. Snort. Snort.)
Plus, I have a new Google monograph underway, and I will be digging into containers, janitors, and dataspaces as these apply to new types of queries and ad functions. For me the net net is that I think Google’s lawyers got it right the first time. Agree? Disagree? Help me learn.
Stephen Arnold, September 4, 2008
Google and Key Stroke Logging
September 4, 2008
Auto-suggest is a function that looks at what you are typing in a search box. The agent displays words and phrases as suggestions. Sometimes called auto-complete, the function lets you arrow down to the phrase you want and hit enter. The agent then runs the query with the word or phrase you selected. This function turned up a couple of years ago on the Yahoo AllTheWeb.com search system. Now it has migrated to Google. You will want to read Ina Fried’s “Chrome Lets Google Log User Keystrokes”, published on September 4, 2008, to get some additional information about this feature. Her point is that when you or I select a suggested search phrase, that selection is noted and sent to Google. For me, the most interesting point in her article was:
Provided that users leave Chrome’s auto-suggest feature on and have Google as their default search provider, Google will have access to any keystrokes that are typed into the browser’s Omnibox, even before a user hits enter. Google intends to retain some of that data even after it provides the promised suggestions. A Google representative told sister site CNET News.com that the company plans to store about two percent of that data, along with the IP address of the computer that typed it.
When I read statements assuring me that an organization will store “about two percent of that data”, I think about phrases such as “Your check is in the mail”. Based on my research, the substantive value of lots of clicks lies in that “two percent”. Here’s why. Most queries follow well-worn ruts. If you’ve been to Pompeii, you can see grooves cut in the roadway. Once a cart or chariot is in those grooves, changing direction is tough. What’s important, therefore, is not the carts in the grooves. What’s important are the carts that get out of the grooves. As Google’s base of user data grows, the key indicators are variances, deltas, and other simple calculations that provide useful insights. After a decade of capturing queries about pop stars, horoscopes, and PageRank values, that “two percent” is important. I ask, “How do I know what happens to the other 98 percent of the usage data?” The check is in the mail.
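The groove metaphor above is, at bottom, a delta calculation: compare a query’s frequency this week to last week and flag the outliers. This sketch uses invented counts (the query terms and numbers are hypothetical, not Google data) to show how a simple relative delta surfaces the cart leaving the groove:

```python
# Hypothetical weekly query counts -- illustration only.
last_week = {"britney": 9000, "horoscope": 7500, "pagerank": 1200, "chrome": 40}
this_week = {"britney": 9100, "horoscope": 7400, "pagerank": 1300, "chrome": 5200}

def relative_delta(old, new):
    """Fractional change; steady queries hover near zero."""
    return (new - old) / old

# Rank queries by how sharply they broke out of their groove.
out_of_groove = sorted(
    this_week,
    key=lambda q: relative_delta(last_week[q], this_week[q]),
    reverse=True,
)
print(out_of_groove[0])  # 'chrome' -- a 129x jump dwarfs the steady ruts
```

The steady queries carry almost no signal; the small fraction that jumps is where the analytic value sits, which is why a retained “two percent” can matter so much.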
Stephen Arnold, September 4, 2008
Security Dents Chrome
September 4, 2008
InfoWeek, now an online-only publication, published “Early Security Issues Tarnish Google’s Chrome” on September 3, 2008. Nancy Gohring has gathered a number of Chrome security issues. You can read the full text of her article here. She catalogs hacker threats, malicious code, Java vulnerabilities, and more. For me, the most interesting statement in the story was:
Google did not directly address questions about this [file download] vulnerability or whether it plans to make any changes to Chrome to prevent any potential problems.
This “no comment” and “indirection” clashes with Google’s transparency push. When I read this sentence in Ms. Gohring’s article, I wondered why journalists don’t confront Google about its slither-away-and-ignore approach to important questions. Transparency? I see a magician’s finesse at work.
What do you perceive?
Stephen Arnold, September 4, 2008
Googzilla Plays Crawfish: Back Tracking on Chrome Terms
September 4, 2008
Ina Fried wrote “Google Backtracks on Chrome License Terms”. You can read her CNet story here. The point of the story is that Google has withdrawn some of the language of its Chrome license terms. Ms. Fried wrote:
Section 11 now reads simply: “11.1 You retain copyright and any other rights you already hold in Content which you submit, post or display on or through, the Services.”
For me, this sudden reversal is good news and bad news. The good news is that the GOOG recognized that it was close to becoming a Microsoft doppelgänger and reversed direction, fast. The bad news is that the original terms make it clear that Google’s browser containers would monitor the clicks, context, content, and processes of a user. Dataspaces are much easier to populate if you have the users in a digital fishbowl. The change in terms does little to assuage my perception of the utility of dataspaces to Google.
To catch up on the original language, click here. To find out a bit about dataspaces, click here.
Stephen Arnold, September 4, 2008
A Vertical Search Engine Narrows to a Niche
September 4, 2008
Focus. Right before I was cut from one of the sports teams I tried to join, I would hear, “Focus.” I think taking a book to football, wrestling, and basketball practice was not something coaches expected or encouraged. Now SearchMedica, a search engine for medical professionals, is taking my coach’s screams of “Focus” to heart. The company announced on September 3, 2008, a practice management category. The news release on Yahoo said:
The new category connects medical professionals with the best practice management resources available on the Web, including the financial, legal and administrative resources needed to effectively manage a medical practice.
To me the Practice Management focus is a collection of content about the business of running a health practice. In 1981, ABI/INFORM had a category tag for this segment of business information. Now, the past has been rediscovered. The principal difference is that access to this vertical search engine is free to the user. ABI/INFORM and other commercial databases charge money, often big money to access their content.
If you want to know more about SearchMedica, navigate to www.searchmedica.com. The company could encourage a host of copycats. Some would tackle the health field, but others would focus on categories of information for specific user communities. If SearchMedica continues to grow, it and other companies with fresh business models will sign the death warrant for certain commercial database companies.
The fate of traditional newspapers is becoming increasingly clear each day. Superstar journalists are starting Web logs and organizing conferences. Editors are slashing their staff. Senior management teams are reorganizing to find economies such as smaller trim sizes, fewer editions, and less money for local and original reporting. My thought is that companies like SearchMedica, if they get traction, will push commercial database companies down the same ignominious slope. Maybe one of the financial sharpies at Dialog Information Services, Derwent, or Lexis Nexis will offer convincing data that success is in their hands, not the claws of Google or upstarts like SearchMedica. Chime in, please. I’m tired of Chrome.
Stephen Arnold, September 4, 2008