Google Search Appliance Gains Muscle
June 2, 2009
Update: June 3, 2009: Version 6.0 of the GSA software includes a SharePoint Web part.
Original story below:
If you looked at the line up of the Google Search Appliances on offer in February 2009, you probably noticed that the pricing discouraged organizations from indexing more than 30 million documents per appliance. Scaling the system with the GB 7007s and GB 8008s cost millions.
Version 6.0 and a new GB 9009 were announced today. You can read Google’s own write up here. You can download a data sheet here. The features fall under the banner of universal search, but you will need a cheerful authorized reseller or partner to get the most from your GSA.
You can get some other information from several IDG publications, including:
Google today revealed that it has created a GSA on steroids to handle larger indexing jobs. The GB 8008 is no more. The new model is the GB 9009, and it is built on Dell’s PowerEdge R710 platform. Google is not into customer support, so the Dell crowd gets the honor of explaining what to do when a GB 9009 goes south.
The system consists of two components: one for content processing and one for storing the index. Until ArnoldIT.com can get up close and personal with one of these two part set ups, it is not clear what indexing and query processing changes may be necessary.
A PowerEdge in gray, wondering if it will be Googley.
We do know that Version 6.0 of the GSA software won’t run on low end or the older GB 8008s. This seems to suggest that an organization can mix and match the GB 7007s and the GB 8008s. If you haven’t been keeping up with the GSA software, Version 6.x gives system administrators more control over security, customization, and hit boosting. In the older versions of the software, Google decided its relevancy was exactly the relevancy the licensee needed. Period. There were some clunky Fast Search & Transfer type workarounds, but Version 6.0 makes the system’s controls a bit more flexible.
The PowerEdge R710 gussied up for the enterprise prom.
Autonomy, Endeca, and other companies were previously able to point out that Google’s enterprise solution was less configurable than other high end systems. That’s still true, but to a lesser degree. Keep in mind that the GSA is not a box of components that can be assembled like Legos. An appliance is designed to eliminate the expensive and time consuming set up, tuning, and customizing that some high end systems permit. The GB 9009 is a search toaster, bigger, faster, and more capable, but still a toaster.
Google’s distribution channel will be selling the two part set up in the morning. I don’t want to estimate the cost of the GB 9009. Google has a fuzzy wuzzy approach to some pricing, and it is better to wait until the authorized resellers close some deals for the gizmos and the “street price” becomes clearer. My hunch is that the Dell gear will up the cost. With GB 8008s coming out of the blocks at $659,000 for a 15 million document GB 8008 with two years of support and about $300,000 for a fully supported hot back up GB 8008, the GB 9009 will be in the same ballpark.
What’s interesting to me is that these prices convert to about what a fully loaded enterprise search system license with customization can cost from one of the blue chip search vendors. Expensive to perform search, isn’t it? I wonder why the actual cost of industrial strength search is not included in the reports from the azure chip consulting firms or those who witlessly insist that “search is simple. Yes, a no brainer.”
I look for another upgrade early in 2010. At that time, the blue chip vendors will have to start sweating the fact that Googzilla is finally getting serious about the enterprise search market. One indication of the shift is that the GB 1001 is a goner and that the new software won’t work with even numbered GSAs.
Stephen Arnold, June 2, 2009
Search Archaeology
May 30, 2009
I find it amusing to look at articles about search, content processing and text mining. Perhaps I am tired or just confused. The past to me stretches back to cards with holes and wire rods and to the original NASA RECON system. For Computer Active, the past stretches all the way back to Lycos. You may find this revisionist view of history interesting. Click here to read “Bunch of Fives: Forgotten Search Engines.”
Let me comment on the five search engines, adding a bit of addled goose color to the authors’ view of search:
- Cuil.com. Cuil is a product of a Googler (Anna Patterson), her husband, and some other wizards. The company had a connection to Google. Dr. Patterson’s patents are still stumbling out of the USPTO with Google as an assignee. Xift, Dr. Patterson’s search system, was not mentioned in Computer Active. It was important for its semantic method and it exposed Dr. Patterson to the Alta Vista team. Alta Vista played some role in Google’s rise to success and its current plumbing. Cuil has improved, and I thought I saw a result set including some Google content before the system became publicly available. I use Cuil.com, and I am not sure if “forgotten” is a good word for it or its technology.
- MSN Live. I have lost count of Microsoft’s search systems. Microsoft search initiatives have moved through many iterations. The important point for me is that Microsoft is persistent. The search technology is an amalgamation of home grown, licensed, purchased, and reworked components. The search journey for Microsoft is not yet over. Bing is a demo. The rebuild of Fast as a SharePoint product is now in demo stage but not yet free of its Web and Linux roots. More to come on this front and, believe me, Microsoft search is not forgotten by Google or others in the search business.
- Alta Vista. Yep, big deal. The reason is that Alta Vista provided the Googlers with a pool of experienced and motivated talent. The job switch from the hopelessly confused Hewlett Packard to the freewheeling Google was an easy one. Alta Vista persists today, and I still use the service for certain types of queries. What’s interesting is that Alta Vista may have been one of the greatest influences on both Google and Microsoft. Again. Not forgotten.
- Lycos. We sold our Point system to Lycos, so I have some insight into that company’s system. The key point for me is that Fuzzy and his fellow band of coders from Carnegie Mellon sparked the interest in more timely and comprehensive Web search. Lycos was important as a sparkplug, but the company was among the first to add some important index update features and expanded snippets for each hit. Lycos has had a number of owners, but I won’t forget it. When we sold Point to the outfit, the check cleared the bank. That I will remember along with the fact that architectural issues hobbled the system just as the Excite Architext system was slowed. These are search as portal examples today.
- Ask Jeeves. I can’t forget. One of the first Ask Jeeves execs used to work at Ziff. I followed the company’s efforts to create query templates that allowed the system to recognize a question and then deliver an answer. The company was among the first to bill this approach “natural language” but it wasn’t. Ask Jeeves was a look up service and it relied on humans to find answers to certain questions. Ask.com is the descendant of Ask Jeeves’ clunky technology, but the system today is supported by ace entrepreneur Barry Diller who, like Steve Ballmer, is persistent. The key point about Ask Jeeves is that it marketed old technology with a flashy and misleading buzzword “natural language”. Marketers of search systems today practice this type of misnaming as a standard practice. Who can forget this when a system is described one way and then operates quite another?
Enjoy revisionism. Much easier in a Twitter- and Facebook-centric world with a swelling bulge of under 40 experts, mavens, and pundits. These systems failed in some ways and succeeded in others. I remember each. I still use each, just not frequently.
Stephen Arnold, May 31, 2009
Social Search and Security
May 27, 2009
Might these terms comprise an oxymoron? Some organizations are plunging forward with social networking, social search, and open collaboration. You may find Vanessa Ho’s “Risks Associated with Web 2.0” here a useful article. She summarizes the results of a study by an outfit called Dynamic Markets conducted for WebSense. With a sample of 1,300 business professionals, the report contained some interesting information. This statement from the article struck a chord in me:
“The thing about the web is once it is out there, it is out there [forever],” Meizlik [a WebSense executive] noted. Other findings of the survey include 80 per cent of respondents reported feeling confident in their organizations web security, despite the fact that the numbers show that they are ill-equipped to protect against Web 2.0 security threats. For example, 68 per cent do not have real-time analysis of web content; 59 per cent cannot prevent URL re-directs; 53 per cent do not have security solutions that stop spyware from sending information to bots; 52 per cent do not have solutions to detect embedded malicious code on trusted websites; and 45 per cent do not have data loss prevention technology to stop company-confidential information from being uploaded to sites like blogs and wikis, hosted on unauthorized cloud computing sites or leaked as a result of spyware and phishing attacks.
I learned from a chirpy 30 year old conference manager last week that security is not an issue of interest to that conference attendee audience. Yep, those 30 somethings set me straight again.
Stephen Arnold, May 28, 2009
Useful SQL Injection Info
May 27, 2009
At Los Alamos National Lab several years ago, a fellow speaker for an in-house conference gave a brilliant analysis of SQL injection. The talk was not made public. I came across a Bitpipe white paper from Breach. I have a Bitpipe user name and password, so locating the document was no problem. If you don’t have access to Bitpipe, click here, fill out the form, and download the seven page document. Useful.
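For readers who have not seen SQL injection up close, here is a minimal sketch in Python. It is not taken from the Breach white paper or the Los Alamos talk. The table, the function names, and the payload are all made up for illustration, and the example uses only the standard library sqlite3 module. The first lookup pastes user input into the SQL text; the second passes the input as a bound parameter.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL statement itself.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    # Safer: the input is bound as data and is never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = build_demo_db()
payload = "nobody' OR '1'='1"          # illustrative attack string
leaked = lookup_unsafe(conn, payload)  # the OR clause matches every row
blocked = lookup_safe(conn, payload)   # no user has this literal name

print("unsafe lookup returned", len(leaked), "rows")
print("safe lookup returned", len(blocked), "rows")
```

Bound parameters are only one defense. Input validation and least-privilege database accounts are usually layered on top.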
Stephen Arnold, May 27, 2009
EDI Data Transformation
May 27, 2009
Most of the mavens and pundits write about a handful of search vendors. Not me. I grub around in the dark and often very important corners of search. If you have to transform data for EDI in an XML environment, you will find Alex Woodie’s “MegaXML Looks to Drive Expense Out of EDI” here useful. Mr. Woodie describes a new product, which if it works as described can eliminate some sleepless nights and a long weekend or two. The article describes Task Performance Group’s MegaXML utility. For me, the key passage in the article was:
Task Performance Group launched MegaXML a decade ago to take advantage of the flexibility of XML. On the front end, the Windows-based product can generate and send EDI documents, such as purchase orders and invoices, over VANs or the Internet using protocols like AS2. And on the backend, MegaXML can translate EDI documents to the format needed for specific platforms, such as flat files for AS/400-based ERP systems on DB2.
MegaXML has a hybrid or semi-cloud option that may be worth investigating. Mr. Woodie wrote:
With the outsourcing option, MegaXML will reside on a Windows server in Task Performance Group’s data center near Chicago. After mapping the EDI documents to the customer’s systems (a process that takes a few days), the customer will upload and download documents to the MegaXML data center using Secure FTP (S/FTP). MegaXML, in turn, will handle the translation to EDI formats and the distribution via AS2 or another method.
Data transformation consumes a significant portion of an information technology group’s time and budget. MegaXML may be a partial solution in some situations. More information is available at www.megaXML.com
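To give a feel for what EDI to XML translation involves, here is a rough sketch in Python. It has nothing to do with how MegaXML works internally, which the article does not describe. It assumes a simplified X12-style message in which segments end with "~" and elements are separated by "*". The sample purchase order, the function name, and the XML element names are invented for illustration. Real interchanges carry envelope segments and partner-specific mappings that this sketch ignores.

```python
import xml.etree.ElementTree as ET


def edi_to_xml(message, segment_end="~", element_sep="*"):
    """Turn a delimited EDI-style message into a generic XML tree.

    Hypothetical helper: each segment becomes a <segment> node whose id is
    the segment tag, and each data element becomes a numbered <element>.
    """
    root = ET.Element("edi")
    for raw in message.split(segment_end):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the final terminator
        parts = raw.split(element_sep)
        seg = ET.SubElement(root, "segment", id=parts[0])
        for pos, value in enumerate(parts[1:], start=1):
            el = ET.SubElement(seg, "element", pos=str(pos))
            el.text = value  # empty elements are kept to preserve positions
    return root


# Made-up purchase order fragment, not a real trading partner document.
sample = (
    "ST*850*0001~"
    "BEG*00*NE*PO12345**20090527~"
    "PO1*1*10*EA*9.95**VP*WIDGET-1~"
    "SE*4*0001~"
)

doc = edi_to_xml(sample)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

The hard part in practice is the mapping from these positional elements to named fields in the target system, which is the step the article says takes a few days per customer.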
Stephen Arnold, May 27, 2009
Cyberwarfare Attack Devices
May 26, 2009
If you worry about enterprise search, you won’t find much of interest in this Aviation Week story. The addled goose, on the other hand, sees the story “Network Attack Weapons Emerge” here by David Fulghum as a precursor of similar information initiatives in the business arena. Information is a strategic asset and methods to locate, disrupt, intercept, and analyze those assets are going to remain and become increasingly significant. The core of the Aviation Week story was this comment:
Devices to launch and control cyber, electronic and information attacks are being tested and refined by the U.S. military and industry in preparation for moving out of the laboratory and into the warfighter’s backpack.
Mr. Fulghum added:
“The Russians conducted a cyberattack that was well coordinated with what Russian troops were doing on the ground,” says a longtime specialist in military information operations. “It was obvious that someone conducting the cyber[war] was talking to those controlling the ground forces. They knew where the [cyber]talent was [in Russia], how to use it, and how to coordinate it. “That sophisticated planning at different levels of cyberwarfare surprised a lot of people in the Defense Dept.,” he says. “It looked like a seamless, combined operation that coordinated the use of a range of cyberweapons from the sophisticated to the high school kids that thought it was cool to deface official web sites. The techniques they used everybody knows about. The issue was how effective they were as part of a combined operation.”
I found interesting his description of the components of a cyberattack toolkit:
The three major elements of a cyberattack system are its toolbox, planning and execution capabilities. The toolbox is put together by the hardware and software experts in any organization to address specific missions. They maintain the database of available capabilities.
Worth reading.
Stephen Arnold, May 26, 2009
Tough to Search When Computers Are Off
May 22, 2009
Courant.com reported that a computer virus caused problems for the US Marshal’s information system. You can read “Mystery Virus Strikes Law Enforcement Computers, Forcing FBI, US Marshals to Shut Down Parts of Networks” here. Security is an important consideration in online systems and for search and content processing. Tough to perform information retrieval when the computers are off line.
Stephen Arnold, May 22, 2009
UK Military Security
May 18, 2009
I am not sure if this Web log post is true. If it is, a rethink of security in the UK’s Ministry of Defence is likely to be forthcoming. The article “MoD Loses 28 Laptops This Year” here reported:
The laptops were lost between January 1 and 11 May 2009. The Ministry of Defence also admitted to losing 20 flash drives and 4 PCs in the same period.
Not much need for exotic search and retrieval when you have the gizmo in your possession. I wonder if the security method was developed at Los Alamos National Lab?
Stephen Arnold, May 18, 2009
Sphinx: Inscrutable Search
May 9, 2009
The Register’s Ted Dziuba’s “Sphinx – Text Search the Pirate Bay Way” here is a good case example for open source search technology. Before you cancel your Microsoft Fast ESP license, keep in mind that Sphinx is for structured data, specifically MySQL tables. You can get more detail here. There are some doubters in the crowd, particularly with regard to open source search technology. Based on the email I receive and the implementations I have examined, the open source search technology cannot be dismissed or ignored. For me, one of the more interesting comments in the article was:
Internet-famous MySQL wonk Jeremy Zawodny, who had the foresight to jump from the ship’s bow as Yahoo started to take on water, replaced MySQL full text search at Craigslist with Sphinx. Craigslist used 25 machines to handle roughly 50 million queries per day on MySQL. Under that kind of load, Zawodny found that MySQL wasn’t using much CPU or doing much disk I/O, which means it’s spending all of its time waiting on thread locks. Oops. Maybe we should have paid attention to parallelism after all. The Sphinx implementation took those 25 machines down to 10, with plenty of room to grow. While Sphinx didn’t handle the traffic out of the box at the time, Zawodny was able to patch it to handle Craigslist’s specific need – and fix a few bugs along the way.
The “green angle” is important. The comments about vowels and stopwords are also interesting. Worth putting this write up in the open source search archive.
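The Craigslist figures in the quoted passage can be turned into a back-of-the-envelope load estimate. The sketch below is simple arithmetic in Python using only the numbers quoted above (roughly 50 million queries per day, 25 machines before, 10 after). It assumes the load is spread evenly across the day and across the machines, which is an assumption of mine and almost certainly understates peak traffic. The function name is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_machine_qps(queries_per_day, machines):
    """Average queries per second each machine handles, assuming even load."""
    return queries_per_day / SECONDS_PER_DAY / machines


queries_per_day = 50_000_000  # "roughly 50 million queries per day"

before = per_machine_qps(queries_per_day, 25)  # MySQL full text search
after = per_machine_qps(queries_per_day, 10)   # Sphinx

print(f"25 machines: {before:.1f} queries/sec per machine")
print(f"10 machines: {after:.1f} queries/sec per machine")
print(f"per-machine load ratio: {after / before:.1f}x")
```

On these averages each Sphinx box carries about two and a half times the query load of each MySQL box, which is where the green angle comes from: fewer machines for the same traffic.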
Stephen Arnold, May 9, 2009
SharePoint Overview
May 6, 2009
Barb Mosher wrote “SharePoint Online (SaaS) Review – What It Is and Isn’t.” You can find the full write up published by CMS Wire here. Ms. Mosher has done a very good job of explaining the Software as a Service implementation of SharePoint. She walks through the basics and provides some screenshots. She has done what she could to make these screenshots easy to follow, but I find the steps for some basic tasks convoluted. Addled geese are not good candidates for SharePoint wisdom, I suspect. The most useful part of the article is her description and lists of what is included and what is not included. With regard to search, it seemed that only the bare bones of queries within a site are supported. I have questions about the stability of SharePoint from the cloud, which she did not address. Latency also triggers questions in my mind. Useful information to download and keep close at hand.
Stephen Arnold, May 6, 2009