Endeca: Push into Education and Training
March 5, 2009
Endeca, http://www.endeca.com, is expanding its information access software business by connecting education and training customers with more specialized solutions. You can read the press release here. The solutions from Endeca Education Services, described here, include customized training curricula delivered on site or online. The goal is to put pre-packaged, flexible solutions that speed up business performance into customers' hands in these trying economic times. Part of the attraction of Endeca's expanded offerings is the ability to pre-purchase training at a discounted rate. There are a lot of information access companies in this industry, and education services are particularly dependent on critical technology. It's a smart move on Endeca's part to expand into that venue and tap the opportunity. On the other hand, the shadows of Apple and Google have begun to creep into the education market. Excitement ahead in a large business sector perhaps?
Jessica W. Bratcher, March 5, 2009
MapReduce in a Browser: A Glimpse of the Google in 2011
March 4, 2009
I have no idea who is behind Igvita, but I will pay closer attention. You will want to read "Collaborative Map-Reduce in the Browser" here. When I read the write up and scanned the code, I thought, "Yep, this is the angle the Google is taking with Chrome, containers, and a bunch of other Googley patent documents' 'inventions'." I won't spoil your fun. For me, the most important information in the write up is the diagram. A happy quack to Igvita. Heck, have two quacks.
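For readers who want the flavor of the idea without clicking through, here is a minimal sketch of the pattern the Igvita write up describes: a coordinating server hands out chunks of work, each visitor's browser runs the map step locally, and the partial results flow back for reduction. The endpoint names and the word-count job are my assumptions for illustration, not code from the article and not Google's implementation.

```typescript
// Minimal sketch (assumed endpoints, not Igvita's code): each browser pulls a
// map task, computes word counts locally, and posts the partial result back
// for the coordinating server to reduce.

type MapTask = { id: number; lines: string[] };
type PartialResult = { id: number; counts: Record<string, number> };

// Map step: count word occurrences in the chunk handed to this browser.
function mapWordCounts(task: MapTask): PartialResult {
  const counts: Record<string, number> = {};
  for (const line of task.lines) {
    for (const word of line.toLowerCase().split(/\W+/).filter(Boolean)) {
      counts[word] = (counts[word] ?? 0) + 1;
    }
  }
  return { id: task.id, counts };
}

// Worker loop: keep pulling tasks until the server signals there is no work left.
async function workerLoop(baseUrl: string): Promise<void> {
  while (true) {
    const res = await fetch(`${baseUrl}/task`); // hypothetical endpoint
    if (res.status === 204) break;              // 204 = no more work
    const task: MapTask = await res.json();
    const partial = mapWordCounts(task);
    await fetch(`${baseUrl}/result`, {          // hypothetical endpoint
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(partial),
    });
  }
}

workerLoop("https://example.com/mapreduce").catch(console.error);
```

The design point worth noting is that the browsers do the computing; the server only schedules work and reduces results, which is what makes a crowd of idle browsers look like a cluster.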
Stephen Arnold, March 4, 2009
Mysteries of Online 9: Time
March 3, 2009
Electronic information has an interesting property: time distortion. The distortion has a significant effect on how users of electronic information participate in various knowledge processes. Information carries humans along much as a stream whisks a twig in the direction of the flow. Information, unlike water, moves in multiple directions, often colliding, sometimes reinforcing, and sometimes interacting in paradoxical ways that leave a knowledge worker dazed, confused, and conflicted. The analogy of information as a tidal wave connotes only a partial truth. Waves come and go. Information flow for many people and systems is constant. Calm is tough to locate.
Vector fields. Source: http://www.theverymany.net/uploaded_images/070110_VectorField_test012_a-789439.jpg
In the good old days of cuneiform tablets, writing down the amount of wheat Eknar owed the king required specific steps. First, you had to have access to suitable clay, water, and a clay kneading specialist. Second, you needed a stylus of wood, bone, or maybe the fibula of an enemy removed in a timely manner. Third, you had to have your data ducks in a row. Dallying meant that the clay tablet would harden and make life more miserable than it already was. Once the document was created, the sun or kiln had to cooperate. Once the clay tablet was firm enough to handle without deleting a mark for a specified amount of wheat, the tablet was stacked in a pile inside a hut. Fourth, to access the information, the knowledge worker had to locate the correct hut, find the right pile, and then inspect the tablets without breaking one, a potentially bad move if the king had a short temper or needed money for a war or a new wife.
In the scriptorium of the 9th century, information flow wasn't much better. The clay tablets had been replaced with organic materials like plant matter or, for really important documents, the scraped skin of sheep. Keep in mind that other animals were used. Yep, human skin worked too. Again, time intensive processes were required to create the material on which a person would copy or scribe information. The cost of the materials made it possible to get patrons to spit out additional money to illustrate or illuminate the pages. Literacy was not widespread in the 9th century, and there were a number of incentives to assemble sufficient person power to convert foul papers to fair copies and then to compendia. Not just anyone could afford a book, and buying a book or similar document did not mean the owner could read. The time required to produce hand copies was somewhat shorter than with the clay tablet method or the chiseled inscriptions and brass castings used by various monarchs.
Yep, I will have it done in 11 months, our special rush service.
With the invention of printing in Europe, the world rediscovered what the Chinese had known for 800, maybe a thousand, years. No matter. The time required to create information remained the same. What changed was that once a master set of printing plates had been created, a printer with enough capital to buy paper (cheaper than skin, longer lasting than untreated plant fiber, and less ink hungry than linen based materials) could manufacture multiple copies of a manuscript. The out of work scribes had to find a new future, but the impact of printing was significant. Everyone knows about the benefits of literacy, books, and knowledge. What's overlooked is that the existence of books altered the time required to move information from point A to point B. Once time barriers fell, distance compressed as well. The world became smaller if one were educated. Ideas migrated. Information moved around and had impact, which I discussed in another Mysteries of Online essay. Revolutions followed after a couple hundred years, but the mindless history classes usually ignore the impact of information on time.
If we flash forward to the telegraph, time accelerated. Information no longer required a horseback ride, a walk, or a train ride from New York to Baltimore to close a real estate transaction. Once newfangled electricity fell in love with information, the speed of information increased with each new innovation. In fact, more change in information speed has occurred since the telegraph than in all previous human history. The telephone gave birth to the modem. The modem morphed into a wireless USB 727 device along with other gizmos that make possible real time information creation and distribution.
Time Earns Money
I dug out notes I made to myself sometime in the 1982 – 1983 time period. The implications of time and electronic information caught my attention for one reason. I noted that the revenue derived from a database with weekly updates was roughly 30 percent greater than the revenue derived from the same database on a monthly update cycle. So, four updates a month yielded $1.30, not $1.00. I wrote down, "Daily updates will generate an equal or greater increase." I did not believe that the increase was infinite. The rough math I did 25 years ago suggested that with daily updates the database would yield about 1.6 times the revenue of the same database on a monthly update cycle. In 1982 it was difficult to update a commercial database more than once a day. The cost of data transmission and service charges would gobble up the extra money, leaving none for my bonus.
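Here is that back-of-the-envelope arithmetic in a few lines of code. The weekly lift comes from the notes above; the daily multiplier is the rough 1982-era estimate, treated here as an assumption rather than audited revenue data.

```typescript
// Index monthly-update revenue at 1.00 and apply the observed or estimated
// multipliers for faster update cycles. Weekly lift is from the notes above;
// the daily figure is the rough estimate, not a measured result.

const baseline = 1.0; // monthly update cycle
const cycles: Array<[string, number]> = [
  ["monthly", 1.0],
  ["weekly", 1.3],    // roughly 30 percent lift observed
  ["daily", 1.6],     // rough estimate of the ceiling
];

for (const [cycle, multiplier] of cycles) {
  console.log(`${cycle.padEnd(8)} $${(baseline * multiplier).toFixed(2)}`);
}
```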
In the financial information world, speed and churn are mutually reinforcing. New information makes it possible to generate commissions.
Time, therefore, not only accelerated the flow of information. Time could also accelerate earnings from online information. Simply by updating a database more frequently, one could make it generate more money. Update the database less frequently, and it would generate less money. Time had value to the users.
I found this an interesting lesson, and I jotted it down in my notebook. Each of the commercial databases in which I played a role was designed for daily updates and, later, multiple updates throughout the day. To this day, the Web log in which this old information appears is updated on a daily basis, and several times a week it is updated multiple times during the day. Each update carries an explicit time stamp. This is not for you, gentle and patient reader. The time stamp is for me. I want to know when I had an idea. Time marks are important as the speed of information increases.
Implications
The implications of my probably third-hand insight included:
- The speed up in dissemination means that information impact is broader, wider, and deeper with each acceleration.
- Going faster translates to value for some users who are willing and eager to pay for speed. The idea is that knowing something (anything) first is an advantage.
- Fast is not enough. Customers addicted to information speed want to know what’s coming. The inclusion of predictive data adds another layer of value to online services.
- Individuals who understand the value of information speed have a difficult time understanding why more online systems and services cannot deliver what is needed; that is, data about what will happen with a probability attached to the prediction. Knowing that something has a 70 percent chance of taking place is useful in information sensitive contexts.
Let me close with one example of the problem speed presents. The Federal government has a number of specialized information systems for law enforcement and criminal justice professionals. These systems have some powerful, albeit complex, functions. The problem is that when a violation or crime occurs, the law enforcement professionals have to act quickly. The longer the reaction time, the greater the chance that the bad egg will be tougher to apprehend. Delay is harmful. The systems, however, require that an individual enter a query, retrieve information, process it, and then use another two or three systems in order to get a reasonably complete picture of the available information related to the matter under investigation.
The systems have a bottleneck. The human. Law enforcement personnel, on the other hand, have to move quickly. As a result, the fancy online systems operate in one time environment and the law enforcement professionals operate in another. The opportunity to create systems that bring both time universes together is significant. Giving a law enforcement team mobile comms for real time talk is good, but without the same speedy and fluid access to the data in the larger information systems, the time problem becomes a barrier.
Opportunity in online and search, therefore, is significant. Vendors who pitch another fancy search algorithm are missing the train in law enforcement, financial services, competitive intelligence, and medical research. Going fast is no longer a way to add value. Merging different time frameworks is a more interesting area to me.
Stephen Arnold, February 26, 2009
YAGG Update: PageRank Tweak or Bug
March 2, 2009
If you are mesmerized by things Google, you will want to navigate to Search Engine Roundtable and read "Google March 2009 PageRank Update or Glitch?" here. The article provides links to a couple of posts that identify what may be a glitch or goof, as in "yet another Google goof" or YAGG. I know the acronym annoys Alex, a potential Googlephile. The article quotes a Googler who uses the phrase "some kind of glitch", which may be old news if you were bitten by the Gfail issue a few days ago.
Stephen Arnold, March 2, 2009
Register Reports Microsoft Cloud Database Plan
February 28, 2009
SQL Server comes with a search function. SQL Server also is the muscle behind some of SharePoint's magic. With the move to the cloud, Microsoft's database plans have been a bit of a mystery to me. The Register provided some useful information and commentary about SQL Server in "Microsoft Cloud to Get 'Full' SQL Server Soon?" here. The Register reported that Microsoft may offer two different data storage options. Details are murky, but the approach would fit a pattern: Microsoft seems content to offer multiple versions of Vista. SharePoint comes in different flavors. Microsoft offers a number of search options. I find it difficult to figure out what's available and which features come with which splinter product. If the Register is right, then the same consumer product strategy used for shampoo and soup may be coming to the cloud. I find multiple variants of one product confusing, but I am definitely an old goose, somewhat uncomfortable in the hip new world of branding and product segmentation.
Stephen Arnold, February 28, 2009
When Search Bites Your Ankle
February 27, 2009
Search means Google. I suppose one can be generous and include Live.com and Yahoo.com search in the basket even though search is, for most people, Google. When I read "Malware Tricking Search Engines, and You Too" here, I auto inserted Google whether or not the author wanted me to. The point of the write up is that bad nerds figured out a way to spoof Facebook users, for example. This is a bit like snookering the neighbor's seven year old kid at Halloween in my opinion. The idea is that if you Googled "Error Check System", you were pushed links to malware-infested sites. The recent Gmail outage produced a similar problem; Googling "Gmail Down" got you lots of malware.
My question is, “Where does the responsibility rest?” Is the operating system outfit responsible? Is the search vendor responsible? Is the Web site responsible? For me, the most interesting comment in the story was this stunner:
the end result was to push rogue anti-malware to the user. This really does seem to be the star of the malware world in that it directly brings in money.
Two comments: Maybe the author would like to hand out a malware Oscar-like award for this "star of the malware world." And, yep, make the user responsible. Most computing device users really know what's what with their systems. Great idea. The buck stops where? At my 84 year old father. Right. He's able to spot malware just fine. I bet he does this as well as your mother and father do.
Stephen Arnold, February 27, 2009
YAGG: Google Talk
February 24, 2009
Tweets and posts are flying by about an alleged phishing exploit aimed at Google email. Mashable reports here that another issue may be poking its snout into hapless users' lives. Adam Ostrow wrote:
Gmail is now being attacked by a phishing scam that is spreading like wildfire.
If true, YAGG strikes again. You get a message saying "check me out" with a link to a TinyURL. Click the puppy and you go to "a site called ViddyHo." Lucky you. Your contacts get an email. Nifty. Love those tiny URLs, which mask the destination URL.
Stephen Arnold, February 24, 2009
Google: Suddenly Too Big
February 22, 2009
Today Google is too big. Yesterday and the day before Google was not too big. Sudden change at Google or a growing sense that Google is not the quirky Web search and advertising company everyone assumed Googzilla was?
The New York Times article by Professor Randall Stross, available temporarily here, points out that some perceive Google as "too big." Mr. Stross quotes various pundits and wizards and adds a tasty factoid that Google allowed him to talk to a legal eagle. Read the story now so you can keep your pulse on the past. Note the words "the past." (You can get Business Week's take on this same "Google is too powerful" theme here.)
The fact is that Google has been big for years. In fact, Google was big before its initial public offering. Mr. Stross's essay makes it clear that some people are starting to piece together what dear Googzilla has been doing for the past decade. Keep in mind the time span: a decade, 10 years, 120 months. Also note that in that time interval Google has faced zero significant competition in Web search, automated ad mechanisms, and smart software. Google is essentially unregulated.
Let me give you an example from 2006 so you can get a sense of the disconnect between what people perceive about Google and what Google has achieved amidst the cloud of unknowing that pervades analysis of the firm.
Location: Copenhagen. Situation: log files of referred traffic. Organization: a financial services firm. I asked the two Web pros responsible for the firm's Web site one question: "How much traffic comes to you from Google?" The answer was, "About 30 percent?" I said, "May we look at the logs for the past month?" One Webmaster called up the logs, and the numbers showed that in 2006, in Denmark, Google delivered 80 percent of the traffic to the Web site.
The perception was that Google was a 30 percent factor. The reality in 2006 was that Google delivered 80 percent of the traffic. That's big. Forget the baloney delivered from samples of referred traffic: even if the Danish data were off by plus or minus five percent, Google has a larger global footprint than most Webmasters and trophy generation pundits grasp. Why? Sampling services get their market share data in ways that understate Google's paw prints. Methodology, sampling, and reverse engineering of traffic lead to the weird data that research firms generate. The truth is in log files, and most outfits cannot process large log files, so "estimates", not hard counts, become the "way" to truth. (Google has the computational and system moxie to count and perform longitudinal analyses of its log file data. Whizzy research firms don't. Hence the market share data that show Google in the 65 to 75 percent share range with Yahoo 40 to 50 points behind. Microsoft is even further behind and has been trying to close the gap with Google for years.)
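If you want to run the same check on your own site, the arithmetic is nothing more than reading the referrer field out of the access log. Here is a short sketch; the file name and the combined log format are assumptions, so adjust for your server.

```typescript
// Sketch of the log-file check described above: read a combined-format access
// log and compute what share of referred visits arrive from Google. The file
// name and log format are assumptions for illustration.

import { readFileSync } from "fs";

const lines = readFileSync("access.log", "utf8").split("\n").filter(Boolean);

let referred = 0;
let fromGoogle = 0;

for (const line of lines) {
  // In combined log format the referrer is the second-to-last quoted field.
  const quoted = line.match(/"[^"]*"/g);
  if (!quoted || quoted.length < 3) continue;
  const referrer = quoted[quoted.length - 2].slice(1, -1);
  if (referrer === "-" || referrer === "") continue; // direct traffic, skip
  referred += 1;
  if (/google\./i.test(referrer)) fromGoogle += 1;
}

const share = referred ? (100 * fromGoogle) / referred : 0;
console.log(`Google referred ${fromGoogle} of ${referred} visits (${share.toFixed(1)}%)`);
```

Hard counts from the full log, not a sampled estimate, are the point.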
So now it's official, because the New York Times runs an essay that says, "Google is big."
To me, old news.
In my addled goose monographs, I touched on data my research unearthed about some of Google’s “bigness”. Three items will suffice:
- Google’s programming tools allow a Google programmer to be up to twice as productive as a programmer using commercial programming tools. How’s this possible? The answer is the engineering of tools and methods that relieve programmers of some of the drudgery associated with developing code for parallelized systems. Since my last study — Google Version 2.0 — Google has made advances in automatically generating user facing code. If the Google has 10,000 code writers and you double their productivity, that’s the equivalent of 20,000 programmers’ output. That’s big to me. Who knows? Not too many pundits in my experience.
- Google's index contains pointers to structured and unstructured data. The company has been beavering away to the point that it no longer counts Web pages in billions. The GOOG is in trillions territory. That's big. Who knows? In my experience, not too many of Google's Web indexing competitors have these metrics in mind. Why? Google's plumbing operates at petascale. Competitors struggle to deal with the Google as it was in the 2004 period.
- The computations processed by Google’s fancy maths are orders of magnitude greater than the number of queries Google processes per second. For each query there are computations for ads, personalization, log updates, and other bits of data effluvia. How big is this? Google does not appear on the list of supercomputers, but it should. And Google’s construct may well crack the top five on that list. Here’s a link to the Google Map of the top 100 systems. (I like the fact that the list folks use the Google for its map of supercomputers.)
The real question is, "What makes it difficult for people to perceive the size, mass, and momentum of Googzilla?" I recall from a philosophy class in 1963 something about Plato and looking at life as a reflection in a mirror or a dream. Most of the analysis of Google with which I am familiar treats fragments, not Die Gestalt.
Google is a hyper construct and, as such, it is a different type of organization from those much loved by MBAs who work in competitive and strategic analysis.
The company feeds on raw talent and evolves its systems with Darwinian inefficiency (yes, inefficiency). Some things work; some things fail. But in chunks of time, Google evolves in a weird, non-directive manner. Also, Google's dominance in Web search and advertising presages what may take place in other market sectors as well. What's interesting to me is that Google lets users pull the company forward.
The process is a weird cyber-organic blend quite different from the strategies in use at Microsoft and Yahoo. Of its competitors, Amazon seems somewhat similar, but Amazon is deeply imitative. Google is deeply unpredictable because the GOOG reacts to and follows users' clicks, data about information objects, and inputs about the infrastructure's machine processes. Three data feeds "inform" the Google.
Many of the quants, pundits, consultants, and MBAs tracking the GOOG are essentially data archeologists. The analyses report what Google was or what Google wanted people to perceive at a point in time.
I assert that it is more interesting to look at the GOOG as it is now.
Because I am semi retired and an addled goose to boot, I spend my time looking at what Google's open source technology announcements seem to suggest the company will be doing tomorrow or next week. I collect factoids such as the "I'm feeling doubly lucky" invention, the "programmable search engines" invention, the "dataspaces" research effort, and new patent documents for a Google "content delivery demonstration", among others, many others, I wish to add.
My forthcoming Google: The Digital Gutenberg explains what Google has created. I hypothesize about what the "digital Gutenberg" could enable. Knowing where Google came from and what it did is indeed helpful, but that information will not be enough to assist the businesses increasingly disrupted by Google. By the time business sectors figure out what's going on, I fear it may be too late for these folks. Their Baedekers don't provide much actionable information about Googleland. A failure to understand Googleland will accelerate the competitive dislocation. Analysts who fall into the trap brilliantly articulated in John Ralston Saul's Voltaire's Bastards will continue to confuse the real Google with the imaginary Google. The right information is nine tenths of any battle. Applying this maxim to the GOOG is my thought.
Stephen Arnold, February 22, 2009
TurboWire: Search for the Children of Some Publishing Executives
February 22, 2009
A bit of irony: at a recent dinner party, a publishing executive explained that his kids had wireless, Macbooks, and mobile phones. He opined that his kids knew the rules for downloading. I was standing behind the chair in which his son sat, texting and downloading a torrent. The publishing executive stood facing his son, talking to me about his ability to manage digital information. I asked the son what he was downloading. He said, "Mall Cop." "From Netflix?" I asked. He said, "Nope, a torrent like always."
If you want to take a look at some of the functionality for search and retrieval of copyrighted materials, check out TurboWire. You can download a copy here. Click here for the publisher’s Web site. The features include search (obviously) and:
- Auto-connect, browse host, multiple search.
- Connection quality control.
- Library management and efficient filtering.
- Upload throttling.
- Direct connection to known IP addresses.
- Full-page connection monitor.
- Built-in media player.
Oh, talking about piracy is different from preventing one’s progeny from ripping and shipping in my opinion. And, no, I did not tell my host that he was clueless. I just smiled and emitted a gentle honk.
Stephen Arnold, February 22, 2009
What Is Vint Cerf Saying
February 16, 2009
Lidija Davis's "Vint Cerf: Despite Its Age, the Internet Is Still Filled with Problems" does a good job of providing an overview of Vint Cerf's view of the Internet. You can read the article here. Ms. Davis provides a snapshot of the issues that must be addressed, if she captured the Google evangelist's thoughts accurately:
According to Cerf, and many others, inter-cloud communication issues such as formats and protocols, as well as inter or intra-cloud security need to be addressed urgently.
I found the comments about bit rot interesting and highly suggestive. She quite rightly points out that her summary presents only a small segment of the talk.
When I read her pretty good write up, I had one thought: "Google wants to become the Internet." If the company pulls off this grand slam play, then the issues identified by Evangelist Cerf can be addressed in a more forthright manner. My reading of the Guha patent documents, filed in February 2007, reveals some of the steps Google's programmable search engine takes to tackle the problems Mr. Cerf identified and Ms. Davis reported. I find the GoogleNet an interesting idea to ponder. With some content pulled from Google caches and the Google CDN (content delivery network), Google may be the appropriate intermediary and enforcer in this increasingly unstable "space".
Stephen Arnold, February 16, 2009

