Kosmix: YAGK (Yet Another Google Killer)

January 20, 2009

Kosmix, like Cuil.com, has some fibrous tendrils that connect to the Google. Not surprisingly, the Kosmix system does not tackle the Google head on. Think of Kosmix as an automated portal for information. When I visit the site, I see what’s new. I have “hot” topics to click and explore. I have trends. I have videos. In short, I get search without search. There is a search box, and it works reasonably well.


Kosmix splash page. An information portal for the 21st century.

One of the wizards behind Kosmix is Anand Rajaraman, who has considerable visibility in the Silicon Valley technology world. I have followed his Web log posts because he has demonstrated keen insight into the technical activities at Google. In December 2008 he wrote “Kosmix Adds Rocketfuel to Power Voyage of Exploration” here. Several points earned a place in my notes about search; to wit:

  • Kosmix raised an additional $20 million in financing
  • Google = Search + Find, but Kosmix = Explore + Browse
  • The system is based on algorithmic categorization technology (a rough sketch of this kind of categorization appears below).
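
“Algorithmic categorization” is an abstract phrase, so here is a minimal sketch of what categorizing a document against predefined topics can look like. This is my illustration only, not Kosmix’s actual technology; the topic list, scoring rule, and threshold are all assumptions.

```python
# Hypothetical sketch of algorithmic categorization: score a document
# against predefined topic vocabularies and pick the best match.
# Illustration only; not Kosmix's actual method.

from collections import Counter

TOPICS = {
    "health": {"symptom", "treatment", "diagnosis", "clinic", "drug"},
    "travel": {"flight", "hotel", "itinerary", "visa", "destination"},
    "finance": {"stock", "bond", "earnings", "dividend", "portfolio"},
}

def categorize(text: str, min_hits: int = 2) -> str:
    """Return the topic whose vocabulary overlaps the document most."""
    words = Counter(w.strip(".,!?") for w in text.lower().split())
    scores = {
        topic: sum(words[w] for w in vocab)
        for topic, vocab in TOPICS.items()
    }
    best_topic, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_topic if best_score >= min_hits else "uncategorized"

print(categorize("The clinic listed each symptom and the recommended treatment."))
# -> health
```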

A feature summary appears on the Kosmix Web log here.


Google’s Knol Milestone

January 18, 2009

Everyone in the drainage ditch in Harrod’s Creek, Kentucky, thinks Knol is a Wikipedia clone. This addled goose begs to differ. This addled goose thinks Knol is a way for the Google to obtain “knowledge” about topics and the experts who contribute to a Knol (a unit of knowledge). Sure, Knol can be used like Wikipedia, but the addled goose thinks the Knol is more, much, much more.

At any rate, the Google announced on January 16, 2009, after the goose tucked its head under its wing for the week, that there are now 100,000 Knols. What this goose found interesting was the headline: “100,000th Knol Published.” I love that word “published”. Google emphasizes that it is not a publisher, but it is interesting to me how the word turns up. You can read the story here.

The blog post contains some interesting insights into Knol; for example, people from 197 countries visit Knol “on an average day.” The interface is available in eight languages. Visitors are editing Knols.

Now how long will it take Knol to reach one million entries?

Stephen Arnold, January 18, 2009

Received Wisdom about Microsoft Google Off by 30 Degrees

January 16, 2009

The dead tree version of the Wall Street Journal arrived this morning (January 16, 2009) and greeted me with Robert Guth’s article “Microsoft Bid to Beat Google Builds on a History of Misses”. You can find an online version here. You can also find a discussion by Larry Dignan here. Both of these write ups set my teeth on edge, actually, my beak. I am an addled goose, as you may know.

The premise of the Wall Street Journal article is that Microsoft had chances to do what Google is doing; to wit: sell ads, build search traffic, and buy Overture.com, among other missteps. The implication in these examples is the “woulda coulda shoulda” argument that characterizes people with a grip on received wisdom, or what “everybody” knows and believes.

Mr. Dignan adds some useful points, overlooked by Mr. Guth; namely, Microsoft lacked a coherent Web strategy. Also, even had Microsoft moved into ads, that alone would not have addressed Google’s focus on search. Mr. Dignan emphasizes that “you can’t count Microsoft out–even now.”

Let me, from my hollow in Kentucky where the mine drainage has frozen a nice sulphurous yellow this frosty morn, offer a different view of the problem Microsoft faces. You can cherish these nuggets of received wisdom. I want to point out where these individual, small Google nuggets fit in the gold mine of online in the 21st century.


Received wisdom is useful but often is incomplete. Filling in the gaps makes a difference when determining what steps to take. Image source: http://www.grahamphillips.net/Ark/Ark_2_files/moses_with_tablets.jpg

What Google Did in 1998

Google looked at search and the problems then dominant companies faced. I can’t run down the numerous technical challenges. (If you want detail, click here.) I can highlight three steps taken by Google when Microsoft and others dabbling in the Internet were on equal footing.

First, Google looked at the bottlenecks in the various subsystems that go together to index digital information and make it findable. These bottlenecks were no surprise in 1998, and they aren’t today. Google identified issues with parallel processing, organizing the systems, and getting data moving to the right place at the right time. Google tackled this problem head on by rethinking how the operating system could better coordinate breaking a task into bite-sized chunks and then getting each chunk worked on and the results back where they were needed without bringing the computer to its knees. This problem still bedevils quite a few search engine companies, and Google may not have had a perfect solution. But Google correctly identified a problem and set out to solve it by looking for tips and tricks in the research computing literature and by tapping the expertise at AltaVista.com.
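
To make the “bite-sized chunks” idea concrete, here is a minimal sketch of the split-the-work, reassemble-the-results pattern the paragraph describes, using Python’s standard multiprocessing pool. It illustrates the general technique only; it is not Google’s actual infrastructure or its operating-system plumbing.

```python
# Minimal sketch of the chunk-and-reassemble pattern described above.
# Illustrative only; Google's real systems are far more involved.

from multiprocessing import Pool

def index_chunk(docs):
    """Build a tiny term -> count index for one chunk of documents."""
    index = {}
    for doc in docs:
        for term in doc.lower().split():
            index[term] = index.get(term, 0) + 1
    return index

def merge(partials):
    """Fold the partial indexes back into a single result."""
    combined = {}
    for partial in partials:
        for term, count in partial.items():
            combined[term] = combined.get(term, 0) + count
    return combined

if __name__ == "__main__":
    corpus = ["goose pond", "goose creek", "mine drainage report"]
    chunks = [corpus[i::2] for i in range(2)]       # split the work
    with Pool(processes=2) as pool:
        partials = pool.map(index_chunk, chunks)    # farm out each chunk
    print(merge(partials))                          # reassemble the answer
```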

Second, Google figured that if it was going to index digital information on any scale, the company needed a way to build capacity without paying for the high end, exotic, and often flakey equipment used by some companies. One example of this type of hardware goof is the AltaVista.com service itself. It used the DEC Alpha chip, which was the equivalent of a Fabergé egg that generated the heat of a gas tungsten arc welding device. Google invested time and effort in cobbling together a commodity hardware solution.

Third, Google looked at what work had to be done when indexing and query processing. The company had enough brain power to realize that the types of read-write processes that are part of standard operating systems and database systems would not be suitable for online services. Instead of embracing the traditional approach like every other commercial indexing outfit did in the 1998 to 2000 period (a critical one in Google’s technical development), Google started over. Instead of pulling an idea from the air, Google looked in the technical literature. Google took the bride’s approach to innovation: something borrowed, something new, etc. The result was what is now one of the core competitive advantages of Google: the suite of services that can deliver fast read speeds and still deliver acceptable performance when a Google Apps user saves a file.
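
One concrete way to see the “start over” idea: instead of updating records in place the way a conventional database does, a read-optimized design can append every change to a log and serve reads from an in-memory view. The sketch below is a generic illustration of that trade-off, assuming a toy key-value workload; it is not the Google File System, Bigtable, or any other specific Google system.

```python
# Generic sketch of an append-only, read-optimized store. Writes are
# sequential appends (cheap); reads come from an in-memory view (fast).
# Illustration only; not the actual Google storage stack.

import json

class AppendOnlyStore:
    def __init__(self, path):
        self.path = path
        self.view = {}          # in-memory view for fast reads
        self._replay()

    def _replay(self):
        """Rebuild the read view by replaying the log from disk."""
        try:
            with open(self.path) as log:
                for line in log:
                    record = json.loads(line)
                    self.view[record["key"]] = record["value"]
        except FileNotFoundError:
            pass

    def put(self, key, value):
        """Append the change to the log, then update the view."""
        with open(self.path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.view[key] = value

    def get(self, key):
        return self.view.get(key)

store = AppendOnlyStore("demo.log")
store.put("doc-1", "saved file contents")
print(store.get("doc-1"))
```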

Keep in mind that Google has been working on its business for a decade. Google is no start up. Google has a head start measured in years, not months or weeks.


Googzilla: Subtle No More

January 15, 2009

TechCrunch ran a very interesting article called “Gmail Grew 43 Percent Last Year. AOL Mail And Hotmail Need To Start Worrying” here. The story provides meat on the bone of the headline. I loved the chart showing the GOOG moving on, keeping on. After reading this article, you will want to pop over to this official Google page and read about the GOOG’s utility to migrate Web logs from one service to another. Like Gmail, the migration tool will wow the Python heads and leave others cold. Over time, the migration tool will probably get some tweaks and then one day, the GOOG is the big dog in the blog house. Competitors who ignore Google’s emulation of American revolutionaries in buckskins will find themselves on the wrong end of the bayonet. The tactics are explained in more detail here.

Stephen Arnold, January 15, 2009

Enterprise Web 2.0 Predictions

January 14, 2009

Dion Hinchcliffe has popped on my radar twice today. First, I wrote about his role as the chief technical officer at the Web search company Nexplore. My newsreader then delivered “Eight Predictions for Enterprise Web 2.0 in 2009” here. Mr. Hinchcliffe has more confidence in Enterprise 2.0 than I do. I am concerned about Enterprise 1.0 organizations being unable to survive. Pushing the troubled companies to embrace Web 2.0 technologies as a survival strategy is a stretch for me. Mr. Hinchcliffe’s angle is quite different from mine. You must read his essay. In this Web log article I want to comment on three of his eight predictions. Note: I am not picking on Mr. Hinchcliffe; I am focusing on three ideas.

First, he points out that communities will become a priority for most organizations. I am on the fence about social systems in organizations. I have two concerns. I think much of the “social software” trend is hype. More significantly, I remain cautious because it is not clear how companies can operate social software without increasing certain risks. Regulated organizations come to mind.

Second, the idea that information technology will be aligned is a good one. But I find that organizations under pressure often become more internally divisive. The information technology unit often becomes the pivot point for certain power struggles. The poorly performing division may be viewed as a time sink and left spinning in the wind. The high performance unit will demand and probably get additional support. The harmonization of technology with poorly defined or on-the-fly business processes is not going to occur in many troubled organizations. One can argue that culture and organizational behavior have more impact on alignment than any other factor. Sorry, I don’t see much progress. I do see quite a bit of consulting.

Third, Service Oriented Architectures will become more streamlined. One can’t argue with a statement built around the word “more”. My view is that most of the SOA progress will be in getting the systems to work with less latency. I don’t think the code will become more compact or the methods more streamlined. Most SOA applications are works in progress. The gating factor will be the organization’s appetite to invest and the expertise of the information technology staff and consultants.

I think that one important technical shift will be eDiscovery. Mr. Hinchcliffe does not include that application. Litigation is not on the minds of the Web 2.0 and Enterprise 2.0 bandwagon riders. One legal action and a botched eDiscovery process will suck up any available IT dollars in 2009.

In short, I find the assertions Mr. Hinchcliffe advances to be music to the ears of consultants and vendors of trendy applications. My view is that the financial problems in many companies will increase in the first six months of 2009. Perhaps the economy will bloom in the second half of 2009. But until there is a way to link return on investment to the services and systems Mr. Hinchcliffe describes, I don’t think his predictions will come true. Of the eight, I think two or three have a reasonable chance of gaining traction. The rest will remain the rosy-tipped vision of a world designed for consultants and start ups.

Stephen Arnold, January 14, 2009

Crazy Stats: Interesting Yet Hardly Web 2.0

January 12, 2009

I think the clever wordsmiths who snagged the Web 2.0 meme are blowing smoke. Losing money is not a business model. Nevertheless, I enjoyed this list of Web 2.0 statistics. I think the word “statistics” as used by TheFutureBuzz.com means “unverifiable factoids”. The article, “49 Amazing Social Media, Web 2.0, and Internet Stats”, is here. Three of the unsubstantiated factoids that caught my attention were:

  • Google’s one trillion URLs. Impossible to verify. Ranks with Amazon’s assertions about the number of objects managed in its AWS service. More PR fluff than factual bedrock.
  • The 70 million videos on Google. Nice assertion, no verification.
  • The 133 million Web logs indexed by Technorati. Yep, but how many have been orphaned? The total number of Web logs remains a mystery.

If you love these types of factoids, TheFutureBuzz.com article is for you.

Stephen Arnold, January 12, 2009

British Library Debunks Myth of a Google Generation

January 11, 2009

Libraries are fighting for money and a role in the digital world. The plight of white shoe publishers is well known. Newspapers, once the lifeblood of information, are now stuffed with soft news or, what’s worse, old information. The shift from desktop boat anchor computers to sleek handheld devices is moving forward. Flagship PC vendors like Dell are in a fight for Wall Street respectability. The television and motion picture pashas believe that the fate of the traditional music publishing business is not theirs.

On January 16, 2009 (the date and the information come from this source), the British Library press room issued or issues or will issue “Pioneering Research Shows Google Generation Is a Myth.” The news release summarizes the study Information Behaviour of the Researcher of the Future. Here’s the link I located, but it did not work without some clicking around. The report strikes me as something developed in an alternate universe where the Googleplex and its information system are small potatoes indeed.


He does not exist, but this member of the Google generation made it to the cover of the British Library study debunking the myth. In the future, this lad will be retrieving information from a mobile device, no PC or library required, thinks this addled goose.

The study was, according to the press release,

Commissioned by the British Library and JISC (Joint Information Systems Committee), the study calls for libraries to respond urgently to the changing needs of researchers and other users. Going virtual is critical and learning what researchers want and need crucial if libraries are not to become obsolete, it warns. “Libraries in general are not keeping up with the demands of students and researchers for services that are integrated and consistent with their wider Internet experience”, says Dr Ian Rowlands, the lead author of the report.

Now this paragraph seems to suggest that “something” has happened and that libraries must “respond urgently to the changing needs of researchers and other users.” My hunch is that libraries are not surfing on the Google but paddling along trying to keep Googzilla’s spikey back in view.


Most of these curves head south, right? © British Library 2009 and presumably in the universe which I inhabit.

The news release also suggests libraries must turn to “Page 2.0”, which I presume is another silly reference to the made up world of Search 2.0, Enterprise 2.0, and Web 2.0. The news release from the future ends with the mysterious phrase “The panel:”.

Keep in mind that I am writing this notice on January 11, 2009, at 9:30 am Eastern time. The news release is from the future. It has a date of January 16, 2009. One would think that the British Library, operating outside the normal space-time continuum, could do more than tell me that the Google generation does not exist. Clever headline aside, libraries must define a role for themselves before funding dwindles even more. University libraries might be grandfathered into the institutional budget. Other types? Might be a tough sell.

In my opinion, what does not exist among some in the library profession is a firm grip on the here and now. I am 65, and I think the Google generation exists. I wish it were not so, but it exists, and the world, one hopes, will be better for the generation’s presence. Libraries seem to exist in a medieval world. Even Shakespeare is in step with the shift from paper to digital information. Consider Hamlet’s statement from one of the versions of the play crafted from Shakespeare’s foul papers:

Let us go in together,
And still your fingers on your lips, I pray.
The time is out of joint—O cursèd spite,
That ever I was born to set it right!
Nay, come, let’s go together.

No myth this, sprites.

Stephen Arnold, January 11, 2009

Microsoft’s Data Robustness

January 11, 2009

The “we may go out of business” Seattlepi.com Web site ran a story with the cruel title “Microsoft’s Servers Overloaded by Interest in Windows 7.” You can read this sort of weird headline and its accompanying story here. The story makes clear that Microsoft’s investments in its data centers were not up to the load imposed by the faithful downloading Windows 7.

The misstep was described as a “borkfest” by Lifehacker here. This goose isn’t sure what a borkfest is, but he can make a guess. Gina Trapani’s article nails the problem. She wrote:

If lack of infrastructure to handle an insane traffic spike over a few hours was truly the problem (even though these were conditions Microsoft created), there are lots of alternatives they could’ve used that would have kept their servers up. In fact, users have been happily downloading and distributing the Windows 7 beta build 7000 now for weeks using an efficient file-sharing protocol called BitTorrent.

When the GOOG streamed its live concert test last year, the Googlers tapped Akamai. Did Microsoft use its own content delivery network? Did Microsoft contract out the job? Whoever handled the job may want to check out another line of work in my opinion. Seattlepi.com quotes a Microsoft Web log. I noted this sentence: “We are adding some additional infrastructure support to the Microsoft.com properties before we post the public beta.” Good idea.

Stephen Arnold, January 11, 2009

Why Cloud Computing Is Right for 2009

January 10, 2009

The Standard ran Kathleen Lau’s ComputerWorld Canada story “Seven Reasons Cloud Computing Works in a Tough Economy” here. Most of the points are spot on. However, I was taken aback by the financial points: lower up front costs, reduced financial risk, lower capital expense, and lower operational expense. On the surface, each of these points seems logical. But if one thinks about the generalization, the cost assertions may not work. A couple of quick examples:

First, a company shifts data and apps to the cloud. The vendor’s system craps out when a key proposal must be generated. The client loses the contract. This type of problem is tough to budget, so most people ignore the issue until it occurs. Those costs can be direct or time shifted. In order to prevent a single vendor from dropping the ball, additional money and effort are required to create a hot failover cloud environment.
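
To show why this kind of risk is tough to budget, here is a back-of-the-envelope expected-cost calculation. Every figure in it is a made-up assumption for illustration, not real vendor data; plug in your own numbers.

```python
# Back-of-the-envelope downtime cost estimate. All figures are
# hypothetical assumptions for illustration, not real vendor data.

outage_probability_per_year = 0.05      # assumed chance of a critical outage
cost_of_lost_contract = 250_000         # assumed value of the proposal at risk
hot_failover_annual_cost = 18_000       # assumed cost of a second cloud vendor

expected_loss = outage_probability_per_year * cost_of_lost_contract
print(f"Expected annual loss without failover: ${expected_loss:,.0f}")
print(f"Annual cost of hot failover:           ${hot_failover_annual_cost:,.0f}")
print("Failover pays for itself" if hot_failover_annual_cost < expected_loss
      else "Failover costs more than the expected loss")
```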

Second, the notion that operational expenses will fall may also be easy to say but tough to achieve. High end data centers touch the machines and software. The customers don’t. When a system has been customized, the costs of troubleshooting and remediating can be high. In my experience, it only takes a weekend of overtime to blow the operational budget out of the water. Not all cloud apps will work this way, but when one does, the costs can be high indeed.

Finally, any financial assertion about cloud computing has to present assumptions and example costs. Telling me that the tree will cost about $500 to remove can become a $5,000 repair bill if the arborist manages to get the tree to fall on the neighbor’s bass boat.

More detail, please!

Stephen Arnold, January 10, 2009

Social Search: Manipulating for Money

January 9, 2009

Mike Elgan wrote “How China’s 50 Cent Army Could Wreck Web 2.0” here. The point of this article is that a person with money can hire Chinese computer users to insert comments into social networks. The infusion of posts would, in effect, distort the much-ballyhooed wisdom of crowds. Mr. Elgan does a good job of explaining how this army works and pointing out the fragility of user-dependent Web 2.0 services. I think he strays from the tethering ring when he asserts that the Chinese “army” can undermine free speech, but otherwise, he’s spot on.

However–and I know you relish my “howevers”–a few of my addled goose observations are now in order.

First, the “social network” revolution is not as zippy as most pundits assert. Mr. Elgan’s write up explains how the person with money can pay to make a specific issue, product, or person percolate upwards. Money can’t buy happiness but it sure can buy visibility in a Web 2.0 service that depends on user inputs.

Second, social networks are more of a marketing story than a technology innovation. Sure, MySpace.com and Facebook.com move well beyond discussion fora and individual Web pages. These sites have knitted together functions and surfed on young-at-heart users who need a way to connect in today’s Jetsons’ world. As the young-at-heart grow old and infirm, their use of network communication methods will persist, but these methods are extensions of older technologies, not sudden inventions.

Third, the implications of a technology cannot be accurately predicted. As a result, when an issue arises with a technology application or suite of technology applications like social networks, the “fix” will be more technology. My concern with MySpace.com and Facebook.com stems not from what they do, but my concern arises from the new technologies these services will require to handle the problems. For example, what’s the fix for the Chinese “army” issue? Think more stringent controls. The casualty is not free speech. It is freedom.

Stephen Arnold, January 9, 2009
