A Wrinkle in the Government Procurement Envelope

March 3, 2009

Government agencies buy quite a bit of hardware, storage, and systems to deal with digital information. I avoid Washington, DC. I went to grade school there. I fought traffic on I 270 when I worked in the city for a decade after getting booted from a third tier university. I then did the SDF to BWI run on Southwest for five or six years when I was hooked up with a government-centric services firm. I don’t know that much about procurements, but I do know when what looks like a trivial event could signal a larger shift.

You can take a look at the ComputerWorld story “DOJ Accuses EMC of Improper Pricing” here. If I were writing the headline, I would have slapped in an “allegedly”. Keep in mind I am reporting second hand news and offering a comment. I am not sure how accurate this DOJ (Department of Justice) matter is or how much oomph it has. The thrust of the story is that the DOJ is sniffing into payments and tie ups.

Now most folks in Harrods Creek, Kentucky don’t pay much attention to the nuances of Federal acquisition regulations. Let’s assume that this is little more than a clerical error. But in my opinion this single matter signals a tougher line on how companies that manufacture or create hardware and software deal with the government. Some organizations sell direct to the government; others take the lead and turn it over to partners. The relationships among the manufacturers, the partners, and the government are a wonderland of interesting activities.

Why is this important? Search vendors operate in different ways, and some systems trigger significant hardware acquisitions. With a massive Federal deficit, I wonder, “Is this single alleged action a harbinger of closer scrutiny of some very high profile companies’ business dealings?” My hunch is, “Yep.” Some companies will want to tidy their business processes. When rocks get flipped over, some interesting things can be spotted. One major search vendor does not sell directly to the US government.
The vendor deals through partners. Some partners are loved more than others. My thought is that if I were investigating these tie ups, I would prefer to see partners treated in an equitable way with documentation that backs up the compensation, limits, and responsibilities with regard to the US government and the source of the hardware or software. If the system is “informal”, I would dig a little deeper to make sure that US government procurement guidelines were followed to the letter. Just my opinion. I might come out of retirement to do some of the old time procurement fact finding when spring comes.

Stephen Arnold

Mysteries of Online 9: Time

March 3, 2009

Electronic information has an interesting property: time distortion. The distortion has a significant effect on how users of electronic information participate in various knowledge processes. Information carries humans along much as a stream whisks a twig in the direction of the flow. Information, unlike water, moves in multiple directions, often colliding, sometimes reinforcing, and at other times behaving in paradoxical ways that leave a knowledge worker dazed, confused, and conflicted. The analogy of information as a tidal wave connotes only a partial truth. Waves come and go. Information flow for many people and systems is constant. Calm is tough to locate.


Vector fields. Source: http://www.theverymany.net/uploaded_images/070110_VectorField_test012_a-789439.jpg

In the good old days of cuneiform tablets, writing down the amount of wheat Eknar owed the king required specific steps. First, you had to have access to suitable clay, water, and a clay kneading specialist. Second, you needed a stylus of wood, bone, or maybe the fibula of an enemy removed in a timely manner. Third, you had to have your data ducks in a row. Dallying meant that the clay tablet would harden and make life more miserable than it already was. Once the document was created, the sun or kiln had to cooperate. Once the clay tablet was firm enough to handle without deleting a mark for a specified amount of wheat, the tablet was stacked in a pile inside a hut. Fourth, to access the information, the knowledge worker had to locate the correct hut, find the right pile, and then inspect the tablets without breaking one, a potentially bad move if the king had a short temper or needed money for a war or a new wife.

In the scriptorium of the 9th century, information flow wasn’t much better. The clay tablets had been replaced with organic materials like plant matter or, for really important documents, the scraped skin of sheep. Keep in mind that other animals were used. Yep, human skin worked too. Again, time-intensive processes were required to create the material on which a person would copy or scribe information. The cost of the materials made it possible to get patrons to spit out additional money to illustrate or illuminate the pages. Literacy was not widespread in the 9th century, and there were a number of incentives to get sufficient person power to convert foul papers to fair copies and then to compendia. Not just anyone could afford a book. Buying a book or similar document did not mean the owner could read. The time required to produce hand copies was somewhat better than the clay tablet method or the chiseled inscriptions or brass castings used by various monarchs.


Yep, I will have it done in 11 months, our special rush service.

With the invention of printing in Europe, the world rediscovered what the Chinese had known for 800, maybe a thousand, years. No matter. The time required to create information remained the same. What changed was that once a master set of printing plates had been created, a printer with enough capital to buy paper (cheaper than skin, longer lasting than untreated plant fiber, and less ink hungry than linen based materials) could manufacture multiple copies of a manuscript. The out of work scribes had to find a new future, but the impact of printing was significant. Everyone knows about the benefits of literacy, books, and knowledge. What’s overlooked is that the existence of books altered the time required to move information from point A to point B. Once time barriers fell, distance compressed as well. The world became smaller if one were educated. Ideas migrated. Information moved around and had impact, which I discussed in another Mysteries of Online essay. Revolutions followed after a couple hundred years, but the mindless history classes usually ignore the impact of information on time.

If we flash forward to the telegraph, time accelerated. Information no longer required a horseback ride, walk, or train ride from New York to Baltimore to close a real estate transaction. Once newfangled electricity fell in love with information, the speed of information increased with each new innovation. In fact, more change in information speed has occurred since the telegraph than in all previous human history. The telephone gave birth to the modem. The modem morphed into a wireless USB 727 device along with other gizmos that make possible real time information creation and distribution.

Time Earns Money

I dug out notes I made to myself sometime in the 1982 – 1983 time period. The implications of time and electronic information caught my attention for one reason. I noted that the revenue derived from a database with weekly updates was roughly 30 percent greater than revenue derived from the same database on a monthly update cycle. So, four updates a month yielded $1.30, not $1.00. I wrote down, “Daily updates will generate an equal or greater increase.” I did not believe that the increase was infinite. The rough math I did 25 years ago suggested that with daily updates the database would yield about 1.6 times the revenue of the same database on a monthly update cycle. In 1982 it was difficult to update a commercial database more than once a day. The cost of data transmission and service charges would gobble up the extra money, leaving none for my bonus.
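The rough arithmetic from that notebook can be sketched in a few lines. The monthly baseline and the weekly 30 percent lift are the essay’s own figures; reading the “1.6” daily figure as a multiplier (1.6x the monthly revenue) is my interpretation, not a fact from the notes:

```python
# Sketch of the update-frequency revenue effect described above.
BASE_MONTHLY_REVENUE = 1.00  # baseline: $1.00 per monthly-update cycle

update_multipliers = {
    "monthly": 1.0,  # baseline
    "weekly": 1.3,   # ~30 percent lift noted in the 1982-1983 notes
    "daily": 1.6,    # rough ceiling of the estimate, read as a multiplier
}

for cycle, multiplier in update_multipliers.items():
    revenue = BASE_MONTHLY_REVENUE * multiplier
    print(f"{cycle:>8} updates: ${revenue:.2f}")
```

The point of the sketch is the shape of the curve: each step up in update frequency adds revenue, but with diminishing returns rather than without limit.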


In the financial information world, speed and churn are mutually reinforcing. New information makes it possible to generate commissions.

Time, therefore, not only accelerated the flow of information. Time could also accelerate earnings from online information. Simply by updating a database more often, the database would generate more money. Update the database less frequently, and it would generate less money. Time had value to the users.

I found this an interesting learning, and I jotted it down in my notebook. Each of the commercial databases in which I played a role was designed for daily updates and, later, multiple updates throughout the day. To this day, the Web log in which this old information appears is updated on a daily basis, and several times a week it is updated multiple times during the day. Each update carries an explicit time stamp. This is not for you, gentle and patient reader. The time stamp is for me. I want to know when I had an idea. Time marks are important as the speed of information increases.

Implications

The implications of my probably third-hand insight included:

  1. The speed up in dissemination means that information impact is broader, wider, and deeper with each acceleration.
  2. Going faster translates to value for some users who are willing and eager to pay for speed. The idea is that knowing something (anything) first is an advantage.
  3. Fast is not enough. Customers addicted to information speed want to know what’s coming. The inclusion of predictive data adds another layer of value to online services.
  4. Individuals who understand the value of information speed have a difficult time understanding why more online systems and services cannot deliver what is needed; that is, data about what will happen, with a probability attached to the prediction. Knowing that something has a 70 percent chance of taking place is useful in information sensitive contexts.
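Point 4 above can be made concrete with a toy sketch. The events, the probabilities, and the 0.6 action threshold are all invented for illustration; the idea is simply that a prediction becomes actionable only when a probability rides along with it:

```python
# A prediction without a probability is just a guess; with one, it supports a decision.
predictions = [
    ("suspect flees the jurisdiction", 0.70),
    ("market moves against the position", 0.40),
]

ACT_THRESHOLD = 0.6  # assumed decision cut-off, not from any real system

decisions = {
    event: ("act" if probability >= ACT_THRESHOLD else "monitor")
    for event, probability in predictions
}

for event, call in decisions.items():
    print(f"{event}: {call}")
```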

Let me close with one example of the problem speed presents. The Federal government has a number of specialized information systems for law enforcement and criminal justice professionals. These systems have some powerful, albeit complex, functions. The problem is that when a violation or crime occurs, the law enforcement professionals have to act quickly. The longer the reaction time, the greater the chance that the bad egg will be tougher to apprehend. Delay is harmful. The systems, however, require that an individual enter a query, retrieve information, process it, and then use another two or three systems in order to get a reasonably complete picture of the available information related to the matter under investigation.

The systems have a bottleneck. The human. Law enforcement personnel, on the other hand, have to move quickly. As a result, the fancy online systems operate in one time environment and the law enforcement professionals operate in another. The opportunity to create systems that bring both time universes together is significant. Giving a law enforcement team mobile comms for real time talk is good, but without the same speedy and fluid access to the data in the larger information systems, the time problem becomes a barrier.

Opportunity in online and search, therefore, is significant. Vendors who pitch another fancy search algorithm are missing the train in law enforcement, financial services, competitive intelligence, and medical research. Going fast is no longer a way to add value. Merging different time frameworks is a more interesting area to me.

Stephen Arnold, February 26, 2009

Microsoft Trumps Google, Dismisses Its Enterprise Services

March 2, 2009

Microsoft seems to be returning to its glory days as vanquisher of the weak and destroyer of newcomers. Phil Wainewright wrote “Microsoft Pumps Cloud, Trumps Google with GSK.” I must admit the GSK threw me. It is the insiders’ way to refer to GlaxoSmithKline, a pharmaceutical giant. The comment that stuck in my beak was:

Not only that. Ron Markezich, corporate VP of Microsoft Online Services, was scathing of Google’s efforts to make headway in the enterprise market. “Google we really do not feel is ready for the enterprise,” he said in a call briefing bloggers on the announcement an hour ago. “They’re offering three nines SLA and they’ve missed three of the last six months,” he added, referring to last week’s Gmail outage and earlier incidents. In a sideswipe at Google’s offer of a 15-day credit for last week’s outage, he went on to add that Microsoft maintains its services at four-nines availability, while backing up its three-nines SLA with financial penalties: “We don’t just give service credits, we give hard dollars if we miss an SLA.” [Emphasis added]
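The “three nines” versus “four nines” sparring translates into concrete downtime budgets. A rough sketch of the arithmetic (the 30-day month is my simplification; neither vendor’s published figures are used):

```python
# Downtime allowed per month at a given availability level.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

def downtime_minutes(availability: float) -> float:
    """Minutes of downtime permitted while still meeting the availability target."""
    return MINUTES_PER_MONTH * (1 - availability)

for label, availability in [("three nines", 0.999), ("four nines", 0.9999)]:
    print(f"{label} ({availability:.2%}): "
          f"about {downtime_minutes(availability):.1f} minutes/month")
```

Roughly 43 minutes a month at three nines versus about four minutes at four nines, which is why missing a three-nines SLA three months out of six stings.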

My take on this announcement includes these thoughts:

  1. Looks like each Google announcement will trigger an aggressive response from Microsoft
  2. Microsoft is sending a signal to Google and probably to any other company that it intends to protect its customer base. Good cheer and happiness may not be part of the response
  3. Google must have landed a kaisho (open hand strike). Microsoft’s statement (cited above) suggests to me that Google is not an annoyance; Google is a threat.

More from the battle front as reports arrive.

Stephen Arnold, March 2, 2009

YAGG Update: PageRank Tweak or Bug

March 2, 2009

If you are mesmerized by things Google, you will want to navigate to Search Engine Roundtable and read “Google March 2009 PageRank Update or Glitch?” here. The article provides links to a couple of posts that identify what may be a glitch or goof, as in “yet another Google goof” or YAGG. I know the acronym annoys Alex, a potential Googlephile. The article quotes a Googler who uses the phrase “some kind of glitch”, which may be old news if you were bitten by the Gfail issue a few days ago.

Stephen Arnold, March 2, 2009

Another YAGG: Picasa Privacy

March 2, 2009

Philipp Lenssen at Googleblogoscoped wrote “Picasa Privacy Oddity” here. If the information is accurate, the Google has another YAGG (yet another Google glitch) to resolve. Mr. Lenssen wrote:

this goes to show that not password-protecting a sign-in locked album’s image URLs themselves is still not as utterly-security-obsessive as could be (which is noteworthy considering Picasa Web Album’s mixed privacy history of the past).

Alex, once a reader, is not too keen on my YAGG coinage or my pointing out the feet of clay that Googzilla may have. Worth watching, I suppose.

Stephen Arnold, March 2, 2009

Potential Trouble for LexisNexis and Westlaw

March 2, 2009

Most online surfers don’t click to Reed Elsevier’s LexisNexis or Thomson Reuters’ Westlaw. The reason? These commercial services charge money–quite a lot of money–to access legal documents. Executives at both firms can deliver compelling elevator pitches about the added value each company brings to legal documents. In the pre-crash era, legal indexing was a manual process. Then the cost crunch arrived, so both outfits are trying to slap software against the thorny problem of making sense of court documents, rulings, and the assorted effluvia of America’s legal factories. I may write about how these two quasi US outfits have monopolized for fee legal information about American law for lawyers and government agencies. Both Reed and Thomson then turn around and sell access to these documents to the agencies that created them in the first place. I wonder if the good senator mentioned below is aware of this aspect of commercial online services’ business practices?

What’s the trouble? I bet you thought I was going to mention Google. Wrong. Google is on the edge of indexing legal information in a more comprehensive way. But the right-now trouble is Senator Joe Lieberman. Wired reported that the good senator wondered why public documents are not available without a charge. You can read the story “Lieberman Asks, Why Are Court Docs Still Behind Paid Firewall?” here. Senator Lieberman’s question may lead to a hearing. The process could, in my opinion, start a chain reaction that further erodes the revenue Reed Elsevier and Thomson Reuters derive from public documents. Somewhere in the chain, the Google will beef up the legal content in its Uncle Sam service here.

At their core, Reed Elsevier and Thomson Reuters are traditional publishing and information companies. As such, their business model is fragile. Within the present financial pressure cooker, the Lieberman question could blow the lid off these two organizations’ for fee legal business. If government agencies shift to a service provided by Google, Microsoft, or Yahoo, I think these two dead tree outfits will crash to the forest floor.

What is the likelihood of this downside scenario? I would put it at better than 60 percent. Have another view? Share it, please. Set the addled goose straight.

Stephen Arnold, March 2, 2009

Ask.com Frames in the Picture

March 2, 2009

Frames and iframes are nifty. Over the years, their use has aroused some controversy. At one time, Google took a dim view of iframes. I have had reports that Google itself uses iframes. Other vendors have employed the technology to allow users to visit sites that are not what they seem. You navigate to another site and then discover that you are not where you want to be. Over the years, I have stumbled across patent documents that include variations of the iframe technology. Some uses are for the purpose of tracking user behavior. Others allow a Web site operator to inject content around the user’s intended destination. I lose interest in this type of cleverness, having lost my enthusiasm for tilting at windmills. There are quite a few clever and tricky folks who find ways to warp a naïf’s Internet experience.

Pandia.com, a news service that I like quite a bit, reported on some frame use at Ask.com, the also-ran Web search vendor. Ask.com for me is a good example of what happens when someone who is good at one thing tries to extend that expertise to another domain unrelated to the first. The outcome of this type of master-of-the-universe thinking is a service like Ask.com. It’s not bad; it’s not good. It’s one thing today; it will be another thing tomorrow. I recall a dinner two years ago when an azure chip consultant told me that Ask.com was on the move. I thought, “This fellow is getting paid to advise publishers about online partners?” Now Ask.com is the search engine of NASCAR. I wonder if any of the Ask.com executive team hangs out with Kentucky’s NASCAR fans? I have. I am not sure this demographic is where the action is for search.

Search Engine Roundtable followed up with its February 27, 2009, story, “Ask.com Crosses The Line: Frames Search Results.” This is a useful write up, and it includes a screenshot. For me, the most interesting comment was:

Searchers are not happy about this at WebmasterWorld. Robzilla said, “this annoys me as both a user and a webmaster, and overall just seems a little desperate.” Senior member, skipfactor, accurately points out that the search ads are not framed in.

What’s my take? Behavior that tricks users or actions that are designed to pump up revenue are part of the present cultural norms. When it is a banker paying himself or herself a bonus for losing money or an insurance company refusing to honor a claim, I see behavior that makes me uncomfortable in many places. Why should anyone be surprised that online companies caught in a cash crunch would push into such murky areas? As more people use the Internet, there are more opportunities to snooker users.

The Internet is no longer something new, accessible only to scientists, engineers, and researchers. The Internet is like the Kentucky State Fair. As long as you can get on the grounds, you’re good to go. Last time I checked, the Kentucky State Fair was a mirror of the best and worst in the bluegrass state. I think it is useful to alert users of certain methods, but I don’t think most users know or care about Ask.com. Those who do may be quite happy with whatever Ask.com provides.

Stephen Arnold, March 2, 2009

Harry Collier, Infonortics, Exclusive Interview

March 2, 2009

Editor’s Note: I spoke with Harry Collier on February 27, 2009, about the Boston Search Engine Meeting. The conference, more than a decade into in-depth explorations of search and content processing, is one of the most substantive search and content processing programs. The speakers have come from a range of information retrieval disciplines. The conference organizing committee has attracted speakers from the commercial and research sectors. Sales pitches and recycled product reviews are discouraged. Substantive presentations remain the backbone of the program.

Conferences about search, search engine optimization, and Intranet search have proliferated in the last decade. Some of these shows focus on the “soft” topics in search and wrap the talks with golf outings and buzzwords. The attendee learns about “platinum sponsors” and can choose from sales pitches disguised as substantive presentations. The Infonortics search conference has remained sharply focused and content centric. One attendee told me last year, “I have to think about what I have learned. A number of speakers were quite happy to include equations in their talks.” Yep, equations. Facts. Thought provoking presentations. I still recall the tough questions posed to Larry Page (Google) after his talk at the 1999 conference. He argued that truncation was not necessary, and several in attendance did not agree with him. Google has since implemented truncation.

Financial pressures have forced some organizers to cancel some of their 2009 information centric shows; for example, Gartner, the Magazine Publishers Association, and the Newspaper Publishers Association, to name three. Infonortics continues to thrive with its reputation for delivering content plus an opportunity to meet some of the most influential individuals in the information retrieval business. You can learn more about Infonortics here. The full text of the interview with Mr. Collier, who resides in the Cotswolds with an office in Tetbury, Gloucestershire, appears below:

Why did you start the Search Engine Meeting? How does it differ from other search and SEO conferences?

The Search Engine Meeting grew out of a successful ASIDIC meeting held in Albuquerque in March 1994. The program was organized by Everett Brenner and, to everyone’s surprise, that meeting attracted record numbers of attendees. Ev was enthusiastic about continuing the meeting idea, and when Ev was enthusiastic he soon had you on board. So Infonortics agreed to take up the Search Engine Meeting concept and we did two meetings in Bath in England in 1997 and 1998, then moved thereafter to Boston (with an excursion to San Francisco in 2002 and to The Netherlands in 2004). Ev set the tone of the meetings: we wanted serious talks on serious search domain challenges. The first meeting in Bath already featured top speakers from organizations such as WebCrawler, Lycos, InfoSeek, IBM, PLS, Autonomy, Semio, Excalibur, NIST/TREC and Claritech. And ever since we have tried to avoid areas such as SEO and product puffs and to keep to the path of meaty, research talks for either search engine developers, or those in an enterprise environment charged with implementing search technology. The meetings tread a line between academic research meetings (lots of equations) and popular search engine optimization meetings (lots of commercial exhibits).


Pictured from the left: Anne Girard, Harry Collier, and Joan Brenner, wife of Ev Brenner. Each year the best presentation at the conference is recognized with the Evvie, an award named in honor of her husband, who chaired the first conference in 1997.

There’s a great deal of confusion about the meaning of the word “search”. What’s the scope of the definition for this year’s program?

Yes, “Search” is a meaty term. When you step back, searching, looking for things, seeking, hoping to find, hunting, etc., are basic activities for human beings — be it seeking peace, searching for true love, trying to find an appropriate carburetor for an old vehicle, or whatever. We tend now to have a fairly catholic definition of what we include in a Search Engine Meeting. Search — and the problems of search — remains central, but we are also interested in areas such as data or text mining (extracting sense from masses of data) as well as visualization and analysis (making search results understandable and useful). We feel the center of attention is moving away from “can I retrieve all the data?” to “how can I find help in making sense out of all the data I am retrieving?”

Over the years, your conference has featured big companies like Autonomy, start ups like Google in 1999, and experts from very specialized fields such as Dr. David Evans and Dr. Liz Liddy. What pulls speakers to this conference?

We tend to get some of the good speakers, and most past and current luminaries have mounted the speakers’ podium of the Search Engine Meeting at one time or another. These people see us as a serious meeting where they will meet high quality professional search people. It’s a meeting without too much razzmatazz; we only have a small, informal exhibition, no real sponsorship, and we try to downplay the commercialized side of the search world. So we attract a certain class of person, and these people like finding each other at a smaller, more boutique-type meeting. We select good-quality venues (which is one reason we have stayed with the Fairmont Copley Plaza in Boston for many years), we finance and offer good lunches and a mixer cocktail, and we select meeting rooms that are ideal for an event of 150 or so people. It all helps networking and making contacts.

What people should attend this conference? Is it for scientists, entrepreneurs, marketing people?

Our attendees usually break down into around 50 percent working in the search engine field and 50 percent charged with implementing enterprise search. Because of Infonortics’ international background, we have a pretty high international attendance compared with most meetings in the United States: many Europeans, Koreans, and Asians. I’ve already used the word “serious”, but this is how I would characterize our typical attendee. They take lots of notes; they listen; they ask interesting questions. We don’t get many academics; Ev Brenner was always scandalized that not one person from MIT had ever attended the meeting in Boston. (That has not changed up until now.)

You have the reputation for delivering a content rich program. Who assisted you with the program this year? What are the credentials of these advisor colleagues?

I like to work with people I know, with people who have a good track record. So ever since the first Infonortics Search Engine Meeting in 1997 we have relied upon the advice of people such as you, David Evans (who spoke at the very first Bath meeting), Liz Liddy (Syracuse University), and Susan Feldman (IDC). And over the past nine years or so my close associate, Anne Girard, has provided non-stop research and intelligence as to what is topical, who is up-and-coming, and who can talk on what. These five people are steeped in the past, present, and future of the whole world of search and information retrieval and bring a welcome sense of perspective to what we do. And, until his much lamented death in January 2006, Ev Brenner was a pillar of strength, tough-minded and with a 45-year track record in the information retrieval area.

Where can readers get more information about the conference?

The Infonortics Web site (www.infonortics.eu) provides one-click access to the Search Engine Meeting section, with details of the current program, access to pdf versions of presentations from previous years, conference booking form and details, the hotel booking form, etc.

Stephen Arnold, March 2, 2009

Google a Twittering

March 1, 2009

On March 1, 2009, another story about a possible tie up between Google and Twitter surfaced. The source? Jennifer Bosavage and CRNCanada. You can read the story “Wedding bells for Google and Twitter?” here. For me, the most interesting comment in the article was:

Could Google be eyeing Twitter as an acquisition? That possibility’s got the blogosphere all “a-twitter,” pardon the pun. Earlier this week, Google activated its Twitter account and all Tweets broke loose. As of Friday morning, Google had more than 26,000 followers. The speculation is that, in a move similar to its purchase of YouTube, Google is interested in buying Twitter.

Google has been somewhat clumsy in the real time news space. Maybe Ms. Bosavage and CRNCanada have an inside track on this alleged tie up.

Stephen Arnold, March 2, 2009

InOrder Conceptual Search

March 1, 2009

Update: link updated, March 2, 2009

A happy quack to the reader who sent me a link to the conceptual search engine InOrder.org here. Here’s the description of the system from the organization’s Web site:

InOrder is a collaborative conceptual search interface. It is being developed by Garrett Camp at the EIS Lab at the University of Calgary. It’s design premise is that search engines such as Google already find relevant results for well-formed queries, but do not efficiently elicit these search needs from users. InOrder solves this issue by creating an interactive environment for collective group search. InOrder acquires domain knowledge of semantic relevance within a given search context. Mediated sets of “topics” and “terms” guide search exploration by collective intuition, reusing search strategies utilized by ones peers. Incremental and explicit elicitation of these collective strategies enables participants to make better-informed search decisions. In terms of existing web media InOrder may be viewed as a structured weblog of the semantic interactions of those with similar search goals.

We ran several test queries and found the system interesting. Here’s a screen shot of the result for our query “enterprise search”:

[Screenshot: InOrder results for the query “enterprise search”]

We will do some more testing.

Stephen Arnold, February 28, 2009
