The IBM Watson PR Blitz Continues
January 9, 2014
Content marketing is alive and well at IBM. I read two Watson-related stories this morning. Let’s look at each and see if there are hints about how IBM will generate $10 billion in revenue from the game-show-winning Watson information system.
The New York Times
“IBM Is Betting Watson Can Earn Its Keep” appears on page B9 of the hard copy, which arrives in Harrod’s Creek most days. A digital instance of this Quentin Hardy write up may be online at http://nyti.ms/1krYgfx. If not, contact a Google Penguin for guidance.
The write up contains a quote to note:
Virginia M. Rometty, CEO of IBM: Watson does more than find the needle in the haystack. It understands the haystack. It understands concepts.
The best haystack quote I have heard came from Matt Kohl, student of Gerard Salton and founder of Personal Library Software. Dr. Kohl pointed out that haystacks involve needles, multiple haystacks and multiple needles, and other nuances that make clear how difficult locating information can be.
The quote attributed to Ms. Rometty also nods to Autonomy’s marketing. Autonomy, since 1996, emphasized that one of the core functions of its Bayesian-Shannon-Laplace-Volterra method was identifying concepts automatically. Are IBM and arch-rival Hewlett Packard using the same 18-year-old marketing lingo? If so, I wonder how that will play out against the real-life struggles HP seems to be experiencing in the information retrieval sector.
There are several other interesting points in the content marketing-style article:
- IBM is “giving Watson $1 billion and a nice office.” I wonder if the nuance of “giving” is better than “investing.”
- $100 million will be allocated “for venture investments related to Watson’s so-called data analysis and recommendation technology.” One hopes that IBM’s future acquisitions deliver value. IBM already owns iPhrase, a “smart search system,” some of Dr. Ramanathan Guha’s semantic technology, Vivisimo, and the text processing component of SPSS called Clementine. That’s a lot of in-hand technology, but IBM wants to buy more. What are the costs of integration?
- IBM has to figure out how to “cohere” with other IBM initiatives. Is Cognos now part of Watson? What happens to the IBM Almaden research flowing from Web Fountain and similar initiatives? What is the role of Lucene, which I heard is the plumbing of Watson?
The IBM write up will get wide pickup, but the article strikes me as raising some serious questions about the Watson initiative. There may be 750 eager developers wanting to write applications for Watson. I am waiting for an Internet-accessible demonstration against a live data set.
The Wall Street Journal, Round 2, January 9, 2014
The day after running “IBM Struggles to turn Watson into Big Business”, the real news outfit ran a second story called “IBM Set to Expand Watson’s Reach.” I saw this on page B2 of the hard copy that arrived in Harrod’s Creek this morning. Progress. There was no WSJ delivery on January 6 and January 7 because it was too cold. You may be able to locate a digital version of the story at http://on.wsj.com/1ikQa3X. (Same Penguin advice applies if the article is not available online.)
This January 9, 2014, story includes a quote to note:
Michael Rhodin, IBM senior vice president, Watson unit: We are now moving into more of a rapid expansion phase. We’ve made incredible progress. There is lots more to do. We would not be pursuing it if we did not think it had big commercial potential.
We then learn that by 2018, Watson will generate $1 billion per year. Autonomy was founded in 1996, and at the time of its purchase by Hewlett Packard, the company reported revenue in the $800 million range. IBM wants to generate more revenue from search in less time than Autonomy took. No other enterprise search and content processing vendor has been able to match Autonomy’s performance. In fact, Autonomy’s rapid growth after 2004 was due in part to acquisitions. Autonomy paid about $500 million for Verity, and IBM’s $100 million earmarked for venture investments may not buy much in a search sector that has consolidated. Oracle paid about $1 billion for Endeca, which generated about $130 million a year in 2011.
Net Net
Watson has better PR than most of the search and content processing companies I track. How many people at the Watson unit pay attention to SRCH2, Open Search Server, Sphinx Search, SearchDaimon, the Dassault Cloud 360 system, and the dozens and dozens of other companies pitching information retrieval solutions?
I would wager that the goals for Watson are unachievable in the time frame outlined. The ability of a large company to blast past Autonomy’s revenue benchmark will require agility, flexibility, price wizardry, and a product that delivers verifiable value.
As the second Wall Street Journal story points out, “IBM is looking to revive growth after six straight quarters of revenue declines.”
IBM may be better at content marketing than at hitting the revenue targets for Watson, especially while Hewlett Packard is trying to generate massive revenues from the Autonomy technology. Will Google sit on its hands as IBM and HP scoop up the enterprise deals? What about Amazon? Its search system is a so-so offering, but it can offer some sugar treats to organizations looking to kick tires with reduced risk.
Many organizations are downloading open source search and data management systems. These are good enough when smart software is still a work in progress. With 2,000 people working on Watson, the trajectory of this solution will be interesting to follow.
Stephen E Arnold, January 9, 2014
Distraction Addiction: Welcoming Predictive Search Systems
January 9, 2014
The article on Business Insider titled Here’s How Many Times People Switch Devices In a Single Hour provides insight into the studies Google and Facebook are undertaking on following users from device to device. They need to demonstrate to advertisers that the ad one user saw on his laptop at work later caused him to make a purchase from his smartphone. The article states:
“A new study from the British unit of advertising buyer OMD shows just how massively important this cross-device tracking has become to monitoring a given consumer’s behavior.
In looking at the behavior of 200 Brits during one evening, OMD found that the average person shifted his attention between his smartphone, tablet, and laptop a staggering 21 times in one hour.”
This study’s findings may not come as a huge surprise. An article on Salon titled How Baby Boomers Screwed Their Kids and Created Millennial Impatience argues that Generation Y is the most distracted and impatient batch of people yet. The article contends,
“According to a study at Northwestern University, the number of children and young people diagnosed with attention deficit hyperactivity disorder (ADHD) shot up 66 percent between 2000 and 2010. Why the sudden and huge spike in a frontal lobe dysfunction over the course of a decade… What I believe is likely happening, however, is that more young people are developing an addiction to distraction. An entire generation has become addicted to the dopamine-producing effects of text messages, e-mails and other online activities.”
This “addiction to distraction” is often held up by Gen Y’ers as an ability to “multi-task”. But what does it mean to be someone unable to focus? In Buddhism there is the belief that if you are doing more than one focused task, you are not truly alive.
With telework, the workplace is now the world.
We have all succumbed at one time or another to the call of checking our e-mail, Facebook, or Twitter account, but when we are doing it so often that it takes over our concentration, what have our lives become? There is a wide gap between flitting among these exciting distractions and actually gaining some foothold of understanding. And the more we jump back and forth between tasks, the less likely it becomes that any knowledge is created or stored. The Salon article paints a bleak picture, starting off with the dark Philip Larkin poem “This Be the Verse” (it is hardly “High Windows”) and including a dreary image of the future.
A Full Text Engine Blooms in Life
January 9, 2014
Basic search for static Web sites stinks. It is generic code that takes a one-size-fits-all approach to search, and as we all know, that never works. Stavros Korokithakis recognized this problem and decided to build a full-text search engine that was accurate. In his article, “Writing a Full-Text Search Engine Using Bloom Filters,” Korokithakis details how he built his own search, weighing an inverted index against Bloom filters. An inverted index works by mapping every word in a document to the ID of that document. As one can imagine, that index grows very large, which makes it a poor fit for a static Web site. A typical search plug-in dodges the problem by limiting itself to titles, tags, and keywords. How do you get full-text results for a static site?
A Bloom filter is the answer. A Bloom filter is a probabilistic data structure that records set membership in a fixed number of bits; when queried, it tells you whether it has seen an element before, with a small chance of a false positive but no false negatives. It is also apparently easy to put Bloom filters to work for search:
- “Create one filter per document and add all the words in that document in the filter.
- Serialize the (fixed-size) filter in some sort of string and send it to the client.
- When the client needs to search, iterate through all the filters, looking for ones that match all the terms, and return the document names.
- Profit!”
He even has a quick implementation guide in Python; a minimal sketch of the idea appears below. It sounds like a wonderful way to improve static Web site search, but could the same problem not be solved with a simple plug-in as described above? With so many people relying on pre-made Web site platforms such as WordPress and Tumblr, which come with built-in plug-ins, is this approach meant for the bigger Web sites people deploy themselves?
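Here is that minimal sketch in Python. It is our own illustration of the steps quoted above, not Korokithakis’s code: the BloomFilter class, the 256-bit filter size, and the sample documents are assumptions for demonstration, and the serialization step that ships each filter to the client is left out.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a fixed-size bit array plus a few hash positions per word."""

    def __init__(self, num_bits=256, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored as a Python integer

    def _positions(self, word):
        # Derive num_hashes bit positions from one SHA-256 digest of the word.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word):
        # False positives are possible; false negatives are not.
        return all(self.bits & (1 << pos) for pos in self._positions(word))


def build_index(documents):
    """Step 1: create one filter per document and add every word in that document."""
    index = {}
    for name, text in documents.items():
        bloom = BloomFilter()
        for word in set(text.lower().split()):
            bloom.add(word)
        index[name] = bloom
    return index


def search(index, query):
    """Step 3: return the documents whose filters match every query term."""
    terms = query.lower().split()
    return [name for name, bloom in index.items()
            if all(bloom.might_contain(term) for term in terms)]


if __name__ == "__main__":
    docs = {
        "post-1.html": "writing a full text search engine using bloom filters",
        "post-2.html": "static web sites and the limits of inverted indexes",
    }
    index = build_index(docs)
    print(search(index, "bloom filters"))  # ['post-1.html']
    print(search(index, "static web"))     # ['post-2.html'], barring a false positive
```

Step 2 of the quoted list, serializing each fixed-size filter and sending it to the client, is where the payoff comes: the filters are tiny compared with a full inverted index, so the search loop can run in the browser.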
Whitney Grace, January 9, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
IBM Wrestling with Watson
January 8, 2014
“IBM Struggles to turn Watson into Big Business” warrants a USA Today treatment. You can find the story in the hard copy of the newspaper on pages A1 and A2. I saw a link to the item online at http://on.wsj.com/1iShfOG but you may have to pay to read it or chase down a Penguin-friendly instance of the article.
The main point is that IBM targeted $10 billion in Watson revenue by 2023. Watson has generated, I presume, less than $100 million in revenue since the system “won” the Jeopardy game show.
The Wall Street Journal article is interesting because it contains a number of semantic signals, for example:
- The use of the phrase “in a ditch” in reference to a project at the University of Texas M.D. Anderson Cancer Center
- The statement “Watson is having more trouble solving real-life problems”
- The revelation that “Watson doesn’t work with standard hardware”
- An allegedly accurate quote from a client that says “Watson initially took too long to learn”
- The assertion that “IBM reworked Watson’s training regimen”
- The sprinkling of “could’s” and “if’s”
I came away from the story with a sense of déjà vu. I realized that over the last 25 years I have heard similar information about other “smart” search systems. The themes run through time the way a bituminous coal seam threads through the crust of the earth. When one of these seams catches fire, there are few inexpensive and quick ways to put out the fire. Applied to Watson, my hunch is that the cost of getting Watson to generate $10 billion in revenue is going to be a very big number.
The Wall Street Journal story references the need for humans to learn a topic and then to train Watson about it. When Watson goes off track, more humans have to correct Watson. I want to point out that training a smart system on a specific corpus of content is tricky. Algorithms can be quite sensitive to small errors in initial settings. Over time, the algorithms do their thing and wander. This translates into humans who have to monitor the smart system to make sure it does not output recommendations whose confidence scores are wrong or undifferentiated. The Wall Street Journal nudges this state of affairs in this passage:
In a recent visit to his office, [a Sloan Kettering oncologist] pulled out an iPad and showed a screen from Watson that listed three potential treatments. Watson was less than 32% confident that any of them were [sic] correct.
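That sub-32% figure is exactly the situation a human-in-the-loop check is meant to catch. As a toy sketch only (the threshold, the separation rule, and the data are invented here and are not IBM’s method), a monitoring layer might route low or undifferentiated confidence scores back to a clinician:

```python
REVIEW_THRESHOLD = 0.50   # hypothetical cutoff, not an IBM value
MIN_SEPARATION = 0.10     # treat scores that cluster together as "undifferentiated"


def triage(recommendations):
    """Decide whether machine output can stand alone or needs human review.

    `recommendations` is a list of (option, confidence) pairs.
    """
    ranked = sorted(recommendations, key=lambda pair: pair[1], reverse=True)
    top_option, top_score = ranked[0]

    if top_score < REVIEW_THRESHOLD:
        return "human review", ranked                 # nothing is convincing
    if len(ranked) > 1 and top_score - ranked[1][1] < MIN_SEPARATION:
        return "human review", ranked                 # too close to call
    return "accept", [(top_option, top_score)]


if __name__ == "__main__":
    # Mirrors the anecdote: three treatments, none above 32 percent confidence.
    options = [("treatment A", 0.31), ("treatment B", 0.28), ("treatment C", 0.22)]
    print(triage(options)[0])  # human review
```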
Then the Wall Street Journal reported that tweaking Watson was tough, saying:
The project initially ran awry because IBM’s engineers and Anderson’s doctors didn’t understand each other.
No surprise, but the fix just adds to the costs of the system. The article revealed:
IBM developers now meet with doctors several times a week.
Why is this Watson write up intriguing to me? There are four reasons:
First, the Wall Street Journal makes clear that dreams about dollars from search and content processing are easy to inflate and tough to deliver. Most search vendors and their stakeholders discover the difference between marketing hyperbole and reality.
Second, the Watson system is essentially dependent on human involvement. The objective of certain types of smart software is to reduce the need for human involvement. Watching Mr. Spock on Star Trek is not the same as delivering advanced systems that work and are affordable.
Third, the revenue generated by Watson is actually pretty good. Endeca hit $100 million between 1998 and 2011 when it was acquired by Oracle. Autonomy achieved $800 million between 1996 and 2011 when it was purchased by Hewlett Packard. Watson has been available for a couple of years. The problem is that the goal is, it appears, out of reach even for a company with IBM’s need for a hot new product and the resources to sell almost anything to large organizations.
Fourth, Watson is walking down the same path that STAIRS III, an early IBM search system, followed. IBM embraced open source to help reduce the cost of delivering basic search. Now IBM is finding that the value-adds are more difficult than keyword matching and Boolean-centric information retrieval. When a company does not learn from its own prior experiences in content processing, the voyage of discovery becomes more risky.
Net net: IBM has its hands full. I am confident that an azure chip consultant and a couple of 20-somethings can fix up Watson in a nonce. But if remediation is not possible, IBM may vie with Hewlett Packard as the pre-eminent example of the perils of the search and content processing business.
Stephen E Arnold, January 8, 2014
Autonomy: Mixed Signals from HP in December 2013
January 7, 2014
Before I headed south for a couple of weeks where the sun shines, I read “HP Software Chief: Big Data Role Gives Autonomy a Boost.” After I read the story, I thought, “Maybe HP is going to hunker down and make Autonomy sales.” The story, which I assume is spot on, stated:
HP Software executive vice-president George Kadifa, who is also a member of the company’s executive council, says big-data analytics firm Autonomy is bouncing back from the controversy that followed the $11bn (£7bn) takeover by HP in August 2011.
There was a quote, which I assume is accurate:
“We’re doing great with Autonomy. Clearly, a year ago it was quite problematic — between disclosures about accounting issues and stuff like that,” Kadifa, [HP executive vice president] said.
I assume that HP knows that Longsand Limited, an Autonomy property that is now HP’s, had a fellow named Sergio Erik Letelier, a lawyer, on the Longsand board of directors. See http://bit.ly/1eiFW1b. With this link, is it possible that HP has a way to get some useful insight into Autonomy?
Upon my return from sunny climes, I was catching up with my Overflight summaries and noted, “HP Axes Autonomy Cambridge Jobs.” According to the story in Business Weekly:
HP has categorically denied that it is making staff at Autonomy Cambridge redundant and considering closing its operations in the UK technology cluster. It says it is actively hiring more staff and upgrading the Cambridge Business Park premises. Autonomy and Aurasma staff began quitting the businesses after Mike Lynch departed. According to informed sources, the exodus reflected general disgruntlement at the bureaucratic way the US giant was trying to run the companies following its mega-billion takeover. The demise of one of Cambridge’s great technology success stories is particularly sad as Lynch had built it into the second biggest tech business in the cluster’s history behind ARM.
I find the flow of information about HP and Autonomy fascinating. With Silicon Valley struggling to capture super bright technology wizards, my thought is that HP might want to leverage Mr. Kadifa’s apparent upbeat view of Autonomy in Cambridge. There are some bright folks in that part of the world. A few of them have the math skills to exploit the Bayes, Shannon, Moore-Penrose, and Volterra methods. A happy Cambridge business community may help cultivate a productive source of new hires for HP, in my view.
My questions: Which is it? Is Autonomy a success or a disappointment? Which is it: staff additions or staff reductions in the shadow of Cambridge University?
Autonomy remains a focal point for search and content processing. Interesting stuff.
Stephen E Arnold, January 7, 2014
iPhrase Profile Now Available
January 7, 2014
The Xenky.com Vendor Profiles page hosts free reports about important search and content processing vendors. A profile of iPhrase, acquired by IBM in 2006, is now available. iPhrase is important for a number of reasons. You can access the free iPhrase profile at http://bit.ly/1a1H9Y1.
iPhrase embraced ROI, or return on investment, as a key value proposition for the complex system. The company departed from Autonomy’s “reduce duplicate work” pitch and tried to create “hard numbers” for the “value” licensees would get from the iPhrase system. IBM bought the company, so the ROI for the entrepreneurs was probably okay. The ROI for licensees might be more difficult to determine.
The company was, like Fulcrum Technologies and Autonomy, in the repository business. The indexes pointed to content in the repositories, used the data to enhance search results, and provided “discovery services.” For fans of XML and computationally interesting approaches to search, iPhrase is a system of note. The period from 1996 to 1999 spawned a number of enterprise search vendors. The similarity of most is fascinating. The research computing efforts paid off as entrepreneurs migrated lab demos into the commercial market.
Third, the company lives on today. Just as OpenText uses aging search technology, so does iPhrase’s owner. If you have OmniFind Discovery in your organization, you have some of the 1999 technology goodness available to you. The Xenky profiles make clear that most of the search methods have been recycled multiple times. What’s different is the marketers’ lack of familiarity with pioneering efforts from days of yore.
In a recent LinkedIn discussion, one eager person wanted information about how to establish the “ROI” of search. For anyone looking at how some quite intelligent folks approached “value” for complex information retrieval infrastructure, the iPhrase profile may be useful.
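For readers who want to see what a “hard number” for search value looks like, here is a back-of-the-envelope sketch. The formula is the generic ROI calculation, and every input is invented for illustration; these are not iPhrase’s figures or its actual method.

```python
def search_roi(hours_saved_per_user_per_week, users, loaded_hourly_cost,
               annual_license_cost, integration_cost, years=3):
    """Generic ROI over a planning horizon: (benefit - cost) / cost."""
    benefit = hours_saved_per_user_per_week * 52 * years * users * loaded_hourly_cost
    cost = annual_license_cost * years + integration_cost
    return (benefit - cost) / cost


# Purely illustrative inputs: 1,000 users each saving half an hour a week,
# a $60 loaded hourly cost, $250K per year in licenses, $400K of integration.
print(round(search_roi(0.5, 1000, 60.0, 250_000, 400_000), 2))  # 3.07
```

The hard part, of course, is defending the hours-saved assumption, which is why such numbers are easier to market than to verify.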
Is it surprising that today’s vendors insist that their firms’ software is revolutionary? The Xenky profiles make one thing clear—there’s not much new happening in search. In fact, marketers are reinventing the wheel. The LinkedIn discussions speak to the assertion, “You don’t know what you don’t know.”
The Xenky profiles put the challenge of enterprise search and content processing in a historical context.
Next up is a free Autonomy report covering the period from 1996 (with a look back at Cambridge Neurodynamics) up to December 2007. Is a profile of a company now owned by Hewlett Packard of value?
You may be surprised because Autonomy is one search vendor marching to a different drummer.
Stephen E Arnold, January 7, 2014
Information Black Holes: Autonomy and Its Value Proposition
January 6, 2014
I follow two or three LinkedIn groups. Believe me. The process is painful. On the plus side, LinkedIn’s discussions of “enterprise search” reveal the broken ribs in the body of information retrieval. On the surface, enterprise search and content processing appear to be fit and trim. The LinkedIn discussion X-ray reveals some painful and potentially life-threatening injuries. Whether it is marketing professionals at search vendors or individuals with zero background in information retrieval, the discussions often give me a piercing headache.
The eruption of digital information posed a challenge to UK firms in Autonomy’s “Information Black Holes” report. © Autonomy, 1999
One of the “gaps” in the enterprise search sector is a lack of historical perspective. Moderators and participants see only the “now” of their search work. When looking down the information highway, the LinkedIn search group participants strain to see bright white lines. Anyone who has driven on the roads in Kentucky knows that lines are neither bright nor white. Most are faded, mere suggestions of where the traffic should flow.
In 1999, I picked up a printed document called “Information Black Holes.” The subtitle was this question, “Will the Evolution of EIPs Save British Business £17 Billion per Year?” The author of the report was an azure chip consulting firm doing business as “Continental Research.” The company sponsoring the research was Autonomy. Autonomy as a concept relates to “automatic,” “automation,” and “autonomous.” This connotation is a powerful one. Think “automation” and the mind accepts an initial investment followed by significant cost reductions. Autonomy had a name and brand advantage from its inception. Who remembers Cambridge Neurodynamics? Not many of the 20-somethings flogging search and content processing systems in 2014, I would wager.
As you may know, Hewlett Packard purchased Autonomy in 2011. I doubt that HP has a copy of this document, and I know that most of the LinkedIn enterprise search group members have not read the report. I understand, because 15-year-old marketing collateral (unlike Kentucky bourbon) does not often improve with age. But “Information Black Holes” is an important document. Unwittingly, today’s enterprise search vendors are addressing many of the topics set forth in the 1999 Autonomy publication.
More Changes to Google Search Results
December 31, 2013
We learn about a couple of new changes Google is making to its search-result pages in “Google SERPs Updates: In-Depth Articles & Knowledge Graph Results for Car Shoppers” at Search Engine Watch. The car-shopping feature makes sense; Google has added vehicles to its Knowledge Graph in a way that allows users to do their comparison shopping right in the search results. That’s handy, and places Google in competition with auto comparison and shopping sites.
The in-depth articles part is a little more complex. The company is positioning this change as helpful to the 10 percent of users Google says are after more than just a quick answer. While Google does promise to include “up to” three in-depth articles and a link to pre-load “up to” ten more, these results are now pushed to the bottom of the page.
Writer Jennifer Slegg tells us:
“In-depth articles previously appeared in the middle of the search results. This update should help appease those webmasters who are concerned about organic search results being pushed lower and lower on the page, while still giving the searchers the information they want….
This change is currently available in English on Google.com, however they plan to expand the feature to more countries and languages in the future. Not all search results will have in-depth articles, but the program is expanding with more topics, particularly things that are related to current events. Google promises that alongside reputable and established news sources like the Washington Post and The Guardian, readers will also find in-depth content from smaller blogs and publications.”
This being a Search Engine Watch article, it does pass on Google’s advice for webmasters hoping to reach these users who are after comprehensive content. If you belong to that slice of inquisitive searchers, just remember to scroll down and click through for the good stuff. Of course, it would make things easier for the search giant if that pesky ten percent would just get with the program and take what Google offers. Maybe someday.
Cynthia Murrell, December 31, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Ovum Review Shows Funnelback Search Company to be Winning
December 29, 2013
For a perfectly balanced and objective review, read the summarized report titled Ovum Technology Audit of Funnelback Search on funnelback.com. You can download the report for free in its entirety, but what appears in the summary seems to be a good indicator of the sort of information you will receive.
The report explores Funnelback closely, noting that the company:
“offers organisations [sic] rapid time-to-value for a wide range of search functions at a relatively low cost. The variety of deployment options available ensures that the solution can address most organisational IT architecture requirements… Funnelback includes a highly intuitive contextual navigation function, which dynamically creates filters across unstructured and semi-structured content across many information sources… In addition, there is a behavioural learning capability, which automatically monitors the search patterns of users and tunes the algorithms to their requirements to deliver more personalised results.”
The report did fail to mention that the Australian search engine technology company’s name comes from combining the names of two Australian spiders, the funnel-web and the redback. It is also meant to imply the company’s objective of funneling data and important information back to the customer. Ovum’s five-star “report” interrupts itself to applaud Funnelback’s functionality and SaaS solution several times, so perhaps there simply wasn’t room.
Chelsea Kerwin, December 29, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
EasyAsk Upgrades Musician Superstore Website Search
December 27, 2013
The article titled Metakinetic Teams Up with EasyAsk To Provide A New On-Site Search Solution To Andertons, on the ecommerce agency Metakinetic’s website, promotes the partnership formed to overhaul the agency’s top client’s site search. Andertons, a superstore for musicians, is getting an upgrade, the article explains,
“The solution from EasyAsk has been rapidly deployed as a Software as a Service (SaaS) solution and empowers Andertons to take control of a single solution for their on-site search, navigation and merchandising functions. Using natural language processing, the solution allows Andertons to give its website visitors an easy and intuitive way to navigate the site, helping them to easily find the product for which they are searching.”
The natural language capability offered by EasyAsk taps into the unstructured data on the products and makes it searchable. Keywords from product descriptions will ensure that every relevant item appears, but the user also has the option of placing limits such as price minimums and maximums. A director from Metakinetic named Darren Bull lauded the EasyAsk team for their professionalism and efficiency, and claimed perfect confidence in seeing an uplift in sales as a result of the changes. Online customers have high expectations for the ease and ingeniousness of shopping websites, and the adjustments to Andertons’ site might have just brought it into the 21st century, just in time for Christmas.
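For a rough sense of the keyword-plus-price-limit behavior described above, here is a generic sketch. It is not EasyAsk’s engine or API; the product fields, function name, and catalog entries are invented for illustration.

```python
def search_products(products, query, min_price=None, max_price=None):
    """Match query terms against product text, then apply optional price limits."""
    terms = [term.lower() for term in query.split()]
    matches = []
    for product in products:
        text = (product["name"] + " " + product["description"]).lower()
        if not all(term in text for term in terms):
            continue
        if min_price is not None and product["price"] < min_price:
            continue
        if max_price is not None and product["price"] > max_price:
            continue
        matches.append(product["name"])
    return matches


if __name__ == "__main__":
    catalog = [
        {"name": "Electric guitar", "description": "solid body, maple neck", "price": 599.0},
        {"name": "Acoustic guitar", "description": "beginner friendly dreadnought", "price": 129.0},
    ]
    print(search_products(catalog, "guitar", max_price=200))  # ['Acoustic guitar']
```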
Chelsea Kerwin, December 27, 2013
Sponsored by ArnoldIT.com, developer of Augmentext