Understanding Intention: Fluffy and Frothy with a Few Factoids Folded In
October 16, 2017
Introduction
One of my colleagues forwarded me a document called “Understanding Intention: Using Content, Context, and the Crowd to Build Better Search Applications.” To get a copy of the collateral, one has to register at this link. My colleague wanted to know what I thought about this “book” by Lucidworks. That’s what Lucidworks calls the 25 page marketing brochure. I read the PDF file and was surprised at what I perceived as fluff, not facts or a cohesive argument.
The topic was of interest to my colleague because we completed a five month review and analysis of “intent” technology. In addition to two white papers about using smart software to figure out and tag (index) content, we had to immerse ourselves in computational linguistics, multi-language content processing technology, and semantic methods for “making sense” of text.
The Lucidworks document purported to explain intent in terms of content, context, and the crowd. The company explains:
With the challenges of scaling and storage ticked off the to-do list, what’s next for search in the enterprise? This ebook looks at the holy trinity of content, context, and crowd and how these three ingredients can drive a personalized, highly-relevant search experience for every user.
The presentation of “intent” was quite different from what I expected. The details of figuring out what content “means” were sparse. The focus was not on methodology but on selling integration services. I found this interesting because I have Lucidworks in my list of open source search vendors. These are companies which repackage open source technology, create some proprietary software, and assist organizations with engineering and integrating services.

The book was an explanation anchored in buzzwords, not the type of detail we expected. After reading the text, I was not sure how Lucidworks would go about figuring out what an utterance might mean. The intent-centric systems we reviewed over the course of five months followed several different paths.
Some companies relied upon statistical procedures. Others used dictionaries and pattern matching. A few combined multiple approaches in a content pipeline. Our client, a firm based in Madrid, focused on computational linguistics plus a series of procedures which combined proprietary methods with “modules” to perform specific functions. The idea for this approach was to raise the accuracy of intent identification from between 65 percent and 80 percent to a level approaching and often exceeding 90 percent. For text processing in multi-language corpuses, the Spanish company’s approach was a breakthrough.
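To make the dictionary and pattern matching paths concrete, here is a minimal sketch in Python. It is not the method of any vendor we reviewed. The intent labels, cue words, regular expressions, and the function name `tag_intent` are all hypothetical, chosen only to show how a pattern stage and a dictionary stage might sit together in a small pipeline.

```python
import re
from collections import Counter

# Hypothetical intent lexicon: label -> cue words (illustrative only).
INTENT_TERMS = {
    "purchase": {"buy", "order", "price", "purchase"},
    "support": {"broken", "error", "fix", "help"},
}

# Hypothetical patterns that signal an intent outright (illustrative only).
INTENT_PATTERNS = {
    "purchase": re.compile(r"\bwhere can i (buy|get)\b"),
    "support": re.compile(r"\bhow do i (fix|reset)\b"),
}


def tag_intent(utterance: str) -> str:
    """Return a hypothetical intent label for an utterance, or "unknown"."""
    text = utterance.lower()

    # Pattern stage: the first pattern that matches decides the label.
    for label, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            return label

    # Dictionary stage: count cue-word hits for each label.
    tokens = re.findall(r"[a-z']+", text)
    scores = Counter()
    for label, terms in INTENT_TERMS.items():
        scores[label] = sum(1 for token in tokens if token in terms)

    # Keep the best-scoring label only if at least one cue word matched.
    label, hits = scores.most_common(1)[0]
    return label if hits > 0 else "unknown"


if __name__ == "__main__":
    for sample in ("Where can I buy a red dress?",
                   "My printer is broken, please help",
                   "Tell me a joke"):
        print(sample, "->", tag_intent(sample))
```

A toy like this shows why dictionary-only systems stall: any utterance that avoids the listed cue words falls through to “unknown,” which is one reason vendors layer statistical or linguistic stages on top.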
I was disappointed but not surprised that Lucidworks’ approach was breezy. One of my colleagues used the word “frothy” to describe the information in the “Understanding Intention” document.
As I read the document, which struck me as a shotgun marriage of generalizations and examples of use cases in which “intent” was important, I made some notes.
Let me highlight five of the observations I made. I urge you to read the original Lucidworks document so you can judge the Lucidworks arguments for yourself.
Imitation without Attribution
My first reaction was that Lucidworks had borrowed conceptually from ideas articulated by Dr. Gregory Grefenstette and his book Search Based Applications: At the Confluence of Search and Database Technologies. You can purchase this 2011 book on Amazon at this link. Lucidworks’ approach, unlike Dr. Grefenstette’s, borrowed some of the analysis but did not include the detail which supports the increasing importance of using search as a utility within larger information access solutions. Without detail, the Lucidworks document struck me as a description of the type of solutions that a company like Tibco is now offering its customers.
MarkLogic Aims to Take on Oracle in Enterprise Class Data Hub Frameworks
October 10, 2017
MarkLogic is trying to give Oracle a run for its money in the world of enterprise-class data hubs. According to a recent press release on ITWire, “MarkLogic Releases New Enterprise Class Data Hub Framework to Enhance Agility and Speed Digital Transformations.”
How does MarkLogic plan on doing this? According to the release:
Traditionally, integrating data from silos has been very costly and time consuming for large organizations looking to make faster and better decisions based on their data assets. The Data Hub Framework simplifies and speeds the process of building a MarkLogic solution by providing a framework around how to data model, load data, harmonize data, and iterate with new data and compliance requirements.
But is that enough to unseat Oracle, which has long had a seat at the head of the table, especially since Oracle has its own new framework hitting the market? That is still up for debate, but MarkLogic is confident in its ability to compete. According to the piece:
Unlike other databases, NoSQL was specifically designed to ingest and integrate all types of disparate data to find relationships among data, and drive searches and analytics—within seconds.
This battle is just beginning and we have no indication of who has the edge, but you can bet it will be an interesting fight in the marketplace between these two titans.
Patrick Roland, October 10, 2017
The Future of Visual and Voice Search
October 4, 2017
GeoMarketing, writing from a digital marketer’s perspective, ponders, “How Will Visual and Voice Search Evolve?” Writer David Kaplan consulted Bing Ads’ Purna Virji on what to expect going forward. For example, though companies are not yet doing much to monetize visual search, Virji says that could change as AIs continue to improve their image-recognition abilities. She also emphasizes the potential of visual search for product discovery. If, for example, someone can locate and buy a pair of shoes just by snapping a picture of a stranger’s feet, sales should benefit handsomely. Virji had this to say about traditional, voice, and image search functionalities working together:
A prediction that Andrew Ng had made when he was still with Baidu was that ‘by 2020, 50 percent of all search will be image or voice.’ Typing will likely never go away. But now, we have more options. Just like mobile didn’t kill the desktop, apps didn’t kill the browser, the mix of visual, voice, and text will combine in ways that are natural extensions of user behavior. We’ll use those tools depending on the specific need and situation at the moment. For example, you could ‘show’ Cortana a picture of a dress in a magazine via your phone camera and say ‘Hey Cortana, I’d love to buy a dress like this,’ and she can go find where to buy it online. In this way, you used voice and images to find what you were looking for.
The interview also touches on the impact of visual search on local marketing and how its growing use in social media offers data analysts a wealth of targeted-advertising potential.
Cynthia Murrell, October 4, 2017
Oracle: Sparking the Database Fire
October 3, 2017
Hadoop? Er, what? And Microsoft SQL Server? Or MarkLogic’s XML, business intelligence, analytics, and search offering? Amazon’s storage complex? IBM’s DB2? The recently-endowed MongoDB?
I thought of these systems when I read “Targeting Cybersecurity, Larry Ellison Debuts Oracle’s New ‘Self-Driving’ Database.”
For me, the main point of the write up is that the Oracle database is coming. There’s nothing like an announcement to keep the Oracle faithful in the fold.
If the write up is accurate, Oracle is embracing buzzy trends, storage that eliminates the guess work, and security. (Remember Secure Enterprise Search, the Security Server, and the nifty credential verification procedures? I do.)
The new version of Oracle, according to the write up, will deliver self driving. Cars don’t do this too well, but the Oracle database will and darned soon.
The 18c Autonomous Database or 18cad will:
- Fix itself
- Cost less than Amazon’s cloud
- Go faster
- Be online 99.995 percent of the time
And more, of course.
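The availability figure is easier to weigh once it is converted into downtime. The short calculation below is my own arithmetic, not something from the write up; the function name `downtime_minutes_per_year` and the 365 day year are assumptions made for illustration. Under those assumptions, 99.995 percent availability leaves room for a bit more than 26 minutes of downtime per year.

```python
def downtime_minutes_per_year(availability_percent: float, days: int = 365) -> float:
    """Convert an availability percentage into allowed downtime minutes per year."""
    minutes_per_year = days * 24 * 60  # 525,600 minutes in a 365 day year
    unavailable_fraction = 1 - availability_percent / 100
    return minutes_per_year * unavailable_fraction


if __name__ == "__main__":
    minutes = downtime_minutes_per_year(99.995)
    print(f"Allowed downtime: {minutes:.2f} minutes per year")
```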
Let’s assume that Oracle 18cad works as described. (Words are usually easier to do than software, I remind myself.)
The customers look to be big winners. Better, faster, cheaper. Oracle believes its revenues will soar because happy customers just buy more Oracle goodies.
Will there be a downside?
What about database administrators? Some organizations may assume that 18cad will allow some expensive database administrator (DBA) heads to roll.
What about the competition? I anticipate more marketing fireworks or at least some open source “sparks” and competitive flames to heat up the cold autumn days.
Stephen E Arnold, October 3, 2017
Antitrust Legislation Insufficient for Information Marketplace
October 3, 2017
At his blog, Continuations, venture capitalist Albert Wenger calls for a new approach to regulating the information market in his piece, “Right Goals, Wrong Tools: EU Antitrust Case Against Google.” Citing this case against Google, he observes that existing antitrust legislation is not up to the task of regulating companies like Google. Instead, he insists, we need solutions that consider today’s realities. He writes:
We need alternative regulatory tools that are more in line with how computation works and why the properties of information tend to lead to concentration. We want networks and network effects to exist because of their positive externalities. Imagine as a counter factual a world of highly fragmented operating systems for smartphones – it would make it extremely difficult for app developers to write apps that work well for everyone (hard enough across iOS and Android). At the same time we want to prevent networks and network effect companies from becoming so powerful and extractive that they stifle innovation. For instance, I have written before about how the app store duopoly has prevented certain kinds of innovation. Antitrust is a sledge hammer that was invented at a time of large industrial companies that had no network effects. Using it now is a bad idea and doubly so because it goes only after Google which has by far the more open mobile operating system when compared to Apple.
Wenger suggests a solution could lie in a requirement for open standards, or in the “right to be represented by a bot.” He points to his 17-minute TED talk, embedded in the article, for more on his public policy suggestions.
Cynthia Murrell, October 3, 2017
The Narrowing App Market
September 29, 2017
If you are thinking of going into app development, first take a gander at this write-up; Business Insider reports, “Half of Digital Media Time Is Spent in Five Apps.” Citing comScore’s 2017 US Mobile App Report, writer Laurie Beaver tells us:
Users spend 90% of their mobile app time in their top five apps, making up 51% of total digital time spent. Perhaps more alarming is that half of the time spent on smartphones is within just one app. That drops dramatically to 18% of time for the second most used app. This suggests that unless a brand’s or business’ app is the first or second most used (most likely Facebook- or Google-owned), it’s unlikely to get any meaningful share of users’ attention.
There are a few reasons for developers to take heart: the number of app downloads is picking up, and users have become more willing to allow push notifications. Most important, perhaps, users are making in-app purchases; that is where most apps make their money. Beaver continues:
Nevertheless, the report shows the astonishing influence Facebook and Google have over how US mobile app users spend their time. And given the increasingly large share the top five apps have, it’s likely to only become more difficult for brands and publishers to receive any share of users’ time. Alternate app experiences such as Apple’s iMessage apps, Google’s Instant Apps, and Facebook Messenger’s Instant Games could provide brands and publishers with new avenues to reach consumers where they’re spending their time. While these services are nascent, they do provide a promising option for businesses moving forward.
We’re reminded that apps have gained ground over browsers, and are now the main way folks get online. However, the trends toward app consolidation and app abandonment may lead to a “post-app” future. Never fear, though—Business Insider’s research service, BI Intelligence, offers a report titled “The End of Apps” ($495) that could help businesses and developers prepare for the future. Founded in 2007, Business Insider is headquartered in New York City.
Cynthia Murrell, September 29, 2017
Let the Tweets Lead Your Marketing, Come What May
September 14, 2017
It seems that sales and marketing departments just can’t keep up with consumer patterns and behaviors. The latest example of this is explained in a DMA article outlining how to utilize social media to reach target leads. As people rely more on their own search and online acumen and less on professionals (IRL), marketing has to adjust.
Aseem Badshah, founder and CEO of Socedo, explains the problem and a possible solution:
Traditionally, B2B marketers created content based on the products they want to promote. Now that so much of the B2B decision making process occurs online, content has to be more customer-centric. The current set of website analytics tools provide some insights, but only on the audience who have already reached your website. Intent data from social media can help you make your content more relevant. By analyzing social media signals and looking at which signals are picking up in volume over time, you can gain new insights into your audience that helps you create more relevant content.
While everything Badshah says may be true, one has to ask oneself: is following the masses always a good thing? If a business wants to maintain its integrity within its field, would it be in its best interest to follow the lead of its target demographic’s hashtags or to work harder at marketing its product or service despite the apparent Twitter-provided disinterest?
Catherine Lamsfuss, September 14, 2017
IBM Cloud As a Rube Goldberg Machine
September 10, 2017
Navigate to AdAge. Select the IBM ad. Its title is “IBM Cloud: Cloud for Enterprise: Pinball.” I snapped this image from the video which seems to represent a pinball game. Does this look like a Rube Goldberg machine? I think so.
Stephen E Arnold, September 10, 2017
Smart Software: An AI Future and IBM Wants to Be There for 10 Years
September 7, 2017
I read “Executives Say AI Will Change Business, but Aren’t Doing Much about It.” My takeaway: There is no there there—yet. I noted these “true factoids” waltzing through the MIT-charged write up:
- 20% of the 3,000 companies in the sample use smart software
- 5% use smart software “extensively” (No, I don’t know what extensively means either.)
- About one third of the companies in the sample “have an AI strategy in place.”
Pilgrims, that means there is money to be made in the smart software discontinuity. Consulting and coding are a match made in MBA heaven.
If my observation is accurate, IBM’s executives read the tea leaves and decided to contribute a modest $240 million for the IBM Watson Artificial Intelligence Lab at MIT. You can watch a video and read the story from Fortune Magazine at this link.
The Fortune “real” journalism outfit states:
This is the first time that a single company has underwritten an entire laboratory at the university.
However, the money will be paid out over 10 years. Lucky parents with children at MIT can look forward to undergrad, graduate, and post graduate work at the lab. No living in the basement for this cohort of wizards.
Several questions arise:
- Which institution will “own” the intellectual property of the wizards from MIT and IBM? What about the students’ contributions?
- How will US government research be allocated when there is a “new” lab which is funded by a single commercial enterprise? (Hello, MITRE, any thoughts?)
- Will young wizards who formulate a better idea be constrained? Might the presence or shadow of IBM choke off some lines of innovation until the sheepskin is handed over?
- Are Amazon, Facebook, Google, and Microsoft executives kicking themselves for not thinking up this bold marketing play and writing an even bigger check?
- Will IBM get a discount on space advertising in MIT’s subscription publications?
This is worth monitoring because other big name schools might have a model to emulate. Company backed smart software labs might become the next big thing to pitch for some highly regarded, market oriented institutions. How much would Cambridge University or the stellar University of Louisville capture if they too “sold” labs to commercial enterprises? (Surprised at my inclusion of the University of Louisville? Don’t be. It’s an innovator in basketball recruiting and recruiting real estate mogul talent. Smart software is a piece of cake for this type of institution of higher learning.)
Stephen E Arnold, September 7, 2017
Old School Searcher Struggles with Organizing Information
September 7, 2017
I read a write up called “Semantic, Adaptive Search – Now that’s a Mouthful.” I cannot decide if the essay is intended to be humorous, plaintive, or factual. The main idea in the headline is that there is a type of search called “semantic” and “adaptive.” I think I know about the semantic notion. We just completed a six month analysis of syntactic and semantic technology for one of my few remaining clients. (I am semi retired as you may know, but tilting at the semantic and syntactic windmills is great fun.)
The semantic notion has inspired such experts as David Amerland, an enthusiastic proponent of the power of positive thinking and tireless self promotion, to heights of fame. The syntax idea gives experts in linguistics hope for lucrative employment opportunities. But most implementations of these hallowed “techniques” deliver massive computational overhead and outputs which require legions of expensive subject matter experts to keep on track.
The headline is one thing, but the write up is about another topic in my opinion. Here’s the passage I noted:
The basic problem with AI is no vendor is there yet.
Okay, maybe I did not correctly interpret “Semantic, Adaptive Search—Now That’s a Mouthful.” I just wasn’t expecting artificial intelligence, a very SEO type term.
But I was off base. The real subject of the write up seems to be captured in this passage:
I used to be organized, but somehow I lost that admirable trait. I blame it on information overload. Anyway, I now spend quite a bit of time searching for my blogs, white papers, and research, as I have no clue where I filed them. I have resorted to using multiple search criteria. Something I do, which is ridiculous, is repeat the same erroneous search request, because I know it’s there somewhere and the system must have misunderstood, right? So does the system learn from my mistakes, or learn the mistakes? Does anyone know?
Okay, disorganized. I would never have guessed without a title that references semantic and adaptive search, the lead paragraph about artificial intelligence, and this just cited bit of exposition which makes clear that the searcher cannot make the search systems divulge the needed information.
One factoid in the write up is that a searcher will use 2.73 terms per query. I think that number applies to desktop boat anchor searches from the Dark Ages of old school querying. Today, more than 55 percent of queries are from mobile devices. About 20 percent of those are voice based. Other queries just happen because a greater power like Google or Microsoft decides that delivering what you “really” wanted is just the ticket. To me, the shift from desktop to mobile makes the number of search terms in a query a tough number to calculate. How does one equate data automatically delivered to a Google Map when one is looking for a route with an old school query of 2.73 terms? Answer: Maybe you just use whatever number pops out from a quick Bing or Google search from a laptop and go with the datum in a hit on an ad choked result list.
The confused state of search and content processing vendors is evident in their marketing, their reliance on jargon and mumbo jumbo, and fuzzy thinking about obtaining information to meet a specific information need.
I suppose there is hope. One can embrace a taxonomy and life will be good. On the other hand, disorganization does not bode well for a taxonomy created by a person who cannot locate information.
Well, one can use smart software to generate those terms, the Use Fors and the See Alsos. One can rely on massive amounts of Big Data to save the day. One can allow a busy user of SharePoint to assign terms to his or her content. Many good solutions which make information access a thrilling discipline.
Now where did I put that research for my latest book, “The Dark Web Notebook”? Ah, I know. In a folder called “DWNB Research” on my back up devices with hard copies in a banker’s box labeled “DWNB 2016-2017.”
Call me old fashioned but the semantic, syntactic, artificially intelligent razzmatazz underscores the triumph of jargon over systems and methods which deliver on point results in response to a query from a person who knows that for which he or she seeks.
Plus, I have some capable research librarians to keep me on track. Yep, real humans with MLS degrees, online research expertise, and honest-to-god reference desk experience.
Smart software and jargon require more than disorganization and arm waving accompanied by toots from the jargon tuba.
Stephen E Arnold, September 7, 2017


