Apple and App Search: Maybe a New Approach Will Work

April 13, 2015

I remember looking for a teleprompter app via my iPad. I used the Apple store and punched in the query “teleprompter.” I got some hits, but the information returned forced me to download apps, test them, and then do some poking around on message boards.

The finding part of the Apple app search worked okay. It did nothing to reassure me that I was not overlooking an app presented with different terms used to describe what I needed: A way to display a script on an iPad. The most important feature I needed was simply not findable via the Apple search system. Run this query: “Support for Wi Drive.” Let me know how that works out for you.

I read “Report: Apple Acquired Startup Ottocat for Its App Store Search Technology.” The important point is that Apple is now taking a look at its existing technology and reaching what I perceive as a pragmatic decision: Buy something that maybe sort of works.

According to the write up:

Ottocat’s technology allows the app shopper to use increasingly specific search terms to zero in on the right app. The technology also adds some metadata around the app listing — things like star ratings and percentile rankings. Ottocat also created tools for app developers to get their apps in front of just the right kind of user.

Will it work? Who knows, but I hope so. The iPad has been around with its many apps for five years. Speed is relative, but precision and recall are not.

Stephen E Arnold, April 13, 2015

 

Medical Search: A Long Road to Travel

April 13, 2015

Do you want a way to search medical information without false drops, without learning specialized vocabularies, and without Boolean? Apparently the purveyors of medical search systems have left users with an itch to scratch and no antihistamine within reach.

Navigate to Slideshare (yep, LinkedIn) and flip through “Current Advances to Bridge the Usability Expressivity Gap in biomedical Semantic Search.” Before reading the 51 slide deck, you may want to refresh yourself with Quertle, PubMed, MedNar, or one of the other splendiferous medical information resources for researchers.

The slide deck identifies the problems with the existing search approaches. I can relate to these points. For example, those who tout question answering systems ignore the difficulty of passing a question from medicine to a domain consisting of math content. With math the plumbing in many advanced medical processes, the weakness is a bit of a problem and has been for decades.

The “fix” is semantic search. Well, that’s the theory. I interpreted the slide deck as communicating how a medical search system called ReVeaLD would crack this somewhat difficult nut. As an aside: I don’t like the wonky spelling that some researchers and marketers are foisting on the unsuspecting.

I admit that I am skeptical about many NGIA or next generation information access systems. One reason medical research works as well as it does is its body of generally standardized controlled terms. Learn MeSH and you have a fighting chance of figuring out if the drug the doctor prescribed is going to kill off your liver as it remediates your indigestion. Controlled vocabularies in scientific, technology, engineering, and medical domains address the annoying ambiguity problems encountered when one mixes colloquial words with quasi consultant speak. A technical buzzword is part of a technical education. It works, maybe not too well, but it works better than some of the wild and crazy systems which I have explored over the years.

You will have to dig through old jargon and new jargon such as entity reconciliation. In the law enforcement and intelligence fields, an entity from one language has to be “reconciled” with versions of the “entity” in other languages and from other domains. The technology is easier to market than make work. The ReVeaLD system is making progress as I understand the information in the slide deck.

Like other advanced information access systems, ReVeaLD has a fair number of moving parts. Here’s the diagram from Slide 27 in the deck:

image

There is also a video available at this link. The video explains that Granatum Project uses a constrained domain specific language. So much for cross domain queries, gentle reader. What is interesting to me is the similarity between the ReVeaLD system and some of the cyber OSINT next generation information access systems profiled in my new monograph. There is a visual query builder, a browser for structured data, visualization, and a number of other bells and whistles.

Several observations:

  • Finding relevant technical information requires effort. NGIA systems also require the user to exert effort. Finding the specific information required to solve a time critical problem remains a hurdle for broader deployment of some systems and methods.
  • The computational load for sophisticated content processing is significant. The ReVeaLD system is likely to suck up its share of machine resources.
  • Maintaining a system with many moving parts when deployed outside of a research demonstration presents another series of technical challenges.

I am encouraged, but I want to make certain that my one or two readers understand this point: Demos and marketing are much easier to roll out than a hardened, commercial system. Just like the EC’s Promise program, ReVeaLD may have to communicate its achievements to the outside world. A long road must be followed before this particular NGIA system becomes available in Harrod’s Creek, Kentucky.

Stephen E Arnold, April 13, 2015

Spelling Suggestions via the Bisect Module

April 13, 2015

I know that those who want to implement their own search and retrieval systems learn that some features are tricky to implement. I read “Typos in Search Queries at Khan Academy.”

The author states:

The idea is simple. Store a hash of each word in a sorted array and then do binary search on that array. The hashes are small and can be tightly packed in less than 2 MB. Binary search is fast and allows the spell checking algorithm to service any query.
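The quoted idea can be sketched in a few lines of Python using the standard bisect module. This is a minimal illustration, not the Khan Academy code: the function names, the choice of a truncated MD5 as the 32-bit hash, and the one-edit candidate generator are all my assumptions. Because only hashes are stored, two different strings can collide, so a "known" answer means "probably spelled correctly," not "certainly."

```python
import bisect
import hashlib
from array import array

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def word_hash(word):
    """Return a 32-bit hash of the lowercased word (hash choice is an assumption)."""
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_index(words):
    """Pack sorted, de-duplicated hashes into a compact unsigned-int array."""
    # "I" is 4 bytes per entry on common platforms, so 500,000 words need about 2 MB.
    return array("I", sorted({word_hash(w) for w in words}))


def is_known(index, word):
    """Binary-search the sorted hashes for this word's hash."""
    h = word_hash(word)
    pos = bisect.bisect_left(index, h)
    return pos < len(index) and index[pos] == h


def suggest(index, word):
    """Return candidates one edit away from `word` whose hashes are in the index."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    candidates = set(deletes + transposes + replaces + inserts)
    return sorted(c for c in candidates if is_known(index, c))


if __name__ == "__main__":
    index = build_index(["search", "binary", "array", "hash"])
    print(is_known(index, "search"))
    print(suggest(index, "serach"))
```

The sorted array trades exactness for size: the word list itself never ships, only four bytes per word, and each lookup costs a handful of comparisons.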

What is not included in the write up is detail about the time required and the frustration experienced to implement what some senior managers assume is trivial. Yep, search is not too tough when the alleged “expert” has never implemented a system.

With education struggling to teach the three Rs, the need for software that caulks the leaks in users’ ability to spell is a must have.

Stephen E Arnold, April 13, 2015

The Challenge of Synonyms

April 12, 2015

I am okay with automated text processing systems. The challenge is for software to keep pace with the words and phrases that questionable or bad actors use to communicate. The marketing baloney cranked out by vendors suggests that synonyms are not a problem. I don’t agree. I think that words used to reference a subject can fool smart software and some humans as well. For an example of the challenge, navigate to “The Euphemisms People Use to Pay Their Drug Dealer in Public on Venmo.” The write up presents some of the synonyms for controlled substances; for example:

  • Kale salad thanks
  • Columbia in the 1980s
  • Road trip groceries
  • Sanity 2.0
  • 10 lbs of sugar

The synonym I found interesting was an emoji, which most search and content processing systems cannot “understand.”

image

and

image

Attensity asserts that it can “understand” emojis. Sure, if there is a look up list hard wired to a meaning. What happens if the actor changes the emoji? Like other text processing systems, the smart software may become less adept than the marketers state.
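To make the look up list point concrete, here is a small Python sketch of a hard wired table. It is my illustration, not Attensity's method; the symbols, the labels, and the function name are hypothetical. The table tags a message only when the emoji is one someone already typed into the list.

```python
# Hypothetical look up list: each emoji is hard wired to one label.
EMOJI_LOOKUP = {
    "\N{MAPLE LEAF}": "possible controlled substance reference",
    "\N{SNOWFLAKE}": "possible controlled substance reference",
}


def tag_message(text, lookup=EMOJI_LOOKUP):
    """Return the sorted labels for every listed emoji found in the text."""
    return sorted({label for symbol, label in lookup.items() if symbol in text})


if __name__ == "__main__":
    # A listed emoji is tagged.
    print(tag_message("thanks \N{MAPLE LEAF}"))
    # The actor swaps in an unlisted emoji and the system sees nothing.
    print(tag_message("thanks \N{EVERGREEN TREE}"))
```

The second call returns an empty list. The software is only as current as the last person who updated the table.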

But why rain on the hype parade and remind you that search is difficult? Moving on.

Stephen E Arnold, April 12, 2015

Search and Identify a YouTube or Vimeo Tune

April 12, 2015

Need to identify a song used in a YouTube video? “Name That tune on Any YouTube Video with MooMa.sh” explains that now you can perform this search and retrieval task. Navigate to http://www.mooma.sh/. Paste a YouTube, Vimeo, or Dailymotion link into the search box and Moo! That’s the service’s name for search, not mine. There is a video explaining how the service works and a Freshman Comp 101 write up that explains how. I use Samba Pump, for which I paid a fee. MooMa.sh reported:

image

Stephen E Arnold, April 12, 2015

Elastic What: Stretching Understanding to the Snapping Point

April 10, 2015

I love Amazon. I love Elastic as a name for search. I hate confusion. Elasticsearch is now “Elastic.” I get it. But after I read “Amazon Launches New File Storage Service For EC2”, there may be some confusion between Amazon’s use of Elastic, various Amazon “elastic” services, and search. Is Amazon going to embrace the word “elastic” to describe its information retrieval system? Will this cause some confusion with the open source search vendor Elastic? I find it interesting that name confusion is an ever present issue in search. I have mentioned what happens when a company loses control of its name. Examples include Thunderstone (a maker of search and search appliances) and the consumer software with the same name. Smartlogic (indexing software) is now facing encroachment from Smartlogic.io (consulting services). Brainware, now owned by Lexmark, lost control of its brand when distasteful videos appeared with the label Brainware. The brand was blasted with nasty bits. Where is the search oriented Brainware now? Retired I believe just as I am.

Little wonder some people have difficulty figuring out which vendor offers what software. Stretch your mind around the challenge of explaining that you want the Amazon elastic and the Elastic elastic. Vendors seem to operate without regard to the need to reduce signal mixing.

Stephen E Arnold, April 10, 2015

A Former Googler Reflects

April 10, 2015

After a year away from Google, blogger and former Googler Tim Bray (now at Amazon) reflects on what he does and does not miss about the company in his post, “Google + 1yr.” Anyone who follows his blog, ongoing, knows Bray has been outspoken about some of his problems with his former employer: First, he really dislikes “highly-overprivileged” Silicon Valley and its surrounds, where Google is based. Secondly, he found it unsettling to never communicate with the “actual customers paying the bills,” the advertisers.

What does Bray miss about Google? Their advanced bug tracking system tops the list, followed closely by the slick and efficient, highly collaborative internal apps deployment. He was also pretty keen on being paid partially in Google stock between 2010 and 2014. The food on campus is everything it’s cracked up to be, he admits, but as a remote worker, he rarely got to sample it.

It was a passage in Bray’s “neutral” section that most caught my eye, though. He writes:

“The number one popular gripe against Google is that they’re watching everything we do online and using it to monetize us. That one doesn’t bother me in the slightest. The services are free so someone’s gotta pay the rent, and that’s the advertisers.

“Are you worried about Google (or Facebook or Twitter or your telephone company or Microsoft or Amazon) misusing the data they collect? That’s perfectly reasonable. And it’s also a policy problem, nothing to do with technology; the solutions lie in the domains of politics and law.

“I’m actually pretty optimistic that existing legislation and common law might suffice to whack anyone who really went off the rails in this domain.

“Also, I have trouble getting exercised about it when we’re facing a wave of horrible, toxic, pervasive privacy attacks from abusive governments and actual criminals.”

Everything is relative, I suppose. Still, I think it understandable for non-insiders to remain a bit leery about these companies’ data habits. After all, the distinction between “abusive government” and businesses is not always so clear these days.

Cynthia Murrell, April 10, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

 

Enterprise Search and Marketers: Think Endpoint Computing

April 9, 2015

I have to hand it to the mid tier consultants. Just when I thought the baloney about enterprise search had begun to recede, I learned I was wrong. That puts me in my place.

Search is now “endpoint computing.” I know this because I received an email from the incubator-spawned X1 search company. I have tested X1 over the years, and I have come to think neutral thoughts about the company’s administrative options and its interface.

The method of communicating with me was a somewhat dry email that began with the salutation, “Hello.”

image

The email offered me a report by the ever fascinating Gartner Group. The point of the email is that X1 is a cool vendor. That’s nice. Curious I clicked on the link and was redirected to this page:

image

Okay, a lead generating system. I filled out the information and then I received another email. This one was a bit more serious.

The author, an earnest person named “Janice,” wanted to speak with me to discuss my search requirement. Furthermore, the person looks forward to speaking with me about “unified search and discovery for virtual, cloud, and hybrid environments.” X1 was founded in 2003 and has experienced several management changes, which is common in the “unified search and discovery for virtual, cloud, and hybrid environments” market.

What makes X1 cool? To answer the question I had to read the Gartner Report, a task which I know is a chore.

image

The idea is that search is now endpoint computing. Okay. I guess. The report reassures me that the information in the report is not an “exhaustive list of vendors.” That’s good because in the report there are five companies mentioned:

  • Login Consultants, a workspace consultant, but I don’t know what this term means
  • Tanium, a company offering endpoint security and systems management, which strikes me as a consulting outfit
  • X1, a search and retrieval vendor offering desktop search, eDiscovery, and enterprise search
  • Kaviza (a where are they now company which puzzles me) a virtualized desktop outfit now owned by Citrix
  • Framehawk (another where are they outfit), a company in the high definition user experience business (I have no idea what this means). Apparently Citrix does because Citrix also acquired Framehawk.

Quite an eclectic list. I remember when I worked at Ziff Communications in Manhattan. I listened to a group of editors working up a list of top trends over lunch. So much for methodology. The approach produced a somewhat eclectic list which was, in my opinion, of little value. The list was silly. But these were professionals. Who was I?

So the Gartner list is neither exhaustive nor coherent from my point of view.

What’s cool about X1 search as endpoint computing?

According to the mid tier consulting firm’s authors, X1 is cool because:

“Implementing VDI that provides a user experience that’s equal or superior to a distributed PC environment has been a huge challenge for organizations. While much of the innovation in the VDI space over the past few years has been focused on reducing cost and complexity, some vendors, like X1, have concentrated on removing barriers or exceptions that make VDI a compromise rather than a business enabler.” (page 3)

In the context of the firms profiled by Gartner’s “expert,” the explanation of the X1 cool factor baffles me. I am not confused. I just don’t know what Gartner is trying to communicate.

I have several thoughts running through my head:

First, Gartner obviously has a financial model in place that makes it possible for the mid tier consulting firm to crank out analyses that seem to be authoritative. On closer inspection, the terminology and the information provided are not particularly useful. Does Gartner write these for free and allow the “cool” vendors to distribute these analyses for free? Why do I get a copy for free? Hmmm.

Second, there are obviously companies which value the Gartner endorsement even if it is not exactly clear what the message is. These companies—specifically X1—have seized upon the Gartner report as a way to generate leads and sales. I have no problem with that, but sending information that makes sense would appeal to me more than what I perceive as “information free” commentary.

Third, I continue to worry about the chance for meaningful discourse about the relative merits of information retrieval systems. The presentation of vendors in the context of buzzwords does little to convince me of the merits of X1 or the credibility of Gartner Group. I suppose that is why there are blue chip consulting firms and mid tier (azure chip) consulting firms. One good point: Unlike IDC’s Dave Schubmehl, the report was not $3,500 available on Amazon with my name slapped on as the “author.”

Score one for Gartner’s merrie band.

Stephen E Arnold, April 9, 2015

Twitter Search: Well, Sort Of

April 9, 2015

I read “Updating Trends on Mobile.” I am more interested in more detailed information about Twitter content, users, and tags. General purpose or massified outputs are of little utility in my little world.

I noted this passage:

We’ve been working to make content easier to find over the last several months in places like your home timeline – with recaps and Tweets from within your network – and through efforts like MagicRecs. We’ll continue to make improvements like these in the future.

If you navigate to the Twitter search page and enter a string like “enterprise search”, you will see variants of the term or phrase expressed as Twitter hash tags. The trends displayed reflect what Twitter’s log suggests is hot. Here’s an example:

image

How many of these trends do you recognize? I knew about iOS 8.3, Apple Watch, and not much else.

Queries for tweets remain a bit problematic for me.

Stephen E Arnold, April 9, 2015

The Cost of a Click Through Bing Ads

April 9, 2015

Wow. As an outsider to the world of marketing, I find these figures rather astounding. MarketingProfs shares an infographic titled, “The 20 Most Expensive Bing Ads Keywords.” The data comes from a recent analysis by WordStream of 10 million English keywords, grouped into categories. Writer Vahe Habeshian tells us:

“WordStream analyzed some 10 million English keywords and grouped them into categories to determine the most expensive types of keywords (see infographic, below).

“(Also see a similar analysis of the most expensive keywords in Google AdWords advertising from 2011.)

“The most expensive keyword on Bing Ads is ‘lawyer,’ which would cost advertisers seeking the top ad spot a whopping $109.21 per click. Not surprisingly, the top 5 keywords are related to the legal world, indicating how lucrative clients can be.”

Yes, almost $110 per click whether legitimate, a human error, or a robot script. That’s a lot of fruitless clicks. It seems irrational, but it must be working if companies keep spending the dough. Right?

The word in second place, “attorney,” comes to $101.77 per click, and “DUI” is a comparative bargain at $68.56. After the top five, law-related words, there are such valuable terms as “annuity,” “rehab,” and “exterminator.” See the infographic for more examples.

Cynthia Murrell, April 09, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
