Watson Goes Blekko

March 28, 2015

I read “Goodbye Blekko: Search Engine Joins IBM’s Watson Team.” According to the write up, “Blekko’s home page says its team and technology are now part of IBM’s Watson technology.” I would not know this. I do not use the service. I wrestled with the implementation of Blekko on a news service and then wondered if Yandex was serious about the company. Bottom line: Blekko is not one of my go to search systems, and I don’t cover it in my Alternatives to Google lectures for law enforcement and intelligence professionals.

The write up asserts:

Blekko came out of stealth in 2008 with Skrenta promising to create a search engine with “algorithmic editorial differentiation” compared to Google. Its public search engine finally opened in 2010, launching with what the site called “slashtags” — a personalization and filtering tool that gave users control over the sites they saw in Blekko’s search results.

Another search system becomes part of the puzzling Watson service. How many information access systems does IBM require to make Watson the billion dollar revenue generator or at least robust enough to pay the rent for the Union Square offices?

IBM “owns” the Clementine system which arrived with the SPSS purchase. IBM owns Vivisimo, which morphed into a Big Data system in the acquisition news release, iPhrase, and the wonky search functions in DB2. Somewhere along the line, IBM snagged the Illustra system. From its own labs, IBM has Web Fountain. There is the decades old STAIRS system which may still be available as Service Master. And, of course, there is the Lucene system which provides the dray animals for Watson. Whew. That is a wealth of information access technology, and I am not sure it is comprehensive.

My point is that Blekko and its razzle dazzle assertions now have to provide something that delivers a payoff for IBM. On the other hand, maybe IBM Watson executives are buying technology in the hopes that one of the people “acquihired” or the newly bought zeros and ones will generate massive cash flows.

Watson has morphed from a question answering game show winner into all manner of fantastic information processing capabilities. For me, Watson is an example of what happens when a lack of focus blends with money, executive compensation schemes, and a struggling $100 billion outfit.

Lots of smoke. Not much revenue fire. Stakeholders hope it will change. I am looking forward to a semantically enriched recipe for barbeque sauce that includes tamarind and other spices not available in Harrod’s Creek, Kentucky. Yummy. A tasty addition to the quarterly review menu: Blekko with revenue and a piquant profit sauce.

Perhaps IBM next will acquire Pertimm and the Qwant search system which terrifies Eric Schmidt? Surprises ahead. I prefer profitable, sustainable revenues however.

Stephen E Arnold, March 28, 2015

Semantic Search Becomes Search Engine Optimization: That Is Going to Improve Relevance

March 27, 2015

I read “The Rapid Evolution of Semantic Search.” It must be my age or the fact that it is cold in Harrod’s Creek, Kentucky, this morning. The write up purports to deliver “an overview of the history of semantic search and what this means for marketers moving forward.” I like that moving forward stuff. It reminds me of Project Runway’s “fashion forward.”

The write up includes a wonky graphic that equates via an arrow Big Data and metadata, volume, smart content, petabytes, data analysis, vast, structured, and framework. Big Data is a cloud with five little arrows pointing down. Does this mean Big Data is pouring from the sky like yesterday’s chilling rain?

The history of the Semantic Web begins in 1998. Let’s see, that is 17 years ago. The milestone, in the context of the article, is the report “Semantic Web Road Map.” I learned that Google was less than a month old. I thought that Google was Backrub and the work on what was named Google began a couple, maybe three years, earlier. Who cares?

The Big Idea is that the Web is an information space. That sounds good.

Well, in 2012, something Big happened. According to the write up, Google figured out that 20 percent of its searches were “new.” Aren’t those pesky humans annoying? The article reports:

long tail keywords made up approximately 70 percent of all searches. What this told Google was that users were becoming interested in using their search engine as a tool for answering questions and solving problems, not just looking up facts and finding individual websites. Instead of typing “Los Angeles weather,” people started searching “Los Angeles hourly weather for March 1.” While that’s an extremely simplified explanation, the fact is that Google, Bing, Facebook, and other internet leaders have been working on what Colin Jeavons calls “the silent semantic revolution” for years now. Bing launched Satori, a knowledge storehouse that’s capable of understanding complex relationships between people, things, and entities. Facebook built Knowledge Graph, which reveals additional information about things you search, based on Google’s complex semantic algorithm called Hummingbird.

Yep, a new age dawned. The message in the article is that marketers have a great new opportunity to push their message in front of users. In my book, this is one reason why running a query on any of the ad supported Web search engines returns so much irrelevant information. In my just submitted Information Today column, I report how a query for the phrase “concept searching” returned results littered with a vendor’s marketing hoo-hah.

I did not want information about a vendor. I wanted information about a concept. But, alas, Google knows what I want. I don’t know what I want in the brave new world of search. The article ignores the lack of relevance in results, the dust binning of precision and recall, and the bogus information many search queries generate. Try to find current information about Dark Web onion sites and let me know how helpful the search systems are. In fact, name the top TOR search engines. See how far you get with Bing, Google, and Yandex. (DuckDuckGo and Ixquick seem to be aware of TOR content by the way.)

So semantic in the context of this article boils down to four points:

  1. Think like an end user. I suppose one should not try to locate an explanation of “concept searching.” I guess Google knows I care about a company with a quite narrow set of technology focused on SharePoint.
  2. Invest in semantic markup. Okay, that will make sense to the content marketers. What if the system used to generate the content does not support the nifty features of the Semantic Web? OWL, who? RDF what?
  3. Do social. Okay, that’s useful. Facebook and Twitter are the go to systems for marketing products I assume. Who on Facebook cares about cyber OSINT or GE’s cratering petrochemical business?
  4. And the keeper, “Don’t forget about standard techniques.” This means search engine optimization. That SEO stuff is designed to make relevance irrelevant. Great idea.

Net net: The write up underscores some of the issues associated with generating buzz for a small business like the ones INC Magazine tries to serve. With write ups like this one about Semantic Search, INC may be confusing their core constituency. Can confused executives close deals and make sense of INC articles? I assume so. I know I cannot.

Stephen E Arnold, March 27, 2015

Need to Remove SharePoint Results?

March 26, 2015

I read “SharePoint 2013 Items Removed with Search Result Removal Return from the Dead!” The article explains how to remove results from a user’s search results. If a user cannot locate specific information, that is a benefit, right? The write up includes links to two Microsoft documents that provide more detail. Are your search results comprehensive? Heh, heh, heh.

Stephen E Arnold, March 26, 2015

FTC and Google: Never Complain, Never Explain Usually

March 26, 2015

I read “FTC Addresses Its Choice Not to Sue Google.” The write up reports that the FTC is explaining its decision not to chase Google around the conference table. Heck, would that tire out the Googlers, making it tough to stay awake in a White House meeting?

According to the write up:

“All five Commissioners (three Democrats and two Republicans) agreed that there was no legal basis for action with respect to the main focus of the investigation — search,” the statement released on Wednesday read. “The Commission’s decision on the search allegations was in accord with the recommendations of the F.T.C.’s Bureau of Competition, Bureau of Economics, and Office of General Counsel.”

I think this means, “No problemo.”

I also found this statement about the FTC’s expertise in information governance interesting:

In the final paragraph of the commissioners’ statement, the agency once more expressed regret at the inadvertent release of its internal document. “We are taking additional steps to ensure that such a disclosure does not occur in the future,” it said.

That’s good. The future. Many search vendors point out that the functions their marketers say are available today really mean in the “future.” Is this a characteristic of our digital era?

Stephen E Arnold, March 26, 2015

Big Data and Their Interesting Processes

March 25, 2015

I love it when mid tier consultants wax enthusiastically about Big Data. Search your data lake, enjoins one clueless marketer. Big Data is the future, sings a self appointed expert. Yikes.

To get a glimpse of exactly what has to be done to process certain types of Big Data in an economical yet timely manner, I suggest you read “Analytics on the Cheap.” The author is 0X74696D. Get it?

The write up explains the procedures required to crunch data and manage the budget. The work flow process I found interesting is:

  • Incoming message passes through our CDN to pick up geolocation headers
  • Message has its session authenticated (this happens at our routing layer in Nginx/OpenResty)
  • Message is routed to an ingest server
  • Ingest server transforms message and headers into a single character-delimited querystring value
  • Ingest server makes a HTTP GET to a 0-byte file on S3 with that querystring
  • The bucket on S3 has S3 logging turned on.
  • We ingest the S3 logs directly into Redshift on a daily basis.

The write up then provides code snippets and some business commentary. The author also identifies the upside of the approach used.
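To make the ingest steps in the list concrete, here is a minimal Python sketch of the "single character-delimited querystring value" idea. It is not the author's code. The delimiter, the parameter name `d`, the field names, and the bucket URL are all assumptions made for illustration, and the sketch only builds and decodes the URL; it does not call S3 or load anything into Redshift.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

DELIMITER = "|"  # assumed delimiter; the write up does not say which character is used
BEACON_URL = "https://example-bucket.s3.amazonaws.com/0.gif"  # hypothetical 0-byte object


def build_beacon_url(message: dict, headers: dict, fields: list) -> str:
    """Flatten message and headers into one delimited value on a GET URL."""
    merged = {**headers, **message}
    # Strip the delimiter from values so the field count stays fixed.
    values = [str(merged.get(name, "")).replace(DELIMITER, " ") for name in fields]
    return BEACON_URL + "?" + urlencode({"d": DELIMITER.join(values)})


def parse_beacon_url(url: str, fields: list) -> dict:
    """Recover the fields from a logged request URL (the downstream view)."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query["d"][0].split(DELIMITER)
    return dict(zip(fields, values))


if __name__ == "__main__":
    fields = ["country", "user", "event"]  # hypothetical column order
    url = build_beacon_url({"user": "42", "event": "page view"}, {"country": "US"}, fields)
    print(url)
    print(parse_beacon_url(url, fields))
```

The appeal of the trick is that the storage service's own access log becomes the event log: the ingest server never writes a file, it just requests one. The cost is that every field has to survive URL encoding and a fixed column order.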

Why is this important? It is easy to talk about Big Data. Looking at what is required to make use of Big Data reveals the complexity of the task.

Keep this hype versus real world split in mind the next time you listen to a search vendor yak about Big Data.

Stephen E Arnold, March 25, 2015

Relaxing a Query: PostgreSQL Style

March 22, 2015

If you are a user of PostgreSQL and want to implement fuzzy, relaxed, or “show ‘em something sort of close to the user’s query” search, you will want to read “Super Fuzzy Searching on PostgreSQL.” Fuzzy search makes it possible to show useful results to a user who is not quite sure how terms appear in an index. Fuzzy is not exactly like “close” in horseshoes. More algorithmic magic is at play in information retrieval systems.

The article explains PostgreSQL fuzzy capabilities and launches into the notion of trigrams. Keep in mind that Manning & Napier (creators of DR LINK) possess some n-gram patents. The old Brainware (which may have once been SER) also possesses some n-gram type patents. I recall hearing years ago that Brainware developed a trigram search system which worked reasonably well when looking for similar patent claims. Brainware is now part of a printer company, and I have lost track of the search technology. I suppose I could investigate the Brainware/Lexmark status, but I have other tasks beckoning my attention.

The write up explains how to implement trigrams for PostgreSQL. The code examples are useful and the tips for dealing with large datasets are quite helpful. The author does not mention the n-gram related patents. I assume that the author assumes that the patent holders assume no one is infringing. That is a triple assumption set. int ere sti ngt rig ram coi nci den ce_
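For readers who have not met trigrams, a short Python sketch shows the arithmetic. It is an illustration, not the article's code and not PostgreSQL's implementation. It assumes the scheme the pg_trgm documentation describes: lowercase the text, pad each word with two leading spaces and one trailing space, slice into three-character windows, and score two strings by shared trigrams divided by total distinct trigrams. The function names are my own.

```python
import re


def trigrams(text: str) -> set:
    """Return the set of three-character windows for each word in text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces in front, one behind
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(similarity("word", "two words"))
```

A misspelled query still shares most of its trigrams with the intended term, which is why the approach tolerates typos without a dictionary.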

Stephen E Arnold, March 22, 2015

Adobe: A Document Cloud Looms

March 19, 2015

Adobe is moving from PDF creation to document management. I avoid Adobe Acrobat because it bedeviled me years ago with a PDF dongle. The dongle had a counter. After we created the number of documents authorized by the dongle, the opportunity to purchase another dongle arose. Exciting. That warned me off the outfit.

I brushed against Adobe when I researched the original Enterprise Search Report in 2003. That was a mere 12 years ago, yet the memory is still fresh. I was trying to figure out what vendor provided the search system for Adobe products. After reading publicly accessible information and making fruitless attempts to speak to a person who knew about search at Adobe, I learned by accident the name of the provider.

Do you recognize the name Lextek? I sure did not. I offer a no cost summary of this company and its search system at this link. I was fascinated with Lextek because I had difficulty locating information using the Adobe products which incorporated this system. I have a short list of other search systems Adobe has used over the years, with the same result. I invite you to fire up an Adobe product and try to locate the information needed to solve a problem or learn a procedure or figure out what state an Adobe software product is in. Let me know how that works out for you.

I read “Adobe Unveils Cloud Electronic Document Service.” I learned that “Adobe Systems will launch a cloud-based document management service within the month.” That’s soon. The article continued:

The company said the core of the new service is Adobe Acrobat, the world’s most sought-after document management software. The upgraded Adobe Acrobat Document Cloud enables document managers to produce, check and confirm official documents on both personal computers and mobile devices. They also can put an electronic signature to the Portable Document Format (PDF) file to give it a legal force, the company said.

Yikes, another silo of data for an organization to “federate.”

Several questions crossed my mind:

  • What is the search system for the system? (Lextek’s owners operate a confectionary store if I understood the research my team assembled.)
  • What is the programmatic access Adobe will provide to an organization placing its PDF documents in the Adobe Document Cloud?
  • What is the security provided for these customers?

Adobe’s play is an interesting one. I wonder if the company will allow its customers to mark documents “public” and then provide an online access service? Worth watching.

Stephen E Arnold, March 19, 2015

SharePoint Gets Serious with Information Governance

March 19, 2015

SharePoint has enjoyed continued success over the last 15 years, but it has not been without some bumps along the way. Information governance is one of the noted areas in which SharePoint has fallen flat. Read more in the CMS Wire article, “Keeping SharePoint In Check with Information Governance.”

The article begins:

“Historically, SharePoint was thought to cause as many information governance problems as it solved. The 2001 to 2003 versions did not show Microsoft putting much effort into helping customers with information governance. But after the massive take up of SharePoint Portal Server 2007 licenses, and the often negative conversations coming out of the sizable SharePoint user community, Microsoft started to take governance issues seriously.”

In addition to keeping an eye on your news feed for the latest SharePoint buzz, staying tuned to experts in the field is a great way to save time and get pointed information pertaining to improving a SharePoint installation. Stephen E. Arnold has one such SharePoint feed on his Web site, ArnoldIT.com. Focusing on tips, tricks, and news, Arnold collects much of the content that users and managers alike will find helpful for navigating day-to-day SharePoint operations.

Emily Rae Aldridge, March 19, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Duck Duck Jumbawumba?

March 18, 2015

Usually, if you want a private search free of targeted ads, you head on over to DuckDuckGo.com. While DuckDuckGo holds its own against bigger search engines, because it is the nice guy of search, no one has really come out to challenge the water fowl. The Pittsburgh Post-Gazette has a story about another privacy-based search engine: “Hampton Entrepreneur Seeks To Launch Privacy-Friendly Search Engine,” but you cannot so much call it a DuckDuckGo rival as another option.

Michael DeKort launched a $125,000 Kickstarter campaign to fund Jumbawumba, a search engine that uses Google’s prowess while retaining a user’s privacy. It also would create cohesive search results using video, images, news, and Web sites on one page, instead of four.

How does it work?

“Jumbawumba taps Google’s vast reach. To Google’s eyes, though, the queries come from Jumbawumba, not from the originating computer, Mr. DeKort said. And while Google, Bing and Yahoo! keep records of each computer’s searches, and use them to tailor advertising, Jumbawumba pledges not to store any data on one-time searches. (It would keep records of ongoing search queries, but wouldn’t sell them to marketing firms, Mr. DeKort said.) Jumbawumba’s computer server will ultimately be overseas, limiting government access, though the company would respect law enforcement subpoenas.”

While private search engines like Jumbawumba will probably never be able to compete with Google, it is good to know that entrepreneurs like Michael DeKort are fighting to protect online privacy. The more the merrier for private search!

Whitney Grace, March 18, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Qwant Develops Qwant Junior, the Search Engine for Children

March 17, 2015

The article on Telecompaper titled “Qwant Tests Child-Friendly Search Engine” discusses the French company’s work. Qwant is focused on targeting 3 to 13 year olds with Qwant Junior, in partnership with the Education Ministry. Twenty percent of the company is owned by digital publishing powerhouse Axel Springer. The child-friendly search engine will attempt to limit the access to inappropriate content while encouraging children to use the search engine to learn. The article explains,

“The new version blocks or lists very far down in search results websites that show violence and pornography, as well as e-commerce sites. The version features an education tab separately from the general web search that offers simplified access to educational programme, said co-founder Eric Leandri. Qwant Junior’s video tab offers child-appropriate videos from YouTube, Dailymotion and Vimeo. After tests with the ministry, the search engine will be tested by several hundred schools.”

Teaching youngsters the ways of the search engine is important in our present age. The concept of listing pornography “very far down” on the list of results might unsettle some parents of young teens smart enough to just keep scrolling, but it is France! Perhaps the expectation of blocking all unsavory material is simply untenable. Qwant is planning on a major launch by September, and is in talks with Brazil for a similar program.

Chelsea Kerwin, March 17, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
