Thunderstone Rumbles about Webinator
August 13, 2015
There is nothing more frustrating than being unable to locate a specific piece of information on a Web site when you use its search function. Search is supposed to be quick, accurate, and efficient. Even if Google search is employed as a Web site’s search feature, it does not always yield the best results. Thunderstone is a company that specializes in proprietary software applications developed specifically for information management, search, retrieval, and filtering.
Thunderstone has a client list that includes, but is not limited to, government agencies, Internet developers, corporations, and online service providers. The company’s goal is to deliver “product-oriented R&D within the area of advanced information management and retrieval,” which translates to wanting to help its clients find information very quickly and as accurately as possible. It is the premise of most information management companies. The company blog announced, “Thunderstone Releases Webinator Web Index And Retrieval System Version 13.” Webinator makes it easier to integrate high-quality search into a Web site, and it has several appealing new features:
- “Query Autocomplete, guides your users to the search they want
- HTML Highlighting, lets users see the results in the original HTML for better contextual information
- Expanded XML/SOAP API allows integration of administrative interface”
We like the HTML highlighting that offers users the ability to backtrack and see a page’s original information source. It is very similar to old-fashioned research: go back to the original source to check a fact’s veracity.
Whitney Grace, August 13, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Browser Wars: Sparks Fly During Firefox Microsoft Snipe Hunt
August 12, 2015
I read “Firefox Sticks It to Microsoft, Redirects Cortana Searches in Windows 10.” Ah, ha. Windows 10. Some day, maybe. I assume the endless reboots upon updating are merely a rumor. The real story is that a Firefox user can use Cortana to search something other than Bing.
The write up says:
After blasting Microsoft’s attempts to set Edge as the default browser in Windows 10, Mozilla is enjoying some sweet revenge by steering Firefox users away from Bing. With the newly-released Firefox 40, users no longer have to use Bing for web searches from Cortana on the Windows 10 taskbar. Instead, Firefox will show results from whatever search engine the user has chosen as the default. Using Firefox isn’t the only way to replace Cortana’s Bing searches with Google or another search engine. But Firefox is currently the only browser that does so without the need for third-party extensions. (It wouldn’t be surprising, however, if Google follows suit.)
Poor Microsoft. The company has been trying to build bridges. The result is a dust up.
I long for the good old days. Microsoft would have been careful to avoid getting stuck in a squabble about its Siri and Google Voice killer Cortana.
What impact will this have? My hunch is that as Windows 10 flows into the hands of those who fondly recall Bob, the issue will become more serious.
For now, this is amusing to me. Recall I am a person who abandoned my Lumia Windows phone because the silly Cortana feature was in a location which made activation impossible to avoid. I dumped the phone. End of story.
Stephen E Arnold, August 12, 2015
Google Seeks SEO Pro
August 12, 2015
Well, isn’t this interesting. Search Engine Land tells us that “Google Is Hiring an SEO Manager to Improve its Rankings in Google.” The Goog’s system is so objective, even Google needs a search engine optimization expert! That must be news to certain parties in the European Union.
Reporter Barry Schwartz spotted the relevant job posting at the company’s Careers page. Responsibilities are as one might expect: develop and maintain websites; maintain and develop code that will engage search engines; keep up with the latest in SEO techniques; and work with the sales and development departments to implement SEO best practices. Coordination with the search-algorithm department is not mentioned.
Google still stands as one of the most sought-after employers, so it is no surprise they require a lot of anyone hoping to fill the position. Schwartz notes, though, that link-building experience is not specified. He shares the list of criteria:
“The qualifications include:
*BA/BS degree in Computer Science, Engineering or equivalent practical experience.
*4 years of experience developing websites and applications with SQL, HTML5, and XML.
*2 years of SEO experience.
*Experience with Google App Engine, Google Custom Search, Webmaster Tools and Google Analytics and experience creating and maintaining project schedules using project management systems.
*Experience working with back-end SEO elements such as .htaccess, robots.txt, metadata and site speed optimization to optimize website performance.
*Experience in quantifying marketing impact and SEO performance and strong understanding of technical SEO (sitemaps, crawl budget, canonicalization, etc.).
*Knowledge of one or more of the following: Java, C/C++, or Python.
*Excellent problem solving and analytical skills with the ability to dig extensively into metrics and analytics.”
Lest anyone doubt the existence of such an ironic opportunity, the post reproduces a screenshot of the advertisement, “just in case the job is pulled.”
Cynthia Murrell, August 12, 2015
Yebol: A Goner, Folks
August 11, 2015
I received a couple of messages about Yebol. The brand name referenced a human and semantic search engine which disappeared in the 2009-2010 time period. The system has been associated with Hong Feng Yin. The buzzwords associated with the system were meme theory and optimization, clustering and classification, etc. I am not sure what has triggered references to the system, but my file data shows that this is a system that anticipated Qwant.com. After a PR and marketing push in 2009, the Yebol shout became muted. The comments and links to Xavier Lur’s write up are a joint in time. Even Wikipedia knows this cat’s nine lives have been exhausted.
Stephen E Arnold, August 11, 2015
Advice for Smart SEO Choices
August 11, 2015
We’ve come across a well-penned article about the intersection of language and search engine optimization by The SEO Guy. Self-proclaimed word-aficionado Ben Kemp helps website writers use their words wisely in, “Language, Linguistics, Semantics, & Search.” He begins by discrediting the practice of keyword stuffing, noting that search-ranking algorithms are more sophisticated than some give them credit for. He writes:
“Search engine algorithms assess all the words within the site. These algorithms may be bereft of direct human interpretation but are based on mathematics, knowledge, experience and intelligence. They deliver very accurate relevance analysis. In the context of using related words or variations within your website, it is one good way of reinforcing the primary keyword phrase you wish to rank for, without over-use of exact-match keywords and phrases. By using synonyms, and a range of relevant nouns, verbs and adjectives, you may eliminate excessive repetition and more accurately describe your topic or theme and at the same time, increase the range of word associations your website will rank for.”
Kemp goes on to lament the dumbing down of English-language education around the world, blaming the trend for a dearth of deft wordsmiths online. Besides recommending that his readers open a thesaurus now and then, he also advises them to make sure they spell words correctly, not because algorithms can’t figure out what they meant to say (they can), but because misspelled words look unprofessional. He even supplies a handy list of the most often misspelled words.
The development of more and more refined search algorithms, it seems, presents the opportunity for websites to craft better copy. See the article for more of Kemp’s language, and SEO, guidance.
Cynthia Murrell, August 11, 2015
Flawed Search As a Tactic
August 10, 2015
I read “Why Facebook’s Video Theft Problem Can’t Last.” My initial reaction was, “Sure it can.” The main point of the write up struck me as:
But then popular YouTuber Hank Green leveled a number of allegations at Facebook’s video team, including a charge of rampant copyright infringement from Facebook users who are uploading videos from YouTube and other platforms without creators’ consent. Facebook has responded that it has measures in place to address copyright infringement, including allowing users to report stolen content and suspending accounts guilty of repeated violations.
I noted this statement:
For Facebook, video represents an irresistible new business opportunity. Early experiments with running natively inside the News Feed showed that it kept users on the site longer — and kept them from clicking external links that took them to YouTube and elsewhere.
Money and irresistible are words which flow.
The gem appears deep in the write up:
Facebook hasn’t made it easy for creators like Green to find instances of copyright infringement — there’s no way to filter Facebook searches for videos. And even if the stolen videos can be found, creators must fill out multiple forms, meaning it could be several days (and countless views) before a stolen video is taken down.
I find it interesting that search and retrieval may not do the trick. Then the bureaucratic process adds a deft touch.
I will file this item in my follow up folder. I know I can search my system for text files. Search which does not allow one to find information may be a tactic which serves other purposes. Is flawed search a business advantage? If one cannot find something, does that mean the “something” is not there?
Stephen E Arnold, August 10, 2015
The Girl with the Advert Tattoo
August 10, 2015
It looks like real publishing companies are now into tattoos or, at least, into leveraging ink’s growing popularity. The Verge reports, “The Desperate Book Industry and ‘Tatvertising’ are a Perfect, Tragic Match.” Reporter Kaitlyn Tiffany tells us that Hachette Australia put out the call for a model willing to be tattooed and photographed as part of a promotion for the next Stieg Larsson book, “The Girl in the Spider’s Web.” Tiffany likens the effort to a practice, widely considered predatory, that was common just after the turn of the millennium: websites paying those desperate for cash to have ads tattooed on them (sometimes on their faces!). But, hey, at least those people were paid good money; apparently the reward for this scheme was meant to be the tattoo itself. The article elaborates:
“But why the [heck] does it need to be a real tattoo? When reached for comment, a representative from Razor & JOY, the advertising agency in charge of the campaign, told me, ‘The character of Lisbeth doesn’t do things in half measures — and so we wanted our marketing to capture this passion.’ The representative also explained that the compensation for the woman who is cast would be something… less than monetary: ‘This campaign is an opportunity to give a truly passionate fan a free tattoo that is unique to a strong literary character.’ And a new type of degrading, unpaid labor in the publishing industry was born.”
I’m not sure I’d personally consider this scheme “predatory,” but apparently Tiffany was not alone in her outrage. I visited the link she supplies in her article, and was greeted with a take-back notice; it reads, in part, “The campaign was conceived with good intentions … but some people have been offended. As this was never our intention, we have listened and we have decided we will not continue with the tattoo element of the campaign.” At least the company was wise enough to make a change in response to criticism. I wonder, though, what they will come up with next.
Cynthia Murrell, August 10, 2015
A Call for More Friendly Enterprise Search Results
August 10, 2015
An idea from ClearBox Consulting would bring enterprise search results in line with today’s online searches. The company’s blog asserts, “Enterprise Search? We Need Some Answers on a Card.” Writer Sam Marshall likes the way Google now succinctly presents key information about a user’s query in a “card” at the top of the results page, ahead of the old-school list of relevant links. For example, he writes:
“Imagine you want to know the time of the next train between two cities. When you type this into Google, the first hit isn’t a link to a site but a card like the one below. It not only gives the times but also useful additional information: a map, trip duration, and tabs for walking, driving, and cycling. Enterprise search isn’t like this. The same query on an intranet gives the equivalent of a link to a PDF containing the timetable for the whole region. It’s like saying ‘here’s the book, look it up yourself’. This is not only a poor user experience for the employee, but a direct cost to the employer in wasted time. I’d like to see enterprise search move away from results pages of links to providing pages of answers too, and cards are a powerful way of doing this.”
Marshall emphasizes some advantages of the card approach: the most important information is right there, separated from related but irrelevant data; cards work better on mobile devices; and cards are user-friendly. Besides, he notes, since this format is now popular with sites from Facebook to Twitter, users are becoming familiar with them.
The card concept could be enhanced, Marshall continues, by personalizing results to the individual, tapping into employee profiles or even GPS data. For more information, see the article; it uses a hypothetical query about paternity leave to illustrate its point well. Though enterprise search is not exactly known for living on the cutting edge of technology, developers would be foolish not to incorporate this (or a similar) efficient format.
Cynthia Murrell, August 10, 2015
IBM Spends to Make Watson Healthier, Hopefully Quickly
August 7, 2015
I noted the article “IBM Adds Medical Images to Watson, Buying Merge Healthcare for $1 Billion.” The company is in the content management business. Medical images are pretty much a hassle whether in the good old fashioned film form or in digital versions. On the few occasions I have had to look at murky gray or odd duck enhanced color images, I marveled at how a professional would make sense of the data displayed. Did this explanation trigger thoughts of IBM FileNet?
The image processing technology available from specialist firms permitting satellite or surveillance image analysis is a piece of cake compared to the medical imaging examples I reviewed. From my point of view, the nifty stuff available to an analyst looking at the movement of men and equipment was easier to figure out.
Merge delivers a range of image and content management services to health care outfits. The systems can work with on premises systems and park data in the cloud in a way that keeps the compliance folks happy.
According to the write up:
When IBM set up its Watson health business in April, it began with a couple of smaller medical data acquisitions and industry partnerships with Apple, Johnson & Johnson and Medtronic. Last week, IBM announced a partnership with CVS Health, the large pharmacy chain, to develop data-driven services to help people with chronic ailments like diabetes and heart disease better manage their health.
Now Watson is plopping down $1 billion to get a more substantive, image centric, and, dare I say it, more traditional business.
The idea I learned:
“We’re bringing Watson and analytics to the largest data set in health care — images,” John Kelly, IBM’s senior vice president of research who oversees the Watson business, said in an interview.
The idea, as I understand the management speak, is that Watson will be able to perform image analysis, thus allowing IBM to convert Watson into a significant revenue generator. IBM does need all the help it can get. The company has just achieved a milestone of sorts; IBM’s revenue has declined for 13 consecutive quarters.
My view is that the integration of the Merge systems with the evolving Watson “solution” will be expensive, slow, and frustrating to those given the job of making image analysis better, faster, and cheaper.
My hunch is that the time and cost required to integrate Watson and Merge will be an issue in six or nine months. Once the “integration” is complete, the costs of adding new features and functions to keep pace with regulations and advances in diagnosis and treatment will create a 21st century version of FileNet. (FileNet, as you, gentle reader, know, was the 2006 acquisition. At the time, nine years ago, IBM said that the FileNet technology would
“advance its Information on Demand initiative, IBM’s strategy for pursuing the growing market opportunity around helping clients capture insights from their information so it can be used as a strategic asset. FileNet is a leading provider of business process and content management solutions that help companies simplify critical and everyday decision making processes and give organizations a competitive advantage.”
FileNet was an imaging technology for financial institutions and a search system which allowed a person with access to the system to locate a check or other scanned document.)
And FileNet today? Well, like many IBM acquisitions it is still chugging along, just part of the services oriented architecture at Big Blue. Why, one might ask, was the FileNet technology not applicable to health care? I will leave you to ponder the answer.
I want to be optimistic about the upside of this Merge acquisition for the companies involved and for the health care professionals who will work with the Watsonized system. I assume that IBM will put on a happy face about Watson’s image analysis capabilities. I, however, want to see the system in action and have some hard data, not M&A fluff, about the functionality and accuracy of the merged systems.
At this moment, I think Watson and other senior IBM managers are looking for a way to make a lemon grove from Watson. Nothing makes bankers and deal makers happier than a big, out of the blue acquisition.
Now the job is to find a way to sell enough lemons to pay for the maintenance and improvement of the lemon grove. I assume Watson has an answer to ongoing costs for maintenance and enhancements, bug finding and stomping, and the PR such activities trigger. Yep, costs and revenue. Boring but important to IBM’s stakeholders.
Stephen E Arnold, August 7, 2015
Quality and Text Processing: An Old Couple Still at the Altar
August 6, 2015
I read “Why Quality Management Needs Text Analytics.” I learned:
To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.
This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.
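The kind of signal described above can be illustrated with a minimal sketch. This is not ClearForest's method, which the presentation notes do not detail; it is a plain term-count comparison, and the sample comments, the stopword list, the function names, and the `min_increase` threshold are all invented for illustration. The idea is simply to count how many service comments mention each term in an earlier and a later period, then report the terms whose mentions rose.

```python
import re
from collections import Counter

# Hypothetical service-report comments grouped by month (invented sample data).
REPORTS = {
    "2015-05": ["radio static", "brake squeal at low speed", "seat rattle"],
    "2015-06": ["brake squeal again", "brake pedal soft",
                "brake light flickers", "seat rattle"],
}

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"at", "the", "a", "again", "low"}


def term_counts(comments):
    """Count how many comments mention each term (one count per comment)."""
    counts = Counter()
    for comment in comments:
        terms = set(re.findall(r"[a-z]+", comment.lower())) - STOPWORDS
        counts.update(terms)
    return counts


def emerging_terms(earlier, later, min_increase=2):
    """Return (term, increase) pairs for terms mentioned more often later."""
    before = term_counts(earlier)
    after = term_counts(later)
    rising = {
        term: after[term] - before[term]
        for term in after
        if after[term] - before[term] >= min_increase
    }
    # Largest increase first; ties broken alphabetically.
    return sorted(rising.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(emerging_terms(REPORTS["2015-05"], REPORTS["2015-06"]))
```

With the invented data, the only term whose mentions rise by two or more is "brake," which is the sort of emerging mechanical issue the automobile example describes. A production system would need entity extraction, normalization, and statistical significance tests rather than raw counts.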
I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available as open source at Calais.
In search and content processing, the case examples, the lingo, and even the technology have entered what I call their “recycling” phase.
I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.
Stephen E Arnold, August 6, 2015