Xooglers Craft a New Search Engine
September 16, 2014
If you are interested in searching for off color content, you will be thrilled to learn that Boodigo awaits your input. Gizmodo reports that the system surfaces off color content on Tumblr. Tumblr is now a Yahoo property, and the Xoogler running that show wanted to suppress off color content. Well, now another Xoogler has made it easy to surface Tumblr’s seamier and steamier content. We have not run queries on Boodigo. We will leave that to your discretion.
Stephen E Arnold, September 16, 2014
Alternatives to Windows Search
September 16, 2014
For some common searches, Windows’ built-in desktop search function works just fine. Other times, though, our hard-drive hunts call for something more. Reporter Martin Brinkmann at ghacks.net shares his list of “The Best Free Desktop Search Programs for Windows.” He writes:
Desktop search tools offer faster searches, better options and filters, and a better user experience as a consequence. These tools can be sorted into two main categories: programs that require indexing before they can be used, and programs that work right out of the box without it. Let’s take a look at the requirements for this top list.
Requirements
*A free version of the program needs to be available.
*Search all files and don’t limit results.
*Compatibility with all recent 32-bit and 64-bit editions and versions of the Windows operating system.
Top list of desktop search programs
The list takes a quick look at each application so that you know what it is about. Below that is a table that you can use to compare the core functionality followed by our recommendations.
Brinkmann describes 11 services and tacks on four more suggested by readers. Curiously, absent is one of our recommendations, Sowsoft Effective File Search. For the rest, see the ghacks article for Brinkmann’s observations, and don’t forget to scroll down for his handy-dandy comparison table.
Cynthia Murrell, September 16, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Bilingual Search Engine YaSabe Sees Growth through Word of Mouth and Media Partnerships
September 11, 2014
The article on Elevation DC titled Herndon-Based Bilingual Search Engine Expands Reach covers the growth of YaSabe, the Spanish and English search engine helping Spanish-speaking Americans find the information they need. The search engine actually finds data that is in English and translates it into Spanish before tagging it. The article states,
“Its categories are geared toward the information Spanish speakers might need: bilingual service providers, jobs for people fluent in more than one language, 18 different types of Latin cuisine. Azim Tejani, the company’s executive vice president, says that 20 percent of YaSabe’s traffic comes directly to the site, 50 percent comes from search engines where users search for terms like “pedicura” instead of “pedicure” and the remaining traffic comes from its partnerships with media companies serving Spanish-speaking Ameri[cans].”
Tejani is also quoted in the article as saying that YaSabe is mobile-centered as opposed to web-centered. According to Tejani, some 30% of YaSabe users rely mainly on their mobile phones to access the internet. He credits YaSabe’s growth both to community guides and to strengthened relationships with Spanish-language media partners such as Univision and Mundo Hispanico. Univision in particular has seen great success since YaSabe began running the TV network’s search engines in 2013.
Chelsea Kerwin, September 11, 2014
Bing Says It Will Make Searches Easier?
September 9, 2014
Tech snobs are reevaluating their opinions of Bing. Why? Because Apple has made it the default search engine for its products. Bing is also trying to improve its search quality, and Search Engine Watch reports that “Bing Makes Technical Searches Easier.” Bing has streamlined its technical searches. If you are searching for technical material, specifically APIs and code, non-alphanumeric characters, software information, or answers about Microsoft products, things are about to get easier.
Bing’s goal is to make technical search more accurate, similar to recent endeavors to make search respond to natural speech. The article says that developers trying to find code hidden in documents, as well as consumers, will benefit the most from the new search.
“Bing also says it has determined the top factors among consumers looking for software include cost, reviews, and safety, as well as official or verified sites from which to download the software and similar products that might be better than the product included in the original search. As a result, Bing says it has developed an experience in which the entity pane provides a quick description of the product, along with clearly displayed information about cost, official and trusted download locations, and reviews.”
Then if you are searching for answers about Microsoft products, instant answers, a new feature, will appear at the top of results. Not a bad idea, considering that troubleshooting a Microsoft machine is harder than using Adobe Photoshop in Chinese.
When it comes to technical searches, you usually have to spend hours surfing through outdated forums that might resolve your problem, hunting for the one piece of code you need. Bing’s improvement is needed and much appreciated. Good for you, Bing!
Whitney Grace, September 09, 2014
ElasticSearch Rides The Rails
September 8, 2014
If you have been reading this blog for a while, then you are aware that search is an important feature for using any computer with ease. Without search, people would be forced to scan information one piece at a time or rely on indices. For those who remember microfiche, you can understand. Search in applications has been a semi-fleeting endeavor for some developers, but SitePoint has an article, “Full-Text Search In Rails With ElasticSearch” that explains how to integrate ElasticSearch into a Rails application.
“A full-text search engine examines all of the words in every stored document as it tries to match search criteria (text specified by a user) Wikipedia. For example, if you want to find articles that talk about Rails, you might search using the term “rails”. If you don’t have a special indexing technique, it means fully scanning all records to find matches, which will be extremely inefficient. One way to solve this is an “inverted index” that maps the words in the content of all records to its location in the database.”
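The inverted index described in that passage can be sketched in a few lines of Python. This is a toy illustration only, not how ElasticSearch or the SitePoint tutorial implements it. The names tokenize, build_inverted_index, and search, along with the sample articles, are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(records):
    """Map each word to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start with the matches for the first word, then intersect the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data standing in for rows in a database.
articles = {
    1: "Getting started with Rails",
    2: "Full-text search in Rails with ElasticSearch",
    3: "Cooking with cast iron",
}
index = build_inverted_index(articles)
print(search(index, "rails"))
```

A lookup touches only the index entries for the query words instead of scanning every record, which is the efficiency gain the quoted passage describes. A production engine adds stemming, relevance ranking, and on-disk storage on top of this idea.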
As applications become more versatile, they will need to be searched. The article provides one way to make your applications searchable; scan the Web with a search engine to learn about other ways to integrate search. Also make sure that the search code is decent, otherwise it will not be worth the deployment.
Whitney Grace, September 08, 2014
Another Weekend, More HP Autonomy Mud Slinging
September 6, 2014
I read “HP Alleges Autonomy Email Warning of Falling Revenues and ‘Imaginary Deals’ Shows Fraud.”
HP released an email from Autonomy’s CFO to Autonomy’s president. It would be helpful to have a larger number of emails and some context for a message sent from a mobile phone.
According to the write up:
HP claims the disclosure supports its allegations of fraud against Dr Lynch, who was then chief executive of Autonomy. It has today accused him in a Californian court of lying “to an extraordinary extent” about the performance of his company during the due diligence process that led to its $11.7bn (£7.1bn) acquisition.
If Autonomy were in such bad shape, how did HP miss these signals?
HP is going to keep Autonomy in the center of its content marketing campaign. The charges and counter charges underscore the risks associated with search and content processing software.
Stephen E Arnold, September 6, 2014
An IBM Watson Boot Camp: Gimmee 20!
September 5, 2014
Hopefully a demo will become available. Do you think?
Navigate to “IBM, CUNY Launch Watson Student App Competition.” In this content marketing article from eWeek, I learned:
The contest, known as the CUNY-IBM Watson Case Competition, is an opportunity to learn and develop apps for applying the IBM Watson cognitive technology to improve the operation of organizations and the delivery of services to customers. The IBM Watson technology embodies the future, and this competition enables CUNY students to be part of the new generation involved in the jobs and businesses that will be created.
This is not the first Watson competition. The content marketing article does a round up.
Alas, no links to demos. Just more Watson is wonderful; for example:
Indeed, some possible examples to apply IBM Watson are improving the quality and effectiveness of public undergraduate education and helping to better deliver public services such as public safety, health and transportation. Teams of CUNY students will work through various milestones during the fall 2014 semester, while being mentored by IBM, CUNY faculty and other experts in the field. Teams of three to five students will present their preliminary concepts during Watson “boot camp” Oct. 24 and 25. The finalists will participate in a final round of presentations on Jan. 15, 2015, when cash prizes will be awarded to the top three teams.
Indeed.
Stephen E Arnold, September 5, 2014
Open Text Excellence: Oh, the System Did It
September 5, 2014
This is the outfit that once employed the name surfer Dave Schubmehl. He is the IDC expert who sold information on Amazon without my permission. Once he bailed, I assumed Open Text would improve.
Nope. Wrong.
I received this in the mail today.
OpenText <UKMarketing@opentext.com>
3:04 PM (3 hours ago)
to me
If your email program has trouble displaying this email, view it as a web page:
http://now.eloqua.com/es.asp?s=459&e=364560&elq=e8df3eefea2d4395ac3aa3fd70a82281
We would like to give you our sincere apologies
Dear Stephen ,
As an unfortunate consequence of a system problem, we have been made aware that an email titled “OpenText UK Partner Day” has been accidentally sent to a wider audience than expected. You received this in error and we would ask that you ignore the email.
Best regards
OpenText UK Communications Team
Not only do I live in Harrod’s Creek, Kentucky, I have never attended an Open Text event. I do know that Red Dot used the Autonomy search system and that Red Dot performance was—ahem, well, let’s see—processing queries in minutes at one client location, long enough for staff to get a coffee…outside the building.
Also, I know Open Text has to support BASIS, Bray’s SGML Search, BRS Search, and probably some other systems. My, isn’t this too expensive to do well?
Anyway, Open Text apologizes for its spam and erroneous communications. Nice stuff. I like the passive voice. Who wants to assign responsibility for spam? Anyone? Oh, a system problem.
Stephen E Arnold, September 5, 2014
Galaxy Consulting Explains Vivisimo at IBM
September 5, 2014
The Galaxy Consulting Blog shares information on all things information. Recently, they spelled out details on one of IBM’s smarter acquisitions in the profile, “Search Applications – Vivisimo.” In our opinion, that outfit is one of the more solid search providers. The write-up begins with a brief rundown of the company’s history, including its purchase by IBM in 2012. We learn:
“Vivisimo Velocity Platform is now IBM InfoSphere Data Explorer. It stays true to its heritage of providing federated navigation, discovery and search over a broad range of enterprise content. It covers broad range of data sources and types, both inside and outside an organization.
“In addition to the core indexing, discovery, navigation and search engine the software includes a framework for developing information-rich applications that deliver a comprehensive, contextually-relevant view of any topic for business users, data scientists, and a variety of targeted business functions.”
As one should expect, InfoSphere handles many types of data from disparate sources with aplomb, and its support for mobile tech is a feature ahead of the curve. Perhaps most importantly, the platform boasts strong security while maintaining scalability. See the article for a detailed list of InfoSphere’s features.
Before IBM snapped it up in 2012, Vivisimo passed through the hands of Yippy, which had purchased it in 2010. The firm is headquartered in Pittsburgh but maintains other offices on the East Coast and in Europe.
Cynthia Murrell, September 05, 2014
Questions about Statistical Data
September 4, 2014
Autonomy, Recommind, and dozens of other search and content processing firms rely on statistical procedures. Anyone who has survived Statistics 101 believes in the power of numbers. Textbook examples are—well—pat. The numbers work out even for B and C students.
The real world, on the other hand, is different. What was formulaic in the textbook exercises is more difficult with most data sets. The data are incomplete, inconsistent, generated by systems whose integrity is unknown, and often wrong. Human carelessness, the lack of time, a lack of expertise, and plain vanilla cluelessness make those nifty data sets squishier than a memory foam pillow.
If you have some questions about statistical evidence in today’s go go world, check out “I Disagree with Alan Turing and Daniel Kahneman Regarding the Strength of Statistical Evidence.”
I noted this passage:
It’s good to have an open mind. When a striking result appears in the dataset, it’s possible that this result does not represent an enduring truth or even a pattern in the general population but rather is just an artifact of a particular small and noisy dataset. One frustration I’ve had in recent discussions regarding controversial research is the seeming unwillingness of researchers to entertain the possibility that their published findings are just noise.
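The point about small, noisy datasets can be made concrete with a quick simulation. The sketch below is my own illustration, not anything from the quoted post. It assumes a true effect of exactly zero, draws many pretend studies from pure noise, and counts how often the sample average still looks "striking." The function name fraction_striking, the 0.5 threshold, and the sample sizes are arbitrary choices for the example.

```python
import random
import statistics


def fraction_striking(sample_size, studies=1000, threshold=0.5, seed=0):
    """Share of pure-noise studies whose sample mean exceeds the threshold.

    Every observation is drawn from a normal distribution with mean 0 and
    standard deviation 1, so any 'effect' a study finds is noise.
    """
    rng = random.Random(seed)
    striking = 0
    for _ in range(studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if abs(statistics.fmean(sample)) > threshold:
            striking += 1
    return striking / studies


small = fraction_striking(sample_size=10)
large = fraction_striking(sample_size=400)
print(f"n=10:  {small:.1%} of pure-noise studies look striking")
print(f"n=400: {large:.1%} of pure-noise studies look striking")
```

With only ten observations per study, a noticeable share of the runs clears the threshold even though there is nothing to find. With 400 observations, essentially none do. That is the "artifact of a particular small and noisy dataset" in miniature.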
An open mind is important. Just looking at the outputs of zippy systems that do prediction for various entities can be instructive. In the last couple of months, I learned that predictive systems:
- Failed to size the Ebola outbreak by orders of magnitude
- Did not provide reliable outputs for analysts trying to figure out where a crashed airplane was
- Came up short regarding resources available to ISIS.
The Big Data revolution is one of those hoped for events. The idea is that Big Data will allow content processing vendors to sell big buck solutions. Another is that massive flows of unstructured content can only be tapped in a meaningful way with expensive information retrieval solutions.
Dreams, hopes, wishes—yep, all valid for children waiting for the tooth fairy. The real world has slightly more bumps and sharp places.
Stephen E Arnold, September, 2014