Search without Words: The ViSenze API

January 5, 2016

I read “GuangDa Li, Co-Founder and CTO ViSenze on Enabling Search without Key Words.” The article, I wish to point out, is written in words. To locate the article, one will have to use words to search for information about Dr. Li. Dragging his image to Google Images will not do the trick. The idea for search without words continues to attract attention. Ecommerce and law enforcement are keen to find alternatives to word centric queries. Searching for a text message with a particular emoji is not easy using words and phrases.

According to the write up:

In February 2013, GuangDa Li along with Oliver Tan, an industry veteran started ViSenze, a spin-off company from NExT, a research centre jointly established between National University of Singapore (NUS) and Tsinghua University of China. ViSenze has developed a technology that enables search without keywords. Users simply need to click a photo and ViSenze brings you the relevant search results based on that image.

The write up contains several points which I found interesting.

First, Mr. Li said:

Because of my background in internet media processing, I anticipated the change in the industry about 4 years ago – there was a sharp rise in the amount of multimedia content on the internet. The management, search and discovery of media content has become more and more demanding.

Image search is a challenge. Once promising systems to query video like Exalead’s system have dropped from public view. Video search on most services is frustrating.

Second, the business model for ViSenze is API focused. Mr. Li said:

ViSearch Search API is our flagship product and it also serves as the fundamentals for our other vertical applications. The key advantage of ViSearch API is that it is a perfect combination of latency, scalability and accuracy.

The third passage of interest to me was:

We used to be in stealth mode for a while. Only after our API was launched on the Rakuten Taiwan Ichiba website, did we start to talk with investors. It just happened.

I interpreted this to suggest that Rakuten recognizes that traditional eCommerce search systems like Amazon are vulnerable to a different information access approach.

Should Amazon worry about Rakuten or regulators? Amazon does not worry about much it seems. Its core search and cloud based search systems are, in my view, old school and frustrating for some users. Maybe ViSenze will offer a way to deliver a more effective solution for Rakuten. Competition might motivate Amazon to do a better job with its own search and retrieval systems.

Stephen E Arnold, January 5, 2016

Search Engine for Children? Thinga

January 5, 2016

Short honk: Nervous about your child navigating to Yandex.com and entering a harmless query such as Czech auditions? No worries. Point them at Thinga. For information about this child-friendly search engine, get the details from “Thinga Is a Search Engine Designed for Kids.” Here’s the passage I highlighted:

Thinga has built its own content library that have been hand picked by Heinley’s team or pulled from websites that have been white listed and are kid-friendly, so basically if it isn’t inside their database, then kids won’t be able to search for it. The downside is that we suppose at the start, search results might be a little bit limited but we expect that over time it will grow.

This sounds a bit like The Point (Top 5% of the Internet) which was available in 1993 and then acquired by Lycos. It is useful to know that good ideas come and go. A smile to the Point team and Chris Kitze too. Thinga uses a different business model from our ad driven system. What is that angle? Ecommerce and maybe a printed magazine.

Stephen E Arnold, January 5, 2016

How Big Data Is Missing the Mark

January 5, 2016

At this point in the Big Data sensation, many businesses are swimming in data without the means to leverage it effectively. TechWeek Europe cites a recent survey from storage provider Pure Storage in its write-up, “Big Data ‘Fails Businesses’ Due to Access, Skills Shortage.” Interestingly, most of the problems seem to have more to do with human procedures and short-sightedness than any technical shortcomings. Writer Tom Jowitt lists the three top obstacles as a lack of skilled workers, limited access to information, and bureaucracy. He tells us:

“So what exactly is going wrong with Big Data to be causing such problems? Well over half (56 percent) of respondents said bureaucratic red tape was the most serious obstacle for business productivity. ‘Bureaucratic red tape around access to information is preventing companies from using their data to find those unique pieces of insight that lead to great ideas,’ said [Pure Storage’s James] Petter. ‘Data ownership is no longer just the remit of the CIO, the democratisation of insight across businesses enables them to disrupt the competition.’ But regulations are also causing worry, with one in ten of the companies citing data protection concerns as holding up their dissemination of information and data throughout their business. The upcoming EU General Data Protection Regulation will soon affect every single company that stores data.”

The survey reports that missed opportunities have cost businesses billions of pounds per year, and almost three-quarters of respondents say their organizations collect data that is just collecting dust. Both cost and time are reasons that information remains unprocessed. On the other hand, Jowitt points to another survey by CA Technologies; most of its respondents expect the situation to improve, and for their data collections to bring profits down the road. Let us hope they are correct.

 

Cynthia Murrell, January 5, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Fasten Your Seat Belts: Search Driven Analytics

January 4, 2016

Editor’s Note: ThoughtSpot has no relationship with EMC.

The buzzword meisters are salivating. A term kicked around by folks like Lucidworks (really?) and Radiology Software has been snapped up by EMC. Yep, I know. EMC is not a search vendor, and I was surprised to learn that it was in the analytics business. Hey, that’s what happens when one lives in rural Kentucky.

According to EMC, the “new” concept is the spark behind ThoughtSpot. I learned from “Introducing ThoughtSpot 3: The World’s First Product to Harness Collective Intelligence for Search Driven Analytics”:

ThoughtSpot 3 combines the ease of search with the intelligence of machine learning to deliver a powerful analytic solution that anyone can use to quickly get the right answers out of their data.

Slam dunk. Stock up on EMC shares which are trading in value territory. The company has reported flat revenues and profit margins, but search driven analytics, now in Version 3, is something that makes mid tier consulting firms quiver.

Aberdeen allegedly said:

“As the desire for data-driven decisions grows across the business world, there is a greater appetite for people capable of creating data insights,” said Aberdeen Vice President and Principal Analyst Michael Lock. “For companies looking to create insights faster and more easily, early findings from Aberdeen’s latest survey indicate that Best-in-Class organizations are adopting language-driven analytics, for example search-driven analytics and code-free discovery, at a greater rate than lesser performers.”

That’s sufficient for me. Now we just need to watch the revenues of EMC and other vendors almost certain to embrace a buzzword with some rubber left on the 15 inch recap.

Stephen E Arnold, January 4, 2016

Are Search Unicorns Sub Prime Unicorns?

January 4, 2016

The question is a baffler. Navigate to “Sorting Truth from Myth at Technology Unicorns.” If the link is bad or you have to pay to read the article in the Financial Times, pony up, go to the library, or buy hard copy. Don’t complain to me, gentle reader. Publishers are in need of revenue. Now the write up:

The assumption is that a unicorn exists. What exists are firms with massive amounts of venture funding and billion dollar valuations. I know the money is or was real, but the “sub prime unicorn” is a confection from a money thought leader, Michael Moritz. A subprime unicorn is a company “built on the flimsiest of edifices.” Does this mean fairy dust or something more substantial?

According to the write up:

But the way in which private market valuations have become skewed and inflated as start-ups have delayed IPOs raises questions about the financing of innovation. Despite the excitement, venture capital has produced weak returns in recent decades — only a minority of funds have produced rewards high enough to compensate investors for illiquidity and opacity.

Why would a venture funded start up perform better than a start up financed by mom, dad, and one’s slightly addled, but friendly, great aunt?

The article then makes a reasonably sane point:

With the rise in US interest rates, the era of ultra-cheap financing is ending. As it does, Silicon Valley’s unicorns are losing their mystique and having to work to raise equity, sometimes at valuations below those they achieved before. The promise of private financing is being tested, and there will be disappointments. It does not pay to be dazzled by mythical beasts.

Let’s think a moment about search and content processing. The mid tier consulting firms—the outfits I call azure chip outfits—have generated some pretty crazy estimates about the market size for search and content processing solutions.

The reality is at odds with these speculative, marketing fueled prognostications. Yep, I would include the wizards at IDC who wanted $3,500 to sell an eight page document with my name on it without my permission. Refresh yourself on the IDC Schubmehl maneuver at this link.

Based on my research, two enterprise search outfits broke $150 million in revenues prior to 2011: Endeca tallied an estimated $150 million in revenues and Autonomy reported $700 million in revenues. Both outfits were sold.

Since 2012 exactly zero enterprise search firms have generated more than $700 million in revenues. Now the wild and crazy funding of search vendors has continued apace since 2012. There are a number of search and retrieval companies and some next generation content processing outfits which have ingested tens of millions of dollars.

How many of these outfits have gone public in the zero cost money environment? Based on my records, zero. Why haven’t Attivio, BA Insight, Coveo, Palantir and others cashed in on their technology, surging revenues, and market demand?

There are three reasons:

  1. The revenues are simply acceptable, not stunning. In the post Fast Search & Transfer era, twiddling the finances carries considerable risks. Think about a guilty verdict for a search wizard. Yep, bad.
  2. The technology is a rehash gilded with new jargon. Take a look at the search and content processing systems, and you find the same methods and functions that have been known and in use for more than 30 years. The flashy interfaces are new, but the plumbing still delivers precision and recall which have hit a glass ceiling at 80 to 90 percent accuracy for the top performing systems. Looking for a recipe with good enough relevance is acceptable. Looking for a bad actor with a significant margin for error is not so good.
  3. The smart software performs certain functions at a level comparable to the performance of a subject matter index when certain criteria are met. The notion of human editors riding herd on entity and synonym dictionaries is not one that makes customers weep with joy. Smart software helps with some functions, but today’s systems remain anchored in human operators, and the work these folks have to perform to keep the systems in tip top shape is expensive. Think about this human aspect in terms of how Palantir explains architects’ changes to type operators or the role of content intake specialists using the revisioning and similar field operations.
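For readers who want the precision and recall ceiling in point 2 made concrete, here is a minimal sketch in Python. The function name and the document ids are hypothetical, chosen only for illustration; the sketch shows the textbook definitions for a single query, not any vendor's evaluation method.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant documents returned / all documents returned
    recall    = relevant documents returned / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: the system returns five documents, four of which are
# among the five that a human judge marked as relevant.
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = ["d1", "d2", "d3", "d4", "d6"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these made-up inputs both scores come out at 0.80, the low end of the range cited above. One result in five is off target and one relevant document in five is never found, which is tolerable for recipes and less so for bad actors.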

Why do I make this point in the context of unicorns? Search has one or two unicorns. I would suggest Palantir is a unicorn. When I think of Palantir, I consider this item:

To summarize, only a small number of companies reach the IPO stage.

Also, the HP Autonomy “deal” is a quasi unicorn. IBM’s investment in Watson is a potential unicorn if and when IBM releases financial data about its TV show champion.

Then there are a number of search and content processing creatures which could be hybrids of a horse and a donkey. The investors are breeders who hope that the offspring become champions. Long shots all.

The Financial Times’s article expresses a broad concept. The activities of the search and content processing vendors in the next 12 to 18 months will provide useful data about the genetic make up of some technology lab creations.

Stephen E Arnold, January 4, 2016

Information Technology Units to Do the Change Thing Now. Right Now!

January 1, 2016

I read “Report: IT Departments to Take on Digital Transformation in 2016.” Yep, wake up one day and say, “I will transform myself.” That works really well. How many New Year’s Resolutions have you created? None. How many have you carried out over a 12 month period? None.

The “report” is interesting because it suggests that organizations’ information technology units will undergo a digital transformation. In my experience, organizations’ IT departments are like a slow moving train. Slow moving trains can be tough to stop.

I highlighted this passage in plum crazy purple:

The report [from Pierre Audoin Consultants] claims digitalization will focus on two connected trends: customer experience and the Internet of Things (IoT). It also predicts that organizations will look to mature technologies like big data/ analytics, social media, mobility and cloud computing to create new products and services. However, new business models, processes and value chains after pioneers like Amazon, eBay, Booking.com, Uber and Spotify will continue to put existing businesses – and their IT departments – under greater pressure to rethink their business models.

To illustrate the firm’s prognosticative capabilities, there are 10 trends for 2016. Here you go:

  • Digitization
  • Cloud computing
  • Two speed IT
  • Industry 4.0/Internet of Things
  • Big data / analytics
  • Sourcing / skill management / offshore
  • Standardization / automation / optimization
  • Agile development / Dev Ops
  • Vendor management
  • Security.

Some of these trends are puzzlers; for example, two speed IT and Industry 4.0. Others strike me as “been there, done that” jargon; for example, standardization / optimization and vendor management.

I am not sure if IT outfits are going to wake up to a new world of change on January 1, 2016. In fact, my hunch is that change is likely to be like black ink spreading across the pocket of a white shirt. One notices and then reacts. Opportunism, knee jerk decisions, cost controls, and the necessary adaptation to dwindling revenues characterize many outfits with which I am familiar.

Search and content processing vendors, for instance, have not changed much in the last 30 or 40 years. The new terminology does not equate to technological innovation in many cases.

Stephen E Arnold, January 1, 2016

Data Managers as Data Librarians

December 31, 2015

The tools of a librarian may be the key to better data governance, according to an article at InFocus titled, “What Librarians Can Teach Us About Managing Big Data.” Writer Joseph Dossantos begins by outlining the plight data managers often find themselves in: executives can talk a big game about big data, but want to foist all the responsibility onto their overworked and outdated IT departments. The article asserts, though, that today’s emphasis on data analysis will force a shift in perspective and approach—data organization will come to resemble the Dewey Decimal System. Dossantos writes:

“Traditional Data Warehouses do not work unless there a common vocabulary and understanding of a problem, but consider how things work in academia.  Every day, tenured professors  and students pore over raw material looking for new insights into the past and new ways to explain culture, politics, and philosophy.  Their sources of choice:  archived photographs, primary documents found in a city hall, monastery or excavation site, scrolls from a long-abandoned cave, or voice recordings from the Oval office – in short, anything in any kind of format.  And who can help them find what they are looking for?  A skilled librarian who knows how to effectively search for not only books, but primary source material across the world, who can understand, create, and navigate a catalog to accelerate a researcher’s efforts.”

The article goes on to discuss the influence of the “Wikipedia mindset”; data accuracy and whether it matters; and devising structures to address different researchers’ needs. See the article for details on each of these (especially on meeting different needs). The write-up concludes with a call for data-governance professionals to think of themselves as “data librarians.” Is this approach the key to more effective data search and analysis?

Cynthia Murrell, December 31, 2015

Scientific Research Has Turned into a Safe Space

December 31, 2015

The Internet is a cold, cruel place, especially if you hang out in the comments section on YouTube, eBay forums, social media, and 4chan.  If you practice restraint and limit your social media circles to trusted individuals, you can surf the Internet without encountering trolls and haters.  Some people do not practice common sense, so they encounter many hateful situations on the Internet and as a result they demand “safe spaces.”  Safe spaces are where people do not encounter anything negative.

Safe spaces are stupid.  Period.  What is disappointing is that the “safe space” and “only positive things” mentality has made its way into the scientific community, according to Nature in the article, “‘Novel, Amazing, Innovative’: Positive Words On The Rise In Science Papers.”

The University Medical Center in the Netherlands studied the use of positive and negative words in the titles of scientific papers and abstracts from 1974-2014 published on the medical database PubMed.  The researchers discovered that positive words in titles grew from 2% in 1974 to 17.5% in 2014.  Negative word usage increased from 1.3% to 2.4%, while neutral words did not see any change.  The trend only applies to research papers, as the same test was run using published books and it showed little change.

“The most obvious interpretation of the results is that they reflect an increase in hype and exaggeration, rather than a real improvement in the incidence or quality of discoveries… The findings “fit our own observations that in order to get published, you need to emphasize what is special and unique about your study,” he says. Researchers may be tempted to make their findings stand out from thousands of others — a tendency that might also explain the more modest rise in usage of negative words.”

There is some doubt associated with the findings, because the analysis was only applied to PubMed.  Still, the original research team thinks that it points to a much larger problem, because not all research can be “innovative” or “novel.”  The overuse of positive words is polluting the social, psychological, and biomedical sciences.

Under the table, this really points to how scientists and researchers are fighting for tenure.  What would this mean for search engine optimization if all searches and descriptions had to have a smile?  Will they even invent a safe space filter?

Whitney Grace, December 31, 2015

SEO Tips Based on Recent Google Search Quality Guidelines

December 30, 2015

Google has recently given search-engine optimization pros a lot to consider, we learn from “Top 5 Takeaways from Google’s Search Quality Guidelines and What They Mean for SEO” at Merkle’s RKG Blog. Writer Melody Pettula presents five recommendations based on Google’s guidelines. She writes:

“A few weeks ago, Google released their newest Search Quality Evaluator Guidelines, which teach Google’s search quality raters how to determine whether or not a search result is high quality.  This is the first time Google has released the guidelines in their entirety, though versions of the guidelines have been leaked in the past and an abridged version was released by Google in 2013. Why is this necessary? ‘Quality’ is no longer simply a function of text on a page; it differs by device, location, search query, and everything we know about the user. By understanding how Google sees quality we can improve websites and organic performance. Here’s a countdown of our top 5 takeaways from Google’s newest guidelines and how they can improve your SEO strategy.”

We recommend any readers interested in SEO check out the whole article, but here are the five considerations Pettula lists, from least to most important: consider user intent; supply supplementary content; guard your reputation well; consider how location affects user searches; and, finally, “mobile is the future.” On that final point, the article notes that Google is now almost entirely focused on making things work for mobile devices. SEO pros would do well to keep that new reality in mind.

Cynthia Murrell, December 30, 2015

Overhyped Science Stuff

December 30, 2015

After Christmas comes New Year’s Eve, and news outlets take the time to reflect on the changes in the past year.  Usually they focus on celebrities who died, headlining news stories, technology advancements, and new scientific discoveries.  One of the geeky news outlets on the Internet is Gizmodo, and they took their shot at highlighting things that happened in 2015, but rather than focusing on new advances they checked off “The Most Overhyped Scientific Discoveries In 2015.”

There was extreme hype about an alien megastructure in outer space that Neil deGrasse Tyson had to address and tell folks they were overreacting.  Bacon and other processed meats were labeled as carcinogens and caused cancer!  The media, of course, took the bacon link and ran with it causing extreme panic, but in the long run everything causes cancer from cellphones to sugar.

Global warming is a hot topic that always draws arguments, and it appears to be getting worse the more humans release carbon dioxide into the atmosphere.  Humans are always ready for a quick solution, and a little ice age would rescue Earth.  It would be brought on by diminishing solar activity, but it turns out carbon dioxide pollution does more damage than solar variability can fix.  Another story involved the nearly indestructible tardigrades and the possibility of horizontal gene transfer, but a dispute between two rival labs about research on tardigrades stalled further research into understanding the unique creature.

The biggest overblown scientific discovery, in our opinion, is NASA’s warp drive.  Humans are desperate for breakthroughs in space travel, so we can blast off to Titan’s beaches for a day and then come home within our normal Earth time.  NASA experimented with an EM Drive:

“Apparently, the engineers working on the EM Drive decided to address some of the skeptic’s concerns head-on this year, by re-running their experiments in a closed vacuum to ensure the thrust they were measuring wasn’t caused by environmental noise. And it so happens, new EM Drive tests in noise-free conditions failed to falsify the original results. That is, the researchers had apparently produced a minuscule amount of thrust without any propellant.

Once again, media reports made it sound like NASA was on the brink of unveiling an intergalactic transport system.”

NASA might be working on a warp drive prototype, but the science is based on short-term experiments, none of it has been peer reviewed, and NASA has not claimed that the engine even works.

The media takes the idea snippets and transforms them into overblown news pieces that are based more on junk science than real scientific investigation.

 

Whitney Grace, December 30, 2015
