Inside Loon Balloons
March 2, 2017
You may have heard about Google X’s Project Loon, which aims to bring Internet access to underserved, rural areas using solar-powered balloons. The post, “Here’s How Google Makes its Giant, Internet-Beaming Balloons,” at Business Insider takes us inside that three-year-old project, describing how the balloons are made and used. The article is packed with helpful photos and GIFs. We learn that the team has turned to balloon manufacturer Raven Aerostar for its expertise. The write-up tells us:
The balloons fly high in the stratosphere at about 60,000 to 90,000 feet above Earth. That’s two to three times as high as most commercial airplanes. Raven Aerostar creates a special outer shell for the balloons, called the film, that can hold a lot of pressure — allowing the balloons to float in the stratosphere for longer. The film is as thin as a typical sandwich bag. … The film is made of a special formulation of polyethylene that allows it to retain strength when facing extreme temperatures of up to -112 degrees Fahrenheit.
We like the sandwich-bag comparison. The balloons are tested in sub-freezing conditions at the McKinley Climatic Lab; see the article for dramatic footage of one of the test subjects bursting. We also learn about the “ballonet,” an internal compartment in each balloon that controls altitude and, thereby, direction. Each balloon is equipped with a GPS tracker, of course, and all electronics are secured in a tiny basket below.
One caveat is a bit disappointing: users cannot expect to stream high-quality videos through the balloons. Described as “comparable to 3G,” the service should be enough for visiting websites and checking email. That is certainly far better than nothing and could give rural small-business owners and remote workers the Internet access they need.
Cynthia Murrell, March 2, 2017
Search Like Star Trek: The Next Frontier
February 28, 2017
I enjoy the “next frontier” type of article about search and retrieval. Consider “The Next Frontier of Internet and Search,” a write up in the estimable “real” journalism site Huffington Post. As I read the article, I heard “Scotty, give me more power.” I thought I heard 20-somethings shouting, “Aye, aye, captain.”
The write up told me, “Search is an everyday part of our lives.” Yeah, maybe in some demographics and geo-political areas. In others, search is associated with finding food and water. But I get the idea. The author, Gianpiero Lotito of FacilityLive, is talking about people with computing devices, an interest in information like finding a pizza, and the wherewithal to pay the fees for zip zip connectivity.
And the future? I learned:
The future of search appears to be in the algorithms behind the technology.
I understand algorithms applied to search and content processing. Since humans are expensive beasties, numerical recipes are definitely the go-to way to perform many tasks once done by hand: fact checking, curating, and indexing textual information. The math does not work the way some expect when algorithms are applied to images and other rich media. Hey, sorry about that false drop in the face recognition program used by Interpol.
I loved this explanation of keyword search:
The difference among the search types is that: the keyword search only picks out the words that it thinks are relevant; the natural language search is closer to how the human brain processes information; the human language search that we practice is the exact matching between questions and answers as it happens in interactions between human beings.
This is as fascinating as the fake information about Boolean being a probabilistic method. What happened to string matching and good old truncation? The truism about people asking questions is intriguing as well. I wonder how many mobile users ask questions like, “Do manifolds apply to information spaces?” or “What is the chemistry allowing multi-layer ion deposition to take place?”
Yeah, right.
The write up drags in the Internet of Things. Talk to one’s Alexa or one’s thermostat via Google Home. That’s sort of natural language; for example, Alexa, play Elvis.
Here’s the paragraph I highlighted in NLP crazy red:
Ultimately, what the future holds is unknown, as the amount of time that we spend online increases, and technology becomes an innate part of our lives. It is expected that the desktop versions of search engines that we have become accustomed to will start to copy their mobile counterparts by embracing new methods and techniques like the human language search approach, thus providing accurate results. Fortunately these shifts are already being witnessed within the business sphere, and we can expect to see them being offered to the rest of society within a number of years, if not sooner.
Okay. No one knows the future. But we do know the past. There is little indication that mobile search will “copy” desktop search. Desktop search is a bit like digging in an archeological pit on Cyprus: Fun, particularly for the students and maybe a professor or two. For the locals, there often is a different perception of the diggers.
There are shifts in “the business sphere.” Those shifts are toward monopolistic, choice-limited solutions. Users of these search systems are unaware of content filtering and lack the training to work around advertising-centric systems.
I will just sit here in Harrod’s Creek and let the future arrive courtesy of a company like FacilityLive, an outfit engaged in changing Internet searching so I can find exactly what I need. Yeah, right.
Stephen E Arnold, February 28, 2017
Google and Its Search Soccer Team: Shot Hits the Post
February 28, 2017
I read “Google’s Search Algorithm Is Like a Soccer Team.” Interesting notion but an old one. Years ago Google patented a system and method for deploying communication software agents. Some of these were called “janitors.” The name was cute. The idea was that the “janitors” would clean up some of the mess left when unruly bots left litter in a file structure.
The write up ignores Google’s technical documentation, journal papers, and wild and crazy patent documents. The author has a good sense of how algorithms work and how clever folks can hook them together to create a business process or manufacturing system to further the sale of online advertising.
The discussion of Google’s search algorithm (please note the singular noun) gave me pause. I thought that Google had a slightly more sophisticated approach to providing search and retrieval in its various forms to its billions of information foragers.
I remember a time in the late 1990s, when co-workers would ask one another which search engine they used. Lycos? AltaVista? Yahoo? Dogpile? Ask Jeeves? The reason there was such a time, and the reason there is no longer such a time, is that Google had not yet introduced its search algorithm. Google’s search algorithm helped Google gain market share on its way to search engine preeminence. Imagine you were searching the internet in the mid 1990s, and your search engine of choice was Ask Jeeves.
Yep, that’s an interesting point: Ask Jeeves. As I recall, Ask Jeeves used manually prepared answers to a relatively small body of questions. Ask Jeeves was interesting but fizzled trying to generate money with online customer service. This is a last-ditch tactic that many other search vendors have tried. How is that customer service working for you, gentle reader? Great, I bet.
So how does Google’s algorithm compare to a soccer team? I learned:
The search algorithm looks at a website’s incoming links and how important those pages are. The higher the number of quality page links coming in, the higher the website ranks. Think of a soccer team playing a match. Each player on one team represents a web page. And every pass made to a player on the team represents links from another website. A player’s ranking depends upon the amount of passes (links) they receive. If the player receives many passes from other important players, then the player’s score rises more than if they received passes from less talented players, i.e. those who receive fewer passes by lesser quality players. Every single time there is a pass, the rankings are updated. Google’s search algorithm uses links instead of passes.
Yep, that’s a shot on goal, but it is wide. The conclusion of this amazing soccer game metaphor is that “thus SEO was born.” And the reason? Algorithms.
That shot rolled slow and low only to bounce off the goal post and wobble wide. Time to get another forward, pay for a referee, and keep the advertising off the field. Well, that won’t work for the GOOG will it?
Stephen E Arnold, February 28, 2017
Comprehensive, Intelligent Enterprise Search Is Already Here
February 28, 2017
The article on Sys-Con Media titled Delivering Comprehensive Intelligent Search examines the accomplishments of World Wide Technology (WWT) in building a better search engine for the business organization. The Enterprise Search Project Manager and Manager of Enterprise Content at WWT discovered that the average employee wastes over a full week each year looking for the information needed to do their work. The article details how they approached a solution for enterprise search:
We used the Gartner Magic Quadrants and started talks with all of the Magic Quadrant leaders. Then, through a down-selection process, we eventually landed on HPE… It wound up being that we went with the HPE IDOL tool, which has been one of the leaders in enterprise search, as well as big data analytics, for well over a decade now, because it has very extensible platform, something that you can really scale out and customize and build on top of.
Trying to replicate what Google delivers in an enterprise is a complicated task because of how siloed data is in the typical organization. The new search solution offers vast improvements, presenting employees with all of the relevant information and preventing major time waste through comprehensive, intelligent search.
Chelsea Kerwin, February 28, 2017
When AI Spreads Propaganda
February 28, 2017
We thought Google was left-leaning, but an article at the Guardian, “How Google’s Search Algorithm Spreads False Information with a Rightwing Bias,” seems to contradict that assessment. The article cites recent research by the Observer, which found neo-Nazi and anti-Semitic views prominently featured in Google search results. The Guardian followed up with its own research and documented more examples of right-leaning misinformation, like climate-change denials, anti-LGBT tirades, and Sandy Hook conspiracy theories. Reporters Olivia Solon and Sam Levin tell us:
The Guardian’s latest findings further suggest that Google’s searches are contributing to the problem. In the past, when a journalist or academic exposes one of these algorithmic hiccups, humans at Google quietly make manual adjustments in a process that’s neither transparent nor accountable.
At the same time, politically motivated third parties including the ‘alt-right’, a far-right movement in the US, use a variety of techniques to trick the algorithm and push propaganda and misinformation higher up Google’s search rankings.
These insidious manipulations – both by Google and by third parties trying to game the system – impact how users of the search engine perceive the world, even influencing the way they vote. This has led some researchers to study Google’s role in the presidential election in the same way that they have scrutinized Facebook.
Robert Epstein from the American Institute for Behavioral Research and Technology has spent four years trying to reverse engineer Google’s search algorithms. He believes, based on systematic research, that Google has the power to rig elections through something he calls the search engine manipulation effect (SEME).
Epstein conducted five experiments in two countries to find that biased rankings in search results can shift the opinions of undecided voters. If Google tweaks its algorithm to show more positive search results for a candidate, the searcher may form a more positive opinion of that candidate.
This does add a whole new, insidious dimension to propaganda. Did Orwell foresee algorithms? Further complicating the matter is the element of filter bubbles, through which many consume only information from homogenous sources, allowing no room for contrary facts. The article delves into how propagandists are gaming the system and describes Google’s response, so interested readers may wish to navigate there for more information.
One particular point gives me chills: Epstein states that research shows the vast majority of readers are not aware that bias exists within search rankings; they have no idea they are being manipulated. Perhaps those of us with some understanding of search algorithms can spread that insight to the rest of the multitude. It seems such education is sorely needed.
Cynthia Murrell, February 28, 2017
Finding Meaning in Snapchat Images, One Billion at a Time
February 27, 2017
The article on InfoQ titled Amazon Introduces Rekognition for Image Analysis explores the managed service aimed at the explosive image market. According to research cited in the article, over 1 billion photos are taken every single day on Snapchat alone, compared to the 80 billion total taken in the year 2000. Rekognition’s deep learning power is focused on identifying meaning in visual content. The article states:
The capabilities that Rekognition provides include Object and Scene detection, Facial Analysis, Face Comparison and Facial Recognition. While Amazon Rekognition is a new public service, it has a proven track record. Jeff Barr, chief evangelist at AWS, explains: Powered by deep learning and built by our Computer Vision team over the course of many years, this fully-managed service already analyzes billions of images daily. It has been trained on thousands of objects and scenes. Rekognition was designed from the get-go to run at scale.
The facial analysis features include markers for image quality, facial landmarks like facial hair and open eyes, and sentiment expressed (smiling = happy). The face comparison feature includes a similarity score that estimates the likelihood that two pictures show the same person. Perhaps the most useful feature is object and scene detection, which Amazon believes will help users find specific moments by searching for certain objects. The use cases also span vacation rental markets and travel sites, which can now tag images with key terms for improved classification.
Chelsea Kerwin, February 27, 2017
Intellisophic / Linkapedia
February 24, 2017
Intellisophic identifies itself as a Linkapedia company. Poking around Linkapedia’s ownership revealed some interesting factoids:
- Linkapedia is funded in part by GITP Ventures and SEMMX (possibly a Semper fund)
- The company operates in Hawaii and Pennsylvania
- One of the founders is a monk / Zen master. (Calm is a useful characteristic when trying to spin money from a search machine.)
First, Intellisophic. The company describes itself this way on its Web site:
Intellisophic is the world’s largest provider of taxonomic content. Unlike other methods for taxonomy development that are limited by the expense of corporate librarians and subject matter experts, Intellisophic content is machine developed, leveraging knowledge from respected reference works. The taxonomies are unbounded by subject coverage and cost significantly less to create. The taxonomy library covers five million topic areas defined by hundreds of millions of terms. Our taxonomy library is constantly growing with the addition of new titles and publishing partners.
In addition, Intellisophic’s technology, Orthogonal Corpus Indexing, can identify concepts in large collections of text. The system can be used to enrich existing technology, business intelligence, and search applications. One angle Intellisophic exploits is its use of reference and educational books. The company is in the “content intelligence” market.
Second, the “parent” of Intellisophic is Linkapedia. This public facing Web site allows a user to run a query and see factoids and links about a topic. Plus, Linkapedia has specialist collections of content bundles; for example, lifestyle, pets, and spirituality. I did some clicking around and found that certain topics were not populated; for instance, Lifestyle, Cars, and Brands. No brand information appeared for me. I stumbled into a lengthy explanation of the privacy policy related to a mathematics discussion group. I backtracked, trying to access the actual group, and failed. I think the idea is an interesting one, but more work is needed. My test query for “enterprise search” presented links to Convera and a number of obscure search related Web sites.
The company is described this way in Crunchbase:
Linkapedia is an interest based advertising platform that enables publishers and advertisers to monetize their traffic, and distribute their content to engaged audiences. As opposed to a plain search engine which delivers what users already know, Linkapedia’s AI algorithms understand the interests of users and helps them discover something new they may like even if they don’t already know to look for it. With Linkapedia content marketers can now add Discovery as a new powerful marketing channel like Search and Social.
Like other search related services, Linkapedia uses smart software. Crunchbase states:
What makes Linkapedia stand out is its AI discovery engine that understands every facet of human knowledge. “There’s always something for you on Linkapedia”. The way the platform works is simple: people discover information by exploring a knowledge directory (map) to find what interests them. Our algorithms show content and native ads precisely tailored to their interests. Linkapedia currently has hundreds of million interest headlines or posts from the worlds most popular sources. The significance of a post is that “someone thought something related to your interest was good enough to be saved or shared at a later time.” The potential of a post is that it is extremely specific to user interests and has been extracted from recognized authorities on millions of topics.
Interesting. Search positioned as indexing, discovery, social, and advertising.
Stephen E Arnold, February 24, 2017
Mobile App Usage on the Rise from 34% of Consumer Time in 2013 to 50% in 2016
February 24, 2017
Bad news, Google. The article titled Smartphone Apps Now Account for Half the Time Americans Spend Online on TechCrunch reveals that mobile applications are still on the rise. Throw in tablet apps and the total almost hits 60%. Google is already working to maintain relevancy with its In Apps feature for Android, which searches inside apps themselves. The article explains:
This shift towards apps is exactly why Google has been working to integrate the “web of apps” into its search engine, and to make surfacing the information hidden in apps something its Google Search app is capable of handling. Our app usage has grown not only because of the ubiquity of smartphones, but also other factors – like faster speeds provided by 4G LTE networks, and smartphones with larger screens that make sitting at a desktop less of a necessity.
What apps are taking up the most of our time? Just the ones you would expect, such as Facebook, Messenger, YouTube, and Google Maps. But Pokemon Go is the little app that could, edging out Snapchat and Pinterest in the ranking of the top 15 mobile apps. According to a report from Sensor Tower, Pokemon Go has surpassed 180 million downloads. Consumer time spent in apps is expected to keep growing, but comScore reassuringly states that desktops will also remain a key part of consumers’ lives for many years to come.
Chelsea Kerwin, February 24, 2017
U.S. Government Keeping Fewer New Secrets
February 24, 2017
We have good news and bad news for fans of government transparency. In its Secrecy News blog, the Federation of American Scientists reports, “Number of New Secrets in 2015 Near Historic Low.” Writer Steven Aftergood explains:
The production of new national security secrets dropped precipitously in the last five years and remained at historically low levels last year, according to a new annual report released today by the Information Security Oversight Office.
There were 53,425 new secrets (‘original classification decisions’) created by executive branch agencies in FY 2015. Though this represents a 14% increase from the all-time low achieved in FY 2014, it is still the second lowest number of original classification actions ever reported. Ten years earlier (2005), by contrast, there were more than 258,000 new secrets.
The new data appear to confirm that the national security classification system is undergoing a slow-motion process of transformation, involving continuing incremental reductions in classification activity and gradually increased disclosure. …
Meanwhile, ‘derivative classification activity,’ or the incorporation of existing secrets into new forms or products, dropped by 32%. The number of pages declassified increased by 30% over the year before.
A marked decrease in government secrecy—that’s the good news. On the other hand, the report reveals some troubling findings. For one thing, costs are not going down alongside classifications; in fact, they rose by eight percent last year. Also, response times to mandatory declassification requests (MDRs) are growing, leaving over 14,000 such requests to languish for over a year each. Finally, fewer newly classified documents carry the “declassify in ten years or less” specification, which means fewer items will become declassified automatically down the line.
Such red-tape tangles notwithstanding, the reduction in secret classifications does look like a sign that the government is moving toward more transparency. Can we trust the trajectory?
Cynthia Murrell, February 24, 2017
Tips for Finding Information on Reddit.com
February 23, 2017
I noted “The Right Way to Search Posts on Reddit.” I find it interesting that Reddit content is not comprehensively indexed by Google. One does stumble across Reddit posts in a Google results list if one knows how to use Google’s less than obvious search syntax. Where’s the bad stuff on Reddit? Google will reveal some links of interest to law enforcement professionals.
Bing does a little better with certain Reddit content. To be fair, neither service is doing a bang-up job indexing social media content, and Bing lists only a fraction of the pointers in the Google index.
So how does one search Reddit.com the “right way”? I noted this paragraph:
As of 2015, Reddit had accumulated over 190 million posts across 850,000 different subreddits (or communities), plus an additional 1.7 billion comments across all of those posts. That’s an incredible amount of content, and all of it can still be accessed on Reddit.
I would point out that the “all” is not accurate. A body of content, including posts from some of Reddit.com’s top dogs, has been deleted by moderators and removed from the site.
Reddit offers some search syntax to help the researcher locate what is indexed by Reddit.com’s search system. The write up pointed to these strings:
- title:[text] searches only post titles.
- author:[username] searches only posts by the given username.
- selftext:[text] searches only the body of posts that were made as self-posts.
- subreddit:[name] searches only posts that were submitted to the given subreddit community.
- url:[text] searches only the URL of non-self-post posts.
- site:[text] searches only the domain name of non-self-post posts.
- nsfw:yes or nsfw:no to filter results based on whether they were marked as NSFW or not.
- self:yes or self:no to filter results based on whether they were self-posts or not.
The article covers a handful of other search commands; for example, the Boolean operators AND and OR. How does one NOT out certain words? Use the minus sign: prefixing a term with a hyphen excludes it, the discerning Reddit.com searcher’s NOT.
Stephen E Arnold, February 23, 2017