A Not-For-Profit Search Engine? That’s So Crazy It Just Might Work
May 4, 2016
The Common Search Project has a simple and straightforward mission: a nonprofit search engine, an alternative to the companies currently running the Internet (ahem, Google). They are extremely polite about the venture, but also firmly committed to three qualities for the search engine they intend to build and run: openness, transparency, and independence. The core values include:
“Radical transparency. Our search results must be explainable and reproducible. All our code is open source and results are generated only using publicly available data. Transparency also extends to our governance, finances and day-to-day operations. Independence. No single person, company or special interest must be able to influence the order of our search results to their benefit. … Public service. We want to build and operate a free service targeted at a large, mainstream audience.”
Common Search currently offers a demo version that searches homepages only. It is an exciting development compared to the other Davids who have swung at Google’s Goliath; Common Search makes DuckDuckGo, the search engine focused on user privacy, look downright half-assed. They are calling for, and building, a real alternative with a completely fresh perspective, one that is not solely about meeting user needs but about insisting on standards for privacy, control, and clarity of results.
Chelsea Kerwin, May 4, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Do Businesses Have a Collective Intelligence?
May 4, 2016
After working in corporate America for several years, I was amazed by the sheer audacity of its stupidity. I concluded that many people in corporate America lack intelligence and are slowly skirting insanity’s edge, so Xconomy’s article, “Brainspace Aims To Harness ‘Collective Intelligence’ Of Businesses,” made me giggle. But I digress. Intelligence really does run rampant in businesses, especially in the IT departments that keep modern companies up and running. The digital workspace has created a collective intelligence within a company’s enterprise system, and that information is accessed either directly from the file hierarchy or through the (usually quicker) search box.
Keywords in the correct company-specific context are extremely important to semantic search, which is why Brainspace developed search software that builds a search ontology for individual companies. Brainspace says that every company creates collective intelligence within its systems; its software takes this digitized “brain” and produces a navigable map that organizes the key items into clusters.
“As the collection of digital data on how we work and live continues to grow, software companies like Brainspace are working on making the data more useful through analytics, artificial intelligence, and machine-learning techniques. For example, in 2014 Google acquired London-based Deep Mind Technologies, while Facebook runs a program called FAIR—Facebook AI Research. IBM Watson’s cognitive computing program has a significant presence in Austin, TX, where a small artificial intelligence cluster is growing.”
Building a search ontology by incorporating artificial intelligence into semantic search is a fantastic idea. Big data relies on deciphering the information housed in this “collective intelligence,” but it can lack the human reasoning needed to understand context. An intelligent semantic search engine could do wonders Google has not even built a startup for yet.
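To make the clustering idea concrete, here is a minimal sketch of grouping a document collection into topical clusters with off-the-shelf tools. It illustrates the general concept only, not Brainspace’s proprietary method; the sample documents and cluster count are invented.

```python
# Generic illustration of clustering a document set into topical groups.
# This is NOT Brainspace's method; it is a minimal scikit-learn sketch of
# the general idea of turning a "digitized brain" into clusters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "quarterly sales report for the emea region",
    "server outage postmortem and incident timeline",
    "marketing plan for the spring product launch",
    "database migration runbook for the it department",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, doc in zip(labels, documents):
    print(label, doc)
```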
Whitney Grace, May 4, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google Relies on Freebase Machine ID Numbers to Label Images in Knowledge Graph
May 3, 2016
The article on SEO by the Sea titled Image Search and Trends in Google Search Using FreeBase Entity Numbers explains the transformation occurring at Google around Freebase Machine ID numbers. Image search is a complicated business when it comes to differentiating labels. Instead of text strings, Google’s Knowledge Graph is based on Freebase entities, which can uniquely identify what an image depicts without relying on language. The article explains with a quote from Chuck Rosenberg:
“An entity is a way to uniquely identify something in a language-independent way. In English when we encounter the word “jaguar”, it is hard to determine if it represents the animal or the car manufacturer. Entities assign a unique ID to each, removing that ambiguity, in this case “/m/0449p” for the former and “/m/012x34” for the latter.”
Metadata is wonderful stuff, isn’t it? The article concludes by crediting Barbara Starr, a co-administrator of the Lotico San Diego Semantic Web Meetup, with noticing that the Machine ID numbers assigned to Freebase entities now appear in Google Trend’s URLs. Google Trends is a public web facility that enables an exploration of the hive mind by showing what people are currently searching. The Wednesday that President Obama nominated a new Supreme Court Justice, for example, had the top search as Merrick Garland.
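For readers who want to see what this looks like in practice, here is a minimal sketch of entity-style disambiguation using the two machine IDs quoted above. The lookup table and the Trends URL pattern are illustrative assumptions; real entity resolution relies on context, not a dictionary.

```python
# Minimal sketch of entity disambiguation with Freebase machine IDs (MIDs).
# The MIDs are the ones quoted in the article; the lookup table and URL
# pattern below are illustrative assumptions only.
from urllib.parse import quote

ENTITIES = {
    ("jaguar", "animal"): "/m/0449p",             # the big cat
    ("jaguar", "car manufacturer"): "/m/012x34",  # the automaker
}

def resolve(term: str, sense: str) -> str:
    """Return the language-independent machine ID for a term in a given sense."""
    return ENTITIES[(term.lower(), sense)]

print(resolve("Jaguar", "animal"))            # /m/0449p
print(resolve("Jaguar", "car manufacturer"))  # /m/012x34

# Hypothetical Trends-style URL carrying an entity ID instead of a text query.
print("https://www.google.com/trends/explore?q=" + quote(resolve("Jaguar", "animal"), safe=""))
```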
Chelsea Kerwin, May 3, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
An Open Source Search Engine to Experiment With
May 1, 2016
Apache Lucene receives the most headlines in discussions of open source search software. My RSS feed pulled up another open source search engine that shows promise of being a decent piece of software. Open Semantic Search is free software that can be used for text mining, analytics, a search engine, a data explorer, and other research tools. It is built on the open source enterprise search platforms Elasticsearch and Apache Solr and was designed around open standards and robust semantic search.
As with any open source search tool, it can be configured with numerous features based on the user’s preference. These include tagging, annotation, support for varied file formats and multiple data sources, data visualization, newsfeeds, automatic text recognition, faceted search, interactive filters, and more. It can also be set up for mobile platforms, metadata management, and file system monitoring.
Open Semantic Search is described as:
“Research tools for easier searching, analytics, data enrichment & text mining of heterogeneous and large document sets with free software on your own computer or server.”
While its base code is derived from Apache Lucene, it takes the original product and builds something better. Proprietary software is an expense dubbed a necessary evil if you work at a large company. If, however, you are a programmer and have the time to set up your own search engine and analytics software, do it. It could even turn out better than the proprietary stuff.
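Because Open Semantic Search sits on top of Apache Solr, a plain Solr faceted query gives a feel for what such a do-it-yourself setup looks like. The host, core name, and facet field below are assumptions for illustration; Open Semantic Search ships its own UI and endpoints.

```python
# A minimal faceted query against a plain Apache Solr core, the engine family
# Open Semantic Search builds on. Host, core name, and facet field are
# assumptions for illustration, not Open Semantic Search's own API.
import requests

params = {
    "q": "text mining",
    "facet": "true",
    "facet.field": "content_type",  # hypothetical facet field
    "rows": 10,
    "wt": "json",
}
response = requests.get(
    "http://localhost:8983/solr/opensemanticsearch/select", params=params
)
for doc in response.json()["response"]["docs"]:
    print(doc.get("id"), doc.get("title"))
```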
Whitney Grace, May 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Search without Indexing
April 27, 2016
I read “Outsmarting Google Search: Making Fuzzy Search Fast and Easy Without Indexing.”
Here’s a passage I highlighted:
It’s clear the “Google way” of indexing data to enable fuzzy search isn’t always the best way. It’s also clear that limiting the fuzzy search to an edit distance of two won’t give you the answers you need or the most comprehensive view of your data. To get real-time fuzzy searches that return all relevant results you must use a data analytics platform that is not constrained by the underlying sequential processing architectures that make up software parallelism. The key is hardware parallelism, not software parallelism, made possible by the hybrid FPGA/x86 compute engine at the heart of the Ryft ONE.
I also circled:
By combining massively parallel FPGA processing with an x86-powered Linux front-end, 48 TB of storage, a library of algorithmic components and open APIs in a small 1U device, Ryft has created the first easy-to-use appliance to accelerate fuzzy search to match exact search speeds without indexing.
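For context, “fuzzy search without indexing” simply means scanning the data and computing an edit distance against every record at query time. The sketch below shows that concept in plain software; it is obviously not Ryft’s FPGA-based approach, which is the whole point of their hardware, and the sample records and threshold are invented.

```python
# Minimal illustration of fuzzy search without an index: a brute-force scan
# that computes edit (Levenshtein) distance against every record. This is a
# concept sketch only, not Ryft's FPGA implementation.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

records = ["ryft one", "rift 1", "raft won", "lucene index"]
query, max_distance = "ryft one", 3  # note: not limited to a distance of two
hits = [r for r in records if edit_distance(query, r) <= max_distance]
print(hits)
```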
An outfit called InsideBigData published “Ryft Makes Real-time Fuzzy Search a Reality.” Alas, that link is now dead.
Perhaps a real-time fuzzy search will reveal the quickly deleted content?
Sounds promising. How does one retrieve information within videos, audio streams, and images? How does one hook together or link a reference to an entity (discovered without controlled term lists) with a phone number?
My hunch is that the methods disclosed in the article have promise; the future of search seems to be lurching toward applications that solve real-world, real-time problems. Ryft may be heading in that direction in a search climate that presents formidable headwinds.
Stephen E Arnold, April 27, 2016
Duck Duck Go as a Privacy Conscious Google Alternative
April 26, 2016
Those frustrated with Google may have an alternative. Going over to the duck side: A week with Duck Duck Go, from Search Engine Watch, shares a thorough first-hand account of using Duck Duck Go for a week. User privacy protection seems to be the hallmark of the search service, and there is even an option to enable Tor in its mobile app. Features are comparable, including Instant Answers, which is designed to compete with Google’s Knowledge Graph. As an open source product, Instant Answers is built up by community contributions. As for seamless, intuitive search, the post concludes:
“The question is, am I indignant enough about Google’s knowledge of my browsing habits (and everyone else’s that feed its all-knowing algorithms) to trade the convenience of instantly finding what I’m after for that extra measure of privacy online? My assessment of DuckDuckGo after spending a week in the pond is that it’s a search engine for the long term. To get the most out of using it, you have to make a conscious change in your online habits, rather than just expecting to switch one search engine for another and get the same results.”
Will a majority of users replace “Googling” with “Ducking” anytime soon? Time will tell, and it will be an interesting saga to watch. I suppose we could track the evolution of Knowledge Graph and Instant Answers to see the competing narratives unfold.
Megan Feil, April 26, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Watson Lacks Conversation Skills and He Is Not Evil
April 22, 2016
When I was in New York last year, I was walking on the west side when I noticed several other pedestrians moving out of the way of a man mumbling to himself. Doing as the natives do, I moved aside and heard the man rumble, “The robots are taking over and soon they will be ruling us. You all are idiots for not listening to me.” Fear of a robot apocalypse has been constant since computer technology gained prominence, and we can also thank science fiction for perpetuating it. Tech Insider notes in “Watson Can’t Actually Talk To You Like In The Commercials” that Elon Musk, Bill Gates, Stephen Hawking, and other tech leaders have voiced concerns about creating artificial intelligence so advanced it could turn evil.
IBM wants people to believe otherwise, which explains its recent PR campaign of commercials depicting Watson carrying on conversations with people. The idea is that people will come to see AI as friendly, here to augment our jobs and generally help us. There is some deception on IBM’s part, however: Watson cannot actually carry on a conversation with a person. People communicate with it through a user interface, usually a desktop or tablet application. Also, there is more than one Watson; each is programmed for different functions, such as diagnosing diseases or cooking.
“So remember next time you see Watson carrying on a conversation on TV that it’s not as human-like as it seems…Humor is a great way to connect with a much broader audience and engage on a personal level to demystify the technology,’ Ann Rubin, Vice President IBM Content and Global Creative, wrote in an email about the commercials. ‘The reality is that these technologies are being used in our daily lives to help people.’”
If artificial intelligence does become advanced enough to be capable of thought and reason comparable to a human’s, that would be worrisome. It might require that certain laws be put in place to maintain control over the artificial “life.” That day is a long way off, however; until then, embrace robots that help improve life.
Whitney Grace, April 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Local News Station Produces Dark Web Story
April 22, 2016
The Dark Web continues to emerge as a subject of media interest for growing audiences. An article from Chicago’s CBS affiliate, Dark Web Makes Illegal Drug, Gun Purchases Hard To Trace, also appears to have aired recently as a news segment. Offering some light education on the topic, the story explains the anonymity the Dark Web and Bitcoin can provide for criminal activity. The post describes how these tools are typically used:
“Within seconds of exploring the deep web we found over 15,000 sales for drugs including heroin, cocaine and marijuana. In addition to the drugs we found fake Illinois drivers licenses, credit card and bank information and dangerous weapons. “We have what looks to be an assault rifle, AK 47,” said Petefish. That assault rifle AK 47 was selling for 10 bitcoin which would be about $4,000. You can buy bitcoins at bitcoin ATM machines using cash, leaving very little trace of your identity. Bitcoin currency along with the anonymity and encryption used on the dark web makes it harder for authorities to catch criminals, but not impossible.”
As expected, the piece touches on the infamous Silk Road case along with some nearby cases involving local police. While the Dark Web and cybercrime have been on our radar for quite some time, it appears mainstream media interest in the topic is slowly growing. Perhaps those at risk of being affected, such as businesses, governments, and law enforcement agencies, will also continue catching on to the issues surrounding the Dark Web.
Megan Feil, April 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google Removes Pirate Links
April 21, 2016
A few weeks ago, YouTube was abuzz with discontent from some of its most popular stars. Their channels had been shut down due to copyright claims by third parties, even though the content in question fell under the fair use defense. YouTube is not the only one that has to deal with copyright claims. TorrentFreak reports that “Google Asked To Remove 100,000 ‘Pirate Links’ Every Hour.”
Google handles, on average, more than two million DMCA takedown notices per day from copyright holders about pirated content. TorrentFreak discovered that the number has doubled since 2015 and quadrupled since 2014. That works out to roughly one hundred thousand per hour. If the rate continues, Google will deal with around one billion DMCA notices this year, whereas it previously took a decade to reach that number.
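A quick back-of-the-envelope check shows how the quoted figures hang together; the one-billion projection assumes the rate keeps climbing rather than holding steady.

```python
# Rough arithmetic on the takedown figures quoted from TorrentFreak.
notices_per_hour = 100_000
per_day = notices_per_hour * 24   # roughly 2.4 million notices per day
per_year = per_day * 365          # roughly 876 million; a rising rate pushes this toward a billion
print(f"{per_day:,} per day, {per_year:,} per year")
```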
“While not all takedown requests are accurate, the majority of the reported links are. As a result many popular pirate sites are now less visible in Google’s search results, since Google downranks sites for which it receives a high number of takedown requests. In a submission to the Intellectual Property Enforcement Coordinator a few months ago Google stated that the continued removal surge doesn’t influence its takedown speeds.”
Google does not take broad sweeping actions, such as removing entire domain names from search indexes, as it does not want to become a censorship board. The copyright holders, though, are angry and want Google to promote only legal services over the hundreds of thousands of Web sites that pop up with illegal content. The battle is compared to an endless whack-a-mole game.
Pirated content does harm the economy, but the damage is far smaller than the big copyright holders claim. The smaller players who file DMCA takedowns are hurt more. YouTube stars, meanwhile, are the butt of an unfunny joke, and it would be wise for the rules to be revised.
Whitney Grace, April 21, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Digging for a Direction of Alphabet Google
April 21, 2016
Is Google trying to emulate BAE Systems’ NetReveal, IBM i2, and systems from Palantir? Looking back at an older Search Engine Watch article, How the Semantic Web Changes Everything for Search, may provide insight. At the time, Knowledge Graph had just launched, and with it came a wave of communications generating buzz about a new era of search: a move from string-based queries to a semantic approach organized around “things.” The write-up explains:
“The cornerstone of any march to a semantic future is the organization of data and in recent years Google has worked hard in the acquisition space to help ensure that they have both the structure and the data in place to begin creating “entities”. In buying Wavii, a natural language processing business, and Waze, a business with reams of data on local traffic and by plugging into the CIA World Factbook, Freebase and Wikipedia and other information sources, Google has begun delivering in-search info on people, places and things.”
The article noted the implications of Knowledge Graph for Google’s ability to deliver stronger, more relevant advertising through this semantic approach. Even today, we see the Alphabet Google thing continuing to shift from search to other interesting information access functions in order to sell ads.
Megan Feil, April 21, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph