Comprehensive Search System Atlas Recall Enters Open Beta
December 1, 2016
We learn about a new way to search nearly everything one has encountered digitally from TechCrunch’s article, “Atlas Recall, a Search Engine for Your Entire Digital Life, Gets an Open Beta and $20M in Backing.” The platform is the brainchild of Atlas Informatics CEO, and Napster co-founder, Jordan Ritter, a man after our own hearts. When given funding and his pick of projects, Ritter says, he “immediately” chose to improve the search experience.
The approach the Atlas team has devised may not be for everyone. It keeps track of everything users bring up on their computers and mobile devices (except things they specifically tell it not to). It brings together data from disparate places like one’s Facebook, Outlook, Spotlight, and Spotify accounts and makes that data available from one cloud-based dashboard.
This does sound extremely convenient, and I don’t doubt the company’s claim that it can save workers hours every week. However, imagine how much damage a bad actor could do if, hypothetically, they were able to get in and search for, say, “account number” or “eyes only.” Make no mistake, security is a top priority for Atlas, and sensible privacy measures are in place. The company also vows it will not sell tailored (or any) advertising, and it is very clear that each user owns their data. Furthermore, Atlas maintains it will have access only to metadata, not the actual contents of users’ files.
Perhaps for those who already trust the cloud with much of their data, this arrangement is an acceptable risk. For those potential users, contributor Devin Coldewey describes Atlas Recall:
Not only does it keep track of all those items [which you have viewed] and their contents, but it knows the context surrounding them. It knows when you looked at them, what order you did so in, what other windows and apps you had open at the same time, where you were when you accessed it, who it was shared with before, and tons of other metadata.
The result is that a vague search, say ‘Seahawks game,’ will instantly produce all the data related to it, regardless of what silo it happens to be in, and presented with the most relevant stuff first. In that case maybe it would be the tickets you were emailed, then nearby, the plans you made over email with friends to get there, the Facebook invite you made, the articles you were reading about the team, your fantasy football page. Click on any of them and it takes you straight there. …
When you see it in action, it’s easy to imagine how quickly it could become essential. I happen to have a pretty poor memory, but even if I didn’t, who wants to scrub through four different web apps at work trying to find that one PDF? Wouldn’t it be nice to just type in a project name and have everything related to it — from you and from coworkers — pop up instantly, regardless of where it ‘lives’?
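It is easy to picture the kind of data structure behind such a service. Below is a minimal Python sketch of a cross-silo metadata index along the lines Coldewey describes; everything here (the MetaItem fields, the recency-based ranking, the sample data) is a hypothetical illustration, not Atlas’s actual design.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MetaItem:
    """One record of 'something you looked at' plus its context."""
    title: str
    source: str                 # e.g. "email", "facebook", "browser"
    accessed_at: datetime
    shared_with: list = field(default_factory=list)
    open_alongside: list = field(default_factory=list)  # other windows/apps

class RecallIndex:
    """Toy cross-silo index: stores metadata, ranks matches by recency."""
    def __init__(self):
        self.items = []

    def track(self, item: MetaItem):
        self.items.append(item)

    def search(self, query: str):
        terms = query.lower().split()
        hits = [i for i in self.items
                if all(t in i.title.lower() for t in terms)]
        # Most recently accessed first -- a crude stand-in for relevance.
        return sorted(hits, key=lambda i: i.accessed_at, reverse=True)

index = RecallIndex()
index.track(MetaItem("Seahawks game tickets", "email",
                     datetime(2016, 11, 20), shared_with=["alice"]))
index.track(MetaItem("Seahawks game plans", "facebook",
                     datetime(2016, 11, 22)))
for hit in index.search("Seahawks game"):
    print(hit.source, "->", hit.title)
```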
The main Atlas interface can be integrated with other search engines like Google and Spotlight, so users can see aggregated results when they use those, too. Interested readers may want to navigate to the article and view the embedded sales video, shorter than two minutes, which illustrates the platform. If you’re interested in the beta, you can sign up here (scroll down to “When can I start using Atlas?”). Founded in 2015, Atlas Informatics is based in Seattle. As of this writing, they are also hiring developers and engineers.
Cynthia Murrell, December 01, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Machine Learning Changes the Way We Learn from Data
October 26, 2016
The technology blog post from Daniel Miessler titled Machine Learning is the New Statistics strives to convey how crucial Machine Learning has become to the way we gather information about the world. Rather than dismissing Machine Learning as a buzzword, the author heralds it as an advancement in our ability to engage with our surroundings. The article states,
So Machine Learning is not merely a new trick, a trend, or even a milestone. It’s not like the next gadget, instant messaging, or smartphones, or even the move to mobile. It’s nothing less than a foundational upgrade to our ability to learn about the world, which applies to nearly everything else we care about. Statistics greatly magnified our ability to do that, and Machine Learning will take us even further.
The article breaks down the stages of our ability to analyze our own reality: from randomly explaining events, to explanations based on the past, to explanations based on comparisons with numerous trends and metadata. It positions Machine Learning as the next stage, one that still compares events but also generates new models as it goes. The difference, of course, is that Machine Learning offers continuous model improvement, as the sketch below illustrates. If you are interested, the blog also offers a Machine Learning Primer.
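A minimal, hypothetical Python sketch of that difference: a one-shot statistical estimate computed once, versus a model that keeps improving as each new observation arrives (online gradient descent). The toy data and learning rate are invented for illustration.

```python
import random

random.seed(42)

# Suppose the world follows y = 3.0 * x plus noise.
def observe():
    x = random.uniform(0, 10)
    return x, 3.0 * x + random.gauss(0, 1)

# A "statistics" estimate: fit once on an initial batch, then stop.
batch = [observe() for _ in range(20)]
fixed_slope = sum(x * y for x, y in batch) / sum(x * x for x, _ in batch)

# A "machine learning" estimate: refine the model on every new point.
slope, lr = 0.0, 0.001
for step in range(10_000):
    x, y = observe()
    error = slope * x - y
    slope -= lr * error * x          # online gradient step

print(f"one-shot estimate: {fixed_slope:.3f}")
print(f"continuously updated estimate: {slope:.3f}")
```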
Chelsea Kerwin, October 26, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
For the Paranoid at Heart: New Privacy Concerns from Columbia University and Google
September 23, 2016
The article on PhysOrg titled Location Data on Two Apps Enough to Identify Someone, Says Study illustrates the inadequacy of deleting names and personal details from big data sets. Location metadata undermines the anonymity of this data. Researchers at Columbia University and Google teamed up to establish that individuals can easily be identified simply by comparing their movements across two data sets. The article states,
“What this really shows is that simply removing identifying information from large-scale data sets is not sufficient,” said Yves-Alexandre de Montjoye, a research scientist at the MIT Media Lab who was not involved in the study. “We need to move to a model of privacy-through-security. Instead of anonymizing data and making it public, there should be technical controls over who gets access to the data, how it is used, and for what purpose.”
Just by bringing your phone with you (and who doesn’t?), you create vast amounts of location metadata about yourself, often without your knowledge. As more and more apps require you to share your location, it becomes easier for various companies to access that data. If you are interested in exploring how easy it is to figure out your identity from your social media usage, visit You Are Where You Go.
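The study’s core technique, linking identities by comparing movements across two data sets, is simple enough to sketch. The toy Python below counts how often each pair of anonymized IDs co-occurs in the same time bucket and location cell, then links the pairs with the highest overlap; all IDs, grid cells, and sample points are invented.

```python
from collections import Counter
from itertools import product

# Two "anonymized" data sets: user_id -> set of (time_bucket, location_cell).
app_a = {"a1": {(9, "grid_12"), (13, "grid_40"), (18, "grid_12")},
         "a2": {(9, "grid_77"), (14, "grid_30")}}
app_b = {"b9": {(9, "grid_12"), (18, "grid_12"), (20, "grid_55")},
         "b4": {(14, "grid_30"), (22, "grid_02")}}

# Count spatiotemporal co-occurrences for every cross-data-set pair.
overlap = Counter()
for (ua, pts_a), (ub, pts_b) in product(app_a.items(), app_b.items()):
    overlap[(ua, ub)] = len(pts_a & pts_b)

# The most similar trajectories are likely the same person.
for (ua, ub), score in overlap.most_common(2):
    print(f"{ua} <-> {ub}: {score} shared (time, place) points")
```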
Chelsea Kerwin, September 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meetup on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/
Signs of Life from Funnelback
May 19, 2016
Funnelback has been silent of late, according to our research, but the search company has emerged from the tomb with eyes wide open and a heartbeat. The Funnelback blog has shared some new updates with us. The first bit of news, “Searchless In Seattle? (AKA We’ve Just Opened A New Office!),” explains that Funnelback has opened a new office in Seattle, Washington. The search company already has offices in Poland, the United Kingdom, and New Zealand, but now it wants to establish a branch in the United States. Given its successful track record with the finance, higher education, and government sectors in those countries, it stands a chance of offering real competition in the US. Seattle is also a reputable technology hub, and there Funnelback will not have to contend with the Silicon Valley crowd.
The second piece of Funnelback news deals with “Driving Channel Shift With Site Search.” Channel shift is the process of finding the most efficient and cost-effective channel for delivering information to users. Implementing a channel shift can be difficult, but improving the effectiveness of a Web site’s search can have a huge impact.
Being able to quickly and effectively locate information on a Web site does more than save time; it can also drive sales, build reputation, and more.
“You can go further still, using your search solution to provide targeted experiences; outputting results on maps, searching by postcode, allowing for short-listing and comparison baskets and even dynamically serving content related to what you know of a visitor, up-weighting content that is most relevant to them based on their browsing history or registered account.
Couple any of the features above with some intelligent search analytics, that highlight the content your users are finding and importantly what they aren’t finding (allowing you to make the relevant connections through promoted results, metadata tweaking or synonyms), and your online experience is starting to become a lot more appealing to users than that queue on hold at your call centre.”
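The up-weighting idea in the quote above is easy to make concrete. Here is a hypothetical Python re-ranker in which base relevance scores get a boost proportional to how often the visitor has browsed a result’s category; the data model, categories, and boost value are illustrative, not Funnelback’s implementation.

```python
from collections import Counter

# Hypothetical search results: (url, base_relevance, category).
results = [("/pricing",      0.70, "sales"),
           ("/docs/setup",   0.80, "support"),
           ("/case-studies", 0.60, "sales")]

# Categories this visitor has browsed recently.
history = ["sales", "sales", "support"]

def rerank(results, history, boost=0.15):
    """Up-weight results in proportion to the visitor's browsing history."""
    freq = Counter(history)
    rescored = [(url, score + boost * freq[cat], cat)
                for url, score, cat in results]
    return sorted(rescored, key=lambda r: r[1], reverse=True)

# The sales pages now outrank the docs page for this sales-minded visitor.
for url, score, cat in rerank(results, history):
    print(f"{score:.2f}  {url}  ({cat})")
```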
I have written about it many times, but a decent Web site search function can make or break a site. A poor one makes the Web site look unprofessional and inspires little confidence in the business. It is a very big rookie mistake to make.
Whitney Grace, May 19, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google Relies on Freebase Machine ID Numbers to Label Images in Knowledge Graph
May 3, 2016
The article on SEO by the Sea titled Image Search and Trends in Google Search Using FreeBase Entity Numbers explains the transformation occurring at Google around Freebase machine ID numbers. Image search is a complicated business when it comes to disambiguating labels. Instead of text strings, Google’s Knowledge Graph is built on Freebase entities, which can uniquely identify what an image depicts without relying on language. The article explains with a quote from Chuck Rosenberg,
“An entity is a way to uniquely identify something in a language-independent way. In English when we encounter the word ‘jaguar’, it is hard to determine if it represents the animal or the car manufacturer. Entities assign a unique ID to each, removing that ambiguity, in this case ‘/m/0449p’ for the former and ‘/m/012x34’ for the latter.”
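Rosenberg’s example is concrete enough to sketch. The toy Python below shows what entity IDs buy you: once an ambiguous string is resolved to a machine ID (the two IDs are the ones quoted above), everything downstream can key off the language-independent ID. The lookup table and the context heuristic are purely illustrative.

```python
# The two Freebase machine IDs quoted above for the string "jaguar".
ENTITIES = {
    "/m/0449p":  {"name": "jaguar (animal)",      "type": "biology.organism"},
    "/m/012x34": {"name": "Jaguar (car company)", "type": "business.company"},
}

def resolve(surface: str, context: str) -> str:
    """Toy disambiguator: pick a machine ID for an ambiguous string."""
    if surface.lower() != "jaguar":
        raise KeyError(surface)
    car_hints = {"engine", "sedan", "dealership", "drove"}
    if car_hints & set(context.lower().split()):
        return "/m/012x34"
    return "/m/0449p"

mid = resolve("jaguar", "she drove her new jaguar to work")
print(mid, "->", ENTITIES[mid]["name"])  # /m/012x34 -> Jaguar (car company)
```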
Metadata is wonderful stuff, isn’t it? The article concludes by crediting Barbara Starr, a co-administrator of the Lotico San Diego Semantic Web Meetup, with noticing that the machine ID numbers assigned to Freebase entities now appear in Google Trends’ URLs. Google Trends is a public web facility that shows what people are currently searching for, enabling an exploration of the hive mind. On the Wednesday President Obama nominated a new Supreme Court Justice, for example, the top search was Merrick Garland.
Chelsea Kerwin, May 3, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
New Tor Communication Software for Journalists and Sources Launches
February 29, 2016
A new one-to-one messaging tool for journalists has launched after two years in development. The article Ricochet uses power of the dark web to help journalists, sources dodge metadata laws from The Age describes this new darknet-based software. What makes this software, Ricochet, unique compared to others used by journalists, such as Wickr, is that it relies on no central server; connections run peer-to-peer over Tor instead. Advocates acknowledge the risk of this Dark Web software being used for criminal activity, but assert the aim is to give sources and whistleblowers an anonymous channel to securely release information to journalists without exposure. The article explains,
“Dr Dreyfus said that the benefits of making the software available would outweigh any risks that it could be used for malicious purposes such as cloaking criminal and terrorist operations. “You have to accept that there are tools, which on balance are a much greater good to society even though there’s a tiny possibility they could be used for something less good,” she said. Mr Gray argued that Ricochet was designed for one-to-one communications that would be less appealing to criminal and terrorist organisers that need many-to-many communications to carry out attacks and operations. Regardless, he said, the criminals and terrorists had so many encryption and anonymising technologies available to them that pointing fingers at any one of them was futile.”
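Ricochet’s serverless design, in which each peer publishes a Tor hidden service and messages travel peer-to-peer over Tor, can be approximated with the stem controller library. The Python below is a hypothetical simplification (one listener, no Ricochet protocol, no peer authentication), not Ricochet’s actual code, and assumes a local Tor daemon with its control port open on 9051.

```python
import socket
from stem.control import Controller

LOCAL_PORT = 9876

# Ask the local Tor daemon to publish an ephemeral hidden service that
# forwards hidden-service port 80 to a local listener -- no central server.
with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    service = controller.create_ephemeral_hidden_service(
        {80: LOCAL_PORT}, await_publication=True)
    print(f"reachable at {service.service_id}.onion")

    # Minimal listener: a source connects over Tor and sends one message.
    # (The ephemeral service disappears when this controller session ends.)
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", LOCAL_PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            print("received:", conn.recv(4096).decode(errors="replace"))
```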
Demand for online anonymity is clearly growing, as evidenced by the recent launch of several new Tor-based tools like Ricochet, in addition to Wickr and consumer-oriented apps like Snapchat. The Dark Web’s user base appears to be both expanding and diversifying. Will public perception follow suit?
Megan Feil, February 29, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Modern Law Firm and Data
December 16, 2015
We thought it was a problem when law enforcement officials did not understand how the Internet and Dark Web work, or what eDiscovery tools can do; a law firm that cannot work with data-mining tools, or fails to grasp the importance of the technology, loses credibility, profit, and evidence for its cases. According to Information Week in “Data, Lawyers, And IT: How They’re Connected,” the modern law firm needs to understand how eDiscovery tools, predictive coding, and data science work and how they can benefit its cases.
It can be daunting to understand how new technology works, especially in a law firm. The article explains how these tools and others work in four key areas: what role data plays before trial, how it is changing the courtroom, how new tools pave the way for unprecedented approaches to law practice, and how data is improving the way law firms operate.
Data in pretrial amounts to one word: evidence. People live their lives via their computers and create a digital trail without realizing it. With a few eDiscovery tools, lawyers can assemble all the necessary information within hours. Data tools in the courtroom make practicing law seem like a scenario out of a fantasy or science fiction novel. Lawyers can immediately pull up information to use as evidence for cross-examination or to validate facts. New eDiscovery tools are also valuable because they allow lawyers to prepare their arguments based on the judge and jury pool. More data is available on individual cases, rather than just big-name ones.
“The legal industry has historically been a technology laggard, but it is evolving rapidly to meet the requirements of a data-intensive world.
‘Years ago, document review was done by hand. Metadata didn’t exist. You didn’t know when a document was created, who authored it, or who changed it. eDiscovery and computers have made dealing with massive amounts of data easier,’ said Robb Helt, director of trial technology at Suann Ingle Associates.”
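Predictive coding, mentioned above, typically means training a classifier on a sample of documents attorneys have already marked relevant or not, then using it to rank the rest of the collection for review. A minimal sketch with scikit-learn, using invented documents and labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# A small, invented seed set reviewed by attorneys: 1 = relevant to the case.
seed_docs = ["merger payment schedule attached",
             "lunch order for the office party",
             "wire transfer to offshore account",
             "weekly parking garage notice"]
seed_labels = [1, 0, 1, 0]

unreviewed = ["follow up on the offshore payment schedule",
              "garage closed friday for cleaning"]

# TF-IDF features + logistic regression: a standard predictive-coding baseline.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(seed_docs)
model = LogisticRegression().fit(X, seed_labels)

# Rank unreviewed documents so lawyers read the likeliest evidence first.
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, p in sorted(zip(unreviewed, scores), key=lambda t: -t[1]):
    print(f"{p:.2f}  {doc}")
```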
Legal eDiscovery is one of the branches of big data that has skyrocketed in the past decade. While the examples discussed here are employed by respected law firms, keep in mind that eDiscovery technology is still new. Ambulance chasers and other law firms probably do not have a full IT squad on staff, so when evaluating lawyers, ask about their eDiscovery capabilities.
Whitney Grace, December 16, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
TemaTres Open Source Vocabulary Server
November 3, 2015
The latest version of the TemaTres vocabulary server is now available, we learn from the company’s blog post, “TemaTres 2.0 Released.” Released under the GNU General Public License version 2.0, the web application helps manage taxonomies, thesauri, and multilingual vocabularies. The web application can be downloaded at SourceForge. Here’s what has changed since the last release:
*Export your vocabulary to Moodle: you can now export to Moodle Glossary XML format
*Metadata summary about each term and about your vocabulary (data about terms, relations, notes, total descendant terms, depth levels, etc.)
*New report: reports about terms with mapping relations, terms by status, preferred terms, etc.
*New report: reports about terms without notes or specific type of notes
*Import note types defined by the user (custom notes) using tagged file format
*Bulk-select free terms to assign to another term
*Improve utilities to take terminological recommendations from other vocabularies (more than 300: http://www.vocabularyserver.com/vocabularies/)
*Update Zthes schema to Zthes 1.0 (Thanks to Wilbert Kraan)
*Export the whole vocabulary to Metadata Authority Description Schema (MADS)
*Fixed bugs and improved several functional aspects.
*Uses Bootstrap v3.3.4
See the server’s SourceForge page, above, for the full list of features. Though as of this writing only 21 users had rated the product, all seemed very pleased with the results. The TemaTres website notes that running the server requires some other open source tools: PHP, MySQL, and an HTTP Web server. It also specifies that, to update from version 1.82, you keep db.tematres.php but replace the code. To update from TemaTres 1.6 or earlier, first log in as an administrator and update to version 1.7 through Menu -> Administration -> Database Maintenance.
Cynthia Murrell, November 3, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Braiding Big Data
October 26, 2015
An apt metaphor for big data is the act of braiding. Braiding requires a person to take three or more locks of hair and weave them together in an alternating pattern. The end result is a clean, pretty hairstyle that keeps a person’s hair in place and off the face. Big data is like braiding because specially tailored software takes an unruly mess of data, combed and uncombed strands alike, and organizes it into a legible format. Perhaps this is why TopQuadrant named its popular big data software TopBraid; read more about the software’s upgrade in “TopQuadrant Launches TopBraid 5.0.”
TopBraid Suite is an enterprise Web-based solution set that simplifies the development and management of standards-based, model-driven solutions focused on taxonomy, ontology, metadata management, reference data governance, and data virtualization. The newest upgrade builds on the current enterprise information management solutions and adds new options:
“ ‘It continues to be our goal to improve ways for users to harness the full potential of their data,’ said Irene Polikoff, CEO and co-founder of TopQuadrant. ‘This latest release of 5.0 includes an exciting new feature, AutoClassifier. While our TopBraid Enterprise Vocabulary Net (EVN) Tagger has let users manually tag content with concepts from their vocabularies for several years, AutoClassifier completely automates that process.’ “
AutoClassifier makes it easier to add and edit tags before making them part of the production tag set. Other new features touch TopBraid Enterprise Vocabulary Net (TopBraid EVN), TopBraid Reference Data Manager (RDM), TopBraid Insight, and the TopBraid platform itself, including improvements in internationalization and a new component for increasing system availability in enterprise environments, TopBraid DataCache.
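TopQuadrant does not describe AutoClassifier’s internals, but tagging content against a controlled vocabulary can be sketched simply: match document text to concept labels and synonyms, then hand the suggestions to a human editor to confirm, mirroring the add-and-edit step above. The vocabulary and matching rule below are hypothetical.

```python
# A tiny controlled vocabulary: concept -> surface forms that signal it.
VOCABULARY = {
    "metadata-management": {"metadata", "tagging", "cataloging"},
    "data-governance":     {"governance", "stewardship", "compliance"},
    "taxonomy":            {"taxonomy", "ontology", "vocabulary"},
}

def auto_classify(text: str, min_hits: int = 1):
    """Suggest vocabulary concepts whose labels appear in the text."""
    words = set(text.lower().split())
    suggestions = []
    for concept, labels in VOCABULARY.items():
        hits = len(labels & words)
        if hits >= min_hits:
            suggestions.append((concept, hits))
    # Strongest matches first, for a human editor to confirm or reject.
    return sorted(suggestions, key=lambda s: -s[1])

doc = "new compliance rules for metadata tagging and stewardship"
print(auto_classify(doc))
```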
TopBraid might be the solution an enterprise system needs to braid its data into style.
Whitney Grace, October 26, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Funding Granted for American Archive Search Project
September 23, 2015
Here’s an interesting project: we received an announcement about funding for Pop Up Archive: Search Your Sound. A joint effort of the WGBH Educational Foundation and the American Archive of Public Broadcasting, the venture’s goal is nothing less than to make almost 40,000 hours of Public Broadcasting media content easily accessible. The American Archive, now under the care of WGBH and the Library of Congress, has digitized that wealth of sound and video. Now, the details are in the metadata. The announcement reveals:
“As we’ve written before, metadata creation for media at scale benefits from both machine analysis and human correction. Pop Up Archive and WGBH are combining forces to do just that. Innovative features of the project include:
*Speech-to-text and audio analysis tools to transcribe and analyze almost 40,000 hours of digital audio from the American Archive of Public Broadcasting
*Open source web-based tools to improve transcripts and descriptive data by engaging the public in a crowdsourced, participatory cataloging project
*Creating and distributing data sets to provide a public database of audiovisual metadata for use by other projects.
“In addition to Pop Up Archive’s machine transcripts and automatic entity extraction (tagging), we’ll be conducting research in partnership with the HiPSTAS center at University of Texas at Austin to identify characteristics in audio beyond the words themselves. That could include emotional reactions like laughter and crying, speaker identities, and transitions between moods or segments.”
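The “machine analysis and human correction” workflow described above can be sketched as a small data model: keep each machine-transcribed segment with its confidence score, queue the low-confidence segments for volunteers, and prefer human text where it exists. All names, scores, and thresholds below are invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    start_sec: float
    machine_text: str
    confidence: float                 # from the speech-to-text engine
    human_text: Optional[str] = None  # filled in by a volunteer

    @property
    def best_text(self) -> str:
        return self.human_text or self.machine_text

transcript = [Segment(0.0, "good evening from boston", 0.94),
              Segment(4.2, "tonight's top storey",     0.41),
              Segment(8.7, "the city council voted",   0.88)]

# Crowdsourcing queue: send only low-confidence segments to human reviewers.
for seg in transcript:
    if seg.confidence < 0.6:
        seg.human_text = "tonight's top story"   # volunteer's correction

print(" / ".join(seg.best_text for seg in transcript))
```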
The project just received almost $900,000 in funding from the Institute of Museum and Library Services. This loot is on top of the grant received in 2013 from the Corporation for Public Broadcasting that got the project started. But will it be enough money to develop a system that delivers on-point results? If not, we may be stuck with something clunky, something that resembles the old Autonomy Virage, Blinkx, Exalead video search, or Google YouTube search. Let us hope this worthy endeavor continues to attract funding so that, someday, anyone can reliably (and intuitively) find valuable Public Broadcasting content.
Cynthia Murrell, September 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

