Social Media Snooping Site Emerges for Landlords and Employers

September 2, 2016

The promise of unlocking the insights in big data is one that many search and analytics companies make. CNet shares the scoop on a new company: "Disturbing new site scrapes your private Facebook and informs landlords, employers." The company is Score Assured, and its service acts as an intermediary between your social media accounts and your landlord. By scanning every word you have typed on Facebook, Twitter, LinkedIn, or even Tinder, the service filters that text through a neuro-linguistic programming tool to produce a report on your reputation. We learned,

There’s no reason to believe that Score Assured’s “analysis” will offer in any way an accurate portrayal of who you are or your financial wherewithal. States across the country are already preparing or enacting legislation to ensure that potential employers have no right to ask for your password to Facebook or other social media. In Washington, for example, it’s illegal for an employer to ask for your password. Score Assured offers landlords and employers (the employer service isn’t live yet) the chance to ask for such passwords slightly more indirectly. Psychologically, the company is preying on a weakness humans have been displaying for some time now: the willingness to give up their privacy to get something they think they really want.

Scraping and finding tools are not new, but could this application be any more 2016? The author of this piece is onto the zeitgeist of “I’ve got nothing to hide.” Consequently, data — even social data — becomes a commodity. Users’ willingness to consent is the sociologically interesting piece here. It remains to be seen whether the data mining technology is anything special.

Megan Feil, September 2, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Stanford Offers Course Overviewing Roots of the Google Algorithm

March 23, 2016

The course syllabus for Stanford’s Computer Science class titled CS 349: Data Mining, Search, and the World Wide Web on Stanford.edu provides an overview of some of the technologies and advances that led to Google search. The syllabus states,

“There has been a close collaboration between the Data Mining Group (MIDAS) and the Digital Libraries Group at Stanford in the area of Web research. It has culminated in the WebBase project whose aims are to maintain a local copy of the World Wide Web (or at least a substantial portion thereof) and to use it as a research tool for information retrieval, data mining, and other applications. This has led to the development of the PageRank algorithm, the Google search engine…”

The syllabus alone offers some extremely useful insights that could help students and laypeople understand the roots of Google search. Key inclusions are the Digital Equipment Corporation (DEC) and PageRank, the algorithm named for Larry Page that enabled Google to become Google. The algorithm ranks web pages based on how many other websites link to them. Jon Kleinberg also played a key role: his HITS algorithm recognized that pages linking out to many good sources (hubs, like a directory or search engine) matter alongside the pages those hubs point to (authorities). The larger context of the course is data mining and information retrieval.
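The idea behind PageRank can be sketched in a few lines: repeatedly hand each page's score to the pages it links to, with a damping factor for the "random surfer." The tiny link graph below is invented for illustration; it is not Stanford's WebBase data.

```python
# Minimal PageRank sketch: iterate the "random surfer" model until
# the scores settle. links maps each page to the pages it links to.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(graph)
# "c" ends up highest: every other page links to it.
```

The key design choice, and the insight the course covers, is that a link is treated as a vote whose weight depends on the voter's own rank.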

 

Chelsea Kerwin, March 23, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Measuring Classifiers by a Rule of Thumb

February 1, 2016

Computer programmers who specialize in machine learning, artificial intelligence, data mining, data visualization, and statistics are smart individuals, but even they sometimes get stumped. Using the same form of communication as Reddit and old-fashioned forums, Cross Validated is a question-and-answer site run by Stack Exchange. People can post questions about data and related topics and then wait for a response. One user posted a question about machine learning classifiers:

“I have been trying to find a good summary for the usage of popular classifiers, kind of like rules of thumb for when to use which classifier. For example, if there are lots of features, if there are millions of samples, if there are streaming samples coming in, etc., which classifier would be better suited in which scenarios?”

The response the user received was that the question was too broad. Which classifier performs best depends on the data and the process that generated it. It is rather like asking for the best way to organize books or taxes: it depends on the content.

Another user replied that there was an easier way to grasp the general process of choosing a classifier and pointed to scikit-learn's "choosing the right estimator" chart. Other users noted that the chart is incomplete because it does not include deep learning, decision trees, or logistic regression.
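The chart is essentially a decision tree over properties of your dataset. A loose paraphrase of that decision path can be written as a function; the thresholds and suggestions below are simplified for illustration and are not the chart verbatim.

```python
# Rough paraphrase of the kind of decision logic the scikit-learn
# "choosing the right estimator" chart encodes (simplified, not exact).

def suggest_classifier(n_samples, n_features, text_data=False):
    if n_samples < 50:
        return "get more data"
    if n_samples > 100_000:
        # very large datasets favour out-of-core linear models
        return "SGDClassifier"
    if text_data:
        return "Naive Bayes"
    if n_features > n_samples:
        # many features, few samples: regularised linear models hold up
        return "LinearSVC"
    return "KNeighbors or SVC (then ensembles if those underperform)"

print(suggest_classifier(n_samples=200_000, n_features=50))
print(suggest_classifier(n_samples=5_000, n_features=20, text_data=True))
```

Even this toy version makes the responders' point: the "right" classifier is a function of sample count, feature count, and data type, not a universal ranking.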

We say: create some other diagrams and share those. Classifiers are complex, but they are a necessity for the artificial intelligence and big data craze.

 

Whitney Grace, February 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The Modern Law Firm and Data

December 16, 2015

We thought it was a problem when law enforcement officials did not know how the Internet and Dark Web worked, let alone the capabilities of eDiscovery tools; a law firm that does not know how to work with data-mining tools, much less grasp the importance of the technology, is losing credibility, profit, and evidence for cases. According to Information Week in "Data, Lawyers, And IT: How They're Connected," the modern law firm needs to be aware of how eDiscovery tools, predictive coding, and data science work and how they can benefit its cases.

It can be daunting to understand how new technology works, especially in a law firm. The article explains how the above tools and more work in four key segments: what role data plays before trial, how it is changing the courtroom, how new tools pave the way for unprecedented approaches to law practice, and how data is improving the way law firms operate.

Data in pretrial amounts to one word: evidence. People live their lives via their computers and create a digital trail without realizing it. With a few eDiscovery tools, lawyers can assemble all necessary information within hours. Data tools in the courtroom make practicing law seem like a scenario out of a fantasy or science fiction novel. Lawyers can immediately pull up information to use as evidence for cross-examination or to validate facts. New eDiscovery tools are also useful because they allow lawyers to prepare their arguments based on the judge and jury pool. More data is available on individual cases rather than just the big-name ones.

“The legal industry has historically been a technology laggard, but it is evolving rapidly to meet the requirements of a data-intensive world.

‘Years ago, document review was done by hand. Metadata didn’t exist. You didn’t know when a document was created, who authored it, or who changed it. eDiscovery and computers have made dealing with massive amounts of data easier,’ said Robb Helt, director of trial technology at Suann Ingle Associates.”

Legal eDiscovery is one of the branches of big data that has skyrocketed in the past decade. While the examples discussed here come from respected law firms, keep in mind that eDiscovery technology is still new. Ambulance chasers and other small firms probably do not have a full IT squad on staff, so when vetting lawyers, ask about their eDiscovery capabilities.
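The metadata the quote mentions (creation date, author) is exactly what eDiscovery culling filters on. A toy sketch of that filtering step follows; the document records are invented, and real eDiscovery platforms work against far richer metadata.

```python
# Toy eDiscovery culling: narrow a document set by author and
# creation-date window before human review. Records are invented.

from datetime import date

documents = [
    {"name": "memo1.docx", "author": "alice", "created": date(2015, 3, 2)},
    {"name": "memo2.docx", "author": "bob",   "created": date(2015, 7, 9)},
    {"name": "draft.docx", "author": "alice", "created": date(2014, 11, 1)},
]

def cull(docs, author=None, start=None, end=None):
    """Keep only documents matching the author and date window."""
    hits = []
    for doc in docs:
        if author and doc["author"] != author:
            continue
        if start and doc["created"] < start:
            continue
        if end and doc["created"] > end:
            continue
        hits.append(doc)
    return hits

relevant = cull(documents, author="alice", start=date(2015, 1, 1))
# keeps only memo1.docx
```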

Whitney Grace, December 16, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Chinese Opinion Monitoring Software by Knowlesys

August 18, 2015

Ever wonder what tools the Chinese government uses to keep track of those pesky opinions voiced by its citizens? If so, take a look at “IPOMS : Chinese Internet Public Opinion Monitoring System” at Revolution News. The brief write-up tells us about a software company, Knowlesys, reportedly supplying such software to China (among other clients). Reporter and Revolution News founder Jennifer Baker tells us:

“Knowlesys’ system can collect web pages with some certain key words from Internet news, topics on forum and BBS, and then cluster these web pages according to different ‘event’ groups. Furthermore, this system provides the function of automatically tracking the progress of one event. With this system, supervisors can know what is exactly happening and what has happened from different views, which can improve their work efficiency a lot. Most of time, the supervisor is the government, the evil government. sometimes a company uses the system to collect information for its products. IPOMS is composed of web crawler, html parser and topic detection and tracking tool.”

The piece includes a diagram that lays out the software’s process, from extraction to analysis to presentation (though the specifics are pretty standard to anyone familiar with data analysis in general). Data monitoring and mining firm Knowlesys was founded in 2003. The company has offices in Hong Kong and a development center in Shenzhen, China.
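The "event grouping" step the quote describes can be sketched as bucketing collected pages by which watched keyword set they mention. The keywords and pages below are invented; Knowlesys's actual pipeline (crawler, HTML parser, topic detection and tracking) is far more involved.

```python
# Bare-bones event grouping: assign each collected page to every
# tracked "event" whose keywords appear in its text.

events = {
    "protest": {"protest", "march", "demonstration"},
    "strike":  {"strike", "walkout", "union"},
}

def group_pages(pages):
    groups = {name: [] for name in events}
    for page in pages:
        words = set(page["text"].lower().split())
        for name, keywords in events.items():
            if words & keywords:  # page mentions a tracked term
                groups[name].append(page["url"])
    return groups

pages = [
    {"url": "p1", "text": "Workers stage a walkout over pay"},
    {"url": "p2", "text": "Thousands join the march downtown"},
]
grouped = group_pages(pages)
# p1 lands in "strike", p2 in "protest"
```

Tracking an event's progress, as the quote claims IPOMS does, would then amount to watching how each bucket grows over time.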

Cynthia Murrell, August 18, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Hadoop Rounds Up Open Source Goodies

July 17, 2015

Summertime is here, and what better way to celebrate the warm weather and fun in the sun than with some fantastic open source tools? Okay, so you probably will not take your computer to the beach, but if you have a vacation planned, one of these tools might help you complete your work faster so you can get closer to that umbrella and cocktail. Datamation has a great listicle focused on “Hadoop And Big Data: 60 Top Open Source Tools.”

Hadoop is one of the most widely adopted open source tools for big data solutions. The Hadoop market is expected to be worth $1 billion by 2020, and IBM has dedicated 3,500 employees to developing Apache Spark, part of the Hadoop ecosystem.

As open source is a huge part of the Hadoop landscape, Datamation’s list provides invaluable information on tools that could mean the difference between a successful project and a failed one. They could also save some extra cash in the IT budget.

“This area has a seen a lot of activity recently, with the launch of many new projects. Many of the most noteworthy projects are managed by the Apache Foundation and are closely related to Hadoop.”

Datamation has maintained this list for a while, updating it from time to time as the industry changes. The list is not ranked; rather, the tools are grouped into categories, and a short description explains what each tool does. The categories include Hadoop-related tools, big data analysis platforms and tools, databases and data warehouses, business intelligence, data mining, big data search, programming languages, query engines, and in-memory technology. There is a tool for nearly every sort of problem that could come up in a Hadoop environment, so the listicle is definitely worth a glance.

Whitney Grace, July 17, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Lexalytics: GUI and Wizard

June 12, 2015

What is one way to improve a user’s software navigation experience? One of the best ways is to add a graphical user interface (GUI). Software Development @ IT Business Net shares a press release, “Lexalytics Unveils Industry’s First Wizard For Text Mining And Sentiment Analysis.” Lexalytics is one of the leading providers of sentiment and analytics solutions, and, as the article’s title explains, it has achieved an industry first by releasing a GUI and wizard for its Semantria SaaS platform and Excel plug-in. The wizard and GUI (SWIZ) are part of the Semantria Online Configurator, SWEB 1.3, which also includes functionality updates and layout changes.

” ‘In order to get the most value out of text and sentiment analysis technologies, customers need to be able to tune the service to match their content and business needs,’ said Jeff Catlin, CEO, Lexalytics. ‘Just like Apple changed the game for consumers with its first Macintosh in 1984, making personal computing easy and fun through an innovative GUI, we want to improve the job of data analysts by making it just as fun, easy and intuitive with SWIZ.’”

Lexalytics is dedicated to giving its clients an easier experience with data analytics. The company wants its clients to get the answers they need, by providing the right tools, without overthinking the retrieval process. While Lexalytics already provides robust and flexible solutions, the SWIZ release continues to prove it has the most tunable and configurable text mining technology.
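The "tuning" Catlin describes boils down to adjusting the weights a sentiment engine assigns to words and phrases. A minimal lexicon-based scorer makes the idea concrete; the word weights below are invented for illustration and are not Lexalytics's Semantria API or its actual lexicon.

```python
# Minimal lexicon-based sentiment scorer. Tuning the service, in this
# toy model, means editing the lexicon weights to fit your content.

LEXICON = {"fun": 1.0, "easy": 1.0, "intuitive": 0.5,
           "slow": -1.0, "confusing": -1.5}

def sentiment(text):
    words = text.lower().replace(",", " ").split()
    return sum(LEXICON.get(w, 0.0) for w in words)

print(sentiment("The new wizard is easy and fun"))      # positive score
print(sentiment("The old configurator was confusing"))  # negative score
```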

Whitney Grace, June 12, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Search Companies: Innovative or Not?

June 11, 2015

Forbes’ article “The 50 Most Innovative Companies Of 2014: Strong Innovators Are Three Times More Likely To Rely on Big Data Analytics” points out how strongly innovation is tied to big data analytics and data mining these days. The Boston Consulting Group (BCG) studies the methodology of innovation. The numbers are striking when companies that use big data are compared with those that still have not figured out how to use their data: 57% vs. 19%.

Innovation, however, is not entirely defined by big data. Most of the companies that rely on big data as key to their innovation are software companies. Forbes found that 53% see big data as having a huge impact in the future, while BCG found only 41% who saw big data as vital to their innovation.

Big data cannot be and should not be ignored.  Forbes and BCG found that big data analytics are useful and can have huge turnouts:

“BCG also found that big-data leaders generate 12% higher revenues than those who do not experiment and attempt to gain value from big data analytics.  Companies adopting big data analytics are twice as likely as their peers (81% versus 41%) to credit big data for making them more innovative.”

Measuring innovation proves to be subjective, but one cannot deny the positive effect big data analytics and data mining can have on a company. Realize, though, that big data results are useless without a plan to implement and use the data. Also note that none of the major search vendors is considered “innovative,” even though a huge part of big data involves searching for results.

Whitney Grace, June 11, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Explaining Big Data Mythology

May 14, 2015

Mythologies usually develop over the course of centuries, but big data has only been around for (arguably) a couple of decades, at least in its modern incarnation. Recently, big data has received a lot of media attention and product development, enough for the Internet to create a big data mythology. The Globe and Mail dispels some of the bigger myths in the article “Unearthing Big Myths About Big Data.”

The article draws on Prof. Joerg Niessing’s big data expertise as he explains the truth behind many of the biggest big data myths. One of the main points Niessing wants people to understand is that gathering data does not automatically equal dollar signs; you have to be active with the data:

“You must take control, starting with developing a strategic outlook in which you will determine how to use the data at your disposal effectively. “That’s where a lot of companies struggle. They do not have a strategic approach. They don’t understand what they want to learn and get lost in the data,” he said in an interview. So before rushing into data mining, step back and figure out which customer segments and what aspects of their behavior you most want to learn about.”

Niessing says that big data is not really big, but made up of many diverse data points. Big data also does not have all the answers; instead, it provides ambiguous results that need to be interpreted. Have questions you want answered before gathering data. Also, not all of the data returned is good; some of it is garbage and unusable for a project. Several other myths are debunked, but the bottom line remains that having a strategic big data plan in place is the best way to make the most of big data.

Whitney Grace, May 14, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Tweets Reveal Patterns of Support or Opposition for ISIL

March 31, 2015

Once again, data analysis is being put to good use. MIT Technology Review describes how “Twitter Data Mining Reveals the Origins of Support for the Islamic State.” A research team led by one Walid Magdy at the Qatar Computing Research Institute studied tweets regarding the “Islamic State” (also known as ISIS, ISIL, or just IS) to discern patterns that tell us which people choose to join such an organization and why.

See the article for a detailed description of the researchers’ methodology. Interesting observations involve use of the group’s name and tweet timing. Supporters tended to use the whole, official name (the “Islamic State in Iraq and the Levant” is perhaps the most accurate translation), while most opposing tweets didn’t bother, using the abbreviation. They also found that tweets criticizing ISIS surge right after the group has done something terrible, while supporters tended to tweet after a propaganda video was released or the group achieved a major military victory. Other indicators of sentiment were identified, and an algorithm created. The article reveals:

“Magdy and co trained a machine learning algorithm to spot users of both types and said it was able to classify other users as likely to become pro- or anti-ISIS with high accuracy. ‘We train a classifier that can predict future support or opposition of ISIS with 87 percent accuracy,’ they say….

“That is interesting research that reveals the complexity of the forces at work in determining support or opposition to movements like ISIS—why people like [Egypt’s] Ahmed Al-Darawy end up dying on the battlefield. A better understanding of these forces is surely a step forward in finding solutions to the tangled web that exists in this part of the world.

“However, it is worth ending on a note of caution. The ability to classify people as potential supporters of ISIS raises the dangerous prospect of a kind of thought police, like that depicted in films like Minority Report. Clearly, much thought must be given to the way this kind of information should be used.”

Clearly. (Though the writer seems unaware that the term “thought police” originated with Orwell’s Nineteen Eighty-Four, the reference to Minority Report shows he or she understands the concept. But I digress.) Still, trying to understand why people turn to violence and helping to mitigate their circumstances before they get there seems worth a try. Better than bombs, in my humble opinion, and perhaps longer-lasting.
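The kind of text classifier the researchers trained can be sketched as bag-of-words Naive Bayes over labelled tweets. The training examples below are invented stand-ins; the study's real features (full name versus abbreviation, tweet timing) and data were far richer, and this sketch makes no claim to their 87% accuracy.

```python
# Toy bag-of-words Naive Bayes: learn word counts per label, then
# score new text by log-probability under each label.

import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label). Returns word counts per label."""
    counts = defaultdict(Counter)
    totals = Counter()
    for text, label in examples:
        for word in text.lower().split():
            counts[label][word] += 1
            totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, -math.inf
    for label in counts:
        score = math.log(totals[label] / sum(totals.values()))
        for word in text.lower().split():
            # Laplace smoothing so unseen words don't zero the score
            p = (counts[label][word] + 1) / (totals[label] + len(vocab))
            score += math.log(p)
        if score > best_score:
            best, best_score = label, score
    return best

data = [("the islamic state in iraq and the levant announces", "pro"),
        ("isis attack condemned by everyone", "anti"),
        ("levant state video released", "pro"),
        ("isis atrocity sparks outrage", "anti")]
counts, totals = train(data)
print(classify("new levant video", counts, totals))
```

Note how even this toy picks up the naming signal the article highlights: supporters' use of the full name versus opponents' use of the abbreviation.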

Cynthia Murrell, March 31, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
