Big Data Needs to Go Public

December 16, 2016

Big Data touches every part of our lives, often without our noticing.  Have you ever noticed, when you listen to the news, read an article, or watch a YouTube video, that people say things such as “experts claim” or “science says”?  In the past, these statements relied on less than trustworthy sources, but now they can cite Big Data to back up their claims.  However, popular opinion and puff pieces still need to back up their big data with hard fact.  In the article “More Accountability For Big-Data Algorithms,” Nature.com says that transparency is a big deal for Big Data and that algorithm designers need to work on it.

One of the hopes is that big data will be used to bridge the divide between one bias and another, except that the opposite can happen.  In other words, Big Data algorithms can be designed with a bias:

There are many sources of bias in algorithms. One is the hard-coding of rules and use of data sets that already reflect common societal spin. Put bias in and get bias out. Spurious or dubious correlations are another pitfall. A widely cited example is the way in which hiring algorithms can give a person with a longer commute time a negative score, because data suggest that long commutes correlate with high staff turnover.
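The commute example in the passage above can be made concrete with a small sketch. Everything here is hypothetical: the records, the `score` function, and the penalty weight are invented for illustration and do not come from the Nature article or any real hiring system. The point is only that once a correlation in past data is hard-coded as a rule, two equally skilled candidates get different scores.

```python
from math import sqrt

# Hypothetical past employees: (commute in minutes, months before leaving).
history = [(10, 40), (15, 36), (25, 30), (40, 22), (55, 15), (70, 9)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def score(candidate, commute_weight=0.5):
    """Toy hiring score: skills minus a penalty that grows with commute."""
    return candidate["skills"] - commute_weight * candidate["commute_minutes"]


# A strongly negative value here is what would tempt a designer to add the penalty.
r = pearson([c for c, _ in history], [t for _, t in history])

# Two candidates with identical skills, differing only in where they live.
near = {"skills": 80, "commute_minutes": 10}
far = {"skills": 80, "commute_minutes": 70}

print(f"commute vs. tenure correlation: {r:.2f}")
print("near:", score(near), "far:", score(far))
```

With these made-up numbers the candidate with the longer commute scores 30 points lower despite identical skills, which is the "bias in, bias out" effect the passage describes.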

Even worse is that people and organizations can design an algorithm to support science or facts they want to pass off as the truth.  There is a growing demand for “algorithm accountability,” mostly in academia.  The demands are that data sets fed into the algorithms be made public.  There are also plans to make algorithms that monitor algorithms for bias.
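As a rough idea of what an algorithm that monitors another algorithm might look like, here is a minimal sketch. It is not taken from the Nature article. It swaps in one simple, well-known check: comparing selection rates between groups and flagging a ratio below 0.8, the "four-fifths" rule of thumb from US employment guidelines. The group labels, the sample decisions, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the share of positive outcomes per group.

    `decisions` is an iterable of (group_label, accepted) pairs.
    """
    totals = defaultdict(int)
    accepted_counts = defaultdict(int)
    for group, accepted in decisions:
        totals[group] += 1
        if accepted:
            accepted_counts[group] += 1
    return {g: accepted_counts[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions):
    """Lowest group selection rate divided by the highest one."""
    rates = selection_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so there is no gap to measure
    return min(rates.values()) / highest


# Hypothetical output of some other algorithm: 8 of 10 accepted in group A,
# 4 of 10 accepted in group B.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

ratio = disparate_impact_ratio(decisions)
flagged = ratio < 0.8  # four-fifths rule of thumb, used here as an assumption
print(f"ratio={ratio:.2f} flagged={flagged}")
```

A check like this only detects unequal outcomes; it says nothing about why they occur, which is where public data sets and human review would still come in.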

Big Data is here to stay, but relying too much on algorithms can distort the facts.  This is why the human element is still needed to distinguish between fact and fiction.  Minority Report is closer to being our present than ever before.

Whitney Grace, December 16, 2016

Big Data on Crime

December 5, 2016

An analytics company that collects crime-related data from local law enforcement agencies plans to help reduce crime rates by using Big Data.

CrimeReports.com, in its FAQs, says:

The data on CrimeReports is sent on an hourly, daily, or weekly basis from more than 1000 participating agencies to the CrimeReports map. Each agency controls their data flow to CrimeReports, including how often they send data, which incidents are included.

Very little is known about the service provider. A WhoIs lookup indicates that although the domain was registered back in 1999, it was updated a few days ago, on November 25, 2016, and is valid until November 2, 2017.

CrimeReports is linked to a local law enforcement agency that selectively shares its crime data with the analytics firm. After some number crunching, the service provider sends the data to its subscribers via email. According to the firm:

Although no formal, third-party study has been commissioned, there is anecdotal evidence to suggest that public-facing crime mapping—by keeping citizens informed about crime in their area—helps them be more vigilant and implement crime prevention efforts in their homes, workplaces, and communities. In addition, there is anecdotal evidence to suggest that public-facing crime mapping fosters more trust in local law enforcement by members of the community.

To maintain data integrity, the data is collected only through official channels. The crime details are not comprehensive; rather, they are redacted to protect the privacy of victims and criminals. As of now, CrimeReports gets paid by law enforcement agencies. Certainly, this is something new and probably never tried before.

Vishal Ingole, December 5, 2016

Could AI Spell Doom for Marketers?

December 1, 2016

AI is making inroads into almost every domain; marketing is no different. However, the inability of AI to be creative in the true sense may be a major impediment.

The Telegraph, in a feature article titled Marketing Faces Death by Algorithm Unless It Finds a New Code, says:

Artificial intelligence (AI) is one of the most-hyped topics in advertising right now. Brands are increasingly finding that they need to market to intelligent machines in order to reach humans, and this is set to transform the marketing function.

The problem with AI, as most marketers agree, is its inability to imitate true creativity. As the focus of marketing shifts from direct product placement to content marketing, the importance of AI grows. For instance, a clothing company cannot, on its own, analyze vast amounts of Big Data, decipher it, and then create targeted advertising based on it; algorithms will play a crucial role. However, content creation will ultimately require a human touch and intervention.

As it becomes clear here:

While AI can build a creative idea, it’s not creative “in the true sense of the word”, according to Mr Cooper. Machine learning – the driving technology behind how AI can learn – still requires human intelligence to work out how the machine would get there. “It can’t put two seemingly random thoughts together and recognize something new.

The other school of thought says that what AI lacks is not creativity, but processing power and storage. It seems we are moving closer to bridging this gap. Thus, when AI closes this gap, will most occupations, including creative and technical ones, become obsolete?

Vishal Ingole, December 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

AI to Profile Gang Members on Twitter

November 16, 2016

Researchers from the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) claim that an algorithm they developed can identify gang members on Twitter.

Vice.com recently published an article titled Researchers Claim AI Can Identify Gang Members on Twitter, which claims that:

A deep learning AI algorithm that can identify street gang members based solely on their Twitter posts, and with 77 percent accuracy.

The article then points out the shortcomings of the algorithm or AI by saying this:

According to one expert contacted by Motherboard, this technology has serious shortcomings that might end up doing more harm than good, especially if a computer pegs someone as a gang member just because they use certain words, enjoy rap, or frequently use certain emojis—all criteria employed by this experimental AI.

The shortcomings do not end here. The data on Twitter is being analyzed in a silo. For example, let us assume that a few gang members are identified using the algorithm (remember, no location information is taken into consideration by the AI). What next?

Is it not necessary then to also identify other social media profiles of the supposed gang members, look at Big Data generated by them, analyze their communication patterns and then form some conclusion? Unfortunately, none of this is done by the AI. It, in fact, would be a mammoth task to extrapolate data from multiple sources just to identify people with certain traits.

And most importantly, what if the AI is put in place and someone, just for fun, makes an innocent person look like a gang member? As the article rightly points out, machines trained on prejudiced data tend to reproduce those same, very human, prejudices.

Vishal Ingole, November 16, 2016

The CIA Claims They Are Psychic

November 2, 2016

Today’s headline sounds like something one would read printed on a grocery store tabloid or a conspiracy Web site.  Before I start making claims about the Illuminati, this is not a claim about magical powers, but rather big data and hard science…I think.  Defense One shares that, “The CIA Says It Can Predict Social Unrest As Early As 3 To 5 Days Out.”  While deep learning and other big data technology is used to drive commerce, science, healthcare, and other industries, law enforcement officials and organizations are using it to predict and prevent crime.

The CIA uses big data to analyze data sets, discover trends, and predict events that might have national security ramifications.  CIA Director John Brennan hired Andrew Hallman to be the Deputy Director for Digital Innovations within the agency.  Under Hallman’s guidance, the CIA’s “anticipatory intelligence” has improved.  The CIA is not only using its private data sets, but is also augmenting them with open data sets to help predict social unrest.

The big data science allows the CIA to make more confident decisions and provide their agents with better information to assess a situation.

Hallman said analysts are “becoming more proficient in articulating” observations to policymakers derived in these new ways. What it adds up to, Hallman said, is a clearer picture of events unfolding—or about to unfold—in an increasingly unclear world.

What I wonder is how many civil unrest events have been prevented?  For security reasons, some of them remain classified.  While the news is mongering fear, would it not be helpful if the CIA shared some of its success stats with the news and had them make it a priority to broadcast it?

Whitney Grace, November 2, 2016

Online Drugs Trade Needs Surgical Strikes

October 25, 2016

Despite the FBI’s shutdown of Silk Road in 2013, the online drug trade through the Dark Net is thriving. Only surgical strikes of military precision on vendors and marketplaces, using technological methods, can solve this problem.

RAND Corporation, in its research paper titled Taking Stock of the Online Drugs Trade, says:

Illegal drug transactions on cryptomarkets have tripled since 2013, with revenues doubling. But at $12-21 (€10.5-18.5) million a month, this is clearly a niche market compared to the traditional offline market, estimated at $2.3 (€2) billion a month in Europe alone.

The primary goals of the research paper were, first, to determine the size and scope of cryptomarkets and, second, to devise avenues for law enforcement agencies to intervene in these illegal practices. Though the report covered all of Europe, it studied the role of the Netherlands in particular, because the Netherlands has the highest rate of consumption of drugs acquired through cryptomarkets.

Some interesting findings of the report include:

  • Though revenues have doubled, drug cryptomarkets are still niche and generate revenues of $12-21 million a month, as compared to $2.3 billion a month in offline trade in Europe alone.
  • Cannabis is still the most in demand, followed by stimulants like cocaine and ecstasy-type drugs.
  • Vendors from the US, Australia, Canada, and Western Europe dominate the online marketplace.
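To put "niche" into numbers, the short calculation below divides the monthly cryptomarket revenue range quoted from the RAND paper ($12-21 million) by the quoted estimate for the offline market in Europe ($2.3 billion a month). The variable names are made up for this sketch, and the comparison is only as good as the quoted estimates.

```python
# Figures as quoted from the RAND paper, in US dollars per month.
CRYPTOMARKET_MONTHLY_USD = (12e6, 21e6)   # low and high estimates
OFFLINE_EUROPE_MONTHLY_USD = 2.3e9        # offline market, Europe alone


def share(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole


low, high = (share(v, OFFLINE_EUROPE_MONTHLY_USD) for v in CRYPTOMARKET_MONTHLY_USD)
print(f"Cryptomarket revenue is {low:.2%} to {high:.2%} of the offline European market")
```

Even at the high estimate, cryptomarket revenue comes to less than one percent of the European offline figure.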

Apart from the conventional methods of disrupting the drug trade (dismantling logistics, undercover operations, and taking down marketplaces), the only new method suggested is the use of Big Data techniques.

Cryptomarkets are going to thrive, and the only way to tackle this threat is by following the money (in this case, the cryptocurrencies). But who is going to bell the cat?

Vishal Ingole, October 25, 2016

What Lurks in the Dark Web?

October 20, 2016

Organizations concerned about cyber security can effectively thwart threats provided they know a threat is lurking in the dark. An Israeli SaaS-based startup claims it can bridge this gap by offering real-time analysis of data on the Dark Web.

TechCrunch, in an article titled Sixgill claims to crawl the Dark Web to detect future cybercrime, says:

Sixgill has developed proprietary algorithms and tech to connect the Dark Web’s dots by analyzing so-called “big data” to create profiles and patterns of Dark Web users and their hidden social networks. It’s via the automatic crunching of this data that the company claims to be able to identify and track potential hackers who may be planning malicious and illegal activity.

By analyzing the data, Sixgill claims that it can identify illegal marketplaces, data leaks and also physical attacks on organizations using its proprietary algorithms. However, there are multiple loopholes in this type of setup.

First, some Dark Web actors can easily insert red herrings across the communication channels to divert attention from real threats. Second, the Dark Web was created by individuals who wished to keep their communications cloaked. Mining data and crunching it through algorithms would not be sufficient to keep organizations safe. Moreover, AI can only process data that has been mined by algorithms, which in many cases can be false. TOR is undergoing changes to increase the safeguards in place for its users. What’s beginning is a Dark Web arms race. A pattern of compromise will be followed by hardening. Then compromise will occur again and the Hegelian cycle repeats.

Vishal Ingole, October 20, 2016

Paris Police Face Data Problem in Google Tax Evasion Investigation

September 20, 2016

Google has been under scrutiny for suspected tax evasion. Yahoo published a brief piece updating us on the investigation: Data analysis from Paris raid on Google will take months, possibly years: prosecutor. French police raided Google’s office in Paris, taking the tax avoidance inquiry to a new level. This comes after much pressure from across Europe to prevent multinational corporations from using their worldwide presence to pay less tax. Financial prosecutor Eliane Houlette is quoted as stating:

“We have collected a lot of computer data,” Houlette said in an interview with Europe 1 radio, TV channel iTele and newspaper Le Monde, adding that 96 people took part in the raid. “We need to analyze (the data) … (it will take) months, I hope that it won’t be several years, but we are very limited in resources.” Google, which said it is complying fully with French law, is under pressure across Europe from public opinion and governments angry at the way multinationals exploit their global presence to minimize tax liabilities.

While big data search technology exists, government and law enforcement agencies may not have the funds to utilize such technologies. Or, perhaps the knowledge of open source solutions is not apparent. If nothing else, these comments made by Houlette go to show the need for increased focus on upgrading systems for real-time and rapid data analysis.

Megan Feil, September 20, 2016
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

Big Data Processing Is Relative to Paradigm of Today

September 7, 2016

The size and volume that characterize an information set as big data, and the tools used to process it, are relative to the current era. A story from NPR reminds us of this as it asks, Can Web Search Predict Cancer? Promise And Worry Of Big Data And Health. In 1600s England, a statistician essentially founded demography by compiling details of death records into tables. Today, trends from big data are drawn through a combination of assistance from computer technology and people’s analytical skills. Microsoft scientists conducted a study showing that Bing search queries may hold clues to a future diagnosis of pancreatic cancer.

The Microsoft scientists themselves acknowledge this [lack of comprehensive knowledge and predictive abilities] in the study. “Clinical trials are necessary to understand whether our learned model has practical utility, including in combination with other screening methods,” they write. Therein lies the crux of this big data future: It’s a logical progression for the modern hyper-connected world, but one that will continue to require the solid grounding of a traditional health professional, to steer data toward usefulness, to avoid unwarranted anxiety or even unnecessary testing, and to zero in on actual causes, not just correlations within particular health trends.

As the producers of data points in many social-related data sets, and as the original analyzers of big data, it makes sense that people remain a key part of big data analytics. While this may be especially pertinent in matters related to health, it may be more intuitively understood in this sector in contrast to others. Whether health or another sector, can the human variable ever be taken out of the data equation? Perhaps such a world will give rise to whatever is beyond the current buzz around the phrase big data.

Megan Feil, September 7, 2016

Big Data Is Just a Myth

August 1, 2016

Remember in the 1979 hit The Muppet Movie there was a running gag where Kermit the Frog kept saying, “It’s a myth.  A myth!”  Then a woman named Myth would appear out of nowhere and say, “Yes?”  It was a funny random gag, but while it is a myth that frogs give warts, most of the myths related to big data may or may not be myths.  Data Science Central decided to explain some of the myths in, “Debunking The 68 Most Common Myths About Big Data-Part 2.”

Some of the prior myths debunked in the first part were that big data was the newest power word, an end all solution for companies, only meant for big companies, and that it was complicated and expensive.  In truth, anyone can benefit from big data with a decent implementation plan and with someone who knows how to take charge of it.

Big data, in fact, can be integrated with preexisting systems, although it takes time and knowledge to link the new and the old together (it is not as difficult as it seems).  Keeping on that same thought, users need to realize that there is not a one size fits all big data solution.  Big data is a solution that requires analytical, storage, and other software.  It cannot be purchased like other proprietary software and it needs to be individualized for each organization.

One myth that has turned into truth is that big data relies on Hadoop storage.  Hadoop used to be one player in a market of many, but now it is an integral bit of software needed to get the big data job done.  One of the most prevalent myths is that big data belongs only in the IT department:

“Here’s the core of the issue.  Big Data gives companies the greatly enhanced ability to reap benefits from data-driven insights and to make better decisions.  These are strategic issues.

You know who is most likely to be clamoring for Big Data?  Not IT.  Most likely it’s sales, marketing, pricing, logistics, and production forecasting.  All areas that tend to reap outsize rewards from better forward views of the business.”

Big data is becoming more of an essential tool for organizations in every field as it tells them more about how they operate and their shortcomings.  Big data offers a very detailed examination of these issues; the biggest issue users need to deal with is how they will use it.

Whitney Grace, August 1, 2016
