IBM Thinks Big on Data Unification

December 7, 2016

So far, the big data phenomenon has underwhelmed. We have developed several good ways to collect, store, and analyze data, but those efforts have produced separate, individually developed systems that do not play well together. IBM hopes to fix that, we learn from “IBM Announces a Universal Platform for Data Science” at Forbes. The company calls the project the Data Science Experience. Writer Greg Satell explains:

Consider a typical retail enterprise, which has separate operations for purchasing, point-of-sale, inventory, marketing and other functions. All of these are continually generating and storing data as they interact with the real world in real time. Ideally, these systems would be tightly integrated, so that data generated in one area could influence decisions in another.

The reality, unfortunately, is that things rarely work together so seamlessly. Each of these systems stores information differently, which makes it very difficult to get full value from data. To understand how, for example, a marketing campaign is affecting traffic on the web site and in the stores, you often need to pull it out of separate systems and load it into Excel sheets.

That, essentially, has been what’s been holding data science back. We have the tools to analyze mountains of data and derive amazing insights in real time. New advanced cognitive systems, like Watson, can then take that data, learn from it and help guide our actions. But for all that to work, the information has to be accessible.

The article acknowledges that progress has been made in this area, citing the open-source Hadoop framework and Spark, the processing engine often paired with it, for their ability to tap into clusters of data around the world and analyze that data as a single set. Incompatible systems, however, still vex many organizations.

The article closes with an interesting observation—that many business people’s mindsets are stuck in the past. Planning far ahead is considered prudent, as is taking ample time to make any big decision. Technology has moved past that, though, and now such caution can render the basis for any decision obsolete as soon as it is made. As Satell puts it, we need “a more Bayesian approach to strategy, where we don’t expect to predict things and be right, but rather allow data streams to help us become less wrong over time.” Can humans adapt to this way of thinking? It is reassuring to have a plan; I suspect only the most adaptable among us will feel comfortable flying by the seat of our pants.
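
Satell’s “less wrong over time” framing is, concretely, Bayesian updating. A minimal sketch (the campaign scenario and numbers here are invented for illustration): a Beta-Binomial model of a marketing campaign’s conversion rate, revised as each week’s data arrives:

```python
# Beta-Binomial updating: each batch of observations narrows the belief,
# making the estimate "less wrong over time" rather than right up front.

def update(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with new evidence."""
    return alpha + successes, beta + failures

def mean(alpha, beta):
    """Posterior mean of the conversion rate."""
    return alpha / (alpha + beta)

# Start nearly ignorant about the campaign's conversion rate.
a, b = 1, 1                                       # uniform prior, mean 0.5
for succ, fail in [(3, 97), (5, 95), (4, 96)]:    # weekly data batches
    a, b = update(a, b, succ, fail)
    print(f"estimated rate: {mean(a, b):.3f}")
```

Each week the estimate moves a little; no single week is trusted to be “right,” but the stream steadily tightens the belief.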

Cynthia Murrell, December 7, 2016

Emphasize Data Suitability over Data Quantity

November 30, 2016

It seems obvious to us, but apparently, some folks need a reminder. Harvard Business Review proclaims, “You Don’t Need Big Data, You Need the Right Data.” Perhaps that distinction has gotten lost in the Big Data hype. Writer Maxwell Wessel points to Uber as an example. Though the company does collect a lot of data, the key is in which data it collects, and which it does not. Wessel explains:

In an era before we could summon a vehicle with the push of a button on our smartphones, humans required a thing called taxis. Taxis, while largely unconnected to the internet or any form of formal computer infrastructure, were actually the big data players in rider identification. Why? The taxi system required a network of eyeballs moving around the city scanning for human-shaped figures with their arms outstretched. While it wasn’t Intel and Hewlett-Packard infrastructure crunching the data, the amount of information processed to get the job done was massive. The fact that the computation happened inside of human brains doesn’t change the quantity of data captured and analyzed. Uber’s elegant solution was to stop running a biological anomaly detection algorithm on visual data — and just ask for the right data to get the job done. Who in the city needs a ride and where are they? That critical piece of information let the likes of Uber, Lyft, and Didi Chuxing revolutionize an industry.

In order for businesses to decide which data is worth their attention, the article suggests three guiding questions: “What decisions drive waste in your business?” “Which decisions could you automate to reduce waste?” (Example—Amazon’s pricing algorithms) and “What data would you need to do so?” (Example—Uber requires data on potential riders’ locations to efficiently send out drivers.) See the article for more notes on each of these guidelines.
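
The Amazon pricing example can be made concrete with a toy rule. This is a hypothetical sketch, not Amazon’s actual algorithm; the inputs (competitor price, stock level) stand in for the “right data” the automated decision needs:

```python
def reprice(cost, competitor_price, stock, target_stock):
    """Toy repricing rule: undercut the competitor slightly, raise the
    price when stock runs low, and never sell below cost plus 5% margin."""
    price = competitor_price * 0.99              # slight undercut
    if stock < target_stock:                     # scarce: nudge price up
        price *= 1.0 + 0.1 * (1 - stock / target_stock)
    return round(max(price, cost * 1.05), 2)

# Plenty of competition, low stock: price rises above the competitor.
print(reprice(cost=10.0, competitor_price=15.0, stock=20, target_stock=100))
# Competitor prices below our floor: the margin guard takes over.
print(reprice(cost=10.0, competitor_price=9.0, stock=100, target_stock=100))
```

The point of the exercise is that the rule consumes only three or four numbers; collecting anything more would be big data without being the right data.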

Cynthia Murrell, November 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Do Not Forget to Show Your Work

November 24, 2016

Showing work is a messy but necessary step to prove how one arrived at a solution.  Most of the time it is never reviewed, but with big data, people wonder how computer algorithms arrive at their conclusions.  Engadget explains that computers are being forced to prove their results in “MIT Makes Neural Networks Show Their Work.”

Understanding neural networks is extremely difficult, but MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a way to map these complex systems.  CSAIL did it by splitting a network into two smaller modules: the first extracts text segments and scores them according to their length and coherence, while the second predicts the segment’s subject and attempts to classify it.  The mapping modules sound almost as complex as the actual neural networks.  To alleviate the stress and add a giggle to their research, CSAIL had the modules analyze beer reviews:

For their test, the team used online reviews from a beer rating website and had their network attempt to rank beers on a 5-star scale based on the brew’s aroma, palate, and appearance, using the site’s written reviews. After training the system, the CSAIL team found that their neural network rated beers based on aroma and appearance the same way that humans did 95 and 96 percent of the time, respectively. On the more subjective field of “palate,” the network agreed with people 80 percent of the time.
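
The two-module division of labor can be sketched in plain Python. In CSAIL’s system both modules are learned neural networks; the keyword rules below are hard-coded stand-ins, invented purely to show the split between extracting a rationale and classifying it:

```python
# Module 1 extracts candidate text segments and scores them.
# Module 2 classifies the chosen segment.  In CSAIL's system both are
# learned networks; these keyword rules are illustrative stand-ins.

ASPECT_WORDS = {
    "aroma": {"smell", "aroma", "nose"},
    "appearance": {"color", "golden", "hazy", "head"},
    "palate": {"mouthfeel", "smooth", "crisp"},
}

def extract(review):
    """Split a review into segments and return the best-scoring one
    (here, scored simply by word count)."""
    segments = [s.strip() for s in review.split(".") if s.strip()]
    return max(segments, key=lambda s: len(s.split()))

def classify(segment):
    """Predict which aspect of the beer the segment talks about."""
    words = set(segment.lower().split())
    return max(ASPECT_WORDS, key=lambda a: len(ASPECT_WORDS[a] & words))

review = "Pours a hazy golden color with a thick head. Nice smell."
segment = extract(review)
print(classify(segment))
```

Because the second module only ever sees the extracted segment, the segment itself serves as the network’s “shown work”: the evidence behind the prediction.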

One set of data is as good as another for testing CSAIL’s network mapping tool.  CSAIL hopes to fine-tune the machine learning project and use it in breast cancer research to analyze pathology data.

Whitney Grace, November 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

In a Connected World, Users Are Reared Like Animals for Slaughter

November 22, 2016

Yahoo, Facebook, Google, WhatsApp, Instagram, and Microsoft all have one thing in common: for any service they provide for free, they harvest your private data to sell to advertisers.

Mirror UK recently published an op-ed titled “Who Is Spying on You? What Yahoo Hack Taught Us About Facebook, Google, and WhatsApp,” in which the author says:

Think about this for a second. All those emails you’ve written and received with discussions about politics and people that were assumed to be private and meant as inside jokes for you and your friends were being filtered through CIA headquarters. Kind of makes you wonder what you’ve written in the past few years, doesn’t it?

These services, be it free email or free instant messaging, have been designed and developed in such a way that the companies that own them end up with a humongous amount of information about their users. This data is sugarcoated and called Big Data. It is then sold to advertisers and marketers who, under the guise of providing an immersive and customized user experience, follow your every click online. This is akin to rearing animals only to slaughter them later.

The data is not just for sale to corporations; law enforcement agencies can snoop on you without any warrants. As pointed out in the article:

While hypocritical in many ways, these tech giants are smart enough to know who butters their bread and that the perception of trust outweighs the reality of it. But isn’t it the government who ultimately ends up with the data if a company is intentionally spying on us and building a huge record about each of us?

None of the tech giants admits this, but most are selling your data to the government, including companies like Samsung that are in the hardware business.

Is there a way to evade this online snooping? Probably not, if you use mainstream services and social media platforms. If you want to stay below the radar, delete your accounts and data on all mainstream email providers, instant messaging apps, service websites, and social media platforms.

Vishal Ingole, November 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data Teaches Us We Are Paranoid

November 18, 2016

I love election years!  Actually, that is sarcasm.  Election years bring out the worst in Americans.  The media runs rampant with predictions that each nominee is the equivalent of the anti-Christ and will “doom America,” “ruin the nation,” or “destroy humanity.”  The sane voter knows that whoever the next president is, he or she will probably not destroy the nation or everyday life…much.  Fear, hysteria, and paranoia sell more than puff pieces, and big data supports that theory.  Popular news site Newsweek shares that “Our Trust In Big Data Shows We Don’t Trust Ourselves.”

The article starts with a new acronym: DATA.  It is not that new, but Newsweek takes a new spin on it.  D means dimensions or different datasets, the ability to combine multiple data streams for new insights.  A is for automatic, which is self-explanatory.  T stands for time and how data is processed in real time.  The second A is for artificial intelligence that discovers all the patterns in the data.

Artificial intelligence is where the problems start to emerge.  Big data algorithms can be unintentionally programmed with bias.  In order to interpret data, artificial intelligence must learn from prior datasets, and these older datasets can encode human bias, such as racism, sexism, and socioeconomic prejudice.
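
A toy illustration of how a model inherits bias (the hiring records below are fabricated for the example): a “model” that simply learns hire rates from a skewed history reproduces that skew when scoring new applicants.

```python
# A model trained on biased history reproduces that bias: it learns
# group-level hire rates and scores new applicants accordingly.
# The records below are fabricated purely for illustration.

history = [("A", 1), ("A", 1), ("A", 1), ("A", 0),
           ("B", 1), ("B", 0), ("B", 0), ("B", 0)]

def train(records):
    """Learn per-group hire rates from (group, hired) records."""
    by_group = {}
    for group, hired in records:
        by_group.setdefault(group, []).append(hired)
    return {g: sum(v) / len(v) for g, v in by_group.items()}

model = train(history)
print(model)   # group A scores 0.75, group B scores 0.25
```

Nothing in the code is malicious; the unfairness lives entirely in the training data, which is exactly the article’s point.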

Our machines are not as objective as we believe:

But our readiness to hand over difficult choices to machines tells us more about how we see ourselves.

Instead of seeing a job applicant as a person facing their own choices, capable of overcoming their disadvantages, they become a data point in a mathematical model. Instead of seeing an employer as a person of judgment, bringing wisdom and experience to hard decisions, they become a vector for unconscious bias and inconsistent behavior.  Why do we trust the machines, biased and unaccountable as they are? Because we no longer trust ourselves.

Newsweek really knows how to be dramatic.  We no longer trust ourselves?  No, we trust ourselves more than ever, because we rely on machines to make our simple decisions so we can concentrate on more important topics.  However, what we deem important is biased.  Taking the Newsweek example, what a job applicant considers an important submission, an HR representative will see as the 500th submission that week.  Big data should provide us with better, more diverse perspectives.

Whitney Grace, November 18, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Most Dark Web Content Is Legal and Boring

November 15, 2016

Data crunching by an information security firm reveals that around 55% of Dark Web content is legal and mundane, much like the clear or Open Web.

Digital Journal, which published the article “Despite its Nefarious Reputation, New Report Finds Majority of Activity on the Dark Web is Totally Legal and Mundane,” says that:

“What we’ve found is that the dark web isn’t quite as dark as you may have thought,” said Emily Wilson, Director of Analysis at Terbium Labs. “The vast majority of dark web research to date has focused on illegal activity while overlooking the existence of legal content. We wanted to take a complete view of the dark web to determine its true nature and to offer readers of this report a holistic view of dark web activity — both good and bad.

The findings have been curated in a report, “The Truth About the Dark Web: Separating Fact from Fiction,” that puts the Dark Web in a new light. According to the report, around 55% of the content on the Dark Web is legal; porn makes up 7% of the content, and most of it is legal. Though drugs are a favorite topic, only 45% of drug-related content can be termed illegal. Fraud, extremism, and illegal weapons trading, on the other hand, make up just 5-7% of the Dark Web.

The research used a mix of machine intelligence and human intelligence, as pointed out in the article:

“Conducting research on the dark web is a difficult task because the boundaries between categories are unclear,” said Clare Gollnick, Chief Data Scientist at Terbium Labs. “We put significant effort into making sure this study was based on a representative, random sample of the dark web. We believe the end result is a fair and comprehensive assessment of dark web activity, with clear acknowledgment of the limitations involved in both dark web data specifically and broader limitations of data generally.
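
The representative-sample approach Gollnick describes can be illustrated with a toy estimator. The population below is invented to roughly echo the report’s proportions; the point is only that a uniform random sample lets you estimate category shares without crawling everything:

```python
import random
from collections import Counter

# Invented population: each "page" carries a content label.  The mix
# loosely mirrors the report's figures and is not real crawl data.
population = ["legal"] * 550 + ["drugs"] * 120 + ["fraud"] * 60 + ["other"] * 270

random.seed(42)                        # fixed seed so the sketch is repeatable
sample = random.sample(population, 200)
counts = Counter(sample)

for label, n in counts.most_common():
    print(f"{label}: estimated {100 * n / len(sample):.1f}% of content")
```

With a uniform sample, the sample proportions are unbiased estimates of the population proportions, which is why the randomness of the sample matters more than its sheer size.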

The Dark Web is slowly gaining traction as users of the Open Web find utilities in this hidden portion of the Internet. Though the study is illuminating indeed, it fails to address how much of the illegal activity or content on the Dark Web affects the real world. For instance, what quantity of the drug trade takes place over the Dark Web? Any answers?

Vishal Ingole, November 15, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Genetics Are Biased

November 4, 2016

DNA does not lie, at least when testing is conducted accurately by an experienced geneticist.  Right now it is popular for people to get their DNA tested to discover where their ancestors came from.  Many testers are surprised when they receive their results, because they learn their ancestors came from unexpected places.  Black Americans are especially eager to learn about their genetics, due to their slave ancestry and lack of familial records.  For many Black Americans, DNA is the only way they can learn where their roots originated, but Africa’s genome is not entirely cataloged.

According to Science Daily’s article “Major Racial Bias Found In Leading Genomics Database,” if you have African ancestry and get a DNA test, it will be difficult to pinpoint your results.  The two largest genomics databases that geneticists refer to contain a measurable bias toward European genes.  From a logical standpoint, this is understandable, as Africa has the largest genetic diversity and remains a developing continent without the best access to scientific advances.  These factors create challenges for geneticists as they try to solve the African genetic puzzle.

It also weighs heavily on black Americans, because a significant component of their genetic make-up that could reveal vital health information is missing.  Most black Americans today carry a percentage of European ancestry.  While the European side of their DNA can be traced, their African heritage is more likely to yield clouded results.  On a financial scale, it is more expensive to test black Americans’ genetics due to the lack of information, and the results are still not going to be as accurate as for a European genome.

‘This groundbreaking research by Dr. O’Connor and his team clearly underscores the need for greater diversity in today’s genomic databases,’ says UM SOM Dean E. Albert Reece, MD, PhD, MBA, who is also Vice President of Medical Affairs at the University of Maryland and the John Z. and Akiko Bowers Distinguished Professor at UM SOM. ‘By applying the genetic ancestry data of all major racial backgrounds, we can perform more precise and cost-effective clinical diagnoses that benefit patients and physicians alike.’

While Africa is a large continent, the Human Genome Project and other genetic organizations should apply for grants to fund expeditions to Africa.  Geneticists and biologists would then canvass Africa, collect cheek swabs from willing populations, return with the DNA to sequence it, and add the results to the databases.  Would it be expensive?  Yes, but it would advance medical knowledge and reveal more information about human history.  After all, we all originate from Mother Africa.

Whitney Grace, November 4, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Be Prepared for Foggy Computing

October 31, 2016

Cloud computing allows users to access their files or hard drive from multiple devices at multiple locations.  Fog computing, on the other hand, is something else entirely.  Fog computing is the latest buzzword in the tech world and pretty soon it will be in the lexicon.  If you are unfamiliar with fog computing, read Forbes’s article, “What Is Fog Computing? And Why It Matters In Our Big Data And IoT World.”

According to the article, smartphones are “smart” because they receive and share information with the cloud.  The biggest problem with cloud computing is bandwidth, that is, slow Internet speeds.  The United States ranks 35th in the world for bandwidth speed, which is contrary to the belief that it is the most advanced country in the world.  Demand for faster speeds increases every day.  Fog computing, also known as edge computing, seeks to resolve the problem by grounding data.  How does one “ground” data?

What if the laptop could download software updates and then share them with the phones and tablets? Instead of using precious (and slow) bandwidth for each device to individually download the updates from the cloud, they could utilize the computing power all around us and communicate internally.

Fog computing makes accessing data faster, more efficient, and more reliable by keeping it in a local area rather than routing it to the cloud and back.  IBM and Cisco Systems are developing projects that would push computing to more local areas, such as routers, devices, and sensors.
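
The bandwidth saving described in the quote is easy to put into numbers. A hypothetical sketch: ten devices each pulling a 500 MB update over the internet, versus one local node pulling it once and sharing it over the LAN:

```python
def cloud_only(devices, update_mb):
    """Every device downloads the update over the WAN link."""
    return devices * update_mb

def fog_style(devices, update_mb):
    """One local node downloads once; peers fetch copies over the LAN,
    so only a single download crosses the internet."""
    return update_mb

devices, update_mb = 10, 500
print(cloud_only(devices, update_mb), "MB over the internet (cloud only)")
print(fog_style(devices, update_mb), "MB over the internet (fog style)")
```

The WAN traffic drops by a factor equal to the number of devices; the fast local network absorbs the rest, which is the whole appeal of keeping computation and data close to the edge.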

Considering that there are security issues with housing data on a third party’s digital storage unit, it would be better to locate a more local solution.  Kind of like back in the old days, when people housed their data on CPUs.

Whitney Grace, October 31, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Machine Learning Changes the Way We Learn from Data

October 26, 2016

The technology blog post from Daniel Miessler titled “Machine Learning is the New Statistics” strives to convey a sense of how crucial Machine Learning has become in terms of how we gather information about the world around us. Rather than dismissing Machine Learning as a buzzword, the author heralds it as an advancement in our ability to engage with the world around us. The article states,

So Machine Learning is not merely a new trick, a trend, or even a milestone. It’s not like the next gadget, instant messaging, or smartphones, or even the move to mobile. It’s nothing less than a foundational upgrade to our ability to learn about the world, which applies to nearly everything else we care about. Statistics greatly magnified our ability to do that, and Machine Learning will take us even further.

The article breaks down the stages of our ability to analyze our own reality: from randomly explaining events, to explanations based on the past, to explanations based on comparisons with numerous trends and metadata. It positions Machine Learning as the next stage, in which explanations still compare events but also progress by generating new models. The difference, of course, is that Machine Learning offers continuous model improvement. If you are interested, the blog also offers a Machine Learning primer.
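
The continuous model improvement the post describes is the core of online learning. A minimal sketch, assuming a one-parameter model y = w·x fit by stochastic gradient descent on a stream of observations generated from y = 3x:

```python
# Online learning: each new observation nudges the model, so the estimate
# improves continuously instead of being fixed after one batch fit.

def sgd_step(w, x, y, lr=0.05):
    """One gradient step on squared error for the model y_hat = w * x."""
    error = w * x - y
    return w - lr * error * x

w = 0.0
stream = [(1, 3), (2, 6), (3, 9), (1, 3), (2, 6), (3, 9)]  # noiseless y = 3x
for x, y in stream:
    w = sgd_step(w, x, y)
print(f"learned weight: {w:.2f}")     # moving toward the true value 3.0
```

Unlike a classical one-shot statistical fit, the model never stops: every new data point is another update, which is exactly the “less wrong over time” dynamic.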

Chelsea Kerwin, October 26, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

What Lurks in the Dark Web?

October 20, 2016

Organizations concerned about cyber security can effectively thwart threats only if they know a threat is lurking in the dark. An Israeli SaaS-based startup claims it can bridge this gap by offering real-time analysis of data on the Dark Web.

TechCrunch, in the article “Sixgill claims to crawl the Dark Web to detect future cybercrime,” says:

Sixgill has developed proprietary algorithms and tech to connect the Dark Web’s dots by analyzing so-called “big data” to create profiles and patterns of Dark Web users and their hidden social networks. It’s via the automatic crunching of this data that the company claims to be able to identify and track potential hackers who may be planning malicious and illegal activity.

By analyzing the data, Sixgill claims that it can identify illegal marketplaces, data leaks and also physical attacks on organizations using its proprietary algorithms. However, there are multiple loopholes in this type of setup.

First, some Dark Web actors can easily insert red herrings across communication channels to divert attention from real threats. Second, the Dark Web was created by individuals who wished to keep their communications cloaked; mining data and crunching it through algorithms would not be sufficient to keep organizations safe. Moreover, AI can only process data that has been mined by algorithms, which in many cases can be false. TOR is undergoing changes to increase the safeguards in place for its users. What’s beginning is a Dark Web arms race: a pattern of compromise will be followed by hardening, then compromise will occur again, and the Hegelian cycle repeats.

Vishal Ingole, October 20, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
