IBM Thinks Big on Data Unification

December 7, 2016

So far, the big data phenomenon has underwhelmed. We have developed several good ways to collect, store, and analyze data, but those efforts have produced separate, individually developed systems that do not play well together. IBM hopes to fix that, we learn from “IBM Announces a Universal Platform for Data Science” at Forbes. The company calls the project the Data Science Experience. Writer Greg Satell explains:

“Consider a typical retail enterprise, which has separate operations for purchasing, point-of-sale, inventory, marketing and other functions. All of these are continually generating and storing data as they interact with the real world in real time. Ideally, these systems would be tightly integrated, so that data generated in one area could influence decisions in another.

“The reality, unfortunately, is that things rarely work together so seamlessly. Each of these systems stores information differently, which makes it very difficult to get full value from data. To understand how, for example, a marketing campaign is affecting traffic on the web site and in the stores, you often need to pull it out of separate systems and load it into excel sheets.

“That, essentially, has been what’s been holding data science back. We have the tools to analyze mountains of data and derive amazing insights in real time. New advanced cognitive systems, like Watson, can then take that data, learn from it and help guide our actions. But for all that to work, the information has to be accessible.”
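The cross-silo stitching Satell describes can be sketched in a few lines. The field names and figures below are invented for illustration, assuming a simple inner join on a shared week key — exactly the manual step an integrated platform would remove.

```python
# Stitching two siloed record sets by hand, spreadsheet-style.
# All field names and figures here are made up for illustration.

campaigns = [
    {"week": 1, "campaign": "spring-sale", "spend": 1000},
    {"week": 2, "campaign": "spring-sale", "spend": 1500},
]
web_traffic = [
    {"week": 1, "visits": 8200},
    {"week": 2, "visits": 11900},
]

# Manual inner join on "week" -- the tedious part Satell complains about.
traffic_by_week = {row["week"]: row for row in web_traffic}
joined = [{**c, **traffic_by_week[c["week"]]}
          for c in campaigns if c["week"] in traffic_by_week]

for row in joined:
    print(row["week"], row["spend"], row["visits"])
```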

The article acknowledges that progress has been made in this area, citing open-source Hadoop and its companion data-processing engine, Spark, for their ability to tap into clusters of data around the world and analyze that data as a single set. Incompatible systems, however, still vex many organizations.

The article closes with an interesting observation—that many business people’s mindsets are stuck in the past. Planning far ahead is considered prudent, as is taking ample time to make any big decision. Technology has moved past that, though, and now such caution can render the basis for any decision obsolete as soon as it is made. As Satell puts it, we need “a more Bayesian approach to strategy, where we don’t expect to predict things and be right, but rather allow data streams to help us become less wrong over time.” Can the humans adapt to this way of thinking? It is reassuring to have a plan; I suspect only the most adaptable among us will feel comfortable flying by the seat of our pants.
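Satell’s “less wrong over time” idea maps neatly onto Bayesian updating. Below is a minimal sketch in Python; the conversion counts are made up, and the beta-binomial model is simply one common way to let a data stream sharpen an estimate batch by batch.

```python
# Beta-binomial updating: each new batch of evidence refines the estimate.
# Illustrative sketch; the daily conversion counts below are invented.

def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with new observations."""
    return alpha + successes, beta + failures

# Start nearly ignorant about a campaign's conversion rate.
alpha, beta = 1.0, 1.0
daily_batches = [(12, 88), (9, 91), (15, 85)]  # (conversions, misses) per day

for conversions, misses in daily_batches:
    alpha, beta = update_beta(alpha, beta, conversions, misses)
    estimate = alpha / (alpha + beta)
    print(f"estimated conversion rate so far: {estimate:.3f}")
```

No single day’s estimate is “right,” but each batch narrows the error — the “less wrong over time” posture Satell recommends.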

Cynthia Murrell, December 7, 2016

Google Cloud, Azure, and AWS Differences

October 18, 2016

With so many options for cloud computing, it can be hard to decide which one to use for your personal or business files.  Three of the most popular cloud computing options are Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure.  Beyond the pricing, the main differences lie in what services they offer and what they name them.  SitePoint did us a favor with its article comparing the different cloud services: “A Side-By-Side Comparison Of AWS, Google Cloud, And Azure.”

Cloud computing has the great benefit of offering flexible pricing options, but those options can often get very intricate based on how much processing power you need, how many virtual servers you deploy, where they are deployed, etc.  AWS, Azure, and Google Cloud do offer canned solutions along with individual ones.

AWS has the most extensive service array, but they are also the most expensive.  It is best to decide how you want to use cloud computing because prices will vary based on the usage and each service does have specializations.  All three are good for scalable computing on demand, but Google is less flexible in its offering, although it is easier to understand the pricing.  Amazon has the most robust storage options.

When it comes to big data:

This requires very specific technologies and programming models, one of which is MapReduce, which was developed by Google, so maybe it isn’t surprising to see Google walking forward in the big data arena by offering an array of products — such as BigQuery (managed data warehouse for large-scale data analytics), Cloud Dataflow (real-time data processing), Cloud Dataproc (managed Spark and Hadoop), Cloud Datalab (large-scale data exploration, analysis, and visualization), Cloud Pub/Sub (messaging and streaming data), and Genomics (for processing up to petabytes of genomic data). Elastic MapReduce (EMR) and HDInsight are Amazon’s and Azure’s take on big data, respectively.
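For readers who have not seen it, the MapReduce model the quote mentions boils down to three phases, sketched here as a toy word count in plain Python. Services like EMR, HDInsight, and Cloud Dataproc run this same pattern across clusters; this is only the core idea.

```python
# Minimal MapReduce-style word count: map emits (key, value) pairs,
# shuffle groups them by key, reduce aggregates each group.
from collections import defaultdict

def map_phase(document):
    # Emit one ("word", 1) pair per word in the document.
    return [(word, 1) for word in document.lower().split()]

def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts for each word.
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data big algorithms", "big data big platforms"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)  # {'big': 4, 'data': 2, 'algorithms': 1, 'platforms': 1}
```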

Without getting too much into the nitty gritty, each of the services has its strengths and weaknesses.  If one of the canned solutions does not work for you, read the fine print to learn how cloud computing can help your project.

Whitney Grace, October 18, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data Is Just a Myth

August 1, 2016

Remember how in the 1979 hit The Muppet Movie there was a running gag where Kermit the Frog kept saying, “It’s a myth.  A myth!”  Then a woman named Myth would appear out of nowhere and say, “Yes?”  It was a funny, random gag, but while it is a myth that frogs give warts, the myths related to big data may or may not be.  Data Science Central decided to explain some of them in “Debunking The 68 Most Common Myths About Big Data-Part 2.”

Some of the myths debunked in the first part were that big data is the newest power word, an end-all solution for companies, only meant for big companies, and complicated and expensive.  In truth, anyone can benefit from big data with a decent implementation plan and someone who knows how to take charge of it.

Big data, in fact, can be integrated with preexisting systems, although it takes time and knowledge to link the new and the old together (it is not as difficult as it seems).  Along the same lines, users need to realize that there is no one-size-fits-all big data solution.  Big data is a solution that requires analytical, storage, and other software.  It cannot be purchased like other proprietary software, and it needs to be individualized for each organization.

One myth that has turned into truth is that big data relies on Hadoop storage.  Hadoop used to be one option among many, but now it is an integral piece of software needed to get the big data job done.  One of the most prevalent myths is that big data belongs only in the IT department:

“Here’s the core of the issue.  Big Data gives companies the greatly enhanced ability to reap benefits from data-driven insights and to make better decisions.  These are strategic issues.

You know who is most likely to be clamoring for Big Data?  Not IT.  Most likely it’s sales, marketing, pricing, logistics, and production forecasting.  All areas that tend to reap outsize rewards from better forward views of the business.”

Big data is becoming an essential tool for organizations in every field, as it tells them more about how they operate and about their shortcomings.  Big data offers a very detailed examination of these issues; the biggest question users need to answer is how they will use it.

 

Whitney Grace, August 1, 2016

Hewlett Packard Makes Haven Commercially Available

July 19, 2016

The article on InformationWeek titled “HPE’s Machine Learning APIs, MIT’s Sports Analytics Trends: Big Data Roundup” analyzes Haven OnDemand, a large part of Hewlett Packard Enterprise’s big data strategy. For a look at the smart software coming out of HP Enterprise, check out this video. The article states,

“HPE’s announcement this week brings HPE Haven OnDemand as a service on the Microsoft Azure platform and provides more than 60 APIs and services that deliver deep learning analytics on a wide range of data, including text, audio, image, social, Web, and video. Customers can start with a freemium service that enables development and testing for free, and grow into a usage and SLA-based commercial model for enterprises.”

You may notice from the video that the visualizations look a great deal like Autonomy IDOL’s visualizations from the early 2000s; that is, dated, especially when compared to visualizations from other firms. But IDOL may have a new name: Haven. According to the article, that name is actually a relaxed acronym for Hadoop, Autonomy IDOL, HP Vertica, Enterprise Security Products, and “n” or infinite applications. HPE promises that this cloud platform with machine learning APIs will assist companies in growing mobile and enterprise applications. The question is, “Can 1990s technology provide what 2016 managers expect?”

 

Chelsea Kerwin, July 19, 2016


There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.

Enterprise Search Vendor Sinequa Partners with MapR

June 8, 2016

In the world of enterprise search and analytics, everyone wants in on the clients who have flocked to Hadoop for data storage. Virtual Strategy shared an article announcing “Sinequa Collaborates With MapR to Power Real-Time Big Data Search and Analytics on Hadoop.” Sinequa, a firm specializing in big data, has become certified on the MapR Converged Data Platform. The interoperation of Sinequa’s solutions with MapR will enable actionable information to be gleaned from data stored in Hadoop. We learned,

“By leveraging advanced natural language processing along with universal structured and unstructured data indexing, Sinequa’s platform enables customers to embark on ambitious Big Data projects, achieve critical in-depth content analytics and establish an extremely agile development environment for Search Based Applications (SBA). Global enterprises, including Airbus, AstraZeneca, Atos, Biogen, ENGIE, Total and Siemens have all trusted Sinequa for the guidance and collaboration to harness Big Data to find relevant insight to move business forward.”

Beyond all the enterprise search jargon in this article, the collaboration between Sinequa and MapR appears to offer an upgraded service to customers. As we all know at this point, unstructured data indexing is key to data intake. When it comes to output, however, the real value lies in solutions that can support informed business decisions.
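As a rough illustration of what “unstructured data indexing” means in practice, here is a toy inverted index in Python. Real platforms such as Sinequa’s add natural language processing, ranking, and cluster scale; treat this only as the core data structure, with invented sample documents.

```python
# A toy inverted index: the core structure behind unstructured-text search.
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, *terms):
    """Return ids of documents containing all query terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

docs = {1: "quarterly revenue report", 2: "revenue forecast", 3: "forecast memo"}
index = build_index(docs)
print(search(index, "revenue"))              # {1, 2}
print(search(index, "revenue", "forecast"))  # {2}
```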

 

Megan Feil, June 8, 2016


 

The Pros and Cons of Data Silos When It Comes to Data Analysis and Management

February 22, 2016

The article on Informatica Blog titled “Data Silos Are the Death of Analytics. Here’s the Fix” explores the often overlooked need for a thorough data management vision and strategy at any competitive business. The article is plugging for an eBook guide to data analytics, but it does go into some detail on the early stages of streamlining the data management approach, summarized by the advice to avoid data silos. The article explains,

“It’s vital to pursue a data management architecture that works across any type of data, BI tool, or storage technology. If the move to add Hadoop or NoSQL demands entirely different tools to manage the data, you’re at risk of creating another silo…When you’ve got different tools for your traditional data warehouse versus your cloud setup, and therefore different skill sets to hire for, train for, and maintain, you’re looking at a real mess.”

The suggestions for streamlined processes and analysis certainly make sense, but the article does not address the reasons data silos get built, such as power, control, and secrecy. Nor does it consider that in some cases a firm is required to create data silos to comply with a government contract. But it is a nice thought: one big collection of data, one comprehensive data strategy. Maybe.

 
Chelsea Kerwin, February 22, 2016


Hello, Big Algorithms

January 15, 2016

The year has barely started and it looks like we already have a new buzzword to nestle into our ears: big algorithms.  The term algorithm has been tossed around with big data as one of the driving forces behind powerful analytics.  Big data is an encompassing term that refers to privacy, security, search, analytics, organization, and more.  The real power, however, lies in the algorithms.  Benchtec posted the article “Forget Big Data-It’s Time For Big Algorithms” to explain how algorithms are stealing the scene.

Data is useless unless you are able to pull something out of it.  The only way to get the meat off the bone is to use algorithms.  Algorithms might be the powerhouses behind big data, but they are not unique; the individual data belonging to different companies is.

“However, not everyone agrees that we’ve entered some kind of age of the algorithm.  Today competitive advantage is built on data, not algorithms or technology.  The same ideas and tools that are available to, say, Google are freely available to everyone via open source projects like Hadoop or Google’s own TensorFlow…infrastructure can be rented by the minute, and rather inexpensively, by any company in the world. But there is one difference.  Google’s data is theirs alone.”
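The quote’s point — that the algorithm is a commodity while the data is proprietary — is easy to demonstrate. The least-squares fit below is textbook code anyone can copy; all of the competitive value would come from the observations fed into it, which here are invented for illustration.

```python
# Ordinary least squares via gradient descent: a commodity algorithm.
# Anyone can copy this code; the edge is in the data (made up here).

def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y ~= w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Any company can run identical code on its own proprietary observations.
w, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # underlying rule: y = 2x + 1
print(f"w={w:.2f}, b={b:.2f}")  # recovers w=2.00, b=1.00
```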

Algorithms are ingrained in our daily lives, from the apps run on smartphones to how retailers gather consumer detail.  Algorithms are a massive untapped market, the article says.  One algorithm can be adapted and implemented for different fields.  The article ends on a socially conscious message about using algorithms for good, not evil.  The sentiment feels a bit forced here, but it does spur some thoughts about how algorithms could be used to study issues related to global epidemics, war, disease, food shortages, and the environment.

Whitney Grace, January 15, 2016

Use the Sentiment Analysis, Luke

December 22, 2015

The newest Star Wars film is out in theaters and any credible Star Wars geek has probably seen the film at least twice.  One theme that continues to be prevalent in the franchise is the use of the mystical, galactic power the Force.  The Force gives the Jedi special powers, such as the ability to read a person’s mind.  Computer Weekly says that data will be able to do the same thing in: “Sentiment Analysis With Hadoop: 5 Steps To Becoming A Mind Reader.”

While the article title reads more like a kit on how to become a psychic cheat, sentiment analysis has proven able to predict a person’s actions, especially shopping habits.  Sentiment analysis is a huge market for companies wanting to learn how to reach their shoppers on a more intimate level, predict trends before they happen, and connect with shoppers in real time.  Apache Hadoop is a tool used to harness the power of data to make anyone with the right knowledge a mind reader, and Twitter is one of the data sources used.

First, collect the data; second, label the data to create a data dictionary with positive or negative annotations; third, run analytics; fourth, run through a beta phase; and fifth, get the insights. While it sounds easy, the fourth step is going to be the biggest hassle:

“Remember that analytic tools that just look for positive or negative words can be entirely misleading if they miss important context. Typos, intentional misspellings, emoticons and jargon are just few additional obstacles in the task.

Computers also don’t understand sarcasm and irony and as a general rule are yet to develop a sense of humor. Too many of these and you will lose accuracy. It is probably best to address this point by fine-tuning your model.”
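The five steps above can be sketched with a simple dictionary-based scorer. The tiny lexicon and sample tweets below are invented, and, as the quote warns, a real model would need far more than word matching to survive sarcasm, typos, and jargon.

```python
# Dictionary-based sentiment scoring: the labeled-dictionary approach
# the article's five steps describe. Lexicon and tweets are invented.

LEXICON = {"love": 1, "great": 1, "awesome": 1,
           "hate": -1, "awful": -1, "broken": -1}

def score(text):
    """Sum the polarity of known words; > 0 is positive, < 0 negative."""
    return sum(LEXICON.get(word.strip(".,!?"), 0)
               for word in text.lower().split())

tweets = ["Love the new store layout!", "Awful checkout line, just awful."]
for tweet in tweets:
    label = "positive" if score(tweet) > 0 else "negative"
    print(label, "->", tweet)
```

This is the easy part; the fine-tuning the quote recommends is where the real work of the fourth step lives.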

The purpose of sentiment analysis is teaching software how to “think” like a human and understand all our illogical ways.  (Hmm…that was a Star Trek reference, whoops!)  Apache Hadoop might not have lightsabers or help you find droids, but it does offer to help understand consumers’ spending habits.  So how about, “These are the greenbacks you have been looking for.”

Whitney Grace, December 22, 2015

Old School Mainframes Still Key to Big Data

December 17, 2015

According to ZDNet, “The Ultimate Answer to the Handling of Big Data: The Mainframe.” Believe it or not, a recent survey of 187 IT pros by Syncsort found the mainframe to be important to their big data strategies. IBM has even created a Hadoop-capable mainframe. Reporter Ken Hess lists some of the survey’s findings:

*More than two-thirds of respondents (69 percent) ranked the use of the mainframe for performing large-scale transaction processing as very important

*More than two-thirds (67.4 percent) of respondents also pointed to integration with other standalone computing platforms such as Linux, UNIX, or Windows as a key strength of the mainframe

*While the majority (79 percent) analyze real-time transactional data from the mainframe with a tool that resides directly on the mainframe, respondents are also turning to platforms such as Splunk (11.8 percent), Hadoop (8.6 percent), and Spark (1.6 percent) to supplement their real-time data analysis […]

*82.9 percent and 83.4 percent of respondents cited security and availability as key strengths of the mainframe, respectively

*In a weighted calculation, respondents ranked security and compliance as their top areas to improve over the next 12 months, followed by CPU usage and related costs and meeting Service Level Agreements (SLAs)

*A separate weighted calculation showed that respondents felt their CIOs would rank all of the same areas in their top three to improve

Hess goes on to note that most of us probably utilize mainframes without thinking about it; whenever we pull cash out of an ATM, for example. The mainframe’s security and scalability remain unequaled, he writes, by any other platform or platform cluster yet devised. He links to a couple of resources besides the Syncsort survey that support this position: a white paper from IBM’s Big Data & Analytics Hub and a report from research firm Forrester.

 

Cynthia Murrell, December 17, 2015


 

Apple May Open up on Open Source

October 27, 2015

Is Apple ready to openly embrace open source? MacRumors reports, “Apple Building Unified Cloud Platform for iCloud, iTunes, Siri and More.” Writer Joe Rossignol cites a new report from The Information that indicates the famously secretive company may be opening up to keep up with the cloudy times. He writes:

“The new platform is based on Siri, which itself is powered by open source infrastructure software called Mesos on the backend, according to the report. Apple is reportedly placing more emphasis on open source software in an attempt to attract open source engineers that can help improve its web services, but it remains to be seen how far the company shifts away from its deep culture of secrecy.

“The paywalled report explains how Apple is slowly embracing the open source community and becoming more transparent about its open source projects. It also lists some of the open source technologies that Apple uses, including Hadoop, HBase, Elasticsearch, Riak, Kafka, Azkaban and Voldemort.”

Rossignol goes on to note that, according to Bloomberg, Apple is working on a high-speed content delivery network and upgrading data centers to better compete with its rivals in the cloud, like Amazon, Google, and Microsoft. Will adjusting its stance on open-source allow it to keep up?

Cynthia Murrell, October 27, 2015

