Apache Sparking Big Data
April 3, 2015
Apache Spark is an open source cluster computing framework that rivals MapReduce. VentureBeat says that people did not pay much attention to Apache Spark when it was first invented at the University of California’s AMPLab in 2011. The article, “How An Early Bet On Apache Spark Paid Off Big,” reports that big data open source supporters are adopting Apache Spark because of its superior capabilities.
People with big data plans want systems that process real-time information quickly, and they want a lot of it done at once. MapReduce can do this, but it was not designed for it. It is all right for batch processing, but it is slow and much too complex to be a viable solution.
“When we saw Spark in action at the AMPLab, it was architecturally everything we hoped it would be: distributed, in-memory data processing speed at scale. We recognized we’d have to fill in holes and make it commercially viable for mainstream analytics use cases that demand fast time-to-insight on hordes of data. By partnering with AMPLab, we dug in, prototyped the solution, and added the second pillar needed for next-generation data analytics, a simple to use front-end application.”
ClearStory Data was built using Apache Spark to access data quickly, deliver key insights, and make the UI very user friendly. People who use Apache Spark want information from a variety of sources immediately so they can use it for profit. Apache Spark might ignite the fire for the next wave of data analytics for big data.
Whitney Grace, April 3, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
EBay Develops Open Source Pulsar for Real Time Data Analysis
April 2, 2015
A new large-scale, real-time analytics platform has been launched in response to one huge company’s huge data needs. VentureBeat reports, “EBay Launches Pulsar, an Open-Source Tool for Quickly Taming Big Data.” EBay has made the code available under an open-source license. It seems traditional batch processing systems, like those found in the widely used open-source Hadoop, just won’t cut it for eBay. That puts the company in good company; Google, Microsoft, Twitter, and LinkedIn have each also created their own stream-processing systems.
Shortly before the launch, eBay released a whitepaper on the project, “Pulsar—Real-time Analytics at Scale.” It describes the what and why behind Pulsar’s design; check it out for the technical details. The whitepaper summarizes itself:
“In this paper we have described the data and processing model for a class of problems related to user behavior analytics in real time. We describe some of the design considerations for Pulsar. Pulsar has been in production in the eBay cloud for over a year. We process hundreds of thousands of events/sec with a steady state loss of less than 0.01%. Our pipeline end to end latency is less than a hundred milliseconds measured at the 95th percentile. We have successfully operated the pipeline over this time at 99.99% availability. Several teams within eBay have successfully built solutions leveraging our platform, solving problems like in-session personalization, advertising, internet marketing, billing, business monitoring and many more.”
For updated information on Pulsar, monitor their official website at gopulsar.io.
Cynthia Murrell, April 2, 2015

