Hadoop Rounds Up Open Source Goodies

July 17, 2015

Summertime is here, and what better way to celebrate the warm weather and fun in the sun than with some fantastic open source tools?  Okay, so you probably will not take your computer to the beach, but if you have a vacation planned, one of these tools might help you complete your work faster so you can get closer to that umbrella and cocktail.  Datamation has a great listicle focused on “Hadoop And Big Data: 60 Top Open Source Tools.”

Hadoop is one of the most widely adopted open source tools for big data solutions.  The Hadoop market is expected to be worth $1 billion by 2020, and IBM has dedicated 3,500 employees to developing Apache Spark, part of the Hadoop ecosystem.

As open source is a huge part of the Hadoop landscape, Datamation’s list provides invaluable information on tools that could mean the difference between a successful project and a failed one.  They could also save some extra cash on the IT budget.

“This area has a seen a lot of activity recently, with the launch of many new projects. Many of the most noteworthy projects are managed by the Apache Foundation and are closely related to Hadoop.”

Datamation has maintained this list for a while, and they update it from time to time as the industry changes.  The list is not ranked from best to worst; rather, the tools are grouped into categories, and a short description explains what each tool does. The categories include: Hadoop-related tools, big data analysis platforms and tools, databases and data warehouses, business intelligence, data mining, big data search, programming languages, query engines, and in-memory technology.  There is a tool for nearly every sort of problem that could come up in a Hadoop environment, so the listicle is definitely worth a glance.

Whitney Grace, July 17, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


How Not to Drive Users Away from a Website

July 15, 2015

Writer and web psychologist Liraz Margalit at the Next Web has some important advice for websites in “The Psychology Behind Web Browsing.” Apparently, paying attention to human behavioral tendencies can help webmasters avoid certain pitfalls that could damage their brands. Imagine that!

The article cites a problem an unspecified news site encountered when it tried to build interest in its videos by making them play automatically when a user navigated to its homepage. I suspect I know who they’re talking about, and I recall thinking at the time, “how rude!” I thought it was just because I didn’t want to be chastised by people near me for suddenly blaring a news video. According to Margalit, though, my problem goes much deeper: It’s an issue of control rooted in pre-history. She writes:

“The first humans had to be constantly on alert for changes in their environment, because unexpected sounds or sights meant only one thing: danger. When we click on a website hoping to read an article and instead are confronted with a loud, bright video, the automatic response is not so different from that of our prehistoric ancestors, walking in the forest and stumbling upon a bear or a saber-toothed hyena.”

This need for safety has morphed into a need for control; we do not like to be startled or lost. When browsing the Web, we want to encounter what we expect to encounter (perhaps not in terms of content, but certainly in terms of format). The name for this is the “expectation factor,” and an abrupt assault on the senses is not the only pitfall to be avoided. Getting lost in an endless scroll can also be disturbing; that’s why those floating menus that follow you as you move down the page were invented. Margalit notes:

“Visitors like to think they are in charge of their actions. When a video plays without visitors initiating any interaction, they feel the opposite. If a visitor feels that a website is trying to ‘sell’ them something, or push them into viewing certain content without permission, they will resist by trying to take back the interaction and intentionally avoid that content.”

And that, of course, is the opposite of what websites want, so giving users the control they expect is a smart business move. Besides, it’s only polite to ask before engaging a visitor’s Adobe Flash or, especially, speakers.

Cynthia Murrell, July 15, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Short Honk: Saudi Supercomputer

July 14, 2015

In order to crunch text and do large scale computations, a fast computer is a useful tool. Engineering & Technology Magazine reported in “Saudi Machine Makes It on to World’s Top Ten Supercomputer List”:

The Shaheen II is the first supercomputer based in the Middle East to enter the world’s top ten list, debuting at number seven. The Saudi supercomputer is based at King Abdullah University of Science and Technology and is the seventh most powerful computer on the planet, according to the Top 500 organization that monitors high-performance machines. China’s Tianhe-2 kept its position as the most powerful supercomputer in the world in the latest rankings.

If you are monitoring the supercomputer sector, this announcement, if accurate, is important in my opinion. There are implications for content processing, social graph generation, and other interesting applications.

Stephen E Arnold, July 14, 2015

Algorithmic Art Historians

July 14, 2015

Apparently, creativity itself is no longer subjective. MIT Technology Review announces, “Machine Vision Algorithm Chooses the Most Creative Paintings in History.” Traditionally, art historians judge how creative a work is based on its novelty and its influence on subsequent artists. The article notes that this is a challenging task, requiring an encyclopedic knowledge of art history and the judgment to decide what is novel and what has been influential. Now, a team at Rutgers University has developed an algorithm they say is qualified for the job.

Researchers Ahmed Elgammal and Babak Saleh credit several developments with bringing AI to this point. First, we’ve recently seen several breakthroughs in machine understanding of visual concepts, called classemes, which include recognition of factors from colors to specific objects. Another important factor: there now exist well-populated online artwork databases that the algorithms can, um, study. The article continues:

“The problem is to work out which paintings are the most novel compared to others that have gone before and then determine how many paintings in the future use similar features to work out their influence. Elgammal and Saleh approach this as a problem of network science. Their idea is to treat the history of art as a network in which each painting links to similar paintings in the future and is linked to by similar paintings from the past. The problem of determining the most creative is then one of working out when certain patterns of classemes first appear and how these patterns are adopted in the future. …

“The problem of finding the most creative paintings is similar to the problem of finding the most influential person on a social network, or the most important station in a city’s metro system or super spreaders of disease. These have become standard problems in network theory in recent years, and now Elgammal and Saleh apply it to creativity networks for the first time.”
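The network-centrality framing the article describes can be illustrated with a toy sketch. This is not Elgammal and Saleh’s actual method; it is a PageRank-style power iteration over an invented three-painting “history,” where an edge (earlier, later) means the later painting reuses visual features (classemes) the earlier one introduced.

```python
# Hedged sketch of influence scoring as network centrality.
# Graph and painting labels are invented for illustration.

def influence_scores(edges, nodes, damping=0.85, iters=50):
    """A painting is influential if many (themselves influential)
    later paintings link back to it -- a PageRank-style iteration."""
    credits_to = {n: [] for n in nodes}   # n -> later works crediting n
    out_degree = {n: 0 for n in nodes}    # how many earlier works n credits
    for earlier, later in edges:
        credits_to[earlier].append(later)
        out_degree[later] += 1

    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        score = {
            n: (1 - damping) / len(nodes)
               + damping * sum(score[m] / out_degree[m]
                               for m in credits_to[n])
            for n in nodes
        }
    return score

# Invented toy history: A influenced B and C; B also influenced C.
scores = influence_scores(
    edges=[("A", "B"), ("A", "C"), ("B", "C")],
    nodes=["A", "B", "C"],
)
```

In this toy graph, painting A scores highest because both later works trace features back to it, which is the same intuition as finding the most influential person in a social network.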

Just what we needed. I have to admit the technology is quite intriguing, but I wonder: Will all creative human endeavors eventually have their algorithmic counterparts and, if so, how will that affect human expression?

Cynthia Murrell, July 14, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

SAS Explains Big Data. Includes Cartoon, Excludes Information about Cost

July 13, 2015

I know that it is easy to say Big Data. It is easy to say Hadoop. It is easy to make statements in marketing collateral, in speeches, and in blogs written by addled geese. Honk!


I wish to point out that any use of these terms in the same sentence requires an important catalyst: money. Money that has been, in the words of the government procurement officer, “Allocated, not just budgeted.”

Here are the words:

  1. Big Data
  2. Hadoop
  3. Unstructured data.

Point your monitored browser at “Marketers Ask: What Can Hadoop Do That My Data Warehouse Can’t?” The write up originates with SAS. From a company anchored in statistics, I expect some familiarity with numbers. (Yep, just like the class you have blocked from your mind. The mid term? What mid term?)

The write up points out that unstructured data comes in many flavors. This chart, complete with cartoon, identifies 15 content types. I was amazed. Just 15. What about the data in that home brew content management system or tucked in the index of the no longer supported DEC 20 TIPS system? Yes, that data.


How does Hadoop deal with the orange and blue? Pretty well but you and the curious marketer must attend to three steps. Count ‘em off, please:

  1. Identify the business issue. I think this means know what problem one is trying to solve. This is a good idea, but I think most marketing problems boil down to generating revenue and proving it to senior management. Marketing looks for silver bullets when the sales are not dropping from the sky like packages for the believers in the Cargo Cult.
  2. Get top management support. Yep, this is a good idea because the catalyst—money—has to be available to clean, acquire, and load the goodies in the blue boxes and the wonky stuff from the home brew CMS.
  3. Develop a multi-play plan. I think this means that the marketer has zero clue how complicated the Hadoop magic is. The excitement of extract, transform, and load. The thrill of batch processing awaits. Then the joy of looking at outputs which baffle the marketer more comfortable selecting colors and looking at Adwords’ reports than Hadoop data.

My thought is that SAS understands data, statistical methods, and the reality of a revolution which is taking place without the strictures of SAS approaches.

I do like the cartoon. I do not like the omission of the money part of the task. Doing the orange and blue thing for marketers is expensive. Do the marketers know this?

Nope.

Stephen E Arnold, July 13, 2015

Watson Based Tradeoff Analytics Weighs Options

July 13, 2015

IBM’s Watson now lends its considerable intellect to helping users make sound decisions. In “IBM Watson Tradeoff Analytics—General Availability,” the Watson Developer Community announces that the GA release of this new tool can be obtained through the Watson Developer Cloud platform. The release follows an apparently successful Beta run that began last February. The write-up explains that the tool:

“… Allows you to compare and explore many options against multiple criteria at the same time. This ultimately contributes to a more balanced decision with optimal payoff.

“Clients expect to be educated and empowered: ‘don’t just tell me what to do,’ but ‘educate me, and let me choose.’ Tradeoff Analytics achieves this by providing reasoning and insights that enable judgment through assessment of the alternatives and the consequent results of each choice. The tool identifies alternatives that represent interesting tradeoff considerations. In other words: Tradeoff Analytics highlights areas where you may compromise a little to gain a lot. For example, in a scenario where you want to buy a phone, you can learn that if you pay just a little more for one phone, you will gain a better camera and a better battery life, which can give you greater satisfaction than the slightly lower price.”
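IBM’s service is proprietary, but the behavior the write-up describes, surfacing options where you “compromise a little to gain a lot,” rests on the standard notion of Pareto dominance. Here is a hedged sketch under that assumption; the phone data and field names are invented.

```python
# Hedged sketch, not IBM's implementation: keep only the options
# that no other option beats on every criterion at once.

def pareto_front(options, maximize):
    """Return the non-dominated options.
    `maximize` maps each criterion name to True (higher is better)
    or False (lower is better)."""
    def dominates(a, b):
        at_least_as_good = all(
            (a[k] >= b[k]) if maximize[k] else (a[k] <= b[k])
            for k in maximize
        )
        strictly_better = any(
            (a[k] > b[k]) if maximize[k] else (a[k] < b[k])
            for k in maximize
        )
        return at_least_as_good and strictly_better

    return [o for o in options
            if not any(dominates(other, o) for other in options)]

# Invented phone-shopping scenario from the write-up's example.
phones = [
    {"name": "A", "price": 300, "battery": 10},
    {"name": "B", "price": 350, "battery": 14},  # a little more, better battery
    {"name": "C", "price": 400, "battery": 9},   # dominated by A
]
front = pareto_front(phones, {"price": False, "battery": True})
```

Phone C drops out because A is both cheaper and longer-lived; A and B survive as genuine tradeoffs, which is exactly the “pay a little more, gain a better battery” judgment the tool is meant to enable.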

For those interested in the technical details behind this Watson iteration, the article points you to the Tradeoff Analytics documentation. Those wishing to glimpse the visualization capabilities can navigate to this demo. The write-up also lists post-beta updates and explains pricing, so check it out for more information.

Cynthia Murrell, July 13, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Dealing with Company and Product Identity: Terbium Labs Nails It

July 11, 2015

Navigate to www.terbiumlabs.com and read about the company.


Nifty name. Very nifty name indeed. Now, a bit of branding commentary.

I used to work at Halliburton Nuclear. Ah, the good old days of nuclear engineers poking fun at civil engineers and mathematicians not understanding any joke made by the computer engineers.

The problem of naming companies in high technology disciplines is a very big one. Before Halliburton gobbled up the Nuclear Utility Services outfit, the company with more than 400 nuclear engineers on staff struggled with its name. Nuclear Utility Services was abbreviated to NUS. A pretty sharp copywriter named Richard Harrington of the dearly loved Ketchum, McLeod and Gove ad agency came up with this catchy line:

After the EPA, call NUS.

The important point is that Mr. Harrington, a whiz person, wanted to have people read each letter: E-P-A, not say “eepa,” and say N-U-S, not “noose.” In Japanese, the sound “nus” has a negative meaning usually applied to pressurized body odor emissions. Not good.

Search and content processing vendors struggle with names. I have written about outfits which have fumbled the branding ball. Examples include Thunderstone, whose name has been usurped by a gaming company; Brainware, which has been snagged and used for interesting videos; and Smartlogic, whose name has been appropriated by a smaller outfit doing marketing/design stuff. There are names which are impossible to find; for example, i2, AMI, and ChaCha, to name a few among many.

I want to call attention to a quite useful product naming which I learned about recently. Navigate to TerbiumLabs.com. Consider the word Terbium. Look for the word “Matchlight.”

I find Terbium a darned good word because terbium is an element, which my old (and I mean old) chemistry professor pronounced “ter-beem.” The element has a number of useful applications. Think solid state devices and a magic ingredient in some rocket fuels and—okay, okay—some explosives.

But as good as “terbium” is for a company name, I absolutely delight in this product name:

Matchlight.

Now what’s Matchlight, and why should anyone care? My hunch is that the technology, which allows a next generation approach to content identification and other functions, works to

  • light a match in the wilderness
  • illuminate a dark space
  • start a camp fire so I can cook a goose

You can and should learn more about Terbium Labs and its technology. The names will help you remember.

Important company; important technology. Matchlight is a great name. (Hear that, search and content processing vendors with dud names?)

Stephen E Arnold, July 11, 2015

Business Intelligence: The Grunt Work? Time for a Latte

July 10, 2015

I read “One Third of BI Pros Spend Up to 90% of Time Cleaning Data.” Well, well, well. Good old and frail eWeek has reported what those involved in data work have known for what? Decades, maybe centuries? The write up states with typical feather duster verbiage:

A recent survey commissioned by data integration platform provider Xplenty indicates that nearly one-third of business intelligence (BI) professionals are little more than “data janitors,” as they spend a majority of their time cleaning raw data for analytics.

What this means is that the grunt work in analytics still has to be done. This is difficult and tedious work even with normalization tools and nifty hand crafted scripts. Who wants to do this work? Not the MBAs who need slick charts to nail their bonus. Not the frantic marketer who has to add some juice to the pale and wan vice president’s talk at the Rotary Club. Not anyone, except those who understand the importance of scrutinizing data.

The write up points out that extract, transform, and load functions (ETL in the jingoism of Sillycon Valley) are work. Guess what? The eWeek story uses these words to explain what the grunt work entails:

  • Integrating data from different platforms
  • Transforming data
  • Cleansing data
  • Formatting data.
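The four bullets above are easy to say and tedious to do. As a hedged sketch of what the “data janitor” actually spends time on, consider merging records from two invented source systems, each with its own date format and its own idea of how to write a dollar amount. The field names and formats here are made up for illustration.

```python
# Hedged sketch of ETL grunt work: integrate, cleanse, and format
# records from two invented sources into one consistent shape.
from datetime import datetime

def clean_record(raw):
    """Trim whitespace, unify the date format, and coerce the
    revenue field to a number."""
    date_raw = raw["date"].strip()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):   # two source systems, two formats
        try:
            date = datetime.strptime(date_raw, fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        date = None  # flag for manual review rather than guessing
    return {
        "customer": raw["customer"].strip().title(),
        "date": date,
        "revenue": float(str(raw["revenue"]).replace(",", "").replace("$", "")),
    }

# Integrating: the "same" sale as reported by two different platforms.
source_a = {"customer": "  acme corp ", "date": "2015-07-10", "revenue": "$1,200.50"}
source_b = {"customer": "ACME CORP",    "date": "07/10/2015", "revenue": 1200.5}
cleaned = [clean_record(r) for r in (source_a, source_b)]
```

After cleansing, the two records are identical and can be deduplicated; multiply this by a terabyte a day of social media or intercept data and the 90 percent figure stops looking surprising.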

But here’s the most important item in the article: If the report on which the article is based is correct, 21 percent of the data require special care and feeding. How’s that grab you for a task when you are pumping a terabyte of social media or intercept data a day? Right. Time for a bit of Facebook and a trip to Starbucks.

What happens if the data are not shipshape? Well, think about the fine decisions flowing from organizations which are dependent on data analytics. Why not chase down good old United Airlines and ask the outfit if anyone processed log files for the network which effectively grounded all flights? Know anyone at the Office of Personnel Management? You might ask the same question.

Ignoring data or looking at outputs without going through the grunt work is little better than guessing. No, wait. Guessing would probably return better outcomes. Time for some Foosball.

Stephen E Arnold, July 10, 2015

SAS Text Miner Promises Unstructured Insight

July 10, 2015

Big data tools help organizations analyze more than their old, legacy data.  While legacy data does help an organization study how its processes have changed, the data is old and does not reflect immediate, real time trends.  SAS offers a product that bridges old data with the new, as well as unstructured data with structured data.

The SAS Text Miner is built from Teragram technology.  Its features include document theme discovery, a function that finds relations between document collections; automatic Boolean rule generation; high performance text mining that quickly evaluates large document collections; term profiling and trending, which evaluates term relevance in a collection and how terms are used; multiple language support; visual interrogation of results; easy text import; flexible entity options; and a user friendly interface.

The SAS Text Miner is specifically programmed to discover data relationships, automate activities, and determine keywords and phrases.  The software uses predictive models to analyze data and discover new insights:

“Predictive models use situational knowledge to describe future scenarios. Yet important circumstances and events described in comment fields, notes, reports, inquiries, web commentaries, etc., aren’t captured in structured fields that can be analyzed easily. Now you can add insights gleaned from text-based sources to your predictive models for more powerful predictions.”
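SAS does not publish Teragram’s internals, but a common, simple way to turn free-text comment fields into features a predictive model can use is TF-IDF term weighting. The sketch below makes that assumption; the comment texts are invented.

```python
# Hedged sketch: TF-IDF weighting as one standard way to convert
# unstructured comments into numeric model features. Not SAS's method.
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each document: high when the term is
    frequent in this document but rare across the collection."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for doc in tokenized for term in set(doc))
    n = len(docs)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return scores

# Invented customer comments of the kind the passage mentions.
docs = [
    "battery died fast",
    "battery life is great",
    "screen cracked on arrival",
]
scores = tf_idf(docs)
```

In the first comment, “died” outweighs “battery” because “battery” appears across most of the collection; those per-document weights are the kind of text-derived signal that can sit alongside structured fields in a predictive model.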

Text mining software reveals insights between old and new data, making it one of the basic components of big data.

Whitney Grace, July 10, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Coveo Partners with Etherios on Salesforce Services

July 7, 2015

Professional services firm Etherios is teaming up with Coveo in a joint mission to add even more value to customers’ Salesforce platforms, we learn from “Etherios and Coveo Announce Strategic Alliance” at Yahoo Finance. Etherios is a proud Salesforce Platinum Partner. The press release tells us:

 “Coveo connects information from across a company’s IT ecosystem of record and delivers the knowledge that matters to customers and agents in context. Coveo for Salesforce – Communities Edition helps customers solve their own cases by proactively offering case-resolving knowledge suggestions, and Coveo for Salesforce – Service Cloud Edition allows customer support agents to upskill as they engage customers by injecting case-resolving content and experts into the Salesforce UI as they work.

“Etherios provides customers with consulting and implementation services in the areas of Sales, Customer Service, Field Service and IoT [Internet of Things]. … Etherios capabilities span operational strategy, business process, technical design and implementation expertise.”

Founded in 2005, Coveo leverages search technology to boost users’ skills, knowledge, and proficiency while supplying tools for collaboration and self-service. The company maintains offices in the U.S. (San Mateo, CA), the Netherlands, and Quebec.

A division of Digi International, Etherios launched in 2008 specifically to supply cloud-based tools for Salesforce users. They prefer to inhabit the cutting edge, and operate out of Chicago, Dallas, and San Francisco.

Cynthia Murrell, July 7, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
