Try TextBlob for Sentiment Analysis

November 19, 2013

Sad to say, we have heard rumblings about severe disappointment with Attensity-type and Lexalytics-type sentiment applications. If you want to kick some tires in this interesting search niche, look instead to the open source application TextBlob. OpenShift points out this resource in “Day 9: TextBlob—Finding Sentiments in Text.” The article is one in an ambitious series by writer Shekhar Gulati, who challenged himself to master one technology a day for a month. Very admirable, sir!

Gulati begins with his experience with sentiment analysis:

“My interest in sentiment analysis is few years old when I wanted to write an application which will process a stream of tweets about a movie, and then output the overall sentiment about the movie. Having this information would help me decide if I wanted to watch a particular movie or not.

“I googled around, and found that Naive Bayes classifier can be used to solve this problem. The only programming language that I knew at the time was Java, so I wrote a custom implementation and used the application for some time. I was lazy to commit the code, so when my machine crashed, I lost the code and application. Now I commit all my code to github, and I have close to 200 public repositories 🙂

“In this blog, I will talk about a Python package called TextBlob which can help developers solve this problem. We will first cover some basics, and then we will develop a simple Flask application which will use the TextBlob API.”

The post does indeed cover the basics, including the installation of Python and virtualenv before we can get going with TextBlob. It then takes us through writing an example application and deploying it to the cloud. As he notes above, Gulati has his code safe and sound at GitHub; the code for this example is posted here, and the JS and CSS files can be found here.
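
For readers who want to see the idea Gulati mentions before reaching for a library, here is a minimal sketch of a Naive Bayes sentiment classifier in plain Python. It is not TextBlob's API and it is not Gulati's code; the class name, the function names, and the toy training sentences are all invented for illustration, and a real application would need far more training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()           # every word seen during training

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def overall_sentiment(classifier, tweets):
    """Majority vote over per-tweet labels, as in the movie use case."""
    votes = Counter(classifier.classify(tweet) for tweet in tweets)
    return votes.most_common(1)[0][0]


# Invented toy data; a real classifier would train on thousands of examples.
TRAINING = [
    ("I loved this movie, it was wonderful", "pos"),
    ("great acting and a brilliant story", "pos"),
    ("what a fantastic, fun film", "pos"),
    ("I hated this movie, it was terrible", "neg"),
    ("boring plot and awful acting", "neg"),
    ("a dull, disappointing film", "neg"),
]

classifier = NaiveBayesSentiment()
classifier.train(TRAINING)
print(overall_sentiment(classifier, ["wonderful story", "awful plot", "brilliant and fun"]))
```

The majority vote in `overall_sentiment` is one simple way to turn per-tweet labels into the "overall sentiment about the movie" that Gulati describes; averaging a numeric score would be another reasonable choice.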

Cynthia Murrell, November 19, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Oracle Endeca Democratizes Data Discovery

November 19, 2013

Oracle is on a mission to empower its clients with self-service discovery capabilities for enterprise systems. The recent announcement came to us via Market Wired in, “Oracle Democratizes Data Discovery With Oracle Endeca Information Discovery 3.1.” The new Endeca software allows users to maintain enterprise features, while offering high-end collaboration that does not pose a security risk.

The new data discovery platform comes enabled with analytics features that incorporate more varieties of information and support better decision-making. These features include self-service data mashup and discovery dashboards, deep unstructured analysis, enterprise-class self-service discovery, enhanced integration with Oracle business intelligence, and Web content integration.

“‘Data Discovery has been a sea change in the analytics market, driven by a desire to make information more accessible to a broader range of users at all levels of the business. With Oracle’s release of Oracle Endeca Information Discovery 3.1, we anticipate an even more improved ability to spur adoption and improved time to value with enterprise-class self-service discovery,’ said Mark Rittman, chief technical officer, Rittman Mead.”

Oracle will reel in more clients with this platform. There are no details yet on how much the licensing fee for “democratized data” will be. It is easy to surmise that it is more expensive than open source alternatives.

Whitney Grace, November 19, 2013

Autonomy Adds Qfiniti Analytics

November 18, 2013

HP Autonomy is searching for ways (pun intended) to redeem itself after the fiasco from earlier this year. How is the company doing it? HP Autonomy, according to the Silicon Angle article “HP Autonomy Expands Analytics Lineup With Qfiniti 10,” is tapping into the mobile market. The company has expanded its Qfiniti analytics software to analyze video and social media along with basic voice and text messaging. The Qfiniti upgrade has many more options that make it an attractive solution for customer relations personnel. Two features are tempting to try: the IDOL search tool, which helps users identify patterns in audio, text, and video, and the ability to search through old and new unstructured and structured data. The former rather than the latter, of course. The article also mentions how it can be used to keep track of workflows and front and back office processes, something that any enterprise-based software usually does.

HP Autonomy is really proud of the new analytics angle and how it can help their clients:

“Rafiq Mohammadi, the general manager of marketing optimization at HP Autonomy, noted that ‘HP Qfiniti 10 closes the loop for the customer contact center, providing a full set of functions to match all requirements. Because HP Qfiniti is a modular platform, customers can cut their total cost of ownership by getting a solution that best fits their needs.’”

We believe that HP Autonomy is moving in the right direction to overcome past obstacles. While reading the article, commercials for the new Nissan Infiniti came to mind. A project manager probably fell asleep while listening to a Nissan commercial and was inspired by the luxury brand. HP Autonomy and Nissan can team up to design the first Big Data compliant car: it drives, saves gas mileage, and provides professional grade analytics as you go.

Whitney Grace, November 18, 2013

Pretty Graphics For Big Data Experience

November 15, 2013

TechCrunch makes a big deal about this headline: “ClearStory Data Designs An Analytics Platform That Is About The Experience As Much As The Technology.” ClearStory Data is one of the first companies to launch an analytics platform that can offer rich visuals and sharing capabilities. The graphics and sharing come out on the user interface, but behind the pretty graphics and social media graces there is something else.

The article states:

“On the back-end, ClearStory has a platform for integrating a company’s internal and external data using an in-memory database technology, said CEO Sharmila Shahani-Mulligan in a phone interview this week. This can be relational or NoSQL data, point-of-sale information or demographic statistics from external sources. Its advantage is in the ability to process multiple types of data on the fly and then combine that with a modern user interface.”

Not a bad new way to use analytics, especially when the idea behind it is that users will be able to manipulate their data like a story rather than a boring data report. Think about it. What would you rather do, read a gripping novel or the latest user agreement for iTunes? The idea is to turn shopping or Internet browsing into a story. Maybe this could be a new form of writing or even blogging where social media turns into a giant events catalog of how people shop.

Whitney Grace, November 15, 2013

The Toy Elephant Hadoop Saves A Real Elephant

November 15, 2013

Hadoop was named after a toy elephant, so it is only appropriate that a Hadoop company is donating money to save elephants from poachers. Nature and technology have often been perceived to be at odds with one another, constantly battling for dominance over the planet. Technology can save nature, however, and analytical data techniques have been used to solve problems, according to the recent Gigaom article, “Buy Datameer’s Hadoop Application, Save An Elephant.”

The article states:

“We’ve written before about applying big data techniques to help solve societal problems, and now we have a case of applying the revenue from big data software sales directly to a cause. In this case Datameer, a startup that applies a spreadsheet interface to Hadoop, is selling a “charity edition” of its product for $49 and donating all the proceeds during the month of November to a conservation charity called Pro Wildlife.”

Some cynics may view this gesture as a marketing ploy to sell a product that is meant to solve the big data problem. (Actually, it only allows users to download to a single desktop and analyze 10 GB, so it is more like big data for the single data-obsessed user.) On the bright side, you get to help save the largest living land mammal. Who does not like elephants?

Whitney Grace, November 15, 2013

Neuroscientists Advance Predictive Analytics

November 12, 2013

Where do the fields of neuroscience and predictive analytics intersect? Apparently at the University of Sussex. Phys.org reveals, “Scientists Identify a Mathematical ‘Crystal Ball’ that May Predict Calamities.” It makes sense when you consider that both disciplines deal with complex systems.

In systems ranging in scale from the planet’s climate to an epileptic’s brain, the transition from a healthy to an unhealthy state is marked by a peak in information flow between elements. Until now, it has been difficult to impossible to predict these peaks in advance. Working together, scientists from the University of Sussex’s Sackler Centre for Consciousness Science and the Centre for Research in Complex Systems at Australia’s Charles Sturt University have made a breakthrough regarding such predictions. The article explains:

“Essentially this means finding a way to characterize, mathematically, the extent to which the parts of a complex system are simultaneously segregated (they all behave differently) and integrated (they all depend on each other). In the present study the research team managed to do just this, and to show for the first time that their measure reliably predicts phase transitions in standard systems studied by physicists now for many decades (the so-called ‘Ising’ model).

“Professor Anil Seth, Co-Director of the Sackler Centre, says: ‘The implications of the work are far-reaching. If the results generalise to other real-world systems, we might have ways of predicting calamitous events before they happen, which would open the possibility for intervention to prevent the transition from occurring.’”
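
For readers unfamiliar with the Ising model named in the quote, the sketch below may help show what a "phase transition" looks like in that system. It is only an illustration under stated assumptions: a small two-dimensional lattice sampled with the standard Metropolis algorithm, with coupling and Boltzmann constant both set to 1. It does not implement the researchers' predictive measure, which the article does not spell out, and the function name and parameters are invented here.

```python
import math
import random


def simulate_ising(size, temperature, sweeps, seed=0):
    """Return the mean absolute magnetization per spin (hypothetical helper).

    Uses Metropolis sampling on a size x size lattice with periodic
    boundaries, coupling J = 1 and Boltzmann constant k_B = 1 (assumptions).
    """
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]  # start with all spins aligned
    samples = []
    for sweep in range(sweeps):
        for _ in range(size * size):
            i = rng.randrange(size)
            j = rng.randrange(size)
            neighbors = (
                spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                + spins[i][(j + 1) % size] + spins[i][(j - 1) % size]
            )
            # Energy change if the spin at (i, j) is flipped.
            delta_e = 2 * spins[i][j] * neighbors
            if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
                spins[i][j] = -spins[i][j]
        # Discard the first half of the run as warm-up, then record |m|.
        if sweep >= sweeps // 2:
            total = sum(sum(row) for row in spins)
            samples.append(abs(total) / (size * size))
    return sum(samples) / len(samples)


cold = simulate_ising(size=12, temperature=1.0, sweeps=200)
hot = simulate_ising(size=12, temperature=5.0, sweeps=200)
print("ordered phase:", cold, "disordered phase:", hot)
```

Well below the critical temperature (about 2.27 in these units for the infinite square lattice) the spins stay aligned; well above it the order disappears. The transition between those two regimes is the kind of event the Sussex team's measure is said to anticipate.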

Such interventions would obviously be beneficial in many circumstances. As this science progresses, we may be surprised at how widely the method could be applied. The possibilities seem endless. Can an application to horse racing be far behind?

Cynthia Murrell, November 12, 2013

Predictions of Real-World Events, Magic or Science?

November 10, 2013

The article on SmartData Collective titled “Can You Predict Crowd Behavior? Big Data Can” argues that real-world events like protests and violent conflict are already being successfully predicted, not by historians or economists but by data scientists, specifically those at Recorded Future. We have all heard about Nate Silver’s voting predictions, but according to the article, Recorded Future has taken crowd behavior prediction even further:

“Back in January 2010 a small startup company called Recorded Future released a blog post claiming that Yemen would likely have food shortages and flooding that year. Due to the combination, the country was headed for conflict. By September of that year not only had Yemen experienced flooding but was also combating food shortages… By February 2012, the protests had turned violent with protesters killed by gunmen and the Yemen President suffering severe injuries after a bomb was planted in his compound.”

While we are not sure how this is working out in the real world, with actual events, businesses have certainly embraced the idea that they can sell things to people before the people even know they need them. The problem might be how to avoid creeping the customer out like the expectant mother debacle at Target. Meanwhile the issue of privacy rears its head; apparently it is never too early to start predicting bad behavior.

Chelsea Kerwin, November 10, 2013

An Analysis of Deleted Tweets

November 7, 2013

It is only human to wish we could take back hurtful or embarrassing words. On the other hand, it is tough to search for information that is no longer there. University of Edinburgh researchers have been looking into the motives behind deleted Twitter missives, we learn in Digital Trends’ piece, “New Research is Revealing What Tweets Get Deleted—and Why.” You can see the study as a PDF here.

Not surprisingly, a particularly rich field for deleted tweets lies in the political realm. Writer Kate Knibbs tells us:

“Nicko Margolies, projects coordinator for the Sunlight Foundation (the organization that runs Politiwoops) says that they’ve noted a number of reasons politicians have chosen to delete their tweets. ‘The ones that we find most interesting are the situations where politicians change their position on something or craft their language into a message others are using. This is often seen through popular hashtags or talking points that many politicians echo to their followers, bringing the issue (and their position) to the forefront of the digital conversation,’ he says.”

Of course. The team says, though, that public figures are not the only ones concerned with how their 140-characters-or-less may be interpreted. They found that many reconsidered tweets contained curse words. In what I suspect is a related finding, they discovered people are more likely to delete tweets very late at night.

Tweets containing sensitive information like social security numbers or email addresses are also more likely to be removed. Knibbs sensibly wonders whether such deletions increased after revelations about NSA surveillance came out, but that information is not available. She hopes other researchers take up the topic of deleted online postings because, she says, what we choose to redact reveals much about online behavior. Such studies could even prompt us to pause before we post something we’d regret. Maybe.

Cynthia Murrell, November 07, 2013

Invest in Social Media Sentiment Analysis to Avoid Brand Damage

November 2, 2013

Simon Creasey from Computer Weekly recently reported on the outcome of the latest Twitter firestorm in the article “Failure to Invest in Sentiment Analytics Could Lead to Brand Damage.”

According to the article, a disgruntled British Airways passenger decided to use a paid-for promoted tweet to blast his complaints to thousands of Twitter followers. As you can imagine, the tweet went viral and was shared and re-shared until it received global coverage. While PR disasters are often unavoidable, businesses are developing social media sentiment analysis software to contain them.

The article concludes:

“‘Monitoring what people are saying about your products and industry can help you design your products and propositions for the future and in that sense Twitter acts as a great market research tool as well as a lead-generation tool,’ says Sinclair.

“Similarly, if you monitor what people are saying about your brand it can also help you with customer service and PR. There are many examples of companies who have found themselves under social media attack. Failure to invest in these kinds of tools could easily result in significant damage to a company’s reputation and brand.”

These days, social media is ever expanding and it is impossible to keep track of everything being said about your company’s brand, products, and employees. In order to avoid PR disasters like the one that happened to British Airways, companies should invest in the latest sentiment analysis technologies.

Jasmine Ashton, November 02, 2013

Sponsored by ArnoldIT.com, Developer of Beyond Search

2013 Text Mining Summit Draws Record Crowd

November 2, 2013

The Linguamatics Blog recently reported on the outcome of the 2013 Text Mining Summit in the post “Pharma and Healthcare Come Together to See the Future of Text Mining.”

According to the article, this year’s event drew a record crowd of over 85 attendees who had the opportunity to listen to industry experts from the pharma and healthcare sector.

The article summarizes a few event highlights:

“Delegates were provided with an excellent opportunity to explore trends in text mining and analytics, natural language processing and knowledge discovery. Delegates discovered how I2E is delivering valuable intelligence from text in a range of applications, including the mining of scientific literature, news feeds, Electronic Health Records (EHRs), clinical trial data, FDA drug labels and more. Customer presentations demonstrated how I2E helps workers in knowledge driven organizations meet the challenge of information overload, maximize the value of their information assets and increase speed to insight.”

Events like the Text Mining Summit are excellent opportunities for members of the data analytics community to gather and share their insights and new advances in the industry.

Jasmine Ashton, November 02, 2013
