Guide to Sentiment Analysis Application

January 17, 2014

The article on the Lexalytics Blog titled Tagging, Taxonomies, Categorization with Salience provides a guide to using Salience to get the most out of data. The first step, Discovery, involves features like Themes, which extracts proper noun phrases to summarize what the content contains. Step 2 uses Concept Topics, which relies on an ontology built from Wikipedia’s semantic knowledge to relate one word to another.

The article explains how this works:

“Salience will use the relationship between the category samples to tag your data. So every time the word “lion” pops up in your data, that entry will be categorized as “cats”. Every time the word “cheetah” appears, salience will know that this animal belongs to the cat family, and will tag the document as “cats”. This method of categorization is awesome because you do not need to list every single member of the cat family to create this category.”
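
The idea in the quote can be sketched in a few lines of Python. This is only an illustration, not how Salience is implemented: the BROADER table is a hypothetical, hand-made stand-in for the Wikipedia-derived ontology, and the function name categories_for is invented for this example.

```python
# Hypothetical hand-made ontology: each term points to its broader concept.
# Salience derives such relations from Wikipedia; this dict only stands in for that.
BROADER = {
    "lion": "big cat",
    "cheetah": "big cat",
    "tabby": "domestic cat",
    "big cat": "cats",
    "domestic cat": "cats",
    "wolf": "dogs",
}


def categories_for(text, topics):
    """Return every topic reachable from a word in text by walking BROADER upward."""
    found = set()
    for word in text.lower().split():
        term = word.strip(".,!?\"'")
        seen = set()  # guards against cycles in the ontology
        while term is not None and term not in seen:
            seen.add(term)
            if term in topics:
                found.add(term)
            term = BROADER.get(term)
    return found


if __name__ == "__main__":
    print(categories_for("A cheetah outran the jeep.", {"cats"}))
    print(categories_for("The wolf howled all night.", {"cats"}))
```

The point of the sketch is that the category “cats” is defined once, and members such as “cheetah” are picked up through the relationships rather than through an exhaustive word list.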

Step 3 is another way of classifying data: creating a query topic. You input all the words associated with your topic after consulting Wikipedia and a thesaurus, then narrow the search with additional terms, and you also specify how close one word must be to another for a match to be relevant.
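
A query topic of this kind can be approximated with a small Python sketch. It does not use Salience’s actual query syntax; the function names, the any_of / none_of / near_pairs parameters, and the “at most N words in between” reading of proximity are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(tokens, a, b, max_gap):
    """True if a and b occur with at most max_gap words between them."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) - 1 <= max_gap for i in pos_a for j in pos_b)


def matches_topic(text, any_of, near_pairs=(), none_of=()):
    """Hypothetical query topic: topic words, exclusions, and proximity limits."""
    tokens = tokenize(text)
    present = set(tokens)
    if not present & set(any_of):   # at least one topic word must appear
        return False
    if present & set(none_of):      # excluded words narrow the search
        return False
    # every (word, word, max_gap) pair must satisfy its proximity limit
    return all(near(tokens, a, b, gap) for a, b, gap in near_pairs)


if __name__ == "__main__":
    sentence = "The big cat chased a gazelle across the plain"
    print(matches_topic(sentence, {"cat", "lion"}, [("cat", "gazelle", 3)]))
```

Unlike a concept topic, nothing here is inferred: the entry is tagged only when the listed words, exclusions, and distances all line up.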

Chelsea Kerwin, January 17, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

A Sales Pitch for HP IDOL

January 12, 2014

Conceptual search allows users to search by concepts and ideas within information rather than basic keywords and phrases. Great idea, except that the idea of conceptual search has been around since 1999. HP is touting it as an entirely new idea in the article “Analytics For Human Information: Optimize Information Categorization With HP IDOL,” posted on its own Web site. Rather than break directly into the “new” conceptual search, we are given the even better glittery term “categorization.” HP IDOL, by way of ExploreCloud, a SaaS solution for analytics and insights, offers an auto-categorization feature marketed as a time saver and productivity tool.

HP describes it as a magic tool:

“Powered by HP IDOL, ExploreCloud helps you uncover insights across all channels: web, mobile, social media, email, contact center, database, and storefront, so that you can organize and quantify content in a consistent, objective manner, resulting in data that is more accessible and consistent. And you can maintain existing legacy taxonomies and/or enrich them with contextual understanding. When you go beyond the limitations of what keywords can help you do, your whole world opens up. You can also discover the “unknown unknowns,” or topics you did not know to look for in the first place.”

The article notes that regular keyword searching is far from abandoned, but it dwells on the limitations. Keyword search’s weaknesses are addressed to the point of stating the obvious, and then the piece turns into a sales pitch for HP IDOL. Little is said about what exactly HP IDOL can do, other than organize data. HP, please tell us something we do not know.

Whitney Grace, January 12, 2014


Connexica Heads to South Africa

January 10, 2014

Traveling around the Cape of Good Hope can be tricky business, but Connexica, a software company based in Staffordshire, England, plans to open an office in South Africa. According to the Midlands Business News article “Staffordshire Based Connexica Expand Into South Africa,” the business move is the result of a strategic partnership with Allard Verster Group.

Allard Verster Group specializes in business consulting and solutions and the partnership between the two companies will give Allard the ability to sell Connexica’s Business Analytics Software CXAIR in South Africa. CXAIR gives its users high-speed access to data with interactive diagrams.

Allard Verster Group already has an established client base in several sectors, including mining, manufacturing, healthcare, insurance, and local government. The partnership will allow Connexica to reach a new range of clientele. Both companies are excited about the venture and the new opportunities it presents:

“Head of Business Development Greg Richards says of the agreement: ‘We are delighted to announce this partnership with the Allard Verster Group and I am particularly excited about CXAIR moving into a new territory and see real opportunity for CXAIR within the South African and wider African market.’

Craig Verster, Executive Director at Allard Verster Group commented: ‘Our partnership with Connexica significantly enhances our ability to deliver powerful search, business Intelligence and data analysis’ productivity solutions and services to business users. It validates our strategy to co-innovate with our partners to deliver measurable value to our clients.’ ”

Good news for Connexica and Allard Verster Group. Strategic partnerships are one of the best ways to drum up new business as well as expand a product’s market reach.

Whitney Grace, January 10, 2014


Datameer Rakes in Funding

January 9, 2014

Right now, Datameer is happily positioned at the intersection of preparation and opportunity, we learn from “Datameer Picks Up $19M to Help Companies Do Analytics Along with Hadoop” at VentureBeat. The use of Hadoop has been soaring, and Datameer is perfectly poised to rise with it. As more companies implement the open-source data storage and processing framework, Datameer is seeing more demand for its help making sense of it all. It doesn’t hurt that the data-analysis firm built its solutions with Hadoop in mind from the start; any IT professional knows that can mean the difference between headache-free implementation and long hours trying to force applications to play well together.

Investors have taken notice of Datameer’s advantages. Writer Jordan Novet relates:

“‘You’re actually seeing Datameer being purchased almost at the same time as Hadoop itself, at the same time as the distribution,’ Ben Fu, a partner at Next World Capital, said in an interview with VentureBeat. Next World led the latest round of funding for the company, bringing its total funding to $36.8 million. Datameer’s large contracts from customers such as British Telecom, Sears, and Visa, also made the company interesting, Fu said….

Next World Capital’s Fu is joining Datameer’s board. Alongside Next World, Kleiner Perkins Caufield & Byers and Redpoint Ventures also joined the round. The new money will provide Datameer with the firepower to sign up new customers, especially in Europe, where Next World has a program to put startups in touch with executives at enterprises from around the continent.”

Novet notes the funding can also allow Datameer to take advantage of further Hadoop advances, as well as respond to competition. Datameer was founded in 2009 by some of the original Hadoop contributors. Headquartered in San Mateo, California, the company also has offices in New York City and in Halle, Germany. In related and possibly helpful news, Datameer is hiring for several positions as of this writing.

Cynthia Murrell, January 09, 2014


Incapsula Study on Web Activity Gives Insight into Bot Behaviors

December 23, 2013

The article on BBC News Technology titled Bots Now ‘Account for 61% of Web Traffic’ expands on the data from a recent Incapsula study that found humans might account for only a shrinking minority of internet traffic. Last year’s figure was more like fifty/fifty, but this is not as scary as it might sound, since most of the ‘bots’ causing this traffic are tools that search engines use to index website content. There are also other ‘good bots,’ like those used by analytics companies to rate website performance and carry out similar tasks. The article relays some reservations about the numbers from Dr. Ian Brown of the Oxford University Cyber Security Centre:

“‘There will also be some unavoidable fuzziness in their data, given that they are trying to measure malicious website visits where by definition the visitors are trying to disguise their origin.’ Despite the overall growth in bot activity, the firm said that many of the traditional malicious uses of the tools had become less common. It said there had been a 75% drop in the frequency spam links were being automatically posted.”

Part of this drop is credited to Google’s vigilance over the last year in stamping out the practice. In more good news, Incapsula also reported a 10% drop in hacking activities such as stealing credit cards and hijacking sites (grouped together under the term tool bot activities).

Chelsea Kerwin, December 23, 2013


Big Data Fans: The Nitty Gritty

December 22, 2013

Love talking about Big Data? I recommend doing a bit of reading. I found “What I Learned from 2 Years of Data Sciencing” refreshing. Quotes I noted were:

  • With reference to Big Data projects where the author worked: “None of these projects gained traction within the company and became abandoned.”
  • With reference to the work required: “Much of the efforts spent for those projects were in getting the right data into the right shape.”
  • “Little did I know that we’ll be cleaning and shaping data for most of my second year at uSwitch.”
  • “In practice, I was just cleaning and shaping data.”
  • “Figuring out the right work to do is one of the most difficult tasks for a data science team. It doesn’t help with the fact that the data science role is so vague.”
  • “Figuring out where to devote our time and effort is not as easy as it sounds.”
  • “Unless someone or something can act on the data, results can only satisfy intellectual curiosity. A business can’t survive on funding people to carry out academic studies forever.”
  • “If cleaning vast amount of data, being clueless as to what to do, and debating with colleagues sound like a challenge that you want to take on, I know a company in London that’s looking for a data scientist!”

Is there a message about the nuts and bolts of data? Is analytics repeating the sins of the first enterprise search vendors? It is so much easier to sell sizzle than to focus on the basics like figuring out what’s important and getting valid data. “Let’s just take the easy path” seems to be one risk for analytics cheerleaders.

Stephen E Arnold, December 22, 2013

Attivio is Synonymous with Partnership

December 21, 2013

If you need a business intelligence solution, apparently Attivio is the one-stop shop to visit. Attivio has formed two strategic partnerships. The Providence Journal announced that “Actian And Attivio OEM Agreement Accelerates Big Data Business Value By Integrating Big Content.” Actian, a big data analytics company, has an OEM agreement with Attivio to use its Active Intelligence Engine (AIE) to ramp up Actian’s data analytics solution. AIE completes Actian’s goal of delivering analytics on all types of data, from social media to surveys to research documents.

The article states:

“‘Big Content has become a vital piece in the Big Data puzzle,’ said David Schubmehl, Research Director, IDC. ‘The majority of enterprise information created today is human-generated, but legacy systems have traditionally required processing structured data and unstructured content separately. The addition of Attivio AIE to Actian ParAccel provides an extremely cost-effective option that delivers impressive performance and value.’”

Panorama announced on its official Web site that, “Panorama And Attivio Announce BI Technology Alliance Partnership.” The AIE will be combined with Panorama’s software to improve the business value of content and big data. Panorama’s BI solution will use the AIE to streamline enterprise decision-making processes by eliminating the need to switch between applications to access data. This will speed up business productivity and improve data access.

The article explains:

“ ‘One of the goals of collaborative BI is to connect data, insights and people within the organization,’ said Sid Probstein, CTO at Attivio. ‘The partnership with Panorama achieves this because it gives customers seamless and intuitive discovery of information from sources as varied as corporate BI to semi-structured data and unstructured content.’”

Attivio’s technology is used to improve big data projects and make better use of data. The company’s strategy of serving as a base on which other solutions are built is similar to what Fulcrum Technologies did in 1985.

Whitney Grace, December 21, 2013


Attensity May Be on the Rise Again

December 20, 2013

Attensity is a name that comes to mind when organizations need to track social analytics for customer relationship management. The company has not been receiving positive PR in the past year, but when we recently visited Attensity’s management Web page, we noticed that the page had a few new faces with impressive resumes. Will these new board members take the company out of the red and place it on the right path?

Let us review each person. Howard Lau joined Attensity in January 2013, says his LinkedIn page, and he has twenty-five years in the business software sector. He used to be an executive at SAP Labs, SAP Ventures, and East Gate Capital. He is now Attensity’s CEO and Chairman. Lau is a venture capitalist and has delivered returns of four times the investors’ original investment. He is knowledgeable and has the right experience to turn Attensity around. He checks out well.

Thomas Dreikauss is the general manager of Attensity GmbH in Europe and has the large responsibility of running business development across Western Europe. He has worked in sales management and marketing for enterprise software for over twenty years. Dreikauss has proven he can build strong teams and help companies expand beyond a small startup. He worked at Inxight Software GmbH, Xerox PARC, and Business Objects. He was probably brought onto the team because he is noted for helping companies grow when times are tough. Another good apple.

The Chief Financial Officer Frank Brown is next:

“Frank brings over 25 years of experience in the technology and finance industries. Prior to Attensity, he has worked with a number of leading companies in the software, communications, and semiconductor industries, at the executive and board level, to chart corporate strategy and manage internal operations. Frank’s experience includes positions with IBM Corporation, Andersen Consulting, Oracle Corporation and Lehman Brothers. Frank’s background also includes a number of years in the investment banking and venture capital industries. His successful track record as a venture capitalist includes investments across the technology and healthcare sectors. As the founder of Amber Ventures, Frank has worked as a senior finance executive in a variety of privately held technology companies guiding their activities in areas such as budgeting, accounting, fundraising and mergers and acquisitions.  Frank received his M.B.A. from The Wharton School of the University of Pennsylvania and graduated from the University of California, Berkeley with a B.S. in Decision Sciences, Finance and Accounting.”

Brown has the important duty of bringing in revenue and rerouting financial plans. It is a difficult position to be in, especially if the company is trying to reinvent itself. Experience and openness to new ideas are what Attensity should rely on as the company tries to get back on track. It will be a long, winding path up the mountain. These three will act as the climbing poles to keep Attensity from falling.

Whitney Grace, December 20, 2013


BayesDB for Easier Analysis

December 18, 2013

Interesting: it seems the venerated Thomas Bayes is now with us in database land. BayesDB is being developed, in conjunction with an analysis method called CrossCat, by a team of folks from MIT’s Probabilistic Computing Project and the Shafto Lab at the University of Louisville.

The project’s page explains:

“BayesDB, a Bayesian database table, lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.

BayesDB is suitable for analyzing complex, heterogeneous data tables with up to tens of thousands of rows and hundreds of variables. No preprocessing or parameter adjustment is required, though experts can override BayesDB’s default assumptions when appropriate.

BayesDB’s inferences are based in part on CrossCat, a new, nonparametric Bayesian machine learning method, that automatically estimates the full joint distribution behind arbitrary data tables.”

The database is designed for two types of folks: those with no statistics chops who nonetheless have tabular data to analyze, and those proficient with statistics who have a non-standard problem or who have no time or patience for custom modeling. The team credits CrossCat in part with making BayesDB possible, but also says the BQL language was key to its development.

The description includes examples, a discussion of which types of data and problems the database addresses best, reasons to trust the results, why they named it BayesDB, and more. Check out the page for all the details.

Cynthia Murrell, December 18, 2013


A Non Search Person Explains Why Search Is a Lost Cause

December 16, 2013

The author of “2013: the Year ‘the Stream’ Crested” is focused on tapping into flows of data. Twitter and real time “Big Data” streams are the subtext for the essay. I liked the analysis. In one 2,500 word write up, the severe weaknesses of enterprise and Web search systems are exposed.

The main point of the article is that “the stream”—that is, flows of information and data—is what people want. The flow is of sufficient volume that making sense of it is difficult. Therefore, an opportunity exists for outfits like The Atlantic to provide curation, perspective, and editorial filtering. The write up’s code for this higher-value type of content process is “the stock.”

The article asserts:

“This is the strange circumstance that obtained in 2013, given the volume of the stream. Regular Internet users only had three options: 1) be overwhelmed 2) hire a computer to deploy its logic to help sort things 3) get out of the water.”

The take away for me is that the article makes clear that search and retrieval just don’t work. Some “new” is needed. Perhaps this frustration with search is the trigger behind the interest in “artificial intelligence” and “machine learning”? Predictive analytics may have a shot at solving the problem of finding and identifying needed information, but from what I have seen, there is a lot of talk about fancy math and little evidence that it works at low cost in a manner that makes sense to the average person. Data scientists are not a dime a dozen. Average folks are.

Will the search and content processing vendors step forward and provide concrete facts that show a particular system can solve a Big Data problem for Everyman and Everywoman? We know Google is shifting to an approach to search that yields revenue. Money, not precision and recall, is increasingly important. The search and content vendors who toss around the word “all” have not been able to deliver unless the content corpus is tightly defined and constrained.

Isn’t it obvious that processing infinite flows and changes to “old” content is likely to cost a lot of money? Google, Bing, and Yandex search are not particularly “good.” Each is becoming a system designed to support other functions. In fact, looking for information that is only five or six years “old” is an exercise in frustration. Where has that document “gone”? What other data are not in the index? The vendors are not talking.

In the enterprise, the problem is almost as hopeless. Vendors invent new words to describe a function that seems to convey high value. Do you remember this catchphrase: “One step to ROI”? How do you think that company performed? The founders were able to sell the company and some of the technology lives on today, but the limitations of the system remain painfully evident.

Search and retrieval is complex, expensive to implement in an effective manner, and stuck in a rut. Giving away a search system seems to reduce costs? But are license fees the major expense? Embracing fancy math seems to deliver high value answers? But are the outputs accurate? Users just assume these systems work.

Kudos to The Atlantic for helping to make clear that in today’s data world, something new is needed. Changing the words used to describe such out-of-favor functions as “editorial policy,” controlled terms, scheduled updates, and the like is more popular than innovation.

Stephen E Arnold, December 16, 2013
