Linguamatics Releases New Cloud Based Text Mining Solution

March 15, 2012

Search appears to be a transparent technology, but in reality it is not. With the massive amounts of unstructured information being released into cyberspace, there is a growing need for solutions to sort it. Enter text mining, which allows users to extract value from vast amounts of unstructured textual data.

Business Wire recently reported on the release of a new text mining platform by Linguamatics in the news release “Linguamatics Puts Big Data Mining on the Cloud.”

According to the release, in response to the industry trend of moving software applications onto the cloud, Linguamatics has launched the first NLP-based, scalable text mining platform on the cloud.

The article states:

“The new service builds on the successful launch by Linguamatics last year of I2E OnDemand, the Software-as-a-Service version of Linguamatics’ I2E text mining software. I2E OnDemand proved to be so popular with both small and large organizations, that I2E is now fully available as a managed services offering, with the same flexibility in choice of data resources as with the in-house, Enterprise version of I2E. Customers are thus able to benefit from best-of-breed text mining with minimum setup and maintenance costs.”
We are very excited about the possibilities of text mining on the cloud as well as Linguamatics’ ability to get its software up and running quickly.

Our view is that Linguamatics is an outfit worth monitoring.

Jasmine Ashton, March 15, 2012

Sponsored by Pandia.com

Attensity Election Forecasts

March 14, 2012

Is the prediction half right or half wrong? Sci-Tech Today seems to opt for optimism with “Twitter Analysis Gets Elections Half Right.” Attensity attempted to demonstrate its social analytics chops by forecasting Super Tuesday Republican Primary results from tweets. Its predictions were about 50% accurate; isn’t that about what you’d get flipping coins?
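To put that quip on a footing, here is a minimal back-of-the-envelope simulation (our own sketch, not Attensity’s method; the ten-state, two-front-runner framing is an assumption) showing that random guesses land at about 50% accuracy:

```python
import random

# Back-of-the-envelope check (assumptions: ten states, each effectively
# a two-way race between front runners). A coin flip per state calls
# about half of them correctly on average.
random.seed(42)
TRIALS, STATES = 10_000, 10
hits = sum(
    sum(random.random() < 0.5 for _ in range(STATES))  # one flip per state
    for _ in range(TRIALS)
)
print(f"average states called correctly: {hits / TRIALS:.1f} of {STATES}")
# prints roughly 5.0 of 10, i.e. about 50% accuracy
```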

A lack of location data seems to be the reason Attensity’s predictions were less precise than hoped. Writer Scott Martin reveals:

Part of the problem lies in a lack of location-based data about Twitter users’ tweets. Such information is ‘scarce’ on Twitter, says Michael Wu, principal scientist of analytics for Lithium, a social-analytics firm. That’s because Twitter users would have to turn on the ‘location’ feature in their mobile devices. A vast pool of location-based tweets would enable analytics experts to better connect tweets to where they come from across the nation. In the case of Super Tuesday, that would mean more localized information on tweets about candidates.

Another roadblock to accurate prediction lies in identifying when multiple tweets come from the same enthusiastic tweeter, or are spam-like robo-tweets. Furthermore, there is no ready way to correlate the expression of opinions with actions, like actually voting. It seems that this analytic process has a long way to go. It also seems that half right is close enough to spin marketing horseshoes.

Serving several big-name clients, Attensity provides enterprise-class social analytics as well as industry solutions for vertical markets. It prides itself on the accuracy and ease of use of its tools. My thought is that I will pick horses the old-fashioned way.

Cynthia Murrell, March 14, 2012

Sponsored by Pandia.com

Reference Resource for Big Data Vendors

March 13, 2012

SoftArtisans and Riparian Data have been publishing a series that examines some of the key players in Boston’s emerging big data scene.

The recent article, “Boston’s Big Datascape, Part 2: Nasuni, VoltDB, Lexalytics, Tokutek, Cloudant,” is the second in the series and examines five companies that differ in their growth stages and approaches but share the ideology that “big data is the castle, and their tools [are] the keys.”

The article breaks each company down by product, founder, technologies used, target industries, and location.

Tokutek’s mission is to transform the way data is stored and retrieved and to deliver a quantum leap in the performance of databases and file systems. The company breakdown was as follows (see the sketch after the excerpt):

“Product: TokuDB brings massive data processing and analysis capabilities to heretofore neglected MySQL. It’s a drop-in replacement for InnoDB that extends the capacity of MySQL databases from GBs to TBs.

Founders: Michael A. Bender, Martín Farach-Colton, Bradley C. Kuszmaul

Technologies used: MySQL, MVCC, ACID, Fractal Tree™ indexing

Target industries: Online Advertising, eCommerce, Social Networking, Mobile Solutions, ePublishing.”
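The “drop-in replacement” claim is the interesting part: if it holds, moving an existing table onto TokuDB is a one-line storage-engine change. Here is a minimal sketch (hypothetical connection details and table name; assumes a MySQL server with the TokuDB engine installed and the PyMySQL client library):

```python
import pymysql  # assumes the PyMySQL client library is installed

# Hypothetical connection details; adjust for your environment.
conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="test")
try:
    with conn.cursor() as cur:
        # Confirm the TokuDB storage engine is available on this server.
        cur.execute("SHOW ENGINES")
        engines = {row[0] for row in cur.fetchall()}
        assert "TokuDB" in engines, "TokuDB engine not installed"

        # The drop-in part: move an existing InnoDB table (hypothetical
        # name `clicks`) onto TokuDB with a single ALTER statement.
        cur.execute("ALTER TABLE clicks ENGINE=TokuDB")
    conn.commit()
finally:
    conn.close()
```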

We’re interested to see how this series develops and which innovative companies it profiles next.

Jasmine Ashton, March 13, 2012

Sponsored by Pandia.com

Inteltrax: Top Stories, March 5 to March 9

March 12, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically trends in big data.

Our biggest trend spotting article was undoubtedly “Big Predictions for 2012 Big Data,” in which we laid out how the coming months will be even bigger than 2011’s massive data year.

“Safety and Security Prompt Analytic Trend” took a closer look at new trends, specifically how to keep folks safe via analytic technology.

Finally, “Consumer Thinking Becomes a Big Analytic Focus” tackles what is undoubtedly the hottest topic in big data, and we toss in our two cents.

Big data and analytics are constantly evolving. The landscape today is nothing like it was twelve months ago. Thankfully, we are watching every blip on the radar to give readers a comprehensive feel for the past, present, and future of analytics.

Follow the Inteltrax news stream by visiting www.inteltrax.com.

Patrick Roland, Editor, Inteltrax.

March 12, 2012

More Allegations about Fast Search Impropriety

March 8, 2012

With legions of Microsoft Certified Resellers singing the praises of the FS4SP (formerly the Fast Search & Transfer search and retrieval system), sour notes are not easily heard. I don’t think many users of FS4SP know or care about the history of the company, its university-infused technology, or the machinations of the company’s senior management and Board of Directors. Ancient history.

I learned quite a bit in my close encounters with the Fast ESP technology. No, ESP does not mean extra sensory perception. ESP allegedly meant the enterprise search platform. Fast Search, before its purchase by Microsoft, was a platform, not a search engine. The idea was that the collection of components would be used to build applications in which search was an enabler. The idea was a good one, but search-based applications required more than a PowerPoint to become a reality. The 64-bit Exalead system, developed long before Dassault acquired Exalead, was one of the first next-generation, post-Google systems to have a shot at delivering a viable search-based application. (The race for SBAs, in my opinion, is not yet over, and there are some search vendors like PolySpot which are pushing in interesting new directions.) Fast Search was using marketing to pump up license deals. In fact, the marketing arm was more athletic than the firm’s engineering units. That, in my view, was the “issue” with Fast Search. Talk and demos were good. Implementation was a different platter of herring five ways.


Fast Search block diagram circa 2005. The system shows semantic and ontological components, information-on-demand functions, and content publishing functions—all in addition to search and retrieval. Similar systems are marketed today, but hybrid content manipulation systems are often a work in progress in 2012. © Fast Search & Transfer

I once ended up with an interesting challenge resulting from a relatively large-scale, high-profile search implementation. Now you may have larger jobs than I typically get, but I was struggling with the shift from Inktomi to the AT&T Fast search system in order to index the public-facing content of the US federal government.

Inktomi worked reasonably well, but the US government decided in its infinite wisdom to run a “free and open competition.” The usual suspects responded to the request for proposal and statement of work. I recall that “smarter than everyone else” Google ignored the US government’s requirements.


This image is from a presentation by Dr. Lervik about Digital Libraries, no date. The slide highlights the six key functions of the Fast Search search engine. These are extremely sophisticated functions. In 2012, only a few vendors can implement a single system with these operations running in the core platform. In fact, the wording could be used by search vendor marketers today. Fast Search knew where search was heading, but the future still has not arrived because writing about a function is different from delivering that function in a time and resource window which licensees can accommodate. © Fast Search & Transfer

Fast Search, with the guidance of savvy AT&T capture professionals, snagged the contract. That was a fateful procurement. Fast Search yielded to a team from Vivisimo and Microsoft. Then Microsoft bought Fast Search, and the US government began its shift to open source search. Another consequence is that Google, as you may know, never caught on in the US Federal government in the manner that I and others assumed the company would. I often wonder what would have happened if Google’s capture team had responded to the statement of work instead of pointing out that the requirements were not interesting.


Big Data Excitement at the 2012 Strata Conference

March 8, 2012

Don’t get hit by a stray bullet at the big data corral. IT World examines “The Wild West of Big Data.” Fresh from this year’s Strata Conference in Santa Clara, journalist Brian Proffitt describes how the current hubbub around big data mirrors the open source environment of a decade ago (the sense of urgency around a rising technology) and how it doesn’t (the lack of a grass-roots community feel).

Excitement is understandable in this burgeoning field, and Proffitt felt the anticipation of profit as a “zing” in the air. However, he seems to long for the atmosphere of yore, when excited hackers fueled the advance of innovation for innovation’s sake, rather than the current domination of cloud advances by corporate types looking to make a buck. While he admits companies acknowledge the open source contributions to their products, they usually do so by way of pointing out their own efforts to give back.

The article observes:

“Big data’s community is purely commercial and without the threat of a big competitor to stand in its way. In this sense, it is more of a gold rush than Linux ever was, because without the checks of the internal and external pressures that the early Linux community endured, there seems to be nothing that can get in big data’s way.

“Which may be why we are seeing, even now, signs from experts that are warning potential customers and the vendors willing to take their money to slow down and start really thinking about what they want to do.”

Excellent advice for any gold rush, we’d say. Proffitt feels the same, but observes that such voices of caution were in the minority among the Conference’s speakers. No surprise there; who has time for the voice of reason during a stampede?

Cynthia Murrell, March 8, 2012

Sponsored by Pandia.com

Social Media Analytics: What Are Social Media Data?

March 8, 2012

We have been following Text Analytics News, along with Useful Social Media, in its recent series of interviews with experts in the field of Social Media Analytics. The third installment focuses on what exactly social media data is and where it comes from.

“Social Media Analytics Expert Interview Series: Part 3” is conducted by the Chief Editor of Text Analytics News, Ezra Steinberg. The interviews are published as a lead-up to the Social Media Analytics Summit. The interview panel for this installment includes: Tom H. C. Anderson, CEO, OdinText/Anderson Analytics; Nathan Gilliatt, Principal, Social Target; Chris Moody, COO, Gnip; and Kami Watson Huyse, CEO, Zoetica Media. The interview covers the experts’ definitions and interpretations of social media data and attempts to resolve confusion about how to use these data. Some insights from the interview follow:

“USM: When you think of “Social Media Data,” what do you think of first? Second?

Kami (Zoetica Media): Social media data is at the heart of understanding your community. Far from being cold and impersonal, data can tell a story that intuition alone cannot deliver. As much as we like to believe that we fully understand our community, what people say and what people do are often very different. Data can help to guide intuition.

For that reason, the second thing I think of when I consider social media data is its importance as a tool to diagnose, prioritize and evaluate what you are doing as an organization and use it to make course corrections.

USM:  Do you think there is currently a common understanding as to what constitutes social media data?

Chris (Gnip): Definitely not.  For example, some think of social media data as Twitter data because Twitter has done a better job than some other companies of making their data available in a full coverage, reliable, scalable format.  The reality is that social media data comes in lots of different forms from lots of different sources.   We’re working hard to help companies understand how different types of social data can be useful for different types of analysis.”

The interview focuses on understanding social media data and getting the most out of the analytics it provides. Attention is also given to social media monitoring vendors and analytical tools, with the experts weighing in on which ones are valuable and how they work. Businesses are learning that weighing these opinions and implementing social media analytics can be valuable when trying to understand customers and potential customers. The full interview can be found here and offers insight into this marketing tool and how it works.

Andrea Hayden, March 8, 2012

Sponsored by Pandia.com

Data Mining Hits the Big Screen

March 4, 2012

It seems that 3D is not just for the big screen any more. According to the SlideShare presentation “Visual Data Mining with HeatMiner,” three-dimensional heatmaps can be used to represent data. The makers of HeatMiner claim that large data sets with a variety of correlating attributes can be hard to understand “using traditional data analysis and visualization methods.” “HeatMiner is a new visual data mining technology which visualizes the data as three-dimensional heatmaps.” HeatMiner argues that most data reports are too simple and therefore lack accuracy. Visual data mining with HeatMiner relies on 3D shapes to represent frequent value combinations. “Colors can be used as the fourth dimension or to ease interpretation.” At first glance the technology is very attractive and does grab your attention, but will that be enough for users to buy into this visual mining technology? Only time will tell if this new technology is actually practical or just a pretty picture.
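To make the idea concrete, here is a minimal matplotlib sketch of a three-dimensional heatmap over synthetic data, with color doing double duty as the fourth dimension. This is our own illustration of the general technique, not HeatMiner’s actual technology:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

# Synthetic data: two correlated attributes, binned into a 2-D histogram.
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 5_000)
y = 0.6 * x + rng.normal(0, 1, 5_000)
counts, xedges, yedges = np.histogram2d(x, y, bins=12)

# Bar positions and sizes for the 3-D "heatmap" bars.
xpos, ypos = np.meshgrid(xedges[:-1], yedges[:-1], indexing="ij")
xpos, ypos = xpos.ravel(), ypos.ravel()
dz = counts.ravel()
dx = (xedges[1] - xedges[0]) * 0.9
dy = (yedges[1] - yedges[0]) * 0.9

# Color encodes bar height, acting as the fourth dimension.
colors = cm.viridis(dz / dz.max())

ax = plt.figure().add_subplot(projection="3d")
ax.bar3d(xpos, ypos, np.zeros_like(dz), dx, dy, dz, color=colors)
ax.set_xlabel("attribute 1")
ax.set_ylabel("attribute 2")
ax.set_zlabel("frequency")
plt.show()
```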

April Holmes, March 4, 2012

Sponsored by Pandia.com

Google Predicts the Oscars

March 3, 2012

It is that time of year again when friends and families gather around their television sets to weigh in on who should win the Oscars. Google and search, however, do not usually come to mind when thinking of the Academy Awards — until now.

Web Pro News recently reported on the search giant’s prognostications in the article, “Google Predicts Oscar Winners.”

According to the article, a Google search team crunched the numbers, broke down the search data, and made predictions for the categories of Best Actor, Best Actress, and Best Picture.

How did they figure it out? When discussing Best Picture predictions, the Google blog stated:

“Last year we found that for three years running, the films that won best picture had two things in common when it came to search data. First, the winning movies had all shown an upward trend in search volume for at least four consecutive weeks during the previous year. Second, within the U.S. the winning film had the highest percentage of its searches originating from the state of New York.”

While the Google blog ended up narrowing the field to three potential winners (Extremely Loud & Incredibly Close, The Artist, and Midnight in Paris), I found the title of the article somewhat misleading. Google utilized search to narrow down the candidates, but it did not predict a sole winner. How do you think Google did? If you were grading Google, did the company get an A or an F?
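For what it is worth, the two signals the Google blog describes are simple enough to express as a filter. Here is a minimal sketch using made-up film data (the weekly volumes and New York shares are hypothetical, purely to illustrate the two rules):

```python
# Applying the Google blog's two signals to hypothetical data:
# 1) search volume trending upward for at least four consecutive weeks;
# 2) the highest share of U.S. searches originating from New York.

# Made-up weekly search-volume indexes (oldest to newest).
weekly_volume = {
    "The Artist": [40, 44, 51, 58, 66],
    "Midnight in Paris": [55, 53, 57, 60, 64],
    "Extremely Loud & Incredibly Close": [30, 35, 33, 38, 41],
}
# Made-up share of each film's U.S. searches coming from New York.
ny_share = {
    "The Artist": 0.14,
    "Midnight in Paris": 0.12,
    "Extremely Loud & Incredibly Close": 0.11,
}

def rising_for(volumes, n=4):
    """True if the last n week-over-week changes are all increases."""
    diffs = [b - a for a, b in zip(volumes, volumes[1:])]
    return len(diffs) >= n and all(d > 0 for d in diffs[-n:])

contenders = [film for film, weeks in weekly_volume.items() if rising_for(weeks)]
if contenders:
    print("predicted winner:", max(contenders, key=ny_share.get))
```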

Jasmine Ashton, March 3, 2012

Sponsored by Pandia.com

Ontoprise GmbH: Multiple Issues Says Wikipedia

March 3, 2012

Now Wikipedia is a go-to resource for Google. I heard from one of my colleagues that Wikipedia turns up as the top hit on a surprising number of queries. I don’t trust Wikipedia, but then I don’t trust any encyclopedia produced by volunteers. Volunteers often participate in a spoofing fiesta.

[Image: SEO danger symbol]

Note: I will be using this symbol when I write about subjects which trigger associations in my mind about the use of words, bound phrases, and links to affect how results may be returned from Exalead.com, Jike.com, and Yandex.ru, among other modern Web indexing services supported by either government entities or commercial organizations.

I was updating my list of Overflight companies. We have added six firms to a new Overflight service called, quite imaginatively, Taxonomy Overflight, and we are going through the process of figuring out whether these outfits are in business or putting on a vaudeville act for paying customers.

The first six companies are:

  1. Millenium
  2. Mondeca
  3. Nuance
  4. Synaptica
  5. Visual Mining
  6. Wand

We will be adding another group of companies to Taxonomy Overflight on March 4, 2012. I have not yet decided how to “score” each vendor. For the enterprise search Overflight, I use a goose method. Click here for an example: Overflight about Autonomy. Three ducks. Darned good.

I wanted to mention one quite interesting finding. We came across a company doing business as Ontoprise. The firm’s Web site is www.ontoprise.de. We are checking to see which companies have legitimate Web sites, no matter how sparse.

We noted that the Wikipedia entry for Ontoprise carried this somewhat interesting “warning”:

[Image: Wikipedia “multiple issues” warning box]

The gist of this warning is to give me a sense of caution, if not wariness, with regard to this company, which offers products that deliver “ontologies.” The company’s research project is called “Ontorule,” which has a faintly ominous sound to me. If I look at the naming of products from such firms as Convera before it experienced financial stress, Convera’s product naming was like science fiction but less dogmatic than Ontoprise’s language choices. So I cannot correlate Convera and Ontoprise on anything other than my personal “semantic” baloney detector. But Convera went south in a rather unexpected business action.

