Inteltrax: Top Stories, August 8 to August 12, 2011

August 15, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically how the legal world is impacted by data analytics.

One of our most popular entries this week was “Legal Marketplace Filled with Analytic Options.”  This was a quick look at all the data mining tools available to lawyers.

Another hot topic was our article, “Zettaset and Others Cashing in on Forensics.”  It showed that forensic science has undoubtedly been aided by predictive analytics in ways CSI could only dream of.

In addition, our story, “Facial Recognition a Boon for Facebook and a Threat for SSNs,” detailed how legal tools, like facial recognition software, can backfire, causing a serious breach in security.

For the most part, we feel the legal world is aided in amazing ways by big data management systems. From the courtroom to the police station, people are utilizing these tools. But as with any strong advance in technology, there is always a risk of misuse. We’ll be following these trends and others to watch this fascinating corner of the industry unfold.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax, August 15, 2011

Sponsored by Digital Reasoning, developers of the next-generation analytics platform, Synthesys.

IBM Sets New File Scanning Record

August 5, 2011

IBM’s announcements fascinate us. The company releases information about products, services, and inventions and then we don’t hear too much about them. We still are waiting for a live demo of the search prowess of Watson. We think indexing Wikipedia would be a good start, but it seems that Watson has developed an interest in medicine. No problem. We’re patient. (No pun intended.)

We liked the write up “IBM System Scans 10 Billion Files in 43 Minutes” at TechEye.net. The result beats IBM’s own previous record, set back in 2007. Writer Matthew Finnegan elaborates:

“IBM has successfully scanned 10 billion files in just 43 minutes, opening the doors to access of zettabytes of information storage. This means a massive improvement on the previous record, a relatively sluggish one billion files scanned in three hours.”

Changes credited for the success include relying on a single platform data environment and management task simplification. Also, an algorithm was devised that maximized use of all ten eight-core systems in the General Parallel File System. Researchers expect this accomplishment to point the way to ever greater data management efficiency in the future.
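To anchor the two records quoted above to something we can relate to, here is a minimal back-of-the-envelope sketch. It uses only the figures in the excerpt (10 billion files in 43 minutes, one billion files in three hours); the variable and function names are our own, and the result says nothing about file sizes or what "scanning" involved.

```python
# Figures quoted in the excerpt above; everything else is derived from them.
NEW_FILES = 10_000_000_000   # 10 billion files
NEW_SECONDS = 43 * 60        # 43 minutes
OLD_FILES = 1_000_000_000    # one billion files (previous record)
OLD_SECONDS = 3 * 60 * 60    # three hours


def files_per_second(files, seconds):
    """Average scan rate over the whole run."""
    return files / seconds


new_rate = files_per_second(NEW_FILES, NEW_SECONDS)
old_rate = files_per_second(OLD_FILES, OLD_SECONDS)
speedup = new_rate / old_rate

print(f"New record:      {new_rate:,.0f} files per second")
print(f"Previous record: {old_rate:,.0f} files per second")
print(f"Speedup:         {speedup:.1f}x")
```

By this arithmetic the new run averages roughly 3.9 million files per second, about 42 times the rate of the earlier record.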

Our view is that this seems like a lot of files, but without a comparison against some other vendors of high speed file access, we interpret the number as similar to Amazon’s reporting of how successful Amazon Web Services is. We think Amazon is successful, but the metrics are tough to anchor to something to which we can relate. IBM is, it appears, emulating Amazon’s approach to unanchored metrics.

Our question: when will we see these different and amazing technologies in Watson? When will we see a third party analysis of file scanning speed or better yet, an article from a customer detailing the method and payoff from IBM’s remarkable technology?

Cynthia Murrell, August 5, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Thoughts from an Industry Leader: Margie Hlava, Access Innovations

August 4, 2011

Here are some astute observations on the direction of enterprise search from someone who knows what she’s talking about. Library Technology Guides points to an interview with Margie Hlava, president of Access Innovations, in “Access Innovations founder and industry pioneer talks about trends in taxonomy and search.”

Ms. Hlava’s 33 years in the search industry informed her observations on current trends, three of which she sees as significant: Cloud and Software as a Service (SaaS) computing, term mining, and the demand for metadata.

The move to the Cloud and SaaS computing demands more of our hardware, not less, Hlava insists. In particular, broadband networks are struggling to keep up. One advantage of the shift is a declining need to navigate labyrinths of hardware, software, and even internal politics on the client side. Other pluses are the motion toward increased data sharing and service enhancement. Also, more ways to maintain security and intellectual property rights are on the horizon.

Term mining is “a process involving conceptual extraction using thesaurus terms and their synonyms with a rule-base, then looking for occurrences to create more detailed data maps,” according to Hlava. Her company leverages this concept to make the most of clients’ large data sets. She is interested in new angles like mashups, data fusion, visualization, linked data, and personalization, but with a caveat: success in all these depends on the quality of the data itself. “Rotten data gives rotten results.”
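For readers who want a feel for what term mining might look like in practice, here is a toy sketch of the idea as described above: thesaurus terms and their synonyms become rules, and the rules are run over text to count occurrences per document. This is our own illustration, not Access Innovations’ method; the thesaurus, documents, and function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical thesaurus: preferred term -> synonyms (invented for illustration).
THESAURUS = {
    "enterprise search": ["enterprise search", "intranet search"],
    "taxonomy": ["taxonomy", "taxonomies", "controlled vocabulary"],
    "metadata": ["metadata", "meta data"],
}


def build_rules(thesaurus):
    """Compile one case-insensitive, whole-phrase pattern per preferred term."""
    rules = {}
    for preferred, synonyms in thesaurus.items():
        # Try longer phrases first so they win over shorter ones they contain.
        alternatives = sorted(synonyms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b"
        rules[preferred] = re.compile(pattern, re.IGNORECASE)
    return rules


def mine_terms(text, rules):
    """Count synonym occurrences in one text, rolled up to the preferred term."""
    counts = Counter()
    for preferred, pattern in rules.items():
        hits = pattern.findall(text)
        if hits:
            counts[preferred] = len(hits)
    return counts


def data_map(documents, rules):
    """Build a simple document -> concept counts map."""
    return {doc_id: mine_terms(text, rules) for doc_id, text in documents.items()}


# Hypothetical sample documents.
DOCS = {
    "doc1": "Good taxonomies and clean metadata improve enterprise search.",
    "doc2": "A controlled vocabulary is a taxonomy. Meta data matters.",
}

result = data_map(DOCS, build_rules(THESAURUS))
for doc_id, counts in result.items():
    print(doc_id, dict(counts))
```

The sketch also illustrates the caveat about data quality: a thesaurus with missing or sloppy synonyms will silently miss concepts, and the resulting map will be no better than the terms fed into it.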

Ms. Hlava regards taxonomies and other metadata enrichment as the way to bring efficiency to our searches. In that realm, the benefits have only begun:

“In terms of taxonomies and search, ‘I think we have just scratched the surface. With good data, our clients are in a good position to do an incredible array of new and interesting things. Good taxonomies take everything to the next level, forming the basis of not only mashups, but also author networks, project collaborations, deeper and better information retrieval,’ she concluded.”

Wise words from a wise woman. We look forward to observing these predictions take shape as the search industry moves forward. The interview with Margie Hlava can be read in full here.

Access Innovations offers a wide range of content management services. The company has been building its semantic-based solutions for over thirty years and prides itself on its unique tool set and experienced personnel.

Stephen E Arnold, August 4, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Ardentia Search Now Connexica

July 29, 2011

Short honk: We were updating the Overflight links today and noted that Ardentia Search, which had positioned itself as a “business intelligence company,” is now redirecting to Connexica.com. The About Us page references Ardentia Search. The managing director of the company is Richard Lewis. Here’s the important bit:

As CTO at Ardentia, [Richard Lewis] was responsible for the development of BI and Data Warehouse products which are now used in over 100 NHS organizations as well as providing analysis and extract services for the National Program for IT. Richard founded Connexica in 2006 by buying the IPR for his latest BI and search product from Ardentia.

If you don’t recall the Ardentia system, here’s a block diagram I unearthed from the Overflight archive:

[Image: Ardentia overview block diagram]

A number of search and content processing companies are repositioning, not disappearing.

Stephen E Arnold, July 29, 2011

Freebie

The Mongo Mambo: NoSQL Is Tireless

July 26, 2011

Okay, so this isn’t exactly search-related, but we think it’s worth a mention. The blurring of search and data management is starting to become a more common symptom of the big data world.

OpenMyMind.net provides helpful information with “Practical NoSQL—Solving a Real Problem with MongoDB and Redis.”

Blogger (and software developer) Karl Seguin details his process of making an improvement to the Mogade game developer site. He is eager to share his use of a new tool coupled with a new modeling approach. In his conclusion, Seguin states,

“This reinforces my opinion about NoSQL in general. MongoDB has a couple specializations that are truly awesome (geospatial, logging), but it’s largely a general purpose data store with a number of advantages over RDBMS’. Many other NoSQL solutions are more specialized. Redis, while capable of more than what I’m using it for, is more specialized, and handles/looks at/views data differently. These solutions work well together and not only make it fun to work with data again, they make it easy and efficient.”

We appreciate the effort that Mr. Seguin put into his write up. We think that the blend of technologies is one of the harbingers of a significant shift in the data management world. One can only go so far with the traditional RDBMS before money crushes one’s big data aspirations. XML has been made to perform some interesting tricks, but under the demands of price sensitive information technology shops, there is some push back for this former prom queen. And the basic NoSQL world is being asked to deliver functions and services that extend well beyond the basics of fetching a result set.

Change is upon us and it may have a significant impact on vendors who are well positioned in the big data, search based application space. We like the moves of the Mongo Mambo but we love the music of Exalead’s CloudView approach.

Cynthia Murrell, July 19, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

MarkLogic, FAST, Categorical Affirmatives, and a Direction Change

July 5, 2011

I awakened this morning (July 4, 2011) with a marketing Fourth of July boom. I received one of those ever present LinkedIn updates putting a comment from the Enterprise Search Engine Professionals Group in front of me.

The MarkLogic positioning exploded on my awareness like a Fourth of July skyrocket’s burst.

Most of the comments on the LinkedIn group are ho hum. One hot topic has been Microsoft’s failure to put much effort in its blogs about Fast Search & Transfer’s technology. Snore. Microsoft put down $1.2 billion for Fast, made some marketing noises, and had a fellow named Mr. Treo-something talk to me about the “new” Fast Search system. Then search turned out to be more like a snap in but without the simplicity of a Web part. Microsoft moved on and search is there, but like Google’s shift to Android, search is not where the action is. I am not sure who “runs” the enterprise search unit at Microsoft. Lots of revolving door action is my impression of Microsoft’s management approach in the last year.

The noise died down and Fast has become another component in the sprawling Shanghai of code known as SharePoint 2010. Making Fast “fast” and tuning it to return results that don’t vary with each update has created a significant amount of business for Microsoft partners “certified” to work on Fast Search. Licensees of the Linux/Unix version of ESP are now like birds pushed from the nest by an impatient mother.

New MarkLogic Market Positioning?

Set Microsoft aside for a moment and look at this post from a MarkLogic professional who once worked at Fast Search and subsequently at Microsoft. I am not sure how to hyperlink to LinkedIn posts without generating a flood of blue and white screens begging for log in, sign up, and money. I will include a link, but you are on your own.

Here’s the alleged MarkLogic professional’s comment:

Many organizations are replacing FAST with MarkLogic. MarkLogic offers a scalable enterprise search engine with all the features of FAST plus more…

Wow.

An XML engine with wrappers is now capable of “all” the Fast features. In my new monograph “The New Landscape of Enterprise Search”, I took some care to review information presented by Fast at CERN, the wizard lair in Europe, about Fast Search’s effort to rewrite Fast ESP, which was originally a Web search engine. The core was wrapped to convert Web search into enterprise search. This was neither quick nor particularly successful. Fast Search & Transfer ran into some tough financial waters, ended up the focus of a government investigation, and was quickly sold for a price that surprised me and the goslings in Harrod’s Creek.

You can get the details of the focus of the planned reinvention of the Fast system and the link to the source document at CERN which I reference in my Landscape study. A rewrite indicates that some functions were not, in 2007 and 2008, performing in a manner that was acceptable to someone in Fast Search’s management. Then the acquisition took place. The Linux/Unix support was nuked. Fast under Microsoft’s wing has become a utility in the incredible assemblage of components that comprises SharePoint 2010. I track the SharePoint ecosystem in my information service SharePointSemantics.com. If you haven’t seen the content, you might want to check it out.

A Data Handover

June 22, 2011

In “Twitter and Google Hand Over Data: We’re All Newspapers Now,” PocketLint examines changes in how Twitter and Google now respond to requests from authorities for details about their users.

Two Twitter-related cases in England, one that had the “South Tyneside council acquiring Twitter data connected to someone who had published potentially libelous statements on a blog,” are examined. Google is facing a court order that will require it to hand over emails deleted by (celebrity chef) Gordon Ramsay’s father-in-law Chris Hutcheson.

“For some this (ability to identify users) is reassuring, after all the anonymity available with the Internet can pose legitimate threats and opportunities for many to act unpleasantly or even illegally and without punishment.”

While some might like it, the long view of these developments has to make you ask how much data will be provided: all of John Doe’s e-mail, all of John Doe’s e-mail to Jane Doe, or only John Doe’s April 15, 9:42 a.m. e-mail to Jane Doe? And once that data is released, where does it go and who is reading it?

These are questions that perhaps need to be asked and answered.

Stephen E. Arnold, June 22, 2011

Freebie

Big Data Inhabits New Space in the Virtual Market

June 20, 2011

We noticed this press release, “Big Data Mall Opens on the Informatica Marketplace,” which was picked up by GlobeNewswire.

Big data is the buzzword du jour to describe large amounts of structured and unstructured information. The idea is that there is so much data to process that traditional methods fall short of delivering useful results at a reasonable cost in the time available for a 30 something decision maker to fill his or her role as a “decider.”

Companies like Informatica are making tackling this contemporary challenge a priority, and continue to lead the way in terms of data management solutions. Concurrent with the release of their Informatica 9.1 Platform, consumers now have access to the recent addition to the Informatica Marketplace, the Big Data Mall.

The Marketplace allows both customer and vendor to collaborate in an effort to better manage the goals of modern commerce. The resulting solutions are referred to within the Marketplace as Blocks. Specific Blocks are then collected into sections known as Malls. The release provides an explanation of this new section:

“The Big Data Mall is a focal point for the industry in addressing the challenges and opportunities in Big Data,” said Tony Young, chief information officer, Informatica. “The new Mall debuts with 40 Blocks from Informatica and other leading vendors that map to the three major technology trends that define Big Data – Big Transaction Data, Big Interaction Data and Big Data Processing. New Blocks will be added going forward, as more and more innovative solutions emerge from the industry around Big Data.”

Will big data become the next frontier for findability or will predictive analytics become the next big thing?

Micheal Cory, June 20, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Digital Reasoning Adds Chinese Support to Synthesys

June 9, 2011

“Digital Reasoning Introduces Chinese Language Support for Big Data Analytics,” announces the company’s press release. This latest advance from the natural language wizards acknowledges the growing prevalence of Chinese on the Web. The support augments their premier product, Synthesys:

“Synthesys can now analyze the unstructured data from a variety of sources in both English and Chinese to uncover potential threats, fraud, and political unrest. By automating this process, intelligence analysts can gain actionable intelligence in context quickly and without translation.”

This key development is the sort of thing that makes us view Digital Reasoning as a break out company in content processing. Their math-based approach to natural language analytics puts them ahead of the curve in this increasingly important field. Synthesys has become an essential tool for government agencies and businesses alike.

This support for Chinese is just the beginning. Rob Metcalf, President and COO, knows that “the next generation of Big Data solutions for unstructured data will need to natively support the world’s most widely spoken languages.”

We’re delighted to see Digital Reasoning continue to excel.

Cynthia Murrell, June 8, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

A Taxonomy of NoSQL Databases

June 6, 2011

Search is morphing. The line between databases and search is thin and in some cases porous. We believe that most readers of Beyond Search are familiar with the Access and JET engines from Microsoft.

However, some individuals find the traditional, decades-old relational database inappropriate for certain tasks. The solution for some is NoSQL databases. We learned in “The Four Categories of NoSQL Databases”:

Most people just see one big pile of NoSQL databases, while there are quite some differences. You couldn’t use a Key-Value store when you need a Graph database for example, while Relational database systems are all quite compatible.

The author identifies four distinct categories of NoSQL databases:

  • Key-value: a math-based technique, implemented in Google’s approach and its variants
  • Column family: a column-oriented method of organization
  • Document: a key-value method
  • Graph: a node-and-edge setup

No database method is without drawbacks. The article points out that most NoSQL approaches eliminate the central, declarative language of SQL to allow for faster processing. Coupled with different architectures, NoSQL gains some advantages for “big data”; that is, large data sets and certain types of processing. But each model described in the article requires its own method of querying, trading a single, simple method of access for more flexible storage. These programs may not embrace the latest methods from Digital Reasoning, Kitanga and others, but this source is definitely worth tucking away for reference.
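To make the four categories and the point about querying more concrete, here is a toy sketch using plain Python data structures as stand-ins. No real database is involved, and every record, key, and function name below is invented for illustration. What it aims to show is that each model needs its own style of access, where a relational system would offer one SQL interface.

```python
from collections import deque

# Key-value: an opaque value fetched by its exact key.
kv_store = {"user:42": "Ada Lovelace"}

# Column family: row key -> column family -> column -> value.
column_store = {
    "user:42": {
        "profile": {"name": "Ada Lovelace", "city": "London"},
        "activity": {"last_login": "2011-06-06"},
    }
}

# Document: key -> structured document whose fields can be inspected.
document_store = {
    "user:42": {"name": "Ada Lovelace", "tags": ["math", "engines"]},
    "user:43": {"name": "Charles Babbage", "tags": ["engines"]},
}

# Graph: nodes with directed edges to other nodes.
graph_edges = {
    "Ada": ["Charles"],
    "Charles": ["Ada", "Mary"],
    "Mary": [],
}


def kv_get(key):
    """Key-value access: lookup by key only."""
    return kv_store.get(key)


def column_get(row, family, column):
    """Column-family access: address a single cell by row, family, and column."""
    return column_store[row][family][column]


def documents_with_tag(tag):
    """Document access: filter on a field inside each document."""
    return sorted(k for k, doc in document_store.items() if tag in doc.get("tags", []))


def reachable(start):
    """Graph access: breadth-first traversal along edges from a start node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph_edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return sorted(seen)


print(kv_get("user:42"))
print(column_get("user:42", "profile", "city"))
print(documents_with_tag("engines"))
print(reachable("Ada"))
```

Four stores, four different access functions: that is the trade the article describes, with flexibility in storage paid for by the loss of a single, simple query method.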

Stephen E Arnold, June 6, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion
