Predictive Analytics Becoming an Important Governance Tool

March 31, 2013

Many of our cities need help right now. Those destined for default may need help planning for the future, and predictive analytics may be the answer. That is but one area where analytics could help our municipalities; GCN examines the relationship between such software and government agencies in “Analytics: Predicting the Future (and Past and Present).”

Though police are still a long way from the predictive power of 2002’s “Minority Report,” notes writer Rutrell Yasin, police departments in a number of places are using analytics software to forecast trouble. And the advantages are not limited to law enforcement.

The article begins with a basic explanation of “predictive analytics,” but quickly moves on to some illustrations. Miami-Dade County, Florida, for example, uses products from IBM to manage water resources, traffic, and crime. One key advantage—interdepartmental collaboration. See the article for the details on that county’s use of this technology.

Though perhaps not the most popular of applications, predictive analysis is also now being used to enhance tax collection. So far, the IRS and the Australian Taxation Office have embraced the tools, but certainly more tax agencies must be eyeing the possibilities. Any tax cheats out there—you have been warned.

Leave it to the CIA’s head technology guy to capture the essence of the predictive analysis picture as we move into the future. Yasin writes:

“The real power of big data analytics will be unlocked when analytic tools are in the hands of everybody, not just among data scientists who will tell people how to use it, according to Gus Hunt, the CIA’s CTO, during a recent seminar on Big Data in Washington, D.C.

“‘We are going to have to get analytics and visualization [tools] that are so dead-simple easy to use, anybody can take advantage of them, anybody can use them,’ Hunt said.”

Are we there yet?

Cynthia Murrell, March 31, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Big Data and the New Mainframe Era

March 30, 2013

Short honk. Navigate to either your paper copy of the Wall Street Journal or the electronic version of “Demand Surges for Supercomputers.” The estimable Wall Street Journal asserts:

Sales of supercomputers priced at $500,000 and higher jumped 29% last year to $5.6 billion, research firm IDC estimated. That contrasted with demand for general-purpose servers, which fell 1.9% to $51.3 billion, the firm said.

Most folks assume that nifty cloud services are just the ticket for jobs requiring computational horsepower. Maybe not? For the big data cheerleaders, the rush to crunch bits every which way may usher in a new era of—hang on, gentle reader—the 21st century mainframe.

Number crunching solutions of the Amazon, Google, and Rackspace type may not be enough for some applications. If big iron continues to sell along with big storage, I may dust off my old JCL reference book and brush up on DASDs.

Stephen E Arnold, March 30, 2013

Search Evaluation in the Wild

March 26, 2013

If you are struggling with search, you may be calling your search engine optimization advisor. I responded to a query from an SEO expert who needed information about enterprise search. His clients, as I understood the question, were seeking guidance from a person with expertise in spoofing the indexing and relevance algorithms used by public Web search vendors. (The discussion appeared in the Search-Based Applications (SBA) and Enterprise Search group on LinkedIn. Note that you may need to be a member of LinkedIn to view the archived discussion.)

The whole notion of turning search into marketing has interested me for a number of years. Our modern technology environment creates a need for faux information. The idea, as Jacques Ellul pointed out in Propaganda, is that modern man needs something to fill a void.

How can search deliver easy, comfortable, and good enough results? Easy. Don’t let the user formulate a query. A happy quack to Resistance Quotes.

It, therefore, makes perfect sense that a customer who is buying relevance in a page of free Web results would expect an SEO expert to provide similar functionality for enterprise search. Not surprisingly, the notion of controlling search results with an externality like keyword stuffing or content flooding is a logical way to approach enterprise search.

Precision, recall, hard metrics about indexing time, and the other impedimenta of the traditional information retrieval expert are secondary to results. Like metrics about Web traffic, a number is better than no number. Even if the number’s flaws are not understood, the number is better than nothing. In fact, the entire approach to search as marketing is based on results which are good enough. One can see the consequences of this thinking when one runs a query on Bing or on systems which permit users’ comments to influence relevancy. Vivisimo activated this type of value adding years ago, and it remains a good example of trying to make search useful. The result list which forces the user to work through the documents and determine what is useful is gone. If a document has internal votes of excellence, that document is the “right” one. Instead of precision and recall, modern systems deliver “good enough” results. The user sees one top hit and assumes the system has made the more informed decision.
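
For readers who want the traditional yardsticks made concrete, here is a minimal sketch in Python (my own illustration, not drawn from the article) of how precision and recall are computed for a single query:

    # Minimal sketch: precision and recall for one query.
    # "retrieved" is what the engine returned; "relevant" is the human judgment.
    retrieved = {"doc1", "doc2", "doc3", "doc4"}
    relevant = {"doc2", "doc4", "doc7"}

    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved)  # 2/4 = 0.50
    recall = len(true_positives) / len(relevant)      # 2/3 = 0.67

    print(f"precision={precision:.2f}, recall={recall:.2f}")

A “good enough” system can score poorly on both numbers yet still satisfy a user who only looks at the first hit.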

There is a downside to the good enough approach to search, which delivers a concrete result that, like Web traffic statistics, looks so solid, so meaningful. The downside is that the user consumes information which may not be accurate, germane, or timely. In the quest for better search, good enough trumps the mentally exhausting methods of the traditional precision and recall crowd.

To get a better feel for the implications of this “good enough” line of thinking, take a look at the September 2012 “deliverable” from Promise (whose acronym should be spelled PPromise, in my opinion), “Tutorial on Evaluation in the Wild.” The abstract for the document does not emphasize the “good enough” angle, stating:

The methodology estimates the user perception based on a wide range of criteria that cover four categories, namely indexing, document matching, the quality of the search results and the user interface of the system. The criteria are established best practices in the information retrieval domain as well as advancements for user search experience. For each criterion a test script has been defined that contains step-by-step instructions, a scoring schema and adaptations for the three PROMISE use case domains.

The idea is that by running what strikes me as subjective data collection from users of systems, an organization can gain insight into the search system’s “performance” and “all aspects of his or her behavior.” (The “all” is a bit problematic to me.)
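
To make that concrete, here is a minimal sketch of the kind of weighted scoring schema the deliverable describes. The four category names come from the abstract; the criterion scores and weights are hypothetical values of my own:

    # Hypothetical scoring schema in the spirit of the PROMISE tutorial:
    # each category holds criterion scores (say, 0-5 from a test script).
    scores = {
        "indexing": [4, 3, 5],
        "document matching": [3, 3],
        "search result quality": [2, 4, 3],
        "user interface": [5, 4],
    }
    weights = {  # hypothetical weights; they sum to 1.0
        "indexing": 0.2,
        "document matching": 0.3,
        "search result quality": 0.3,
        "user interface": 0.2,
    }

    overall = sum(
        weights[cat] * (sum(vals) / len(vals)) for cat, vals in scores.items()
    )
    print(f"weighted user-perception score: {overall:.2f} out of 5")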


IBM Content Analytics and Search V2.2 Exam

March 26, 2013

I am not sure how, but two links found their way to me today. The subject of the exam is IBM’s Content Analytics and Search V2.2.

Information about the IBM test is at http://www-03.ibm.com/certify/certs/27003701.shtml. Information about the April 2011 version of the system, which is the current one, is at this IBM link. The current version is going on two years old, which does not suggest continuous, aggressive updating to me.

The first link points to Blog Pass 4 Test. The site presents some sample questions for the examination, which is part of the IBM certification process.

You can pass the IBM 000-583 (IBM Content Analytics and Search V2.2) exam with an “examination guide.”

The examination is available from Blog.pass4test.net. Here are three sample questions to whet your appetite:

Which documents from the collection are used to create the clustering proposal?
A. All of the documents in the index are used.
B. A random sample of the number that you specify
C. The first 1000 documents that were added to the index.
D. A round-robin alphabetically ordered sampling from each different crawler
Answer: B

Which languages listed are supported for text analytics collections?
A. French, Arabic, Hindi, Malay
B. German, English, Polish, Greek
C. Hebrew, Italian, English, Russian
D. English, Spanish, Arabic, German
Answer: D

Which is NOT a supported operating system?
A. AIX 5.3 (32-bit)
B. AIX 6.1 (64-bit)
C. Red Hat Enterprise Linux Advanced Server (32-bit)
D. Microsoft Windows Server 2003 Enterprise (32-bit)
Answer: A

Pretty thin gruel for the cold winter mornings required to get complex proprietary and open source systems to work in an optimal manner.

The second link is to Exam 2 Home. The idea is that for $49, a person can buy a PDF with questions and answers. You can find this exam guide at http://www.exam2home.com/000-583.htm. The site asserts:

Many IBM Content Analytics and Search V2.2 test questions or brain dump providers in the market focus solely on passing the exam while skipping the real-world exam preparation. This approach only gives short-term solution while giving the candidates real setbacks in the job market. The main focus of Exam2Home’s IBM 000-583 questions is to teach you the techniques to prepare your exam in the right sense covering all aspects of the exam. We have truly a 1-2 knockout solution for your IBM 000-583 exam.

Two observations. First, I must be on a list of folks trying to master IBM Content Analytics and Search V2.2. Interesting idea, just not accurate. Second, these two pitches seem quite similar. Is this another of the learn-quick, get-a-raise training schemes? I ran across a similar program for Quicken. Interesting but suspicious to me.

Stephen E Arnold, March 26, 2013

IBM Goes Big On Big Data

March 25, 2013

IBM is probably the biggest name in big data when it comes to commercial, proprietary vendors. The company is an established, household name, and it continues to make great technological advances, the most notable being Watson, the AI. On its Web site, IBM has gone above and beyond to set itself apart from other big data companies. Take a look at the Smarter Analytics page it created. IBM stresses the analytical aspect of big data and how its solutions cover software, research, hardware, and services:


“Big data is more than a matter of size; it is a way to uncover insights and opportunities from new and emerging internal and external sources of data and content. IBM’s big data capabilities include an enterprise-class big data platform, predictive and content analytics, and decision management to give your organization a competitive edge. IBM’s capabilities and signature solutions are designed to complement your existing information, analytics and content management infrastructure, so you can get started quickly and achieve game-changing results.”


Unlike other companies that spout what they have done, IBM provides video evidence documenting how big data has changed or helped companies. Among those who benefited were T-Mobile, Vestas Wind Systems, NYSE Euronext, and Fiserv. IBM knows how to market itself as a viable big data solution. Unlike other companies, it has multi-generation appeal because of its longevity and new advances.


Whitney Grace, March 25, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Big Data Headlines Roundup

March 22, 2013

Data Knowledge Center rounds up some of the Big Data headlines from the last week in its article, “HP, Dell Announce New Big Data Analytics Solutions.” The article covers major players from Datastax to Dell but opens with a discussion of HP’s ArcSight.

The article begins:

“HP (HPQ) announced new offerings to help organizations to gain security intelligence from large data sets to better detect and prevent threats. The security information and event management (SIEM) capabilities of HP ArcSight with the HP Autonomy IDOL content analytics engine automatically recognizes the context, concepts, sentiments and usage patterns related to how users interact with all forms of data. Art Gilliland, Enterprise Security Products, HP, said, ‘With the integration of cloud monitoring, content analytics and Big Data processing, HP provides clients with the context needed to effectively stop potential breaches.’”

There are definite up-and-comers in the Big Data realm, but many customers new to the market will want to go with a trusted solution. LucidWorks offers LucidWorks Search and LucidWorks Big Data, both built on the open source power of Apache Lucene and Solr, standards in the field. Open source also brings the benefits of flexibility and affordability in addition to security.

Emily Rae Aldridge, March 22, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Try Before You Buy

March 22, 2013

The old saying that there is hidden meaning in words means even more in the analytics world. The Semantria API helps turn unstructured data into data that makes sense. Semantria even offers a free demo so that customers can see how the system works. According to the Web site:

“Semantria’s API helps organizations to extract meaning from large amounts of unstructured text. The value of the content can only be accessed if you see the trends and sentiments that are hidden within. Add sophisticated text analytics and sentiment analysis to your application: turn your unstructured content into actionable data.”

Semantria provides a fully customizable and user-friendly system, and users can gain valuable insight from their unstructured content with the click of a button. The Semantria API uses current techniques for data extraction, so clients can be confident that they are using up-to-date technology and getting the best information possible. The convenient pay-as-you-go pricing means that, regardless of budget, users can get the services they need. More importantly, Semantria offers unlimited support and maintenance, so users always know where to go to get answers. One of the best benefits is the online demo Semantria provides for anyone interested in taking a peek at how the service works. The try-before-you-buy approach makes it hard to stay away.
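
For the curious, the round trip to a sentiment API of this type looks roughly like the following Python sketch. The endpoint, payload fields, and authentication scheme are placeholders of my own, not Semantria’s actual API; consult the company’s documentation for the real thing.

    # Hypothetical sketch of a text analytics API call; the URL, payload
    # fields, and auth header below are illustrative, not Semantria's API.
    import requests

    API_URL = "https://api.example.com/sentiment"  # placeholder endpoint
    API_KEY = "your-key-here"                      # placeholder credential

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": "The service was fast, but the pricing page confused me."},
        timeout=10,
    )
    response.raise_for_status()
    print(response.json())  # e.g. {"sentiment": "mixed", "entities": [...]}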

April Holmes, March 22, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

New Tool Integrates with Text Analytics

March 21, 2013

Language and analytics are starting a new trend by coming together. According to the DestinationCRM.com article “New SDL Machine Translation Tool Integrates with Text Analytics,” SDL has announced that its machine translation tool can now be integrated with text analytics solutions. SDL BeGlobal can translate both structured and unstructured information across more than 80 language combinations. The translated information is then analyzed using text analytics solutions, giving users the ability to access global customer insights as well as important business trends. Jean-Francois Damais, deputy managing director of loyalty global client solutions at Ipsos, had the following to say regarding SDL BeGlobal:

“With the growth in global business and the accessibility of online information, we now have a much greater need to access and analyze data from multiple languages. As a company focused on innovation and dedicated to our clients’ successes, we deployed SDL BeGlobal machine translation to further improve our research insights and bring new value to our customers.”

SDL BeGlobal has already caught on in the text analytics industry, and several well known companies have jumped on the bandwagon. Raytheon BBN Technologies currently uses the technology for broadcast and Web content monitoring, and Expert System uses it for semantic intelligence. Language and analytics are two things not normally thought of together, but it seems SDL BeGlobal has a good thing going. Only time will tell whether the new friendship between language and analytics will stand the test of time. For the technically curious, a minimal sketch of the translate-then-analyze idea appears below.
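
The translate() and sentiment() functions in this Python sketch are hypothetical stand-ins of my own for SDL BeGlobal and a text analytics engine; the point is the shape of the pipeline, not the stub logic.

    # Hypothetical translate-then-analyze pipeline. translate() and
    # sentiment() are stubs standing in for SDL BeGlobal and an analytics
    # engine; a real deployment would call those services here.
    def translate(text: str, source: str, target: str) -> str:
        return text  # stub: pass the text through unchanged

    def sentiment(text: str) -> str:
        # toy rule; works only because translate() is an identity stub
        return "positive" if "bueno" in text.lower() else "neutral"

    def analyze_multilingual(texts, source_lang="es"):
        english = (translate(t, source_lang, "en") for t in texts)
        return [sentiment(t) for t in english]

    print(analyze_multilingual(["El servicio es muy bueno."]))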

April Holmes, March 21, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

WCC Group and ImageWare

March 20, 2013

I saw a reference to a court filing by the law firm San Diego IP Law Group LLP. You can find the document at the San Diego court as Case 3:13-cv-00309-DMS-JMA. I took a quick look, and it appeared that a company in the search and content processing business is a party to the legal matter. The “defendant,” if I read the document correctly, is WCC Services US, Inc., a Delaware corporation owned by WCC Group BV in the Netherlands.

Here’s what WCC says about its company:

WCC is a high-end software company that automates the matching process by providing more accurate and intelligent results. Non-core activities such as client implementations are performed by qualified partners like Accenture or EDS. To maintain its stated company objectives, WCC recruits and retains a motivated, flexible and highly educated staff. The knowledge and passion of our people drives industry-leading innovation and delights customers with the quality of our products and support. WCC is committed to a transparent Corporate Governance structure, even as a privately-held company. The organization’s openness, internally and externally, gives stakeholders up-to-date information about WCC and its future course. Conservative accounting policies assure continuity of the company and clearly signal WCC’s reliability as a business partner.

The court document carries the phrase “Complaint for patent infringement” with a demand for a jury trial. The court document references a number of patents; for example, US 7298873 and some others.

I just wanted to document the existence of this court document. Like the Palantir i2 Group dust up, these disputes about content processing are interesting to me. Once resolved, the information about the matter can disappear. Google, of course, does not like urls which fail to resolve. I don’t like loud sirens. Like Google, there’s not much one can do about certain content going dark. Stuff happens whether Google or I like it.

Keep in mind that I don’t have a dog in this fight. I have been monitoring WCC Group’s information retrieval business, but the company has kept a low profile. I did try to contact the company a couple of years ago, but I was unable to get much traction.

WCC’s search system is called Elise. There are some public descriptions of the search-related business at these links:

The San Diego Law Group’s Web site is http://firm.sandiegoiplaw.com/. The WCC Web site (assuming I have located the correct Web destination) is http://www.wcc-group.com/.

Stephen E Arnold, March 20, 2013

Bitext: Moving Forward with Enterprise Semantics

March 20, 2013

Antonio S. Valderrábanos, founder of Bitext, recently granted an exclusive interview to the Arnold Information Technology Search Wizards Speak series. Bitext provides multilingual semantic technologies, with probably the highest accuracy in the market, for companies that use text analytics and natural language interfaces. The full text of the interview is available at http://www.arnoldit.com/search-wizards-speak/bitext-2.html.

Bitext works for companies in two main markets: Text Analytics (Concept and Entity Extraction, Sentiment Analysis) for Social CRM, Enterprise Feedback Management, or Voice of the Customer; and Natural Language Interfaces for Search Engines and Virtual Assistants. Visit Bitext at http://www.bitext.com. Contact information is available at http://www.bitext.com/contact.html.

Bitext is seeing rapid growth, including recent deals with Salesforce and the Spanish government. The company has added significant new technology to its multilingual content processing system.

In addition to support for more languages, the company is getting significant attention for its flexible sentiment analysis system. Valderrábanos gave this example: “flies” may be a noun, but also a verb. We say “time flies like an arrow” versus “fruit flies like bananas.” Bitext believes computers should be able to parse both sentences and get the right meaning. With that goal in mind, the company started development of an NLP (natural language processing) platform flexible enough to perform multilingual analysis just by exchanging grammars, not by modifying the core engine.
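
The classic pair is easy to reproduce with off-the-shelf tools. The Python sketch below uses NLTK’s stock part-of-speech tagger (not Bitext’s engine, just a convenient illustration) to show how “flies” shifts between tags across the two sentences:

    # Illustration only: NLTK's stock POS tagger, not Bitext's platform.
    import nltk

    nltk.download("punkt", quiet=True)                       # tokenizer model
    nltk.download("averaged_perceptron_tagger", quiet=True)  # tagger model

    for sentence in ["Time flies like an arrow.", "Fruit flies like bananas."]:
        tokens = nltk.word_tokenize(sentence)
        print(nltk.pos_tag(tokens))  # list of (word, tag) pairs

A stock tagger may well mis-tag the second sentence, which is exactly the ambiguity Bitext’s exchangeable grammars aim to resolve.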

He told ArnoldIT’s Search Wizards Speak:

“Our system and method give us a competitive advantage with regards to quick development and deployment,” Valderrábanos said. “Currently, our NLP platform can handle 10 languages. Unlike most linguistic platforms, the Bitext API ‘snaps in’ to existing software.”

Bitext’s main area of research is focused on deep language analysis, which captures the semantics of text. “Our work involves dealing with word meanings and truly understanding what they mean, interpreting wishes, intentions, moods or desires,” Valderrábanos explained. “We just need to know what type of content, according to our client, is useful for her business purposes, and then we program the relevant linguistic structures.” He added:

“Many vendors advocate a ‘rip and replace.’ Bitext does not. Its architecture allows our system to integrate with almost any enterprise application.”

Bitext already delivers accuracy, reliability, and flexibility. In the future, the company will focus on bringing those capabilities to mobile applications. “iPads, tablet devices in general, and mobile phones are becoming the main computing devices in a world where almost everybody will be always online. This opens a whole new arena for mobile applications, which will have to cater for any single need mobile users may have,” Valderrábanos said.

Donald C. Anderson, March 20, 2013
