Free Knowledgebase Builder for Mind Mapping
February 24, 2014
Mind maps can be a valuable tool for the visual among us, and you can easily build your own virtual version with Knowledgebase Builder 2.6 from InfoRapid, based in Waiblingen, Germany. The best part—it’s free for personal use. As with most such business models, the company hopes you’ll try the freeware version and decide you can’t live without the tool in your workplace. The Professional Edition, which lets multiple users work together on the same knowledge base, goes for 99 euros (about $135 as of this writing). The price for the version with all the bells and whistles, the Enterprise Version, varies by company size, but starts at 1,000 euros (about $1,360 as I type) for a small business.
The description tells us:
“InfoRapid KnowledgeBase Builder allows you to easily create complex Mind Maps with millions of interconnected items. One single Mind Map can hold your entire knowledge, all your thoughts and ideas in a clear way. The data is stored securely in a local database file. While traditional Mind Maps don’t offer cross connections, InfoRapid KnowledgeBase Builder can connect any item with each other and label the connection lines. The program contains an archive for documents, images and web pages that may be imported and attached to any chart item or connection line.”
The six-minute video on the website demonstrates the Builder’s functionality, using text about the software itself as its example. The connection lines mentioned above, which shift to adjust to new input, are reason enough to switch from pen-and-paper or MS Paint mapping techniques. Another key feature: you can link to documents or web pages from within the map, simplifying follow-through (a weak point for many of us). The Highlighter Analysis is pretty nifty, too. Anyone curious about this tool should check out the site; the personal-use price can’t be beat.
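The cross connections described above amount to a labeled graph: any item can link to any other, and each connection line carries its own label. Here is a minimal sketch of that data model in Python; the class and example items are mine for illustration only and have nothing to do with InfoRapid’s actual file format.

```python
from collections import defaultdict

class KnowledgeMap:
    def __init__(self):
        self.items = set()
        self.connections = defaultdict(list)   # item -> [(other item, connection label)]

    def connect(self, a, b, label=""):
        """A cross connection: any item may link to any other, with a labeled line."""
        self.items.update([a, b])
        self.connections[a].append((b, label))
        self.connections[b].append((a, label))

kb = KnowledgeMap()
kb.connect("KnowledgeBase Builder", "InfoRapid", "developed by")
kb.connect("KnowledgeBase Builder", "Mind Maps", "creates")
kb.connect("Mind Maps", "InfoRapid", "documented by")   # a cross link a strict tree would not allow
print(kb.connections["InfoRapid"])
```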
Cynthia Murrell, February 24, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
PathAR: Bold Claims
February 23, 2014
I came across a quite remarkable marketing assertion. The company using the wording is PathAR LLC, based in the Midwest. Here’s what the company says:
Today 1 of the 3.8 Billion users of social media WILL impact your organization! Do you know who that 1 user is? How do we do it?
We built the world’s most advanced commercially available end-to-end solution for creating actionable intelligence from big data! Our proprietary intelligence engine powers Dunami, our web-based software platform. Dunami combines breakthrough advances in network analysis with advanced analytical techniques derived from long standing intelligence practices. Dunami’s broad capabilities are being used to Find, Understand, and Predict the behaviors of thought leaders and organizers on any topic, including identifying extremists, criminals, and others who are inciting potential violence around the globe!
When I read the statements, I wonder how predictive methods can pinpoint a single datum as the pivotal item of information.
Dunami, as a product/service name, poses some findability challenges. The name is already in use by an exercise studio, carries a religious connotation, and is the title of a visual novel.
The company has filed for a trademark. See http://bit.ly/1hlK9mk. The company has a modest LinkedIn presence. See http://goo.gl/5a2JiK.
Is this another outfit chasing after IBM i2, Recorded Future, and the dozens of vendors listed on the Carahsoft Web site?
Stephen E Arnold, February 23, 2014
Frequentists Versus Bayesians: Is HP Amused?
February 19, 2014
I read a long report and then a handful of spin-off reports about HP and Autonomy, mid-February 2014 version. The Financial Times’s story is a for-fee job. You can get a feel for the information in “HP Executives Knew of Autonomy’s Hardware Sales Losses: Report.” There are clever discussions of this allegedly “new information” in a number of blogs. What is interesting is an allegedly accurate chunk of information in “HP Explores Settlement of Autonomy Shareholder Lawsuit.” My head is spinning. HP buys something. HP changes the person who was on watch when the deal was worked out. HP gets a new boss and makes changes to its board of directors. HP then blames everyone except itself for paying a lot of money for Autonomy. HP then whips up the regulators, agitates accounting firms, and pokes Michael Lynch with a cattle prod.
As this activity was in the microwave, it appears that HP knew how the hardware/software deals were handled. If the reports are accurate, Dell hardware was more desirable than HP’s hardware.
But there is a more interesting twist. I refer you, gentle reader, to “A Fervent Defense of Frequentist Statistics.” Autonomy’s “black box” consists of Bayesian methods and what I call MCMC, or Markov chain Monte Carlo, techniques. The idea is that once some judgment calls are made, the Integrated Data Operating Layer, or IDOL, can chug away without human involvement. When properly resourced and trained, the Autonomy system works for certain types of content processing and information retrieval applications. You can read more in our for-fee analysis of IDOL, which reviews several important patents germane to the Autonomy system. You can purchase a copy at https://gumroad.com/l/autonomy.
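For readers who want a concrete picture of what “Bayesian plus MCMC” means, here is a minimal Metropolis-Hastings sketch in Python. It estimates a single proportion from yes/no data under a Beta prior. It illustrates the general technique only; it is not Autonomy’s IDOL code, and the data are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.binomial(1, 0.7, size=50)        # 50 invented yes/no observations

def log_posterior(theta):
    """Unnormalized log posterior: Beta(2, 2) prior times Bernoulli likelihood."""
    if not 0 < theta < 1:
        return -np.inf
    log_prior = np.log(theta) + np.log(1 - theta)
    log_likelihood = data.sum() * np.log(theta) + (len(data) - data.sum()) * np.log(1 - theta)
    return log_prior + log_likelihood

# Metropolis-Hastings: propose a small random move, accept it with the usual rule.
samples, theta = [], 0.5
for _ in range(5000):
    proposal = theta + rng.normal(0, 0.05)
    if np.log(rng.random()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

print("posterior mean:", np.mean(samples[1000:]))   # discard burn-in
```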
In “A Fervent Defense,” an old battle line is reactivated. The “frequentists” are not exactly thrilled with the rise of Bayesian methods. Autonomy emerged from Cambridge University after some Bayesian methods were revealed as crucial to World War II activities. Frequentists point out that there are some myths about Bayesian methods. The write-up is not for MBAs, failed Web masters, or unemployed middle school teachers. The myths allegedly dispelled in the article are:
- “Bayesian methods are optimal.
- Bayesian methods are optimal except for computational considerations.
- We can deal with computational constraints simply by making approximations to Bayes.
- The prior isn’t a big deal because Bayesians can always share likelihood ratios.
- Frequentist methods need to assume their model is correct, or that the data are i.i.d.
- Frequentist methods can only deal with simple models, and make arbitrary cutoffs in model complexity (aka: “I’m Bayesian because I want to do Solomonoff induction”).
- Frequentist methods hide their assumptions while Bayesian methods make assumptions explicit.
- Frequentist methods are fragile, Bayesian methods are robust.
- Frequentist methods are responsible for bad science
- Frequentist methods are unprincipled/hacky.
- Frequentist methods have no promising approach to computationally bounded inference.”
The key point is that HP is going to learn, already has learned, or learned and has just forgotten that Bayesian methods are not suitable for every single information processing application. In fact, using a Bayesian method when a frequentist one is more appropriate can produce unsatisfactory results for a discriminating data scientist. Using frequentist methods when a Bayesian approach is more appropriate can yield equally dissatisfying outputs.
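A toy calculation (mine, not HP’s or Autonomy’s) makes the point. On a handful of observations, the frequentist maximum-likelihood estimate and a Bayesian posterior mean under a strong prior disagree sharply; with ample data the gap closes, but someone has to know which answer the application needs.

```python
clicks, trials = 1, 4                  # tiny invented sample: 1 relevant result in 4

mle = clicks / trials                  # frequentist point estimate: 0.25

alpha, beta = 20, 20                   # strong Beta prior centered on 0.5
posterior_mean = (clicks + alpha) / (trials + alpha + beta)   # about 0.48

print(mle, posterior_mean)             # the two camps give very different answers on sparse data
```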
The point is that if one buys a system built on one method and then applies it inappropriately, the knowledgeable user is going to be angry. It is possible that some disappointed users will take legal action, demand a license refund, or just hit the conference circuit and explain why such and such a system was a failure.
Will HP put the three ring circus of buying Autonomy to rest and then find itself mired in the jaws of a Bayesian versus frequentist dispute? My hunch is, “Yep.”
Could HP have convinced itself that Autonomy was a universal fix-it kit for information processing problems? If the answer is, “Yes,” then HP is going to have to come to grips with licensees who are going to point out that the solution did not cure the problem.
In short, HP faces more excitement. The company will not be “idle” any time soon. HP may not be amused, but I am. Search is indeed a bit more difficult than some would have customers believe.
Stephen E Arnold, February 19, 2014
Marvel Introduced by Elasticsearch to Monitor and Manage Data Extraction
February 17, 2014
The TechCrunch article titled “Elasticsearch Debuts Marvel To Deploy And Monitor Its Open Source Search And Data Analytics Technology” provides insight into Marvel, which the article calls a “deployment management and monitoring solution.” Elasticsearch is a technology for extracting information from structured and unstructured data, and its users include big names such as Netflix, Verizon, and Facebook. The article explains how Marvel will work to manage Elasticsearch,
“Enter Marvel, Elasticsearch’s first commercial offering, that makes it easy to run search, monitor performance, get visual views in real time and take action to fix things and improve performance. Marvel allows Elasticsearch system operators, who manage the technology at companies like Foursquare, see their Elasticsearch deployments in action, initiate instant checkup, and access historical data in context. Potential systems issues can be spotted and resolved before they become problems, and troubleshooting is faster. Pricing starts at $500 per five nodes.”
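Marvel is the commercial layer, but the cluster and node statistics it charts are exposed through Elasticsearch’s standard REST API. Here is a minimal sketch of the kind of checkup it automates, assuming a test node on localhost:9200 and the Python requests library:

```python
import requests

BASE = "http://localhost:9200"   # assumed local test node

# Cluster-level health: status (green/yellow/red), node count, shard counts
health = requests.get(f"{BASE}/_cluster/health").json()
print(health["status"], health["number_of_nodes"], health["active_shards"])

# Node-level metrics of the sort Marvel charts over time (JVM heap, OS load)
stats = requests.get(f"{BASE}/_nodes/stats/jvm,os").json()
for node in stats["nodes"].values():
    print(node["name"], node["jvm"]["mem"]["heap_used_percent"])
```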
Elasticsearch reported revenue growth of over 400 percent in 2013, and Marvel will only add to its popularity. Already a user-friendly, lightweight technology, Elasticsearch is targeting developers who want real-time visibility into their data. Marvel may be great news for Elasticsearch and its users, but it is certainly bad news for competitor Lucid Imagination.
Chelsea Kerwin, February 17, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Advice on Making the Most of Limited Data
February 12, 2014
The article titled “How To Do Predictive Analytics with Limited Data” from Datameer on SlideShare suggests that Limited Data may replace Big Data in importance. The idea of “semi-supervised learning” is presented as a way to handle the difficulties of building predictions from limited data, such as the expense and manageability of labeling and simply missing key data. The overview states,
“As it turns out, recent research on machine learning techniques has found a way to deal effectively with such situations with a technique called semi-supervised learning. These techniques are often able to leverage the vast amount of related, but unlabeled data to generate accurate models. In this talk, we will give an overview of the most common techniques including co-training regularization. We first explain the principles and underlying assumptions of semi-supervised learning and then show how to implement such methods with Hadoop.”
The presentation summarizes possible approaches to semi-supervised learning and the assumptions it is possible to make about unlabeled data (these include the cluster, low-density, and manifold assumptions). It also covers the concepts of Label Propagation and Nearest Neighbor Join. However, as inviting as it is to forget Big Data and switch to predictive analytics with Limited Data, the suggestion may sound too much like Bayes-Laplace.
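The Label Propagation idea can be sketched in a few lines with scikit-learn: unlabeled points are marked with -1, and the handful of known labels spreads across a nearest-neighbor similarity graph. This is a generic illustration on synthetic data, not the Hadoop implementation the presentation describes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

rng = np.random.default_rng(0)
y_partial = np.full(200, -1)                       # -1 marks an unlabeled point
known = rng.choice(200, size=10, replace=False)
y_partial[known] = y_true[known]                   # reveal only ten labels

model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y_partial)

mask = y_partial == -1
print("accuracy on unlabeled points:", (model.transduction_[mask] == y_true[mask]).mean())
```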
Chelsea Kerwin, February 12, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Attivio and Quant5 Partner to Meet Challenges of Data Analytics
February 11, 2014
The article on PR Newswire titled “Attivio and Quant5 Partner to Bring Fast and Reliable Predictive Customer Analytics to the Cloud” explains the partnership between the two analytics innovators. Aimed at producing information from data without the hassle of a team of data scientists, the partnership promises to create insights that companies can act on. The partnership responds to the growing frustration some companies face in gleaning useful information from huge amounts of data. The article explains,
“Attivio built its business around the core principle that integrating big data and big content should not require expensive mainframe legacy systems, handcuffing service agreements, years of integration and expensive data scientists. Attivio enterprise customers experience business-changing efficiency, sales and competitive results within 90 days. Similarly, Quant5 arose from the understanding that businesses need simple, elegant solutions to address difficult and complex marketing challenges. Quant5 customers experience increased revenues, reduced customer churn and an affordable and fast path to predictive analytics.”
The possibility of indirect sales, following in the footsteps of Autonomy and Endeca, does seem to be part of the 2014 tactics. The Attivio and Quant5 solutions are offered in five major areas: Lead & Opportunity Scoring, Customer Segmentation, Targeted Offers, Product Usage, and Product Relationships.
Chelsea Kerwin, February 11, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Government Buys into Text Analytics
February 7, 2014
What do you make of this headline from All Analytics: “Text And The City: Municipalities Discover Text Analytics”? Businesses have been using text mining software for a while and understand the insights it can deliver for business decisions. The same goes for law firms that must wade through piles of litigation documents. Are governments really only catching on to text mining software now?
The article reports on several examples of municipal governments employing text mining and analytics. Law enforcement agencies are using it to identify key concepts and deliver quick information to officials. The 311 system, known as the source for local information and immediate contact with city services, is another system that can benefit from text analytics, because it can organize and process the incoming information faster and more consistently.
There are many ways text analytics can be helpful to local governments:
“Identifying root causes is a unique value proposition for text analytics in government. It’s one thing to know something happened — a crime, a missed garbage collection, a school expulsion — and another to understand where the problem started. Conventional data often lacks clues about causes, but text reveals a lot.”
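What text analytics on 311 requests can look like in practice is easy to sketch: pull the dominant terms out of free-text complaints so clusters of likely root causes surface. The complaint strings below are invented, and the scikit-learn pipeline is a generic illustration, not any vendor’s product.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

complaints = [
    "missed garbage collection on elm street again",
    "trash not picked up for two weeks",
    "pothole on main street damaged my tire",
    "large pothole near the school crossing",
    "garbage truck skipped our block",
    "street light out on main street",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(complaints)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
terms = tfidf.get_feature_names_out()
for i, center in enumerate(km.cluster_centers_):
    top_terms = [terms[j] for j in center.argsort()[-3:][::-1]]
    print(f"cluster {i}: {top_terms}")
```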
The bigger question is whether local governments will spend the money on these systems. Perhaps, but analytics software is expensive, and governments are pressured to find low-cost solutions. Expertise and money are both in short supply on this issue.
Whitney Grace, February 07, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Quote to Note: Big Data Skill and Value Linked
February 6, 2014
Tucked in “The Morning Ledger: Companies Seek Help Putting Big Data to Work” was a quote attributed to SAP, the enterprise software vendor. The quote:
David Ginsberg, chief data scientist at SAP, said communication skills are critically important in the field, and that a key player on his big-data team is a “guy who can translate Ph.D. to English. Those are the hardest people to find.”
I have been working through patent documents from some interesting companies involved in Big Data. The math is somewhat repetitive, but it seems the combination of numerical ingredients is what makes the “invention.”
One common thread runs through the information I have reviewed in preparation for my lectures in Dubai in early March 2014. Fancy software needs humans to:
- Verify the transforms are within acceptable limits
- Configure thresholds
- Specify outputs, often using old-fashioned methods like SQL and Boolean (see the sketch after this list)
- Figure out what the outputs “mean”.
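A toy illustration of the second and third items, as flagged in the list above: a human still has to pick the threshold, and the output is specified with plain old SQL. The table, column names, and cutoff are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (doc_id TEXT, relevance REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("doc-1", 0.91), ("doc-2", 0.42), ("doc-3", 0.77)])

RELEVANCE_THRESHOLD = 0.75        # the judgment call a human still has to make

rows = conn.execute(
    "SELECT doc_id, relevance FROM scores WHERE relevance >= ? ORDER BY relevance DESC",
    (RELEVANCE_THRESHOLD,),
).fetchall()
print(rows)                        # [('doc-1', 0.91), ('doc-3', 0.77)]
```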
With search and content processing vendors asserting that their systems make it easy for end users to tap the power of Big Data, I have some doubts. With most “analysts” working in Excel, a leap to the types of systems disclosed in open source patent documents will be at the outer edge of end users’ current skills.
Big Data requires much of skilled humans. When there are too few human Big Data experts, Big Data may not deliver much, if any, value to those looking for a silver bullet for their business.
Stephen E Arnold, February 6, 2014
The Future of Business Intelligence
January 26, 2014
In the article titled “Business Intelligence Usage Evolving Subtly” on Smart Data Collective, it is made apparent that business intelligence and analytics are still evolving. The article assumes that the 2013 trend toward cloud computing will continue into 2014.
Looking further ahead, the article states:
“There could soon be a whole new BI paradigm, in which many affordable analysis processes are created at once, rather than devoting the whole budget to one effort. Enterprise Apps Today explained that this is another natural role for the cloud, with good projects surviving and poor options falling by the wayside, all without the effort or funding that would be necessary to accomplish the same on-site.”
The article cites a MarketsandMarkets survey that concluded BI would be found useful in many sectors. More specifically, “the source indicated that the technology will grow at a rate of 8.3 percent through 2018.” That would mean a value of $20.8 billion in 2018, up from the current worth of $13.9 billion. However, others are less optimistic, believing the evolution of business intelligence may be too snail-like; business intelligence is currently meeting sales resistance in France, as we reported in the article “Business Intelligence: Free Pressure for Fee Solutions.” Perhaps subtle is not enough?
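A quick arithmetic check (mine, not MarketsandMarkets’) shows the two dollar figures are consistent with the stated growth rate:

```python
value_2013 = 13.9                       # USD billions, the current worth cited above
growth = 0.083                          # 8.3 percent per year
value_2018 = value_2013 * (1 + growth) ** 5
print(round(value_2018, 1))             # about 20.7, in line with the $20.8 billion figure
```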
Chelsea Kerwin, January 26, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
A Formula for Selling Content Processing Licenses
January 23, 2014
Do equations sell? Some color:
I know that I received negative feedback when I described the mathematical procedures used for Google’s semantic search inventions. I receive presentations and links to presentations frequently. Few of these contain mathematical expressions. In my forthcoming no-cost discussion of Autonomy from 1996 to 2007, I include one equation. I learned my lesson. Today’s search and content processing truth seekers want training wheels, not higher level math. I find this interesting because as systems become easier to use, the fancy math becomes more important.
Anyway, imagine my surprise when I received a link to a company founded 14 years ago. The outfit does business as Digital Reasoning, and it competes with Palantir (a segment superstar), IBM i2 (the industry leader for relationship analysis), and Recorded Future (backed, in part, by the Google). Dozens of other companies chase revenues in this content processing sector. Today’s New York Times includes a content marketing home run by an outfit called YarcData. You can find this op-ed piece by Tim White on page A23 of the dead tree version of the paper I received this morning (January 23, 2014). Now that’s search engine optimization that Pandas and the Times’s demographic can love.
To the presentation. My link points to Paragon Science at http://slidesha.re/1jpXAGd. I was logged in automatically, so you may have to register to flip through the slide deck.
Navigate to slide 33 and following. Slides 1 to 32 review how text has been parsed for decades. The snappy stuff kicks in on slide 33. There are some incomprehensible graphics. These Hollywood-style data visualizations are colorful. I, unlike the 20-somethings who devour this approach to information, have a tough time figuring out what I am supposed to glean.
At slide 42, I am introduced to “dynamic cluster analysis.” The approach echoes the methods developed by Dr. Ron Sacks-Davis in the late 1970s and embedded in some of the routines of the 1980 system that a decade later became better known as InQuirion and then TeraText.
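The deck does not spell out what “dynamic cluster analysis” means, but the generic idea of re-clustering documents per time slice and watching the groups drift can be sketched quickly. The documents and slicing below are invented for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

time_slices = {
    "week 1": ["merger rumor surfaces", "merger confirmed", "stock rises on merger"],
    "week 2": ["layoffs announced", "stock falls sharply", "merger integration stalls"],
}

for label, docs in time_slices.items():
    X = TfidfVectorizer().fit_transform(docs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(label, list(km.labels_))      # cluster membership shifts from slice to slice
```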
At slide 44, the fun begins. Here’s an example which I am sure you will recall from your class in chaos mathematics. If you can’t locate your class notes, you can get a refresher at http://bit.ly/1mKR3G9 courtesy of Caltech, home of the easy math classes, as I learned during my stint at Halliburton Nuclear Utility Services. The tough math classes were taught at MIT, the outfit that broke new ground in industry-sponsored educational methods.


