Rough Sets, Ants, and Mereology:

A New Approach to Knowledge Management

Submitted August 14, 2001, Information World Review

In Charlotte, North Carolina, a modern office complex adjacent to the university serves as the headquarters of a two-year-old company, NuTech Solutions.

NuTech's founder and Chief Scientist is Dr. Zbigniew Michalewicz, the former head of the University of North Carolina's mathematics department. Almost single-handedly, he has kept alive a mathematical tradition that reaches back more than 100 years to his hometown of Warsaw, Poland.

In the early 20th century, the Warsaw "school of mathematics" was one of the focal points of theoretical mathematics. Its faculty and students struggled to reconcile the formalism of classical logic with the kinds of iterative and self-referential algorithms that later gave rise to neural networks, artificial intelligence, and heuristic computational systems.

Over the past 15 years, Dr. Michalewicz has sought out many of his fellow students from Warsaw and has drawn hundreds of the world's sharpest mathematical minds into applying insights first hammered out by Stanisław Leśniewski (1916) and extended in the last decade by Michalewicz himself and by Z. Pawlak, L. Polkowski, and A. Skowron.

In an interview at the company's Charlotte offices, I asked Dr. Michalewicz how the Warsaw school of mathematics found its way to North Carolina.

Dr. Michalewicz said, "Myself and many others had to leave Warsaw because of World War II. I was teaching in New Zealand and happened to know someone at the University of North Carolina. I applied for a job, eventually became the head of the mathematics department, and started NuTech. At each step of the way, I kept trying to pull together the remarkable minds from the Warsaw school. I had to do this because the work we were doing then applies so directly to the many difficult business and engineering problems people face today."

NuTech Solutions was started to put Dr. Michalewicz's insights to commercial use. Although he is the author of numerous books on advanced mathematics and a speaker at major mathematical conferences worldwide, Dr. Michalewicz is quick to say, "Look at my colleagues. Look at the young people at NuTech Solutions. They are the reason that the mathematical field known as Mereology is so dynamic today. Not me."

Ever humble, Dr. Michalewicz has assembled more than 150 professionals in mathematics, computer science, and allied disciplines. The staff works from offices in three U.S. cities; in Dortmund, Germany; and, not surprisingly, in Warsaw, Poland. The buzz of multiple languages fills the NuTech offices. Whiteboards are covered with mathematical symbols squeezed between timelines for projects at Siemens, Bank of America, Ford Motor Co., the Polish energy authority, and dozens of other blue-chip firms.

Among the staff are Dr. Michalewicz's brother (also an expert in Mereology), a son (a mathematician who is quick to say, "I'm not as smart as my uncle"), and a couple of cousins. The others in the company come from a rich variety of countries, including China, various Eastern European states, and North America.

The family's interest in mathematics was a "happy accident," said Dr. Michalewicz. "We did puzzles when we were young. It had a big effect on us, I guess. In fact, we have to watch ourselves in meetings, because sometimes someone will have a new maths problem, and we want to solve it, of course."

"But I am lucky," he adds. "These bright people from all over the world find me. Mereology is going to have a profound impact on data mining, statistical analysis, intelligent devices, and knowledge extraction and access. I think it is one of the more important advances in mathematics in the last decade."

Mereology is the "technology" inside Dr. Michalewicz's pioneering work in what is now known as "rough sets." The idea grows out of work done by the Warsaw school in the early 20th century, the same work that had a profound impact on Einstein, Bohr, and others.

Mereology is the mathematical study of part-whole relationships. "In order to solve very tough computational problems, mathematicians need shortcuts. But when taking a shortcut, one has to have some way of checking to make sure the direction is right. It is no good following a new path if one falls off a cliff," said Dr. Michalewicz.
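Mereology's part-whole vocabulary corresponds closely to the central rough set idea. A set that cannot be described exactly with the available attributes is bracketed by a lower approximation and an upper approximation. The lower approximation contains the groups of indistinguishable items that lie wholly inside the set. The upper approximation contains the groups that overlap it. The sketch below is a textbook version of Pawlak's lower and upper approximations, not NuTech's patented algorithm. The toy documents, the attribute names topic and length, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: five documents described by two attributes.
docs = {
    "d1": {"topic": "finance", "length": "short"},
    "d2": {"topic": "finance", "length": "short"},
    "d3": {"topic": "energy", "length": "long"},
    "d4": {"topic": "energy", "length": "long"},
    "d5": {"topic": "finance", "length": "long"},
}

def indiscernibility_classes(objects, attributes):
    """Group objects that share identical values on the chosen attributes."""
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())

def approximations(objects, attributes, target):
    """Return the (lower, upper) rough approximations of the target set."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(objects, attributes):
        if block <= target:   # block is wholly a part of the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

relevant = {"d1", "d3", "d4"}  # hypothetical target set, e.g. "relevant hits"
lower, upper = approximations(docs, ["topic", "length"], relevant)
print(sorted(lower))          # ['d3', 'd4']
print(sorted(upper))          # ['d1', 'd2', 'd3', 'd4']
print(sorted(upper - lower))  # boundary region: ['d1', 'd2']
```

Documents d1 and d2 look identical on the chosen attributes, yet only d1 belongs to the target. Both therefore fall into the boundary region between the two approximations, and that boundary is what makes the set "rough."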

The concept of Mereology has given Dr. Michalewicz's team the opportunity to devise new approaches to common, very difficult problems. These include optimizing inventory (a variation of the computationally intractable Traveling Salesman problem), distributing nuclear fuel in a reactor's core so that the reactor operates safely while the useful life of the fuel load is extended, identifying potential instances of financial fraud before the fraud takes place, and similar "interesting problems." The company's work has generated four patents, with another half dozen making their way through the European and American patent processes.

Dr. Michalewicz demonstrated the company's soon-to-be-released search product for me, a search-and-retrieval system built on NuTech's own technology.

Although there are more than 1,000 search-and-retrieval systems on offer today, NuTech's approach is different because it uses the mathematics of Mereology to deliver what Dr. Michalewicz calls "adaptive intelligence." The second distinguishing feature is the speed with which the system can parse, index, and cluster results on the firm's Windows 2000-based architecture. "Yes, computational velocity is one of the side benefits of the rough set technology. The mathematics are complex, but the user doesn't need to know anything about math, just what he or she wants to know. We try to make things simple and efficient. As my students and colleagues will tell you, I want to figure out what the best outcome is and then help people get there. In fact, the key to Mereology and rough sets is not how advanced the technique is. It is ease of use and, of course, good results."

The company's search solution is an intelligent search and knowledge extraction system. The interface provides a query box, and hits are dynamically clustered into logical groups. Each hit includes an on-the-fly abstract of the document or other content object, and the five best hits are displayed under each cluster. (See illustration.) The system also gives the user point-and-click access to advanced search functions, for example narrowing the result set with "more like this" searching.

Data are gathered by a series of spiders that use artificial intelligence techniques to adjust on the fly to network latency problems and to incomplete documents.

The spidered data are processed by the NuTech parsing and indexing system. Data are analyzed and clustered using NuTech's patented rough set algorithms. The user interface exposes all of its functions directly, so there are no hidden features that the user must learn or stumble upon by accident.

The search system uses computational intelligence filtering techniques to organize the documents into groups or clusters. The user can exclude clusters from the document set with a single click.

The system adds standard tags to documents; for example, address, title, date, and abstract. More importantly, the system allows customers to add their own tags as dictated by their internal document sets.

The system generates a taxonomy based on a set of documents. Unlike some of the taxonomy systems offered by other companies, NuTech's does not require that the customer have a controlled vocabulary or list of terms. If the customer has such a list, the NuTech engine can use this as part of its knowledge resource.

The search engine can spider, index, and cluster documents in Lotus Notes, Portable Document Format, HTML, XML, and other popular formats, including Word and PowerPoint files.

Rough set technology allows NuTech's engine to adapt as new information is spidered and indexed, and reclustering takes place in real time. This is possible because the mathematics of rough sets reduces the computational load by as much as 90 percent. Linguistic and intelligent clustering have traditionally been computationally intensive and expensive. NuTech delivers these features in a code package that runs on a dual-processor Windows server with 512 megabytes of RAM.
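The article does not explain where NuTech's savings come from. One standard rough set technique that reduces work is attribute reduction. It searches for a "reduct," a smaller set of attributes that separates the decision classes exactly as well as the full attribute set does, so that later processing can ignore the remaining attributes. The sketch below is a brute-force textbook illustration, not NuTech's algorithm, and it supports no claim about the 90 percent figure. The document table, the attribute names, and the functions are hypothetical. Finding a minimal reduct is NP-hard in general, so practical systems rely on heuristics rather than this exhaustive search.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical decision table: each row is a document, "cluster" is the decision.
rows = [
    {"source": "web",   "has_date": "yes", "words": "many", "cluster": "news"},
    {"source": "web",   "has_date": "no",  "words": "few",  "cluster": "pages"},
    {"source": "notes", "has_date": "yes", "words": "few",  "cluster": "news"},
    {"source": "notes", "has_date": "no",  "words": "many", "cluster": "pages"},
]
ATTRS = ["source", "has_date", "words"]

def partition(table, attrs):
    """Indiscernibility classes: row indices grouped by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def positive_region(table, attrs, decision):
    """Rows whose indiscernibility class maps to exactly one decision value."""
    pos = set()
    for block in partition(table, attrs):
        if len({table[i][decision] for i in block}) == 1:
            pos |= block
    return pos

def minimal_reducts(table, attrs, decision):
    """Smallest attribute subsets that keep the full positive region.
    Exhaustive search, exponential in len(attrs): toy use only."""
    full = positive_region(table, attrs, decision)
    for size in range(1, len(attrs) + 1):
        found = [list(c) for c in combinations(attrs, size)
                 if positive_region(table, list(c), decision) == full]
        if found:
            return found
    return [list(attrs)]

reducts = minimal_reducts(rows, ATTRS, "cluster")
print(reducts)  # [['has_date']]
```

In this toy table, the single attribute has_date discriminates the two clusters as well as all three attributes together do. A clustering step could then work with one attribute instead of three.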

Experimenting with the company's business information test data set, built for the Charlotte, North Carolina, Chamber of Commerce, I found that the clusters of documents were more useful than the folders generated by Northern Light. NuTech's clusters display hits in clearly named groups, and each group displays the top five hits in that cluster for the user's query. Unlike other taxonomy-based systems, NuTech's approach requires no predefined word list or human intervention. "These can be used, if they are available," said Dr. Michalewicz. "But in my experience, most organizations don't have these types of lists or ontologies available. So we built a system that the customer can turn on and use."

NuTech's search-and-retrieval system is available for Intranet and Internet applications. The application of Mereology to search-and-retrieval breaks new ground for NuTech.

Mr. Arnold is an independent consultant working from Harrod's Creek, Kentucky, USA, with two boxer dogs and 13 servers. His forthcoming book "Umbrellas, Lift, and Traction: The New Trajectory of the Internet" will be published in May 2001 by Infonortics, Ltd. in Tetbury, Glos.

