How Technology and a Thirst for Knowledge Are
Revolutionizing Access to Content

Copyright 1998-1999 by Stephen E. Arnold.

Stephen E. Arnold, President
Arnold Information Technologies (AIT)

Author's Note: This presentation was prepared at the request of the Special Libraries Association in Washington, D.C. The talk, which included screen shots of the products listed in the hyperlinks below, addressed information technology and its ability to open new opportunities for information professionals to further the effective use of knowledge resources within an organization (Intranet) or among selected customers, suppliers, and constituents (Extranet). The author participated in an open forum following the talk. The information presented on this Web page may be freely distributed within educational institutions and among library professionals. Any other use of the ideas, conclusions, and opinions in this document requires the written permission of the author. A CD-ROM containing this text plus the 56 slides that accompanied the talk is available from Arnold Information Technologies. The cost

What This Lecture Covers

I am delighted to be able to share with you some of my thoughts about the new software and information tools. I will focus these observations on the challenges and the opportunities these new tools represent for libraries, particularly special libraries and special librarians. Special libraries have an important role to play in many organizations because information access is the business of special libraries and their staff.

The Areas of Investigation

In the course of gathering information for this talk, I found the amount of available data overwhelming. To give some shape and form to my observations, I want to cover three subjects.

  • First, what are the market realities we must deal with? There are many misconceptions about new software, particularly Internet software, that should be placed in a factual context.
  • Second, I want to review with you a few of the most promising developments in interfaces and the functions to which they provide access. What will these interfaces ultimately be like? I think video games provide a good idea of our future.
  • Finally, I want to offer a few thoughts about where these new developments will take us.

Libraries Are Important

Let me emphasize that libraries are vitally important. The reasons are obvious but, like Poe’s purloined letter, often overlooked. First, people are the heart of libraries, not books and data. A library's principal service is to put people in contact with the books and the data. Libraries, therefore, are a nexus.
Second, paper remains an important medium. Paper must be managed and organized in meaningful ways: in books, in journals, and in other types of documents. Libraries, whatever they may be called, will be a catalyst in managing this content. Finally, the Internet is another medium, but it is an environmental medium that supports competition, innovation, mutation, and change, all moving at the speed of the network itself. Just as libraries manage physical audio discs, letters, art, and other media, libraries in some form will leave their imprint on digital collections.

A Big Question

Is there a future for libraries?

Yes, there is. As the Web expands as a tool and adds more services, libraries will continue to provide access to content. In the U.S., for example, it is popular to envision a world in which any terminal is a portal to a virtual library. However, this vision is only partially accurate. Where there is a lack of infrastructure, old-fashioned brick-and-mortar libraries will be the portal for online access for many people.
In developed countries, the situation is more complex, but libraries will play a vital role in housing certain types of content so that it is not deleted when a local Internet service provider goes out of business or when a computer crashes and no provision for archiving has been made.

Virtual Libraries But Not Everywhere

If we consider where virtual libraries will gain momentum, it is clear that several countries have the resources--that is, infrastructure, money, and people--to demand and create new types of libraries. These countries are among the most developed on earth: the U.S., Canada, the United Kingdom, France, Germany, and the Nordic countries.

Internet-Poor Countries

There is another group of countries--regions in some cases--where new types of libraries will be needed. These are areas where the impact of the Internet is beginning to make itself known: southeast Asia, Australia, New Zealand, Mexico, and Brazil.

The Signals

How can one determine where the new electronic model of libraries and the brick-and-mortar libraries will undergo fusion? The key is the number of online users as it relates to a country’s telephones per capita and its gross domestic product (GDP) per person. Where GDP is high and there are more than 0.55 telephones per person, the pressure for fusing traditional and digital libraries will be the greatest. Where these factors are not present, libraries will experience less “digital pressure,” although it will be present to some degree.
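The rule of thumb above can be written as a two-factor test. What follows is a minimal sketch. Only the 0.55 telephones-per-person threshold comes from the talk. The `high_gdp` cutoff is a hypothetical parameter the caller must supply, because the talk says only "a high GDP" and gives no figure.

```python
# Telephones per person above which the talk expects the greatest pressure.
PHONES_PER_PERSON_THRESHOLD = 0.55


def digital_pressure(gdp_per_person: float, phones_per_person: float,
                     high_gdp: float) -> str:
    """Return "greatest" when both signals are present, else "lesser".

    high_gdp is a hypothetical, caller-supplied cutoff: the talk gives
    no figure for what counts as a "high GDP".
    """
    if gdp_per_person >= high_gdp and phones_per_person > PHONES_PER_PERSON_THRESHOLD:
        return "greatest"
    # Some digital pressure remains even where the factors are absent.
    return "lesser"
```

For example, with an assumed cutoff of 20,000, `digital_pressure(30_000, 0.6, high_gdp=20_000)` returns `"greatest"`. Dropping the telephone figure to 0.5 returns `"lesser"`.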

Nice to Have, Must Have

The impact of the Internet in the developed and rapidly developing nations cannot be overstated. Where the Internet diffuses, a series of observable events follows: rapid change in product and service offerings, and an increased demand for software that performs certain routine functions automatically or in a way more like an appliance. After all, one does not set rheostats when turning on a digital television. The most powerful economic factor is that those with resources do not perceive technology as “nice to have”; technology is a “must have.” An imperative exists to have a PC, be connected, get an e-mail address, provide the children with a digital encyclopedia, and use other common digital services. In this environment, presentations can be text, of course. But once exposed to a game-like presentation of content, the user wants a rich media environment. Like it or not, the Web is becoming more of a mass medium, almost a TV and PC on steroids.

Enabling Technologies

What are the transforming technologies that are bringing this new world into the homes, offices, and automobiles of the highly developed countries?

First, wherever high-speed connections are available at a cost that the educated can afford, a pervasive network is emerging: access to an Internet connection and the Hypertext Transfer Protocol. This network is an evolution of core Internet access via a dial-up modem or a dedicated line.

But there is one important difference. The new networks are smart. Data are cached or stored in multiple places to speed retrieval. Applications and Web sites “remember” what a particular user did on a previous visit. The new environment handles images, sound, video, and text with almost equal facility. Large domains of content can be examined in lists, maps, or clusters, which makes it easier to locate the particular fact with the highest knowledge-value payoff to the user or customer.

The Dialog Corporation

One example of the new approach to online access can be seen in the Dialog Corporation’s products. The newest Dialog interface is still (at this time) under wraps, but the interface provided to the Muscat service gives an idea of how far traditional online services have come in the last two years. Instead of a naked command line, we have colors, user friendly suggested terms, and point and click ways to limit the query to a domain. Dialog's software offers powerful search and retrieval, but it also offers many specialized functions to respond to users' needs for access to internal and external data. For example, Dialog can index content inside a company in the same way it indexes content from third-party publishers. The customer can adjust the vocabulary and indexing to meet specific requirements. So, smart software in the form of tools that allow a customer to make Dialog work in a specific way is a major new direction for Dialog, a company that strongly resisted change under its former owners at Knight-Ridder.

Why Point-and-Click?

The reason is that people are confronted with many different ways to access information. So, when a person encounters a service, the question is, "How does a person narrow a search?" It is simply easier and more convenient to have some functions a mouse click away. Other functions can be accessed when needed. Even though a person may know a specific command, it is easier to say, "No more Boolean." The user looks at results and then narrows them by clicking on one or more terms to include in a filter of the retrieved set. Muscat can be found at
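Point-and-click narrowing amounts to filtering the retrieved set by the terms the user clicks. Here is a minimal sketch, assuming each record carries a set of index terms. The record ids, terms, and function names are hypothetical, and this is not Muscat's implementation.

```python
from collections import Counter


def suggested_terms(results: dict[str, set[str]], top: int = 5) -> list[str]:
    """Terms offered as click targets: the commonest index terms in the set."""
    counts = Counter(term for terms in results.values() for term in terms)
    return [term for term, _ in counts.most_common(top)]


def narrow(results: dict[str, set[str]], clicked: set[str]) -> list[str]:
    """Keep only records indexed with every clicked term.

    Each click adds an implicit AND, so the user never writes Boolean syntax.
    """
    return [record_id for record_id, terms in results.items() if clicked <= terms]


results = {
    "r1": {"steel", "tariff", "EU"},
    "r2": {"steel", "Japan"},
    "r3": {"steel", "tariff", "US"},
}
print(suggested_terms(results, top=2))    # ['steel', 'tariff']
print(narrow(results, {"tariff"}))        # ['r1', 'r3']
print(narrow(results, {"tariff", "US"}))  # ['r3']
```

Treating clicks as an AND is the simplest design choice. A real interface might also offer OR or exclusion clicks.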

DR-LINK: Combining Point and Click with Advanced Searching Features

The research and development team at Manning & Napier Advisors has taken a different approach to professional online search and retrieval. Because its customers are primarily in the defense and intelligence community, great precision is necessary. The company’s new interface provides access to specialized search features, such as limiting the query to a particular content domain or a specific time period, in a logical, clean presentation of the options for each query. The results are displayed in order of relevance. The key point is that virtually no training is required to access powerful search and retrieval features. This interface can be seen at


Second, search software is trying to present large numbers of hits in a more usable form. From a specialized software development company comes a high-speed clustering tool. A user can key one or more terms into a search window like this one available from Inference Corp., a firm specializing in customer support search-and-retrieval software.

Categories Built on the Fly

The results are presented under logical headings. What is important in the Inference Find technology is that the cluster headings are developed from the results retrieved by the query. The Find technology provides customer support systems with the tools needed to answer questions that users ask via telephone or electronic mail support services. The key advance is that the indexing and the assignment of classification codes are handled by software. The service can be seen at
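Building headings "on the fly" means the categories come from the retrieved documents rather than from a fixed classification scheme. The sketch below illustrates that idea by making the terms that appear in the most documents (excluding the query terms) into headings. This frequency rule is an assumed stand-in, not Inference's actual algorithm, and the document ids and terms are hypothetical.

```python
from collections import Counter


def build_clusters(results: dict[str, list[str]], query_terms: set[str],
                   max_headings: int = 3) -> dict[str, list[str]]:
    """Derive cluster headings from the retrieved set, not a fixed scheme.

    results maps a document id to its terms. The headings are the non-query
    terms that occur in the most documents; each document is filed under
    every heading it contains.
    """
    counts = Counter(
        term
        for terms in results.values()
        for term in dict.fromkeys(terms)  # count each term once per document
        if term not in query_terms
    )
    headings = [term for term, _ in counts.most_common(max_headings)]
    return {
        heading: [doc for doc, terms in results.items() if heading in terms]
        for heading in headings
    }
```

For a query on "printer" whose hits mention "driver" in three documents and "error" in two, `max_headings=2` yields the headings "driver" and "error", each listing its documents.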

Old Services with New Names

Once searching and clustering are made easier, the next logical use of the online tools is to automate certain queries, run them against each update, and then disseminate the information retrieved by the standing query to those with a need to know these data. Filtering is “new geek speak” for a standing query or SDI (selective dissemination of information). Routing is “new geek speak” for e-mailing the hits or records to those on the distribution list. See, those trained in library science need only learn a couple of new terms to fit comfortably into the world of 18-year-old Web wizards. The best-known routing technologies are the electronic mail delivery of results from the Selective Dissemination of Information functions on Dialog, the "old" BRS, ESA/ISA Quest, and other online services dating from the late 1970s. So the third technology development is software that lets the user have more control over what to do with results.
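Filtering and routing can be sketched in a few lines. A standing query is saved once and rerun against each update. Routing then addresses the new hits to everyone on the distribution list. The class and function names, the AND-of-terms matching rule, and the in-memory "outbox" (in place of real e-mail delivery) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StandingQuery:
    """An SDI profile: the query is saved and rerun on every update."""
    name: str
    terms: set[str]
    distribution_list: list[str]
    seen: set[str] = field(default_factory=set)  # record ids already delivered

    def __post_init__(self) -> None:
        self.terms = {t.lower() for t in self.terms}  # match case-insensitively

    def run(self, update: dict[str, str]) -> list[str]:
        """Filtering: return ids of new records that contain every term."""
        hits = []
        for record_id, text in update.items():
            words = set(re.findall(r"\w+", text.lower()))
            if record_id not in self.seen and self.terms <= words:
                hits.append(record_id)
                self.seen.add(record_id)
        return hits


def route(query: StandingQuery, hits: list[str]) -> dict[str, list[str]]:
    """Routing: address the batch of hits to everyone on the list.

    A production system would e-mail the records; this only builds an outbox.
    """
    return {recipient: list(hits) for recipient in query.distribution_list}
```

Because the query remembers which records it has already delivered, rerunning it on the same update routes nothing new. That is the behavior an SDI profile needs across overlapping updates.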

Retrieval Technologies' First Page

Retrieval Technologies, Inc. (RTI) takes in a wide range of business information from well-known publishers. The company uses any index terms or tags assigned to the articles by their publishers. In addition, RTI assigns its own terms and provides the user with an alphabetical listing of pre-structured clusters of information on a particular topic. Users can also create their own queries. Once a filter is either selected from the list or crafted by the user, the user can specify how many clips to receive and how often, plus many other options.

User Annotation: An RTI Power Tool

Any user of the RTI system can attach a digital note to an article. In this way, the RTI system allows a person to make a comment and share that comment with any other people in the distribution group.
What sets RTI apart is that the software can index and route documents created within an organization or work group. Without substantial effort, the RTI tools address the need to have internal and external content searchable, filterable, and routable from each worker’s desktop.

Intellisearch: A Similar Service

A variation on this service with slightly different types of content may be seen at

The New NewsEdge

The newly repositioned NewsEdge (a combination of Desktop Data and Individual, Inc.) has expanded its service offering to give the user more control over the filtering and information handling functions.


Why do routine tasks manually? A fourth technology development has been the use of agents, which are scripts that perform certain types of information processes automatically. The user intervenes only when necessary.

Autonomy is a company that has crafted a number of powerful software routines that perform different tasks. The Autonomy software can perform most of the functions mentioned above. What sets it apart, however, is the flexibility of the Autonomy toolkit and the software’s ability to handle a wide range of different types of content. Autonomy remains one of the leaders in developing software that can perform a wide range of intelligent functions when indexing, retrieving, and displaying text. Among the technologies “under the hood” of the Autonomy tools are neural network algorithms and a variety of sophisticated engineering techniques. The software and sample services are available at

Agent Software

Have you ever wondered how Amazon knows that you are interested in a particular type of book?

The reason is that Amazon and many other sites have used software that performs two distinct functions. The first is to assign each user a unique identification number so that the user’s actions can be captured in a database.

The second is to examine that set of actions and match it to content or interests that are likely to appeal to that person. Here is what happens. If I click on a computer book and order it, the Amazon software records my interest in computer books and notes that I bought a book about Python, a programming language. Then the next time I log into the system, the software shows me a list of new Python books. Firefly (now owned by Microsoft Corp.) and Net Perceptions are among the leaders in this technology. It may be seen at
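The two functions described above are identifying the user and matching recorded actions to likely interests. The sketch below shows both. It matches by the category of items a user has ordered, which is a simplifying assumption. Services such as Net Perceptions relied on collaborative filtering across many users, which this code does not attempt. The class name and catalog data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import count


class Recommender:
    """Identify each user, record their actions, and match them to new items.

    Category matching is an illustrative assumption, not a vendor's method.
    """

    def __init__(self, catalog: dict[str, str]):
        self.catalog = catalog                      # item title -> category
        self._ids = count(1)
        self.history: dict[int, list[str]] = defaultdict(list)

    def register(self) -> int:
        """Function 1: assign each user a unique identification number."""
        return next(self._ids)

    def record(self, user_id: int, item: str) -> None:
        """Capture an action (here, an order) in the user's history."""
        self.history[user_id].append(item)

    def recommend(self, user_id: int, new_items: list[str], k: int = 3) -> list[str]:
        """Function 2: suggest new items from the user's most-ordered categories."""
        interests = Counter(self.catalog[i] for i in self.history[user_id])
        owned = set(self.history[user_id])
        candidates = [i for i in new_items
                      if i not in owned and self.catalog[i] in interests]
        # Stronger demonstrated interest in a category ranks its items first.
        candidates.sort(key=lambda i: interests[self.catalog[i]], reverse=True)
        return candidates[:k]
```

In the talk's example, a user who orders one Python book would, on the next visit, be shown other new Python titles but not, say, a gardening book.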

Dmars: Agent Software to Watch

One of the most sophisticated agent systems has been developed in Australia at the Australian Artificial Intelligence Institute. When the system is set up, specific information about the beliefs, desires, and intentions of the system’s users is entered into a series of tables.

What Dmars Does

Then, when a user performs an action, Dmars (distributed multi-agent reasoning system) monitors it, makes decisions about what the user is seeking, and goes and locates additional pertinent information. The information the Dmars system finds is then displayed in a separate window on the user’s computer.
The key advance over systems like Net Perceptions’ is that Dmars performs multiple parallel agent functions in real time. It is like having several research assistants helping during one’s search and retrieval sessions. Annoying? Well, these services can be, but these are early days. Watch for agent technology to improve rapidly over the next 12 to 18 months.
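The idea of several agents working in parallel on one user action can be illustrated with a thread pool. This is a deliberately simplified sketch. It is not Dmars and does not implement belief-desire-intention reasoning. The `DESIRES` table only loosely echoes the setup tables described above, and the sources, documents, and matching rule are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical setup table: each user's stated goals ("desires").
DESIRES = {"analyst": {"satellite", "launch"}}

# Hypothetical sources; a real system would query live services.
SOURCES = {
    "news": ["Satellite launch delayed", "Bond markets rally"],
    "reports": ["Launch vehicle reliability study"],
    "patents": ["Satellite antenna design"],
}


def make_agent(source: str):
    """Build one agent that searches a single source."""
    def agent(user: str, action: str) -> list[str]:
        # Infer what the user seeks from stated goals plus the current action.
        wanted = DESIRES.get(user, set()) | set(action.lower().split())
        return [doc for doc in SOURCES[source] if wanted & set(doc.lower().split())]
    return agent


def on_user_action(user: str, action: str) -> dict[str, list[str]]:
    """Run every agent concurrently and collect results for a side window."""
    agents = {name: make_agent(name) for name in SOURCES}
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(a, user, action) for name, a in agents.items()}
        return {name: f.result() for name, f in futures.items()}
```

Each source gets its own agent, so a slow source does not hold up the others while they run. Results are still gathered together once all agents finish.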

Why Visualization?

The sheer volume of data makes figuring out what to look at very difficult. Not surprisingly, the old adage "A picture is worth a thousand words" is getting considerable attention. The human mind can process about 800 words a minute, maybe somewhat more, maybe less. Looking at a picture for a second or two, the mind can grab the digital equivalent of gigabytes of data in one chunk.

Visual displays of content are entering the mainstream. What does “visualization” do? The idea is simple, but it requires a robust computing environment to make it work. Anything that can be counted--for example, index terms, names of people, places, and dates--can be displayed in graphic form.
If one queries a domain of text, the result is a set of records. The index terms or “names” in these records can be processed using various algorithms and then displayed.
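The counting step behind such displays is easy to show. This minimal sketch counts the index terms across a result set and renders a text bar chart, a crude stand-in for the graphical displays described in the talk. The function name, chart width, and sample records are assumptions.

```python
from collections import Counter


def term_bars(records: list[list[str]], top: int = 5, width: int = 20) -> list[str]:
    """Count index terms in a result set and render them as text bars.

    Anything countable (names, places, dates) could be fed in the same way.
    """
    counts = Counter(term for terms in records for term in terms)
    ranked = counts.most_common(top)
    if not ranked:
        return []
    biggest = ranked[0][1]
    label_width = max(len(term) for term, _ in ranked)
    return [
        f"{term:<{label_width}} {'#' * max(1, round(width * n / biggest))} {n}"
        for term, n in ranked
    ]


for line in term_bars([["France", "Germany"], ["France"], ["France", "Italy"]]):
    print(line)
```

The most frequent term gets the full bar, and the others are scaled relative to it. This is the same "snapshot of the content" a graphical map provides, in its simplest form.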

One example of what I call the French school of visualization is Semio map. It is at
http://www. Other French innovators like Datops S.A. are moving along similar technical trajectories.
The result is a map that one can use to get a snapshot of the content in the data set or domain. Clicking on a link displays more links or a double click displays a list of hits.


A unit of Manning & Napier, the same group that developed DR-LINK, has created visualization tools as well. These tools can be integrated into different applications. One way to use the tools is to look at content in a series of bar charts. Each quadrant shows the most cited people, places, or things in the set. Quadrants can be combined. With a couple of clicks, the user can locate information about a specific event and see the countries, people, and related activities in a graphical display. A click on any bar shows the hits related to that subject. A demonstration and further information are available at

Under the Hood

These attractive displays are computationally intensive. A series of processes takes place behind the scenes. Most search-and-retrieval companies guard their methodologies closely. Manning & Napier Advisors has granted AIT permission to provide this high-level, somewhat abstract view of the various steps or processing modules that the system implements.

These range from loading content to chopping it into logical segments, inserting tags, and finally building searchable indices. With the rise in computational power and the decrease in the cost of powerful machines, separate processes can be welded into a system that yields what Peer Boerner, a vice president at Manning & Napier Information Services, calls “intelligent documents.”
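The stages named above can be welded into one small pipeline: loading, segmenting, tagging, and indexing. In this minimal sketch, paragraphs serve as the "logical segments" and capitalized words stand in for a real entity tagger. Both are assumptions, and this is not Manning & Napier's actual processing.

```python
import re
from collections import defaultdict


def segment(doc: str) -> list[str]:
    """Chop a loaded document into logical segments (here, paragraphs)."""
    return [p.strip() for p in doc.split("\n\n") if p.strip()]


def tag(segment_text: str) -> dict:
    """Insert tags; capitalized words are a crude stand-in for entity tagging."""
    return {
        "text": segment_text,
        "entities": sorted(set(re.findall(r"\b[A-Z][a-z]+\b", segment_text))),
    }


def build_index(docs: dict[str, str]) -> dict[str, set[tuple[str, int]]]:
    """Build an inverted index from word (or tagged entity) to locations.

    docs holds already-loaded content keyed by document id; each entry in
    the index points to (document id, segment number).
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for n, seg in enumerate(segment(doc)):
            tagged = tag(seg)
            for word in re.findall(r"\w+", tagged["text"].lower()):
                index[word].add((doc_id, n))
            for entity in tagged["entities"]:
                index["entity:" + entity].add((doc_id, n))
    return dict(index)
```

Because tags are indexed alongside plain words, a later query can ask for segments that mention a tagged place, not merely the word itself. That is a small step toward the "intelligent documents" idea.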

KnowledgeX (IBM)

IBM has been a leader in this field of processing text and building comprehensive systems. To give you an idea of what is coming for users of Lotus Notes, envision a large domain of content in a Notes database displayed with links showing how the content fits together.

Link Analysis

Each link shows a relationship among data points. A click on a node reveals the document or documents that contain the information pertinent to the link relationship. These tools may be accessed from
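One simple way to build such links is co-occurrence: two data points are linked when they appear in the same document. Each link keeps the documents that support it, so clicking a link or node can reveal them. The sketch assumes this co-occurrence rule and uses hypothetical document ids and names. The vendors' own link-building methods are not described in the talk.

```python
from collections import defaultdict
from itertools import combinations


def build_links(docs: dict[str, set[str]]) -> dict[frozenset, set[str]]:
    """Link two data points whenever they appear in the same document.

    docs maps a document id to the names (people, places, things) it
    mentions; each link remembers the documents that support it.
    """
    links = defaultdict(set)
    for doc_id, names in docs.items():
        for a, b in combinations(sorted(names), 2):
            links[frozenset((a, b))].add(doc_id)
    return dict(links)


def documents_for_node(links: dict[frozenset, set[str]], node: str) -> set[str]:
    """Clicking a node: every document behind any link touching that node."""
    return set().union(*(ids for pair, ids in links.items() if node in pair))
```

For instance, if "Smith" and "Acme" appear together in two memos, their link carries both memo ids. Clicking "Smith" would then surface those memos.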

The Memex Approach

Memex, a Scottish company, has carried link analysis a step further. Data are loaded into the Memex system and converted to a binary format. When a query is passed to the data, the links or relationships are displayed in a visual tree. Clicking on a node displays the data. The Memex tool builds its clusters automatically. A demonstration and explanation of the automated link analysis tools are available from

i2: Multimedia Clustering and Linking

What does one do when the data to be analyzed consist of non-text objects? In the course of a research project, video, audio, numeric, and hybrid data types are gathered. A company in Cambridge, England, has developed a tool that allows these objects to be added to a data environment. When a query is passed to the system, the user can see links that the system or the user has defined among the data objects. When a video object is clicked, the system runs that video. These media-rich search-and-retrieval environments give the user a degree of freedom in content analysis. The i2 tools require that the user construct some links manually, because software has no way of knowing what the content of a video or a sound recording is.

Under the Hood

Behind the scenes, the i2 data environment stores all of the information either in its native file format, such as Word, or in standard database tables. As a result, the i2 tools operate on existing data, reducing the need to convert source data to the format that a particular system may require. The i2 demonstration is available at

Hyperbolic Maps for Navigation

Figuring out the content of a large Web site can be a daunting task. One of the most frequently visited sites on the Internet is Netscape. To get a fresh approach to this popular service, Inxight Software has applied its hyperbolic map technology to the site. Instead of looking at the graphical pages of Netcenter, one can go to and access the site using a hyperbolic map. Each node on the map represents a topic or content area. A click on the area leads one to the details of that content cluster.

Next Generation Interfaces

Another remarkable approach to search and retrieval is the multi-modal interface from Plumb Design. The left-hand panel contains a real-time depiction of the index terms relevant to the content space that one is exploring. There is a keyword area so that a term or concept may be entered. A series of icons provides access to other collections or domains of text. The “hit” is displayed in a separate panel on the screen. If a video or sound object is returned, a click on the appropriate icon displays the content.

Flying with Java

The most interesting aspect of this interface is the Java-based flying index. The term or concept that is the focus of the query appears at the center of the terms. Different colors indicate relevance and other attributes of the links associated with the core concept. A click on a node changes the display of content in the viewer panel.

A Way to Explore Content and Know What's New in Real Time

As a result, the user can explore a visualized content space, key new terms, or click on interesting objects. Select a demonstration program. Remember that demonstration software using Library of Congress or Smithsonian data is a tip-off that the company is doing development work for government agencies whose work may be classified.

Innovation Drivers

Let’s take a moment to recap these new developments. They are the result of fast machines and relatively low-cost memory and storage. Many operating systems now support applications that once were possible only in highly specialized computing environments. The shift to computers with multiple processors is well underway. The result is that today’s programs are not one program but many working together.

Think of a layer cake. Each layer consists of different processes and activities. All of the cake’s layers are active at the same time. Finally, the shift to this rich computing environment has entered the mainstream. The standard expectation for information is real-time news. What this means is that information must flow constantly and be refreshed as events or activities take place. The shift to real time is merely the tacit recognition that the old world of computing has been left far, far behind.

Another Challenge

There is considerable excitement about online access to full text. Based on analyses we conducted while researching one of our studies, text accounts for only about five percent of the information content created in the U.S. The majority of the content--as measured in digital and analog form--is broadcast audio and video. Most of these data are not available to online searchers.

How can one keep track of video clips, songs, and recordings of speeches? Not surprisingly, new software tools are finding their way to market to allow small businesses and researchers to manage these types of multimedia information.

Consider the Magnifi product line. Built on IBM Almaden’s image technology, the Magnifi software allows indexing, searching, and retrieving of text and non-text data. To see this product one can visit CBS Sportsline or go directly to the Magnifi site at

Automatic Indexing and Retrieval of Images

Excalibur Technologies, an imaging company located in Virginia, is best known for its scanning and search and retrieval software. However, the company has ventured into the automatic recognition and indexing of images. The idea is that a large number of images can be digitized and stored in an archive. A user can then find an image and tell the system to locate similar images. Alternatively, the user can specify a particular color, shape, or feature, and the system will locate similar images. This product can be seen at A typical query’s results are displayed in thumbnail form. A click on a thumbnail displays the image and any textual metadata for that image.
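"Find images like this one" can be illustrated with one classic technique: color histograms compared by histogram intersection. This is a swapped-in textbook method, not Excalibur's technology, and it handles color only, not shape or texture. Images are assumed to be plain lists of RGB pixels, and the archive names are hypothetical.

```python
from collections import Counter

Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255


def color_histogram(pixels: list[Pixel], levels: int = 4) -> dict[Pixel, float]:
    """Quantize each channel into `levels` bins and normalize the counts.

    Assumes a non-empty pixel list.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def similarity(h1: dict[Pixel, float], h2: dict[Pixel, float]) -> float:
    """Histogram intersection: 1.0 for identical color mixes, 0.0 for disjoint."""
    return sum(min(h1[b], h2.get(b, 0.0)) for b in h1)


def find_similar(query: list[Pixel], archive: dict[str, list[Pixel]], k: int = 3) -> list[str]:
    """Rank archived images by color similarity to the query image."""
    q = color_histogram(query)
    scores = {name: similarity(q, color_histogram(px)) for name, px in archive.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Coarse bins make the match tolerant of small shade differences, so two slightly different reds still count as similar. The cost is that finer distinctions are lost.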

What’s next in software for searching and retrieving? With reasonable certainty, we can say that software is getting smarter. Furthermore, the emphasis will shift away from information from unknown sources, like most of the information on the public Internet, and toward content that is available from colleagues. In other words, the emphasis will be on content from Intranet and Extranet environments.
We also know that the “Internet” world is broken into two distinct user segments. There is the popular consumer Internet, and there is the professional Internet. Most professionals will be seeking branded content or information with a clear provenance.
It is also clear to anyone who has dabbled in professional Internet, Intranet, or Extranet applications that size matters. “Size” translates roughly to resources, staff, and reach: in short, the mass necessary to build an application, deploy it, support it, and enhance it. The applications of this emerging Internet world are complex constructs that require professional design, management, and support.

Some Everyday Realities

Before closing, it may be useful to examine the organizational realities into which these advanced tools will be integrated.

There are separate islands of technology and software in most organizations. It is unlikely that this situation will be changed in the next 12 to 24 months. Budget considerations and user behavior translate to slow change in many organizations.

There are four separate streams of content. These are financial information systems, electronic filing systems or library systems, sales and marketing information systems, and paper-based “legacy” systems.

Tools built on technologies like those described above will enable managers to retain existing systems, yet take advantage of the browser-based systems that are now emerging as a standard in many offices and homes. When software is used to interface with legacy systems without changing them in a substantial way, that software can be thought of as “wrapper software.” The result is that a person can connect to legacy data without having to know about the legacy systems beneath the wrapper.
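Wrapper software is essentially the adapter pattern: a thin layer translates a legacy system's native format into a modern interface and leaves the legacy system untouched. In this minimal sketch, the fixed-width inventory records, field positions, and class names are all hypothetical.

```python
from typing import Optional


class LegacyInventory:
    """Stand-in for a legacy system with its own fixed-width record format."""

    def __init__(self):
        # Hypothetical layout: 4-char SKU, 10-char name, 4-digit quantity.
        self._records = [f"A100{'WIDGET':<10}0042", f"B200{'GADGET':<10}0007"]

    def dump(self) -> list[str]:
        return list(self._records)


class InventoryWrapper:
    """Presents legacy data in a modern shape without changing the legacy
    system; callers never see the fixed-width format underneath."""

    def __init__(self, legacy: LegacyInventory):
        self._legacy = legacy

    def items(self) -> list[dict]:
        return [
            {"sku": rec[0:4], "name": rec[4:14].strip(), "quantity": int(rec[14:18])}
            for rec in self._legacy.dump()
        ]

    def lookup(self, sku: str) -> Optional[dict]:
        return next((item for item in self.items() if item["sku"] == sku), None)
```

A browser-based front end would call `InventoryWrapper.lookup("B200")` and receive `{"sku": "B200", "name": "GADGET", "quantity": 7}` without knowing anything about the record layout.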

Wrapper Software for Information Management

Most wrapper systems are costly. Plumtree has created a Windows NT application that allows a work group to build shared information systems. The shared content objects can be viewed from Windows.

The Plumtree corporate portal system costs about $20,000 and allows anyone to make use of powerful “content management” tools. The software has many options for sharing information, annotating documents, and generating reports for the users. Each of the content domains can be updated by a user. Subsets of the information in the Plumtree system can be created. Documents can be routed to others in the group. Different views of the content stored in the Plumtree system can be obtained with a mouse click.

Hewlett-Packard's New Enterprise Information Integration Software

Another powerful tool is available from Hewlett-Packard. Changengine allows an organization to integrate information from many different sources from a single console. Unlike Plumtree, Changengine operates on an organization-wide basis. The units integrated can be in the same city or halfway around the world.

Broadvision: E-Commerce to Information Management

One of the most exciting companies in this dynamic area of integrated information management is Broadvision. The Broadvision tools can integrate information within an organization and also from sources outside the organization, sometimes called Extranet sources.

What Is the Decision Process?

Whether one looks for a departmental tool (Plumtree), an organizational tool (Changengine), or a distributed construct of many entities (Broadvision), four key questions must be answered. There is no specific order, but each must be given due consideration:

  • What does our organization mean by “knowledge”?
  • What are our sources? Which do we value?
  • Who will have access to what information?
  • What technology do we need to accomplish our aims?

Regardless of the enthusiasm many organizations have for new technology, the traditional ways still play an important role. There will be a need for voice and paper communication. Even hand delivery systems will play an important role for the foreseeable future.

It is equally important to recognize that while most organizations will want to embrace new technology, the principal inhibitor is not the fear of technology; it is the cost of the technology. What this means is that these new tools will diffuse through developed economies as the technology becomes more affordable.

A Suggested Methodology

The methodology for getting started now without moving ahead in a rash manner is to follow a five-step program:

  • First, do a review of one’s current situation, including people, work flow, technology, and interactions of these elements.
  • Second, dig into the infrastructure inside the organization or the external requirements for a service.
  • Third, develop a specification, tactical plan, and budget.
  • Fourth, undertake a pilot or demonstration project.
  • Finally, modify the concept based on the pilot and deploy the system, product, or application.

Points to Keep in Mind

Throughout this series of processes, the developer or team should keep several points in mind:

  • First, the system must do something that people used to do but in a manner that is now more enjoyable and more efficient.
  • Second, the system should allow the user to do the work in a personalized manner.
  • Finally, dull or complex tasks must be eliminated, simplified, or redefined.

Questions to Ask to Benchmark Progress

Throughout the development or innovation process, several key questions must be asked and answered multiple times:

  • What problem do we have to solve?
  • Does this fit with our culture?
  • Do we have the systems staff and infrastructure?
  • Do we have realistic expectations?
  • Have we visited sites where the system is in use?

As we come to the end of this review of tools and the new integrating applications, some broad developments can be identified.

As computing power increases, we have seen that intelligent applications become more common. Intelligence is becoming a basic function of software and systems. And people demand visual, integrated systems in today's data-rich environment. It is my belief, and that of my colleagues at AIT, that librarians are among the best equipped to help answer information questions.


There are tremendous opportunities for information professionals. There will be continued great demands on personal learning and growth. We are likely to find ourselves working through new thickets of regulation. There will be continued security concerns. To address these challenges, we need to embrace an experimental, even entrepreneurial, spirit.

Stephen E. Arnold
President, Arnold Information Technology (AIT)

