How Technology and a Thirst for Knowledge Are
Revolutionizing Access to Content
Copyright 1998-1999 by Stephen E. Arnold.
Stephen E. Arnold, President
Arnold Information Technologies (AIT)
Author's Note: This presentation was prepared at the
request of the Special Libraries
Association in Washington, D.C. The talk, which included
screen shots of the products listed in the hyperlinks below,
addressed information technology and its ability to open new
opportunities for information professionals to contribute to
furthering more effective use of knowledge resources within an
organization (Intranet) or to selected customers, suppliers,
and constituents (Extranet). The author participated in an open
forum following the talk. The information presented on this Web
page may be freely distributed within educational institutions
and among library professionals. Any other use of the ideas,
conclusions, and opinions in this document requires written permission
of the author. A CD-ROM containing this text plus the 56 slides
that accompanied the talk is available from Arnold Information
Technologies. The cost
for a single copy in PowerPoint 97 is US$20 within the United
States, Canada, and Mexico; US$30 elsewhere. Contact: AIT at
sa@arnoldit.com.
What This Lecture Covers
I am delighted to be able to share with you some of my thoughts
about the new software and information tools. I will focus these
observations on the challenges and the opportunities these new
tools represent for libraries, particularly special libraries
and special librarians. Special libraries have an important role
to play in many organizations because information access is the
business of special libraries and their staff.
The Areas of Investigation
In the course of gathering information for this subject, I found
the amount of data available overwhelming. In order to give
some shape and form to my observations, I want to cover three
subjects.
- First, what are the market realities we must deal with? There
are many misconceptions about new software, particularly Internet
software, that should be placed in a factual context.
- Second, I want to review with you a few of the most prescient
developments in interfaces and the functions to which they provide
access. What will these interfaces ultimately be like? I think
video games provide a good idea of our future.
- Finally, I want to offer a few thoughts about where we will
be going or where these new developments will take us.
Libraries Are Important
Let me emphasize that libraries are vitally important. The
reasons are obvious, but like Poe's purloined letter, often
overlooked. First, people are the heart of libraries, not books and
data. Libraries' principal service is to get people into
contact with the books and the data. Libraries, therefore, are
a nexus.
Second, paper remains an important medium. Paper must be managed
and organized in meaningful ways, in books, in journals, in other
types of documents. Libraries, whatever they may be called, will
be a catalyst in managing this content. Finally, the Internet
is another medium, but it is an environmental medium that supports
competition, innovation, mutation, and change--all moving at
the speed of the network itself. Just as libraries manage physical
audio discs, letters, art, and other media, libraries, in some
form, will leave their imprint on digital collections.
A Big Question
Is there a future for libraries?
Yes, there is. As the Web expands as a tool and offers more services,
libraries will provide access to content. In the U.S., for example,
it is popular to envision a world in which any terminal is a
portal to a virtual library. However, this vision is only partially
accurate. Where there is a lack of infrastructure, old-fashioned,
brick-and-mortar libraries will be the portal for online access
for many people.
In developed countries, the situation is more complex, but libraries
will play a vital role in housing certain types of content so
that it is not deleted when a local Internet service provider
goes out of business or when a computer crashes and no provision
for archiving has been made.
Virtual Libraries But Not Everywhere
If we consider where virtual libraries will gain momentum,
it is clear that several countries have the resources--that is,
infrastructure, money, and people--to demand and create new types
of libraries. These countries are among the most developed on
earth: the U.S., Canada, the United Kingdom, France, Germany,
and the Nordic countries.
Internet-Poor Countries
There is another group of countries--regions in some cases--where
new types of libraries will be needed. These are areas where
the impact of the Internet is beginning to make itself known:
Southeast Asia, Australia, New Zealand, Mexico, and Brazil.
The Signals
How can one determine where the new electronic model of libraries
and the brick-and-mortar libraries will undergo fusion? The key
is the number of online users as it relates to a country's telephones
per capita and the gross domestic product per person in that
country. Where there is a high GDP and more than 0.55 telephones
per person, the pressure for fusing traditional
and digital libraries will be the greatest. Where these factors
are not present, libraries will experience less digital
pressure, although it will be present to some degree.
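The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the 0.55 telephones-per-person figure comes from the talk, but the GDP cutoff, the class name, and the labels are assumptions, since the talk does not say what counts as a "high" GDP.

```python
from dataclasses import dataclass

# Threshold stated in the talk: more than 0.55 telephones per person.
PHONES_PER_PERSON_THRESHOLD = 0.55
# Hypothetical cutoff for a "high" GDP per person; the talk gives no figure.
HIGH_GDP_PER_PERSON_USD = 20_000


@dataclass
class Country:
    name: str
    gdp_per_person_usd: float
    phones_per_person: float


def fusion_pressure(country: Country) -> str:
    """Label the pressure to fuse traditional and digital libraries."""
    high_gdp = country.gdp_per_person_usd >= HIGH_GDP_PER_PERSON_USD
    many_phones = country.phones_per_person > PHONES_PER_PERSON_THRESHOLD
    # Both signals must be present for the pressure to be greatest.
    return "greatest" if high_gdp and many_phones else "lower"
```

A country record with made-up figures, such as `Country("Example", 30_000, 0.6)`, would be labeled "greatest"; any country missing either signal falls into the "lower" group.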
Nice to Have, Must Have
The impact of the Internet in the developed and rapidly developing
nations should not be underestimated. Where the Internet diffuses,
there is a concomitant series of observable events; namely, rapid
change in product and service offerings and an increased demand
for software that does certain routine functions automatically
or in a way more like an appliance. After all, one does not set
rheostats when turning on a digital television. The most powerful
economic factor is that technology is not perceived by those
with resources as "nice to have"; technology is a "must
have." An imperative exists to have a PC, be connected,
get an e-mail address, provide the children with a digital encyclopedia,
and use other common digital services. In this environment, the presentations
can be text, of course. But once one is exposed to a game-like
presentation of content, the user wants a rich media environment.
Like it or not, the Web is becoming more a mass medium, almost
a TV and PC on steroids.
Enabling Technologies
What are the transforming technologies that are bringing this
new world into the homes and offices and automobiles of the highly
developed countries?
First, wherever there are high-speed connections at a cost
that the educated can afford, a pervasive network emerges--that
is, access to an Internet connection and the
Hypertext Transfer Protocol--as an evolution of core Internet
access via a dial-up modem or a dedicated line.
But there is one important difference. The new networks are
smart. Data are cached or stored in multiple places to speed
retrieval. Applications and Web sites "remember" what
a particular user did on a previous visit. The new environment
deals with images, sound, video, and text with almost equal facility.
Large domains of content can be examined in lists, maps, or clusters
to make it easier to locate the particular fact with the highest
knowledge value payoff to the user or customer.
The Dialog Corporation
One example of the new approach to online access can be seen
in the Dialog Corporation's products. The newest Dialog
interface is still (at this time) under wraps, but the interface
provided to the Muscat service gives an idea of how far traditional
online services have come in the last two years. Instead of a
naked command line, we have colors, user-friendly suggested terms,
and point-and-click ways to limit the query to a domain. Dialog's
software offers powerful search and retrieval, but it also offers
many specialized functions to respond to users' needs for access
to internal and external data. For example, Dialog can index
content inside a company in the same way it indexes content from
third-party publishers. The customer can adjust the vocabulary
and indexing to meet specific requirements. So, smart software
in the form of tools that allow a customer to make Dialog work
in a specific way is a major new direction for Dialog, a company
that strongly resisted change under its former owners at Knight-Ridder.
Why Point-and-Click?
The reason is that people are confronted with many different
ways to access information. So, when a service is encountered,
the question is, "How does a person narrow a search?"
It is simply easier and more convenient to have some function
a mouse click away. Other functions can be accessed when needed.
Even though a person may know a specific command, it is easier
to say, "No more Boolean." The user looks at results
and then narrows them by clicking on one or more terms to include
in a filter of the retrieved set. Muscat can be found at http://www.muscat.co.uk.
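A minimal sketch of this click-to-narrow idea follows. It is not Muscat's implementation; the record layout and the `narrow` function are hypothetical, and each click is simply treated as one more required index term.

```python
def narrow(results, clicked_terms):
    """Keep only records whose index terms include every clicked term."""
    wanted = {term.lower() for term in clicked_terms}
    return [
        record for record in results
        if wanted <= {term.lower() for term in record["terms"]}
    ]


# Hypothetical retrieved set; each record carries its index terms.
retrieved = [
    {"title": "Library automation survey", "terms": ["libraries", "software"]},
    {"title": "Intranet search tools", "terms": ["intranet", "software"]},
    {"title": "Special libraries and intranets", "terms": ["libraries", "intranet"]},
]

# One click on "libraries", then a second click on "intranet".
first_pass = narrow(retrieved, ["libraries"])
second_pass = narrow(first_pass, ["intranet"])
```

Each click shrinks the set without the user typing a Boolean AND, which is the convenience the point-and-click interfaces trade on.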
DR-LINK: Combining Point and Click with Advanced Searching
Features
The research and development team at Manning & Napier
Advisors has taken a different approach to professionals'
online search and retrieval. With customers primarily in the
defense and intelligence community, great precision is necessary.
The company's new interface provides access to specialized
search features like limiting the query to a particular content
domain or specific time limits in a logical, clean presentation
of the specific options for each query. The results are displayed
in order of relevance. The key point is that virtually no training
is required to access powerful search and retrieval features.
This interface can be seen at http://www.mnis.com.
Clustering
Second, search software is trying to present large numbers
of hits in a more usable form. From a specialized software development
company comes a high-speed clustering tool. A user can key one
or more terms into a search window like the one available from
Inference Corp., a firm specializing in customer support search-and-retrieval
software.
Categories Built on the Fly
The results are presented under logical headings. What is
important in the Inference Find technology is that the cluster
headings are developed from the results retrieved by the query.
The Find technology provides customer support systems with the
tools needed to answer questions that users ask via telephone
or electronic mail support services. The key advance is that
the indexing and assignment of classification codes are handled
by software. The service can be seen at http://www.inference.com.
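To make "headings developed from the results" concrete, here is a small sketch. It is an assumption-laden toy, not the Inference Find algorithm: headings are simply the most widely shared non-query words in the hit titles, and the stopword list and function name are invented for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def cluster_hits(titles, query_terms):
    """Group hit titles under headings derived from the hits themselves."""
    skip = STOPWORDS | {term.lower() for term in query_terms}
    tokenized = [
        [word for word in title.lower().split() if word not in skip]
        for title in titles
    ]
    # Count in how many titles each remaining word appears.
    spread = Counter(word for words in tokenized for word in set(words))
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenized):
        # The heading is the title's most widely shared word (ties broken alphabetically).
        heading = max(words, key=lambda w: (spread[w], w)) if words else "other"
        clusters[heading].append(title)
    return dict(clusters)
```

No classification codes are assigned in advance; the headings exist only because the retrieved titles happen to share those words.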
Old Services with New Names
Once searching and clustering are made easier, the next logical
use of the online tools is to automate certain queries, run them
with each update, and then disseminate the information retrieved from
the standing query to those with a need to know these data. "Filtering"
is new geek speak for a standing query or SDI (selective
dissemination of information). "Routing" is new geek speak
for e-mailing the hits or records to those on the distribution
list. See, those trained in library science need only learn a
couple of new terms to fit comfortably into the world of 18-year-old
Web wizards. The best-known of the routing technologies are
electronic mail delivery of results from Selective Dissemination
of Information functions on Dialog, the "old" BRS,
ESA/ISA Quest, and other online services dating from the late
1970s. So the third technology development is software that lets
the user have more control over what to do with results.
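Filtering and routing can be sketched together in a few lines. The class, the matching rule (every required term must appear in the record), and the example.com addresses are all assumptions made for illustration; no particular vendor's product works exactly this way.

```python
import re
from dataclasses import dataclass


@dataclass
class StandingQuery:
    """A saved query (the filter) plus its distribution list."""
    name: str
    required_terms: frozenset
    recipients: tuple

    def matches(self, record_text: str) -> bool:
        words = set(re.findall(r"[a-z0-9]+", record_text.lower()))
        return self.required_terms <= words


def run_update(queries, new_records):
    """Filter one update and return {recipient: [matching records]} for routing."""
    outbox = {}
    for query in queries:
        hits = [text for text in new_records if query.matches(text)]
        if not hits:
            continue
        for address in query.recipients:
            outbox.setdefault(address, []).extend(hits)
    return outbox
```

Calling `run_update` once per database update reproduces the SDI cycle: the same saved queries run again, and only new matches are handed to whatever mail step does the routing.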
Retrieval Technologies' First Page
Retrieval Technologies, Inc. (http://www.rtinews.com)
takes a wide range of business information from well-known publishers.
The company uses any index terms or tags assigned to the articles
by their publishers. In addition, RTI assigns its own terms and
provides the user with an alphabetical listing of pre-structured
clusters of information on a particular topic. Users can create
their own queries. Once a filter is either selected from a list
or crafted by the user, the user can specify how many clips to
receive on what frequency plus many other options.
User Annotation: An RTI Power Tool
Any user of the RTI system can attach a digital note to an
article. In this way, the RTI system allows a person to make
a comment and share that comment with any other people in the
distribution group.
What sets RTI apart is that the software can index and route
documents created within an organization or work group. Without
substantial effort, the RTI tools address the need to have internal
and external content searchable, filterable, and routable from
each worker's desktop.
Intellisearch: A Similar Service
A variation on this service with slightly different types
of content may be seen at http://www.intellisearchnow.com/.
The New NewsEdge
The newly repositioned NewsEdge (a combination of Desktop
Data and Individual, Inc.) has expanded its service offering
to give the user more control over the filtering and information
handling functions (http://www.newsedge.com).
Autonomy
Why do routine tasks manually? A fourth technology development
has been the use of agents, which are scripts that perform certain
types of information processes automatically. The user intervenes
only when necessary.
Autonomy is a company that has crafted a number of powerful
software routines that perform different tasks. The Autonomy
software can perform most of the functions mentioned above. What
sets it apart, however, is the flexibility of the Autonomy toolkit
and the software's ability to handle a wide range of different
types of content. Autonomy remains one of the leaders in developing
software that can perform a wide range of intelligent functions
when indexing, retrieving, and displaying text. Among the technologies
under the hood of the Autonomy tools are neural network
algorithms and a variety of sophisticated engineering techniques.
The software and sample services are available at http://www.autonomy.com.
Agent Software
Have you ever wondered how Amazon knows that you are interested
in a particular type of book?
The reason is that Amazon and many other sites have used software
that performs two distinct functions. The first is to assign each
user a unique identification number so that the user's actions
can be captured in a database.
The second is to examine that set of actions and match it
to content or interests that are likely to appeal to that person.
Here is what happens. If I click on a computer book and order it,
the Amazon software records my interest in computer books and notes
that I bought a book about Python, a programming language. Then
the next time I log into the system, the software shows me a
list of new Python books.
Firefly (now owned by Microsoft Corp.)
and Net Perceptions are among the leaders in this technology.
It may be seen at http://www.netperceptions.com.
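The two functions described above, assigning an ID and matching recorded actions to content, can be sketched as follows. This is a deliberately simple stand-in, not how Amazon, Firefly, or Net Perceptions work; the class name, the weighting of purchases over clicks, and the catalog layout are assumptions.

```python
import itertools
from collections import Counter, defaultdict


class InterestTracker:
    """Assign user IDs, log actions by topic, and suggest matching items."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._actions = defaultdict(Counter)  # user ID -> topic -> score

    def register(self) -> int:
        return next(self._next_id)

    def record(self, user_id: int, topic: str, weight: int = 1) -> None:
        # A purchase might be logged with a higher weight than a click.
        self._actions[user_id][topic] += weight

    def recommend(self, user_id: int, catalog, limit: int = 3):
        interests = self._actions[user_id]
        scored = [
            (interests[item["topic"]], item["title"])
            for item in catalog
            if interests[item["topic"]] > 0
        ]
        # Highest interest first; alphabetical order among equal scores.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [title for _, title in scored[:limit]]
```

Commercial systems compare one user's history with those of many other users; this sketch only looks at a single user's own recorded topics.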
Dmars: Agent Software to Watch
One of the most sophisticated agent software packages has been developed
in Australia at the Australian Artificial Intelligence Institute
(http://aaii.com.au). When the system is set up, specific information
about the beliefs, desires, and intentions of the system users
is entered into a series of tables.
What Dmars Does
Then when a user performs an action, the Dmars (distributed
multi-agent reasoning system) monitors, makes decisions about
what the user is seeking, and goes and locates additional pertinent
information. The information the Dmars system finds is then displayed
in a separate window on the user's computer.
The key advance over systems like Net Perceptions is that
the Dmars service is performing multiple parallel agent functions
in real time. It is like having several research assistants helping
during one's search and retrieval sessions. Annoying? Well,
these services can be, but these are early days. Watch for agent
technology to improve rapidly over the next 12 to 18 months.
Why Visualization?
The sheer volume of data makes figuring out what to look at
very difficult. Not surprisingly, the old adage "A picture
is worth a thousand words" is getting considerable attention.
The human mind can process about 800 words a minute, maybe somewhat
more, maybe less. Looking at a picture for a second or two, the
mind can grab the digital equivalent of gigabytes of data in
one chunk.
Entering the mainstream are visual displays of content. What
does visualization do? The idea is simple, but it requires
a robust computing environment to make it work. Anything that can
be counted--for example, index terms, names of people, places,
dates, etc.--can be displayed in a graphic form.
If there is a domain of text and one queries it, the result is
a set of records. The index terms or names in these
records can be processed using various algorithms and then displayed.
One example of what I call the French school of visualization
is Semio map. It is at
http://www.semio.com. Other
French innovators like Datops S.A. are moving along similar technical
trajectories.
The result is a map that one can use to get a snapshot of the
content in the data set or domain. Clicking on a link displays
more links or a double click displays a list of hits.
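The "anything that can be counted can be displayed" idea reduces to two steps: count the index terms in the retrieved records, then draw the counts. The sketch below uses a plain-text bar chart as a stand-in for a real map; the record layout and function names are hypothetical.

```python
from collections import Counter


def term_counts(records, field="index_terms"):
    """Count how often each index term appears across the retrieved records."""
    return Counter(term for record in records for term in record[field])


def text_bar_chart(counts, width=20):
    """Render counts as text bars, most frequent term first."""
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{term:<12} {bar} {n}")
    return lines
```

A graphical product would replace the text bars with nodes, colors, or map regions, but the counting step underneath is the same.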
Textwise
A unit of Manning & Napier, the same group that developed
DR-LINK, has created visualization tools as well. These tools
can be integrated into different applications. One way to use
the tools is to look at content in a series of bar charts. Each
quadrant shows the most cited people, places, or things in the
set. Quadrants can be combined. With a couple of clicks, the
user can locate information about a specific event, see the countries,
people, and related activities in a graphical display. A click
on any bar shows the hits related to that subject. A demonstration
and information are available at http://www.textwise.com.
Under the Hood
These attractive displays are computationally intensive. A
series of processes take place behind the scenes. Most search-and-retrieval
companies guard their methodologies closely. Manning & Napier
Advisors has granted AIT permission to provide this high-level,
somewhat abstract view of the various steps or processing modules
that the system implements.
These range from loading content to chopping it into logical
segments to tag insertion and finally building searchable indices.
With the rise in computational power and the decrease in the
cost of powerful machines, separate processes can be welded into
a system that yields what Peer Boerner, a vice president at Manning
& Napier Information Services, calls "intelligent documents."
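The sequence of steps just listed (load, segment, tag, index) can be illustrated with a toy pipeline. Nothing here reflects Manning & Napier's actual modules, which the company guards; the paragraph-based segmentation, the crude name tagger, and the `name:` index prefix are all assumptions for the sake of the example.

```python
import re
from collections import defaultdict


def segment(text):
    """Chop a loaded document into paragraph-level segments."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def tag_segment(doc_id, seg_no, text):
    """Insert simple tags; runs of capitalized words stand in for names."""
    names = re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text)
    return {"doc": doc_id, "seg": seg_no, "names": names, "text": text}


def build_index(documents):
    """Build an inverted index: term -> set of (document, segment) locations."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for seg_no, seg_text in enumerate(segment(text)):
            tagged = tag_segment(doc_id, seg_no, seg_text)
            location = (doc_id, seg_no)
            for word in re.findall(r"[a-z0-9]+", seg_text.lower()):
                index[word].add(location)
            for name in tagged["names"]:
                index["name:" + name.lower()].add(location)
    return index
```

Because each step hands a plain data structure to the next, the steps can run as separate processes and still behave as one system, which is the point made about welding processes together.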
KnowledgeX (IBM)
IBM has been a leader in this field of processing text and
building comprehensive systems. To give you an idea of what is
coming for users of Lotus Notes, envision a large domain of content
in a Notes database displayed with links showing how the content
fits together.
Link Analysis
Each link shows a relationship among data points. A click
on a node reveals the document or documents that contain the
information pertinent to the link relationship. These tools may
be accessed from http://www.knowledgex.com.
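A bare-bones version of link analysis is sketched below, assuming each document has already been reduced to a list of the entities it mentions. This is not the KnowledgeX method; it treats two entities as linked whenever they appear in the same document, and the function names are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_links(documents):
    """Map each pair of co-occurring entities to the documents that link them."""
    links = defaultdict(set)
    for doc_id, entities in documents.items():
        for a, b in combinations(sorted(set(entities)), 2):
            links[(a, b)].add(doc_id)
    return links


def documents_for(links, a, b):
    """Simulate a click on a link: return the documents behind the relationship."""
    return sorted(links.get(tuple(sorted((a, b))), set()))
```

A visual tool would draw each key of `links` as a line between two nodes; the click handler then only needs the lookup in `documents_for`.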
The Memex Approach
Memex, a Scottish company, has carried link analysis a step
beyond. Data are loaded into the Memex system and converted to
a binary format. When a query is passed to the data, the links
or relationships are displayed in a visual tree. Clicking on
a node displays the data. The Memex tool builds its clusters
automatically. A demonstration and explanation of the automated
link analysis tools are available from http://www.memex.com.
i2: Multimedia Clustering and Linking
What does one do when the data to be analyzed consist of non-text
objects? In the course of some research project, video, audio,
numeric, and hybrid data types are gathered. A company in Cambridge,
England, has developed a tool that allows these objects to be
added to a data environment. When a query is passed to the system,
the user can see links that the system or the user has defined
among the data objects. When a video object is clicked, the system
runs that video. These media-rich search-and-retrieval environments
allow the user a degree of freedom in content analysis. The i2
tools require that the user construct some links manually. Software
has no way of knowing what the content of a video or a sound
recording is.
Under the Hood
Behind the scenes, the i2 data environment stores all of the
information either in its native file format,
like Word, or in standard database tables. The result is that
the i2 tools operate on existing data, reducing the requirement
for the conversion of source data to a format that a particular
system may require. The i2 demonstration is available at http://www.i2.co.uk.
Hyperbolic Maps for Navigation
Figuring out what the content of a large Web site is can be
a daunting task. One of the most frequently visited sites on
the Internet is Netscape (http://www.netscape.com).
To get a fresh approach to this popular service, Inxight Software
has applied its hyperbolic map technology to the site. Instead
of looking at the graphical pages of Netcenter, one can go to
http://www.inxight.com
and access the site using a hyperbolic map. Each node on the
map represents a topic or content area. A click on the area leads
one to the details of that content cluster.
Next Generation Interfaces
Another remarkable approach to search and retrieval is the
multi-modal interface from Plumb Design (http://www.plumbdesign.com).
The left-hand panel contains a real-time depiction of the index
terms relevant to the content space that one is exploring. There
is a key word area so that a term or concept may be entered.
A series of icons provides access to other collections or domains
of text. The hit is displayed in a separate panel
on the screen. If a video or sound object is returned, a click
on the appropriate icon displays the content.
Flying with Java
The most interesting aspect of this interface is the Java-based
flying index. The term or concept that is the focus of the query
appears at the center of the terms. The different colors are
used to indicate relevance and other attributes of the links
associated with the core concept. A click on a node changes the
display of content in the viewer panel.
A Way to Explore Content and Know What's New in Real Time
As a result, the user can explore a visualized content space,
key new terms, or click on interesting objects. Select a demonstration
program. Remember that demonstration software using Library of
Congress or Smithsonian data is a tip-off that the company is
doing development work for government agencies whose work may
be classified.
Innovation Drivers
Let's take a moment to recap these new developments.
They are the result of fast machines and relatively low-cost memory
and storage. Many operating systems now support applications
that once were possible only in highly specialized computing environments.
The shift to computers with multiple processors is well underway.
The result is that today's programs are not one program.
Think of a layer cake. Each layer consists of different processes
and activities. All of the cake's layers are active at the
same time. Finally, the shift to this rich computing environment
has entered the mainstream. The standard expectation for information
is real time news. What this means is that information must flow
constantly and be refreshed as events or activities take place.
The shift to real time is merely the tacit recognition
that the old world of computing has been left far, far behind.
Another Challenge
There is considerable excitement about online access to full
text. Based on analyses we conducted while researching one of
our studies, text accounts for only about five percent of the
information content created in the U.S. The majority of the content--as
measured in digital and analog form--is broadcast audio and video.
Most of these data are not available to online searchers.
How can one keep track of video clips, songs, and recordings of
speeches? Not surprisingly, new software tools are finding their
way to market to allow small businesses and researchers to manage
these types of multi-media information.
Consider the Magnifi product line. Built on IBM Almaden's
image technology, the Magnifi software allows indexing, searching,
and retrieving of text and non-text data. To see this product
one can visit CBS Sportsline or go directly to the Magnifi site
at http://www.magnifi.com.
Automatic Indexing and Retrieval of Images
Excalibur Technologies, an imaging company located in Virginia,
is best known for its scanning and search and retrieval software.
However, the company has ventured into the automatic recognition
and indexing of images. The idea is that a large number of images
can be digitized and stored in an archive. A user can then find
an image and tell the system to locate similar images. Alternatively,
the user can specify a particular color, shape, or feature and
the system will locate similar images. This product can be seen
at http://www.excalib.com.
A typical query's results are displayed in thumbnail form.
A click on a thumbnail displays the image and any textual metadata
for that image.
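Searching by color can be illustrated with a very small sketch. It assumes each archived image has already been reduced to one average RGB color, which is far cruder than the pattern recognition a product like Excalibur's would use; the file names and the function are hypothetical.

```python
import math


def most_similar(target_color, archive, limit=3):
    """Rank archived images by how close their average color is to the target."""
    ranked = sorted(
        archive.items(),
        # Euclidean distance in RGB space; the file name breaks ties.
        key=lambda item: (math.dist(target_color, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:limit]]


# Hypothetical archive: file name -> average (red, green, blue) of the image.
archive = {
    "sunset.jpg": (250, 120, 40),
    "forest.jpg": (30, 140, 50),
    "ocean.jpg": (20, 80, 200),
}
```

The ranked names are what a results page would then show as thumbnails, closest match first.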
What's next in the software for searching and retrieving?
With reasonable certainty, we can say that software is getting
smarter. Furthermore, the emphasis will shift away from information
from unknown sources, like most of the information on
the public Internet, and toward the content that is available
from colleagues. In other words, the emphasis will be upon content
from Intranet and Extranet environments.
We also know that the Internet world is broken into
two distinct user segments. There is the consumer or popular
Internet. There is the professional Internet. Most of the professionals
seeking information will be seeking branded content or information
with a clear provenance.
It is also clear to anyone who has dabbled in professional Internet,
Intranet, or Extranet applications that size matters. Size
translates roughly to resources, staff, reach, in short, the
mass necessary to build an application, get it deployed, support
it, and enhance it. The emerging Internet applications
are complex constructs and require professional design, management,
and support.
Some Everyday Realities
Before closing, it may be useful to examine the organizational
realities into which these advanced tools will be integrated.
There are separate islands of technology and software in most
organizations. It is unlikely that this situation will be changed
in the next 12 to 24 months. Budget considerations and user behavior
translate to slow change in many organizations.
There are four separate streams of content. These are financial
information systems, electronic filing systems or library systems,
sales and marketing information systems, and paper-based legacy
systems.
The value of the tools built on technologies like those described
above is that they will enable managers to retain existing systems, yet gain
access to the browser-based systems that are now emerging as
a standard in many offices and homes. When software is used to
interface with legacy systems without changing the legacy system
in a substantial way, that software can be thought of as wrapper
software. The result is that a person can connect to legacy
data without having to know about the legacy systems beneath
the wrapper.
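The wrapper idea can be shown with a small sketch. The "legacy system" here is a made-up catalog that only returns fixed-width text lines; the wrapper class gives callers a simple search method and leaves the legacy code untouched. All names and the record layout are assumptions.

```python
class LegacyCatalog:
    """Stand-in for a legacy system that only dumps fixed-width records."""

    def __init__(self, lines):
        self._lines = list(lines)

    def dump(self):
        return list(self._lines)


class CatalogWrapper:
    """Wrapper that hides the record layout behind a simple search call."""

    ID_WIDTH = 4  # assumed layout: 4-character ID, then the title

    def __init__(self, legacy):
        self._legacy = legacy

    def search(self, term):
        matches = []
        for line in self._legacy.dump():
            record = {
                "id": line[: self.ID_WIDTH].strip(),
                "title": line[self.ID_WIDTH:].strip(),
            }
            if term.lower() in record["title"].lower():
                matches.append(record)
        return matches
```

A browser-based front end would call `search` and never see the fixed-width format, which is what lets the legacy system stay in place.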
Wrapper Software for Information Management
Most wrapper systems are costly. Plumtree (http://www.plumtree.com)
has created a Windows NT application that allows a work group
to build shared information systems. These content objects can
be viewed from Windows.
The Plumtree corporate portal system costs about $20,000 and
allows anyone to make use of powerful content management
tools. The software has many options for sharing information,
annotating documents, and generating reports for the users. Each
of the content domains can be updated by a user. Subsets of the
information in the Plumtree system can be created. Documents can be
routed to others in the group. Different views of the content
stored in the Plumtree system can be obtained with a mouse click.
Hewlett-Packard's New Enterprise Information Integration
Software
Another powerful tool is available from Hewlett-Packard. The
Changengine (http://www.hp.com)
allows an organization to integrate information from many different
sources from a single console. Unlike Plumtree, Changengine operates
on an organization-wide basis. Units integrated can be in the
same city or halfway around the world.
Broadvision: E-Commerce to Information Management
One of the most exciting companies in this dynamic area of
integrated information management is Broadvision. The Broadvision
tools (http://www.broadvision.com)
can integrate information within an organization and also from
sources outside of the organization, sometimes called Extranet
sources.
What Is the Decision Process?
Whether one looks for a departmental tool (Plumtree), an
organizational tool (Changengine), or a distributed construct
of many entities (Broadvision), four key questions must be answered.
There is no specific order, but each must be given due consideration:
- What does our organization mean by knowledge?
- What are our sources? Which do we value?
- Who will have access to what information?
- What technology do we need to accomplish our aims?
Regardless of the enthusiasm many organizations have toward
new technology, the traditional ways still play an important
role. There will be a need for voice and paper communication.
Even hand delivery systems will play an important role for the
foreseeable future.
It is equally important to recognize that while most organizations
will want to embrace new technology, the principal inhibitor
is not the fear of technology; it is the cost of the technology.
What this means is that these new tools will diffuse through
developed economies as the technology becomes increasingly
affordable.
A Suggested Methodology
The methodology for getting started now without moving ahead
in a rash manner is to follow a five-step program:
- First, do a review of one's current situation, including
people, work flow, technology, and interactions of these elements.
- Second, dig into the infrastructure inside the organization
or the external requirements for a service.
- Third, develop a specification, tactical plan, and budget.
- Fourth, undertake a pilot or demonstration project.
- Finally, modify the concept based on the pilot and deploy
the system, product, or application.
Points to Keep in Mind
Throughout this series of processes, the developer or team
should keep several points in mind:
- First, the system must do something that people used to do
but in a manner that is now more enjoyable and more efficient.
- Second, the system should allow the user to do the work in
a personalized manner.
- Finally, dull or complex tasks must be eliminated, simplified,
or redefined.
Questions to Ask to Benchmark Progress
Throughout the development or innovation process, several
key questions must be asked and answered multiple times:
- What problem do we have to solve?
- Does this fit with our culture?
- Do we have the systems staff and infrastructure?
- Do we have realistic expectations?
- Have we visited sites where the system is in use?
As we come to the end of this review of tools and the new
integrating applications, some broad developments can be identified.
These are:
- As computing power increases, we have seen that intelligent
applications become more common.
- Intelligence is becoming a basic
function of software and systems.
- People demand visual, integrated
systems in today's data-rich environment.
It is my belief and
that of my colleagues at AIT that librarians are among the best
equipped to help answer information questions.
Conclusions
There are tremendous opportunities for information professionals.
There will be continued great demands on personal learning and
growth. We are likely to find ourselves working through new thickets
of regulation. There will be continued security concerns. To
address these challenges, we need to embrace an experimental,
even entrepreneurial, spirit.
Stephen E. Arnold
President, Arnold Information Technologies (AIT)
sa@arnoldit.com