Post-Web Chemistry:
ChemWeb is a solid information service, and it deserves praise for the content, execution and stability of the site. The site will improve over time. The use of structured coding schemes such as the Extensible Markup Language (XML), Java and JavaScript, and advanced database schemas will allow ChemWeb to offer a richer, more valuable information experience to its users.
Now, shift the focus from the public internet, where a multi-billion pound enterprise has built a robust, new service, to an intranet. An intranet, as most people understand the term, is the network that links employees and contract workers within an organisation. In the last 18 months, intranets have shifted from proprietary networking systems to internet technology.
Most organisations offer these services via their intranets:
What this short summary of functions reveals is that in most organisations, the information ecosystem is not healthy. It may not be practical for two or three research chemists to collaborate via their workstations. In many laboratories, handling three or four simultaneous real-time functions may be impossible. How many chemists can simultaneously access technical information, collaborate with colleagues in one of the organisation's laboratories in another geographic region, find the status of a patent filing, or browse spreadsheet data from an experiment conducted two years ago?
Individually, these functions can be handled. Combine them and systems fall over. The stagnant ecosystem has organisms (chemists, actually) working in a type of forced isolation. Even many computing tools exist in isolation. Each tool speeds up a single, specialised task. The overall work process may not be much different from Leonardo da Vinci's habit of keeping his papers in a large trunk and hauling it with him when he moved by cart among Florence, Milan and Venice. When something was needed, a manual process of hunting and sorting was necessary. Not much has changed in most chemical laboratories and research facilities. The good news is that high-dimension, databased intranets are coming to organisations with increasing momentum. Even better news lies ahead: these intranets will use internet technology, so that linking groups becomes somewhat easier.
If Zeus were around, perhaps he would hurl a thunderbolt that would fuse the atomic processes that bedevil professionals into one cohesive mass. One question my consulting teams are asked is, "Are there viable solutions to the extreme fragmentation of systems, data and network services?" Yes, there are solutions. Consider these developments in just three unrelated computing `arenas':
1. Many systems work like one: Java, XML and CML
It must be admitted: databases are not inherently exciting. Among the major applications, the database lurks like valence, necessary but never grabbing the spotlight. The goal of having a single meta-index for all the content in an organisation is difficult to achieve for three reasons:
Addressing all of these points in a single system in a timely, economical manner is a difficult "information problem". Until recently, it was a very, very tough database problem.
Most people know the names of the big three databases: Oracle, IBM DB2 and SQL Server. But innovation in databases is not often found among these manufacturers of the Volvo truck. Innovation comes from some of the companies that are below the radar screens of some popular computing journals and the software licensing experts in major pharmaceutical companies. Databases that talk with one another are becoming an increasingly `doable' engineering job.1 There are many exciting products in the market. For our purposes, two examples will help explain where database technology is headed.
Consider thinWEB.com.2 This company has developed a suite of technology to allow extremely rapid access to mainstream databases such as Oracle. thinWEB provides Java tools that allow a developer to pull together information from a range of sources, manipulate it on the user's workstation, and display the results about 35 times faster than other Java systems. The thinWEB technology is used by Sun Microsystems, IBM, Microsoft, New Atlanta Communications and Igate technologies.
What is the benefit of the thinWEB technology? The use of Java as both a server-side and a client-side tool is an important step in the maturation of the language. Organisations can develop complete applications using Java to perform the middleware or `glue' functions. Other technologies play an important role in this aspect of database technology, specifically XML, JavaScript (a scripting language separate from Java), Perl and Tcl (Tool Command Language), among others. Putting this string of jargon into more straightforward terms, consider these applications:
Ardent Software has developed extended relational database technology.3 The approach taken by this company is not a cure-all, but it does allow different types of data to be placed in a database structure. The approach is a `database of databases' or a series of `nested databases'. The structure accommodates the meta-indexes that are essential to finding and making sense out of certain data types. From its inception, Ardent's database was designed to support extended relations. Conversion of any of the "big three" databases to an extended structure will, Ardent believes, necessitate complete restructuring and rewriting of most of the modules in those engines. The company's UniVerse products support XML, Java, and other modern programming languages.
What are the benefits of an advanced relational database that can handle objects and speak XML? There are three:
Ardent technologists believe that the combination of these capabilities provides a means of easily storing methods within the database and relating them to their objects: search, retrieval, automating functions and adding `intelligence' to information warehouse functions. Ardent's customers today include PariBas, CEA and the Korean Information Patent Office.
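The `database of databases' idea can be pictured with a small sketch. This is not Ardent's UniVerse API. It is a plain Python illustration, with hypothetical record ids, field names and values, of records that carry their own nested tables and of a meta-index built over them.

```python
from collections import defaultdict

# Hypothetical nested records: each compound carries its own small "tables".
# All ids, field names and values are invented for illustration only.
compounds = {
    "C001": {
        "name": "compound A",
        "assays": [
            {"date": "1997-03-02", "result": 0.82},
            {"date": "1998-11-15", "result": 0.79},
        ],
        "patents": [{"number": "P-100", "status": "filed"}],
    },
    "C002": {
        "name": "compound B",
        "assays": [{"date": "1999-01-20", "result": 0.41}],
        "patents": [],
    },
}


def build_meta_index(records):
    """Map each nested field name to the ids of records holding rows for it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field, value in record.items():
            # Only index nested tables that actually contain rows.
            if isinstance(value, list) and value:
                index[field].add(record_id)
    return index


meta_index = build_meta_index(compounds)
```

With such an index, a question like "which compounds have patent data?" becomes a single lookup, `meta_index["patents"]`, rather than a scan of every record.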
Ardent speaks and understands XML, as does thinWEB. Do XML and new database architectures have a direct relationship with chemistry? Although XML is a comparatively new information `type', the Chemical Markup Language has been developed to handle chemical information. Information about CML is available from the Open Molecule Foundation.4 The principal advantage of this variant of XML is that it standardises the XML `tags' and supports some of the special requirements of chemical data that would otherwise have to be hand-coded in XML. The Jumbo browser on the following page allows the user to render the representation of a molecule. A click on the data hierarchy allows the user to jump to other information in the dataset. The Jumbo tool is available from the OMF.
The implication of database access and XML / CML is that multi-object interactions are more easily implemented. The result can be richer pages.
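To make the idea of standardised chemical `tags' concrete, here is a minimal sketch in Python. The fragment is a simplified, CML-style description of water. The element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `bondArray`, `bond`, `atomRefs2`) follow conventions used in later CML releases and should be treated as assumptions, not as the exact vocabulary of the CML described in this essay. The fragment has not been validated against any CML schema.

```python
import xml.etree.ElementTree as ET

# Simplified CML-style fragment (assumed tag names, for illustration only).
CML_FRAGMENT = """
<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def summarise_molecule(xml_text):
    """Return (atom id -> element symbol, list of bonded atom-id pairs)."""
    root = ET.fromstring(xml_text.strip())
    atoms = {a.get("id"): a.get("elementType") for a in root.iter("atom")}
    bonds = [tuple(b.get("atomRefs2").split()) for b in root.iter("bond")]
    return atoms, bonds


atoms, bonds = summarise_molecule(CML_FRAGMENT)
```

Because the tags are standardised, any XML-aware tool (a database, a browser such as Jumbo, or a few lines of script) can pull out atoms and bonds without a chemistry-specific file parser.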
In the last year, the sales of PDAs or Personal Digital Assistants, handheld computers, and ultralight notebooks soared. Several companies have introduced electronic mail appliances which will retail in the United States for about $100.
The idea behind all of these devices is that internet connectivity, like access to designer bottled water, should be everywhere. A quick glance at the chart below shows that an internet dial tone will come first to the United States, Japan and Europe, which will come as no surprise. These data come from the US Department of Commerce, so the number of internet users is about 40 percent below the estimates for mid-1999. Nevertheless, the snapshot of the disproportionate concentration of internet access in a handful of countries is interesting:
The pervasive network will certainly be a reality in countries with strong positions in chemicals, pharmaceuticals, cosmetics and food manufacturing. What we are looking at is a wave of internet infrastructure construction which is followed by spiking use in business, professional and personal applications.
What does this mean to chemists and the professionals working in fields that are closely allied to chemistry?
The Gooey software illustrates how one communications application can flourish in a pervasive network. Gooey is an internet tool enabling people simultaneously browsing the same web site to communicate with each other. With Gooey, chat is not restricted to specific areas or sites but turns into an integral, natural part of one's browser interface. Using Gooey, a user can chat on virtually any site on either the public web, an intranet or an extranet (an internet environment limited to people with access permissions regardless of their affiliation).
Gooey breaks the wall between web surfers and turns the web from a labyrinth of HTML pages into a lively, human environment. The Gooey application brings together people who share the same interests and Net habits, creating what the developer calls "the first Dynamic Roving Community".5 Gooey users can receive a constantly updated list of all the other online users on any site they visit. The software has a friendly, intuitive interface. It allows a user to conduct group and private chat and exchange information. Sites supporting Gooey in July 1999 were theNews.com, Codex Data Systems, Zoana.com, and Monsterdaata among others.
A typical Gooey session looks like the web page shown below:
Several points to note about this application are:
Without the complexities of Lotus Notes and other collaboration tools, the Gooey applet creates a more flexible, open `space' in which to interact. Gooey is one in what will be a long line of collaborative messaging tools.
Another remarkable innovation in real-time collaboration is the product called Third Voice.6 This California company (www.thirdvoice.com) allows a user to post comments on a web site. These comments can only be viewed if a user is also viewing the specific site with the Third Voice plug-in running. The software runs only within Internet Explorer and exploits proprietary hooks in the Windows environment.
A Third Voice capable site looks like the illustration below when a user opens a page:
Applications of Third Voice are proliferating in the consumer internet. The only science site using Third Voice is NASA.7 A tool like Third Voice may have some potential applications for distributed research groups. A set of research results can be posted on a secure intranet site. Members of the team can view the results and post comments.
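The core idea, comments attached to a page address and shown only to readers of that page, can be sketched in a few lines of Python. This is an illustration of the concept only, not Third Voice's implementation. The class names, user names and the `intranet.example` address are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str


class AnnotationStore:
    """Keep comments grouped by the URL of the page they annotate."""

    def __init__(self):
        self._by_url = defaultdict(list)

    def post(self, url, author, text):
        # Attach a new comment to the given page.
        self._by_url[url].append(Comment(author, text))

    def comments_for(self, url):
        # Return a copy so callers cannot alter the stored list.
        return list(self._by_url.get(url, []))


# Hypothetical usage: two team members comment on one results page.
store = AnnotationStore()
page = "http://intranet.example/results/run-42"
store.post(page, "chemist_a", "Yield looks low.")
store.post(page, "chemist_b", "Worth repeating the run?")
```

A reader viewing the results page would see both comments, while a reader on any other page would see none. That matches the behaviour described above, in which comments are visible only alongside the specific page they annotate.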
With connectivity shifting from stationary internet access to untethered wireless access in some areas, these types of applications are innovation triggers. An `innovation trigger' sets off other applications. It is this cascade of innovations, each seemingly outdoing or extending another innovation, that gives today's pervasive networked environment its hyperspeed. For the chemist with an internet link, a powerful workstation and an interest in pushing the boundaries of collaborative interaction, the day at the laboratory becomes more productive and more interesting.
Many chemists are visual creatures. Chemistry books have long been a feast for the eyes and mind. Until recently, visualisations of chemical structures and real-time visual presentations of reactions were limited to proprietary software running on high-end workstations.
Within the last 12 months, a number of programming innovations have brought visualisation to desktop personal computers. ChemWeb, mentioned earlier in this essay, offers visualisations when Reed-Elsevier's Chime software is running. Synopsys, a UK-based database company, introduced its ActiveX Accord plug-in suite. With data running on an appropriate server, a user can use the `Control' module from within Internet Explorer to view and manipulate visual representations of the information in the database.8 The full retail product is OLE2 enabled and allows chemistry to be transferred to other desktop applications and chemical editors, such as ChemDraw, with retained chemical-awareness.
A good example of the progress made in visualisation is evident from the figure at the left. These images were prepared by Christopher Leach and Henry S. Rzepa.9 The authors used Virtual Reality Modeling Language or VRML to create representations of specific compounds. One advantage of the VRML technology is that it allows the chemist to `move around' within the structure. A grayscale image does not do justice to the richness of the digital construct. Most chemists will agree that this type of tool, appropriately used, can be a useful adjunct to certain non-visual methods.
Another approach to visualisation is Chime. Without repeating the heritage of Chime here, it is enough to say that it runs within a browser and has been a useful complement to Roger Sayle's RasMol.10 Chime shows molecules as RasMol does, but unlike RasMol, Chime shows the molecules inside a web page. Chime shows only the molecules written into the web page by its author. An interesting feature of Chime is that structures move. A page can accommodate multiple displays. A typical display appears below:
What about exploring complex data spaces in which the information is contained in a large database? Visual tools can be extremely useful for getting a different view of the information in a large database. The example used to illustrate this is the Know-It product from Manning & Napier Information Services, drawn from a pilot at a major pharmaceutical company. The users of the Know-It product included chemists, market analysts, and staff with different technical specialties.
The Know-It product provides a set of tools for looking at the contents of a large database without requiring that the user know the contents of the database before exploring it. Know-It is a different type of data mining and knowledge access tool. It combines several different features and functions in a browser interface. These include:
A representative view within Know-It appears below:
The Know-It software was developed by Manning & Napier Information Services. Additional information is located at www.mnis.com.11
Chemistry stripped to its essentials is similar to fitting pieces of a puzzle together. The combinations of the pieces can be explored in new ways. Legacy databases containing information once locked behind green screens can be searched, manipulated and copied into today's desktop systems. Chemical information from commercial sources, proprietary databases and public sources can be scanned, manipulated and merged with a click of the mouse. Visual displays in single images, digital video and three-dimensional spaces are in the hands of researchers at organisations of all sizes. Students, even at the pre-university level, find chemistry a different subject from the one studied by the students who preceded them by three or four years.
With so much progress evident, what is ahead? There are many clever statements about the foolishness resulting when an essayist prognosticates. Treacherous ground prediction is. Several observations are warranted:
First, the technologies and applications built from relatively new building blocks are adding a fifth dimension to the four in which professional chemists and their colleagues work. We are able to cover a subject in its length, width and depth. Journals, research reports and data in electronic form regardless of their location allow a particular compound or class of compounds to be `informationised' in the blink of an eye. What once required days, weeks or months, may still require that much time, but certain basic tasks are accelerated enormously.
The fourth dimension is time. One thread unites the three examples of the `thunderhead' touched upon in this paper: time. Chemical information can be looked at over a period of years or nanoseconds. Indeed, visualisations can replay reactions endlessly with a `pause' button at the tip of one's index finger. Slicing up time and exploring the events in these segments has aided understanding of certain types of reactions and interactions.
The fifth dimension, and the one with the most potential to speed innovation in chemical research and development, is what has been called `hive mind', `collaborative thinking', or `groupware'. The terms are somewhat infelicitous, but the concept each attempts to convey carries chemical research into some exciting new arenas. A short list of the impact of the collaborative revolution includes:
Powerful workstations and large research budgets do not explain the intense excitement associated with working in a networked environment.
Second, chemical innovation has not been exhausted. It is probably risky to say that only the surface of chemical innovation has been penetrated. The possibilities are probably unlimited or large enough to guarantee a flow of innovation for many, many years. We may not have an infinitely extensible `chemical space', but it is large.
Third, rapid progress continues in software tools, raw computer capability, faster and more robust network connections and what might be characterised as the `browser paradigm'. It is my view that the browser is a fancy `green screen'. With the release of Windows 2000, optimised Java compilers and tools from many vendors, and the promise of high-capability chips from Intel, the chemical information landscape will be transformed. In a word, `change' is a persistent feature of the chemical professional's work bench.
In closing, it is interesting to think of Florence in the late 15th century. After 500 years of steady progress, it seems as though the internet is likely to become the centre of creativity in the new millennium.
Stephen E. Arnold
Arnold Information Technologies
Postal Box 320
Harrod's Creek, Kentucky 40027
1. Companies that are making interesting strides in databases include Computer Associates, with products named after television stars; Objectivity; Merant (formerly Intersolv); and Information Builders, with strong tools for accessing `information' residing on legacy systems.
2. thinWEB.com is located at 6 Antares Drive, Phase III, Suite 101, Ottawa, Ontario K2E 8A9 Canada. The web site is at http://www.thinweb.com/.
3. Ardent is located at 50 Washington Street, Westboro, Massachusetts USA, 01581. The firm's web site is http://www.ardentsoftware.com/.
4. More information is available at http://www.xml-cml.org/.
5. Hypernix Technologies Ltd. is located at 11 Nachmani Street, Tel Aviv, 65497 Israel. The company's web site is at http://www.gooey.com/.
6. Third Voice Inc, 101 Redwood Shores Parkway, Suite 200, Redwood City, California, 94065, USA
7. A list of sites supporting Third Voice appears at http://www.thirdvoice.com/. The Third Voice plug-in must be running in order to access the service.
8. The Accord software is not a free plug in. The retail price in August 1999 was about $700. The web site is http://www.synopsys.co.uk/.
9. Christopher Leach and Henry S. Rzepa, "VRML Models for Analysing Chemical Structure-Activity Relationships," Department of Chemistry, Imperial College, London, SW7 2AY. The data were located at http://ww.ch.ic.ac.uk/rzepa/vrml/panel3.html
10. RasMol uses a proprietary file and requires a separate program. Hundreds of free molecule files are available for download. The software enables easy rotation of the molecule in any direction using the mouse. RasMol home page at http://www.umass.edu/microbio/rasmo.
11. Manning & Napier's research facility developed this software as part of an information and database research program. White papers about the technology appear at http://www.textwise.com/. The organisation's offices are located at 1100 Chase Square, Rochester, New York 14604, USA.