Libraries, the Web, Interactive Forms and CGI Scripts


Tony Barry
Centre for Networked Access to Scholarly Information,
Australian National University Library
<URL:http://snazzy.anu.edu.au/People/TonyB.html>
Tony.Barry@anu.edu.au

Paper presented to the 1996 VALA Biennial Conference and Exhibition, "Electronic Dream? Virtual Nightmare? The Reality for Libraries", Melbourne, 30 Jan to 1 Feb 1996.
This paper is in two versions: an RTF version for printing and an online version. In the RTF version the hypertext links appear as footnotes and the interactive form is a screen image.

Introduction

The web is now achieving a central role in libraries, as a perusal of Barbara Stewart's Top 200 Technical Services Benefits of home page development shows, as does Acqweb.

This paper assumes some knowledge of html in general, as a system of tagging text files for a browser to render the meaning of those tags, although much of it is understandable without such knowledge.

The theme of the paper is that we continue to be in an environment of rapid change which will redefine the role of publishers and libraries. The last few years have seen change centred on network delivery, the integration of images and other media into documents, and the hypertextual revolution which is leading to new forms of cooperative information delivery. The near future will see a shift of attention to interactivity and dynamism in documents.

Change continuing

The capabilities described here are merely a "snapshot" of the state of play now, with some speculation on likely development. In such a fast moving area prediction more than a few months ahead is risky. At the VALA conference in 1991 I speculated that the Internet was likely to be an important adjunct to communication in research and should be of valid interest to libraries. I did not expect it to become as important as it has in so many areas, and I certainly did not expect the change to take place so quickly. The 1993 VALA was held the week the beta version of Mosaic was released for the Macintosh and Windows platforms. At that conference the activity in the Internet room involving Mosaic upstaged the vendor demonstrations. In my presentation to the conference I hypothesised that WWW would quickly replace gopher. Again the change was far faster than I expected. We can only expect the unexpected, and the suggestions I make in this paper should be taken in that light.

New kind of documents - dynamic

Initially network publishing via World Wide Web introduced three revolutionary changes -

  1. Delivery became integrated into the publishing mechanism
  2. Hypertext enabled documents to be linked so that the citation structure of the net became integrated into publishing.
  3. A substantial reduction in the price of publishing provided the mechanism for a range of material to be published that would not have been practicable in the past.

These capabilities challenged our concepts of what a document is, what constitutes publishing and what constitutes a library. An example of such a document is Doug Davis's paper Library Automation on the WWW.

New capabilities now face us and through them opportunities. Interactive forms, interfaces between documents and databases and communication abilities are now providing the capability to generate documents -

  1. With in-built assistance to interact with the author.
  2. Which can interface to searchable databases and which may allow users to modify this data.
  3. Which let the reader manipulate models of what is described so as to be able to achieve better understanding of the content.

Standards & "Netscapisms"

Standards have been central to web development. While much of the debate about standards has been about how they are broken by one vendor or another, or not finalised, the existence of a core of common standards such as http, MIME, HTML 1 and CGI has enabled developers to rapidly produce products which could interoperate. Where the standards have been "broken", notably by Netscape in its browser, it has been by the addition of non standard capability which did not prevent interoperation with other products. This has been key to continued development. Another major factor has been the use of an increasingly common marketing model on the Internet, that of providing a free or shareware base version of software to individuals and educational institutions.

Let us now shift to looking at the details of some of this technology.

Interactive Forms


The purpose of interactive forms in html is to provide techniques that will -

  1. Display a form which can be filled in by the person reading it.

  2. Submit the form to a program which can analyse it.

  3. Use the content of the form to take some action.

  4. Return a page to the user dependent on the content of the form.

The form might be used to -

This ability to provide a graphical interface to an arbitrary program opens up a degree of platform independence for end users. A web form completed in a browser on a machine running one operating system can be submitted to a program on a second machine, where the web server resides, running a different operating system. The form, the server and the CGI mechanism provide a way to deliver software specific to one operating system to any other.

Browsers that don't support forms

Older browsers do not support forms. While their numbers are declining they are still not insignificant, but should rapidly become so. Within institutions the deployment of more recent versions of software is a difficult support problem as older hardware will have insufficient capability to run it. An alternative, where a suitable unix host is available, is to provide lynx, a character based WWW client which can be accessed via telnet and is therefore available even to dumb terminals. The loss of graphics through such access will, however, make some servers virtually unusable. Another alternative is to provide a text based form which can be printed and mailed, or copied and emailed.

Forms tags

A detailed discussion of the tags used to define forms is beyond the scope of this brief paper, which will concentrate on the capabilities of these forms. A brief description is however appropriate.

Within an html document a section containing a form is identified by a tag "<FORM>", which indicates the location (by a URL) of the script which will process the submitted form.

A form may be made up of a number of input fields which may be of a number of types -

Text : Such a field will take in typeable characters to pass to the server. A fixed length field may be specified or a window.

Password : Where typing will be hidden by the browser.

Checkbox : Which will display a checkbox of a number of alternatives which can be selected.

Radio : Which will display a radio button for each of a number of mutually exclusive alternatives, only one of which can be selected.

Submit : A special field which indicates that entry is complete.

Reset : Which clears all the fields in the form back to their starting values.

Associated with each input field there is -

Fields can also be hidden and used to pass back preset default values which can be set by the writer of the form to identify which version (or location) of a number of similar forms is returning the data received by a script.

The following diagram illustrates the use of these various elements.
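As a rough illustration of how the field types described above might be combined, the following Python sketch holds a small form as a text string and lists the INPUT fields it contains. The form itself is hypothetical: the script URL reuses the placeholder host used elsewhere in this paper, and the field names (form_id, name, pin, topic, useful) are invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical form: the script URL and every field name are illustrative only.
FORM_HTML = """\
<FORM METHOD="POST" ACTION="http://some.host/path/script.cgi">
<INPUT TYPE="hidden" NAME="form_id" VALUE="vala-paper">
Name: <INPUT TYPE="text" NAME="name" SIZE="30">
Password: <INPUT TYPE="password" NAME="pin" SIZE="8">
<INPUT TYPE="checkbox" NAME="topic" VALUE="forms"> Forms
<INPUT TYPE="checkbox" NAME="topic" VALUE="cgi"> CGI
<INPUT TYPE="radio" NAME="useful" VALUE="yes" CHECKED> Yes
<INPUT TYPE="radio" NAME="useful" VALUE="no"> No
<INPUT TYPE="submit" VALUE="Send">
<INPUT TYPE="reset" VALUE="Clear">
</FORM>
"""


class FieldLister(HTMLParser):
    """Collect the (type, name) pair of every INPUT tag in a page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag == "input":
            attributes = dict(attrs)
            self.fields.append((attributes.get("type"), attributes.get("name")))


def list_fields(page_text):
    """Return the (type, name) pairs of the INPUT fields found in page_text."""
    parser = FieldLister()
    parser.feed(page_text)
    return parser.fields


if __name__ == "__main__":
    for field_type, name in list_fields(FORM_HTML):
        print(field_type, name)
```

The hidden field carries a preset value identifying the form, and the two radio buttons share one name so that only one of them is returned. The submit and reset fields here have no name, so nothing is returned for them.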

URLs

Data to be submitted to documents via hypertext links is passed in the form of addenda to the URL for the script handling the data.

Three types of data can be passed.

ISINDEX : This does not need a form to be used. ISINDEX is used to pass a single string to a database and until recently was used as an interface to WAIS databases. The data is passed appended to the URL with a question mark as a delimiter, eg
http://some.host/path/script.cgi?my+data

Spaces are coded as '+' symbols and some other characters are also coded.

The interface to the Innopac system for instance passes data in this way.
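The coding of an ISINDEX style search string can be sketched in a few lines of Python using the standard urllib.parse module. This is only an illustration of the coding described above. The helper names isindex_url and isindex_query are made up for the example, and the URL is the placeholder used in the text.

```python
from urllib.parse import quote_plus, unquote_plus

SCRIPT_URL = "http://some.host/path/script.cgi"  # placeholder URL from the text


def isindex_url(script_url, search_string):
    """Append a single search string to a script URL after a '?'."""
    # quote_plus codes spaces as '+' and other special characters as %XX.
    return script_url + "?" + quote_plus(search_string)


def isindex_query(url):
    """Recover the original search string from such a URL."""
    _, _, query = url.partition("?")
    return unquote_plus(query)


if __name__ == "__main__":
    url = isindex_url(SCRIPT_URL, "my data")
    print(url)                 # http://some.host/path/script.cgi?my+data
    print(isindex_query(url))  # my data
```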

GET : The form equivalent of ISINDEX. The data from the form is appended to the URL for the script after a "?" character. While supported, this method of data passing is retained for backward compatibility and should be avoided.

POST : This is the method currently used by forms. The data from a form is sent to the script in the body of the request rather than appended to the URL. With both GET and POST the data is passed in pairs consisting of the name of the data field and the data itself. Each pair is separated by an ampersand and the name and data by an equal sign. The special characters =, & and % and any eight bit characters are coded as hexadecimal preceded by a percent sign. As with ISINDEX, spaces are coded by plus signs. Normally you do not have to worry about this as special routines exist to decode this input. A URL can also include another variable called a path variable. This cannot be set dynamically from a form and is useful for sending information which is specific to a form, such as identifying which form is calling the script if there are a variety of forms which use the same script.
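The coding of form data into name and data pairs can be tried out with Python's standard urllib.parse module, which is one example of the "special routines" that do the coding and decoding. The field names and values below are invented for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical field names and values, chosen to include '&', '=' and '%'.
FIELDS = [("name", "Tony Barry"), ("comment", "forms & CGI = 100%")]


def encode_fields(fields):
    """Join name/data pairs with '&' and '=', coding special characters."""
    return urlencode(fields)


def decode_fields(encoded):
    """Return a dict mapping each field name to the list of values sent."""
    return parse_qs(encoded)


if __name__ == "__main__":
    encoded = encode_fields(FIELDS)
    print(encoded)  # name=Tony+Barry&comment=forms+%26+CGI+%3D+100%25
    print(decode_fields(encoded))
```

Each name maps to a list because a form may return several values under one name, as happens when more than one checkbox in a group is selected.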

CGI scripts

Definition

The Common Gateway Interface is the standard which lets programs be interfaced to a web server. It was developed at NCSA, which provides extensive documentation on its use. In this way a web browser on any platform can be interfaced to a program on a machine running a web server. This allows for cross platform use of software and can provide a GUI front end to any program. Within limits such an interface can be hand tailored by end users to suit their purposes. Yahoo provides an extensive set of links to CGI resources.

Languages

CGI scripts are still programs and programming skills are still required to achieve sophisticated results. On the other hand there is a range of packages available, for instance written in perl, which simplify the writing of the more common form handling scripts. In theory a wide range of languages can be used. Typically on unix machines the languages are perl, C, tcl and a variety of shells. For the Mac there are interfaces for Applescript, perl and a number of more specialised languages, including direct interfaces to programs such as Filemaker Pro. On Windows typical languages are Visual Basic, C and perl.

In operation, a call to a cgi script first collects any parameters which might be attached to the URL, such as those coming from a form, and passes them to the script. The script then takes whatever action the author requires. Normally this would involve getting the server to send data back to the client, usually in html. As this is not the only type of file which can be returned, the header information sent from the server must also include information on the file type via a MIME tag.
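The sequence just described might look something like the following minimal Python sketch. It assumes the usual CGI conventions, under which the server passes the request details in the environment variables REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH and passes any request body on standard input. The function names are invented for the example and the page returned simply echoes the fields received.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Collect the submitted fields as a dict mapping names to value lists."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # The body length is given by the server. Form data is plain ASCII
        # once coded, so reading characters here is the same as reading bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # For GET (and ISINDEX) the data is whatever followed the '?'.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


def build_response(fields):
    """Return a header naming the MIME type, a blank line, then an html page."""
    rows = "".join(
        "<LI>{}: {}</LI>\n".format(html.escape(name), html.escape(", ".join(values)))
        for name, values in sorted(fields.items())
    )
    body = "<HTML><BODY>\n<UL>\n" + rows + "</UL>\n</BODY></HTML>\n"
    # The blank line separates the header from the document itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(read_form_data(os.environ, sys.stdin)))
```

Escaping the returned values with html.escape keeps characters such as "<" typed into a form from being interpreted as tags in the page sent back.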

Alternatives

There are alternatives to using cgi as "glue" between program and server. A number of database vendors, for instance, are building web servers into their products, enabling direct interrogation of them over the net via a web browser. The development of Java will also provide an alternative to cgi scripts and this is explored below.
Another alternative is "on the fly" conversion, where a specialised server converts documents to html or other formats when they are requested from the network but retains them in their original format, eg Microsoft Word. Verity Corporation in the US and Softlaw in Canberra are following this route.

Image maps

Image maps provide a graphical, rather than a textual, method of accessing hyperlinks. Rather than associating a hyperlink with a particular sequence of text it can be associated with an area on a displayed graphic. A possible use for this is in a diagram which associates further information with particular components via hyperlinks. An interesting distributed example of this is the Virtual World Tourist, which provides a distributed graphical index based on maps to many of the world's web sites. The Australian portion of this is maintained by David Green at Charles Sturt University.

While this is sometimes thought to be an example of interactivity it is no more so than a hypertext menu. On the other hand some interesting visual effects can be obtained. HTML 3 allows the processing of image maps at the browser level and this has been implemented in the current beta version of Netscape 2, so the use of this will probably become much more widespread. The introduction of Java is likely to create substantial new opportunities for interactive graphical techniques.
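The association of areas on a graphic with hyperlinks can be sketched as a simple lookup. The Python below assumes the server side arrangement in which the browser sends the coordinates of the click as "x,y" after the "?" in the URL. The rectangular regions and the pages they point to are entirely hypothetical.

```python
# Hypothetical regions: (left, top, right, bottom) in pixels, then the link.
REGIONS = [
    ((0, 0, 99, 49), "http://some.host/catalogue.html"),
    ((100, 0, 199, 49), "http://some.host/requests.html"),
]
DEFAULT_URL = "http://some.host/index.html"  # used when no region is hit


def parse_click(query_string):
    """Turn the 'x,y' text sent by the browser into a pair of integers."""
    x_text, y_text = query_string.split(",")
    return int(x_text), int(y_text)


def resolve_click(x, y, regions=REGIONS, default=DEFAULT_URL):
    """Return the link for the first region containing the point (x, y)."""
    for (left, top, right, bottom), url in regions:
        if left <= x <= right and top <= y <= bottom:
            return url
    return default


if __name__ == "__main__":
    print(resolve_click(*parse_click("120,30")))
```

A real map would normally also allow circles and polygons. Only rectangles are shown here to keep the lookup short.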

Dynamic documents - push, pull

Supported only by Netscape, this provides a simple form of dynamic document in which a succession of pages can be loaded automatically following on from the first. This could be used to -

Within the HTML 3 specification there is a tag, <META>, where information about the document can be stored (such as cataloguing information). In this tag you can specify a period in seconds after which the document should be refreshed, ie downloaded again and updated. This can be set to specify another document, and the second in turn can call a third and so on. As this is instigated by the client it is called "Client Pull".
The server can also force multiple loads by the client by telling the client to stay connected and sending multiple part documents. This is "Server Push". Examples of this can be found at the Home Pages server.
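Both techniques can be sketched in Python as text generated by a script. The first function builds the <META> tag used for Client Pull, following the Netscape convention of an HTTP-EQUIV value of "Refresh". The second yields the parts of a Server Push response using the multipart/x-mixed-replace MIME type. The function names, the boundary string and the page contents are assumptions made for the example, and a real script would write each part to the still open connection with a pause between them.

```python
def client_pull_tag(seconds, next_url=None):
    """Build a META tag asking the browser to reload after a delay.

    If next_url is given the browser loads that document instead of
    reloading the current one, so pages can be chained together.
    """
    content = str(seconds)
    if next_url:
        content += "; URL=" + next_url
    return '<META HTTP-EQUIV="Refresh" CONTENT="{}">'.format(content)


def server_push_stream(pages, boundary="PageBoundary"):
    """Yield the pieces of a multiple part response, one page per part."""
    yield "Content-Type: multipart/x-mixed-replace;boundary={}\n\n".format(boundary)
    for page in pages:
        # Each part has its own header; the browser replaces the previous
        # page with the new one as each part arrives.
        yield "--{}\nContent-Type: text/html\n\n{}\n".format(boundary, page)
    yield "--{}--\n".format(boundary)  # the closing boundary ends the stream


if __name__ == "__main__":
    print(client_pull_tag(5, "http://some.host/page2.html"))
    for chunk in server_push_stream(["<P>First page</P>", "<P>Second page</P>"]):
        print(chunk, end="")
```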

VRML

The Virtual Reality Modeling Language is used to describe three dimensional spaces and objects which can be accessed, displayed and explored via World Wide Web. Silicon Graphics has been very active in the development of the standard and it is an area of active development. In many ways VRML is like the web was two years ago, before the release of good viewers and editors for desktop platforms. Background on VRML can be obtained from the VRML FAQ and the specifications are available at vrml.wired.com.
The application of this technology in libraries could range from virtual tours of library buildings to the creation of new visual forms of catalogue where documents or subjects would float in a three dimensional space and searching would involve finding the correct neighbourhood and observing which items are nearby. In the short term the lack of people familiar with 3-D graphics will probably limit its impact other than in commercial ventures.

Java

Currently it is difficult to open a computer magazine, or quite often a business magazine, without seeing a mention of Sun's Java language and their HotJava web browser. Early in December last year Microsoft conceded that its technology would not be ready in time and licensed Java. Netscape intend to build it into their browser. The reason for this hype is clear, as the concepts embodied in Java are opening up a new computing paradigm.

Developed at Sun, the Java language provides a mechanism where the web can download small applications called applets, in the same way that the web downloads any file, which can be run on the local machine regardless of the operating system involved. Not only can these applets add animation but they can also be used to extend the capability of the browser so that it can handle, for instance, new data types or even new network protocols. This should greatly expand the capability of what can be performed across the network. For instance it opens up the possibility of obtaining software across the network as needed in small functional parts rather than as huge fully featured applications. It will also reduce the importance of the operating system.

Discussion

This technology is clearly going to have powerful implications for libraries, not just through new forms of documents which will need to be accessed or acquired but also through the interfaces which the library provides to its services. It seems certain that form based interfaces to databases of all kinds will rapidly become the standard way of delivering information to our clients and will replace telnet and proprietary interfaces. This needs to be considered alongside another standard which has been supported by libraries for many years - z39.50 - which finally seems to be reaching a stage of active use. While an early version of z39.50 is widely deployed in the form of WAIS, the current version is only just now reaching widespread deployment with support from LC, RLG and OCLC but, more importantly, with the endorsement of the US Government through the Government Information Locator Service (GILS), which mandates its use for Federal agencies.

Catalogues and databases

Already major library ILMS vendors and library database suppliers are providing web interfaces or z39.50 interfaces. The latter can be accessed via gateway servers and the National Library is considering mounting such a gateway. Access to z39.50 is also being built into Netscape clients, with the Windows client already in alpha test. Some examples are -

Use of forms for administration

Jim Robertson maintains a list of forms applications in libraries from which some of the examples below are drawn. The Yale Medical Library has made extensive use of forms.

An obvious application of forms technology is in taking users' requests, and ILL is an obvious place to start. The University of Idaho is one of many which take requests in this way. Closer to home the REDD project at the University of Queensland has created a whole document request, scanning and delivery system on PCs with innovative integration of forms technology.

Suggestion boxes have been implemented at LSU Libraries. Although the University of Waterloo's form for questions about service cannot be accessed off site, the way answers are displayed is quite interesting.

Examples of reference use include an online search request form at Benjamin Feinberg Library and an "Ask a Reference Librarian" form at the University of Alaska. Reference questions can also be asked at the University of Washington.

The University of Kansas Medical Centre has reserve request forms, and purchase requests are taken at the University of Washington.

Future of cataloguing and indexing

Many attempts are being made to integrate electronic documents into the traditional processes of the library. The National Library/CAUL IDA project is in part addressing that, as is the OCLC Internet Cataloguing Project. In parallel with these efforts there has been rapid development of automated web robots which scan the network and generate sophisticated keyword indexes, with a surprisingly good level of performance considering the lack of selectivity applied to the material indexed. Considering the relatively low cost of generating such indexes, the high cost of traditional cataloguing applied to the same material, even selectively, can be questioned. With Lycos already indexing 17 million items the future of such centralised services is itself open to question. Distributed services like Harvest show more promise, and this is being pursued in a joint project involving ANU, the National Library, ADFA, Charles Sturt University and the University of Queensland. This has not stopped major vendors like DEC entering this area with their new index engine Alta Vista.

Even for the cataloguing of print material the creation of centralised databases, like an Australian version of Ohiolink or the National Library's World1, is open to question where the network provides opportunities for distributed solutions. Such developments have been difficult in the past because of the need to deploy widely common software which will work on any platform, and many of these constraints may be eliminated by Java.

ILMS systems

It seems likely that these developments cannot leave ILMS systems unaffected. The established systems are all based on technology that reached its peak in the eighties, designed around a centralised database served by dumb terminals. These older systems do not use client server techniques, nor do they recognise the existence of the network, as many have found when trying to pass locational information concerning networked devices, or when trying to get systems designed to print to dedicated printers attached to terminals to print to networked printers instead. The rapid expansion of Intranets and the migration of workgroup software to the web cry out for the next generation of ILMS systems to be designed round this technology.

The development of other standards for the exchange of information across the Internet, such as EDI, and the boost to component software provided by Java suggest to me that the days of the monolithic ILMS system may be numbered and that a more modular approach may be available.

Conclusions - Interactivity and Libraries

The Internet and libraries are both tools to assist communication between those who have information that they wish to convey and those who wish to receive that information. The Internet, especially via World Wide Web, is opening up new media of communication which challenge the traditional role of libraries. The library community will have to decide to what extent they will support -
To ignore these is to be left behind and have other groups take up the running. New ways of delivering information will require new ways of organising information and providing access to it. These things will challenge both libraries and the profession of librarianship.

Tony Barry
7 Jan 1996

Feedback on the Paper

To give effect to some of the ideas in this paper a feedback form follows. This raises a particular problem with such documents, as they require a program at the server end. As this paper may be mounted by VALA it is not possible to guarantee that such a program could be used. Instead it uses a feature - a mailto URL - which is not supported by all browsers. Such are the compromises we must make at the present time. The form will use email to transmit the completed form from the browser.

Feedback Form
Name (Optional)
Email address (Optional)
Did you find the information in this paper useful?
(Select one)
Did you find the presentation -
(Select one)
Did you see the paper presented? Yes No
General comments

Clear entries on the form
Send off your comments


Further paper based reading
While anything in printed form in such a fast moving field will always be out of date a large number of books have been appearing on web publishing and html. Most of these are fairly elementary. Current material can always be found on the net at the W3 consortium site as well as there being useful collections of links at the Yahoo server and at Sun for Java developments.

Three very recent books do give quite good coverage.

Tittel, Ed and Mark Gaither, Mecklermedia Official Internet World 60 minute guide to Java, Foster City, IDG Books Worldwide, 1995, ISBN 1-56884-711-4

Lemay, Laura, Teach yourself more web publishing with HTML in a week, Indianapolis, Sams.net, 1995, ISBN 1-57521-005-3

Tittel, Ed, Foundations of World Wide Web programming with HTML & CGI, Foster City, IDG Books Worldwide, 1995, ISBN 1-56884-703-3