Libraries, the Web, Interactive Forms and CGI Scripts
Tony Barry
Centre for Networked
Access to Scholarly Information,
Australian National University
Library
<URL:http://snazzy.anu.edu.au/People/TonyB.html>
Tony.Barry@anu.edu.au
Paper presented to the 1996 VALA Biennial Conference and Exhibition, "Electronic Dream? Virtual Nightmare? The Reality for Libraries", Melbourne, 30 Jan
to 1 Feb. 1996
This paper is in two versions: an RTF version for printing and an online version. In the RTF version the hypertext links appear as footnotes and the interactive
form is a screen image.
Introduction
The web is now achieving a central role in libraries, as a perusal of Barbara Stewart's
Top 200 Technical Services Benefits of home page development shows, as does Acqweb.
This paper assumes some knowledge of html in general, as a system of tagging text
files for a browser to render the meaning of those tags, although much of it is understandable
without such knowledge.
The theme of the paper is that we continue to be in an environment of rapid change
which will redefine the role of publishers and libraries. The last few years have
seen change centred on network delivery, the integration of images and other media
into documents and the hypertextual revolution which is leading to new forms of cooperative
information delivery. The near future will see a shift of attention to interactivity
and dynamism in documents.
Change continuing
The capabilities described here are merely a "snapshot" of the state of play now, with
some speculation on likely developments. In such a fast moving area, prediction more
than a few months ahead is risky. At the VALA conference in 1991 I speculated that
the Internet was likely to be an important adjunct to communication in research and
should be of valid interest to libraries. I did not expect it to become as important
as it has in so many areas, and I certainly did not expect the change to take
place so quickly. The 1993 VALA was held the week the beta version of Mosaic was released
for the Macintosh and Windows platforms. At that conference the activity in the Internet
room involving Mosaic upstaged the vendor demonstrations. In my presentation to the
conference I hypothesised that WWW would quickly replace gopher. Again the change
was far faster than I expected. We can only expect the unexpected, and the suggestions
I make in this paper should be taken in that light.
New kind of documents - dynamic
Initially network publishing via World Wide Web introduced three revolutionary changes
-
- Delivery became integrated into the publishing mechanism
- Hypertext enabled documents to be linked so that the citation structure of
the net became integrated into publishing.
- A substantial reduction in the price of publishing provided the mechanism for a range
of material to be published that would not have been practicable in the past.
These capabilities challenged our concepts of what a document is, what constitutes
publishing and what constitutes a library. An example of such a document is Doug
Davis's paper Library Automation on the WWW.
New capabilities now face us and through them opportunities. Interactive forms, interfaces
between documents and databases and communication abilities are now providing the
capability to generate documents -
- With in-built assistance to interact with the author.
- Which can interface to searchable databases and which may allow users to modify this
data.
- Which let the reader manipulate models of what is described so as to be able to achieve
better understanding of the content.
Standards & "Netscapisms"
Standards have been central to web development. While much of the debate about standards
has been how they are broken by one vendor or another, or not finalised, the existence
of core common standards such as http, MIME, HTML 1 and CGI has enabled developers to rapidly produce products which could interoperate. Where the standards have
been "broken", notably by Netscape in its browser, it has been by the addition of
non standard capability which did not prevent interoperation with other products.
This has been key to continued development. Another major factor has been the use of an increasingly
common marketing model on the Internet, of providing a free, or shareware base version,
of software to individuals and educational institutions.
Let us now shift to looking at the details of some of this technology.
Interactive Forms
The purpose of interactive forms in html is to provide techniques that will -
- Display a form which can be filled in by the person reading it.
- Submit the form to a program which can analyse it.
- Use the content of the form to take some action.
- Return a page to the user dependent on the content of the form.
The form might be used to -
- Submit information to be manually processed, eg a suggestions form.
- Pass data to a program which may analyse it in some way and return it to the reader,
eg a calculator.
- Use the data to interrogate a database and return the result of the search, eg an
interface to a library catalogue.
- Provide an interface to update a database system, eg an end user maintained phone
list.
This ability to provide a graphical interface to an arbitrary program opens up a
degree of platform independence for end users. A web form completed on a web browser
on a machine running one operating system can be submitted to a program on a second
machine, running a different operating system, where the web server resides. The
form, the server and the CGI mechanism provide a way to deliver software specific
to one operating system to any other.
Browsers that don't support forms
Older browsers do not support forms. While their numbers are declining they are still
significant, though they should rapidly cease to be so. Within institutions the deployment
of more recent versions of software is a difficult support problem as older hardware
will have insufficient capability to run it. An alternative, where a suitable unix
host is available, is to provide lynx, a character based WWW client
which can be accessed via telnet and therefore is available even to dumb terminals.
The loss of graphics through such access will, however, make some servers virtually
unusable. Another alternative is to provide a text based form which can be
printed and mailed, or copied and emailed.
Forms tags
A detailed discussion of the tags used to define forms is beyond the scope of this
brief paper, which will concentrate on the capabilities of these forms. However, a brief description is appropriate.
Within an html document a section containing a form is identified by a tag "<FORM>"
which indicates, by a URL, the location of the script which will
process the submitted form.
A form may be made up of a number of input fields which may be of a number of types -
Text
: Such a field will take in typeable characters to pass to the server. A fixed length
field may be specified or a window.
Password
: Where typing will be hidden by the browser.
Checkbox
: Which will display a checkbox for each of a number of alternatives, any of which can be selected.
Radio
: Which will display a radio button for each of a number of mutually exclusive alternatives,
only one of which can be selected.
Submit
: A special field which indicates that entry is complete.
Reset
: Which clears all the fields in the form back to their starting values.
Associated with each input field there is -
- A name which is passed back to the processing program
- Optionally, a default value which can be overwritten when the form is filled in
- A limitation on the size of the field (which may be of multiple lines).
Forms can also include scrollable lists which allow you to present multiple, fixed options to the user.
You can choose to display such a list as a pop-up or pull-down menu, in which case the
user can select only one item, or as a scrolling list, where the user can select several items, not necessarily contiguous.
Fields can also be hidden and used to pass back preset default values which can be
set by the writer of the form to identify which version (or location) of a number
of similar forms is returning the data received by a script.
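The field types described above can be seen together in a small example. The following is a minimal sketch, in Python, of a function that writes out the html for a form; the script URL, the field names and the function name `build_form` are all hypothetical and chosen only for illustration.

```python
from html import escape


def build_form(action_url, form_id):
    """Return the html for a small request form as a single string.

    action_url and form_id are illustrative values supplied by the caller.
    """
    action = escape(action_url, quote=True)
    ident = escape(form_id, quote=True)
    lines = [
        f'<FORM METHOD="POST" ACTION="{action}">',
        # Hidden field: a preset value telling the script which form this is.
        f'<INPUT TYPE="hidden" NAME="form_id" VALUE="{ident}">',
        # Text field with a fixed display width.
        '<P>Name: <INPUT TYPE="text" NAME="name" SIZE="30">',
        # Password field: the browser hides what is typed.
        '<P>PIN: <INPUT TYPE="password" NAME="pin" SIZE="8">',
        # Checkboxes: any number of the alternatives may be selected.
        '<P><INPUT TYPE="checkbox" NAME="format" VALUE="print"> Print',
        '<INPUT TYPE="checkbox" NAME="format" VALUE="online"> Online',
        # Radio buttons sharing one NAME are mutually exclusive.
        '<P><INPUT TYPE="radio" NAME="urgency" VALUE="normal" CHECKED> Normal',
        '<INPUT TYPE="radio" NAME="urgency" VALUE="rush"> Rush',
        # Pull-down menu: one choice only.
        '<P><SELECT NAME="branch">',
        '<OPTION>Main</OPTION>',
        '<OPTION>Science</OPTION>',
        '</SELECT>',
        # Multi-line text window.
        '<P><TEXTAREA NAME="comments" ROWS="4" COLS="40"></TEXTAREA>',
        # Submit sends the form; Reset restores the starting values.
        '<P><INPUT TYPE="submit" VALUE="Send"> <INPUT TYPE="reset" VALUE="Clear">',
        '</FORM>',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_form("/cgi-bin/request.cgi", "ill-v1"))
```

Saving the printed text inside an html page and opening it in a forms-capable browser should display each kind of field.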
The following diagram illustrates the use of these various elements.
URLs
Data to be submitted to documents via hypertext links is passed in the form of addenda
to the URL for the script handling the data.
Three types of data can be passed.
ISINDEX
: This does not need a form to be used. ISINDEX is used to pass a single string to
a database and until recently was used as an interface to WAIS databases.
The data is passed appended to the URL with a question mark as a delimiter, eg
http://some.host/path/script.cgi?my+data
Spaces are coded as '+' symbols and some other characters are also coded.
The interface to the Innopac system, for instance, passes data in this way.
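The coding of an ISINDEX query can be illustrated with a short sketch. It uses Python's standard `urllib.parse` routines; the function name `isindex_url` is hypothetical, and the host is the placeholder used in the example above.

```python
from urllib.parse import quote_plus, unquote_plus


def isindex_url(script_url, query):
    """Append a single search string to a script URL, ISINDEX style."""
    # quote_plus turns spaces into '+' and codes other special characters.
    return script_url + "?" + quote_plus(query)


def isindex_query(url):
    """Recover the original search string from such a URL."""
    return unquote_plus(url.split("?", 1)[1])


if __name__ == "__main__":
    url = isindex_url("http://some.host/path/script.cgi", "my data")
    print(url)                 # http://some.host/path/script.cgi?my+data
    print(isindex_query(url))  # my data
```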
GET
: The form equivalent of ISINDEX. Data from the form is passed appended to
the URL for the script after a "?" character. While supported, this method of data
passing is retained for backward compatibility and should be avoided.
POST
: This is the method currently used by forms. Data from the form is sent to the
script in the body of the request rather than as part of the URL.
With both methods the data is passed in pairs consisting
of the name of the data field and the data itself. Each pair is separated by an
ampersand and the name and data by an equals sign. The special characters =, & and % and
any eight bit characters are coded as hexadecimal preceded by a percent sign. As with
ISINDEX, spaces are coded by plus signs. Normally you do not have to worry about
this as special routines exist to decode this input. A URL can also include another variable
called a path variable. This cannot be set dynamically from a form and is useful
for sending information which is specific to a form, such as identifying which form
is calling the script if there are a variety of forms which use the same script.
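The coding of name and data pairs, and the "special routines" which decode them, can be sketched with Python's standard library. The field names and values below are invented for illustration.

```python
from urllib.parse import parse_qsl, urlencode


def encode_form(fields):
    """Code a list of (name, data) pairs as name=data&name=data..."""
    return urlencode(fields)


def decode_form(encoded):
    """Decode the coded string back into a list of (name, data) pairs."""
    return parse_qsl(encoded, keep_blank_values=True)


if __name__ == "__main__":
    fields = [("title", "Cats & dogs"), ("discount", "50%"), ("note", "a=b")]
    encoded = encode_form(fields)
    print(encoded)  # title=Cats+%26+dogs&discount=50%25&note=a%3Db
    print(decode_form(encoded) == fields)  # True
```

The same coded string is what a script receives whether the browser appended it to the URL or sent it separately.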
CGI scripts
Definition
The Common Gateway Interface is the standard which lets programs be interfaced to a web server. It was developed
at NCSA, which provides extensive documentation on its use. In this way a web browser on any platform can be interfaced to a program on a machine
running a web server. This allows for cross platform use of software and can provide a GUI front end to any program. Within limits such an interface
can be hand tailored by end users to suit their purposes. Yahoo provides an extensive
set of links to CGI resources.
Languages
CGI scripts are still programs and programming skills are still required to achieve
sophisticated results. On the other hand there is a range of packages available,
for instance written in perl, which simplify the writing of the more common form handling scripts. In theory
a wide range of languages can be used. Typically on unix machines the languages
are perl, C, tcl and a variety of shells. For the Mac there are interfaces for Applescript, perl and a number of more specialised languages including direct interfaces to programs
such as Filemaker Pro. On Windows typical languages are Visual Basic, C and perl.
In operation, a call to a cgi script first collects any parameters which might be
attached to the URL, such as those coming from a form, and passes them to the script.
The script then takes whatever action the author requires. Normally this would
involve getting the server to send data back to the client, usually in html. As this is
not the only type of file which can be returned, the header information sent from
the server must also include information on the file type via a MIME type.
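The sequence just described can be sketched as a very small CGI script in Python. This is an illustration under stated assumptions rather than a production script: the field name `name` and the function names are hypothetical, while `REQUEST_METHOD`, `QUERY_STRING`, `CONTENT_LENGTH` and `PATH_INFO` are the environment variables through which a CGI server passes the request to the script.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def read_form(environ, body=""):
    """Collect the submitted fields as a dict of name -> list of values."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        raw = body                             # data sent in the request body
    else:
        raw = environ.get("QUERY_STRING", "")  # data appended to the URL
    return parse_qs(raw)


def respond(environ, body=""):
    """Return the full reply: a MIME type header, a blank line, then html."""
    form = read_form(environ, body)
    name = form.get("name", ["stranger"])[0]
    which_form = environ.get("PATH_INFO", "")  # the "path variable", if any
    page = (
        "<HTML><BODY>"
        f"<P>Hello, {escape(name)}.</P>"
        f"<P>Form: {escape(which_form)}</P>"
        "</BODY></HTML>"
    )
    return "Content-Type: text/html\n\n" + page


if __name__ == "__main__":
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    posted = sys.stdin.read(length) if length else ""
    sys.stdout.write(respond(os.environ, posted))
```

Escaping the submitted value before placing it in the page keeps characters such as "<" and "&" from being read as html.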
Alternatives
There are alternatives to using cgi as "glue" between program and server. A number
of database vendors for instance are building web servers into their products enabling
direct interrogation of them over the net via a web browser. The development of Java
also will provide an alternative to cgi scripts and this is explored below.
Another alternative is "on the fly" conversion, where a specialised server is used
to convert held documents to html or other formats when they are requested from the network
while retaining them in their original format, eg Microsoft Word. Verity Corporation
in the US and Softlaw in Canberra are following this route.
Image maps
Image maps provide a graphical, rather than a textual, method of
accessing hyperlinks. Rather than associating a hyperlink with a particular sequence
of text it can be associated with an area on a displayed graphic. A possible use
for this is in a diagram which associates further information with particular components
via hyperlinks. An interesting distributed example of this is the Virtual World Tourist, which provides a distributed graphical index based on maps to many of the world's
web sites. The Australian portion of this is maintained by David Green at Charles Sturt University.
While this is sometimes thought to be an example of interactivity it is no more so
than a hypertext menu. On the other hand some interesting visual effects can be
obtained. HTML 3 allows the processing of image maps at the browser level and this
has been implemented in the current beta version of Netscape 2, so the use of this will probably
become much more widespread. The introduction of Java is likely to create substantial
new opportunities for interactive graphical techniques.
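Where the image map is processed at the server, the browser sends the coordinates of the click appended to the URL as "x,y" and a program decides which hyperlink applies. A minimal sketch of that lookup follows; the rectangular regions, the target URLs and the function name `resolve_click` are hypothetical.

```python
# Each region is (left, top, right, bottom, url), in pixels from the
# top left corner of the image. The values are invented for illustration.
REGIONS = [
    (0, 0, 99, 49, "/catalogue.html"),
    (100, 0, 199, 49, "/loans.html"),
    (0, 50, 199, 99, "/branches.html"),
]


def resolve_click(query, regions=REGIONS, default_url="/index.html"):
    """Map a click such as '120,30' to the URL of the region it falls in."""
    try:
        x_text, y_text = query.split(",")
        x, y = int(x_text), int(y_text)
    except ValueError:
        return default_url  # malformed or missing coordinates
    for left, top, right, bottom, url in regions:
        if left <= x <= right and top <= y <= bottom:
            return url
    return default_url      # click landed outside every region


if __name__ == "__main__":
    print(resolve_click("120,30"))  # /loans.html
```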
Dynamic documents - push, pull
Supported only by Netscape, this provides a simple form of dynamic document in which a succession of pages can
be loaded automatically following on from the first. This could be used to -
- Give simple animation by sending a succession of images.
- Run a slide presentation
- Update information from a dynamic source like a news service
Within the HTML 3 specification there is a tag, <META>, where information about the document can be stored (such
as cataloguing information). In this tag you can specify a period in seconds after
which the document should be refreshed, ie downloaded again and updated. The refresh can
be set to load another document, and the second in turn can call a third and so on. As
this is instigated by the client it is called "Client Pull".
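A chain of pages linked by client pull can be sketched as follows. The sketch writes the html for a simple slide presentation in which each page carries a <META> refresh naming the next; the file names, titles and function names are hypothetical.

```python
def pull_page(title, body_html, next_url=None, delay_seconds=5):
    """Return one html page; if next_url is given the browser is asked
    to load it after delay_seconds."""
    head = f"<TITLE>{title}</TITLE>"
    if next_url is not None:
        head += (
            f'\n<META HTTP-EQUIV="Refresh" '
            f'CONTENT="{delay_seconds}; URL={next_url}">'
        )
    return f"<HTML><HEAD>\n{head}\n</HEAD>\n<BODY>\n{body_html}\n</BODY></HTML>"


def slide_show(slides, delay_seconds=5):
    """slides is a list of (filename, title, body_html).
    Returns a dict of filename -> html, each page pulling the next."""
    pages = {}
    for i, (filename, title, body) in enumerate(slides):
        next_url = slides[i + 1][0] if i + 1 < len(slides) else None
        pages[filename] = pull_page(title, body, next_url, delay_seconds)
    return pages


if __name__ == "__main__":
    show = slide_show([
        ("one.html", "Slide 1", "<H1>Forms</H1>"),
        ("two.html", "Slide 2", "<H1>CGI</H1>"),
    ])
    print(show["one.html"])
```

The last page carries no refresh, so the presentation stops there.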
The server can also force multiple loads by the client by telling the client to stay
connected and sending multiple part documents. This is "Server Push". Examples of
this can be found at the Home Pages server.
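The shape of a server push reply can be sketched as follows. The server announces a multiple part document of type multipart/x-mixed-replace, and each part it then sends replaces the one before in the browser window. The boundary string and the function name `server_push` are arbitrary choices for this illustration.

```python
def server_push(parts, boundary="PartBoundary", content_type="text/html"):
    """Yield, in order, the pieces of a multipart/x-mixed-replace reply.

    A real script would write each piece to the client as it is produced,
    pausing between parts, rather than collecting them all first.
    """
    yield f"Content-Type: multipart/x-mixed-replace;boundary={boundary}\n\n"
    for part in parts:
        # Each part has its own type header, a blank line, then the data.
        yield f"--{boundary}\nContent-Type: {content_type}\n\n{part}\n"
    # A boundary with trailing dashes tells the browser no more parts follow.
    yield f"--{boundary}--\n"


if __name__ == "__main__":
    for piece in server_push(["<P>First headline</P>", "<P>Second headline</P>"]):
        print(piece, end="")
```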
VRML
The Virtual Reality Markup Language is used to describe three dimensional spaces and
objects which can be accessed, displayed and explored via World Wide Web. Silicon
Graphics has been very active in the development of the standard and it is an area
of active development. In many ways VRML is like the web was two years ago, before the
release of good viewers and editors for desktop platforms. Background on VRML can
be obtained from the VRML FAQ and the specifications are available at vrml.wired.com.
The application of this technology in libraries could range from virtual tours
of library buildings to the creation of new visual forms of catalogue where documents
or subjects would float in a three dimensional space and searching would involve
finding the correct neighbourhood and observing which items are nearby. In the short term
the lack of people familiar with 3-D graphics will probably limit its impact other
than in commercial ventures.
Java
Currently it is difficult to open a computer magazine, or quite often business magazines,
without seeing a mention of Sun's Java language and their Hotjava web browser. Early
in December last year Microsoft conceded that its technology would not be ready in time and licensed Java. Netscape intend to build it into their browser. The reason
for this hype is clear as the concepts embodied in Java are opening up a new computing
paradigm.
Developed at Sun, the Java language provides a mechanism where the web can download small applications
called applets, in the same way that the web downloads any file, which can be run
on the local machine regardless of the operating system involved.
Not only can these applets add animation but they can also be used to extend the
capability of the browser so that it can handle, for instance, new data types or
even new network protocols. This should greatly expand the capability of what can
be performed across the network. For instance it opens up the possibility of obtaining software
across the network as needed in small functional parts rather than as huge fully featured
applications. It will also reduce the importance of the operating system.
Discussion
This technology is clearly going to have powerful implications for libraries, not
just through new forms of documents which will need to be accessed or acquired but
also through interfaces which the library provides to its services. It seems certain
that form based interfaces to databases of all kinds will rapidly become the standard way
of delivering information to our clients and will replace telnet and proprietary interfaces.
This needs to be considered with another standard which has been supported by libraries for many years - z39.50 - which finally seems to be reaching a stage of active use. While an early version
of z39.50 is widely deployed in the form of WAIS, the current version is only just
now reaching widespread deployment, with support from LC, RLG and OCLC but more importantly with the endorsement of the US Government through the Government Information Locator
Service (GILS), which mandates its use for Federal agencies.
Catalogues and databases
Already major library ILMS vendors and library database suppliers are providing web
interfaces or z39.50 interfaces. The latter can be accessed via gateway servers and
the National Library is considering mounting such a gateway. Access to z39.50 is
also being built into Netscape clients, with the Windows client already in alpha test. Some examples are -
Use of forms for administration
Jim Robertson maintains a list of forms applications in libraries from which some of the examples below are drawn. The Yale Medical Library has made extensive use of forms.
An obvious application of forms technology is in taking users' requests, and ILL is
an obvious place to start. The University of Idaho is one of many which take requests in this way. Closer to home the REDD project at the University of Queensland has created a whole document request, scanning and
delivery system on PCs with innovative integration of forms technology.
Suggestion boxes have been implemented at LSU Libraries,
and although the University of Waterloo's form for questions about service cannot
be accessed off site, the way answers are displayed is quite interesting.
Examples of reference use via online search request forms are at Benjamin Feinberg Library, and the University of Alaska has an "Ask a Reference Librarian" form. Reference questions can also be asked at the University of Washington.
The University of Kansas Medical Centre has reserve request forms, and purchase requests
are taken at the University of Washington.
Future of cataloguing and indexing
Many attempts are being made to look at integrating electronic documents
into the traditional processes of the library; the National Library/CAUL
IDA project is in part addressing that, as is the OCLC Internet cataloguing Project. In parallel with these efforts there has been a rapid development of automated web
robots which scan the network, generating sophisticated keyword indexes with
a surprisingly good level of performance considering the lack of selectivity applied
to the material indexed. Considering the relatively low cost of
generating such indexes, the high cost of traditional cataloguing applied to the
same material, even selectively, can be questioned. With Lycos already indexing 17 million items, the future of such centralised services is
open to question. Distributed services like Harvest show more promise, and this is being pursued in a joint project involving ANU, the
National Library, ADFA, Charles Sturt University and University of Queensland. This
has not stopped major vendors like DEC entering this area with their new index engine
Alta Vista.
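The low cost of keyword indexing can be seen from how little is needed for its simplest form. The following sketch builds an index from each word to the pages containing it and answers a query by intersecting those sets. It is an illustration only and makes no claim about how Lycos, Harvest or Alta Vista work; the page addresses and text are invented.

```python
import re
from collections import defaultdict


def words_of(text):
    """Split text into lower case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages is a dict of url -> text. Returns word -> set of urls."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in words_of(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the urls of pages containing every word in the query."""
    words = words_of(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    pages = {
        "http://a.example/": "Library catalogue search forms",
        "http://b.example/": "Interlibrary loan request forms",
    }
    print(sorted(search(build_index(pages), "forms request")))
```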
Even for the cataloguing of print material, the creation of centralised databases like
an Australian version of Ohiolink or the National Library's World1 is open to question
where the network provides opportunities for distributed solutions. Such developments
have been difficult in the past because of the need to deploy widely common software
which will work on any platform; many of these constraints may be eliminated
by Java.
ILMS systems
It seems likely that these developments cannot leave ILMS systems unaffected. The established
systems are all based on technology that reached its peak in the eighties, designed
around a centralised database served by dumb terminals. These older systems do not use client server techniques, nor do they recognise the existence of the network,
as many have found when trying to pass locational information concerning networked
devices or when trying to get systems designed to print to dedicated printers
attached to terminals to print to networked printers. The rapid expansion of Intranets and
the migration of workgroup software to the web cry out for the next generation
of ILMS systems to be designed round this technology.
The development of other standards for the exchange of information across the Internet, such
as EDI, and the boost to component software provided by Java suggest to me that the
days of the monolithic ILMS system may be numbered and a more modular approach may
be available.
Conclusions - Interactivity and Libraries
The Internet and libraries are both tools to assist communication between those who
have information that they wish to convey and those that wish to receive that information.
The Internet, especially via World Wide Web, is opening up new media of communication which challenge the traditional role of libraries. The library community will
have to decide to what extent it will support -
- The use of and access to email based electronic conferences
- Web based BBS and other group work systems
- Control and archiving of dynamic documents
- Access not just to documents but to authors (mailto URLs in the OPAC, for instance).
To ignore these is to be left behind and have other groups take up the running. New
ways of delivering information will require new ways of organising information and
providing access to it. These things will challenge both libraries and the profession
of librarianship.
Tony Barry
7 Jan 1996
Feedback on the Paper
To give effect to some of the ideas in this paper a feedback form follows. This raises
a particular problem with such documents as they require a program at the server
end. As this paper may be mounted by VALA it is not possible to guarantee that such
a program could be used. Instead it uses a feature - a mailto URL - which is not supported
by all browsers. Such are the compromises we must make at the present time. The form
will use email to transmit the completed form from the browser.
Further paper based reading
While anything in printed form in such a fast moving field will always be out of date
a large number of books have been appearing on web publishing and html. Most of
these are fairly elementary. Current material can always be found on the net at
the W3 consortium site as well as there being useful collections of links at the Yahoo server and at Sun for Java developments.
Three very recent books do give quite good coverage.
Tittel, Ed and Mark Gaither, Mecklermedia Official Internet World 60 minute guide
to Java, Foster City, IDG Books Worldwide, 1995, ISBN 1-56884-711-4
Lemay, Laura, Teach yourself more web publishing with HTML in a week, Indianapolis,
Sams.net, 1995, ISBN 1-57521-005-3
Tittel, Ed, Foundations of World wide web programming with HTML & CGI, Foster
City, IDG Books Worldwide, 1995, ISBN 1-56884-703-3