DOCUMENT DELIVERY
Network Publishing on the Internet in Australia
Head, Centre for Networked Access to Scholarly Information
Australian National University
Current discussions of document delivery technology and
options operate under print-based assumptions: that there are
physical documents to copy and that these will be delivered from a
centralised service. As electronic publishing using the World Wide Web
and its successor technologies becomes more prevalent, the model is
likely to be very different. In this model documents or components
of documents will be mounted on servers which may also be the author's
work station. These documents can be accessed remotely over a network
and the contents viewed on local machines and then manipulated.
This paper addresses the effect electronic publishing
via networks is likely to have on the library profession and libraries.
While other forms of electronic publishing, particularly CD-ROMs,
are being heralded as the great growth industry of the future in the
Prime Minister's Cultural Statement, they, like books, are artefacts.
Their organisation and impact are unlikely to be as great as that
flowing through from network developments.
We are leaving the period when communication was dominated
by paper and moving to one which is electronic. In the dissemination
of information we also appear to be at a watershed where the dominance
of large central organisations, delivering information to relatively
passive recipients, is being challenged by a new information model
driven by the economics of silicon-based products, where individuals
and small groups are empowered to generate information services that
formerly were the domain of larger bodies.
Supporting structures
In the present climate of environmental awareness it is surprising
that print on paper as a communication mechanism does not attract
more criticism. A communication technology based on the wood chip
mill and the effluent of paper making factories is environmentally
flawed. Other than grumbles about packaging, newspapers that are too
large and the need for paper recycling, few suggest that the whole
concept of print on paper as a communication medium should be replaced.
A technology has now arisen which potentially can do so for a wide
range of print products.
Few question the efficiency of print, yet the huge infrastructure required
to make print work is all around us. Booksellers and libraries exist
as institutions in the form that they do because books are artefacts
and the bulk of the work of booksellers, librarians and publishers
derives from the physical form of the communication medium. These
three groups largely exist to eliminate the deficiencies in print-based
communication and fill in the functions that the technology does not
perform.
The discussion that follows does not address CD-ROMs, as these are
static artefacts and do not introduce the range of new issues that
networks do. While they are electronic in nature, they can be regarded
as roadfill on the information superhighway, being useful for static
material and remote areas lacking network connectivity, and fulfilling
the same type of functions that floppy discs do now.
Networked communication also has its deficiencies but they are quite
different to those of print. The supporting professions and industries
required to make this form of communication work are likely to be
quite different to those required for print on paper.
What is the nature of networked publication?
The capabilities that exemplify the challenge provided by networked
publishing are those delivered by gopher and World Wide Web technology.
They are `best practice' as far as present electronic publishing is
concerned. They offer multimedia, are global in scope and have an
ability to link information across multiple machines. The radical
differences manifest themselves in many ways.
Distribution
Distribution mechanisms are built into the technology. In many ways
such publishing is more like a community notice board or a library
reserve collection as only one copy is needed which everyone can see
and copy. The act of publishing has effectively placed the document
directly on the shelves of the network-wide library.
Convergence of function
In effect, the warehouse of the publisher, the stock of the bookseller,
the shelves of the library and even the manuscripts of the author,
become the same -- the document available on the network.
Dynamic nature
We are so used to print documents as static, it is difficult to consider
a situation where this constraint of print is removed. Many of our
procedures for producing a publication are based upon the achievement
of quality and finality of content because the version once printed
can no longer be improved.
Volatility
We use databases that are modified on a continuous basis such as library
catalogues. This capability can now be extended to any document that
needs frequent updates such as encyclopedias, loose-leaf services
etc. But this also extends to textbook material that can be continuously
updated rather than being produced in new editions with further print
runs. We need to change our viewpoint to one of saying that everything
should be continuously updated rather than thinking of only making
changes by the creation of a new document. We need good reasons for
material to be maintained in a dated form.
Librarians are therefore faced with providing control over a document
that is dynamic and whose content can change over time. For instance
an author may find that early conclusions on a subject were incorrect
and reverse them. The implication of this is that the stable world
in which an item can be catalogued once and that cataloguing shared
via bibliographical utilities is gone. `Catalogue' entries will need
to be checked against the original document from time to time to ensure
that they are still accurate and the concept of an `edition' will
become unstable.
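The rechecking of a `catalogue' entry against a changing document can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the names CatalogueEntry, fingerprint and needs_recataloguing, and the sample location are all hypothetical, and a digest comparison is one possible way of detecting change, not a practice described in this paper.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CatalogueEntry:
    """A hypothetical catalogue record that remembers what it described."""
    title: str
    location: str
    fingerprint: str  # digest of the document text at cataloguing time


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_recataloguing(entry: CatalogueEntry, current_text: str) -> bool:
    """True when the document no longer matches the version catalogued."""
    return fingerprint(current_text) != entry.fingerprint


# The author first publishes one conclusion, then reverses it.
original = "Early results suggest the method is reliable."
revised = "Later results show the method is not reliable."

entry = CatalogueEntry("Method notes", "server-a/notes.html", fingerprint(original))

unchanged_flag = needs_recataloguing(entry, original)
changed_flag = needs_recataloguing(entry, revised)
```

A digest only signals that something changed; deciding whether the change affects the description would still need a person to look at the document.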
Link to databases and models
Print documents are passive. Not only do hypertext documents let you
switch from place to place instead of following the linear sequence
of a book; they can also link to dynamic data constructs such as maps
with hot spots or interactive documents such as `live' models where
the reader can enter their own test data. This is in addition to the
usual unprintable media types.
Citing
The growth of knowledge and scholarship is based upon the acknowledgment
of the work of others by being able to cite that work. This allows
the reader to verify that the work has been used appropriately by
using the citation to locate a copy of the source work, usually in
a library, and verify its contents. In a dynamic situation the document
may have moved or changed. However, if a hypertext link to the document
is made instead of a citation then the actual text becomes available
and this bypasses the library as the intermediary supplier.
Archiving
While most publishers will keep some back stocks of their output and
archival copies, libraries have generally taken on the role of providing
long-term central storage of publications and conservation has been
a central concern. The long-term storage of electronic material is
more complex. Across the network, mirroring arrangements of copies
at remote sites are being established, primarily to reduce
access time and network traffic, but also to provide security for the data
mirrored. Almost exclusively this is taking place outside the formal
library system that, while expressing concern about the problem, has
taken little action to solve it. This however is consistent with the
approach taken by all but a handful of institutions towards the long-term
health of acid-based paper.
Caching technology
Caching technology, driven by the need to preserve network bandwidth,
is rapidly developing. Rather than electronic documents being collected on the
basis of individual selection decisions, copies retrieved over the network
are automatically held locally in a cache server, which is the automatic
first port of call for the reading software used by individuals when a remote
item is required. The local cache is checked first and a copy delivered
from there if available. If not, the copy is retrieved from the remote
location, delivered to the user and held in the cache for the next
inquiry. Electronic collection development in a sense becomes a by-product
of the network engineering.
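The cache-first behaviour described above can be sketched as follows. This is a minimal illustration, not the design of any particular cache server: the class CacheServer, the function fake_remote and the sample location are hypothetical, and the remote fetch is simulated so that the example runs without a network.

```python
from typing import Callable, Dict


class CacheServer:
    """A hypothetical local cache that sits in front of remote servers."""

    def __init__(self, fetch_remote: Callable[[str], str]) -> None:
        self._fetch_remote = fetch_remote
        self._store: Dict[str, str] = {}
        self.remote_fetches = 0  # counts trips over the wider network

    def get(self, location: str) -> str:
        # The local cache is checked first.
        if location in self._store:
            return self._store[location]
        # Otherwise retrieve from the remote location, deliver the copy
        # and hold it for the next inquiry.
        document = self._fetch_remote(location)
        self.remote_fetches += 1
        self._store[location] = document
        return document


def fake_remote(location: str) -> str:
    """Stand-in for a real network retrieval (an assumption of this sketch)."""
    return f"contents of {location}"


cache = CacheServer(fake_remote)
first = cache.get("remote-host/paper.html")   # fetched remotely, then cached
second = cache.get("remote-host/paper.html")  # served from the local cache
```

The sketch holds every retrieved copy indefinitely. A real cache would also need rules for discarding copies and for noticing that the remote original has changed.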
Lack of bounds
In a static print medium the concept of a document is well defined.
It has a physical form and boundary. We are a little more troubled
by journals but we accept a continuously expanding journal with individual
issues that are the `real' items. But electronic documents on the
net give a range of new problems. They do not have 500 years of convention
to establish a set of agreed formats to simplify the description so
the formats are not yet stable. Worse, through hypertext, a single
`conceptual' document may be made up of many interlinked individual
files that not only may be the work of many authors but may be mounted
on many machines not even in the same country let alone the same institution.
The boundaries of a document become imprecise, many distributed parts
making up a `virtual' whole.
Cataloguing the network?
There have been debates on a number of mailing lists (go4lib, web4lib,
pacs-l) about the desirability or otherwise of cataloguing network
resources. OCLC in the US has done some work on this as part of the
USMARC Advisory Group: the OCLC Internet Resources Cataloguing Experiment.
In Britain there is the CATRIONA project. Because of the problems mentioned
above it is questionable whether a traditional cataloguing approach
will work. It will certainly have a great deal of trouble in scaling
to the global network that is in effect one library. The prime problems
are:
- A catalogue entry must be rechecked on some regular basis
if done at the user end of the information cycle.
- The location of the material is irrelevant to the need
and ability to access it, so potentially each library might be interested
in all material on the network.
It is not completely facetious to suggest that the level of detail
required in descriptive cataloguing is because a user needs sufficient
information about an item to decide whether to expend the effort to
try and get access to a copy. In a networked environment, where this
effort is small, the requirement for complex descriptive cataloguing
codes is greatly reduced.
There has been a variety of attempts to provide subject access to
network resources based upon a variety of automated techniques. Almost
totally, these have not emanated from the library community and have
come under frequent criticism. As most of these projects have been
experiments, maintained often by a single enthusiast, or at best a
small group, it is not surprising that they have been less than perfect.
What is however amazing is that these techniques have been able to
regularly regenerate keyword indexes to material housed in thousands
of sites across the world, comparable in size to the contents
of a major research library, in a period of days at most and at negligible
cost.
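The kind of keyword index such techniques regenerate can be illustrated with a small inverted index, mapping each word to the locations where it occurs. This is a sketch only: the function build_keyword_index and the sample locations and texts are hypothetical, the collection step is replaced by a fixed dictionary, and none of the experimental services mentioned above is claimed to work this way.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_keyword_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of locations containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for location, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(location)
    return dict(index)


# Stand-in for material gathered from several sites (hypothetical data).
collected = {
    "site-a/fire.html": "Fire management in eucalypt forests",
    "site-b/maths.html": "Reviews of mathematical publications",
    "site-c/policy.html": "Forest management policy",
}

index = build_keyword_index(collected)
management_hits = sorted(index["management"])
```

Because the index is rebuilt from whatever is currently mounted, rerunning it is the whole maintenance procedure. The sketch does no stemming, so `forest' and `forests' are indexed as different words.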
Much of the failure of these indexes rests upon what they were indexing
-- material obtained from the published source. Yet by indexing the
source directly, the whole problem of trying to maintain access to a
highly dynamic corpus of information is greatly reduced. These indexes
would be far better if the publishers were able to add descriptive
information that could be fed into them.
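One way a publisher might add such information is to embed it in the document itself, where an indexing program can pick it up. The sketch below assumes HTML META elements with name and content attributes as the carrier. That choice, the class name MetaCollector and the sample page are assumptions made for illustration, not a mechanism this paper specifies.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class MetaCollector(HTMLParser):
    """Collect name/content pairs from META elements in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Keep only complete pairs supplied by the publisher.
        if name and content:
            self.fields[name.lower()] = content


# A hypothetical published page carrying publisher-supplied description.
page = """<html><head>
<meta name="keywords" content="fire management, fuel loads">
<meta name="author" content="A. Publisher">
</head><body><p>Report text.</p></body></html>"""

collector = MetaCollector()
collector.feed(page)
supplied = collector.fields
```

The description travels with the document, so it is refreshed every time the page is re-indexed and never has to be maintained separately at the user end.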
Classification
There has been a variety of attempts to provide a classified approach
to network resources. At the ANU the library's gopher has a section
organised by the Library of Congress classification. While a number
of other servers have attempted arrangements based upon library classifications,
such approaches will not scale up to the full Internet. This is again
because of the dynamic nature of the material that shifts location,
dies or changes in content and quality. Classifications in a limited
subject domain are more common where the corpuses to be dealt with
are more restricted. These seem mostly to be arranged by home grown
classifications or arrangements. Within a restricted domain it is
reasonably easy to devise a scheme of greater rationality than the
traditional library classifications. The classification mechanisms
that might scale involve delegation across institutions, and this is
the pattern followed by CERN for its distributed WWW subject approach.
Filtering and quality control
Once an electronic document is written the cost of electronic publishing
only corresponds to the cost of the network traffic and some fraction
of the overheads of supporting the server machine, although some would
add the costs of external editing to achieve uniformity to a desired
standard. As any modern networked desktop workstation is now capable
of acting as a server, the overheads are slight. With AARNet's current
and proposed charging, the cost of serving a document from universities
is virtually nil. Already, because of the lack of cost and the power
given by the hypertext format we are seeing an explosion in electronic
publishing using WWW. It does lead to the prospect of each author
becoming a publisher -- an explosion in vanity presses, a rapid
increase in stylistic variation between documents and experimentation
and an overall drop in quality.
With print material, publishers, journal referees and libraries ensured
that only quality information was readily available. On the network
these constraints on publishing are released. Despite concerns expressed
about retrieval, the easing of these constraints will make the filtering
of quality information out of the material available the main and
central problem in networked publishing and digital libraries. There
are no easy automated solutions to assessment of quality.
What new support structures?
The environment of networked publishing has the following features:
- Distribution is provided by the pre-existing network infrastructure.
- Material once written and formatted can be published,
as is, on the network from a low-level desktop machine, even the one used
by the author, with easy-to-use public domain software. Mosaic and
its derivatives can be used to view this material.
- Web robots can collect information from selected machines
and generate globally available indexes.
- Software able to include authentication and charging into
services is starting to be available for commercial services.
- Publicity about new publications can be communicated by
e-mail to target groups at low cost.
- Any individual or group can create annotated lists of
useful services that can act as a gateway to those services --
effectively virtual libraries.
What are the support organisations that will be needed
to make this pattern work?
At least two groups, which may be integrated in one organisation,
would seem to be required. The first is the publisher/cataloguer.
For the reasons given the only viable place for cataloguing information
to be inserted is at the publishing stage by the publisher. In this
model the cataloguer/indexer becomes the publicist for the publications
-- as a result of the cataloguing work they become easily retrievable.
The second grouping is the gateway provider. This group selects material
within a restricted domain on the basis of quality and provides a
value-added service for information within that domain. This group
needs a combination of skills, including those of bibliographers
and reference librarians, and also subject-based skills. This group
would organise access to quality information. Two examples of interesting
models follow.
The American Mathematical Society has adopted a model where an interdisciplinary
group of mathematicians, publishers, librarians and computer specialists
has mounted Mathematical Reviews and Current Mathematical
Publications as well as the full text of all their publications
back for 50 years by way of hypertext with a Mosaic front end. The
traditional journal approach provides the filtering, and Mathematical
Reviews provides the indexing information. They have taken control
of the literature of their discipline and may be offering a commercial
service shortly.
Firenet, hosted by ANU, is a cooperative set of World Wide Web and
gopher servers for discipline specialists in the field of fire management.
In this case librarians have not yet been involved. Publications in
this area are locally mounted and the central server provides a view
of the `quality' material and it is managed by the professional group
themselves.
What does this mean to libraries?
The library as a place exists because books are physical artefacts.
In view of the low use of much material that libraries collect (research
libraries in particular) there will be little incentive, other than
conservation, to retrospectively convert all material. The Japanese
Diet Library is reputed to be planning to convert 5 million items
to machine-readable form but this may be because it is in Tokyo, which
is likely to suffer a major earthquake. The reconstruction of the
destroyed Sarajevo library in virtual form is also a special case.
We should see libraries continuing their present functions for many
decades. In addition some formats, such as linear fiction, are very
well suited to print.
Libraries are also likely to continue their role as access points
to information, especially to those who are disadvantaged in terms
of network access.
Librarians however could see some new opportunities where they may
be in competition with other groups.
In network publishing librarians may well have a role in designing
information systems to ensure good retrieval of the material published.
In particular they should have a valuable role in ensuring that various
automated web robots can effectively retrieve the material.
Librarians will also have opportunities in the provision of value
added service on the network aiding in the design of servers which
point to and deliver quality information in a coherent manner. In
this area librarians have already been very active. Many WWW and gopher
servers have been produced by libraries and library consortia and
this trend is likely to continue. In doing this however libraries
will be in conflict with individuals in the discipline who feel, correctly,
that they are the best arbiters of quality. Librarians need to get
back to their roots in information management and work with the specialists
to make this approach a valuable one for the profession.
Information Online & On Disc 95