Publishing on the Internet with World Wide Web

Paper presented to
CAUSE in Australasia '94:10-13, July
by
Antony Barry


Introduction

The last two years have seen a revolution in the techniques and equipment required to publish electronically. From being initially the domain of major organisations and the plaything of a few of the technically minded, we have moved to a situation where there are approaching 10,000 information servers deployed at over 2,000 institutions. The resources needed to publish electronically have also changed markedly as the capacity of workstations has grown, supported by the peer to peer networking of the internet. A few years ago major machine resources were needed to deliver significant material over networks, and graphics required proprietary software, which limited the platforms on which the published material could be viewed. We now have a situation where current base level desktop machines, with at most the addition of some main memory, can deliver, and have delivered, publications and services of even national or international importance.

What is a document?

This technological revolution also challenges our concept of what a document might be. Steeped in the limitations of print technology, we think of a document as a static artifact, produced after much labour and effort, created whole and forever unchanging. While revised and new editions may be produced, they in their turn are seen as discrete items. In contrast, an electronic 'document' can, and perhaps should, be dynamic, modified and improved as more information comes to hand. A number of the 'pages' produced under the World Wide Web protocol are of this form, normally showing the date of last modification.

Academic reward system

The academic reward system is based upon the quantity and quality of the output of printed publications, and therefore the inertia in the print based journal system is immense. A major social change will be needed before the idea of malleable documents becomes acceptable for establishing academic worth. Before electronic publications can enter the mainstream they must be recognised as academically valid, and dynamic electronic publications do not fit easily into the existing model. For them to be recognised we may need to adopt the viewpoint that the measure of an academic's worth is not past publications but the number of existing 'publications' that are still being developed and maintained with new material.

Network based communication

Eric Wainwright, in his paper to the VALA conference last year[1], reminded us that libraries exist in their present form because books are physical artifacts, but that the function they serve is that of communication. This applies equally to publishing, and it has important implications as we shift to a world where the principal form of communication becomes based on networks.

Prior to the invention of printing, literacy was unusual and scholars were few in number and scattered. Written communication between them travelled by foot or pack animal, and sending the same material to a number of recipients could only be accomplished by laborious copying.

The reinvention of printing in Europe 500 years ago was a revolution in human communication which led, among other things, to the publishing system we know today. In particular, the scholarly journal, invented 300 years ago, has been the main form of communication for research findings. Its roots remain, however, in the need to communicate research findings between individuals so that the process of scholarship and research can continue. The journal was invented because it is more efficient to aggregate a number of small writings into one publication than to publish them separately. This has advantages in-

As the need to publish faster has grown, various other expedients have been used to challenge the journal, such as conference proceedings and the dissemination of preprints and other 'grey' literature. Over the last few years the journal literature has been characterised by the continual creation of new and usually more specialised journals.

Our long experience with print can blind us to its limitations. We forget that the large industries that support print - publishers, booksellers and libraries - largely exist to perform those functions that print as a communication medium cannot. The networked systems that are rapidly developing can provide us with a communication medium which in many arenas will perform better than print, and which will also perform some of the functions that the industries above perform now. It is far too early to be able to predict what the outcome for these supporting industries will be. Certainly they will change, and perhaps merge, but they will not remain static.

So far network publishing has made little impact in what could loosely be called the monograph market. Most of the material which has been published consists of smaller items. This may not remain true in the future. Already there are experiments in the dissemination of course notes and textbooks, which are being published locally at my own institution in the fields of Art History, Contract Law and Forestry. The discussion below, however, is directed more towards the journal literature.

Email groups

The journal literature is a method by which a scholar communicates thoughts and findings to peers in a discipline. The slowness of publication has led to the creation, particularly in science, of 'letters' journals which can respond faster to new announcements. The emergence on the internet of focused email lists and newsgroups based on shared common interests allows many to many communication within a community normally served by journals. It is not uncommon in this environment to find drafts of papers being posted for comment, or the addresses from which they can be downloaded. The 'invisible' college has become visible and now has a much more powerful tool at its disposal.

Deaggregation of journals

For the reasons given above, journals were created as groupings of papers on a given topic. As a consequence of network communication we are now seeing a variety of trends towards the deaggregation of their contents.

Various services provide access to databases of journal tables of contents over the net, starting with the Colorado Alliance of Research Libraries' Uncover service. These bypass end user and library subscriptions to journals, as they provide an alternative source of journal articles. To the end user they appear as a database of articles that can be searched.

The internet is also seeing the appearance of electronic collections of material which has been submitted for publication in print, often in the form of preprint and 'reprint' databases, providing a directly available source of the information which bypasses the journal. To the end user they appear as a database of papers. Unlike the contents services they are free, and the actual step of printing the journal is not necessary to their existence. Knowing an author's institution, which often amounts to knowing their email address, it may be possible to obtain a copy of a required item from an information service at the institution, direct from a server on the author's desk, or by emailing a request to them.

A number of institutions and groups are now bypassing the step of printing material completely, and there are approaching a thousand electronic 'journals', with an increasing number being produced via formal refereeing and careful control of quality. So far almost none of these have appeared in the secondary sources - the abstracting and indexing services.

New forms of publishing

As I have indicated above, our concept of 'publication', based on the limitations of print, is of a static document. With this restriction removed there are experiments with a number of different patterns, some of which are discussed below. The bulk of these use World Wide Web technology.

Some use a "database" approach. There are examples where 'documents' are created which are continually updated as more information comes to hand. Examples of these are directories and listings of various kinds, but they may also contain commentary. Often these are created as a free information service. When quoting from them it is wise to cite the date of the information, as it may be different the next time it is examined. The upper levels of many Web servers fall into this pattern.

The creation of centralised servers, set up on a cooperative basis, which 'point' at accredited material mounted locally or on other selected servers chosen by an editorial group is another approach which seems to be growing[2]. This could in part take up the refereeing function of journals. The FireNet network is an example of this[3].

Groups sharing a common interest may group together, each publishing their own material, but with one of them providing a server which has links to the publications of the others, allowing them to give a group view of their joint material and of material of interest to them published by others on the net. The joint server acts in a manner similar to a bibliography, a library catalogue and a document delivery service. The most effective forms of these services are based on the World Wide Web protocol.

A journal model is particularly suited to the situation where an ongoing stream of new material, in part independent of the previous material, needs to be published. A paper based model forces the aggregation of sufficient material to form an issue prior to publication. Some electronic equivalents work in the reverse mode. Material is published as it comes to hand. Older material may be archived from the back of the current issue in a form more akin to traditional issues. Some electronic journals do away with the whole concept of an issue and publish each item as it comes to hand. Others publish their table of contents directly, with instructions on how to pick up the issues from a server. This is typical of BITNET Listserv based titles.

The most radical change is to hypertext forms of publication using World Wide Web, which print cannot emulate. The linear form of a paper publication can be dispensed with. Cross references to other parts of the text can be hyperlinks, footnotes are hyperlinks to the associated text, and citations to other documents may connect to those documents elsewhere on the network. Normally in such documents the author's name is a hyperlink to their 'home' page on a server, which may also include a list of their other publications together with their text.
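The kinds of link described above can be made concrete with a small sketch. The page below is entirely hypothetical (the author name, server addresses and anchor names are invented for illustration), and the code simply separates links that point within the document, such as cross references and footnotes, from links that point to other documents, such as the author's home page and a cited work. It uses only the HTML parser in the Python standard library and is not meant as a description of how any particular Web client works.

```python
from html.parser import HTMLParser

# Hypothetical page: every name, address and anchor here is invented.
PAGE = """
<html><body>
<h1>Network Publishing</h1>
<p>By <a href="http://server.example/people/author.html">A. N. Author</a></p>
<p>See <a href="#methods">the methods section</a> and note
<a href="#fn1">1</a>. Earlier work is described
<a href="http://other.example/papers/earlier.html">elsewhere</a>.</p>
<h2><a name="methods">Methods</a></h2>
<p><a name="fn1">1.</a> A footnote.</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects hyperlinks, split into in-document and external ones."""

    def __init__(self):
        super().__init__()
        self.internal = []  # cross references and footnotes
        self.external = []  # home pages and cited documents

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return  # a named anchor is a link target, not a link
        if href.startswith("#"):
            self.internal.append(href[1:])
        else:
            self.external.append(href)


def collect_links(page):
    """Return (internal anchor names, external addresses) for a page."""
    collector = LinkCollector()
    collector.feed(page)
    return collector.internal, collector.external


internal, external = collect_links(PAGE)
print("within this document:", internal)
print("to other documents:", external)
```

In a printed paper all four of these references would be inert text; here each one is a route the reader can follow directly.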

Not only is the structure of a document changed, but the structure of the literature is affected by this linking ability, which allows the work of one author to be tied via a hypertext link to the work of a second author to which it refers. The citation pattern can be built into the structure of the network, as can the delivery of the cited works.

The Future

What sort of pattern is developing on the network? Authors, groups of authors or their institutions can publish material direct to the network, where it can be accessed by the readership without intermediaries. This means that much of the need for, and the organisation required to perform, the distribution function is absorbed into the network. This affects serial subscription agents. The reduction in the need for specialist journal publishers impacts on their future. The need to aggregate material into a large enough corpus to be economic to publish on paper is eliminated. This will affect journal publishers.

Giving end users direct access to material over the network as soon as it is published in effect allows authors to publish direct to the shelves of the library or to the desks of their colleagues.

The emergence of cooperative networks, which provide pointers to material deemed to be of interest to the group and thus provide a central access point to that material, performs in part the function of journal referees and libraries.

The ability to put hypertext links from a publication to the text of the works cited will impact upon the secondary literature, as will the various 'network walking'[4] indexing schemes, which scan the net for material, index it and create databases giving direct access to the source.
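The 'network walking' idea can be sketched in a few lines. The example below is a simplified illustration under stated assumptions rather than a description of any of the actual schemes: the 'network' is a small in-memory table of hypothetical addresses, each with some text and a list of outgoing links, and no real network access takes place. The walker follows links breadth first from a starting address and builds a table from each word to the addresses where it occurs.

```python
from collections import deque
import re

# A stand-in for the network: hypothetical addresses mapped to
# (text of the document, addresses it links to).
PAGES = {
    "a.example/index": ("fire ecology network",
                        ["a.example/papers", "b.example/home"]),
    "a.example/papers": ("landscape fire models",
                         ["a.example/index"]),
    "b.example/home": ("forestry course notes",
                       ["a.example/papers", "c.example/missing"]),
}


def walk_and_index(start):
    """Follow links from start and return a word -> addresses index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        address = queue.popleft()
        if address not in PAGES:
            continue  # a dead link: nothing to index
        text, links = PAGES[address]
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(address)
        for link in links:
            if link not in seen:  # visit each address once only
                seen.add(link)
                queue.append(link)
    return index


index = walk_and_index("a.example/index")
print(sorted(index["fire"]))
```

Because each index entry holds the address of the source itself, a search result leads straight to the document, which is the sense in which such schemes encroach on the abstracting and indexing services.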

The Main Network Protocols

While email lists and newsgroups share some features in common with newsletters, they are of too ephemeral a nature for serious publication. Email lists have, however, been extensively used in the dissemination of electronic journals.

Ftp

The longest deployed mechanism for publication is the anonymous File Transfer Protocol (FTP) archive. Initially used principally as a means of publishing public domain software, it has largely retained this role and has not expanded much further. It was the basis for the first of the global indexing systems, archie.

Gopher

In the last two years the gopher system, developed at the University of Minnesota, has been deployed at the majority of campuses as the core of their campus information systems. While the gopher protocol can deliver any type of file, in practical terms its use has been limited to delivering plain unformatted text files, graphics files and databases of these, arranged in a hierarchical menu. Gopher introduced two radical innovations: the ability for one gopher server to point at a menu on another, and gateways into other information servers. This technology is now mature, available on virtually all platforms and in the public domain.
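How one gopher server points at a menu on another can be seen from the form of a menu itself. The sketch below assumes the menu line layout of the published gopher protocol description (RFC 1436) as I understand it: a one character item type followed by the display text, then a selector, a host and a port, separated by tabs, with a line containing a single full stop ending the menu. The server names and menu entries are hypothetical.

```python
from typing import NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # "0" is a text file, "1" is a menu
    display: str
    selector: str
    host: str
    port: int


# A hypothetical menu as a server might send it.
SAMPLE_MENU = (
    "0About this server\t/about.txt\tgopher.campus.example\t70\r\n"
    "1Library services\t/library\tgopher.campus.example\t70\r\n"
    "1Another institution's menu\t/\tgopher.other.example\t70\r\n"
    ".\r\n"
)


def parse_menu(raw):
    """Turn the raw text of a gopher menu into a list of MenuItem."""
    items = []
    for line in raw.splitlines():
        if line == ".":
            break  # a lone full stop marks the end of the menu
        fields = line.split("\t")
        if len(fields) < 4 or not fields[0]:
            continue  # skip anything that is not a complete entry
        items.append(MenuItem(fields[0][0], fields[0][1:],
                              fields[1], fields[2], int(fields[3])))
    return items


items = parse_menu(SAMPLE_MENU)
# Menus held on a different machine from the one that served this menu.
remote_menus = [item for item in items
                if item.item_type == "1"
                and item.host != "gopher.campus.example"]
print(remote_menus)
```

Since every entry carries its own host and port, a menu can mix local items with items held anywhere else, and the user need not notice the boundary between servers.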

World Wide Web (WWW)

Initially developed at CERN in Switzerland, the aim of WWW was to develop hypertext between documents on the internet. Like gopher, links are possible between different serving machines, but unlike gopher, documents can contain formatted text and, embedded within them, links to other documents. They can also contain images and links to other media. WWW uses the Hypertext Transfer Protocol (HTTP) to deliver documents which are formatted in Hypertext Markup Language (HTML). Most viewing software for WWW comes preconfigured to be able to retrieve the documentation for the system.
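As a rough illustration of what a client does with a Web address, the sketch below splits an address into the server to contact and the request to send. It assumes the simple HTTP/1.0 form of a GET request with no additional header lines, and it only builds the request text; nothing is sent over the network. The address used is the FireNet one cited in this paper, and the function name is my own.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request text) for fetching url over HTTP."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an address with no path means the top page
    if parts.query:
        path += "?" + parts.query
    port = parts.port or 80   # 80 is the usual HTTP port
    # Assumed form: a bare HTTP/1.0 request line ended by a blank line.
    request = "GET " + path + " HTTP/1.0\r\n\r\n"
    return parts.hostname, port, request


host, port, request = build_get_request(
    "http://life.anu.edu.au/firenet/fire.html")
print(host, port)
print(repr(request))
```

The server's reply would be the HTML text of the document, which the viewing software then formats, turning each embedded link into something the reader can select.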

Initially the deployment of WWW was slow, due to the difficulty of editing HTML documents and the lack of good client software for Macintosh and Windows. Primitive but quite usable HTML editors now exist for Mac, PC and X systems, and with the release of the Mosaic software by NCSA in November last year the number of servers jumped from 270 then to over 5,000 in June.

Unlike gopher, WWW can deliver documents containing images. Like gopher, there are gateways to material published in other ways, and this includes material published via gopher itself, so that a Web document can contain hypertext links to anything published by gopher or other protocols. WWW is now the dominant form of publishing on the internet. It continues to develop rapidly and now includes the ability to capture information interactively from users through screen based forms, and supports interactive graphical displays with hypertext links associated with particular parts of the graphic.

Mosaic

The Mosaic software, developed at the National Center for Supercomputing Applications at the University of Illinois, is the Swiss Army Knife of the Internet. While basically viewing software for WWW, it can also be used to view material published by the other protocols. Its significance lies in its timely arrival to make WWW readily usable, and in the close development and similarity of interface between the X-Windows, Mac and Windows versions. Version two has recently been released for all platforms, introducing more powerful features. Other suppliers have introduced similar software (e.g. EINet) and commercial versions are expected. There are trade reports that Microsoft will incorporate Mosaic into their next operating system, Chicago.

WWW is currently deployed in Australia[5] at-

As the software used for WWW is in the public domain and delivery of information over AARNet is currently free for publishers, it provides a cheap and highly effective method of global publication. Its future use will depend upon its level of adoption for accredited publication, but considering the great advantages in its use this should not be too long delayed. It is early days in the deployment of network publishing and it is not yet clear what the long term effects will be, but it does not seem unreasonable to expect that we are seeing the start of a revolution as big as that which followed the invention of printing.


1 Wainwright, Eric. Towards a National Networking Strategy. gopher://gopher.latrobe.edu.au/00/Library%20Services/VALA%20Conference%20Papers/Wainwright.txt

2 Green, D.G. (1994). Network publishing and the World Wide Web. AARNET Newsletter 3 http://life.anu.edu.au/people/dgg/aarnet.html

3 Green, D.G., Gill, A.M. and Trevitt, A.C.F. (1993). FireNet - an international network for landscape fire information. Wildfire Quarterly Bulletin of the International Association of Wildland Fire 2(4), 22-30 http://life.anu.edu.au/firenet/fire.html

4 Koster, Martijn World Wide Web Wanderers, Spiders and Robots http://web.nexor.co.uk/mak/doc/robots/robots.html

5 Green, David Australian WWW Servers, http://www.csu.edu.au/links/ozweb.html