Judged by the standards of print material, and given its lack of formal catalogues and other indexing services, not only is the Internet disorganised but any hope of providing organisation in the traditional manner is slight. However, a variety of networked information retrieval tools have emerged which fulfil many of the functions of traditional finding tools in a library.
Most publishers issue their own catalogues of the material they produce, usually limited to material that is in print. The publishing trade issues various "books in print" listings, usually limited to one country or language.
Libraries maintain catalogues. In this country alone there are about 2000 libraries, each with a catalogue of the material it holds. The National Library tries to maintain a union catalogue of all their holdings, certainly those of the major libraries, as do the national libraries of other countries.
Within specific subject disciplines abstracting and indexing services endeavor to maintain services to give access to the journal literature. There would be over a thousand of these services. Within each country there may be a national service of this kind which attempts to index local publications.
Hundreds of years of experience and effort have built up this complex array of overlapping services, which ensure that most material of worth can be identified and located. It should be noted that each of these is selective in some way in what it covers. For a particular question, the choice of service used to locate relevant material is important.
Analogues of many of the print-based services are being developed. There is one area where the net provides a service which is impractical in the print world: the provision of a fairly comprehensive index, through automated means, across all material.
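To make the idea of an automatically built index concrete, the following is a minimal sketch of an inverted index, assuming that each document is available as plain text keyed by its URL. The documents, URLs and function names are invented for illustration; real network indexers are far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each word to the set of document URLs that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs containing every word of the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical documents, invented for illustration only.
documents = {
    "http://example.org/a": "Union catalogues of library holdings",
    "http://example.org/b": "Indexing the journal literature",
    "http://example.org/c": "Library catalogues and journal indexing",
}
index = build_index(documents)
```

A call such as search(index, "library catalogues") returns every document containing both words. Nothing in the index says whether those documents are any good, only that the words occur in them.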
Paper and printing are expensive, as is the distribution of printed products. The capital expenditure required to produce, sell and distribute a best seller or popular journal is such that only large organisations with significant expertise can afford to do so. It is also a business which has significant risks, as copies need to be printed and paid for before sales are assured. This has certain consequences.
The production and distribution of material is limited by economic factors. Publishers will not produce something that will not sell. This limits what is produced.
As production, like that of other media, goes through the bottleneck of a few organisations, governments are able to place controls on what may be distributed to ensure that community tastes are not offended.
The appetite of people to read exceeds their capacity to pay for the printed material. This simple fact, together with the physical size of books, goes a long way to explaining why libraries exist in the form that they do.
The economics of print make the production of larger volumes more cost effective than the very small. The journal exists to aggregate many small items on a similar topic to achieve an economic size.
In a networked world these things all come into question.
A further filtering step also applies. The secondary services (catalogues and abstracting and indexing services) are also selective in what they cover and sometimes in the depth of indexing applied to the material.
This will result in the availability of -
Closer scrutiny of an electronic document will be required to determine its status than would be the case for print.
It will be possible to find the material on a subject sought and winnow out the clearly irrelevant "false drops" from the system by hand. There will still remain a major problem: that of filtering what is retrieved for quality and suitability, which indexes do not address. This is not a task which can be easily automated.
While the network has reduced the controls and filters which block the publication of material, it has not yet provided comprehensive systems which will assist the viewer of the publications to screen chaff from wheat. The task of filtering quality material has been largely shifted from the publisher to the consumer.
The first uses exclusion and filters certain hosts or URLs. This is the approach used by services which seek to exclude objectionable material (however defined) from the end user. These services normally apply at the client.
The second is inclusive and provides information on "approved" sites rather than the reverse. Such services seem so far to be network based.
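The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only: the host names, URLs and function names below are hypothetical, and no claim is made that any of the products described here work this way internally.

```python
from urllib.parse import urlsplit

# Hypothetical lists, invented for illustration only.
BLOCKED_HOSTS = {"objectionable.example.com"}
BLOCKED_URLS = {"http://mixed.example.org/adults-only.html"}
APPROVED_HOSTS = {"library.example.edu"}


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def allowed_by_exclusion(url):
    """Exclusive filter: everything passes unless its host or URL is listed."""
    return host_of(url) not in BLOCKED_HOSTS and url not in BLOCKED_URLS


def allowed_by_inclusion(url):
    """Inclusive filter: nothing passes unless its host has been approved."""
    return host_of(url) in APPROVED_HOSTS
```

The design trade-off shows in how each filter treats a site nobody has reviewed: the exclusive filter lets it through, while the inclusive filter refuses it until someone has approved it.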
Some examples of how access can be filtered follow.
"SurfWatch is a new type of software which helps parents, educators and employers reduce the risk of children and others uncovering sexually explicit material on the Internet."<URL:http://www.surfwatch.com/surfwatch/>
"CensorMan is a series of Perl scripts which allows you to use a web browser to censor particular URLs."<URL:http://www.schnet.edu.au/~lukeh/samples/cm-demo.html>
"Three major players in the Internet software market are spearheading an industry-wide effort to "create and implement standards that will enable parents, educators, and other adults to 'lock out' access to inappropriate materials" on the Internet. The Information Highway Parental Empowerment Group (IHPEG) was formed last week by Microsoft Corporation, Netscape Communications, and Progressive Networks in an effort to show legislators that the Internet community can regulate itself -- without help from Washington."<URL:http://www.phillips.com:3200/sample.htm>
"CYBERsitter gives parents the capability to block or be alerted to access of adult-oriented pictures and pornography on the Internet as well as all the popular on-line services. Additionally, CYBERsitter will block access to these types of files from the computer's own hard disk, floppy disks and CD-ROM drives. CYBERsitter works by secretly monitoring all computer activity and when the child tries to download or view an adult-oriented picture, the process is automatically aborted, and/or an alert to the parent is generated for later viewing.
CYBERsitter can also block access to games, personal files or specific programs on the computer that the parents may want to keep children from accessing."<URL:http://www.rain.org/~solidoak/cybersit.htm>
"CIRCIT is conducting a research project for the Schools Council of the National Board of Employment, Education and Training. Our task is to bring together information from schools about their experiences with students' access to the Internet, whether the exposure to controversial materials has proved to be a problem, and what strategies schools are using to deal with the issue. We will be doing this in various ways, including communicating with interested parties using the Internet, via telephone interviews and by site visits to a small number of schools. The project is intended to run until about the end of October 1995."<URL:http://teloz.latrobe.edu.au/circit/schome.html>
OCLC, one of the major players in the development of services for libraries, has set up its Internet Cataloging Project
"to create, implement, test, and evaluate a searchable database of USMARC format bibliographic records, complete with electronic location and access information (USMARC field 856), for Internet-accessible materials."<URL:http://www.oclc.org/oclc/man/catproj/catcall.htm>
Many vendors of library systems are now offering web server capability for their OPACs and the ability to add URLs to catalogue records instead of call numbers. With this capability, libraries can treat material published on the network in much the same way as they do paper publications, but make it deliverable via the library catalogue rather than just serving up a citation and a location.
It is unclear how well these manual approaches will hold up compared with some other approaches in the longer term.
Briefly, his proposal is to open up physics publishing at a base, unrefereed level in the form of a preprint archive whose items can be modified, with a higher level to which items could be promoted at any time based upon community agreement rather than refereeing.
"In the future, we will add a system of seals of approval (SOAPs). This mechanism keeps readers from being at the mercy of editors. Any article submitted (that does not violate copyrights or present offensive material) will eventually be available to readers, but readers could tell the server which articles to send based on the presence of the seal of approval of some body or individual"<URL:http://www.halcyon.com/jensen/encyclopedia/more/GlEnVolunteer.html> The newsgroup supporting this initiative now appears to be moribund. <URL:news:comp.infosystems.interpedia>
The SOAP concept, however, had the virtue that it could be extended to a form of refereeing service independent of any publishing server, and could be used to validate any URL. It may have been similar to some of the thinking behind the next two schemes.
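A seal-of-approval service that is independent of the publishing server could be as simple as a registry mapping URLs to the bodies that have endorsed them. The sketch below is a hypothetical illustration of that idea, not a description of the Interpedia design; the class name, method names, approving bodies and URLs are all invented.

```python
from collections import defaultdict


class SealRegistry:
    """Records which bodies have given their seal of approval to which URLs."""

    def __init__(self):
        self._seals = defaultdict(set)  # URL -> names of approving bodies

    def approve(self, body, url):
        """Record that `body` has placed its seal on `url`."""
        self._seals[url].add(body)

    def seals_for(self, url):
        """Return the set of bodies that have approved `url`."""
        return set(self._seals.get(url, set()))

    def select(self, urls, trusted_bodies):
        """Keep only the URLs carrying a seal from a body the reader trusts."""
        trusted = set(trusted_bodies)
        return [u for u in urls if self._seals.get(u, set()) & trusted]


# Hypothetical seals and retrieved items, invented for illustration only.
registry = SealRegistry()
registry.approve("Physics Society", "http://example.org/preprint-1")
registry.approve("A. Reviewer", "http://example.org/preprint-2")
retrieved = [
    "http://example.org/preprint-1",
    "http://example.org/preprint-2",
    "http://example.org/preprint-3",
]
```

Each reader chooses whose seals to honour, so registry.select(retrieved, {"Physics Society"}) and registry.select(retrieved, {"A. Reviewer"}) give different views of the same unrefereed archive. Material nobody has endorsed remains published; it is simply not selected.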
"We present a prototype environment that facilitates the publishing of documents on the Web by automatically generating meta-information about the document, communicating this to a local scalable architecture, e.g WHOIS++"
<URL:http://www.igd.fhg.de/www/www95/papers/72/publish/publishing.html>
"an architecture, called "ComMentor", which provides a platform for third-party providers of lightweight super-structures to material provided by conventional content providers. It enables people to share structured in-place annotations about arbitrary on-line documents."<URL:http://www-diglib.stanford.edu/rmr/TR/TR.html>
In another paper, "Beyond Browsing: Shared Comments, SOAPs, Trails, and On-line Communities", they describe -
"a system we have implemented that enables people to share structured in-place annotations attached to material in arbitrary documents on the WWW. The basic conceptual decisions are laid out, and a prototypical example of the client-server interaction is given. We then explain the usage perspective, describe our experience with using the system, and discuss other experimental usages of our prototype implementation, such as collaborative filtering, seals of approval, and value-added trails."<URL:http://www-diglib.stanford.edu/diglib/pub/reports/brio_www95.html>
"A Web of SINs - the nature and organization of Special Interest Networks" which explores the idea more fully. <URL:http://www.csu.edu.au/links/sin/sin.html>
<URL:http://harvest.cs.colorado.edu/harvest/>
The former potentially allows customisation of the meta information fed into the system. The latter envisages the construction of specialised indexes chosen from what is collected. If these could be coupled with the provision of reviewing information from the same or other sources, it might be possible to combine selectivity based upon content from the publisher with quality indications provided by a trusted source.
Is there a way to transmit judgments about material, so that those who seek information can not only find the items which contain words relating to their topic of interest but also filter what is found by more general criteria of content? Can systems be built which allow the reader of material to feed back into the network views as to the usefulness of the material read, such that future readers can benefit from those views?
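One very simple form such a feedback system might take is sketched below, assuming that each reader records only whether an item was useful and that later readers see retrieved items ordered by the share of favourable votes. The class, its methods and the neutral score given to unrated items are assumptions made for illustration, not a description of any existing service.

```python
from collections import defaultdict


class FeedbackStore:
    """Collects readers' yes/no judgments on the usefulness of each URL."""

    def __init__(self):
        self._votes = defaultdict(list)  # URL -> list of True/False votes

    def rate(self, url, useful):
        """Record one reader's view of whether `url` was useful."""
        self._votes[url].append(bool(useful))

    def usefulness(self, url):
        """Fraction of readers who found `url` useful, or None if unrated."""
        votes = self._votes.get(url)
        if not votes:
            return None
        return sum(votes) / len(votes)

    def rank(self, urls, unrated_score=0.5):
        """Order retrieved URLs from most to least useful.

        Unrated items receive `unrated_score`, an arbitrary neutral value
        chosen here so that new material is neither promoted nor buried.
        """
        def score(url):
            value = self.usefulness(url)
            return unrated_score if value is None else value

        return sorted(urls, key=score, reverse=True)
```

A retrieval tool would first find the items matching the reader's words and then pass them through rank(), so that the judgments of earlier readers shape what later readers see first. Whether those judgments can be trusted is a separate question, which is where schemes such as seals of approval come in.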
As an information publishing system, the central problem of the network will not be retrieval of information, but filtering what is retrieved to select that which is useful. This is a problem to which developers should address their skills, as without filtering mechanisms network users will be swamped by the relevant but useless material that they retrieve.