Open Access Publishing Comes of Age

December 22, 2006


With the October 2000 release of the GNU EPrints package, announced by computer scientist Stevan Harnad and his Southampton University colleague Robert Tansley, departments and organisations had what they needed to create customised open-source eprint servers.

Mark Muldoon

The electronic preprint archive arXiv (http://arxiv.org/) celebrated its fifteenth birthday last June. As befits a creature of the rapidly evolving World Wide Web, arXiv combines a surprising maturity with vigorous ongoing growth. From a humble start as an e-mail reflector designed to distribute preprints among theorists in high-energy physics, arXiv (originally established at Los Alamos and now hosted at Cornell) has developed into a huge repository: it is mirrored at 18 sites around the world, holds more than 385,000 articles (as of September 2006), and delivers more than 40 million full-text downloads per year. Furthermore, the rate of submissions is still growing essentially linearly, from a few tens of articles per month in 1992 to about 4000 per month in the summer of 2006.

But more important than the sheer volume of papers is the gradual emergence of open access publishing as a means of distributing excellent new research. Consider, for example, what is arguably the most significant mathematical discovery of this century: Grigori Perelman's completion of Richard Hamilton's programme for the resolution of the Poincaré conjecture. Although others have since published expanded versions of his proof via traditional channels, Perelman's own account appears only in a series of preprints submitted to arXiv. Statements like the following, made by arXiv founder Paul Ginsparg back in the brash youth of open access publishing [2], now appear to have been prophetic:

A major lesson we learn is that the current model of funding publishing companies through research libraries (in turn funded by overhead on research grants) is unlikely to survive in the electronic realm. . . . The essential question at this point is not whether the scientific research literature will migrate to fully electronic dissemination, but rather how quickly this transition will take place.

Of course, scientific publishing of the ordinary "chemicals on slices of dead trees" variety, despite having entered a period of technological upheaval, is still a large and highly profitable enterprise: It is probably premature to declare the ink-and-paper journals obsolete. Indeed, most scientific publishers now offer electronic versions of their journals, and many have adopted copyright policies that explicitly recognise the right of authors to archive their work in public repositories.*

In part, these policies are a preemptive response to moves, contemplated by various state funding bodies, to require free public access to the results of publicly funded research.

Such wider questions have stirred up considerable debate [1] about the ownership of scientific results and the channels through which they are distributed. For example, although arXiv is certainly the largest and most successful open access archive, the software that drives it has never been in the public domain. Indeed, through the late 1990s, universities, departments, research groups, and government agencies that wanted to establish their own archives had no easy mechanism for doing so.

That changed in October 2000 when, in a brief note [4], Robert Tansley and Stevan Harnad, computer scientists at the UK's Southampton University, announced the release of the GNU EPrints package (http://www.eprints.org/). In their note, they quote an anonymous participant in the second meeting of the Open Archives Initiative:

Open Archiving will not get off the ground until the day I can go to a website, download open-archiving software, then say MAKE ARCHIVE, and an interoperable . . . archive is up and running, ready to be filled.

GNU EPrints is an open-source eprint server that is genuinely easy to set up and run. As the software is open-source, one can also customise the system easily†, taking advantage of a lively and helpful developer community that supplements the online documentation. Take-up has been brisk: Southampton's Registry of Open Access Archives lists some 200 sites (as of September 2006) running GNU EPrints and another 170 using DSpace, a somewhat more recent open-source archive package developed jointly by MIT and Hewlett-Packard.

Both of these packages provide convenient Web-based interfaces for both author and reader. Ultimately, however, the most important service they offer may be one that is invisible to users: access for automated searching of the bibliographic data and the text of articles. The possibilities for exploiting this sort of data mining can be glimpsed [3], dimly, in such projects as the PubMed Central database (http://www.pubmedcentral.nih.gov), which already contains a quarter of a million articles and is growing rapidly. The articles have been automatically indexed and cross-referenced in such a way that a user can easily, for example, jump from an article that mentions an organism to the relevant taxonomic or genomic database. Although it is harder to envision a similar thing in applied mathematics, where terminology is less standardised, it is not at all hard to imagine the usefulness of a data-mining engine that could accept a query like "fixed-point theorems applied to option pricing" and instantly return a list of relevant results.

Although youngsters like arXiv, EPrints, and DSpace are having a huge impact on scientific publishing, perhaps the last word should go to the Royal Society of London, publisher of the oldest peer-reviewed journal in the world. A recent policy statement emphasises that, in the excitement surrounding open access archives, one should not lose sight of the main point of scientific publishing: to establish and maintain institutions that facilitate communication among working scientists. After careful consideration, the society decided to make all its articles freely accessible one year after their initial appearance and, beginning last June, to offer authors the option of paying a fee and making their work freely available from the date of publication.

References
[1] B. Bachrach, R.S. Berry, M. Blume, T. von Foerster, A. Fowler, P. Ginsparg, S. Heller, N. Kestner, A. Odlyzko, A. Okerson, R. Wigington, and A. Moffat, Who should own scientific papers?, Science, 281 (1998), 1459–1460.
[2] P. Ginsparg, Winners and losers in the global research village, Electronic Publishing in Science I, Proceedings of the joint ICSU Press/UNESCO Conference, Paris, 1996, R. Elliot and D. Shaw, eds., http://people.ccmr.cornell.edu/~ginsparg/blurb/pg96unesco.html.
[3] P. Ginsparg, As we may read, J. Neuroscience, 26 (2006), 9606–9608.
[4] R. Tansley and S. Harnad, Eprints.org software for creating institutional and individual open archives, D-Lib Magazine, 6 (2000), http://www.dlib.org/dlib/october00/10inbrief.html#HARNAD.

Mark Muldoon is a senior lecturer in the School of Mathematics at the University of Manchester.

*The details of this recognition vary widely: Many publishers allow authors to archive their own versions of the final, post-referee papers, but not to distribute the published, journal-formatted version. Others, notably the IEEE and SIAM, require that authors distribute only the publisher's version.

†The author set up his departmental repository (http://eprints.ma.man.ac.uk/) and added a number of local modifications, all in about ten days of programming.

