October 21 to 27, 2013 is Open Access Week, an international campaign now entering its sixth year. OAW promotes free, immediate, online access to research results, and the rights to reuse that information. The modern Web has brought about unprecedented access to information, but along with it come the technicalities of whether readers have the right to apply that information in their own work and research. The open access movement strives to rectify that. As much data as possible should be available to everyone, and usable by all. However, we are a long way from that ideal, and there are still many obstacles to overcome.
The Roots of Open Access
The ideas behind Open Access are as old as science itself, although perhaps they were best expressed by Ranganathan in his Five Laws of Library Science in 1931. These laws are the basic philosophy behind keeping a library, or any collection of information. The five laws are:
- Books are for use
- Every person, his or her book
- Every book, its reader
- Save the time of the reader
- The library is a growing organism
The laws embody the ideals of open access. Books are of no use if librarians lock them away, or hide them in back rooms gathering dust. Visitors to the library may need to find some specific information, and they wish to do so as efficiently as possible. Any information is valuable to someone, and the collection must have the room, the resources and the support for it to grow. It says a great deal about the value of these laws that they can be applied or adapted to the modern Internet without too much rewording. Today offers a tremendous opportunity for a world where we may share and use information as never before.
Obstacles to Open Access
Sadly, and increasingly so in recent years, open access is not all that easy to come by. There is a mindset even among some academics that they should “own” or “control” the data they collect as part of their work. The focus is often not on how useful that information could be, but on exactly who has the right to use it. In many fields, data does indeed have associated financial value or security implications; unfortunately, that means even the most benign and useful data can be locked away for no reason other than the owners' wish to hoard it.
Recently, “big data” has become a buzzword in areas such as marketing and commerce. Huge corporations that have the resources are constantly collecting every possible detail about every transaction, and mining that data for competitive advantage. Meanwhile, we carelessly give away details about our personal likes and habits through social media, without realizing (or perhaps not caring) that this data builds customer profiles that shape our retail experiences. This is how we end up with the stories about how Target can guess if their “guest” is pregnant, or the apocryphal tale that grocery stores display beer next to diapers. It takes a lot of data, and a huge amount of deep analytics, for even the smallest edge over the competition. But the edge is there, and so it has become fashionable to assume any data has a price.
That way of thinking ends up putting obstacles in areas far from consumerism. One would hope that open access was the norm when it came to scholarly research; in truth, we still have a long way to go. Suppose, for instance, you were researching the geographic distribution of American bullfrogs. As it happens, the Global Biodiversity Information Facility promises open access to that sort of data. In theory, you could download all that data and plot it on a map. In practice, it’s not that easy. The author of one post on the subject discovered the data had 65 different licenses; a mere 4% of the data came with licenses that were clearly open access. Many licenses were unclear, and using the data would require contacting the individual publishers. But over 70% of the data came with a license that explicitly said something along the lines of “the data cannot be repackaged or redistributed” – in other words, something as simple as plotting it on a map is not allowed. Not exactly open access.
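To make the license arithmetic above concrete, here is a minimal Python sketch of how one might tally license statements across a set of records. Everything in it is an assumption for illustration: the keyword rules, the function names, and the four sample records are invented, and none of it reflects real GBIF data or how the author of the post actually did the analysis.

```python
from collections import Counter

# Hypothetical keyword rules; real license texts vary far more than this.
OPEN_MARKERS = ("cc0", "public domain")
RESTRICTIVE_MARKERS = ("cannot be repackaged", "cannot be redistributed")


def classify_license(text):
    """Return 'open', 'restrictive', or 'unclear' for one license string."""
    lowered = text.lower()
    # Check restrictions first, so a mixed statement is never counted as open.
    if any(marker in lowered for marker in RESTRICTIVE_MARKERS):
        return "restrictive"
    if any(marker in lowered for marker in OPEN_MARKERS):
        return "open"
    return "unclear"


def license_shares(records):
    """Map each license category to its percentage share of the records."""
    counts = Counter(classify_license(r["license"]) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented records for illustration only; not real biodiversity data.
sample_records = [
    {"id": 1, "license": "CC0 1.0"},
    {"id": 2, "license": "The data cannot be repackaged or redistributed"},
    {"id": 3, "license": "Contact the publisher"},
    {"id": 4, "license": "Data cannot be redistributed without permission"},
]

shares = license_shares(sample_records)
for category, percent in sorted(shares.items()):
    print(f"{category}: {percent:.0f}%")
```

Anything that lands in the "unclear" bucket is exactly the data that would need a conversation with its publisher before reuse.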
Access Means Success
Perhaps this assignment of value to data is the big obstacle. Researchers and knowledge workers have invested significant time and effort in collecting the data. Why then should that valuable data be made freely available? Surely credit should go where it’s due? In truth though, this is the wrong way to look at datasets. Scientific findings published in research papers have value, not from the raw data itself, but from the conclusions drawn from it. Papers can then be properly cited and attributed. But there is nothing to prevent the release of raw data under a separate license that would support its reuse. Other researchers would then be able to repeat the experiments. Perhaps even more dismaying is the amount of collected data never used, such as the results of failed experiments. Even that data has value to someone; every book its reader. The data only becomes valuable when it is available for use by others.
Plenty of services and licensing options make open access policies fairly easy to carry out. Typically, this means checking that the data contains no private or sensitive information. If it contains none, there is no reason not to publish the dataset openly. The accompanying papers may have their own license, but the data can have its rights waived separately. Datasets released as public domain or under a similar “no rights reserved” license such as Creative Commons Zero give the most freedom for reuse. No part of this process reduces the moral obligation of a future user to acknowledge a contribution. The license makes it very clear that a researcher can freely build upon or reuse the information. Services such as figShare simplify this process. Assign published data the most liberal license possible, make it searchable and open for sharing, and tag it with the correct citation for future works.
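As a rough illustration of that checklist, the Python sketch below screens a dataset's column names for obviously personal fields and then builds a small metadata record carrying a license and a citation. The list of sensitive field names, the function names, and the example dataset are all hypothetical; this is not the workflow of figShare or any other service, and a real privacy review needs far more than a name check. The only fixed detail is "CC0-1.0", the standard SPDX identifier for Creative Commons Zero.

```python
import json

# Hypothetical list of column names that suggest personal information.
SENSITIVE_FIELDS = {"name", "email", "address", "phone"}


def sensitive_columns(columns):
    """Return the columns whose names match a known sensitive field."""
    return sorted(c for c in columns if c.lower() in SENSITIVE_FIELDS)


def build_metadata(title, columns, citation):
    """Build a metadata record, refusing if sensitive columns are present."""
    flagged = sensitive_columns(columns)
    if flagged:
        raise ValueError(f"remove or anonymize before publishing: {flagged}")
    return {
        "title": title,
        "license": "CC0-1.0",  # SPDX identifier for Creative Commons Zero
        "citation": citation,  # how future users should credit the data
        "columns": list(columns),
    }


# Invented example dataset; the title and citation are placeholders.
metadata = build_metadata(
    title="Example frog sightings",
    columns=["species", "latitude", "longitude", "observed_on"],
    citation="Example Author (2013). Example frog sightings [dataset].",
)
print(json.dumps(metadata, indent=2))
```

Keeping the citation in the same record as the license is a deliberate choice here: the rights are waived, but the request for acknowledgement travels with the data.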
Perhaps this all seems very highbrow and academic; perhaps you wonder why this may matter to anyone other than scholarly researchers and copyright lawyers. However, open access can have an impact on all of our lives; it is already changing the resources we have in education. Coinciding with Open Access Week, you may have seen the media coverage about “Dinosaur Joe”, the most complete known specimen of a Parasaurolophus. The news and video you saw most likely mentioned that the credit for its discovery goes to a high school student, which is certainly a worthy news item. However, there’s more to Joe: he is, in fact, the world’s most open access dinosaur. Every piece of scan data the research team collected (every CT scan, every computer model, all the 3D data) is free for all to use. The same resources that were available to the authors of the paper about Joe are available to anyone: to look at, to test the paper’s conclusions, even to recreate the fossil with a 3D printer. While you could take a trip to southern California to see Joe at the Raymond M. Alf Museum of Paleontology, a student anywhere in the world could visit the data instead, and investigate it in far more detail than the museum display allows. Open access, along with the collaboration opportunities the modern Web offers, can advance science and education like never before. Every dataset you publish, no matter how small or incomplete or imperfect, is a wheel someone else doesn’t have to reinvent, and future scientists will have greater opportunities to stand on the shoulders of giants.
Image credits: Open Access logo from openaccessweek.org (CC-BY 2.0). Big Data image from DARPA; as a work of the U.S. federal government, the image is in the public domain. Skeleton of Parasaurolophus from Farke et al. (2013), open access via PeerJ.