Data deposit is the new, shiny thing in scholarly communications. Except that no researcher actually wants to do it. This is despite a general consensus, at least within the sciences and social sciences, that depositing one’s data in a discipline-specific or institutional repository is the way to accomplish three very important things: a) to enable replication of results, b) to advance the state of research, in part by allowing outside researchers to use the same data to answer a different set of questions, and c) to make available to the public the information that it funded through taxes. In spite of these worthy goals, researchers don’t want to share or deposit their data for the simple reason that they have no incentive to do it.
Imagine that you are a scientist of some sort. You’re busy doing a million things, and these are a million things that you’ve chosen to prioritize, things that advance your objectives in some way; perhaps the activities dovetail with your drive for tenure, application for grant money, or ambition for some other professional recognition. Why would you drop one of these things to work on data management and data sharing issues instead? Your grant agency may say that you’re encouraged or even required to do it, but unless the agency either rewards you for actually sharing your data or penalizes you for failing to adhere to whatever standards it sets, those are just words on a page, something for you to pay some lip service to but hardly an activity to which you would devote any more time or money than you absolutely must. This isn’t criminal or amoral in some way. This is just the reality of how busy people with lots of things on their plates behave; it’s the professional equivalent of not watering your lawn because you have other, higher-valued things to be doing, even though a well-watered, nicer lawn would increase property values for the entire neighborhood.
All that is prologue. What follows is one way that I think this problem can be rectified. I propose that institutional repositories set a price for people who want to reuse data deposited in their care. This money, plus a substantial matching amount, is then paid to the researchers whose deposited data are being used. This way, scientists who have valuable data that a lot of people want to reuse will get a large monetary reward for making their data available. They will be properly incentivized to take the time to publish their data. The same logic works in reverse for scientists whose data are not in demand: they will get the signal not to put effort into making available data that other people do not want anyway.
In essence, what I’m proposing is a market for data, a market in which the production and consumption of data are subsidized by the matching grant accompanying the price paid for the use of such data. The subsidy is provided so that the price of data can stay low enough not to deter empirical research, while the final amount paid to the data depositor (price + matching amount) is still substantial enough to entice them to work on data publication instead of the next highest-valued task. The price is kept above zero for the purpose of “separating the wheat from the chaff”. That is, the price people are willing to pay for the data indicates how much they value the data, whereas a zero price may lead to indiscriminate downloading of the data without much thought for their potential value.
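To make the arithmetic concrete, here is a minimal sketch of how a repository might compute what it owes each depositor. Everything in it is an assumption made for illustration: the proposal above does not say how the matching amount is calculated, so the sketch assumes it is a fixed multiple of the price (`match_rate`), and the function names, depositor names, and dollar figures are all hypothetical.

```python
from collections import defaultdict


def depositor_payout(price, match_rate):
    """Amount paid to the depositor for one reuse: price + matching amount.

    Assumption: the matching amount is proportional to the price
    (match_rate = 2.0 means the funder adds twice the price paid).
    """
    if price < 0 or match_rate < 0:
        raise ValueError("price and match_rate must be non-negative")
    return price + price * match_rate


def settle(purchases, match_rate):
    """Total payout per depositor for a list of (depositor, price) purchases."""
    totals = defaultdict(float)
    for depositor, price in purchases:
        totals[depositor] += depositor_payout(price, match_rate)
    return dict(totals)


# Hypothetical example: two reuses of one dataset, one reuse of another.
purchases = [("alice", 20.0), ("alice", 20.0), ("bob", 5.0)]
payouts = settle(purchases, match_rate=2.0)
print(payouts)
```

Under these made-up numbers, the depositor whose data are reused twice at a price of 20 collects far more than the one whose data are reused once at a price of 5, which is exactly the signal the market is meant to send.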
This is a basic framework that, if implemented (ha!), will require more thought to resolve issues that will no doubt arise. (For instance, what price should be charged for a dataset? Should it be a uniform price, or one that varies by a multitude of factors: number of observations, rarity of similar data, size of the field in which the data were produced, etc.?) On the flip side, more bells and whistles can also be added: samples of data available for free to help researchers ascertain the value of the data, discounts for student-researchers, etc.