Scholarly Communications: Data Sharing in the Humanities

Scholarly communications is a field that concerns itself with the dissemination of academic research. (It’s also a field in which I’m very much hoping to start a career.) Currently, the biggest challenges within the system of scholarly communications are the business models of commercial publishers, which the research community increasingly views as unsustainable, and the legal issues surrounding the intellectual property in the care of libraries. A third area of scholarly communications, and one that is seeing a lot of exciting change, is data sharing.

The reasons for sharing data are myriad, but put briefly, data readily available for re-use enables everything from the verification and replication of existing results to the advancement of the research frontier, by allowing people to ask and answer new questions. The latest push to share one’s data is coming from funding agencies such as the National Science Foundation, which now conditions its grants on researchers putting in place plans for “data management and sharing of the products of research”.

Institutions like the NSF, the Wellcome Trust, and the Protein Data Bank, along with prestigious scientific journals that mandate or encourage the release of data and other research inputs and outputs, have made an impact in medicine, the natural sciences, and the social sciences. In contrast, the humanities have not engaged in any systematic effort to disseminate the raw building blocks of their research.

I got a first-hand account of how humanities researchers view data sharing when a graduating Ph.D. student from the UC history department came to talk to our class. During her presentation to a room of twenty people, the historian made the following statements:

1) She will never share her data, e.g. the notes that she made on the primary sources that go into supporting her analysis. Not now, not after she gets tenure, not ever.

2) Nobody in her field shares data.

3) Peer review is done by reading the final product, sans examination of intermediary materials (since none will be provided), and relying on the reputation of the researcher or the researcher’s adviser to vouch for the credibility of what one is reading.

4) Reasons for never sharing data include “my data are what I bring to a job; I won’t get hired unless these are kept proprietary and secret” and, more incredibly, “if I share my data, somebody may use the data to go completely against my own research”.

I’m completely stunned by the proud utterance of these trust-destroying statements. Unless I misunderstood, or the comments are not representative of how the discipline works, this researcher has expressed the most anti-science sentiment I’ve ever heard within the halls of higher education. By “science”, I do not mean work carried out with beakers and test tubes, but rather the accretion of human knowledge by systematic study. By “science”, I mean the process by which one scholar builds on the works of those who came before her and leaves behind what she learned for future generations.

Science is, therefore, the antithesis of a scheme in which the research community relies mainly on the reputation of an academic adviser to evaluate the advisee’s work. A scheme in which one’s work is deemed of value because one’s adviser is reputable implies, by deduction, that one’s adviser is reputable because one’s adviser’s adviser has vouched for him, and he, in turn, has been deemed trustworthy based on the reputation of his predecessor, and so on and so forth ad nauseam. Whatever the appeal of such monarchical inheritance, such a scheme isn’t science! Science is the antithesis of “it’s turtles all the way down”!

Science is a process by which actualities get sieved out of plausibilities, in which truths are ferreted out of hypotheses, in which dead ends and wrong turns are corrected by probing eyes and inquisitive minds. Science more easily delivers these desirable outcomes when scientists do not actively obstruct its progress by, for example, refusing to make their data public because then “others can go completely against [my own] research”, i.e. “prove me wrong”. Although science has mechanisms in place to avoid tragic farces like the Sokal affair, it works better when its practitioners do not hold only strategic concerns in mind (How can I get hired? How do I make sure that I’m the one to publish this paper?), or at least manage to align their strategic concerns with good research. Science works better when scholars realize that they should, at the very least, “talk the talk” rather than boldly proclaim that they are doing their best to hinder its progress by making it as difficult as possible for somebody else to uncover their mistakes. Science works better still if its practitioners mean what they say.

Or, did I misunderstand?

