Cornell University

Data Sharing: An Important Step in the Scientific Method

Chair: Chuck Humphrey
Organization: University of Alberta
Concurrent: C2
Location: Hotel School Auditorium, Statler
Date: Wednesday June 2nd, 3:45-5:15pm

Copyright and "Facts": Issues in Licensing and Redistribution for Social Science Data Professionals Presentation

Presenter: San Cannon
Organization: Federal Reserve Board
Abstract:
While data and statistics have always been the backbone of empirical research, they are now important intellectual property even outside the halls of academia. Everything from the global financial system to shipping gifts for the holidays depends on data: in today’s digital world, information and data are crucial commodities. But often there are strings attached to the access, usage and reporting of data, even if the data are “free.” Researchers may compile a dataset, but what rights do they have for the use of those data? This paper will outline some of the issues and considerations of which data professionals in the social sciences need to be aware. Copyright, licensing, redistribution and intellectual property rights are now important issues that data users, and those who support them, need to understand early on in the research process.


Reproducibility of Computational Results: Opening Code and Data Presentation

Presenter: Victoria Stodden
Organization: Yale Law School
Abstract:
Scientific computation is emerging as absolutely central to the scientific method, but the prevalence of very relaxed practices is leading to a credibility crisis. Reproducible computational research, in which all details of computations (code and data) are made conveniently available to others, is a necessary response to this crisis. Questions emerge regarding scientists' incentives and motivations to share. This talk presents results from a survey of computational scientists to determine the factors that facilitate code and data sharing and those that create barriers. One major result finds that sharing is done for reasons other than direct personal gain, but when scientists choose not to reveal data or code, this is due to perceived personal impact. A second major finding is the prominence of Intellectual Property concerns with regard to not sharing code and data. Solutions to the various barriers are discussed, including how the "Reproducible Research Standard" (Stodden 2008), which proposes a licensing structure consonant with scientific norms, can encourage open sharing in scientific research.


Barriers to Data Sharing: New Evidence from a US Survey Presentation

Presenter: Amy Pienta
Organization: ICPSR
Co-Authors: George Alter and Jared Lyle, ICPSR
Abstract:
Recent studies demonstrate that the majority of social science data is not preserved or shared through social science data archives and other formal archival arrangements. This motivates further investigation into the various ways researchers share their data (including more “informal” data sharing) and the factors that underlie their data sharing behavior. We developed a survey to collect information from principal investigators (PIs) of federally funded research grants in the US about their experiences with data sharing (n=1,021). We also collected information about various factors that might be related to data sharing behavior, including normative data sharing practices in their discipline, perceived barriers to data sharing, rank/tenure, institutional type, gender and so on. We find that while only 12% of the PIs have archived their data, 45% have shared their data outside the immediate research team. Being in a discipline that favors data sharing is positively associated with the likelihood that a PI shares his or her research data. Perceived barriers to data sharing reduce the likelihood that a PI shares data. Other factors associated with data sharing include rank/tenure status and duration of the grant. Implications for data archives are also discussed.