CISER Data & Reproduction Archive Collection Policy

Policy Volume: DA
Responsible Executive: Senior Data Librarian
Responsible Office: Cornell Center for Social Sciences
Revised: 2017-10-20; 2020-09-16

POLICY STATEMENT

The CISER Data & Reproduction Archive is integral to CCSS Research Support’s overall mission to anticipate and support the evolving data needs of Cornell social scientists and economists throughout the entire research process and data lifecycle. The goals of the archive are to make social science data available to researchers while also making data more Findable, Accessible, Interoperable, and Reusable (see the FAIR Data Principles). Toward those goals, archive staff can help researchers appraise, deposit, publish, make accessible, and preserve both research and secondary social science data. We also maintain a custom-built catalog to hold and preserve archive metadata along with our Results Reproduction (R-Squared) packages.

Background

The CISER Data Archive was established in 1982 to be a centralized information center and clearinghouse for the acquisition and storage of, and access to, machine-readable data for Cornell social science researchers. The archive’s content has changed considerably over the years. We started as a clearinghouse for ICPSR, Roper, and government data, making files more accessible and usable for researchers. In more recent years, we have focused on adding to data collections in particular areas, collecting research data from individual researchers both at Cornell and beyond, and archiving our Results Reproduction packages. Since the mid-2010s, most studies have been assigned DOIs, and the archive catalog has received many additional improvements, such as added search fields and expanded categories and keywords. In 2014 the archive earned the Data Seal of Approval, followed by the CoreTrustSeal. Also in 2014, the first Census download center was added to the archive, and in 2016 the first Results Reproduction code and data packages were added. In 2019-2020, the archive underwent a major upgrade that further expanded the download permissions system, added DDI-compliant metadata, and made studies more discoverable to external researchers through Google Dataset Search. In 2020 the archive was rebranded as the CISER Data & Reproduction Archive.

Users

Core clientele for the data archive are Cornell faculty, graduate students, and research staff in schools and colleges that support CCSS Research Support, namely:

  • College of Agriculture and Life Sciences
  • College of Arts and Sciences
  • College of Human Ecology
  • SC Johnson College of Business
  • School of Industrial and Labor Relations

CCSS also supports users from other Cornell schools whose interdisciplinary researchers need social science data. Finally, publicly available data studies and most R2 packages support the broader research community outside of Cornell.

Data in Scope

CCSS Research Support houses an extensive collection of research data files in the social sciences, with particular emphasis on data that match the interests of Cornell researchers. CCSS intentionally uses a broad definition of social sciences in recognition of the interdisciplinary nature of Cornell research. Our archive includes data studies that fall into three broad access categories: available to the public, available to the Cornell community, and restricted data available through various application processes.

CCSS Research Support collects and maintains digital research data files in the social sciences, with a current emphasis on Cornell-based social science research, Results Reproduction packages, and potentially at-risk datasets. Our archive historically has focused on a broad range of social science data including data on demography, economics and labor, political and social behavior, family life, and health.

The collection includes, but is not limited to, federal or state censuses, files based on administrative records, public opinion surveys, economic and social data from national and international organizations, and studies compiled by Cornell researchers.

The CISER Data & Reproduction Archive acquires or accepts data for any geographic area. The historical collection focused on data related to New York State and the United States with some international datasets as well.

High Priority Data

Considering the changing landscape for data archives and the ease of access of many data sources, CCSS has refocused its collection priorities to the following areas:

  • Demand – Datasets and studies required by CU researchers and unavailable elsewhere.
  • Results Reproduction packages
  • At-Risk Social Science Data – We are committed to seeking out and acquiring datasets and studies available from social science researchers and unavailable elsewhere. We are particularly interested in political conflict and political behavior data.
  • International Data
  • Local NYS and Cornell research data

Data NOT in Scope

The following general criteria describe data that are out of scope for CCSS Research Support’s data collection. Even so, CCSS reserves the right to archive any dataset that we believe will be useful to Cornell researchers.

  • Non-social-science data/science data
  • Data with prohibitive costs
  • Data within proprietary software or subscription databases
  • Data availability: Unless data studies are part of the legacy collection for CU researchers or purchased for CUL or other entities, data available in a trusted repository will not be archived.
  • Direct identifiers: The CISER Data & Reproduction Archive will not accept data that contain personal identifiers, except where these data are part of the public record. Datasets held in the archive are primarily public-use versions. Our consultants can assist in de-identifying datasets for public use. For restricted-access and limited-use data products, CRADC provides secure access.
  • Copyright: CCSS only accepts data for which we have the right to curate, disseminate, and preserve a copy.
  • The CISER Data & Reproduction Archive reserves the right to reject datasets that are deemed to be inadequately documented, potentially disclosive, acquired or generated illegally, or suspected or known to contain inaccuracies.

Data Curation

Process: New additions to the CISER Data & Reproduction Archive follow an internally modified version of the Data Curation Network CURATED Workflow, which involves the following steps: Check files/code, Understand the data, Request missing information, Augment metadata, Transform file formats for reuse, Evaluate for FAIRness, and Document curation activities. [ 1 ]
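
As an illustration only, the CURATED steps named above can be sketched as a simple per-deposit checklist. The class, method, and field names below are hypothetical; they do not describe the archive's actual curation system or its internal modifications to the workflow.

```python
from dataclasses import dataclass, field

# The seven CURATED steps, in order, as named in the policy text.
CURATED_STEPS = [
    ("C", "Check files/code"),
    ("U", "Understand the data"),
    ("R", "Request missing information"),
    ("A", "Augment metadata"),
    ("T", "Transform file formats for reuse"),
    ("E", "Evaluate for FAIRness"),
    ("D", "Document curation activities"),
]


@dataclass
class CurationRecord:
    """Hypothetical checklist for one deposited study (illustrative only)."""

    study_title: str
    completed: set = field(default_factory=set)

    def complete(self, letter: str) -> None:
        """Mark one step as done, identified by its CURATED letter."""
        valid = {step_letter for step_letter, _ in CURATED_STEPS}
        if letter not in valid:
            raise ValueError(f"Unknown CURATED step: {letter!r}")
        self.completed.add(letter)

    def remaining(self) -> list:
        """Return the names of steps not yet completed, in workflow order."""
        return [name for letter, name in CURATED_STEPS
                if letter not in self.completed]

    def is_curated(self) -> bool:
        """True once every step has been completed."""
        return not self.remaining()


# Example usage with a made-up study title.
record = CurationRecord("Example deposited study")
record.complete("C")
record.complete("U")
print(record.remaining())
```

A checklist of this kind only tracks which steps have been carried out; the judgment involved in each step remains with curation staff.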

Documentation: Where possible, data studies are accompanied by comprehensive documentation: codebooks, file layout maps, technical notes, questionnaires, reports, and errata in open and accessible formats. Non-digital documentation is often available when machine-readable documentation is not. Where documentation is insufficient, CCSS works with data producers to generate additional contextual information so that data files are usable and understandable.

File formats: CCSS prefers file formats in the Library of Congress (LoC) list of recommended formats. These formats are commonly used within the social science and economics domain, have open specifications, and are independent of specific software, developers, or suppliers. CCSS will, however, accept data regardless of physical format as long as they are convertible to supported and accessible file formats suited for long-term preservation and for use by the entire Cornell community.

[ 1 ] Data Curation Network. “The DCN Curation Workflow.” https://datacurationnetwork.org/resources/workflows/. Accessed 27 Aug 2020.

Purchasing Data

Data acquisition is primarily demand driven. The CISER Data & Reproduction Archive will attempt to acquire any set of data required by faculty members in accordance with organizational policies regarding cost, quality, restrictions, and expected future use by a broad constituency of social science and economics users. Using the same criteria, data are also acquired for students of those faculty who are engaged in substantive social science or economic research. Proactive collection development is also undertaken in anticipation of demand.

CCSS makes efforts to confirm that data were collected in accordance with the legal and ethical criteria in place at the time and place of their collection, especially review by Ethical or Institutional Review Boards (IRBs). Where this information is unavailable, the professional judgement of the Data Librarian and the Director will be used to decide on the inclusion of such data, taking into account the relative risk (usually low) associated with the data.

Due to contractual agreements between Cornell University and the Inter-university Consortium for Political and Social Research (ICPSR), the Qualitative Data Repository (QDR), and the Roper Center for Public Opinion Research, members of the Cornell community are entitled to obtain any of the data offerings of the Consortium, the Repository, and the Center. The CISER Data & Reproduction Archive serves any and all members of the Cornell community in terms of data acquisitions from the Consortium, regardless of subject area.

When a data request is initiated by an individual, the requester will be asked to provide the staff with a description of the data, written justification for the purchase of the file, and a cost estimate for data acquisition. Requests are evaluated on likely usage, fit with our mission and scope, and price. It may be recommended that the requester go directly to another funding source, such as their own department, the library, or another agency, or cooperate in pooling resources.

The CISER Data & Reproduction Archive works with Library Collection Development staff, faculty, and departments to secure full or matching funding, especially in cases where a dataset has a potential audience representing more than one academic department. CCSS also collaborates with Cornell libraries and other information services at Cornell to ensure that collection content and access are not duplicated, so long as CCSS clients can use data and material from those units with reasonable effort. When acquiring material, the archive must consider not only content but also format and delivery criteria to fulfill its mission and meet the needs of its clientele.

POLICY REVIEW PROCESS

CCSS will review these policies every three years in conjunction with the CoreTrustSeal certification process or any future certification process.

CONTACTS

If you have questions about specific issues regarding this policy, contact the following CCSS Research Support Staff: