Data Archive Collection Policy

Policy Volume: DA
Responsible Executive: CISER Senior Data Librarian
Responsible Office: Cornell Institute for Social and Economic Research
Revised: 2017-10-20; 2020-09-16

Policy Statement

The CISER Data Archive is integral to CISER’s overall mission to anticipate and support the evolving data needs of Cornell social scientists and economists throughout the entire research process and data lifecycle. The goals of the CISER Data Archive are to make social science data available to researchers while also making those data more Findable, Accessible, Interoperable, and Reusable (see the FAIR Data Principles). Toward those goals, CISER Archive staff can help researchers appraise, deposit, publish, make accessible, and preserve both research and secondary social science data. We also feature a custom-built Data & Reproduction Archive to maintain and preserve data and our Results Reproduction (R-Squared) packages.

Background

The CISER Data Archive was established in 1982 to be a centralized information center and clearinghouse for the acquisition and storage of, and access to, machine-readable data for Cornell social science researchers. The archive’s content has changed considerably over the years. We originally started as a clearinghouse for ICPSR, Roper, and government data, making files more accessible and usable for researchers. In more recent years, we have focused on adding to data collections in particular areas, collecting research data from individual researchers both at Cornell and beyond, and archiving our Results Reproduction packages. Since the mid-2010s, most studies have been assigned DOIs, and the archive catalog has gone through many additional improvements, such as added search fields and expanded categories and keywords. In 2014, the archive earned the Data Seal of Approval, followed by the CoreTrustSeal. Also in 2014, the first Census download center was added to the archive, and in 2016 the first Results Reproduction code and data packages were added. In 2019-2020, the archive underwent a major upgrade, further expanding the download permissions system, adding more DDI-compliant metadata, and making studies more discoverable to external researchers through Google Dataset Search. In 2020, the archive was rebranded as the CISER Data and Reproduction Archive.

Users

Core clientele for the data archive are Cornell faculty, graduate students, and research staff in schools and colleges that support CISER, namely:

  • College of Agriculture and Life Sciences
  • College of Arts and Sciences
  • College of Human Ecology
  • SC Johnson College of Business
  • School of Industrial and Labor Relations

CISER also supports users from other Cornell schools whose interdisciplinary researchers need social science data. Finally, our publicly available data studies and most Results Reproduction (R-Squared) packages support the broader research community outside of Cornell.

Data in scope

CISER houses an extensive collection of research data files in the social sciences, with particular emphasis on data that match the interests of Cornell researchers. CISER intentionally uses a broad definition of social sciences in recognition of the interdisciplinary nature of Cornell research. Our archive includes data studies that fall into three broad access categories: available to the public, available to the Cornell community, and restricted data available through various application processes.

CISER collects and maintains digital research data files in the social sciences, with a current emphasis on Cornell-based social science research, Results Reproduction packages, and potentially at-risk datasets. Our archive historically has focused on a broad range of social science data including data on demography, economics and labor, political and social behavior, family life, and health.

The collection includes, but is not limited to, federal or state censuses, files based on administrative records, public opinion surveys, economic and social data from national and international organizations, and studies compiled by Cornell researchers.

The Data Archive acquires or accepts data for any geographic area. The historical collection focused on data related to New York State and the United States with some international datasets as well.

High priority data

Considering the changing landscape for data archives and the ease of access of many data sources, CISER has refocused its collection priorities to the following areas:

  • Demand – Datasets and studies required by CU researchers and unavailable elsewhere.
  • Results Reproduction packages
  • At-Risk Social Science Data – We are committed to seeking out and acquiring datasets and studies available from social science researchers and unavailable elsewhere. We are particularly interested in political conflict and political behavior data.
  • International Data
  • Local NYS and Cornell research data

Data not in scope

The following general criteria describe data that are out of scope for CISER’s collection. Nevertheless, CISER reserves the right to archive any dataset that we believe will be useful to Cornell researchers.

  • Data outside the social sciences
  • Data with prohibitive costs
  • Data within proprietary software or subscription databases
  • Data availability: Data already available in a trusted repository will not be archived, unless the data studies are part of the legacy collection for CU researchers or were purchased for CUL or other entities.
  • Direct identifiers: The CISER Data Archive will not accept data that contain personal identifiers, except where these data are part of the public record. Datasets held in the archive are primarily public-use versions. Our CISER consultants can assist in de-identifying datasets for public use. For restricted-access and limited-use data products, CRADC provides secure access.
  • Copyright: CISER only accepts data for which we have the right to curate, disseminate, and preserve a copy.
  • The CISER Data and Reproduction Archive reserves the right to reject datasets that are deemed to be inadequately documented, potentially disclosive, acquired or generated illegally, or suspected or known to contain inaccuracies.

Data curation

Process: New additions to the CISER Data & Reproduction Archive follow an internally modified version of the Data Curation Network CURATED Workflow, which involves the following steps: Check files/code, Understand the data, Request missing information, Augment metadata, Transform file formats for reuse, Evaluate for FAIRness, and Document curation activities. [ 1 ]
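
The steps above can be tracked as a simple checklist. The following is a minimal illustrative sketch only, not CISER's actual tooling: the step names come from the workflow described above, while the `CurationRecord` class, its methods, and the study identifier are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field

# Step names are taken from the policy text above.
# Everything else in this sketch is a hypothetical illustration.
CURATED_STEPS = (
    "Check files/code",
    "Understand the data",
    "Request missing information",
    "Augment metadata",
    "Transform file formats for reuse",
    "Evaluate for FAIRness",
    "Document curation activities",
)


@dataclass
class CurationRecord:
    """Tracks which CURATED steps have been completed for one study."""

    study_id: str  # hypothetical identifier for the study being curated
    completed: dict = field(default_factory=dict)  # step name -> curator note

    def complete(self, step: str, note: str) -> None:
        """Mark a step as done and store a short note about what was done."""
        if step not in CURATED_STEPS:
            raise ValueError(f"Unknown CURATED step: {step!r}")
        self.completed[step] = note

    def remaining(self) -> list:
        """Return the steps not yet completed, in workflow order."""
        return [s for s in CURATED_STEPS if s not in self.completed]


record = CurationRecord(study_id="example-study")
record.complete("Check files/code", "All files open; code runs without errors.")
print(record.remaining())
```

Storing a note with each completed step means the final step, documenting curation activities, can draw on a record that accumulates as the work proceeds.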

Documentation: Where possible, data studies are accompanied by comprehensive documentation: codebooks, file layout maps, technical notes, questionnaires, reports, and errata in open and accessible formats. Non-digital documentation is often available when machine-readable documentation is not. In cases where documentation is insufficient, CISER works with data producers to ensure that data files are usable and understandable by generating additional contextual information.

File formats: CISER prefers file formats in the Library of Congress (LoC) list of recommended formats. These formats are commonly used within the social science and economics domains, have open specifications, and are independent of specific software, developers, or suppliers. CISER will, however, accept data regardless of physical format as long as they are convertible to supported and accessible file formats suited for long-term preservation and for use by the entire Cornell community.

Purchasing data

Data acquisition is primarily demand-driven. The Data Archive will attempt to acquire any set of data required by faculty members in accordance with organizational policies regarding cost, quality, restrictions, and expected future use by a broad constituency of social science and economics users. Using the same criteria, data are also acquired for students of those faculty who are engaged in substantive social science or economic research. Proactive collection development is also undertaken in anticipation of demand.

CISER makes efforts to confirm that data were collected in accordance with the legal and ethical criteria in place at the time and place of their collection, especially review by Ethical or Institutional Review Boards (IRB). Where this information is unavailable, the professional judgement of the Data Librarian and the Director will be used to decide on the inclusion of such data, taking into account the relative risk (usually low) associated with the data.

Due to contractual agreements between Cornell University and the Inter-university Consortium for Political and Social Research (ICPSR), the Qualitative Data Repository (QDR), and the Roper Center for Public Opinion Research, members of the Cornell community are entitled to obtain any of the data offerings of the Consortium, the Repository, and the Center. The CISER Data Archive serves all members of the Cornell community for data acquisitions from the Consortium, regardless of subject area.

When an individual initiates a data request, the requester will be asked to provide the staff with a description of the data, a written justification for the purchase of the file, and a cost estimate for data acquisition. Requests are evaluated on likely usage, how well the purchase fits our mission and scope, and price. CISER may recommend that the requester go directly to another funding source, such as their own department, the library, or another agency, or cooperate in pooling resources.

The Data Archive works with Library Collection Development staff, faculty, and departments to secure full or matching funding, especially in cases where a dataset has a potential audience representing more than one academic department. The Data Archive also collaborates with Cornell libraries and other information services at Cornell to ensure that collection content and access are not duplicated, so long as CISER clients can use data and material from those units with reasonable effort. When acquiring material, the Archive must consider not only content but also format and delivery criteria to fulfill its mission and meet the needs of its clientele.

Policy review process

CISER will review these policies every three years in conjunction with the CoreTrustSeal certification process or any future certification process.

Contacts

If you have questions about specific issues regarding this Data Collection Policy, contact the following CISER Staff:

Responsibilities

The following are major responsibilities each party has in connection with this policy.

CISER Director / CISER Data Librarian – Interpret this policy, provide clarification and education, and implement operational and business processes to facilitate compliance.

Any employee found to have violated this policy may be subject to disciplinary action, up to and including termination of employment.


References

[ 1 ] Data Curation Network. “The DCN Curation Workflow.” https://datacurationnetwork.org/resources/workflows/. Accessed 27 August 2020.