Data Archive Preservation and Storage Policy

Policy Volume: DA
Responsible Executive: CISER Senior Data Librarian
Responsible Office: Cornell Institute for Social and Economic Research
Revised: 2014-04-03; 2020-11-05

POLICY STATEMENT

The data preservation function is integrated into the operations and planning of CISER and throughout the management stages of the research data lifecycle in order to support social science and economic research at Cornell University.

REASON FOR POLICY

The fundamental purpose of CISER’s Data Archive is to select, preserve and make available for use primary and secondary data, documentation and metadata in discipline-recognized digital formats that remain suitable for research in perpetuity. The data preservation and storage policy is guided by a variety of community-driven standards (e.g., the Open Archival Information System (OAIS) reference model, Trustworthy Repositories Audit and Certification (TRAC), CoreTrustSeal (CTS), and the Data Documentation Initiative (DDI)) that represent an international body of knowledge and expertise pertaining to various issues within digital preservation.

POLICY GUIDELINES

These guidelines address the effective implementation of procedures for the preservation of CISER’s digital collections within the context of the CISER Data Archive Collection Policy. CISER reserves the right to review the scholarly and historical value of its holdings, and their accessibility to users, when determining how they are preserved.

Data Integrity

Upon receipt of new digital content, Archive staff process the data and documentation, confirm that confidentiality concerns are addressed, fix errors where necessary in collaboration with the data producer, convert data formats, and run a checksum. The metadata pertaining to each data file are stored in a SQL database. (A backup of the SQL database is taken every evening and is retained for a finite period.) Provenance notes, which relate back to the original deposited version, are maintained as part of the metadata for any alterations made in the preservation and dissemination versions.

To ensure that the digital content remains unchanged and accessible, automated tasks are run to verify checksums. The results are compared to the metadata held within the SQL database to validate data integrity. If degradation of any digital content is detected, CISER will endeavor to restore the original version from a backup copy.
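The checksum workflow described in this section can be illustrated with a minimal sketch. It is not CISER's implementation: the policy does not name a checksum algorithm or a database product, so SHA-256, SQLite, and the table and column names (file_metadata, path, sha256) are all assumptions chosen for illustration.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks.

    SHA-256 is an assumption; the policy does not specify the algorithm.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(conn: sqlite3.Connection, path: Path) -> None:
    """Store a file's checksum at ingest (hypothetical table layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_metadata "
        "(path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO file_metadata (path, sha256) VALUES (?, ?)",
        (str(path), sha256_of(path)),
    )
    conn.commit()


def find_integrity_failures(conn: sqlite3.Connection) -> List[str]:
    """Recompute each checksum and list files that are missing or changed."""
    failures = []
    rows = conn.execute(
        "SELECT path, sha256 FROM file_metadata ORDER BY path"
    ).fetchall()
    for stored_path, stored_hash in rows:
        path = Path(stored_path)
        if not path.is_file() or sha256_of(path) != stored_hash:
            failures.append(stored_path)
    return failures
```

In a sketch like this, a scheduled task would call find_integrity_failures and treat any returned path as a candidate for restoration from a backup copy.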

Data Normalization

Evaluation of new content types and of software and format obsolescence is an ongoing process. When new formats become widely available, the Data Archive collection is expected to be normalized by migrating to the updated content types seamlessly. When new versions of data files are created, whether through migration into new file formats or through the creation of new formats for dissemination, the old files are retained alongside them. Version information is stored as part of the metadata, as described in the CISER Data Archive Versioning Policy.

Management of Storage Infrastructure

The preservation of the Data Archive is dependent upon CISER’s storage infrastructure. Thus, management of the storage infrastructure is designed to accommodate scalability, reliability, and sustainability, in accordance with quality control specifications and security regulations. In light of increasing user demand and changing technologies, CISER staff routinely monitor technical developments and evaluate potential archival solutions that will both streamline and enhance CISER data preservation practices.

Adequate storage capacity for all Data Archive holdings is maintained. In addition, unlimited capacity from external media is available. The disk storage maintains a RAID 6 configuration and all infrastructure is protected by uninterruptible power supplies (UPS).

All data are backed up daily via the University’s EZ-backup service, which also provides off-site storage. EZ-backup makes use of IBM’s Tivoli Storage Manager.

Security

CISER is committed to taking all necessary precautions to ensure the physical safety and security of the Data Archive holdings that it preserves. The storage infrastructure is housed in the University data center. The data center features uninterruptible power supplies (UPS), fire prevention and protection systems, physical intruder prevention and detection systems, and environmental control systems. In addition, the server racks that house CISER’s disk storage are equipped with unique keys.

Policy Review Process: CISER will review these policies every three years in conjunction with the CoreTrustSeal certification process or any future certification process.

Related Documents