Cornell University CISER

CISER Data Archive

Locating and Using Archive Data on the CISER Research Computing System




This page tells you how to find information about data in CISER's archive and how to locate individual files.   General information about what datasets CISER acquires is in our collection development policy (on the Cornell University Library web site).

If you can't locate a specific dataset or one that meets your need, please contact the data archivist for assistance.

Checking Archive Holdings

CISER's collection database allows you to identify a specific study and associated files:

  • Search our catalog; for example, by title or title words, principal investigator, issuing agency, or ICPSR number.  A compound search option is also available.  Consult the search hints for help with constructing a search query.
  • Browse our catalog by subject areas, including 19 broad categories and 57 subcategories.

Lists of recent additions to the archive on the CISER file server and on CD-ROM/DVD are also available.  

Accessing and Using Archive Data on File Server

Data files, most documentation, and other files (for example, SAS, SPSS, and Stata programs) are housed on the CISER file server. Some data require facility with a statistical software package that can  read and manipulate raw data files or software-specific datasets.  

There are several ways Cornell users can use or obtain data archive holdings:

From CISER's compute nodes (for those having CISER computing accounts)

Social science faculty, students, and research staff are eligible for accounts on our computing environment. CISER Computing Basics has helpful information on using the system.

The archive's holdings catalog lists the directory locations and files associated with each study.  See below for an example of how this information appears in the catalog.

Once you log onto a node, archive files are located in the U:\ArchiveData directory.  For example, these files comprise the World Values Surveys (CISER codebook SIND-071):

U:\ArchiveData\sind\071\da2790     data file
U:\ArchiveData\sind\071\cb2790     codebook
U:\ArchiveData\sind\071\sp2790     SPSS program
U:\ArchiveData\sind\071\sa2790     SAS program

Usually, you don't have to copy archive files to your own user space or a CISER computing node to use them.  Most documentation files can be viewed from their original location using Notepad, WordPad, or Adobe Acrobat Reader.  You can read raw data files from their original location within your program (using the infile statement in SAS, the file handle statement with its /name= subcommand in SPSS, or, for Stata-format datasets, the use command in Stata).  Here is a simple SAS program that extracts three variables from the World Values Survey data file:

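/* Assumes the libref mylib was assigned earlier with a libname statement. */
data mylib.world;
   infile 'u:\ArchiveData\sind\071\da2790' lrecl=352;
   /* Read three variables by column position. */
   input
      survey   1
      country  2-3
      religion 214-215;
run;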
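
For readers working outside SAS, the column positions in the input statement above translate directly into string slices. The following Python sketch is only an illustration: the helper name parse_record, the made-up sample record, and the assumption that each record is one line of text are not taken from the archive documentation.

```python
# Column positions copied from the SAS input statement (1-based, inclusive).
COLUMNS = {
    "survey": (1, 1),
    "country": (2, 3),
    "religion": (214, 215),
}


def parse_record(line):
    """Slice one fixed-width record into named fields (hypothetical helper)."""
    # SAS columns are 1-based and inclusive; Python slices are 0-based and
    # end-exclusive, so columns a-b become line[a - 1:b].
    return {name: line[start - 1:end].strip()
            for name, (start, end) in COLUMNS.items()}


# Made-up 352-character record, used only to demonstrate the slicing.
sample = "2" + "11" + " " * 210 + "04" + " " * 137
assert len(sample) == 352
record = parse_record(sample)
print(record)
```

To process a real file, open it in text mode and apply parse_record to each line. The values come back as strings, so convert them if you need numeric codes.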

 
SAS, SPSS, and Stata input programs can be copied to your own user space and edited, or you can cut and paste the needed sections as you write your program. 

Via a mapped drive to the CISER file server (for those having CISER computing accounts)

This method is handy for moving archive files to your local machine or to use software installed on your office or home desktop. Detailed information on how to map the CISER file server is provided elsewhere on the web site.

Downloads from catalog links (all Cornell users)

Current Cornell faculty, staff, and students can download most archive files to their own machines from links in our online catalog.  Files are downloaded in ZIP compressed format. Use WinZIP, Stuffit Expander, 7-Zip, or similar utilities to open them.
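
As an alternative to a desktop utility, a downloaded archive can be unpacked with a short script. The sketch below uses Python's standard zipfile module. The file names are placeholders, and the small archive it builds first merely stands in for a real catalog download.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return their names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    return names


# Build a tiny stand-in archive so the example runs on its own.
# "example_download.zip" and "layout.txt" are hypothetical names.
work = Path(tempfile.mkdtemp())
zip_path = work / "example_download.zip"
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("layout.txt", "placeholder contents\n")

extracted = unpack(zip_path, work / "unzipped")
print(extracted)
```

With a real download, pass the path of the saved ZIP file and the folder where the files should be placed.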

The example below is a non-working illustration of how file information appears in the catalog.  The information hyperlink icon takes you to the download page, which describes this service in more detail.

 

Current Population Survey, November 2003: Tobacco Use Supplement

U.S. Bureau of the Census  -- June 2006 -- Washington: The Bureau, 2006 [producer].   Note: Also known as the Tobacco Use Special Cessation Supplement. Co-sponsored by the National Cancer Institute and the Centers for Disease Control and Prevention.   Codebook: CPH-011(2003).

File Information: information hyperlink

Type of file   Directory \ File Name                          Records   LRECL    RECFM   Size / Size Zipped
Codebook       U:\ArchiveData\cph\011\cps_febjunnov_03.pdf    n/a       33,373   V       3 MB / 2 MB
File Layout    U:\ArchiveData\cph\011\cpsnov03.txt            6,128     53       V       168 KB / 32 KB
Data           U:\ArchiveData\cph\011\cpsnov03.dat            156,869   1,384    C       217 MB / 19 MB
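
The Records and LRECL values can serve as a rough check that a download is complete. If every record occupies exactly LRECL bytes with no line terminators (an assumption; the catalog excerpt does not define the RECFM codes), the unzipped size should be close to the number of records times the record length. A minimal Python sketch using the figures listed for the Data file:

```python
def expected_bytes(records, lrecl):
    """Estimate file size assuming every record is exactly lrecl bytes long."""
    return records * lrecl


# Figures listed for the Data file: 156,869 records, LRECL 1,384.
size = expected_bytes(156_869, 1_384)
print(size)                      # 217106696
print(round(size / 1_000_000))   # 217, in line with the listed 217 MB
```

A file that is much smaller than this estimate after unzipping may be truncated. Small differences can come from line terminators or from how the catalog rounds sizes.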

Finding Printed Documentation and Reports

The archive has printed documentation for most datasets: codebooks, data dictionaries, technical reports, and questionnaires.  These are arranged according to an in-house subject scheme.  You can browse our shelves, and you can use the online catalog to find documentation for which we have no print equivalent. Archive copies are intended for on-site use only. You might also find copies at Cornell University Library locations or on the internet.

Archive Data Files in Other Formats

We own selected datasets on CD-ROMs or DVDs. You can use the archive's catalog to find these.  The following example illustrates a study owned on CD-ROM:

County and City Data Book 2000

U.S. Bureau of the Census. Washington, DC: The Bureau, 2003 [producer]. Washington, DC: The Bureau, 2003 [distributor]. Files on CDROM#: 773.

Some may be borrowed for use outside the archive; others must be used in the archive on a public machine.  Please ask staff for help with locating these items.