What numeric file formats does the Archive own?
Numeric files come in a wide variety of formats. Most are ASCII text because it is reasonably "generic"; that is, it can be used across operating systems, computing platforms, and software. Some Archive files are formatted for use with a specific statistical package, although these files can be converted for use with other software.
ASCII Text Files
ASCII text can be column-aligned or character-delimited. Below is an example of column-aligned text: eight records from a dataset called the WABC-TV/New York Daily News Race Relations Poll, conducted in January 1988. Each line is one record containing a survey participant's responses to poll questions; in this dataset, each person's responses span two records.
00001 207023301 0000097 02
0000211311212211312111188855 009 42 01
00002 207023301 0000097 02
0000320311222121311111128112 005 41 01
00003 107023301 0000104 02
0000411323211221312112128821 013 12 01
00004 207023301 0000097 02
How do you find out what the numbers mean and how they correspond to answers to poll questions? A codebook gives you this information. For example, this is a question in the survey:
Who do you think is the most important black leader in New York City today? 1 (06-07)
| Laura Blackbourne | 01 |
| Calvin Butts | 02 |
| Mary Smith Campbell | 03 |
| David Dinkins | 04 |
| Herbert Daughtry | 05 |
| Hazel Dukes | 06 |
| Denny Farrell | 07 |
| Floyd Flake | 08 |
| Richard Green | 09 |
| Roy Innis | 10 |
| Jesse Jackson | 11 <== |
The notation 1 (06-07) tells us that the answers to this question are found in record one for each survey respondent, in columns 6 and 7. Therefore, we can see that three of the respondents (represented by the above records) answered "Jesse Jackson" (code 11) to this question.
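As an illustration, the codebook's column notation translates directly into string slicing. This Python sketch assumes that codebook columns are numbered from 1 and that a range such as "(06-07)" is inclusive; the function names are hypothetical, and the codebook dictionary copies only the codes listed above. It is applied to a single line copied from the example, so it shows only the column arithmetic, not how a full file's records are grouped by respondent.

```python
# Codes copied from the codebook question above; later codes are omitted.
MOST_IMPORTANT_LEADER = {
    "01": "Laura Blackbourne",
    "02": "Calvin Butts",
    "03": "Mary Smith Campbell",
    "04": "David Dinkins",
    "05": "Herbert Daughtry",
    "06": "Hazel Dukes",
    "07": "Denny Farrell",
    "08": "Floyd Flake",
    "09": "Richard Green",
    "10": "Roy Innis",
    "11": "Jesse Jackson",
}


def read_columns(record: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive) of a fixed-width record.

    Under that assumption, "(06-07)" becomes the Python slice record[5:7].
    """
    return record[first - 1:last]


def decode(record: str, first: int, last: int, codes: dict[str, str]) -> str:
    """Look up a field's value in a codebook, reporting unlisted codes as-is."""
    raw = read_columns(record, first, last)
    return codes.get(raw, f"unlisted code {raw!r}")


line = "0000411323211221312112128821 013 12 01"  # copied from the example above
print(read_columns(line, 6, 7))                   # 11
print(decode(line, 6, 7, MOST_IMPORTANT_LEADER))  # Jesse Jackson
```

Keeping the lookup in a separate dictionary mirrors the way a codebook is organized: the data file stores only codes, and the documentation supplies their meanings.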
ASCII files can also be comma- or tab-delimited. Here's an example of three records from the State Energy Data System Consumption Estimates file covering 1960 to 1995. These records are comma-delimited: commas separate the values.
MI,ESCCP,6380.97191,6771.73201,7285.98594,7865.462,8532.504,9124.053,9880.086
MI,ESICP,12481.514,12198.785,13976.818,15633.541,17130.008,19350.259,21525.311
MI,ESRCP,8727.583,9179.687,9550.697,9955.708,10573.206,11309.202,12325.638
This particular file can be loaded into spreadsheet, database, or statistical software that reads comma-delimited files (such as Excel, dBase, SAS, or SPSS), provided the number of fields or records does not exceed what the software can accommodate. You will also have to get the column headings (variable names) from the file's documentation.
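Outside a spreadsheet, Python's standard csv module reads the same records. In this minimal sketch the field names (state, series, values) are hypothetical placeholders, assuming from the example that the first two fields are a state code and a series code and the rest are numeric values; the real variable names, and the year each value belongs to, must come from the file's documentation.

```python
import csv
import io

# Three records copied from the State Energy Data System example above.
SAMPLE = """\
MI,ESCCP,6380.97191,6771.73201,7285.98594,7865.462,8532.504,9124.053,9880.086
MI,ESICP,12481.514,12198.785,13976.818,15633.541,17130.008,19350.259,21525.311
MI,ESRCP,8727.583,9179.687,9550.697,9955.708,10573.206,11309.202,12325.638
"""


def read_delimited(text: str, delimiter: str = ",") -> list[dict]:
    """Parse delimited records into dictionaries.

    The keys are hypothetical placeholders, not the file's documented
    variable names. Pass delimiter="\\t" for a tab-delimited file.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        state, series, *values = fields
        rows.append({
            "state": state,
            "series": series,
            "values": [float(v) for v in values],
        })
    return rows


for row in read_delimited(SAMPLE):
    print(row["state"], row["series"], len(row["values"]), row["values"][0])
```

To read the actual file rather than the pasted sample, open it with `open(path, newline="")` and pass the file object to `csv.reader` in place of the `io.StringIO` wrapper.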
Software-Specific Formats
Some of our files are formatted for use with a specific software package. For example, here is an excerpt from a SAS file:
A^PAPA^PA@B BUC0M- BdAM-0A^PA^PAPB^WA^PA^PA^PA^PA0A^PA^PA^PA^PA0A
A A A A^PA A A
A^PA^PA A A A
A^PA^PA0A A@A0A^PA^PAPA^PA^PB^WBVC^VBnAM-PA^PA^PAM-^@B0A^PA^PA^PA
^PA^PA^PA^PA0A^PA0A0A
A A0A^PA A^PA^PA^PA^PA A A A0A0A^PAPA^PA@B3BWC>pBnAM-^@A^P
A^PA@B?A0A0A A0A^PA^PA^PA^PA@A0A
A A A A A^PA A0A^PA^PA A A A A A A A A0A^PAPA^P
APB9BXC ^PBnAM-^PA^PA^PAM-^@B!A
A A A0A^PA^PA^PA0A0A A0A A^PA A^PA A A^PA A A A
^PA^PA^PA^PA A A^PA^PAPA^PA
B^UBYC^^M-PC^[0AM- A^PA^PAM-^@B^XA^PA^PA^PA A0A^PA0A
^PA0A A A^PA^PA A0A^PA0A^PA
A A A0A0A^PA^PA0A0A A A0A^PAPA^PA^PB^WBZC&pC^[0AM-^P
A^PA^PApB^SA^PA0A^PA
A A0A^PA0A0A A0A A0A A A A^PA A A^PA A A^PA A A^PA^PA^PA A
A A^PA^PAPA^PA^PAM-@B[C)@C^^M-PAM-`A^PA^PAM-^@BBA0A
A^PA^PA0A^PA A A A^PA0A A A
^PA A0A0A A A A0A^PA^PA^PA0A^PA^PAPA^PAPB^YB\BM-DC^^M-PAM-pA^PA^PB.A^PA^PA^PA0A
Reading this file requires SAS statistical software. We also have files formatted for use with SPSS, Excel, dBase, and Stata. How do you know when a data file is in a specialized format? The file information in the catalog database tells you; for example:
Type of File: SAS Transport file v6.08
Directory\Filename: U:\ArchiveData\econ\007\wave01.trn
Technote: USE CIMPORT
Logical Record Length (LRECL): 1775
Number of Records: 11966
Record Format (RECFM): V
Bytes (compressed): 118250
Bytes (uncompressed): 957280
If you have questions about the format of a file, ask Archive staff or a HelpDesk computing consultant.
Most Archive files formatted for SAS and SPSS are in a "transport" (SAS) or "portable" (SPSS) file format. These files can be copied safely for use on different operating systems or with different versions of the statistical software.
You can also use "translation" software to convert one file format to another. StatTransfer is available on the CISER Research Computing nodes, and some computing labs on campus have StatTransfer or DBMSCopy on selected machines.