Cornell University CISER

CISER Data Archive: Tutorial

What numeric file formats does the Archive own?

Numeric files come in a wide variety of formats. Most are ASCII text because ASCII is reasonably "generic": it can be used across operating systems, computing platforms, and software packages. Some archive files are formatted for use with a specific statistical software package, although these files can be converted for use with other software.

ASCII Text Files

ASCII text can be column-aligned or character-delimited. Below is an example of column-aligned text: 8 records from a dataset called the WABC-TV/New York Daily News Race Relations Poll, conducted in January 1988. Each line is one record holding a survey participant's responses to poll questions; in this dataset, each participant's responses take up two records.

0000111211212211311111888822                             007           42 01
00001           207023301                                         0000097 02

0000211311212211312111188855                             009           42 01
00002           207023301                                         0000097 02

0000320311222121311111128112                             005           41 01
00003           107023301                                         0000104 02

0000411323211221312112128821                             013           12 01
00004           207023301                                         0000097 02
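Because each respondent's answers span two records, a program reading this file first has to put each respondent's records back together. The sketch below does that for the eight records above. It assumes, based only on this excerpt, that columns 1-5 hold a respondent ID and the last two characters hold the record number (01 or 02); the codebook is the authority on the real layout, and the function name is hypothetical.

```python
from collections import defaultdict

# The eight records from the poll excerpt above (two records per respondent).
RECORDS = [
    "0000111211212211311111888822                             007           42 01",
    "00001           207023301                                         0000097 02",
    "0000211311212211312111188855                             009           42 01",
    "00002           207023301                                         0000097 02",
    "0000320311222121311111128112                             005           41 01",
    "00003           107023301                                         0000104 02",
    "0000411323211221312112128821                             013           12 01",
    "00004           207023301                                         0000097 02",
]


def group_by_respondent(lines):
    """Group fixed-width records by respondent (hypothetical helper).

    Assumption: columns 1-5 are the respondent ID and the last two
    characters are the record number, as the excerpt suggests.
    """
    respondents = defaultdict(dict)
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        respondent_id = line[0:5]       # columns 1-5 (Python slices are 0-based)
        record_number = int(line[-2:])  # "01" -> 1, "02" -> 2
        respondents[respondent_id][record_number] = line
    return dict(respondents)


people = group_by_respondent(RECORDS)
print(len(people))              # 4
print(sorted(people["00003"]))  # [1, 2]
```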

How do you find out what the numbers mean and how they correspond to answers to poll questions?  A codebook gives you this information.  For example, this is a question in the survey:

Who do you think is the most important black leader in New York City today?                   1 (06-07)

Laura Blackbourne 01 
Calvin Butts 02
Mary Smith Campbell 03
David Dinkins 04
Herbert Daughtry 05
Hazel Dukes 06
Denny Farrell 07 
Floyd Flake 08
Richard Green 09
Roy Innis 10
Jesse Jackson 11 <==


The notation 1 (06-07) tells us that the answer to this question is found in record 1 of each respondent's data, in columns 6 and 7. In the records above, those are the sixth and seventh characters of each line ending in 01. Three of the four respondents shown have the code 11 in those columns, so they answered "Jesse Jackson" to this question.
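The lookup just described can be sketched in a few lines. This is a minimal illustration, not Archive software: it copies the codebook excerpt shown above (which stops at code 11, so any other code is reported as missing from the excerpt), converts the codebook's 1-based, inclusive column numbers into a Python slice, and assumes columns 1-5 identify the respondent. The helper name extract_columns is hypothetical.

```python
# Codebook excerpt for the "most important black leader" question,
# found in record 1, columns 6-7.
CODEBOOK = {
    "01": "Laura Blackbourne", "02": "Calvin Butts",
    "03": "Mary Smith Campbell", "04": "David Dinkins",
    "05": "Herbert Daughtry", "06": "Hazel Dukes",
    "07": "Denny Farrell", "08": "Floyd Flake",
    "09": "Richard Green", "10": "Roy Innis",
    "11": "Jesse Jackson",
}

# Record 1 (the lines ending in 01) for each respondent in the excerpt.
FIRST_RECORDS = [
    "0000111211212211311111888822                             007           42 01",
    "0000211311212211312111188855                             009           42 01",
    "0000320311222121311111128112                             005           41 01",
    "0000411323211221312112128821                             013           12 01",
]


def extract_columns(line, start, end):
    """Return columns start..end, numbered 1-based and inclusive as in a codebook."""
    return line[start - 1:end]


answers = {}
for line in FIRST_RECORDS:
    respondent_id = extract_columns(line, 1, 5)  # assumption: columns 1-5 = ID
    code = extract_columns(line, 6, 7)
    answers[respondent_id] = CODEBOOK.get(code, f"code {code} (not in excerpt)")

for respondent_id, answer in answers.items():
    print(respondent_id, answer)
# 00001 Jesse Jackson
# 00002 Jesse Jackson
# 00003 code 20 (not in excerpt)
# 00004 Jesse Jackson
```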

ASCII files can also be comma- or tab-delimited. Here's an example of three records from the State Energy Data System Consumption Estimates file covering 1960 to 1995. The records are comma-delimited: each value is separated from the next by a comma.

MI,ESCCP,6380.97191,6771.73201,7285.98594,7865.462,8532.504,9124.053,9880.086
MI,ESICP,12481.514,12198.785,13976.818,15633.541,17130.008,19350.259,21525.311
MI,ESRCP,8727.583,9179.687,9550.697,9955.708,10573.206,11309.202,12325.638

This particular file can be loaded into software that reads comma-delimited files, such as Excel, dBase, SAS, or SPSS, provided the number of fields or records does not exceed what the software can accommodate. Because the file itself contains no column headings (variable names), you will have to get them from the file's documentation.
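As a rough illustration of reading comma-delimited data and attaching headings taken from documentation, here is a short sketch using Python's standard csv module. The headings are assumptions for illustration only: it treats the first field as a state code, the second as a series code, and every remaining value as one year's figure starting in 1960. Check the file's documentation before relying on that layout. The function name read_series is hypothetical.

```python
import csv
import io

# The three comma-delimited records shown above.
DATA = """MI,ESCCP,6380.97191,6771.73201,7285.98594,7865.462,8532.504,9124.053,9880.086
MI,ESICP,12481.514,12198.785,13976.818,15633.541,17130.008,19350.259,21525.311
MI,ESRCP,8727.583,9179.687,9550.697,9955.708,10573.206,11309.202,12325.638
"""

# Assumption (not stated in the file itself): values run one per year from 1960.
FIRST_YEAR = 1960


def read_series(text, first_year=FIRST_YEAR):
    """Map each (state, series) pair to a {year: value} dict (hypothetical helper)."""
    table = {}
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # csv.reader yields [] for blank lines
        state, series, *values = fields
        table[(state, series)] = {
            first_year + i: float(v) for i, v in enumerate(values)
        }
    return table


table = read_series(DATA)
print(table[("MI", "ESRCP")][1960])  # 8727.583
```

In a real workflow you would read the file with open(path, newline="") instead of an in-memory string, and take the headings from the codebook rather than from the year assumption above.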
 

Software-Specific Formats

Some of our files are formatted for use with a specific software package.  For example, here is an excerpt from a SAS file:

A^PAPA^PA@B BUC0M- BdAM-0A^PA^PAPB^WA^PA^PA^PA^PA0A^PA^PA^PA^PA0A A A A A^PA A A
 A^PA^PA A A A A^PA^PA0A A@A0A^PA^PAPA^PA^PB^WBVC^VBnAM-PA^PA^PAM-^@B0A^PA^PA^PA
^PA^PA^PA^PA0A^PA0A0A A A0A^PA A^PA^PA^PA^PA A A A0A0A^PAPA^PA@B3BWC>pBnAM-^@A^P
A^PA@B?A0A0A A0A^PA^PA^PA^PA@A0A A A A A A^PA A0A^PA^PA A A A A A A A A0A^PAPA^P
APB9BXC ^PBnAM-^PA^PA^PAM-^@B!A A A A0A^PA^PA^PA0A0A A0A A^PA A^PA A A^PA A A A
^PA^PA^PA^PA A A^PA^PAPA^PA B^UBYC^^M-PC^[0AM- A^PA^PAM-^@B^XA^PA^PA^PA A0A^PA0A
^PA0A A A^PA^PA A0A^PA0A^PA A A A0A0A^PA^PA0A0A A A0A^PAPA^PA^PB^WBZC&pC^[0AM-^P
A^PA^PApB^SA^PA0A^PA A A0A^PA0A0A A0A A0A A A A^PA A A^PA A A^PA A A^PA^PA^PA A
A A^PA^PAPA^PA^PAM-@B[C)@C^^M-PAM-`A^PA^PAM-^@BBA0A A^PA^PA0A^PA A A A^PA0A A A
^PA A0A0A A A A0A^PA^PA^PA0A^PA^PAPA^PAPB^YB\BM-DC^^M-PAM-pA^PA^PB.A^PA^PA^PA0A

Interpreting this file requires SAS statistical software. The Archive also has files formatted for use with SPSS, Excel, dBase, and Stata. How do you know when a data file is in a specialized format? The file information in the Archive's catalog database tells you; for example:

Type of File:   SAS Transport file v6.08
Directory\Filename:   U:\ArchiveData\econ\007\wave01.trn
Technote:    USE CIMPORT
Logical Record Length (LRECL):   1775
Number of Records:   11966
Record Format (RECFM):   V
Bytes (compressed):   118250
Bytes (uncompressed):  957280
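The catalog entry is the reliable way to learn a file's format. As a rough, supplementary check, you can also look at a file's first few kilobytes: plain ASCII data consists of printable characters, while a software-specific file like the SAS excerpt above contains non-printing bytes (the ^P and M- sequences there appear to be caret notation for such bytes). The sketch below is a heuristic of our own, not an Archive tool; the function names and the 4096-byte sample size are arbitrary assumptions, and text in non-ASCII encodings would also be reported as "not ASCII."

```python
import string

# Bytes for ASCII letters, digits, punctuation, and whitespace (space, \t, \n, \r, \x0b, \x0c).
PRINTABLE = set(string.printable.encode("ascii"))


def looks_like_ascii_text(data: bytes, sample_size: int = 4096) -> bool:
    """Heuristic: True if every byte in the sample is printable ASCII or whitespace."""
    return all(b in PRINTABLE for b in data[:sample_size])


def file_looks_like_ascii(path, sample_size=4096):
    """Read the start of a file in binary mode and apply the heuristic (hypothetical helper)."""
    with open(path, "rb") as f:
        return looks_like_ascii_text(f.read(sample_size), sample_size)


print(looks_like_ascii_text(b"MI,ESCCP,6380.97191\n"))  # True
print(looks_like_ascii_text(b"A\x10PA\x10PA@B"))        # False (\x10 is a control byte)
```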

If you have questions about the format of a file, ask Archive staff or a HelpDesk computing consultant.

Most archive files formatted for SAS and SPSS are in a "transport" or "portable" file format, which can be copied safely for use on different operating systems or with different versions of the statistical software.

You can also use "translation" software to convert one file format to another. StatTransfer is available on the CISER Research Computing nodes, and some computing labs on campus have StatTransfer or DBMSCopy on selected machines.
