
Help Using SSDA Data Files: SAS



Introduction

The instructions and examples below explain how to use data files from a computer directly connected to the StatLab NT Server.

Using ASCII Data Files (.dat, .raw, .txt or .csv files)

Column-delimited ASCII files (usually with the file extension .dat or .raw) and delimited ASCII files (usually with the file extension .txt or .csv) can be read by the statistical packages on the StatLab NT Server. The sections that follow explain how to read ASCII files in SAS for Windows. Since many of the ASCII files on the StatLab NT Server are too large to convert to SAS datasets, we recommend that users create extracts and write them to diskette, zip disk, their personal drive (y:\), or the c:\temp directory of the StatLab computers. Note that the c:\temp directory should be used only for temporary storage, as files there may be deleted by other users.

Using SAS System Data files

The system data files available are ready for analysis using StatLab software, with little additional data management. The sections below provide instructions for accessing SAS system data files (typically with the extension .sd2 or .sas7bdat). If the analysis is designed to use other software, these data files can be converted to other data formats using the file conversion software StatTransfer or DBMSCOPY, both also available at the StatLab (Start>Programs>Statlab Packages).



Using ASCII Data Files with SAS

If ASCII data is "delimited" (there is a marker such as a comma separating columns, typically in files with the extension .txt or .csv), use the program StatTransfer to transform the data into a SAS (.sd2) dataset. SAS can also import such files: select File>Import from the main screen, or open the file from the Analyst screen (SAS's user-friendly interface). However, these import tools are erratic and are not currently recommended.
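For users who prefer to stay in SAS, a DATA step can also read a delimited file directly. The lines below are a minimal sketch, not a StatLab-tested program: the file c:\temp\survey.csv, the dataset name, and the four variables are hypothetical, and the file is assumed to have one header row.

```sas
/* Sketch only: path, dataset name, and variable names are hypothetical. */
DATA survey;
     /* DSD reads two consecutive commas as a missing value and strips   */
     /* quotation marks; FIRSTOBS=2 skips the assumed header row.        */
     INFILE 'c:\temp\survey.csv' DLM=',' DSD FIRSTOBS=2 LRECL=1000;
     /* A $ after a name marks a character variable (default length 8). */
     INPUT id state $ age income;
RUN;
```

Variables must be listed in the order in which they appear in the file, since list input reads fields from left to right.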

If data is "column-delimited" (variables are in fixed positions and are of fixed length, typically in files with the extension .dat or .raw), there are two options for reading in ASCII data: write a program to create a temporary SAS dataset (which can be saved), or use EFI, SAS's External File Interface.

Program to create a temporary SAS dataset

When reading in the ASCII data with the DATA step in SAS, be sure to include the lrecl (logical record length) if the data are longer than 132 columns. This information can be found in the codebook for the file(s).

The following program will create a temporary SAS dataset.  This program is useful for extracting variables for analysis from large ASCII files, which do not require extensive transformations of variables.

DATA abcd;
     INFILE 'h:\ssda\directory\filename' lrecl=nnn;
     INPUT variable column_location;
     transformations and/or recodings;
     RUN;
The program lines above do not automatically save the resulting SAS file. Once you have created a SAS dataset, you can use it for the whole interactive SAS session; you do not have to recreate it for each submit. As soon as you finish the SAS session, however, all temporary SAS datasets are deleted. If you have a large data file, it will take some time to recreate the SAS dataset at the beginning of each interactive session. To avoid this, create a permanent SAS dataset with the following program:
     LIBNAME abcd "A:\";
     DATA abcd.mydata;
     INFILE 'a:\mydata.asc';
     /* $ marks a character variable; omit it for numeric variables */
     INPUT variable $ column_location ;
     transformations and/or recodings ;
     RUN ;
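
As an illustration of the template above with the placeholders filled in, the following sketch reads four variables by column position and saves them permanently. The column positions, variable names, record length, and the libref mylib are all assumptions made for the example; take the real values from the codebook for your file.

```sas
/* Hypothetical layout: substitute the columns given in your codebook. */
LIBNAME mylib 'c:\temp';
DATA mylib.extract;
     INFILE 'h:\ssda\directory\filename' LRECL=250;
     INPUT caseid   1-5
           state  $ 6-7
           age      8-9
           income  10-15;
     /* Example recoding: collapse age into two groups, keeping missing */
     /* values missing.                                                 */
     IF age = . THEN agegrp = .;
     ELSE IF age < 40 THEN agegrp = 1;
     ELSE agegrp = 2;
RUN;
```

Because column input reads only the positions listed, a program like this extracts a small subset of variables from a wide ASCII file.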

For information on running procedures, consult the SAS for Windows help page.

Using EFI

Before starting the process, open or print out the codebook for your data file (typically a text or .pdf file). This codebook will provide the information you need to separate the data into variables. Also, if you do not already have a SAS library name defined ("library" is SAS's term for a data path), you should create one now. Open SAS. Select View>Explorer. Select File>New. Enter a library name of no more than eight characters (for example "mylib") and either browse for or type in a directory path (for example y:\ or c:\temp). Now that you have a library, SAS will know where to direct your files and you will know where to find them.

To use SAS's interface to import ASCII data, select File>Import Data.

1. Select "User Defined format". Select NEXT.

2. Browse or type in file location (for this example H:\SSDA\0060\cpsvs72.dat). Select NEXT.

3. Select the library in which your data will be stored ("mylib") and type a name for your dataset into the "Member" space. Select NEXT.

4. Select FINISH.

5. An import window will appear, showing the first few rows of data for reference; this is the point at which the codebook becomes useful. First, for most data sets you will need to click the Options button, change "Style of Input" to Columns, and select OK. This allows you to delimit variables that are in fixed positions and of fixed lengths. [Select "List" as the Style of Input for data separated by delimiters (e.g. commas); in that case, however, you would be better off using StatTransfer to create your SAS dataset.] The initial screen will change to allow for the input of data positions (Begin, End, and Length). Second, using the codebook, enter the variable name into the field name window and the description into "descriptive label", and select whether the variable is a string (character based) or numeric. Then either enter the beginning and ending positions into the boxes provided or use your mouse in the data window to delimit the variable (the boxes will fill automatically). Finally, select "ADD" and the variable will appear in the window to the right of the data. Repeat the second step until all variables are defined. The informat and format boxes should adjust automatically to the variable length; if they do not, click on the arrows to the right to adjust them.

When you have delimited all the variables you need, select File>Save to save the dataset. Alternatively, you can close the window and SAS will automatically ask you to save your dataset.
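
Once the dataset has been saved, a short program can confirm that the variables were defined as intended. This is only a sketch: it assumes the library points to c:\temp, uses the libref mylib (librefs are limited to eight characters), and uses cps72 as a hypothetical member name for the dataset saved from EFI.

```sas
/* Assumes c:\temp is the library directory and cps72 is the         */
/* (hypothetical) member name given to the imported dataset.         */
LIBNAME mylib 'c:\temp';

/* List variable names, types, lengths, and labels. */
PROC CONTENTS DATA=mylib.cps72;
RUN;

/* Print the first ten observations to compare against the codebook. */
PROC PRINT DATA=mylib.cps72 (OBS=10);
RUN;
```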

 


Using SAS Transport Files (.xpt files)

Transport files are used to move SAS data files across computer platforms. In general, if data files are large, it is recommended that variables be selected using the KEEP statement and written to a smaller file in the c:\temp directory or on a diskette or zip disk.

If the data can be analyzed as is, without extensive subsetting or transformations of the variables, use the following program lines:

        /* With the XPORT engine, the path names the transport file itself */
        LIBNAME IN XPORT 'h:\ssda\directory\filename.xpt';
        PROC procedure_name DATA=IN.statdata ;
        RUN;
If the data require extensive manipulation before analysis, use the following program lines to implement transformations and to save selected variables in a permanent SAS data file:
        /* With the XPORT engine, the path names the transport file itself */
        LIBNAME IN XPORT 'h:\ssda\directory\filename.xpt';
        LIBNAME OUT 'a:\' ;
        DATA OUT.mydata ;
        SET IN.statdata ;
        transformations_here ; 
        KEEP variable_names ;
        RUN;
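
If the name of the dataset inside a transport file is not known, it can be listed before any analysis is run. The sketch below assumes a transport file written with the XPORT engine (files created with PROC CPORT need PROC CIMPORT instead); filename.xpt is a placeholder, and c:\temp is used as the output directory only as an example.

```sas
/* filename.xpt is a placeholder for the actual transport file. */
LIBNAME IN XPORT 'h:\ssda\directory\filename.xpt';
LIBNAME OUT 'c:\temp';

/* List every dataset stored in the transport file. */
PROC CONTENTS DATA=IN._ALL_;
RUN;

/* Copy all members to native SAS datasets in c:\temp. */
PROC COPY IN=IN OUT=OUT;
RUN;
```

Copying the whole file is practical only for small transport files; for large ones, the DATA step with KEEP shown above is the safer choice.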
For additional information on running procedures, consult the SAS for Windows help page.


Using SAS System Files (.ssd or .sd2 files)

These files are ready for analysis in SAS for Windows. In general, if data files are large, it is recommended that variables be selected using the KEEP statement and written to a smaller file in the c:\temp directory or on a diskette or zip disk.

If the data can be analyzed as is, without extensive subsetting or transformations of the variables, use the following program lines:

        LIBNAME IN 'h:\ssda\directory';
        PROC procedure_name DATA=IN.statdata ;
        RUN;
If the data require extensive manipulation before analysis, use the following program lines to implement transformations and to save selected variables in a permanent SAS data file:
        LIBNAME IN 'h:\ssda\directory';
        LIBNAME OUT 'a:\' ;
        DATA OUT.mydata ;
        SET IN.statdata ;
        transformations_here ; 
        KEEP variable_names ;
        RUN;
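
For very large system files, the KEEP= dataset option on the SET statement is a possible alternative to the KEEP statement, because it limits which variables are read in the first place. The following sketch assumes hypothetical variables caseid, age, and income, an arbitrary age cutoff, and c:\temp as the output directory.

```sas
/* Variable names and the age cutoff are hypothetical. */
LIBNAME IN 'h:\ssda\directory';
LIBNAME OUT 'c:\temp';
DATA OUT.mydata;
     /* KEEP= here limits which variables are read from the source file. */
     SET IN.statdata (KEEP=caseid age income);
     /* Subsetting IF: only respondents aged 18 or over are written out. */
     IF age >= 18;
     /* Take the log only where it is defined; otherwise leave missing. */
     IF income > 0 THEN logincome = LOG(income);
RUN;
```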

©2007 Yale University
Social Science Statistical Laboratory
Certifying Authority: Themba Flowers
lm: Fri Apr 11 11:14:36 EDT 2003