SAS Basics

The two phases of SAS operation

SAS has two distinct "phases" of operation, known as the DATA step and the PROC step.
The purpose of the DATA step is to get your data into SAS's memory and to perform any necessary transformations prior to analysis. The PROC step then normally performs some specific analysis (procedure) on the data entered in the DATA step. The variety of SAS PROCedures is wide enough to cover most users' demands.

The DATA step: getting data into SAS

The many details of the DATA step can be found in the SAS Language Guide. Below is a typical DATA step. It is assumed that the data are in an external raw data file, also called a text or ASCII file.
  DATA    temporarysasdatasetname ;
     INFILE  'c:\user\john\mydata.txt' ;
     INPUT   income taxes state $ city $ ;
     loginc  = log(income) ;
     * more transformations and recodings ;
  RUN ;
Let's look at these statements one at a time:
The DATA statement tells SAS that a DATA step is starting. The temporarysasdatasetname is an internal name by which SAS remembers the data entered in this DATA step, e.g. DATA hw1; (names are limited to 8 characters in SAS version 6 and to 32 characters in later versions).
The INFILE statement identifies the external file which contains your raw data.
The INPUT statement tells SAS how to read in the data from the external file. The options available on the INPUT statement are quite complex, but the bottom line is that SAS can read in just about any raw data file you may wish to feed it. If your data are in free format (variables are separated by one or more blanks), you can use free-form INPUT. For example, if you have three variables A, B, C, and your data file looks like this:
        11  1 3
       19 2   42
          20 3 204
         19 1 179
Then entering: INPUT A B C ; will read them in free format into SAS's memory. If you have a character variable, you should append a $ appropriately on the INPUT as in the state $ example above.
If your data are delimited by TABs instead of blanks (e.g. you saved them as a text file in Quattro Pro or Excel), you have to append the option DLM='09'x to the INFILE statement (the TAB character is ASCII hex 09). Suppose instead that your data are not delimited by blanks or TABs, but look like this:
       11AZ3
       19WI2
       20CT204
       19NY79
You can use column oriented input as follows:
         INPUT A 1-2 B $ 3-4 C 5-7 ;    
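The two cases above can be sketched as complete DATA steps. This is only an illustration: the dataset names tabbed and columns and the file path c:\user\john\tabdata.txt are hypothetical, and the second step enters the four sample records in the program itself rather than reading them from a file.

```sas
/* Case 1: tab-delimited raw file (hypothetical path and dataset name) */
DATA tabbed ;
   INFILE 'c:\user\john\tabdata.txt' DLM='09'x ;   /* '09'x is the TAB character */
   INPUT A B C ;
RUN ;

/* Case 2: column input, using the four sample records shown above */
DATA columns ;
   INPUT A 1-2 B $ 3-4 C 5-7 ;
   CARDS ;
11AZ3
19WI2
20CT204
19NY79
;
RUN ;

PROC PRINT DATA=columns ;
RUN ;
```

With column input the records must start in column 1 of each line, because SAS takes the column numbers literally.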
One more aspect of the INPUT statement is useful to know now. Suppose you have so many variables for each observation that they have been entered on more than one line. An explicit way to tell SAS this is to use line specifiers on the INPUT statement. Lines are specified by pound signs (#) followed by numbers. For example, suppose your data look like this:
       BUBA 2 5 7 19
       13 16 19 2 3 3
       ROOSTER 13 16 44 34
       3 5 10 10 2 1
then the following INPUT statement reads the data (NAME is followed by a $ because it is a character variable):
       INPUT     #1 NAME $ A B C D
                 #2 E F G H I J ;
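As a minimal sketch, the line specifiers can be tried out on the four sample lines above by entering them directly in the program. The dataset name animals is hypothetical, and NAME carries a $ because its values are text.

```sas
DATA animals ;                  /* hypothetical dataset name */
   INPUT #1 NAME $ A B C D      /* first line of each observation  */
         #2 E F G H I J ;       /* second line of each observation */
   CARDS ;
BUBA 2 5 7 19
13 16 19 2 3 3
ROOSTER 13 16 44 34
3 5 10 10 2 1
;
RUN ;

PROC PRINT DATA=animals ;
RUN ;
```

Each pair of data lines becomes one observation, so the four lines produce two observations with eleven variables each.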
If you don't want to create a separate file containing your data, but want to enter the data right in the program file, use the CARDS statement:
       DATA mydata;
             INPUT age sex $ score;
          CARDS;
            18 F 3
            20 M 4
            ... more data, one line for each record      
            ;
          RUN;

Modifying an existing dataset

If you want to modify your dataset mydata, e.g. delete all observations where sex is M, you define a new dataset female:
       DATA female;
          SET mydata;
          IF sex='M' THEN DELETE;
       RUN;
Be sure you always choose a new name for the modified dataset. If you choose the same name as the original dataset and anything goes wrong, your original dataset is gone!
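An alternative way to write this kind of selection is a subsetting IF, which names the observations to keep instead of the ones to delete. The sketch below assumes the dataset mydata from the CARDS example (variables age, sex, score); the PROC FREQ step is added only as a quick check of the result.

```sas
DATA female ;
   SET mydata ;
   IF sex = 'F' ;        /* subsetting IF: only observations with sex='F' are kept */
RUN ;

/* Quick check: the table should list F as the only value of sex */
PROC FREQ DATA=female ;
   TABLES sex ;
RUN ;
```

The two forms differ if sex contains values other than M and F (for example blanks): deleting M keeps those observations, while the subsetting IF drops them.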

Permanent SAS datasets

Once you have created a SAS dataset, you can use it for the whole interactive SAS session. You don't have to recreate the dataset for each submit. But as soon as you end a SAS session, all temporary SAS datasets are deleted. If you have a large data file, it will take some time to create the SAS dataset at the beginning of each interactive session. To avoid this, create a permanent SAS dataset:
        LIBNAME A "A:\";
        DATA a.mydata;
          INFILE 'a:mydata.asc' ;
          INPUT  income taxes state $ city $ ;
        RUN;

The LIBNAME statement tells SAS where to store or find the permanent SAS datasets. You have to submit a LIBNAME statement only once per session. The name of a permanent SAS dataset always has two parts. The first part shows the location, using the name you gave it in the LIBNAME statement. The second part is the actual dataset name. In our example, you will find a file mydata.ssd on your floppy disk (the file extension depends on the SAS version and platform; version 7 and later use .sas7bdat).
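In a later session, the permanent dataset can be picked up again without rereading the raw file. The following is a rough sketch of such a session: the temporary dataset name highinc and the income cutoff are hypothetical, and PROC CONTENTS is used only to list what the stored dataset contains.

```sas
LIBNAME a 'a:\' ;               /* point SAS at the same location again */

PROC CONTENTS DATA=a.mydata ;   /* list the variables stored in the dataset */
RUN ;

DATA highinc ;                  /* hypothetical temporary dataset */
   SET a.mydata ;               /* read from the permanent dataset */
   IF income > 50000 ;          /* hypothetical cutoff, for illustration only */
RUN ;
```

Because highinc has a one-part name, it is temporary and disappears at the end of the session, while a.mydata stays on the disk.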

The PROC STEP: doing analyses

The PROC (procedure) step is where you tell SAS what kinds of analysis you want done on the data entered and manipulated in the DATA step. Normally it follows immediately after the DATA step. Consider a data set containing three numeric variables. You wish to calculate their means, then sort the data and print it out. The following commands would ask SAS to do this for you.
          DATA TEST;
            INFILE 'A:RAW.DAT';
            INPUT A B C;
          PROC MEANS;
          PROC SORT; 
            BY  B  A;
          PROC PRINT;
          RUN;
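The same job can be written more explicitly, naming the dataset on every PROC and closing each step with its own RUN statement. This is a sketch of an equivalent program, not a required style; the VAR statement is optional here because PROC MEANS analyzes all numeric variables by default.

```sas
DATA test ;
   INFILE 'A:RAW.DAT' ;
   INPUT A B C ;
RUN ;

PROC MEANS DATA=test ;   /* means of the three numeric variables */
   VAR A B C ;
RUN ;

PROC SORT DATA=test ;    /* sort by B, then by A within B */
   BY B A ;
RUN ;

PROC PRINT DATA=test ;   /* print the sorted data */
RUN ;
```

Without DATA=, each PROC uses the most recently created dataset, which is why the shorter version above works as well.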
If you want to use the permanent SAS dataset you created before:
        LIBNAME a 'a:\';
          PROC MEANS data=a.mydata;
          RUN;
There are SAS procedures for almost any statistical analysis you would like to perform. For a detailed description of the syntax and the options, use the online help (available in SAS for Windows and UNIX) or check the manuals: the SAS Procedures Guide for basic procedures like PROC PRINT, and the SAS/STAT User's Guide for most statistical procedures like PROC GLM. If you use a procedure from a specific SAS module (like PROC SYSLIN, which comes with the SAS/ETS module), you have to check the appropriate manual.

Miscellaneous items

  • Missing values: Often you will have missing values for one or another variable. These should always be indicated to SAS by a single period "." in the appropriate place(s) in your data file.
  • Comments: You can sprinkle comments in your SAS command file. Simply code an asterisk "*" in the first column of a line, write your comment, and end it with a semicolon. You can also use /* ... */ to surround your comments anywhere in a SAS command file.
  • Page & Line Sizes: SAS by default will write its LOG and OUTPUT window results with a page length of 25. Should you subsequently wish to print some of these results, you'll find that SAS has generated an annoying amount of blank space on each page, since a printed page is 66 lines long. To avoid this, the following OPTIONS statement can be placed at the top of a command file:
           OPTIONS LINESIZE=80 PAGESIZE=60 ;
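
The three items above can be combined in one short program. This is only a sketch: the dataset name scores and the data values are made up for illustration, with periods standing in for missing values.

```sas
* A statement-style comment: starts with an asterisk and ends with a semicolon ;
OPTIONS LINESIZE=80 PAGESIZE=60 ;

DATA scores ;                    /* hypothetical dataset with made-up values */
   INPUT age sex $ score ;
   CARDS ;
18 F 3
20 M .
.  F 4
;
RUN ;

PROC MEANS DATA=scores ;         /* missing values are left out of the statistics */
   VAR age score ;
RUN ;
```

In the PROC MEANS output, the N column shows how many non-missing values each variable has.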
    

lm: December 20, 2001