SAS Basics
INDEX
- The two phases of SAS operation
- The DATA step: getting data into SAS
- Modifying an existing dataset
- Permanent SAS datasets
- The PROC STEP: doing analyses
- Missing values, Comments, Page&Line Size
The two phases of SAS operation
- SAS has two distinct "phases" of operation,
known as the DATA step and the PROC step.
The purpose of the DATA step is to get your data into SAS's memory and to perform any necessary transformations prior to analysis. The PROC step then normally performs some specific analysis (procedure) on the data entered in the DATA step. There is a wide enough variety of SAS PROCedures to cover most users' demands.
The DATA step: getting data into SAS
- The many details of the DATA step can be found in the
SAS Language Guide. Below is a typical DATA step. It is assumed that
the data are in an external raw data file, also called a text or ASCII file.

    DATA temporarysasdatasetname ;
        INFILE 'c:\user\john\mydata.txt' ;
        INPUT income taxes state $ city $ ;
        loginc = log(income) ;
        * more transformations and recodings ;
    RUN ;

Let's look at these statements one at a time:
The DATA statement tells SAS that a DATA step is starting. The temporarysasdatasetname is an internal name by which SAS remembers the data entered in this DATA step (at most 8 characters in SAS version 6; later versions allow up to 32), e.g.

    DATA hw1;
The INFILE statement identifies the external file which contains your raw data.
The INPUT statement tells SAS how to read in the data from the external file. The options available on the INPUT statement are quite complex, but the bottom line is that SAS can read in just about any raw data file you may wish to feed it. If your data are in free format (variables are separated by one or more blanks), you can use free-form INPUT. For example, if you have three variables A, B, C, and your data file looks like this:

    11 1 3
    19 2 42
    20 3 204
    19 1 179

Then entering:

    INPUT A B C ;

will read them in free format into SAS's memory. If you have a character variable, you should append a $ after its name on the INPUT statement, as in the state $ example above.
If your data are delimited by TABs instead of blanks (e.g. you saved them as a text file in Quattro Pro or Excel), you have to append the option DLM='09'x to the INFILE statement (the TAB character is ASCII hex 09). Suppose instead that your data are not delimited by blanks or TABs, but look like this:

    11AZ3
    19WI2
    20CT204
    19NY79

You can use column-oriented input as follows:

    INPUT A 1-2 B $ 3-4 C 5-7 ;
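The two cases above can be sketched as complete DATA steps. This is only an illustration: the dataset names coldata and tabdata are made up, the first step uses the CARDS statement to keep the four sample records inside the program, and the second step assumes a tab-delimited file exists at the path used earlier.

```sas
/* Column input: A in columns 1-2, B (character) in 3-4, C in 5-7. */
DATA coldata;
    INPUT A 1-2 B $ 3-4 C 5-7;
    CARDS;
11AZ3
19WI2
20CT204
19NY79
;
RUN;

/* Tab-delimited input: the file path is illustrative and assumed to exist. */
DATA tabdata;
    INFILE 'c:\user\john\mydata.txt' DLM='09'x;
    INPUT A B C;
RUN;
```

With column input the data lines must start in column 1, so they are not indented after CARDS.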
There is one aspect of the INPUT statement which is useful to know now. Suppose you have so many variables for each observation that they have been entered on more than one line. An explicit way to tell SAS this is to use line specifiers on the INPUT statement. Lines are specified by pound signs (#) followed by numbers. For example, if your data look like this:

    BUBA 2 5 7 19
    13 16 19 2 3 3
    ROOSTER 13 16 44 34
    3 5 10 10 2 1

then the following INPUT statement could read the data (NAME is a character variable, so it needs a $):

    INPUT #1 NAME $ A B C D #2 E F G H I J ;

If you don't want to create a separate file containing your data, but want to enter the data right in the program file, use the CARDS statement:

    DATA mydata;
        INPUT age sex $ score;
        CARDS;
    18 F 3
    20 M 4
    ... more data, one line for each record
    ;
    RUN;
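As a hedged alternative to the # line specifiers, the same two-line layout can be read with the / line pointer, which tells INPUT to advance to the next line of raw data. The sketch below combines it with CARDS so that it runs on its own; the dataset name animals is made up for illustration.

```sas
DATA animals;
    /* First line: name plus four values; "/" moves to the second line. */
    INPUT name $ a b c d
        / e f g h i j;
    CARDS;
BUBA 2 5 7 19
13 16 19 2 3 3
ROOSTER 13 16 44 34
3 5 10 10 2 1
;
RUN;

/* Print the result: two observations with eleven variables each. */
PROC PRINT DATA=animals;
RUN;
```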
Modifying an existing dataset
- If you want to modify your dataset mydata, e.g.
by deleting all observations where sex is M, you define a new dataset female:

    DATA female;
        SET mydata;
        IF sex='M' THEN DELETE;
    RUN;

Be sure you always choose a new name for the modified dataset. If you choose the same name as the original dataset and anything goes wrong, your original dataset is gone!
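A subsetting IF statement is another way to express this kind of selection: it keeps only the observations for which the condition is true. The sketch below builds a small mydata inline (using the sample records from the CARDS example) so that it is self-contained; it is an illustration, not the only way to do it.

```sas
DATA mydata;
    INPUT age sex $ score;
    CARDS;
18 F 3
20 M 4
;
RUN;

DATA female;
    SET mydata;
    /* Subsetting IF: observations where sex is not F are dropped. */
    IF sex = 'F';
RUN;

PROC PRINT DATA=female;
RUN;
```

The two forms differ for observations where sex is missing or coded differently (for example a lowercase f): the DELETE form keeps them, while the subsetting IF drops them.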
Permanent SAS datasets
- When you have created a SAS dataset, you can use this
dataset for the whole interactive SAS session. You don't have to recreate the
dataset for each submit. But as soon as you finish a SAS session, all
temporary SAS datasets are deleted. If you have a large data file, it
will take some time to create the SAS dataset at the beginning of each
interactive session. To avoid this, create a permanent SAS dataset:

    LIBNAME A "A:\";
    DATA a.mydata;
        INFILE 'a:mydata.asc' ;
        INPUT income taxes state $ city $ ;
    RUN;

The LIBNAME statement tells SAS where to store or find the permanent SAS datasets. You have to submit a LIBNAME statement only once per session. The name of a permanent SAS dataset always has two parts. The first part shows the location, using the name you gave it in the LIBNAME statement. The second part is the actual dataset name. In our example, you will find a file mydata.ssd on your floppy disk (the file extension varies with the SAS version).
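In a later session, the permanent dataset can be picked up again without rereading the raw file. The following is a minimal sketch, assuming the dataset a.mydata was stored as described above; the name mycopy is made up for illustration.

```sas
/* Point the libref a at the same location as before. */
LIBNAME a 'a:\';

/* List the variables and attributes stored in the permanent dataset. */
PROC CONTENTS DATA=a.mydata;
RUN;

/* Work on a temporary copy so the permanent dataset stays untouched. */
DATA mycopy;
    SET a.mydata;
    loginc = LOG(income);
RUN;
```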
The PROC STEP: doing analyses
- The PROC (procedure) step is where you tell SAS what
kinds of analysis you want done on the data entered and manipulated in the DATA
step. Normally it follows immediately after the DATA step. Consider a data set
containing three numeric variables. You wish to calculate their means, then
sort the data and print it out. The following commands would ask SAS to do this
for you.

    DATA TEST;
        INFILE 'A:RAW.DAT';
        INPUT A B C;
    PROC MEANS;
    PROC SORT;
        BY B A;
    PROC PRINT;
    RUN;

If you want to use the permanent SAS dataset you created before:

    LIBNAME a 'a:\';
    PROC MEANS data=a.mydata;
    RUN;

There are SAS procedures for almost any statistical analysis you would like to perform. For a detailed description of the syntax and the options, use the online help (available in SAS for Windows and UNIX) or check the manuals: the SAS Procedures Guide for basic procedures like PROC PRINT, and the SAS/STAT User's Guide for most statistical procedures like PROC GLM. If you use a procedure from a specific SAS module (like PROC SYSLIN, which comes with the SAS/ETS module), you have to check the appropriate manual.
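When DATA= is omitted, a procedure uses the most recently created dataset. A more explicit variant of the three-step example names the dataset in every step and ends each step with RUN. The sketch below uses the free-format sample records from the INPUT discussion as inline data so that it runs without an external file; the output dataset name sorted is made up for illustration.

```sas
DATA test;
    INPUT A B C;
    CARDS;
11 1 3
19 2 42
20 3 204
19 1 179
;
RUN;

/* Means of the three numeric variables. */
PROC MEANS DATA=test;
    VAR A B C;
RUN;

/* Sort by B, then by A within B; OUT= leaves test itself unchanged. */
PROC SORT DATA=test OUT=sorted;
    BY B A;
RUN;

PROC PRINT DATA=sorted;
RUN;
```

Writing the sorted data to a separate dataset follows the same caution as in the DATA step: the original is still there if something goes wrong.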
Miscellaneous items
- Missing values: Often you will have missing values for one or another variable. These should always be indicated to SAS by a single period "." in the appropriate place(s) in your data file.
- Comments: You can sprinkle comments throughout your SAS command file. Simply begin a line with an asterisk "*", write your comment, and end it with a semicolon. You can also use /* ... */ to surround comments anywhere in a SAS command file.
- Page & Line Sizes: By default, SAS writes
its LOG and OUTPUT window results with a page length of 25 lines. If you
later print some of these results, you'll find that SAS has
generated an annoying amount of blank space on each page, since a printed page is 66
lines long. To avoid this, place the following OPTIONS statement at
the top of your command file:
OPTIONS LINESIZE=80 PAGESIZE=60 ;
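The three items above can be seen together in one short program. This is a sketch under assumptions: the dataset name scores and the records are invented for illustration. N and NMISS ask PROC MEANS for the counts of non-missing and missing values.

```sas
OPTIONS LINESIZE=80 PAGESIZE=60;

* Asterisk-style comment that ends at the semicolon;

/* Block-style comment that may span
   several lines. */
DATA scores;
    INPUT age sex $ score;
    CARDS;
18 F 3
20 M .
. F 4
;
RUN;

/* Missing values are left out when the means are computed. */
PROC MEANS DATA=scores N NMISS MEAN;
    VAR age score;
RUN;
```

Each of the two numeric variables has one missing value here, so PROC MEANS bases each mean on the two remaining observations.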
lm: December 20, 2001
