Proc Logistic, Proc Probit and Proc Catmod in SAS

This note discusses the proper interpretation of the results of logit (logistic regression) and probit analyses performed with SAS's Logistic, Probit and Catmod procedures. For some inexplicable reason, SAS sets up these analyses differently from what is described in standard textbook treatments and differently from the way they are handled by other statistical packages. Unfortunately, this SAS quirk is not discussed either prominently or clearly in the relevant SAS documentation. But forewarned is forearmed--read on.

When performing a logit or probit analysis, we are usually interested in what factors influence the probability of an outcome Y, where Y has two possible values. In the social sciences, the values of the Y variable are typically assigned such that 1 represents a response or event which we are interested in explaining and 0 a non-response or non-event. For example, if we were interested in studying what factors explain retirement decisions using a cohort of 65-year-olds, we would set Y=1 if the individual were retired and 0 otherwise. The sign of the coefficient on one of our independent variables would then tell us whether an increase in that variable increased or decreased the probability that a person would be retired by age 65.

In looking at retirement decisions, we would define p=Prob(Y=1) for the purposes of our analysis. Here is where the problem lies. By default, SAS takes p to be the probability of the lower value of Y, in our case Prob(Y=0). Thus, given the way we have defined the Y variable, SAS will use the independent variables to explain the decision to remain in the workforce, which is exactly the opposite of what we intended! All of our coefficient estimates will be correct in absolute value, but will have the wrong sign.
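To see why only the signs change, here is a short derivation for the case of a single independent variable. The symbols \alpha, \beta and x are introduced here purely for illustration and do not appear in the SAS output.

```latex
% Logit: modeling Prob(Y=0) = 1 - p instead of p = Prob(Y=1)
\log\frac{p}{1-p} = \alpha + \beta x
\quad\Longrightarrow\quad
\log\frac{1-p}{p} = -\log\frac{p}{1-p} = -\alpha - \beta x

% Probit: the standard normal distribution is symmetric about zero
\Phi^{-1}(p) = \alpha + \beta x
\quad\Longrightarrow\quad
\Phi^{-1}(1-p) = -\Phi^{-1}(p) = -\alpha - \beta x
```

In both models the intercept and the slope are simply negated, so standard errors and the magnitudes of the estimates are the same whichever value of Y is modeled.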

Fortunately, rather than just thinking about everything in reverse, there are simple ways to convince SAS to produce output according to social scientific coding conventions.

PROC LOGISTIC

PROC LOGISTIC descending;
  MODEL <dep var>=<ind vars>;
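or, equivalently, sort the data so that the higher value of the dependent variable comes first and tell SAS to keep the levels in that order: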
PROC SORT;  by descending <dep var>;

PROC LOGISTIC order=data;
  MODEL <dep var>=<ind vars>;
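
As a concrete illustration, here is a minimal sketch using a made-up dataset. The dataset name retire, the variables retired and pension, and all the data values are hypothetical and are not taken from any real study. The first step should report coefficients for Prob(retired=0); the second and third should report the same coefficients with the signs reversed. The third step uses the event= response option, which I believe is available in SAS version 9 and later; check the documentation for your release before relying on it.

```sas
/* Hypothetical toy data: retired is 0/1, pension is a numeric predictor */
data retire;
  input retired pension;
  datalines;
0 10
0 15
1 12
0 20
1 25
1 30
0 22
1 18
1 35
0 8
;
run;

/* Default behavior: SAS models the lower value, Prob(retired=0) */
proc logistic data=retire;
  model retired = pension;
run;

/* descending reverses the ordering, so SAS models Prob(retired=1) */
proc logistic data=retire descending;
  model retired = pension;
run;

/* Alternative (assumes a release that supports the event= option): */
/* name the event level explicitly on the response variable         */
proc logistic data=retire;
  model retired(event='1') = pension;
run;
```

In each case the "Response Profile" section of the output states which value of the dependent variable is being modeled, so it is worth checking that line before interpreting any signs.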

PROC PROBIT

PROC SORT;  by descending <dep var>;

PROC PROBIT order=data;
  CLASS <dep var>;
  MODEL <dep var>=<ind vars>;

PROC CATMOD

PROC SORT;  by descending <dep var>;

PROC CATMOD order=data;
  DIRECT <ind vars>;
  MODEL <dep var>=<ind vars>;

lm: December 20, 2001