Objectives:
If your response variable is binary and you want to analyze the relationship between it and other variables, logistic regression is an appropriate method.
Assume the binary variable, y, takes only two values, 0 and 1.
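Here is the logistic regression expression:
logit(p) = log(p / (1 - p)) = β0 + β1*x1 + ... + βk*xk, where p = P(y = 1).
So, the outcome of logistic regression is the probability that the response variable equals 1.
The quantity p / (1 - p) is the odds of the event, and exp(βj) is the odds ratio associated with a one-unit increase in xj.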
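For reference, the standard form of the model can be written out as below, together with the two results that follow from it. The symbols p, x_j, and β_j are notation assumed here for illustration; this is a sketch of the usual textbook form, not a reproduction of the original figure.

```latex
\begin{align*}
\operatorname{logit}(p) &= \log\frac{p}{1-p}
   = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
   \qquad p = P(y = 1 \mid x_1, \dots, x_k) \\[4pt]
p &= \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\}} \\[4pt]
\frac{\operatorname{odds}(x_j + 1)}{\operatorname{odds}(x_j)} &= \exp(\beta_j),
   \qquad \operatorname{odds} = \frac{p}{1-p}
\end{align*}
```

The second line inverts the logit to recover the probability. The third line shows why each coefficient is read as a log odds ratio: raising x_j by one unit, with the other predictors held fixed, multiplies the odds by exp(β_j).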
The general syntax is as follows:
PROC LOGISTIC DATA=SAS-data-set ;
   CLASS variables ;
   MODEL response=predictors ;
   UNITS independent1=list ;
   ODDSRATIO <label> variable ;
   OUTPUT OUT=SAS-data-set keyword=name ;
RUN;
CLASS names the classification variables to be used in the analysis. The CLASS statement must precede the MODEL statement. By default, these variables are analyzed using effect coding parameterization. This can be changed with the PARAM= option.
MODEL specifies the response variable and the predictor variables.
OUTPUT creates an output data set containing all the variables from the input data set and any requested statistics.
UNITS enables you to obtain an odds ratio estimate for a specified change in a predictor variable. The unit of change can be a number, standard deviation (SD), or a number times the standard deviation (for example, 2*SD).
ODDSRATIO produces odds ratios for variables even when the variables are involved in interactions with other covariates, and for classification variables that use any parameterization. You can specify several ODDSRATIO statements.
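A minimal sketch of how these statements might fit together is shown below. It assumes that the sasuser.Titanic data set used in the example that follows also contains a classification variable named Gender; that name is hypothetical here, so substitute a variable from your own data. The output data set name work.pred and the variable name phat are likewise placeholders.

```sas
/* Sketch only: Gender, work.pred, and phat are illustrative names */
proc logistic data=sasuser.Titanic;
   /* use reference cell coding instead of the default effect coding */
   class Gender / param=ref;
   model Survived(event='1') = Age Gender;
   /* request the odds ratio for a 10-year increase in Age */
   units Age=10;
   /* request odds ratios comparing the levels of Gender */
   oddsratio Gender;
   /* save the predicted probabilities in a variable named phat */
   output out=work.pred p=phat;
run;
```

PARAM=REF is used in this sketch because reference cell coding makes each parameter estimate a comparison against the reference level, which is often easier to interpret than the default coding.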
Here is an example:
proc logistic data=sasuser.Titanic alpha=.05 plots(only)=(effect oddsratio);
   model Survived(event='1')=Age / clodds=pl;
run;
SAS output:
The Model Information table describes the data set, the response variable, the number of response levels, the type of model, the algorithm used to obtain the parameter estimates, and the number of observations read and used.
The Number of Observations Used is the count of all observations that are nonmissing for all variables specified in the MODEL statement. The ages of 263 of the 1309 passengers are missing, so those observations cannot be used to estimate the model, which leaves 1046 observations.
The Response Profile table shows the response variable values listed according to their ordered values. By default, PROC LOGISTIC orders the response variable alphanumerically and bases the logistic regression model on the probability of the smallest ordered value. Because the EVENT= option was used in this example, the model is based on the probability of surviving (Survived=1). The Response Profile table also shows the frequencies of the response values.
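As a sketch of the alternatives to EVENT=, the two steps below should model the same probability (Survived=1) for a 0/1 response. The first reverses the default ordering with the DESCENDING response option, and the second names the nonevent level with REF=. Both reuse the data set and variables from the example above.

```sas
/* Option 1: reverse the response ordering so that Survived=1 is modeled */
proc logistic data=sasuser.Titanic;
   model Survived(descending) = Age;
run;

/* Option 2: name the reference (nonevent) level explicitly */
proc logistic data=sasuser.Titanic;
   model Survived(ref='0') = Age;
run;
```

Whichever form is used, it is worth checking the note under the Response Profile table, which states the probability being modeled.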
The Model Fit Statistics table provides three statistics:
AIC is Akaike's information criterion.
SC is the Schwarz criterion.
-2 Log L is -2 times the natural log of the likelihood.
-2 Log L, AIC, and SC are goodness-of-fit measures that you can use to compare one model to another. These statistics measure relative fit among models, but they do not measure the absolute fit of any single model. Smaller values for all of these measures indicate better fit.
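For reference, the two criteria are built from -2 Log L as sketched below. The symbols p (number of estimated parameters, including the intercept) and n (number of observations used) are introduced here for illustration, and the SC formula assumes no FREQ statement or events/trials syntax is in use.

```latex
\begin{align*}
\mathrm{AIC} &= -2\log L + 2p \\
\mathrm{SC}  &= -2\log L + p\,\log(n)
\end{align*}
```

For a model with an intercept and one predictor, p = 2. Because -2 Log L can only decrease when predictors are added, the penalty terms are what make AIC and SC useful for comparing models with different numbers of parameters.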
The Testing Global Null Hypothesis: BETA=0 table provides three statistics (likelihood ratio, score, and Wald) to test the null hypothesis that all regression coefficients of the model are 0.
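The Analysis of Maximum Likelihood Estimates table lists the estimated model parameters, their standard errors, Wald chi-square values, and p-values.
The fitted model can be written as logit(p) = b0 + b1*Age, where b0 and b1 are the Intercept and Age estimates listed in the table and p is the estimated probability that Survived=1.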
The Wald chi-square and its associated p-value test whether the parameter estimate is significantly different from 0. In this example, the p-value for Age is not significant at the 0.05 significance level (p=0.0696). However, this result alone does not allow you to conclude that Age would be unimportant in a multivariable model.
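As a reminder of where the Wald statistic comes from, it can be sketched as the squared ratio of an estimate to its standard error. The notation below is assumed for illustration and is not taken from the SAS output.

```latex
\[
\chi^2_{\text{Wald}}
  = \left( \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)} \right)^{2},
\qquad
\chi^2_{\text{Wald}} \;\overset{\text{approx.}}{\sim}\; \chi^2_1
\quad \text{under } H_0\colon \beta_j = 0
\]
```

The p-value in the table is the upper-tail probability of this statistic under a chi-square distribution with 1 degree of freedom, which is a large-sample approximation.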
The odds ratio and its confidence interval can be found in the following table. Because CLODDS=PL was specified, the interval is a profile-likelihood confidence interval.