Prediction in Stata

The raw coefficients from logit, which are known as the “log odds”, cannot be nicely interpreted. All we can say is that “As square footage increases, the probability of a house having a cellar increases.” To add any interpretability to these coefficients, we should instead look at the odds ratios. We can ask Stata to produce these with the or option; the coefficient column of the output is then labelled “Odds Ratio”.

Notice that the “chi2”, “Pseudo R2”, “z” and “P>|z|” do not change: we’re fitting the same model! We’re just changing how the coefficients are represented.

The null hypothesis for an odds ratio is at 1, not at 0. A value of 1 represents equal odds (or no change in odds). So a significant odds ratio will be away from 1, rather than away from 0 as in linear regression or the log odds.

The interpretation of odds ratios can be tricky, so let’s be precise here.

For categorical predictors, the interpretation is fairly straightforward. The odds ratio on female is 0.8643. Assuming female is coded 1 for female respondents and 0 for male respondents (as the variable name suggests), the ratio compares women to men: for every 1 male respondent who has a basement in their house, you would expect 0.8643 female respondents to have a basement.

For continuous predictors, the odds ratio is the change in odds as the value of the predictor increases by one unit. Consider the coefficient on energy expenditure, 0.999583. For every 1 house of expenditure $e$ which has a cellar, you’d expect 0.999583 houses at expenditure $e+1$ to have a cellar. Those coefficients are really close to 1 due to scaling: a 1-dollar or 1-sqft increase is irrelevant.

Due to the non-linear relationship between the predictors and the outcome, we cannot simply multiply an odds ratio by the size of the change we care about. Instead, let’s scale the variables and re-fit the model.

logit cellar dollarel1000 totsqft1000 i.regionc female, or

Note once again that the model fit characteristics haven’t changed; we’ve fit the same model, just with different units. Now the interpretations are more reasonable. As the expenditure increases by $1,000, the odds of having a cellar decrease by 34%. For every additional 1,000 square feet, the odds of having a cellar increase by 150%.
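The rescaling discussed in this section can be sketched as follows. The variable names dollarel1000 and totsqft1000 come from the logit call in the text, but how they were created is not shown, so the two generate lines are an assumption. The last line is a cross-check rather than part of the original workflow: an odds ratio for a 1,000-unit change is the per-unit odds ratio raised to the 1,000th power, which for 0.999583 comes to roughly 0.66, in line with the 34% decrease reported after rescaling.

```stata
* Assumed construction of the rescaled predictors (not shown in the text):
* one unit now means 1,000 dollars or 1,000 square feet.
generate dollarel1000 = dollarel / 1000
generate totsqft1000  = totsqft_en / 1000

* Same model in different units; the "or" option reports odds ratios.
logit cellar dollarel1000 totsqft1000 i.regionc female, or

* Cross-check using the unscaled odds ratio quoted in the text:
* odds ratios compound multiplicatively, so a 1,000-dollar change
* corresponds to the per-dollar odds ratio to the power 1,000.
display 0.999583^1000
```

Raising the odds ratio to a power works because the model is linear on the log-odds scale: a change of 1,000 units multiplies the log-odds coefficient by 1,000, and exponentiating that product gives the power.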
The coefficients table is interpreted in almost the same way as with regression. We see that square footage and energy expenditure have significant coefficients (positive and negative respectively), and there appears to be no gender effect. We’ll discuss measuring goodness of fit below.
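The coefficients in this table are on the log-odds scale, and exponentiating one gives the corresponding odds ratio. A minimal sketch, assuming it is run immediately after the logit command from this section so that the fitted coefficients are still in memory:

```stata
* _b[varname] holds the fitted coefficient (log odds) from the last model.
display _b[female]

* The same coefficient expressed as an odds ratio.
display exp(_b[female])

* Replay the last logit results with odds ratios instead of log odds,
* without re-fitting the model.
logit, or
```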

The regress command is fast because for OLS we have a “closed form solution”: we just do some quick math and reach an answer. However, almost every other type of regression lacks a closed form solution, so instead we solve it iteratively: Stata guesses at the best coefficients that minimize error, and uses an algorithm to repeatedly improve those coefficients until the reduction in error is below some threshold.

From this output, we get the “Number of obs” again. Instead of an ANOVA table with an F-statistic to test model significance, there is a “chi2” ($\chi^2$, pronounced “ky-squared” as in “Kyle”). In this model, we reject the null that all coefficients are identically 0.

When we move away from linear regression, we no longer get an $R^2$ measure. There have been various pseudo-$R^2$’s suggested, and Stata reports one here, but be careful assigning too much meaning to it. Depending on the definition and the model, pseudo-$R^2$ values can even fall below 0 or above 1.
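The “chi2” and pseudo-$R^2$ in the logit header can be rebuilt from the stored log likelihoods, which may help demystify them. This is a sketch that assumes a default (non-robust) logit fit has just been run. To my understanding, the pseudo-$R^2$ Stata reports for logit is McFadden’s, one minus the ratio of the fitted to the constant-only log likelihood.

```stata
* e(ll)   : log likelihood of the fitted model
* e(ll_0) : log likelihood of the constant-only model

* Likelihood-ratio chi-squared, computed by hand and as stored by Stata.
display 2 * (e(ll) - e(ll_0))
display e(chi2)

* McFadden's pseudo R-squared, computed by hand and as stored by Stata.
display 1 - e(ll) / e(ll_0)
display e(r2_p)
```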

We fit the model with:

logit cellar dollarel totsqft_en i.regionc female

Logistic regression                    Number of obs = 4,231

When you try this yourself, you may notice that it’s not quite as fast as regress.

If the outcome variable is not continuous, OLS can usually still be fit, but the results may be unexpected or undesired. For example, if the response is a binary indicator, an OLS model may predict a negative response for an individual.

We can generalize the model from ordinary least squares to allow a non-linear relationship between the predictors and the outcome, which may fit different outcomes better. We can modify the OLS equation by allowing the left hand side to be a function of $Y$: $f(Y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon$. Note that this is still linear in $X$ (the right-hand side). Non-linear regression, by contrast, refers to models that are non-linear in the coefficients themselves. Therefore, even though the function $f()$ may not be linear, the model is still linear, hence “generalized linear model”.

The function, $f()$, is called the “link” function. If the link function is the identity function, $f(x) = x$, the GLM simplifies to ordinary least squares. We’ll talk about a few link functions and the regression models they define.

We can fit a logistic regression using the logit command in Stata. Let’s run a model predicting the presence of a cellar based on square footage, region and electricity expenditure. A recoding step first sets some values to missing, and Stata reports: (1,455 real changes made, 1,455 to missing)
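The link-function idea can be made concrete with Stata’s glm command, which takes the family and link as explicit options. This is a sketch only: the variable names are taken from the logit call in this section, and using totsqft_en as a continuous outcome in the second model is purely illustrative, not something the original analysis does.

```stata
* Logit link with a binomial family: the same model that logit fits.
glm cellar dollarel totsqft_en i.regionc female, family(binomial) link(logit)

* Identity link with a Gaussian family: the GLM reduces to OLS, so the
* coefficient estimates should match those from regress.
glm totsqft_en dollarel i.regionc female, family(gaussian) link(identity)
regress totsqft_en dollarel i.regionc female
```

Writing the models this way shows that logit and regress are two members of the same family, differing only in the link function and the assumed distribution of the outcome.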

