In today’s post I want to take a closer look at how to interpret the multinomial logistic regression output of STATA, using smoking behaviour as an example. In econometrics the multinomial logit model (MLM) is used when the dependent variable on the left-hand side of the equation has several discrete alternatives rather than being binary (0=no, 1=yes). The independent variables on the right-hand side can be based on chooser-specific data (e.g. gender, education or income) but also on choice-specific data. However, I am going to focus only on chooser-specific data today. My model is therefore going to explain how the respondents’ characteristics affect their choice of an alternative among a set of alternatives.

In particular the model is going to establish which characteristics of respondents are determinants of smoking behaviour. For this I have obtained the European Social Survey 7.0 which was conducted in 2014. I am using edition 1.0 which was released on 28 October 2015. Among the aims of the ESS are monitoring changes in public attitudes as well as developing a series of European social and attitudinal indicators. The seventh round of the survey covered 22 countries and 28,221 individuals. The survey consists of an hour-long face-to-face interview with core sections as well as rotating modules. The core sections cover the socio-demographic profile of the respondents as well as things like social trust, political interest, socio-political orientations and human values. In the seventh round the two rotating modules covered (1) social inequalities in health and their determinants and (2) respondents’ attitudes towards immigration. One of the questions in the first rotating module assesses respondents’ smoking behaviour. Respondents were asked which of the following descriptions best described their smoking behaviour:

- I smoke daily
- I smoke but not every day
- I don’t smoke but I used to
- I have only smoked a few times
- I have never smoked

This allows me to construct a detailed multinomial logit model in which the first and second answer define current smokers, the third answer equals former daily smokers and the fourth former party smokers, while the fifth answer to the question defines respondents that have never smoked.

In terms of the independent variables, I include a range of demographic control factors, namely age, gender, ethnicity, immigration and family status, education, income and employment status. In addition, I include two dummy variables for mild and high depression as well as four dummy variables for various levels of alcohol consumption. For more information regarding the coding of the variables, please refer to the coding overview below.

**Coding overview**

- *Smoking:* never smoked=1, former party smoker (I have only smoked a few times)=2, former daily smoker (I used to smoke)=3, current smoker (I smoke daily or I smoke but not every day)=4
- *Low_educ (ref.):* 1 if lower secondary education or less, 0 otherwise
- *Medium_educ:* 1 if upper secondary education, 0 otherwise
- *High_education:* 1 if post-school education (vocational or tertiary), 0 otherwise
- *Low_income:* 1 if income in 1st to 3rd decile, 0 otherwise
- *Medium_income:* 1 if income in 4th to 6th decile, 0 otherwise
- *High_income:* 1 if income in 7th to 10th decile, 0 otherwise
- *Age:* age in years
- *Female:* 1 if female, 0 otherwise
- *Employed:* 1 if respondent was employed in the past 7 days, 0 otherwise
- *Children:* 1 if children currently living at home, 0 otherwise
- *Minority:* 1 if respondent belongs to a minority ethnic group in country, 0 otherwise
- *Immigrant:* 1 if respondent was not born in the country
- *No depression (ref.):* 1 if respondent felt depressed none or almost none of the time in the past week, 0 otherwise
- *Mild depression:* 1 if respondent felt depressed some of the time in the past week, 0 otherwise
- *High depression:* 1 if respondent felt depressed most of the time or all/almost all of the time in the past week, 0 otherwise
- *Daily drinker:* 1 if respondent consumes alcohol every day, 0 otherwise
- *Frequent drinker:* 1 if respondent consumes alcohol several times a week, 0 otherwise
- *Weekly drinker:* 1 if respondent consumes alcohol once a week, 0 otherwise
- *Monthly drinker:* 1 if respondent consumes alcohol 2-3 times a month or once a month, 0 otherwise
- *No/Infrequent drinker (ref.):* 1 if respondent consumes alcohol less than once a month or never, 0 otherwise

**Summary statistics**

Before proceeding to the estimation of the MLM, let’s take a quick look at the summary statistics to ensure that there are no coding errors. With the help of the – **sum** – command STATA produces an overview of the mean, standard deviation, minimum and maximum of each variable. The table shows no anomalies except for age. The maximum age of 114 is very likely a coding error, because this respondent reports being in paid work and not retired. Age is therefore recoded to missing for this observation. There are still 5 observations with an age of 100 or older. However, a closer look at their responses suggests that they are valid, as all of them are retired.

In my sample 42 percent of the respondents said that they have never smoked, while around 10.8 percent are former party smokers, 23.4 percent are former daily smokers and the remaining 23.8 percent are currently smoking. In terms of education, 24.42 percent received only little education, 37.84 percent have upper secondary education (medium) and 37.74 percent have post-school education (high). Almost 30 percent of the respondents fall into the lower three income deciles (low income), around 32 percent fall into the 4th to 6th income decile (middle income), while the remainder of the respondents (38 percent) fall into the high income category. It should be noted that the decile cut-offs vary between countries, so that these income categories represent different amounts of money in different countries. However, the interpretation does not change, because respondents compared themselves to national standards and have low, medium or high incomes compared to the population in their respective country.

The age of respondents ranges from 14 to 104 years. The median age is 49 and therefore very close to the mean. Around 68 percent of the respondents in the sample are between 30 and 68 years old. There are slightly more females (52 percent) in the sample than males. 53 percent of the respondents said that they were employed during the last 7 days. Almost 33 percent of the respondents have children living at home, 5.65 percent belong to a minority, and 10.75 percent were born in another country. 67.74 percent of the respondents said that they felt depressed none or almost none of the time in the past week. On the other hand, 26.55 percent felt depressed some of the time (mild depression) and 5.7 percent felt depressed most of the time or all/almost all of the time (high depression). In the sample, 6.34 percent consume alcohol every day (daily drinker). 16.61 percent drink alcohol several times a week (frequent drinker), while 19.56 percent of the respondents drink alcohol once a week (weekly drinker). Monthly drinkers (2-3 times a month or once a month) are 24.26 percent of the respondents while the remainder are infrequent drinkers that either drink less than once a month or never (33.22 percent).

**Regression results**

I will present the regression results both as coefficients and as relative risk ratios (RRR). Let’s begin with the coefficients and a general analysis of my model.

First, it can be seen that the model includes only 22,018 observations as STATA deletes incomplete cases list-wise. Second, the Likelihood Ratio Chi-Square Statistic is 3182.55. The corresponding LR Chi-Square Test tests the assumption that the coefficients of all independent variables are jointly equal to zero. The probability of obtaining an LR Test statistic of 3182.55 or more if all coefficients were jointly equal to zero (the null hypothesis) is practically zero. It can be concluded that the model as a whole is significant.
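To make the "practically zero" p-value a bit more tangible, here is a small Python sketch (outside the Stata workflow of this post). It assumes the test has 48 degrees of freedom, i.e. the 16 regressors in my model times the 3 non-base outcomes, and it uses the closed-form chi-square tail for even degrees of freedom, worked out in logs so that the result does not simply underflow to zero. The function names are my own.

```python
import math

def lr_statistic(ll_null, ll_full):
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (ll_full - ll_null)

def log_chi2_sf_even_df(x, df):
    """Natural log of P(X >= x) for a chi-square variable with even df.

    Closed form: sf(x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    evaluated in logs so that tiny p-values do not underflow.
    """
    if x <= 0 or df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut needs x > 0 and a positive even df")
    half = x / 2.0
    # log of each term (x/2)^k / k!
    log_terms = [k * math.log(half) - math.lgamma(k + 1) for k in range(df // 2)]
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return -half + log_sum

n_regressors = 16   # variables listed in the mlogit command
n_equations = 3     # outcomes 2, 3 and 4, each against base outcome 1
df = n_regressors * n_equations

log_p = log_chi2_sf_even_df(3182.55, df)
print(f"df = {df}, log10(p-value) = {log_p / math.log(10):.1f}")
```

The p-value is so small that it only makes sense on a log scale, which is why STATA simply reports it as 0.0000.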

Thereafter one can interpret the significance of the coefficients. In the first panel ‘former party smoker’ minority and mild depression are significant at the 10 percent level. Immigrant is significant at the 5 percent level and medium and high education, high income, age, female, children and all drinker dummies are significant at the 1 percent level. In the second panel ‘former daily smoker’ the variables medium and high income are significant at 5 percent level. The dummy variables on education and drinking behaviour as well as age, female, children, minority and high depression are significant at 1 percent level. In the third panel ‘current smoker’ the dummy variables on depression, drinking behaviour and income are all significant at 1 percent level. Also high education, age, female, employed and children are significant at 1 percent level.

The sign of the coefficients can be interpreted as follows: a positive coefficient indicates increased odds of outcome 2 over 1, outcome 3 over 1, or outcome 4 over 1, while a negative coefficient indicates decreased odds of the respective outcome over outcome 1. The regression result always has to be interpreted relative to the base outcome, which is that the respondent has never smoked. For example, higher incomes increase the odds of being a former party smoker or former daily smoker but decrease the odds of being a current smoker compared to the odds of having never smoked, ceteris paribus. Similarly, having obtained more education is associated with an increase in the odds of being a former party smoker or former daily smoker (medium education only) but with a decrease in the odds of being a current smoker compared to the odds of having never smoked, everything else held constant. Being female reduces the odds of all outcomes compared to the odds of the base outcome, ceteris paribus. The other coefficients can be interpreted in a similar fashion. However, the magnitudes of the coefficients cannot be interpreted easily. This is why it is common to turn to relative risk ratios instead.
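The "relative to the base outcome" logic can be sketched in a few lines of Python. This is only an illustration: the coefficients below are made up, not my estimates, and `mnl_probabilities` is a hypothetical helper. The base outcome has all coefficients fixed at zero, so a coefficient shifts the odds of its outcome against the base by exactly its exponential.

```python
import math

def mnl_probabilities(coefs, x):
    """Predicted probabilities of a multinomial logit.

    coefs maps each non-base outcome to its coefficients (intercept first);
    the base outcome has all coefficients fixed at zero.
    x holds the covariate values without the intercept.
    """
    scores = {"base": 0.0}
    for outcome, beta in coefs.items():
        scores[outcome] = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    denom = sum(math.exp(s) for s in scores.values())
    return {outcome: math.exp(s) / denom for outcome, s in scores.items()}

# Hypothetical coefficients (intercept, female), NOT the estimates from this post
coefs = {
    "former_party": [-1.0, -0.25],
    "current": [-0.5, -0.40],
}
male = mnl_probabilities(coefs, [0])
female = mnl_probabilities(coefs, [1])

for outcome in coefs:
    odds_male = male[outcome] / male["base"]
    odds_female = female[outcome] / female["base"]
    # The ratio of the two odds equals exp(coefficient on female)
    print(outcome, round(odds_female / odds_male, 4))
```

Note that a negative coefficient lowers the odds against the base outcome; it does not by itself tell you how the probability of that outcome moves, because all probabilities have to sum to one.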

**Relative Risk Ratios**

Relative risk ratios (RRR) can be interpreted in a similar manner to odds ratios in the ordinary logit model. They are merely the exponentiated MLM coefficients from the regression output above. STATA can compute them automatically if one adds the rrr option to the – **mlogit** – command:

*mlogit smoking medium_educ high_educ medium_income high_income age female employed children minority immigrant mild_depression high_depression daily_drinker frequent_drinker weekly_drinker monthly_drinker, baseoutcome(1) rrr*

Let’s start with the RRRs in the first panel. The risk of being a former party smoker vs. never smoker for respondents with medium education is 1.20 times the risk for respondents with low education, i.e. 20 percent higher. Likewise the risk of being a former party smoker vs. never smoker for respondents with high education is 1.37 times the risk for respondents with low education, i.e. 37 percent higher, ceteris paribus.
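The conversion between coefficients, RRRs and percentage changes is simple arithmetic. A minimal Python sketch, with helper names of my own choosing, using the two education RRRs from the first panel:

```python
import math

def rrr_from_coef(coef):
    """A relative risk ratio is the exponentiated mlogit coefficient."""
    return math.exp(coef)

def percent_change(rrr):
    """Percentage change in the relative risk implied by an RRR."""
    return (rrr - 1.0) * 100.0

for label, rrr in [("medium vs. low education", 1.20),
                   ("high vs. low education", 1.37)]:
    print(f"{label}: RRR {rrr:.2f} -> {percent_change(rrr):+.0f} percent")
```

An RRR above 1 translates into a percentage increase and an RRR below 1 into a percentage decrease, which is how all the percentages in the following paragraphs are obtained.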

The risk of being a former party smoker vs. never smoker for respondents with a high income relative to respondents with a low income is about 18.85 percent higher, holding everything else constant. The risk of being a former party smoker vs. never smoker falls by about 1.7 percent for each additional year of age, all else being equal. The risk of being a former party smoker vs. never smoker is 21.71 percent lower for females relative to males, when everything else is held constant. The risk of being a former party smoker vs. never smoker is 14.63 percent lower for respondents living with children compared to respondents that do not. The risk of being a former party smoker vs. never smoker is 20.59 percent lower for members of a minority ethnic group compared to non-minorities, ceteris paribus. Similarly, the risk is 17.01 percent lower for immigrants compared to non-immigrants.
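For a continuous variable such as age, the per-year effect compounds multiplicatively rather than adding up. A small Python sketch, assuming an RRR of roughly 0.983 per year (back-calculated from the "about 1.7 percent" above, so treat it as approximate):

```python
def rrr_over_span(rrr_per_unit, units):
    """RRR for a change of several units: per-unit RRRs multiply."""
    return rrr_per_unit ** units

rrr_age = 1.0 - 0.017   # approximate, back-calculated from the text
ten_years = rrr_over_span(rrr_age, 10)
print(f"10 years older: RRR {ten_years:.3f}, "
      f"i.e. {(ten_years - 1.0) * 100:.1f} percent")
```

So a respondent who is ten years older has a relative risk that is roughly 16 percent lower, not 10 x 1.7 = 17 percent lower.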

Having a mild depression increases the risk of being a former party smoker vs. never smoker by about 10.25 percent compared to no depression. Lastly, the risk of being a former party smoker vs. never smoker is 106.34 percent higher for daily drinkers, 105.48 percent higher for frequent drinkers, 88.62 percent higher for weekly drinkers and 75 percent higher for monthly drinkers compared to less frequent drinkers, ceteris paribus. Hence, drinking alcohol more frequently increases the odds of being a former party smoker over being a never smoker significantly and shows that alcohol and cigarette consumption tend to go together. It does not have to be a causal relationship but one could argue that social drinking induces individuals to at least try smoking once in their lives.

The RRRs in the second panel can be interpreted as follows. Having obtained a medium level of education compared to low education levels increases the odds of being a former daily smoker vs. never smoker by 14.22 percent, ceteris paribus. Likewise, earning a medium income or a high income increases the odds in favour of former daily smoking over never having smoked by 11.64 percent and 12.96 percent, respectively. An additional year of age increases the odds of being a former daily smoker instead of having never smoked by 1.85 percent, holding everything else constant. The odds of being a former daily smoker vs. never smoker are 38.95 percent lower for females compared to males. The risk of being a former daily smoker rather than a never smoker increases by 15.43 percent if the respondent has children living at home compared to respondents without children at home. The risk of being a former daily smoker vs. never smoker is 36.24 percent lower for minorities compared to non-minorities. High levels of depression increase the odds in favour of being a former daily smoker relative to never having smoked by 25.55 percent. The risk of being a former daily smoker vs. never smoker is 213.84 percent higher for daily drinkers, 158.73 percent higher for frequent drinkers, 80.64 percent higher for weekly drinkers and 59.67 percent higher for monthly drinkers compared to less frequent drinkers, ceteris paribus. Again, drinking alcohol more frequently increases the odds of being a former daily smoker over being a never smoker significantly and confirms the view that alcohol and cigarette consumption tend to go together. For respondents that drink more than once a week, the relative risk of being a former daily smoker rather than a never smoker is more than twice that of infrequent drinkers.

In the third panel the RRRs can be interpreted as follows. Having obtained a high level of education compared to low education levels decreases the odds of being a current smoker vs. never smoker by 45.84 percent, ceteris paribus. Likewise, earning a medium income or a high income decreases the odds in favour of currently smoking over never having smoked by 23.78 percent and 41.65 percent, respectively. An additional year of age decreases the odds of currently smoking over having never smoked by 1.82 percent, holding everything else constant. The odds of being a current vs. never smoker are 32.96 percent lower for females compared to males. The risk of being a current smoker vs. never smoker is 38.27 percent higher for respondents currently employed compared to respondents not currently employed.

Mild levels of depression compared to no depression increase the odds in favour of being a current smoker relative to never having smoked by 31.49 percent. High levels of depression increase the odds in favour of being a current smoker relative to never having smoked by 122.13 percent. The risk of being a current smoker vs. never smoker is 289.93 percent higher for daily drinkers, 166.62 percent higher for frequent drinkers, 82.29 percent higher for weekly drinkers and 41.33 percent higher for monthly drinkers compared to less frequent drinkers, ceteris paribus. Again, drinking alcohol more frequently increases the odds of being a current smoker over being a never smoker significantly, which also confirms that alcohol and cigarette consumption tend to go together. For respondents that drink more than once a week, the relative risk of being a current smoker rather than a never smoker is more than twice that of infrequent drinkers.

The risk of being a current smoker rather than never smoker increases by 11.90 percent if the respondent has children living at home compared to respondents without children at home. This does not imply a causal relationship in the sense that children cause people to smoke. However, this finding is troublesome in the sense that the RRR was expected to be below 1, i.e. that having children at home decreases the odds in favour of being a current smoker over never smoker. The command – **adjrr children** – can be used to shed light on the relationship between having children living at home or not and the smoking outcomes. Respondents with children at home are 4.17 percent less likely to be a never smoker than respondents without children at home. This group is also 18.08 percent less likely to be a former party smoker compared to respondents not living with children. However, this group is 9.57 percent more likely to be former daily smokers and 6.55 percent more likely to be current smokers compared to respondents not living with children at home. In terms of absolute differences, respondents with children at home are 1.52 percentage points more likely to be current smokers than respondents without children at home, on average. They are also 2.33 percentage points more likely to be former daily smokers, on average.
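The difference between "percent more likely" (a ratio of probabilities) and "percentage points more likely" (a difference of probabilities) is easy to mix up. As a rough check in Python, the two reported numbers for current smokers can be combined to back out the underlying average probabilities. This assumes that the ratio and the difference refer to the same pair of averaged predicted probabilities, and the inputs are rounded, so the result is only approximate.

```python
def baseline_from_ratio_and_difference(risk_ratio, risk_difference):
    """Solve p1 / p0 = risk_ratio and p1 - p0 = risk_difference for (p0, p1)."""
    if risk_ratio == 1.0:
        raise ValueError("a ratio of 1 implies no difference; p0 is not identified")
    p0 = risk_difference / (risk_ratio - 1.0)
    return p0, p0 + risk_difference

# Current smokers: 6.55 percent more likely, 1.52 percentage points more likely
p0, p1 = baseline_from_ratio_and_difference(1.0655, 0.0152)
print(f"without children: {p0:.3f}, with children: {p1:.3f}")
```

Under these assumptions the predicted share of current smokers is roughly 23 percent without and 25 percent with children at home, which is in line with the 23.8 percent of current smokers in the sample.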

**Measures of fit**

After having described the findings the – **fitstat** – command can shed light on the overall goodness of fit.

For example, the adjusted count R-squared measures the proportion of correct predictions beyond the baseline model, which always predicts the most frequent outcome (IDRE, 2011). It shows that the percentage of correct predictions beyond this baseline model is 8.6 percent. Hence, while my variables turn out to be significant at the margin, the overall decision to smoke or to have tried smoking still remains largely random and is not captured in the model. There might be other factors that could do a better job and should be included in the model.
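To show what the adjusted count R-squared actually counts, here is a short Python sketch based on a classification table of observed versus predicted outcomes. The table is purely hypothetical and the function names are mine; the adjustment subtracts the number of cases in the most frequent observed outcome, which is what one would get right by always guessing that outcome.

```python
def count_r2(confusion):
    """Share of cases whose predicted outcome equals the observed one.

    confusion[i][j] = cases with observed outcome i and predicted outcome j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def adjusted_count_r2(confusion):
    """Correct predictions beyond always guessing the most frequent outcome."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    largest_row = max(sum(row) for row in confusion)
    return (correct - largest_row) / (total - largest_row)

# Hypothetical 3-outcome table (rows: observed, columns: predicted)
table = [
    [50, 5, 5],
    [15, 10, 5],
    [5, 0, 5],
]
print(count_r2(table), adjusted_count_r2(table))
```

In this toy table 65 percent of the cases are classified correctly, but 60 percent would already be correct by always predicting the largest category, so the adjusted measure is only 12.5 percent.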

**Marginal effects**

STATA allows for the computation of marginal effects with the help of the – **margins** – command. Marginal effects differ for discrete and continuous variables: the former are discrete changes, i.e. from 0 to 1, and the latter are instantaneous rates of change. Marginal effects are commonly calculated at the means of the independent variables. Therefore STATA first presents all means before printing the results.

First of all it can be noted that only age is a continuous variable. All other variables are binary and take only a value of 0 or 1. The marginal effects for those variables therefore show how the predicted probability of an outcome changes as the binary variable changes from 0 to 1 while all other variables are held constant at their means (Williams, 2016). For example, the predicted probability of being a current smoker is 0.158 greater for daily drinkers and 0.102 greater for frequent drinkers than for infrequent drinkers, if you take two hypothetical respondents evaluated at the means. Another example is education. For two hypothetical respondents evaluated at the means of the sample, having obtained high education reduces the probability of being a current smoker, i.e. the predicted probability of being a current smoker is 0.118 smaller for individuals in the high education group compared to individuals in the low education group. In contrast, the negative effect of secondary education is a lot smaller and less significant.
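The discrete change for a dummy variable can be sketched in Python as the difference between two predicted probabilities, with all other covariates fixed at their means. The coefficients and means below are hypothetical and only serve to show the mechanics; `probabilities` and `discrete_change` are my own helper names.

```python
import math

def probabilities(betas, x):
    """Multinomial logit probabilities; outcome 0 is the base (zero coefficients)."""
    scores = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    denom = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores]

def discrete_change(betas, x_at_means, dummy_index):
    """Change in each outcome probability when one dummy moves from 0 to 1."""
    x0 = list(x_at_means)
    x1 = list(x_at_means)
    x0[dummy_index] = 0.0
    x1[dummy_index] = 1.0
    p0 = probabilities(betas, x0)
    p1 = probabilities(betas, x1)
    return [after - before for before, after in zip(p0, p1)]

# Hypothetical coefficients on (constant, age, daily_drinker) for two non-base outcomes
betas = [
    [-1.5, 0.02, 0.8],
    [0.5, -0.02, 1.3],
]
x_means = [1.0, 49.0, 0.06]   # constant, mean age, share of daily drinkers
effects = discrete_change(betas, x_means, dummy_index=2)
print([round(e, 3) for e in effects])
```

Because the probabilities always sum to one, the marginal effects across all outcomes sum to zero: whatever probability one outcome gains, the others have to lose.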

**Regression diagnostics**

**I. Multicollinearity**

The model can be tested with the – **collin** – command for collinearity, which would cause standard errors to be inflated.

There are different rules of thumb for detecting multicollinearity. The most rigorous is probably a variance inflation factor (VIF) of greater than 2 and therefore a tolerance (1/VIF) of lower than 0.5. However, my model does not suffer from inflated standard errors and a mean VIF of 1.37 is pretty good.
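The link between VIF, tolerance and the underlying auxiliary regression can be written down in a few lines of Python. The VIF of a variable is 1/(1 - R²), where R² comes from regressing that variable on all other regressors; the helper names are mine.

```python
def vif_from_r2(r2_aux):
    """VIF of a regressor given the R-squared of its auxiliary regression."""
    if not 0.0 <= r2_aux < 1.0:
        raise ValueError("auxiliary R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r2_aux)

def tolerance(vif):
    """Tolerance is the reciprocal of the VIF, i.e. 1 - R-squared."""
    return 1.0 / vif

print(vif_from_r2(0.5), tolerance(2.0))   # the rule-of-thumb threshold
print(round(tolerance(1.37), 2))          # tolerance implied by a VIF of 1.37
```

A VIF of 2 thus means that half of the variation in a regressor is explained by the other regressors. Note that 1.37 is the mean VIF in my model, so individual variables can lie above or below it.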

**II. Tests of independent variables**

The – **mlogtest** – command allows for testing the independent variables. There is the option for a likelihood ratio test (lr) as well as a Wald test (wald). Both test the null hypothesis that all coefficients associated with a given variable are in fact zero (Williams, 2015).

Both tests reject the null hypothesis for all variables at the 1 percent level except for the immigrant variable. For this one the null hypothesis can be rejected at the 5 percent level. Hence each variable’s effects are highly significant in the model.

**III. Tests for combining dependent categories**

The – **mlogtest** – command also allows for testing whether the categories of the dependent variable should in fact be combined. Again, there is the option for a Likelihood-Ratio test (lrcomb) as well as a Wald test (combine). The null hypothesis is that all coefficients except intercepts associated with a given pair of alternatives are in fact zero, meaning that the alternatives can be collapsed for a more efficient estimation (Williams, 2015).

Overall, both the LR and the Wald Test confirm that none of the categories should be combined. They are significant at the 1 percent level. It can be concluded that the outcomes are distinguishable with respect to the variables included in the model.

**IV. Tests for independence of irrelevant alternatives**

Lastly, the – **mlogtest** – command can test the independence of irrelevant alternatives (IIA) assumption which is crucial for the multinomial logit model. If violated one can revert to an alternative specific multinomial probit or a nested logit model. Both relax the IIA assumption (IDRE, 2010). The test for IIA is either based on a Hausman test, a suest-based Hausman test or a Small-Hsiao test. All of the three tests work in a similar manner; for each alternative in the model they drop the individuals that choose that particular alternative and then re-estimate the model with the alternatives that remain (Allison, 2012). Because I have 3 alternatives in my model (beyond the base outcome of never having smoked), the tests proceed in three steps. They first drop being a former party smoker (fparty), then drop being a former daily smoker (fdaily) and lastly drop being a current smoker (current). If the IIA assumption were to hold, the results of the restricted model (2 alternatives) should not differ from the unrestricted model (3 alternatives).
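What the IIA assumption means for the model itself can be illustrated with a short Python sketch. In a multinomial logit, the odds between any two alternatives depend only on those two alternatives, so removing a third one leaves them unchanged. The linear scores below are hypothetical values for one respondent, not estimates from my model.

```python
import math

def choice_probabilities(scores):
    """Multinomial logit probabilities from the linear score of each alternative."""
    denom = sum(math.exp(s) for s in scores.values())
    return {alt: math.exp(s) / denom for alt, s in scores.items()}

# Hypothetical scores for one respondent; "never" is the base outcome
scores = {"never": 0.0, "fparty": -1.2, "fdaily": -0.4, "current": -0.3}

full = choice_probabilities(scores)
restricted = choice_probabilities(
    {alt: s for alt, s in scores.items() if alt != "fparty"}
)

odds_full = full["current"] / full["never"]
odds_restricted = restricted["current"] / restricted["never"]
print(round(odds_full, 6), round(odds_restricted, 6))
```

Both odds are identical by construction. The tests check whether the data behave this way too, i.e. whether the estimated coefficients stay roughly the same once an alternative and the respondents who chose it are removed.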

It should be noted that the tests have been criticized, because they are typically inconclusive or even contradictory. In case you want more information on this, Peter Allison (2012) has devoted a complete blog post to the drawbacks of the three tests. One of the major criticisms is that the Small-Hsiao test results in different outcomes every time because it splits the sample into two halves and also the Hausman test results in different outcomes if one changes the base category (Sarkisian, n.d.). This is why it is often recommended to instead focus on the Hausman test which uses seemingly unrelated estimation (SUE) as methodology (Long and Freese, 2005).

In STATA one can obtain the three tests with the command – **mlogtest, iia**. First, the Hausman test does not provide me with anything, because the Chi2 statistic is negative (Chi2<0) and my model therefore does not meet the asymptotic assumptions of the test. Second, the suest-based Hausman test provides strong evidence against independence of irrelevant alternatives in the sample. It rejects the null hypothesis at the 1 percent level. However, the third test, i.e. the Small-Hsiao test of the IIA assumption, cannot reject the null hypothesis that the odds are independent of other alternatives. It contradicts the results of the suest-based Hausman test. As noted earlier, this is in line with the major criticisms toward IIA testing. To ensure that the violation of the IIA assumption does not interfere with my results, I should consider running an alternative specific multinomial probit or a nested logit model. However, I’ll leave this for another blog post in the future!

Thanks for reading,

Jasse

**Data and Documentation**

ESS Round 7: European Social Survey (2015): ESS-7 2014 Documentation Report. Edition 1.0. Bergen, European Social Survey Data Archive, Norwegian Social Science Data Services for ESS ERIC.

ESS Round 7: European Social Survey Round 7 Data (2014). Data file edition 1.0. Norwegian Social Science Data Services, Norway – Data Archive and distributor of ESS data for ESS ERIC.

**Inspiration for the Model**

Brown, D.C. (n.d.). Models for Ordered and Unordered Categorical Variables [pdf]. *Population Research Center.* Retrieved from: https://www.utexas.edu/cola/prc/_files/cs/Multinomial_Ordinal_Models.pdf

**References**

Allison, P. (2012, 8 October). How Relevant is the Independence of Irrelevant Alternatives? *Statistical Horizons.* Retrieved from: http://statisticalhorizons.com/iia

IDRE (2010, 23 April). Stata Data Analysis Examples: Multinomial Logistic Regression. *Institute for Digital Research and Education.* Retrieved from: http://www.ats.ucla.edu/stat/stata/dae/mlogit.htm

IDRE (2011, 20 October). FAQ: What are pseudo R-squareds? *Institute for Digital Research and Education.* Retrieved from: http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm

Long, J., and Freese, J. (2005). *Regression Models For Categorical Dependent Variables Using Stata* (2nd ed.). College Station, TX: Stata Press.

Sarkisian, N. (n.d.). *Sociology 704: Topics in Multivariate Statistics – Multinomial Logit* [pdf]. Retrieved from: http://www.sarkisian.net/sc704/mlogit.pdf

Williams, R. (2015, 21 February). *Post-Estimation Commands for MLogit* [pdf]. Retrieved from: https://www3.nd.edu/~rwilliam/stats3/Mlogit2.pdf

Williams, R. (2016, 23 January). *Marginal Effects for Continuous Variables* [pdf]. Retrieved from: http://www3.nd.edu/~rwilliam/xsoc73994/Margins02.pdf