What Makes A Musician? Econometrics

Today my post is about econometrics; in particular, getting a handle on the Logit model. I took it as an opportunity to investigate why some people have come to play a musical instrument and others haven’t. To do so I needed a dataset that includes questions about an individual’s free time activities, i.e. a survey that asks respondents whether they play an instrument for leisure. This is why I obtained Taking Part: the National Survey of Culture, Leisure and Sport 2014-2015 from the UK Data Service. The survey is carried out by the Department for Culture, Media and Sport and its partner organisations Sport England, English Heritage and Arts Council England. Since the 2012/13 survey, the study also includes longitudinal elements, and for the 2014/15 survey the target was a sample size of 10,000 respondents, equally split between longitudinal and new respondents. The study asks people a wide range of questions concerning their cultural activities, the events they participate in, as well as hobbies and sports. It has also recently introduced a section about what people were doing when they were growing up, which will come in handy for constructing a variable on whether a person played an instrument during childhood.

Summary statistics

Let’s start with an overview of the data. The dataset includes 9,817 observations, of which 5,480 are female (55.82 percent). Regarding the demographics, the mean age in the sample is 53 years while the median lies at 54 years. The minimum age is 16 (91 observations) and the maximum age is 100 (1 observation). The standard deviation is around 18.53, so if age were roughly normally distributed, around 68 percent of the observations would lie between about 35 and 72 years. More than half of the respondents are married, 20 percent are single, while around 12 percent are either widowed or divorced. Around 5.5 percent of the respondents are lone parents with live-in children and around 18.4 percent of the respondents have live-in children and currently live with a partner. Almost 32 percent of the respondents have obtained higher education and professional/vocational equivalents as their highest qualification level, meaning that the other 5,409 respondents have obtained qualifications lower than this. In addition, 48.57 percent of the respondents are currently in paid work. The remainder are, for example, unemployed, students, retired, sick, looking after family, or in training schemes.

In their free time, 914 out of the 9,817 respondents play a musical instrument. This is equal to 9.31 percent of the sample. Besides, 171 respondents (1.74 percent) have written music in the 12 months preceding the questionnaire while the remaining 9,646 have not. Around 12 percent of the respondents have participated in painting, drawing, printmaking or sculpture in the 12 months preceding the interview. A clear majority, almost 66 percent of the respondents, said that they have read for pleasure in the last 12 months, which excludes newspapers, magazines or comics. In terms of music making, 427 of the respondents have sung as performance or rehearsal/practice in the last 12 months, which is equal to 4.35 percent of the sample. 2,206 respondents played a musical instrument, acted, danced or sang when they were growing up. Growing up is defined as the period from around age 11 to age 15. On the other hand, 2,621 of the respondents did not participate in such activities. Another 4,990 respondents were not asked this question in their longitudinal questionnaire. Hence this will later reduce the sample to an effective size of at most 4,827 observations.

The Logit Model

This leaves us with 11 independent regressors and the dependent dummy variable of playing an instrument (1=yes) for the Logit model. 9 of the regressors are binary themselves. In addition, age is measured in 4 categories with the 16-24 age category as reference group. This will show whether older people are less likely to play an instrument than the youngest generation. Parental status has 3 categories with no live-in children as reference category. It assesses whether people with children are less likely to play an instrument due to time constraints, especially when they are lone parents.

Coding Overview

Instrument 1 if playing an instrument
Gender 1 if female
Age
  • 16-24 (ref.)
  • 25-44
  • 45-64
  • >65
Marital status 1 if (de facto) married
Parental status
  • No live-in children (ref.)
  • Lone parent with live-in children
  • Partnered with live-in children
Education 1 if higher education
Work 1 if currently in paid work
Written music 1 if written music in the last 12 months
Read books 1 if read for pleasure in the last 12 months
Painting 1 if participated in painting, drawing, printmaking or sculpture in the last 12 months
Singing 1 if sang in front of an audience or practiced singing in the last 12 months
Childhood instrument 1 if played a music instrument, acted, danced or sang when growing up (age 11-15)
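To make the role of the reference categories concrete, here is a minimal sketch of how a categorical variable such as age can be expanded into 0/1 dummies. The helper name `dummy_code` and the label strings are my own illustration, not the survey's variable names, and Stata does this for you with factor-variable notation.

```python
# Hypothetical helper: expand one categorical value into 0/1 dummies.
# The reference category gets no dummy of its own; it is the case in
# which all dummies are zero and is absorbed by the constant.
AGE_LEVELS = ["16-24", "25-44", "45-64", ">65"]
PARENT_LEVELS = ["no children", "lone parent", "partnered"]


def dummy_code(value, levels, reference):
    """Return {level: 0 or 1} for every non-reference level."""
    if value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    if reference not in levels:
        raise ValueError(f"unknown reference: {reference!r}")
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}


# A respondent aged 45-64 who is a lone parent:
print(dummy_code("45-64", AGE_LEVELS, "16-24"))
print(dummy_code("lone parent", PARENT_LEVELS, "no children"))
# A respondent in the reference age group has all age dummies at zero:
print(dummy_code("16-24", AGE_LEVELS, "16-24"))
```

Each estimated odds ratio for an age or parental dummy is then read relative to the omitted group.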

Regression Results

The table below summarizes the initial regression results obtained by estimating a Logit model in Stata. OR is the odds ratio and STDE its standard error; one, two and three asterisks indicate significance at the 10, 5 and 1 percent level, respectively.


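| Playing an instrument | Model 1 OR | Model 1 STDE | Model 1 sig. | Model 2 OR | Model 2 STDE | Model 2 sig. |
|---|---|---|---|---|---|---|
| female | 0.34 | 0.04 | *** | 0.34 | 0.04 | *** |
| age 25-44 | 0.67 | 0.14 | * | 0.66 | 0.14 | * |
| age 45-64 | 0.78 | 0.16 | | 0.77 | 0.16 | |
| age above 65 | 0.47 | 0.11 | *** | 0.48 | 0.11 | *** |
| married | 0.87 | 0.12 | | 0.89 | 0.13 | |
| lone parent with children | 0.70 | 0.19 | | 0.71 | 0.19 | |
| partnered with children | 0.79 | 0.15 | | 0.80 | 0.15 | |
| higher education | 1.52 | 0.19 | *** | 1.58 | 0.20 | *** |
| paid work | 0.83 | 0.11 | | 0.82 | 0.11 | |
| read books | 1.37 | 0.20 | ** | 1.39 | 0.20 | ** |
| painting | 1.81 | 0.26 | *** | 1.76 | 0.26 | *** |
| singing | 4.36 | 0.85 | *** | 4.22 | 0.83 | *** |
| write music | 29.63 | 12.13 | *** | 29.47 | 12.09 | *** |
| singing*writing music | 0.16 | 0.09 | *** | 0.16 | 0.10 | *** |
| instrument as child | 5.25 | 0.75 | *** | 5.35 | 0.77 | *** |
| _cons | 0.06 | 0.01 | *** | 0.06 | 0.01 | *** |
| Number of obs | 3912 | | | 3827 | | |
| LR chi2(15) | 608.32 | | | 608.39 | | |
| Prob > chi2 | 0.00 | | | 0.00 | | |
| Pseudo R2 | 0.22 | | | 0.23 | | |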
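As a rough cross-check of the asterisks, one can back out a z-statistic from each reported odds ratio and its standard error. The sketch below assumes that the standard errors are delta-method standard errors on the odds-ratio scale, so that the standard error of the underlying coefficient is STDE divided by OR; the function names are my own. Because the table values are rounded to two decimals, the results are only approximate.

```python
import math


def z_from_odds_ratio(odds_ratio, se_odds_ratio):
    """Approximate z-statistic of the logit coefficient ln(OR).

    Assumes se_odds_ratio is a delta-method standard error,
    i.e. se(OR) = OR * se(ln OR).
    """
    se_log_or = se_odds_ratio / odds_ratio
    return math.log(odds_ratio) / se_log_or


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# A few rows of Model 1, using the rounded table values.
rows = {
    "female": (0.34, 0.04),
    "age 25-44": (0.67, 0.14),
    "age 45-64": (0.78, 0.16),
    "read books": (1.37, 0.20),
}

for name, (or_, se) in rows.items():
    z = z_from_odds_ratio(or_, se)
    print(f"{name:12s} z = {z:6.2f}  p = {two_sided_p(z):.3f}")
```

With these rounded inputs, age 25-44 lands between the 5 and 10 percent levels and age 45-64 is well above 10 percent, in line with the table.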

In my first model 10 regressors turn out to be significant at the 10 percent level or better, of which 8 are highly significant at the 1 percent level. The significant variables and their odds ratios can be interpreted as follows. In the sample, the odds of playing an instrument are about 66 percent lower for females than for males (odds ratio 0.34), ceteris paribus; put the other way round, males’ odds are roughly three times those of females. For people in the age group 25 to 44 the odds are about 33 percent lower than for people aged 16 to 24 (odds ratio 0.67), everything else being equal. For people aged above 65 the odds are about 53 percent lower than for people aged 16 to 24 (odds ratio 0.47), while the age group 45 to 64 does not differ significantly from the reference group. The penalty for the age group 25 to 44 might well be explained by busy schedules due to work and family commitments, while the age group 45 to 64 regains more flexibility, for example once children have left the household. The penalty for people above 65 might derive from deteriorating health, for example poorer eyesight for reading music or hearing loss. While being in paid work does not turn out to be significant, higher education does have a positive association, i.e. people with the highest level of education have 51.76 percent higher odds of playing an instrument than their peers with lower education levels.
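One common way to read an odds ratio is as a percentage change in the odds, computed as (OR - 1) x 100. The short sketch below applies this to the rounded Model 1 odds ratios; the function names are my own. Taking the reciprocal of an odds ratio simply swaps the comparison group, which is why an odds ratio of 0.34 for females corresponds to roughly three times the odds for males.

```python
def pct_change_in_odds(odds_ratio):
    """Percentage change in the odds relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0


def flip_reference(odds_ratio):
    """Odds ratio with the comparison reversed (e.g. male vs. female)."""
    return 1.0 / odds_ratio


# Rounded odds ratios from Model 1.
model_1 = {
    "female": 0.34,
    "age 25-44": 0.67,
    "age above 65": 0.47,
    "higher education": 1.52,
}

for name, or_ in model_1.items():
    print(f"{name:17s} OR = {or_:.2f}  change in odds = {pct_change_in_odds(or_):+.0f}%")

# Males relative to females: about 2.94 with the rounded value.
print(f"male vs. female OR = {flip_reference(0.34):.2f}")
```

Note that these are changes in the odds, not in the probability; the two are close only when the outcome is rare.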

The next significant set of variables are other leisure activities as predictors for playing an instrument. People who read books for leisure have 37 percent higher odds of playing an instrument than people who do not read for pleasure. People who paint etc. have 81 percent higher odds than people who do not. People who sing have 336 percent higher odds of playing an instrument (odds ratio 4.36), highlighting that this is a strong predictor. Because the model also contains the singing*writing music interaction, this figure applies to singers who do not write music. This is a rather intuitive finding, as people with a talent or interest for singing are more likely to be interested or talented in playing an instrument as well (musicianship/musical ability). Even more influential are having played an instrument, acted, danced or sung as a child (odds ratio 5.25) and currently writing music as a hobby (odds ratio 29.63). They are very strong predictors for playing an instrument in the model.
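Because the model contains a singing*writing music interaction, the odds ratios for singing and for writing music cannot simply be read off separately for someone who does both. A minimal sketch of the arithmetic, using the rounded Model 1 odds ratios (the function name is my own): on the log-odds scale the three terms add up, so on the odds scale they multiply.

```python
import math

# Rounded odds ratios from Model 1.
OR_SINGING = 4.36
OR_WRITING = 29.63
OR_INTERACTION = 0.16


def combined_odds_ratio(sings, writes):
    """Odds ratio relative to someone who neither sings nor writes music.

    sings and writes are 0/1 indicators; the interaction term only
    enters when both are 1.
    """
    log_or = (
        sings * math.log(OR_SINGING)
        + writes * math.log(OR_WRITING)
        + sings * writes * math.log(OR_INTERACTION)
    )
    return math.exp(log_or)


for sings in (0, 1):
    for writes in (0, 1):
        print(f"sings={sings} writes={writes} -> OR = {combined_odds_ratio(sings, writes):.2f}")
```

With the rounded values, someone who both sings and writes music has odds roughly 21 times those of someone who does neither, which is less than the 29.63 for writing music alone: the interaction below one dampens the combined effect.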

Overall this reveals an important pattern: people tend to learn an instrument mainly during childhood. They are probably either enrolled by their parents or wish to learn an instrument themselves. They then either continue this hobby in later life or drop it at some point. This seems to be the main path to learning an instrument. In later life, demographic factors, especially age, act more like a ‘penalty’ that reduces the probability of playing an instrument than as incentives for adults to acquire new skills and develop their musical ability. What strikes me though is the large gender gap in the dataset after controlling for other demographic influences like parental status.

Goodness of fit

Overall, the likelihood-ratio chi-square statistic with 15 degrees of freedom is 608.32 and its p-value is practically zero, meaning that we can reject the null model with only a constant, which would always predict 0 (no), in favour of my model. It can be concluded that my model as a whole is statistically significant. Likewise, the Hosmer-Lemeshow goodness-of-fit test does not reject the hypothesis that the model fits the data, with a Hosmer-Lemeshow chi-square statistic of 6.97 with 8 degrees of freedom and a p-value of 0.5401. One can also take the Count R-squared and the adjusted Count R-squared into account. The former is 0.903 while the latter is 0.121. The adjusted Count R-squared gives the proportion of correct predictions beyond the baseline of always predicting 0, as a share of the cases that the baseline gets wrong. The estimated model therefore removes about 12.1 percent of the baseline’s prediction errors. As more than 90 percent of the respondents in the dataset are non-musicians, interpretations of the pseudo R-squared statistics should be treated with caution, and I will therefore focus on what determines playing an instrument at the margin.
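To illustrate how the Count R-squared and its adjusted version are built, here is a small sketch on made-up data (ten invented observations, not the survey). The function names and the 0.5 cutoff are my own assumptions; the definitions follow the usual ones, with the adjusted version subtracting the number of cases in the most frequent outcome category.

```python
def n_correct(y_true, p_hat, cutoff=0.5):
    """Number of observations whose predicted class matches the outcome."""
    return sum(int(p > cutoff) == y for y, p in zip(y_true, p_hat))


def count_r2(y_true, p_hat, cutoff=0.5):
    """Share of correct predictions."""
    return n_correct(y_true, p_hat, cutoff) / len(y_true)


def adjusted_count_r2(y_true, p_hat, cutoff=0.5):
    """Correct predictions beyond always guessing the most frequent outcome,
    as a share of the cases that this naive guess gets wrong."""
    n = len(y_true)
    ones = sum(y_true)
    n_max = max(ones, n - ones)  # size of the most frequent category
    return (n_correct(y_true, p_hat, cutoff) - n_max) / (n - n_max)


# Made-up example: 8 non-players, 2 players.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7, 0.2]

print(count_r2(y, p))           # 9 of 10 correct
print(adjusted_count_r2(y, p))  # 1 of the 2 players recovered
```

In this toy case always predicting 0 would already be right 80 percent of the time, which is why the plain Count R-squared looks flattering when one outcome dominates.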

Measures of Fit Model 1

| Measure | Value | Measure | Value |
|---|---|---|---|
| Log-Lik Intercept Only | -1356.989 | Log-Lik Full Model | -1052.83 |
| D(3896) | 2105.66 | LR(15) | 608.318 |
| | | Prob > LR | 0 |
| McFadden’s R2 | 0.224 | McFadden’s Adj R2 | 0.212 |
| ML (Cox-Snell) R2 | 0.144 | Cragg-Uhler (Nagelkerke) R2 | 0.288 |
| McKelvey & Zavoina’s R2 | 0.328 | Efron’s R2 | 0.212 |
| Variance of y* | 4.893 | Variance of error | 3.29 |
| Count R2 | 0.903 | Adj Count R2 | 0.121 |
| AIC | 0.546 | AIC*n | 2137.66 |
| BIC | -30121.288 | BIC’ | -484.241 |
| BIC used by Stata | 2238.009 | AIC used by Stata | 2137.66 |
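Several of these measures follow directly from the two log-likelihoods and the sample size. The sketch below recomputes them with the standard textbook formulas; the only assumption of mine is that the model has 16 estimated parameters (15 regressors plus the constant), which matches the 3,896 degrees of freedom reported for the deviance with 3,912 observations.

```python
import math

LL_NULL = -1356.989   # log-likelihood, intercept only
LL_FULL = -1052.83    # log-likelihood, full model
N_OBS = 3912
N_PARAMS = 16         # assumed: 15 regressors plus the constant

# Likelihood-ratio statistic and deviance
lr = 2.0 * (LL_FULL - LL_NULL)
deviance = -2.0 * LL_FULL

# McFadden's pseudo R-squared, plain and adjusted for parameters
mcfadden = 1.0 - LL_FULL / LL_NULL
mcfadden_adj = 1.0 - (LL_FULL - N_PARAMS) / LL_NULL

# Cox-Snell (ML) and Cragg-Uhler (Nagelkerke) pseudo R-squared
cox_snell = 1.0 - math.exp(-lr / N_OBS)
nagelkerke = cox_snell / (1.0 - math.exp(2.0 * LL_NULL / N_OBS))

# Akaike information criterion, total and per observation
aic_total = deviance + 2.0 * N_PARAMS
aic_per_obs = aic_total / N_OBS

print(f"LR(15)          {lr:.3f}")
print(f"Deviance        {deviance:.2f}")
print(f"McFadden R2     {mcfadden:.3f}")
print(f"McFadden Adj R2 {mcfadden_adj:.3f}")
print(f"Cox-Snell R2    {cox_snell:.3f}")
print(f"Nagelkerke R2   {nagelkerke:.3f}")
print(f"AIC*n           {aic_total:.2f}")
print(f"AIC             {aic_per_obs:.3f}")
```

The recomputed values agree with the table to the reported precision, which is a useful sanity check when copying output by hand.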

Misspecification errors

The next step is to test my model for specification errors. It could be that the relationship is not linear in the log-odds, or that I missed either a relevant regressor or a combination of my regressors. Note that singing and writing music are positively correlated (a person who writes music is more likely to sing as well). Therefore I already included an interaction term for people who both sing and write music, to avoid misspecification in this regard. The linktest shows that while the _hat value is statistically significant at the 1 percent level, the _hatsq is not significant, with a p-value of 0.325. The test therefore finds no evidence that my model omits relevant regressors or is otherwise misspecified.


My model could suffer from multicollinearity, as at least writing music, reading books, painting and singing are similar in nature, i.e. artistic or cultural leisure activities. This could be a source of severe multicollinearity and inflate my standard errors, misleading one to conclude that regressors are insignificant when they in fact need to be included. The tolerance of all regressors is greater than 0.1, the rule-of-thumb threshold below which one should have concerns about multicollinearity. Likewise, the variance inflation factors (VIF) are all less than 10, with a mean VIF of 1.77, which is pretty good. The highest VIFs belong to the age category dummies, at 3.54, 3.27 and 2.92 respectively. There is therefore no sign that my model’s standard errors are seriously inflated by multicollinearity.
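Tolerance and VIF are two views of the same quantity: the VIF of a regressor is 1 / (1 - R²), where R² comes from regressing that regressor on all the others, and the tolerance is its reciprocal. A short sketch with the three highest VIFs reported above (the function names are my own):

```python
def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif


def auxiliary_r2_from_vif(vif):
    """R-squared of the regressor on all other regressors implied by a VIF."""
    return 1.0 - 1.0 / vif


def is_concerning(vif, max_vif=10.0):
    """Rule of thumb: VIF above 10, equivalently tolerance below 0.1."""
    return vif > max_vif


# The three highest VIFs in the model (age category dummies).
for vif in (3.54, 3.27, 2.92):
    print(
        f"VIF = {vif:.2f}  tolerance = {tolerance_from_vif(vif):.3f}  "
        f"auxiliary R2 = {auxiliary_r2_from_vif(vif):.3f}  "
        f"concerning = {is_concerning(vif)}"
    )
```

So even the largest VIF of 3.54 corresponds to a tolerance of about 0.28, comfortably above the 0.1 threshold.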

Influential observations

To determine whether there are influential observations due to coding errors or other issues as well as plainly legitimate outliers which might be of interest for further study, one can use a plot of the standardized Pearson and Deviance residuals as well as leverage.

The highest outlier, as shown best in the Pearson residual index plot, is observation number 9329. Looking at the data one can see that this female is in the age range 25 to 44 and a lone parent with live-in children. She is currently in paid work and paints in her leisure time. However, she did not play an instrument, act, dance or sing during childhood. The model predicts that she is not playing an instrument (p=0.04) when she in fact now does so. This respondent therefore probably started playing an instrument at a later age (after 15) despite time constraints regarding family and work, and is a significant positive outlier.

The lowest outlier is observation number 9321. This person is between age 16 and 24, currently in paid work and also played an instrument, acted, danced or sang during childhood. This male currently writes music in his leisure time and has also done painting in the last 12 months. The model strongly predicts that this respondent plays an instrument, with a probability of more than 0.93. However, this person in fact does not play an instrument.

Another interesting outlier is observation number 7641. This female currently plays an instrument despite being a lone parent with live-in children. She is in the age range of 16 to 24, in paid work and played an instrument, acted, danced or sang during childhood. The model predicts that this respondent does not play an instrument due to her time constraints (p=0.06) when she in fact does so.

The last interesting observation I want to discuss is number 6314. This male is in the age range of 45 to 64 and in paid work. He is currently married and has no live-in children. The respondent sings and paints in his leisure time, reads books and also played an instrument, acted, danced or sang during childhood. The model predicts that this respondent plays an instrument with a probability of more than 0.90 when in fact he does not.

Using the rule of thumb that a leverage of more than three times the average leverage (0.044), i.e. a value greater than 0.132, marks an influential observation, the estimation sample contains 85 influential observations and 3,827 non-influential observations. However, when excluding the former in a second model, the significance of the regressors does not change. The odds ratios of some regressors do change to a small extent but the main results are the same.
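The three-times-average rule is easy to apply once the leverage values have been saved. Here is a minimal sketch on made-up leverages (ten invented numbers, not the survey's); the function name and the default multiple are my own.

```python
def flag_high_leverage(leverages, multiple=3.0):
    """Return (threshold, indices) of observations whose leverage exceeds
    `multiple` times the average leverage."""
    if not leverages:
        raise ValueError("need at least one leverage value")
    mean_leverage = sum(leverages) / len(leverages)
    threshold = multiple * mean_leverage
    flagged = [i for i, h in enumerate(leverages) if h > threshold]
    return threshold, flagged


# Made-up leverages: nine ordinary observations and one unusual one.
toy_leverages = [0.01] * 9 + [0.20]

threshold, flagged = flag_high_leverage(toy_leverages)
print(f"threshold = {threshold:.3f}, flagged observations = {flagged}")

# Re-estimating without the flagged rows amounts to keeping the rest:
kept = [i for i in range(len(toy_leverages)) if i not in flagged]
print(f"kept {len(kept)} of {len(toy_leverages)} observations")
```

The second model in the regression table corresponds to this kind of re-estimation on the kept observations.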


Today’s exercise was all about using the Logit model in practice while shedding light on what might determine whether someone plays an instrument. First and foremost, the strongest predictors are having acquired the skills during childhood (age 11 to 15) and currently writing music. This is followed by singing. There is an age penalty; as people become older they are less likely to play. Females are significantly less likely to play an instrument in the dataset. This might well derive from family and household commitments, which females tend to take on more often than males, leaving them less free time for their own hobbies, but that is my own interpretation.

Thanks for reading! I hope you enjoyed the exercise.


Department for Culture, Media and Sport. (2016). Taking Part: the National Survey of Culture, Leisure and Sport, 2014-2015; Adult and Child Data. [data collection]. UK Data Service. SN: 7872, http://dx.doi.org/10.5255/UKDA-SN-7872-1.

