Background

What is the probability of people at certain ages served a mission? Is it an increase or decrease and you get older?

There are a few steps to answer this question.

  1. Survey people - get age and if they served a mission
  2. Collect and wrangle data
  3. Use Logistic Regression model to answer the questions
  4. Interpret data

Data Analysis

Basic Statistics

pander(favstats(Age ~ Mission, data = mission)[,-10])
Mission min Q1 median Q3 max mean sd n
0 18 18 18.5 19 23 19.21 1.762 14
1 19 21 22 23 26 22 1.688 34

Logistical Regression

mission_glm <- glm(Mission ~ Age, data = mission, family = binomial)
pandary(mission_glm)
  Estimate Std. Error z value Pr(>|z|)
(Intercept) -18.98 5.568 -3.408 0.0006547
Age 0.9688 0.2766 3.502 0.0004613

(Dispersion parameter for binomial family taken to be 1 )

Null deviance: 57.95 on 47 degrees of freedom
Residual deviance: 36.53 on 46 degrees of freedom
ggplot(mission, aes(Age,Mission)) +
  geom_point(color = "black") +
  geom_smooth(method = glm,
              method.args = list(family = binomial), 
              se = FALSE) +
  theme_bw()

hoslem.test(mission_glm$y, mission_glm$fitted, g=10)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  mission_glm$y, mission_glm$fitted
## X-squared = 9.4936, df = 8, p-value = 0.3024

Interpretation

From the regression we learn that as you get older the odds you served a mission increase by 0.9688. There are a few interesting insights we learn from the data and regression. With COVID-19 sending missionaries home can cause 19 year olds to be returned missionaries. We can see the probabilities of each age using the predict function in R.

pander(predict(mission_glm, data.frame(Age = c(18, 19, 20, 21, 22, 23, 24, 25)), type = "response"))
1 2 3 4 5 6 7 8
0.1769 0.3616 0.5988 0.7973 0.912 0.9647 0.9863 0.9948