datatable(challeng)
As described in the video, it was known that the O-rings could fail when the temperature was below 53° F. The video then stated that “NASA had already launched shuttles below 53° and gotten away with it.” This second statement is actually false. The lowest temperature of any of the 23 prior launches (before the Challenger explosion) was 53° F [view source]. The “evidence” that the O-rings could fail below 53° was based on the simple observation that the launch at 53° experienced two O-ring failures, so it seemed unwise to launch below that temperature. It is true that NASA had previously launched successfully despite O-ring failures occurring, but every one of those launches took place at 53° F or warmer. Thus, whether or not temperature was the root cause of the failures was debatable.
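To model the probability of the O-rings failing at various temperatures, we could apply the logistic regression model \[ P(Y_i = 1|x_i) = \frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} = \pi_i \] where for observation \(i\): \(Y_i = 1\) if at least one O-ring failed on that launch (and \(Y_i = 0\) otherwise), and \(x_i\) is the temperature (in ° F) at the time of the launch.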
Note that if \(\beta_1\) is zero in the above model, then \(x_i\) (temperature) provides no insight about the probability of a failed O-ring. Thus, we could test the hypothesis that
\[ H_0: \beta_1 = 0 \\ H_a: \beta_1 \neq 0 \]
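One way to see why \(\beta_1\) carries the entire temperature effect is to rewrite the model in terms of odds. The following is a short algebraic sketch that uses only the symbols already defined above. Since \(1 - \pi_i = 1/(1+e^{\beta_0 + \beta_1 x_i})\), dividing \(\pi_i\) by \(1-\pi_i\) gives
\[ \frac{\pi_i}{1-\pi_i} = e^{\beta_0 + \beta_1 x_i} \quad\Longleftrightarrow\quad \log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i \]
Under \(H_0\) the log-odds reduce to the constant \(\beta_0\), so every launch would have the same failure probability \(e^{\beta_0}/(1+e^{\beta_0})\) regardless of temperature.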
To obtain estimates of the coefficients \(\beta_0\) and \(\beta_1\) for the challeng data, we apply the following R code.
chall.glm <- glm(Fail>0 ~ Temp, data=challeng, family=binomial)
summary(chall.glm)
##
## Call:
## glm(formula = Fail > 0 ~ Temp, family = binomial, data = challeng)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0611 -0.7613 -0.3783 0.4524 2.2175
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 15.0429 7.3786 2.039 0.0415 *
## Temp -0.2322 0.1082 -2.145 0.0320 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 28.267 on 22 degrees of freedom
## Residual deviance: 20.315 on 21 degrees of freedom
## AIC: 24.315
##
## Number of Fisher Scoring iterations: 5
Thus the estimated model for \(\pi_i\) is given by \[
P(Y_i = 1|x_i) \approx \frac{e^{15.043-0.232 x_i}}{1+e^{15.043 - 0.232 x_i}} = \hat{\pi}_i
\] where \(b_0 = 15.043\) is the value of the (Intercept) which estimates \(\beta_0\) and \(b_1 = -0.232\) is the value of Temp which estimates \(\beta_1\).
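As a rough illustration of what the estimated model implies, the fitted probability can be evaluated at a few temperatures. This is only a sketch: the coefficients are typed in by hand from the output above, and the helper name `pi_hat` and the chosen temperatures are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied by hand from the summary() output above.
b0 <- 15.0429
b1 <- -0.2322

# Hypothetical helper: fitted probability of at least one O-ring failure
# at a given temperature (degrees F).
pi_hat <- function(temp) {
  plogis(b0 + b1 * temp)  # plogis(z) is exp(z) / (1 + exp(z))
}

temps <- c(53, 60, 70, 80)  # illustrative temperatures, chosen arbitrarily
probs <- pi_hat(temps)
print(round(data.frame(Temp = temps, Prob = probs), 3))
```

Because \(b_1\) is negative, the printed probabilities fall as the temperature rises.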
Importantly, the \(p\)-value for the test of Temp shows a significant result \((p = 0.0320)\), giving sufficient evidence to conclude that \(\beta_1 \neq 0\). In other words, temperature affects the probability of an O-ring failure.
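The reported \(p\)-value comes from a Wald test, which divides the estimate by its standard error and compares the result to a standard normal distribution. The sketch below reproduces that arithmetic from the rounded values in the output (so the result may differ slightly in the last digit). It also computes a likelihood ratio test from the two deviances as an alternative check. The likelihood ratio test is not used in the original analysis and is shown here only for comparison.

```r
# Values copied by hand from the summary() output above.
est <- -0.2322  # estimate for Temp
se  <- 0.1082   # standard error for Temp

z      <- est / se            # Wald z statistic
p_wald <- 2 * pnorm(-abs(z))  # two-sided p-value

# Alternative check (an addition, not the original author's method):
# likelihood ratio test based on the drop in deviance.
lr_stat <- 28.267 - 20.315    # null deviance minus residual deviance
lr_df   <- 22 - 21            # difference in degrees of freedom
p_lr    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(z = z, p_wald = p_wald, lr_stat = lr_stat, p_lr = p_lr))
```

Both tests point the same way here, with the likelihood ratio test giving the smaller \(p\)-value.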
To visualize this simple logistic regression we could make the following plot.
plot( Fail>0 ~ Temp, data=challeng, main="", ylab='Probability of O-rings Failing', pch=16)
curve(exp(15.043-0.232*x)/(1+exp(15.043-0.232*x)), from=53, to=85, add=TRUE)
To check that the logistic regression is a good fit to these data, we apply the Hosmer-Lemeshow goodness of fit test (since there are only a couple of repeated \(x\)-values) from library(ResourceSelection).
library(ResourceSelection)
hoslem.test(chall.glm$y, chall.glm$fitted, g=6)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: chall.glm$y, chall.glm$fitted
## X-squared = 7.4118, df = 4, p-value = 0.1157
# Note: doesn't give a p-value for g >= 7, default is g=10.
# Larger g is usually better than smaller g.
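Recall that the null hypothesis is that the logistic regression is a good fit for the data. Since the \(p\)-value \((p = 0.1157)\) is not significant, we fail to reject the null. There is insufficient evidence of a lack of fit, so we continue to treat the logistic regression as appropriate.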
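The \(p\)-value in the test output can be recovered by hand from the reported statistic, which is compared to a chi-squared distribution with \(g - 2\) degrees of freedom. The following sketch uses only the numbers printed above.

```r
# Values copied by hand from the hoslem.test() output above.
x_sq <- 7.4118  # X-squared statistic
g    <- 6       # number of groups passed to hoslem.test()

# Upper-tail chi-squared probability with g - 2 = 4 degrees of freedom.
p_value <- pchisq(x_sq, df = g - 2, lower.tail = FALSE)
print(p_value)
```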
Since a temperature of zero is not realistic for this model, the value of \(e^{b_0}\) is not interpretable. However, the value of \(e^{b_1} = e^{-0.232} \approx 0.79\) shows that the odds of the O-rings failing for a given launch are multiplied by 0.79 (a decrease of about 21%) for every 1° F increase in temperature. (Note that this implies that every 1° F decrease in temperature increases the odds of a failed O-ring by a factor of \(e^{0.232} \approx 1.26\).) The Challenger shuttle was launched at a temperature of 31° F. By waiting until 53° F, the odds of failure would have been multiplied by \(e^{-0.232(53-31)}\approx 0.006\). Note that for a temperature of 31° F our model puts the probability of a failure at \[ P(Y_i = 1|x_i) \approx \frac{e^{15.043-0.232\cdot 31}}{1+e^{15.043 - 0.232 \cdot 31}} = \hat{\pi}_i \] Using R to do this calculation, we get \(\hat{\pi}_i \approx\)
predict(chall.glm, data.frame(Temp=31), type='response')
## 1
## 0.9996088
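The odds factors used in the interpretation above can be reproduced directly from the rounded slope \(b_1 = -0.232\). This is a minimal sketch, and the variable names are illustrative assumptions.

```r
b1 <- -0.232  # rounded slope estimate for Temp, from the fitted model above

or_up   <- exp(b1)              # odds multiplier per 1 degree F increase
or_down <- exp(-b1)             # odds multiplier per 1 degree F decrease
or_wait <- exp(b1 * (53 - 31))  # odds multiplier for launching at 53 F instead of 31 F

print(round(c(or_up = or_up, or_down = or_down, or_wait = or_wait), 4))
```

Note that `or_up` and `or_down` are reciprocals of each other, which is why a 1° F change multiplies the odds by the same proportion in either direction.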