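# Load the packages used below, then import the dataset
library(tidyverse)   # read_csv(), %>%, select()
library(data.table)  # setDT()
library(pander)      # pander()
Guns <- read_csv("https://github.com/kctolli/MATH335/raw/master/Case_Studies/Case_Study_04/analysis/full_data.csv") %>%
  select(-1) %>%   # drop the row-index column (X1; named ...1 by newer readr versions)
  na.omit() %>%    # drop rows with any missing value, including intent and age
  select(intent, age, race, education, sex)
View(Guns)  # interactive use only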
The world is a dangerous place. During 2015 and 2016, police shootings were widely discussed in the news. FiveThirtyEight reported on gun deaths in 2016. As leaders in data journalism, they posted a clean version of the data, full_data.csv, in their GitHub repo for us to use. The gun deaths were recorded over the course of three years (2012-2014).
\[ H_{01}:\ \text{Intent and education are not associated.} \]
\[ H_{a1}:\ \text{Intent and education are associated.} \]
# Test H_{01}:
gun1 <- table(setDT(Guns %>% select(intent, education)))
chi.gun1 <- chisq.test(gun1); chi.gun1
##
## Pearson's Chi-squared test
##
## data: gun1
## X-squared = 7630.7, df = 9, p-value < 2.2e-16
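For reference, the statistic and degrees of freedom in the output above follow the usual Pearson chi-squared definitions. The symbols here are notation assumed for illustration, not taken from the original analysis: \(O_{ij}\) and \(E_{ij}\) are the observed and expected counts in row \(i\) and column \(j\), \(R_i\) and \(C_j\) are the row and column totals, \(n\) is the grand total, and \(r\) and \(c\) are the numbers of rows and columns.

\[ X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{n}, \qquad df = (r - 1)(c - 1) \]

With four intent categories and four education levels, \(df = (4 - 1)(4 - 1) = 9\), which matches the output.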
pander(chi.gun1$expected > 5)
| | BA+ | HS/GED | Less than HS | Some college |
|---|---|---|---|---|
| Accidental | TRUE | TRUE | TRUE | TRUE |
| Homicide | TRUE | TRUE | TRUE | TRUE |
| Suicide | TRUE | TRUE | TRUE | TRUE |
| Undetermined | TRUE | TRUE | TRUE | TRUE |
All expected counts are greater than 5, so the requirements are met.
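To make the requirement concrete, here is a minimal sketch of where the expected counts come from. It uses a small made-up table (`toy`, with illustrative numbers that are not from full_data.csv) and computes each expected count as row total times column total divided by the grand total, then compares the result with what `chisq.test()` stores in `$expected`.

```r
# Hypothetical 2 x 3 table of counts (illustrative numbers, not from full_data.csv)
toy <- matrix(c(20, 30, 50,
                40, 30, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              level = c("low", "mid", "high")))

# Expected count for each cell under independence:
# (row total * column total) / grand total
expected_by_hand <- outer(rowSums(toy), colSums(toy)) / sum(toy)

# chisq.test() stores the same matrix in $expected
expected_from_test <- chisq.test(toy)$expected

# The requirement checked in this report: every expected count above 5
all_above_5 <- all(expected_by_hand > 5)
```

The same comparison should work on `gun1`, `gun2`, or `gun3` in place of `toy`.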
pander(chi.gun1$residuals)
| | BA+ | HS/GED | Less than HS | Some college |
|---|---|---|---|---|
| Accidental | -4.415 | -2.132 | 7.611 | -1.198 |
| Homicide | -43.62 | 5.094 | 48.85 | -22.21 |
| Suicide | 32.74 | -3.265 | -37.17 | 16.39 |
| Undetermined | -1.146 | -1.058 | 1.938 | 0.4351 |
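The Pearson residuals show how far each observed count falls from the count expected if intent and education were independent. Positive values mean more deaths than expected in that cell, negative values fewer. The largest departures are in the Homicide and Suicide rows: homicides are far more common than expected among victims with less than a high school education (48.85) and less common among those with a BA or higher (-43.62), while suicides show the reverse pattern (-37.17 and 32.74).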
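As a sketch of how these residuals are defined, the example below recomputes them by hand as (observed - expected) / sqrt(expected) on a small made-up table (`toy`, illustrative numbers only) and checks that their squares sum to the X-squared statistic. The names `resid_by_hand` and `x2_by_hand` are introduced here for illustration.

```r
# Hypothetical 2 x 3 table of counts (illustrative numbers, not from full_data.csv)
toy <- matrix(c(20, 30, 50,
                40, 30, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              level = c("low", "mid", "high")))

fit <- chisq.test(toy)

# Pearson residual for each cell: (observed - expected) / sqrt(expected)
resid_by_hand <- (fit$observed - fit$expected) / sqrt(fit$expected)

# The squared residuals add up to the X-squared statistic
# (no continuity correction is applied for tables larger than 2 x 2)
x2_by_hand <- sum(resid_by_hand^2)
```

Because each cell contributes its squared residual to X-squared, the cells with the largest absolute residuals are the ones driving the test result.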
barplot(gun1, beside=TRUE, legend.text=TRUE, xlab="Education", main="Education vs Intent")
\[ H_{02}:\ \text{Intent and race are not associated.} \]
\[ H_{a2}:\ \text{Intent and race are associated.} \]
# Test H_{02}:
gun2 <- table(setDT(Guns %>% select(race, intent)))
chi.gun2 <- chisq.test(gun2); chi.gun2
##
## Pearson's Chi-squared test
##
## data: gun2
## X-squared = 40944, df = 12, p-value < 2.2e-16
pander(chi.gun2$expected > 5)
| | Accidental | Homicide | Suicide | Undetermined |
|---|---|---|---|---|
| Asian/Pacific Islander | TRUE | TRUE | TRUE | TRUE |
| Black | TRUE | TRUE | TRUE | TRUE |
| Hispanic | TRUE | TRUE | TRUE | TRUE |
| Native American/Native Alaskan | TRUE | TRUE | TRUE | TRUE |
| White | TRUE | TRUE | TRUE | TRUE |
All expected counts are greater than 5, so the requirements are met.
pander(chi.gun2$residuals)
| | Accidental | Homicide | Suicide | Undetermined |
|---|---|---|---|---|
| Asian/Pacific Islander | -1.888 | 4.163 | -2.734 | -0.07923 |
| Black | -3 | 128.1 | -92.68 | -4.594 |
| Hispanic | 0.1469 | 43.33 | -31.75 | 0.2446 |
| Native American/Native Alaskan | 1.767 | -0.1479 | -0.4653 | 2.568 |
| White | 1.782 | -92.25 | 66.93 | 2.344 |
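The Pearson residuals show how far each observed count falls from the count expected if intent and race were independent. The Homicide and Suicide columns dominate: homicides are far more common than expected among Black (128.1) and Hispanic (43.33) victims and less common among White victims (-92.25), while suicides are more common than expected among White victims (66.93) and less common among Black (-92.68) and Hispanic (-31.75) victims.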
barplot(gun2, beside=TRUE, legend.text=TRUE, xlab="Intent", main="Intent vs Race")
\[ H_{03}:\ \text{Education and race are not associated.} \]
\[ H_{a3}:\ \text{Education and race are associated.} \]
# Test H_{03}:
gun3 <- table(setDT(Guns %>% select(race, education)))
chi.gun3 <- chisq.test(gun3); chi.gun3
##
## Pearson's Chi-squared test
##
## data: gun3
## X-squared = 8600.2, df = 12, p-value < 2.2e-16
pander(chi.gun3$expected > 5)
| | BA+ | HS/GED | Less than HS | Some college |
|---|---|---|---|---|
| Asian/Pacific Islander | TRUE | TRUE | TRUE | TRUE |
| Black | TRUE | TRUE | TRUE | TRUE |
| Hispanic | TRUE | TRUE | TRUE | TRUE |
| Native American/Native Alaskan | TRUE | TRUE | TRUE | TRUE |
| White | TRUE | TRUE | TRUE | TRUE |
All expected counts are greater than 5, so the requirements are met.
pander(chi.gun3$residuals)
| | BA+ | HS/GED | Less than HS | Some college |
|---|---|---|---|---|
| Asian/Pacific Islander | 15.79 | -7.705 | -5.113 | 3.691 |
| Black | -39.82 | 7.474 | 35.49 | -15.13 |
| Hispanic | -21.84 | -10.05 | 43.52 | -12.5 |
| Native American/Native Alaskan | -7.017 | 0.07508 | 4.32 | 1.013 |
| White | 30.17 | 0.3076 | -36.7 | 12.89 |
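The Pearson residuals show how far each observed count falls from the count expected if education and race were independent. The BA+ and Less than HS columns show the largest departures: White (30.17) and Asian/Pacific Islander (15.79) victims appear in the BA+ column more often than expected, while Black (35.49) and Hispanic (43.52) victims appear in the Less than HS column more often than expected.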
barplot(gun3, beside=TRUE, legend.text=TRUE, xlab="Education", main="Education vs Race")
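All three tests give a p-value below 2.2e-16, the threshold under which R stops printing the exact value, so we reject each null hypothesis: intent and education, intent and race, and education and race are each associated in this data. Every expected count in all three tests was greater than 5, so the requirements for the chi-squared test were met. (Had that check failed, the test would still be appropriate as long as all expected counts were at least 1 and the average expected count was at least 5.)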
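The two versions of the requirement can be checked side by side. The sketch below is not part of the original analysis: `check_expected_counts()` is a hypothetical helper, and `sparse` is a made-up table chosen so that the strict check fails while the relaxed one passes.

```r
# Hypothetical helper (not part of the original analysis):
# report both the strict and the relaxed expected-count requirement
check_expected_counts <- function(tbl) {
  # suppressWarnings() hides the warning chisq.test() gives for small expected counts
  expected <- suppressWarnings(chisq.test(tbl)$expected)
  list(
    all_above_5 = all(expected > 5),                         # strict rule used in this report
    relaxed_ok  = all(expected >= 1) && mean(expected) >= 5  # fallback rule
  )
}

# Made-up 2 x 2 table whose smallest expected count is 3
sparse <- matrix(c(3, 12, 9, 36), nrow = 2)
res <- check_expected_counts(sparse)
```

Under these assumptions, calling `check_expected_counts(gun1)` would report the same two checks for the intent and education table.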