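# Use this R-Chunk to import all your datasets!
library(tidyverse)   # read_csv(), select(), %>%
library(data.table)  # setDT()
library(pander)      # pander()
Guns <- read_csv("https://github.com/kctolli/MATH335/raw/master/Case_Studies/Case_Study_04/analysis/full_data.csv") %>% 
  select(-X1) %>%   # drop the row-index column (readr >= 2.0 may name it ...1 instead)
  na.omit() %>%     # drop rows with a missing value in any column, including intent and age
  select(intent, age, race, education, sex)
View(Guns)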

Background

The world is a dangerous place. During 2015 and 2016 there was a lot of discussion in the news about police shootings, and in 2016 FiveThirtyEight reported on gun deaths. As leaders in data journalism, they posted a clean version of the data, full_data.csv, in their GitHub repo for us to use. The gun deaths were recorded over the course of three years (2012-2014).

Hypotheses & Data Analysis

Is intent associated with education regardless of race?

\[ H_{01}:\ \text{Intent and education are not associated.} \]

\[ H_{a1}:\ \text{Intent and education are associated.} \]

# Test H_{01}:
gun1 <- table(setDT(Guns %>% select(intent, education)))
chi.gun1 <- chisq.test(gun1); chi.gun1
## 
##  Pearson's Chi-squared test
## 
## data:  gun1
## X-squared = 7630.7, df = 9, p-value < 2.2e-16
pander5(chi.gun1$expected)
              BA+    HS/GED  Less than HS  Some college
Accidental    TRUE   TRUE    TRUE          TRUE
Homicide      TRUE   TRUE    TRUE          TRUE
Suicide       TRUE   TRUE    TRUE          TRUE
Undetermined  TRUE   TRUE    TRUE          TRUE

All expected counts are greater than 5, so the requirements are met.
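pander5() is not defined anywhere in this document, so its exact behavior is unknown. The sketch below shows one plausible way to produce the same kind of TRUE/FALSE table: compare each expected count to 5. The helper name expected_over_5 and the toy counts are assumptions made for illustration only; they are not taken from full_data.csv.

```r
# Toy 2 x 3 table of counts (illustrative values, not from full_data.csv)
toy <- matrix(c(30, 20, 25, 35, 40, 50), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              level = c("low", "mid", "high")))

# Hypothetical stand-in for pander5(): TRUE where the expected count exceeds 5
expected_over_5 <- function(chi) chi$expected > 5

chi_toy <- chisq.test(toy)
check <- expected_over_5(chi_toy)

print(check)       # logical matrix with the same shape as the table
print(all(check))  # a single TRUE/FALSE summary of the requirement
```

Wrapping the logical matrix in pander() would render it as a formatted table in a knitted report.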

pander(chi.gun1$residuals)
              BA+     HS/GED  Less than HS  Some college
Accidental    -4.415  -2.132  7.611         -1.198
Homicide      -43.62  5.094   48.85         -22.21
Suicide       32.74   -3.265  -37.17        16.39
Undetermined  -1.146  -1.058  1.938         0.4351

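The Pearson residuals show how far each observed count falls from the count expected if intent and education were independent; positive values mean more deaths than expected and negative values mean fewer. The largest departures are in the Homicide and Suicide rows: homicides are over-represented among victims with less than a high school education (48.85) and under-represented among those with a BA or higher (-43.62), while suicides show the opposite pattern (-37.17 and 32.74).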
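For reference, the values returned in chisq.test()$residuals are Pearson residuals. The notation below is introduced here for illustration: \(O_{ij}\) is the observed count in row \(i\) and column \(j\), \(R_i\) and \(C_j\) are the row and column totals, and \(n\) is the grand total.

\[ E_{ij} = \frac{R_i\, C_j}{n}, \qquad r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}, \qquad X^2 = \sum_{i,j} r_{ij}^2, \qquad df = (\text{rows} - 1)(\text{columns} - 1) \]

For the 4 x 4 table of intent by education, \(df = (4-1)(4-1) = 9\), which matches the test output. Squaring and summing the sixteen residuals in the table gives roughly 7630.7, the reported X-squared, up to rounding.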

barplot(gun1, beside=TRUE, legend.text=TRUE, xlab="Education", main="Education vs Intent")



Is intent associated with race?

\[ H_{02}:\ \text{Intent and race are not associated.} \]

\[ H_{a2}:\ \text{Intent and race are associated.} \]

# Test H_{02}:
gun2 <- table(setDT(Guns %>% select(race, intent)))
chi.gun2 <- chisq.test(gun2); chi.gun2
## 
##  Pearson's Chi-squared test
## 
## data:  gun2
## X-squared = 40944, df = 12, p-value < 2.2e-16
pander5(chi.gun2$expected)
                                Accidental  Homicide  Suicide  Undetermined
Asian/Pacific Islander          TRUE        TRUE      TRUE     TRUE
Black                           TRUE        TRUE      TRUE     TRUE
Hispanic                        TRUE        TRUE      TRUE     TRUE
Native American/Native Alaskan  TRUE        TRUE      TRUE     TRUE
White                           TRUE        TRUE      TRUE     TRUE

All expected counts are greater than 5, so the requirements are met.

pander(chi.gun2$residuals)
                                Accidental  Homicide  Suicide  Undetermined
Asian/Pacific Islander          -1.888      4.163     -2.734   -0.07923
Black                           -3          128.1     -92.68   -4.594
Hispanic                        0.1469      43.33     -31.75   0.2446
Native American/Native Alaskan  1.767       -0.1479   -0.4653  2.568
White                           1.782       -92.25    66.93    2.344

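The residuals show how far each observed count falls from the count expected if intent and race were independent. Homicides are far more common than expected among Black (128.1) and Hispanic (43.33) victims and far less common among White victims (-92.25); suicides show the reverse pattern (-92.68, -31.75, and 66.93).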
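With twenty cells in a residual table, it can help to rank them by absolute size instead of scanning by eye. The following is a minimal sketch of that idea on a toy table; the counts, the dimension labels, and the names res_long and ranked are illustrative assumptions and are not part of the original analysis.

```r
# Toy 2 x 3 table of counts (illustrative values, not from full_data.csv)
toy <- matrix(c(40, 10, 15, 35, 20, 30), nrow = 2,
              dimnames = list(race = c("A", "B"),
                              intent = c("x", "y", "z")))

# Pearson residuals from the chi-squared test
res <- chisq.test(toy)$residuals

# Reshape the matrix to one row per cell, then sort by absolute residual
res_long <- as.data.frame(as.table(res), responseName = "residual")
ranked <- res_long[order(-abs(res_long$residual)), ]

print(head(ranked, 3))  # the three cells that depart most from independence
```

The same steps should work on chi.gun2$residuals, since it is also a matrix with named dimensions.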

barplot(gun2, beside=TRUE, legend.text=TRUE, xlab="Intent", main="Intent vs Race")



Is race associated with education?

\[ H_{03}:\ \text{Education and race are not associated.} \]

\[ H_{a3}:\ \text{Education and race are associated.} \]

# Test H_{03}:
gun3 <- table(setDT(Guns %>% select(race, education)))
chi.gun3 <- chisq.test(gun3); chi.gun3
## 
##  Pearson's Chi-squared test
## 
## data:  gun3
## X-squared = 8600.2, df = 12, p-value < 2.2e-16
pander5(chi.gun3$expected)
                                BA+   HS/GED  Less than HS  Some college
Asian/Pacific Islander          TRUE  TRUE    TRUE          TRUE
Black                           TRUE  TRUE    TRUE          TRUE
Hispanic                        TRUE  TRUE    TRUE          TRUE
Native American/Native Alaskan  TRUE  TRUE    TRUE          TRUE
White                           TRUE  TRUE    TRUE          TRUE

All expected counts are greater than 5, so the requirements are met.

pander(chi.gun3$residuals)
                                BA+     HS/GED   Less than HS  Some college
Asian/Pacific Islander          15.79   -7.705   -5.113        3.691
Black                           -39.82  7.474    35.49         -15.13
Hispanic                        -21.84  -10.05   43.52         -12.5
Native American/Native Alaskan  -7.017  0.07508  4.32          1.013
White                           30.17   0.3076   -36.7         12.89

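The residuals show how far each observed count falls from the count expected if race and education were independent. Black and Hispanic victims are over-represented in the Less than HS column (35.49 and 43.52) and under-represented in the BA+ column (-39.82 and -21.84), while White victims show the opposite pattern (-36.7 and 30.17).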

barplot(gun3, beside=TRUE, legend.text=TRUE, xlab="Education", main="Education vs Race")



Interpretation

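All three tests report the same p-value, < 2.2e-16, so in each case we reject the null hypothesis: intent is associated with education, intent is associated with race, and race is associated with education. The residual tables show which cells drive each association. Every test also met the requirement that all expected counts be greater than 5. (Had that check failed, the test would still be appropriate as long as all expected counts were at least 1 and the average expected count was at least 5.)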
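The two versions of the expected-count requirement can be checked together. The sketch below is one way to do that, assuming a hypothetical helper named check_expected and a small toy table chosen so that the strict rule fails while the relaxed rule holds; neither comes from the original analysis.

```r
# Hypothetical helper: evaluate both versions of the expected-count requirement
check_expected <- function(tab) {
  # suppressWarnings() hides the low-expected-count warning from chisq.test()
  expected <- suppressWarnings(chisq.test(tab))$expected
  list(
    all_over_5 = all(expected > 5),                         # strict rule
    relaxed_ok = all(expected >= 1) && mean(expected) >= 5  # fallback rule
  )
}

# Toy 2 x 2 table (illustrative values): one expected count is below 5
sparse <- matrix(c(3, 12, 9, 40), nrow = 2)
result <- check_expected(sparse)

print(result)  # strict rule fails here, fallback rule still holds
```

Calling check_expected(gun1), check_expected(gun2), or check_expected(gun3) would apply the same checks to the tables built in this report.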