Chi Squared Test

# Use this R-Chunk to import all your datasets!
library(tidyverse)   # read_csv(), select(), %>%
library(data.table)  # setDT()
library(pander)      # pander()

Guns <- read_csv("https://github.com/kctolli/MATH335/raw/master/Case_Studies/Case_Study_04/analysis/full_data.csv") %>%
  select(-X1) %>%
  na.omit() %>%  # drops every row with any NA, including missing intent or age
  select(intent, age, race, education, sex)

View(Guns)

Background

The world is a dangerous place. During 2015 and 2016 there was a lot of discussion in the news about police shootings. FiveThirtyEight reported on gun deaths in 2016. As leaders in data journalism, they have posted a clean version of this data, called full_data.csv, in their GitHub repo for us to use. The gun deaths were recorded over the course of three years (2012-2014).

Hypotheses & Data Analysis

Is intent associated with education regardless of race?

\[ H_{01}:\ \text{Intent and education are not associated.} \]

\[ H_{a1}:\ \text{Intent and education are associated.} \]

# Test H_{01}:
gun1 <- table(setDT(Guns %>% select(intent, education)))
chi.gun1 <- chisq.test(gun1); chi.gun1

## 
##  Pearson's Chi-squared test
## 
## data:  gun1
## X-squared = 7630.7, df = 9, p-value < 2.2e-16

pander(chi.gun1$expected > 5)

	BA+	HS/GED	Less than HS	Some college
Accidental	TRUE	TRUE	TRUE	TRUE
Homicide	TRUE	TRUE	TRUE	TRUE
Suicide	TRUE	TRUE	TRUE	TRUE
Undetermined	TRUE	TRUE	TRUE	TRUE

All expected counts are greater than 5, so the requirements are met.
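The expected counts come from the table margins alone. As a minimal sketch of that arithmetic, the following uses a small made-up 2x2 table (the counts and the names `obs`, `expected`, and `test` are illustrative assumptions, not part of the gun deaths data) and confirms that the hand calculation agrees with what `chisq.test()` reports.

```r
# Hypothetical 2x2 counts, used only to illustrate the arithmetic.
obs <- matrix(c(20, 30,
                40, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

# Expected count for each cell: (row total * column total) / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# chisq.test() returns the same matrix in its $expected component.
test <- chisq.test(obs, correct = FALSE)
all.equal(unname(expected), unname(test$expected))

# The requirement checked above: every expected count exceeds 5.
all(expected > 5)
```

The same `all(expected > 5)` check could be applied to `chi.gun1$expected` as a single TRUE/FALSE summary instead of reading the table cell by cell.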

pander(chi.gun1$residuals)

	BA+	HS/GED	Less than HS	Some college
Accidental	-4.415	-2.132	7.611	-1.198
Homicide	-43.62	5.094	48.85	-22.21
Suicide	32.74	-3.265	-37.17	16.39
Undetermined	-1.146	-1.058	1.938	0.4351

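The Pearson residuals show how far each observed count falls from its expected count under the null hypothesis; the larger the absolute value, the more that cell contributes to the test statistic. The largest departures are in the Homicide and Suicide rows: homicides are over-represented among victims with less than a high school education (48.85) and under-represented among those with a BA or higher (-43.62), while suicides show the opposite pattern (-37.17 and 32.74).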
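For reference, the residuals that `chisq.test()` returns in `$residuals` are Pearson residuals. Writing \(O_{ij}\) for the observed count and \(E_{ij}\) for the expected count in row \(i\), column \(j\) (this notation is an assumption introduced here, not taken from the output), each residual and its link to the test statistic can be written as:

\[ r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}, \qquad X^2 = \sum_{i,j} r_{ij}^2 \]

A positive residual means the cell holds more deaths than the null hypothesis predicts; a negative one means fewer.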

barplot(gun1, beside=TRUE, legend.text=TRUE, xlab="Education", main="Education vs Intent")

Is intent associated with race?

\[ H_{02}:\ \text{Intent and race are not associated.} \]

\[ H_{a2}:\ \text{Intent and race are associated.} \]

# Test H_{02}:
gun2 <- table(setDT(Guns %>% select(race, intent)))
chi.gun2 <- chisq.test(gun2); chi.gun2

## 
##  Pearson's Chi-squared test
## 
## data:  gun2
## X-squared = 40944, df = 12, p-value < 2.2e-16

pander(chi.gun2$expected > 5)

	Accidental	Homicide	Suicide	Undetermined
Asian/Pacific Islander	TRUE	TRUE	TRUE	TRUE
Black	TRUE	TRUE	TRUE	TRUE
Hispanic	TRUE	TRUE	TRUE	TRUE
Native American/Native Alaskan	TRUE	TRUE	TRUE	TRUE
White	TRUE	TRUE	TRUE	TRUE

All expected counts are greater than 5, so the requirements are met.
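The degrees of freedom in the test output follow from the dimensions of the table. As a quick check, for this table of 5 races by 4 intents:

\[ df = (\text{rows} - 1)(\text{columns} - 1) = (5 - 1)(4 - 1) = 12 \]

This matches the df = 12 reported for gun2. The same formula gives (4 - 1)(4 - 1) = 9 for the intent-by-education table.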

pander(chi.gun2$residuals)

	Accidental	Homicide	Suicide	Undetermined
Asian/Pacific Islander	-1.888	4.163	-2.734	-0.07923
Black	-3	128.1	-92.68	-4.594
Hispanic	0.1469	43.33	-31.75	0.2446
Native American/Native Alaskan	1.767	-0.1479	-0.4653	2.568
White	1.782	-92.25	66.93	2.344

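Again, the residuals measure how far each observed count falls from its expected count. The Black and Hispanic rows have large positive Homicide residuals (128.1 and 43.33) and large negative Suicide residuals (-92.68 and -31.75), while the White row shows the reverse (-92.25 and 66.93). The Accidental and Undetermined columns stay comparatively close to their expected counts.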

barplot(gun2, beside=TRUE, legend.text=TRUE, xlab="Intent", main="Intent vs Race")

Is race associated with education?

\[ H_{03}:\ \text{Education and race are not associated.} \]

\[ H_{a3}:\ \text{Education and race are associated.} \]

# Test H_{03}:
gun3 <- table(setDT(Guns %>% select(race, education)))
chi.gun3 <- chisq.test(gun3); chi.gun3

## 
##  Pearson's Chi-squared test
## 
## data:  gun3
## X-squared = 8600.2, df = 12, p-value < 2.2e-16

pander(chi.gun3$expected > 5)

	BA+	HS/GED	Less than HS	Some college
Asian/Pacific Islander	TRUE	TRUE	TRUE	TRUE
Black	TRUE	TRUE	TRUE	TRUE
Hispanic	TRUE	TRUE	TRUE	TRUE
Native American/Native Alaskan	TRUE	TRUE	TRUE	TRUE
White	TRUE	TRUE	TRUE	TRUE

All expected counts are greater than 5, so the requirements are met.

pander(chi.gun3$residuals)

	BA+	HS/GED	Less than HS	Some college
Asian/Pacific Islander	15.79	-7.705	-5.113	3.691
Black	-39.82	7.474	35.49	-15.13
Hispanic	-21.84	-10.05	43.52	-12.5
Native American/Native Alaskan	-7.017	0.07508	4.32	1.013
White	30.17	0.3076	-36.7	12.89

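Again, the residuals measure how far each observed count falls from its expected count. The Less than HS column has large positive residuals in the Black and Hispanic rows (35.49 and 43.52) and a large negative residual in the White row (-36.7); the BA+ column shows the opposite signs (-39.82, -21.84, and 30.17).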
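With twenty cells in the table, it can help to rank the residuals rather than scan them by eye. The sketch below re-enters the residuals printed above as a matrix and sorts them by absolute size; the object names `res` and `long` are illustrative assumptions. With the fitted test object available, `chi.gun3$residuals` could be used in place of the typed-in matrix.

```r
# Pearson residuals copied from the race-by-education table above.
res <- matrix(c( 15.79,  -7.705,   -5.113,   3.691,
                -39.82,   7.474,   35.49,  -15.13,
                -21.84, -10.05,    43.52,  -12.5,
                 -7.017,  0.07508,  4.32,    1.013,
                 30.17,   0.3076, -36.7,    12.89),
              nrow = 5, byrow = TRUE,
              dimnames = list(
                race = c("Asian/Pacific Islander", "Black", "Hispanic",
                         "Native American/Native Alaskan", "White"),
                education = c("BA+", "HS/GED", "Less than HS", "Some college")))

# Flatten to one row per cell, then sort by absolute residual, largest first.
long <- as.data.frame(as.table(res), responseName = "residual")
long <- long[order(-abs(long$residual)), ]

# The three cells that depart most from the expected counts.
head(long, 3)
```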

barplot(gun3, beside=TRUE, legend.text=TRUE, xlab="Education", main="Education vs Race")

Interpretation

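Each test shows how two of the variables are related in this data. In all three tests the p-value is below 2.2e-16, far smaller than any conventional significance level, so we reject each null hypothesis and conclude that intent and education, intent and race, and education and race are each associated. Each test also met the requirement that all expected counts be greater than 5. (Had this failed, the test would still be appropriate as long as all expected counts were at least 1 and the average expected count was at least 5.)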
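The requirement and its fallback can be written as a single check. The following is a minimal sketch: `expected_counts_ok()` is a hypothetical helper that is not part of the original analysis, and the `toy` matrix is made-up data chosen so that the strict rule fails but the fallback passes.

```r
# Hypothetical helper (not part of the original analysis): returns TRUE when
# a matrix of expected counts meets either the strict or the relaxed rule.
expected_counts_ok <- function(expected) {
  strict  <- all(expected > 5)                            # every count above 5
  relaxed <- all(expected >= 1) && mean(expected) >= 5    # fallback rule
  strict || relaxed
}

# Illustrative expected counts: one cell is below 5, but every cell is at
# least 1 and the average is 6, so the fallback rule applies.
toy <- matrix(c(3, 7, 6, 8), nrow = 2)
expected_counts_ok(toy)
```

In this report the helper would be called as `expected_counts_ok(chi.gun1$expected)`, and likewise for the other two tests.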