library(tidyverse)
library(DT)
library(pander)
library(mosaic)
library(car)
dat <- read_csv("https://github.com/kctolli/MATH325/raw/master/Data/HighSchoolSeniors.csv") %>% na.omit()
# The data are read directly from the URL above, so no working directory needs to be set.
# Keep only gender and the five weekly hours variables used in this analysis.
df <- dat %>% select(Gender, Video_Games_Hours, Social_Websites_Hours, Texting_Messaging_Hours, Computer_Use_Hours, Watching_TV_Hours)
How does electronic use time relate to gender? Do males spend more time on electronics?
\[ H_0: \mu_\text{Female} - \mu_\text{Male} = 0 \]
\[ H_a: \mu_\text{Female} - \mu_\text{Male} \neq 0 \]
A brief glimpse at the variable names in the base data set. The data come from a survey given to the high school seniors who participated in the study.
names(df)
## [1] "Gender" "Video_Games_Hours"
## [3] "Social_Websites_Hours" "Texting_Messaging_Hours"
## [5] "Computer_Use_Hours" "Watching_TV_Hours"
HSS <- df %>%
  mutate(Electronic_Use = Video_Games_Hours + Social_Websites_Hours + Texting_Messaging_Hours + Computer_Use_Hours + Watching_TV_Hours) %>%
  select(- Video_Games_Hours, - Social_Websites_Hours, - Texting_Messaging_Hours, - Computer_Use_Hours, - Watching_TV_Hours) %>%
  filter(Electronic_Use <= 160)
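The hours spent on video games, social websites, texting, computer use, and watching TV are added together with mutate() to create a single Electronic_Use variable. The five individual columns are then dropped, and rows reporting more than 160 total hours per week are filtered out, since a week has only 168 hours.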
names(HSS)
## [1] "Gender" "Electronic_Use"
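To make the add-then-filter step concrete, here is a minimal sketch on a small made-up data frame. The `toy` values are hypothetical and are not taken from the survey, and base R's `rowSums()` is used here in place of the `mutate()` call so the example runs without extra packages.

```r
# Hypothetical toy data; the values are made up for illustration only.
toy <- data.frame(
  Gender = c("Female", "Male", "Female", "Male"),
  Video_Games_Hours = c(0, 10, 2, 100),
  Social_Websites_Hours = c(8, 3, 5, 40),
  Texting_Messaging_Hours = c(10, 2, 6, 30),
  Computer_Use_Hours = c(6, 8, 4, 20),
  Watching_TV_Hours = c(5, 7, 3, 10)
)

# Columns that hold weekly hours (everything except Gender).
hour_cols <- setdiff(names(toy), "Gender")

# Add the hour columns together, row by row.
toy$Electronic_Use <- rowSums(toy[, hour_cols])

# Keep only Gender and the total, and drop totals above 160 hours.
toy_total <- toy[toy$Electronic_Use <= 160, c("Gender", "Electronic_Use")]
print(toy_total)
```

The fourth made-up student reports 200 hours in total, so that row is removed by the 160-hour cutoff while the other three are kept.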
The table below shows the data used for this analysis: each student's gender and total weekly electronic use.
datatable(HSS, options=list(lengthMenu = c(5,10,50)), extensions="Responsive")
Where do the outliers lie? Which gender has more outliers? Which gender’s data stay closer to the mean? Which gender has a higher total electronic use time per week?
ggplot(data = HSS) +
  geom_boxplot(aes(x = Gender, y = Electronic_Use, color = Gender)) +
  theme_bw()
ggplot(data = HSS) +
  geom_col(aes(x = Gender, y = Electronic_Use, fill = Gender)) +
  theme_bw()
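The questions about outliers, spread, and totals can also be checked with numbers rather than by eye. The sketch below is one possible way to do that. It uses simulated data (`sim` is a hypothetical stand-in for HSS, not the survey values) and base R's `boxplot.stats()`, which flags points using roughly the same 1.5 × IQR rule that a boxplot draws.

```r
set.seed(1)
# Simulated stand-in for HSS (hypothetical values, not the survey data).
sim <- data.frame(
  Gender = rep(c("Female", "Male"), each = 50),
  Electronic_Use = c(rexp(50, rate = 1 / 40), rexp(50, rate = 1 / 40))
)

# Per-group size, center, spread, total, and number of boxplot outliers.
summarise_group <- function(x) {
  c(
    n = length(x),
    mean = mean(x),
    sd = sd(x),
    median = median(x),
    total = sum(x),
    n_outliers = length(boxplot.stats(x)$out)
  )
}

# One row per gender, one column per statistic.
group_summary <- t(sapply(split(sim$Electronic_Use, sim$Gender), summarise_group))
print(round(group_summary, 2))
```

Replacing `sim` with `HSS` should give the same kind of table for the real data. Since mosaic is already loaded, `favstats(Electronic_Use ~ Gender, data = HSS)` is another option for a per-gender summary.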
Based on the graphical summaries, my expectation that males spend more time on electronics appears to be wrong, since the graphs lean toward the female side. However, graphs alone cannot tell us whether that difference is real or just due to chance. A t-test gives a more formal answer.
ttest = t.test(Electronic_Use ~ Gender, data = HSS, mu = 0, alternative = "two.sided", conf.level = 0.95)
pander(ttest)
Test statistic | df | P value | Alternative hypothesis |
---|---|---|---|
0.5338 | 302.7 | 0.5939 | two.sided |
mean in group Female | mean in group Male |
---|---|
44.22 | 42.16 |
Above are the results of an independent samples t-test. An independent samples t-test is used when a value is hypothesized for the difference between two (possibly) different population means, \(\mu_1 - \mu_2\). The sample mean for females (44.22 hours) is higher than for males (42.16 hours), so on average the females in this sample reported more weekly electronic use than the males. I find this interesting, since males tend to play more video games than females.
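Because `t.test()` was called without `var.equal = TRUE`, it runs the Welch version of the independent samples t-test, which is why the degrees of freedom in the table are not a whole number. The sketch below works through that calculation by hand and compares it with `t.test()`. The two groups are simulated (hypothetical values, not the survey data), so the numbers it prints will not match the table above.

```r
set.seed(42)
# Simulated groups (hypothetical values, not the survey data).
female <- rnorm(150, mean = 44, sd = 30)
male   <- rnorm(160, mean = 42, sd = 35)

# Squared standard error of each group mean (variances are not pooled).
se2_f <- var(female) / length(female)
se2_m <- var(male) / length(male)

# Welch test statistic for H0: mu_Female - mu_Male = 0.
t_stat <- (mean(female) - mean(male)) / sqrt(se2_f + se2_m)

# Welch-Satterthwaite degrees of freedom (usually not a whole number).
df_welch <- (se2_f + se2_m)^2 /
  (se2_f^2 / (length(female) - 1) + se2_m^2 / (length(male) - 1))

# Two-sided P-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Built-in test for comparison (var.equal = FALSE is the default).
builtin <- t.test(female, male)
print(c(t = t_stat, df = df_welch, p = p_value))
```

The hand-computed statistic, degrees of freedom, and P-value should agree with `builtin$statistic`, `builtin$parameter`, and `builtin$p.value`.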
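My P-value is 0.5939, which is greater than the significance level of \(\alpha = 0.05\), so we fail to reject the null hypothesis. There is insufficient evidence to conclude that mean weekly electronic use differs between females and males.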
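As a quick check on the decision, the P-value can be recomputed from the test statistic and degrees of freedom reported in the table and then compared with the cutoff. This is only a sketch: the 0.05 cutoff is an assumption taken from `conf.level = 0.95`, and the inputs are the rounded values from the table, so the result may differ slightly in the last decimal place.

```r
# Values copied from the t-test table above (rounded as reported).
t_stat      <- 0.5338
df_reported <- 302.7
alpha       <- 0.05  # assumed cutoff, matching conf.level = 0.95

# Two-sided P-value: chance of a statistic at least this extreme under H0.
p_value <- 2 * pt(-abs(t_stat), df = df_reported)

# Decision rule: reject H0 only when the P-value falls below alpha.
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
print(round(p_value, 4))
print(decision)
```

The same conclusion can be read from the confidence interval stored in `ttest$conf.int`: if the 95% interval for the difference in means contains 0, the two-sided test at the 0.05 level fails to reject the null hypothesis.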