Background

Five different Math 221 courses at BYU-Idaho were given a brief start-of-semester survey. Two of the variables collected recorded each student’s class rank (Freshman, Sophomore, Junior, or Senior) and their off-track hourly wage (in U.S. dollars). The survey data will be used to answer the following questions.

Note that the responses from two students were removed from the Class Survey data because they did not specify their class rank. Also, one other student’s responses were removed because they claimed an hourly wage of $100 an hour, which was likely a typo that was supposed to be $10 an hour.

library(mosaic)  # favstats()
library(pander)  # pander()

ClassSurvey <- read.csv("../Data/ClassSurvey.csv", header=TRUE)
# Alternatively, to get the ClassSurvey data into your Console,
# go to "Import Dataset" under the "Environment" tab
# in the top right of your workspace.

# There are a few problems in the data that need to be fixed
# before we work with the data. Run the following plot
# (xyplot() is from the lattice package) to see that two
# observations don't have a class rank recorded and that one
# person supposedly earned $100 an hour, which is probably a
# typo and should be deleted.

# xyplot(Wage ~ Rank, data=ClassSurvey, type=c("p"))

# Filter out the outlier (subset() also drops rows where Wage is NA):
ClassSurvey <- subset(ClassSurvey, Wage<100)
# xyplot(Wage ~ Rank, data=ClassSurvey, type=c("p"))

# Filter out the missing Rank values:
ClassSurvey <- droplevels(subset(ClassSurvey, Rank %in% c("FR","JR","SO","SR")))
# xyplot(Wage ~ Rank, data=ClassSurvey, type=c("p"))
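
The two filtering steps above can be tried on a small made-up data frame first. This is only a sketch: the `toy` data frame and its values are hypothetical, and it assumes that `Rank` is stored as a factor and that missing ranks appear as empty strings, which may not match the actual file.

```r
# Hypothetical stand-in for the survey data (values are made up).
toy <- data.frame(
  Rank = factor(c("FR", "SO", "", "JR", "SR", "SO")),
  Wage = c(9.00, 100.00, 8.50, 10.25, NA, 12.00)
)

# Step 1: drop the implausible wage. subset() also drops rows where
# the condition is NA, so the row with a missing Wage disappears too.
toy <- subset(toy, Wage < 100)

# Step 2: keep only rows with a valid rank, then drop the unused
# factor levels so they no longer appear in tables and plots.
toy <- droplevels(subset(toy, Rank %in% c("FR", "JR", "SO", "SR")))

nrow(toy)          # 3 rows remain
levels(toy$Rank)   # "FR" "JR" "SO"
```

Without `droplevels()`, the empty-string and "SR" levels would stay attached to the factor and show up as empty groups in later plots and summaries.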

How much do BYU-Idaho students make hourly during their off-track?

Do they earn more as they gain more education?

Analysis

Side-by-side boxplots show that in the sample, the Freshmen have the highest median wage, while the Sophomores and Seniors have fairly right-skewed distributions with some very high outliers. This suggests that the shape of the distribution of wages is potentially different for the various class ranks. It may be the case that Freshman hourly wages are left-skewed, Junior wages are fairly normal, and Sophomore and Senior wages are right-skewed.

# (the lattice-style type= argument was dropped; boxplot() does not use it)
boxplot(Wage ~ as.character(Rank), data=ClassSurvey, col='grey', ylab="Hourly Wage", main="Math 221 Students")

\[ H_0: \text{All samples represent a sample of data from the same distribution.} \]
\[ H_a: \text{At least one distribution is stochastically different than the others.} \]

According to the original authors, the alternative hypothesis of the Kruskal-Wallis test is really “a tendency for observations in at least one of the populations to be larger (or smaller) than all the observations together, when paired randomly. In many cases, this is practically equivalent to the mean of at least one population differing from the others.”
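
To make the test statistic concrete, here is a minimal sketch that computes the Kruskal-Wallis statistic by hand and compares it with `kruskal.test()`. The `wage` and `group` vectors are made-up toy values, not survey data, and the formula shown omits the tie correction, so the comparison is only exact when no values are tied.

```r
# Hypothetical toy data: hourly wages for three made-up groups.
wage  <- c(8.0, 9.5, 10.0, 12.0,   7.5, 8.5, 9.0,   11.0, 13.0, 15.0)
group <- factor(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"))

# Rank all observations together (this toy data has no ties).
r <- rank(wage)
N <- length(wage)

# Sum of ranks and sample size within each group.
rank_sums <- tapply(r, group, sum)
n_i       <- tapply(r, group, length)

# Kruskal-Wallis statistic without tie correction:
# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# The p-value comes from a chi-squared distribution with (groups - 1) df.
p <- pchisq(H, df = nlevels(group) - 1, lower.tail = FALSE)

# Compare with the built-in test.
kt <- kruskal.test(wage ~ group)
all.equal(H, unname(kt$statistic))
all.equal(p, kt$p.value)
```

Because the statistic depends only on ranks, a single extreme wage moves it far less than it would move a mean-based statistic, which is one reason a rank-based test suits skewed wage data.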

kruskal.test(Wage ~ Rank, data=ClassSurvey)

    Kruskal-Wallis rank sum test

data:  Wage by Rank
Kruskal-Wallis chi-squared = 2.3262, df = 3, p-value = 0.5075

pander(favstats(Wage ~ Rank, data=ClassSurvey)[,-10])

Rank   min    Q1      median   Q3      max     mean    sd      n
FR     1      9.03    11.1     13.12   15      10.47   4.475   8
JR     3.25   8.25    9.375    12      17      10.09   2.727   56
SO     7.25   8.175   9        11.48   30      10.53   4.2     71
SR     4.62   8.135   8.875    10.34   26.25   10.22   3.898   38

With a p-value of 0.5075, there is insufficient evidence to reject the null hypothesis that each of these groups represents a sample from the same distribution. We will continue to assume that the hourly wage distributions are the same across all class ranks.
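
As a quick check, the p-value in the test output above can be reproduced from the reported statistic and degrees of freedom. This sketch uses the rounded statistic as printed, so it agrees with the output only up to rounding; the variable names are illustrative.

```r
# Values copied from the kruskal.test() output above.
H  <- 2.3262
df <- 3   # number of class ranks minus one

# Upper-tail probability of a chi-squared distribution with 3 df.
p_value <- pchisq(H, df = df, lower.tail = FALSE)
round(p_value, 4)  # 0.5075, matching the reported p-value
```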

Interpretation

Since we failed to reject the null hypothesis, we continue to treat all samples as coming from the same distribution, so Class Rank has no apparent effect on hourly wages. In this sample, then, students are not earning noticeably more or less during their off-track as they progress through their college education.

It suffices, then, to understand the off-track hourly wage of BYU-Idaho students as a whole, since Class Rank has no apparent effect on hourly wages. The following histogram summarizes the relevant information. Oddly, some students report earning $5 or less an hour, which is surprising given that the federal minimum wage is $7.25 an hour and many states set theirs higher. Typically students earn around $9 an hour (the median), although some are doing quite well, earning as much as $30 an hour!

hist(ClassSurvey$Wage, breaks=15, col='sandybrown', xlab="Hourly Wage", main="BYU-Idaho Math 221 Students")

pander(favstats(ClassSurvey$Wage)[-9], caption="Numerical Summaries of Hourly Wages")

Numerical Summaries of Hourly Wages

min   Q1    median   Q3      max   mean    sd      n
1     8.2   9        11.67   30    10.32   3.698   173