# Use this R-Chunk to import all your datasets!
library(tidyverse)

# Keep only rows with no missing values in the columns used below.
# is.na() tests for missing values directly; comparing against the string "NA" only works by accident.
airport <- nycflights13::airports %>%
  filter(!is.na(name))
airline <- nycflights13::airlines %>%
  filter(!is.na(name))
flight <- nycflights13::flights %>%
  filter(
    !is.na(arr_time), !is.na(dep_time), !is.na(dep_delay),
    !is.na(air_time), !is.na(carrier), !is.na(origin)
  )
# View(flight)  # interactive viewer; run in the console rather than in the knitted report
You just started your internship at a big firm in New York, and your manager gave you an extensive file of flights that departed JFK, LGA, or EWR in 2013. The data (nycflights13::flights) is available in R (install.packages("nycflights13"); library(nycflights13)). Your manager wants you to answer three questions from it.
[X] Address at least two of the three questions in the background description (if you have time, try to tackle all three)
[X] Make sure to include one or more visualizations that show the complexity of the data.
[X] Create one .Rmd file that has your report
[X] Push your .Rmd, .md, and .html to your GitHub repo
[X] Be prepared to discuss your analysis in the upcoming class
[X] Complete the recommended reading on posting issues.
[X] Find two other students’ compiled files in their repositories and provide feedback using the issues feature in GitHub (Late due to computer problems)
(If they already have three issues, find a different student to critique)
[ ] Address 1-2 of the issues posted on your project and push the updates to GitHub
My results answer two of the questions using the nycflights13 package. I answered “Which destination airport is the worst (you decide on the metric for worst) airport for arrival time?” and “Which origin airport is best to minimize my chances of a late arrival when I am using Delta Airlines?” In this Case Study I learned more about ggplot2 and other tidyverse features, such as group_by() and summarise().
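# Note: arr_time is a local clock time stored as HHMM (e.g. 830 = 8:30 am),
# so arr_time <= 8 keeps only flights that landed just after midnight (00:01 to 00:08).
flight %>%
  filter(arr_time <= 8) %>%
  group_by(dest) %>%
  summarise(arr = mean(arr_time)) %>%
  ggplot() +
  geom_point(aes(x = dest, y = arr)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Worst Arrival Time", x = "Destination", y = "Mean Arrival Time (HHMM)")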
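One way to make "worst" concrete is the mean arrival delay per destination. The sketch below assumes that metric (the assignment leaves the choice open) and uses an arbitrary cutoff of at least 100 flights so that rarely served airports do not dominate the ranking. The name `worst_dest` and the cutoff are illustrative choices, not part of the analysis above.

```r
library(dplyr)

# Assumption: "worst" means the highest mean arrival delay, in minutes.
worst_dest <- nycflights13::flights %>%
  filter(!is.na(arr_delay)) %>%          # drop flights with no recorded arrival delay
  group_by(dest) %>%
  summarise(
    n_flights = n(),                     # flights per destination
    mean_arr_delay = mean(arr_delay),    # average arrival delay per destination
    .groups = "drop"
  ) %>%
  filter(n_flights >= 100) %>%           # illustrative cutoff, not from the assignment
  arrange(desc(mean_arr_delay))          # worst destinations first

head(worst_dest, 5)
```

Sorting by delay rather than by clock time ranks destinations by how late flights actually arrive, which is closer to what the question asks.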
# Delta flights only: mean, minimum, and maximum arrival delay (minutes) per origin-destination pair
flight %>%
  filter(carrier == "DL") %>%
  group_by(origin, dest) %>%
  summarise(
    delayarr = mean(arr_delay),
    minarr_delay = min(arr_delay),
    maxarr_delay = max(arr_delay),
    .groups = "drop"
  ) %>%
  ggplot() +
  geom_point(aes(x = minarr_delay, y = delayarr)) +
  facet_wrap(~ origin) +
  labs(title = "Origin Airport to Minimize Late Arrival", subtitle = "Delta Airlines", x = "Min Arrival Delay (minutes)", y = "Average Arrival Delay (minutes)") +
  theme_minimal()
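The "chance of a late arrival" in the Delta question can also be read as a proportion. The sketch below assumes that a flight counts as late when `arr_delay > 0`; that threshold is an assumption, and a stricter one can be swapped in. The name `delta_late` is illustrative.

```r
library(dplyr)

# Assumption: a flight is "late" when it arrives any number of minutes after schedule.
delta_late <- nycflights13::flights %>%
  filter(carrier == "DL", !is.na(arr_delay)) %>%
  group_by(origin) %>%
  summarise(
    n_flights = n(),                     # Delta flights per origin
    prop_late = mean(arr_delay > 0),     # share of flights that arrived late
    mean_arr_delay = mean(arr_delay),    # average arrival delay in minutes
    .groups = "drop"
  ) %>%
  arrange(prop_late)                     # lowest chance of a late arrival first

delta_late
```

Under this definition, the first row is the origin with the lowest share of late Delta arrivals, which gives a single number per airport to compare against the faceted plot.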