Task 3

Background

Readings

Tasks

Reading Notes

In Class Notes

Data Manipulation Commands

Data Questions

Embedded Systems
MATH/CS 335
Feedback
Sharing with Data Scientist

Background

Learning how to ask interesting questions takes time. As data scientists we need to learn how to ask questions that data can answer. This task supports your semester project. Note that the reading on data transformation below is necessary for the case study for this week and may be the most important reading of the semester to fully understand.

Readings

Computational Thinking
Optional Reading for new programmers
Creating Questions for your project (watch the videos that are free)
Questions and data science
Chapter 5: R for Data Science - Data transformation

Tasks

[X] Take notes on your reading of the specified ‘R for Data Science’ chapter in the README.md or in a ‘.R’ script in the class task folder

[X] Develop a few novel questions that data can answer

Get feedback from 5-10 people on their interest in your questions and summarize this feedback
Find other examples of people addressing your question
Present your question to a data scientist to get feedback on the quality of the question and if it can be addressed in 2-months.

[X] Create one .rmd file that has your report

Have a section for each question

[X] Be prepared to discuss your results in the upcoming class

Reading Notes

ggplot – implements the grammar of graphics, a coherent system for describing and building graphs

geom_col(mapping = NULL, data = NULL, position = “stack”, …, width = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

geom_line(mapping = NULL, data = NULL, stat = “identity”, position = “identity”, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, …)

library(tidyverse) includes ggplot2, tibble, tidyr, readr, purr, dplyr, stringr, and forcats

ggplot set up –

ggplot(data = ) + (mapping = aes())
ggplot(data = ) + ( mapping = aes(), stat = , position = ) + +

Aesthetic

visual property of the objects in your plot
include things like the size, the shape, or the color of your points

Facets

particularly useful for categorical variables
subplots that each display one subset of the data
facet_wrap()/facet_grid() - first argument should be a formula created with ~

Geom

Geometrical object that a plot uses to represent data

Tibble

Data frames
Wrangle

Filter

Allows you to subset observations based on their values
The first argument is the name of the data frame
NA – not availables – unknown value

Arrange

Similar to filter()
Except that instead of selecting rows, it changes their order

Select

Allows you to rapidly zoom in on a useful subset using operations based on the names of the variables

Mutate

Always adds new columns at the end of your dataset so we’ll start by creating a narrower dataset so we can see the new variables

In Class Notes

%>% – Means “Then”, This is a pipe

Data Manipulation Commands

Filter - filter your data to a smaller set of important rows.
Sort - Organize the row order of my data
Select - select specific columns to keep or remove
Mutate - add new mutated (changed) variables as columns to my data.
Summarise - build summaries of the columns specified
Group by - divide your data into groups. Often used with summarise
Stack - convert data from “wide” to “long” format by moving column names into a key column, gathering the column values into a single value column
Unstack - convert data from “long” to “wide” format by using unique values of a key column as new column names, and values being placed in one of the new columns based on its key value in the original data structure
Seperate - parse, or break apart, each cell into several cells (usually applied to columns, but can be applied row wise as well)
Unite - collapse or combine cells across several columns to make a single column

Use dplyr and tidyr — Part of tidyverse

Data Questions

Embedded Systems

What radiation levels switch the states of the FPGA?

What causes Radiation?
Pressure differences?

Most used Embedded System in industry?
Most used language in Embedded Systems (other than Embedded C)?
Most common used social engineering technique?

MATH/CS 335

What is the average grade of the students that attend RLab?
Are student that go to RLab higher then then those that don’t?
Grade earned compared to Major?
Does those that have taken MATH/CS 335 do better in CS 450?
Does those that have taken MATH 325 do better in MATH/CS 335?

Feedback

Ben said that number 3 would be the easiest.
Jacob said that number 2 would be cool.
Seth said to talk to Hathaway and figure out if the data can be used
Braden said the questions are great.
Jasean said would the data science questions are great.

Task 3

Asking the right questions

Kyle Tolliver

March 16, 2020

Background

Readings

Tasks

Reading Notes

In Class Notes

Data Manipulation Commands

Data Questions

Embedded Systems

MATH/CS 335

Feedback

Task 3

Asking the right questions

Kyle Tolliver

March 16, 2020

Background

Readings

Tasks

Reading Notes

In Class Notes

Data Manipulation Commands

Data Questions

Embedded Systems

MATH/CS 335

Feedback

Sharing with Data Scientist