There are many ways to display data. The fundamental idea is that the graphical depiction of data should communicate the truth the data has to offer about the situation of interest.
1 Quantitative Variable
Histograms are great for showing the distribution of a single quantitative variable when the sample size is large. Dotplots are a good alternative for smaller sample sizes. A histogram gives a good feel for the mean and standard deviation of the data.
To make a histogram in R use the function:
hist(object)
object must be quantitative data. R refers to this as a “numeric vector.”
Type ?hist in your R Console to open the help file in R.
Example Code
hist(airquality$Temp)
Here hist is the R function used to create a histogram. The opening parenthesis begins the function call and must touch the last letter of the function name. airquality is a dataset; type View(airquality) in R to see it. The $ allows us to access any variable from the airquality dataset, and Temp is a quantitative variable (numeric vector) from that dataset. The closing parenthesis ends the call to hist.
hist(airquality$Temp, col="skyblue")
The first part of this code was explained in the first example. col= allows us to specify the color of the plot using a named color. The name of the color must be placed in quotation marks. Type colors() in R to see the color options. Functions always end with a closing parenthesis.
hist(airquality$Temp, col="skyblue", xlab="Temperature", main="La Guardia Airport Daily Mean Temperatures")
The first part of this code was explained in the first example. A comma separates each optional argument given to the function; the space after the comma is not required, it just looks nice. col= specifies the color of the plot using a named color, which must be placed in quotation marks (type colors() in R to see the options). xlab= stands for “x label” and specifies the text to print under the x-axis. main= specifies the main title placed above the plot. The desired text must always be placed in quotation marks. Functions must always end with a closing parenthesis.
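Since a histogram is meant to give a feel for the mean and standard deviation, it can help to draw those values on top of it. The following is a minimal sketch using the same airquality data; the variable names temps, temp_mean, and temp_sd are chosen here for illustration only.

```r
# Temperature values from the built-in airquality dataset
temps <- airquality$Temp

# Numerical summaries that the histogram gives a visual feel for
temp_mean <- mean(temps)
temp_sd <- sd(temps)

hist(temps, col = "skyblue", xlab = "Temperature",
     main = "La Guardia Airport Daily Mean Temperatures")

# Mark the mean (solid line) and one standard deviation either side (dashed)
abline(v = temp_mean, lwd = 2)
abline(v = temp_mean + c(-1, 1) * temp_sd, lty = 2)
```

The solid line should sit near the center of the bars, and the dashed lines give a rough sense of how far a typical value lies from the mean.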
To make a histogram in R using the ggplot approach, first ensure
library(ggplot2)
is loaded. Then,
ggplot(data, aes(x=column)) +
geom_histogram()
data is the name of your dataset.
column is a column of data from your dataset that is quantitative.
aes(x= ) is how you tell the ggplot to make the x-axis become your column of data.
geom_histogram() causes the ggplot to become a histogram.
Example Code
ggplot(airquality, aes(x=Temp)) +
  geom_histogram()
ggplot is an R function used to create a framework for a graphic that will have elements added to it with the + sign. The opening parenthesis must touch the last letter of the function name. airquality is a dataset; type View(airquality) in R to see it. The comma separates the arguments given to the function; the space after the comma is not required, it just looks nice. The aes, or “aesthetics,” function tells the ggplot how it should appear, including what the x-axis or y-axis should become. Here x=Temp declares that Temp will become the x-axis of the graphic. Each function ends with its own closing parenthesis. The addition symbol + adds further elements to the ggplot, and geom_histogram() causes the ggplot to become a histogram. There are many other “geom_” functions that could be used.
ggplot(airquality, aes(x=Temp)) +
  geom_histogram(binwidth=5, fill="skyblue", color="black")
ggplot(airquality, aes(x=Temp)) creates the framework for the graphic, with Temp from the airquality dataset on the x-axis, and the + sign adds further elements to it. geom_histogram() causes the ggplot to become a histogram. Inside it, binwidth= controls the width of the bars in the histogram, fill= controls the color of the insides of each bar, and color= controls the color of the edges of each bar. Color names must be placed in quotation marks.
ggplot(airquality, aes(x=Temp)) +
  geom_histogram(binwidth=5, fill="skyblue", color="black") +
  labs(title="La Guardia Airport Daily Mean Temperature", x="Temperature", y="Number of Days")
ggplot(airquality, aes(x=Temp)) creates the framework for the graphic, and each + sign adds a further element to it. geom_histogram() causes the ggplot to become a histogram, with binwidth= controlling the width of the bars, fill= the color of the insides of each bar, and color= the color of the edges of each bar. The labs function adds labels to the plot: title= controls the main title at the top of the graphic, x= controls the x-label, and y= controls the y-label.
Histograms group data that are close to each other into “bins” (the vertical bars in the plot). The height of a bin is determined by the number of data points that are contained within the bin. For example, if we group together all the sections of the book of scripture known as the Doctrine and Covenants that occurred in a given year (Jan. 1st - Dec. 31st) then we get the following counts.
Year | Number of Sections |
---|---|
1823 | 1 |
1824 | 0 |
1825 | 0 |
1826 | 0 |
1827 | 0 |
1828 | 1 |
1829 | 16 |
1830 | 19 |
1831 | 37 |
1832 | 16 |
1833 | 12 |
1834 | 5 |
1835 | 3 |
1836 | 4 |
1837 | 1 |
1838 | 8 |
1839 | 3 |
1840 | 0 |
1841 | 3 |
1842 | 2 |
1843 | 4 |
1844 | 1 |
1845 | 0 |
1846 | 0 |
1847 | 1 |
*Note that Section 138 occurred in 1918 and is removed from this example.
In this example, each “bin” spans 365 days (Jan. 1 - Dec. 31 of each year). Since “dates” can be used as quantitative data, it makes sense to make a histogram of these data. (Remember, histograms are only for quantitative data.)
Notice that the left edge of each bin is on the year the data corresponds with, and the right edge lands on the following year. For example, the first bin has its left edge on 1823 and its right edge on 1824. Since there was one revelation in 1823, this bin has a height of 1. The bin that has 1831 on the left and 1832 on the right has a height of 37, showing that 37 revelations occurred in 1831. It is powerful to notice the number of revelations occurring around 1830, the year the Church of Jesus Christ of Latter-day Saints was organized.
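The binning described above can be reproduced in R. The sketch below copies the counts from the table, expands them into one year per section, and asks hist() for one bin per year. The variable names (years, sections, section_years, h) and the plot title are chosen here for illustration.

```r
# Counts copied from the table above (Section 138, from 1918, is excluded)
years <- 1823:1847
sections <- c(1, 0, 0, 0, 0, 1, 16, 19, 37, 16, 12, 5, 3,
              4, 1, 8, 3, 0, 3, 2, 4, 1, 0, 0, 1)

# Expand the table into one value per section: the year it belongs to
section_years <- rep(years, times = sections)

# One bin per year: left edge on the year, right edge on the following year.
# right = FALSE places a value equal to a left edge inside that bin.
h <- hist(section_years, breaks = 1823:1848, right = FALSE,
          col = "skyblue", xlab = "Year",
          main = "Doctrine and Covenants Sections by Year")

# The bin heights reproduce the "Number of Sections" column of the table
h$counts
```

Setting breaks= explicitly is a design choice made here so that the bin edges land exactly on the years; left to its defaults, hist() would pick its own bin edges.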
1 Quantitative Variable | 2+ Groups
Boxplots are a graphical depiction of the five-number summary. They are great for comparing the distributions of data across several groups or categories, and they provide a quick visual understanding of the location of the median as well as the range of the data. They can also be useful in showing outliers. The sample size should be larger than five, or computing the five-number summary is not very meaningful. Side-by-side dotplots are a good alternative for smaller sample sizes.
To make a boxplot in R use the function:
boxplot(object)
To make side-by-side boxplots:
boxplot(object ~ group, data=NameOfYourData, ...)
object must be quantitative data. R refers to this as a “numeric vector.”
group must be qualitative data. R refers to this as either a “character vector” or a “factor.” However, a “numeric vector” can also act as a qualitative variable.
NameOfYourData is the name of the dataset containing object and group.
... implies there are many other options that can be given to the boxplot() function. Type ?boxplot in your R Console for more details.
Example Code
boxplot(airquality$Temp)
boxplot is the R function used to create boxplots. The opening parenthesis begins the function call and must touch the last letter of the function name. airquality is a dataset; type View(airquality) in R to see it. The $ allows us to access any variable from the airquality dataset, and Temp is a quantitative variable (numeric vector) from that dataset. The closing parenthesis ends the function call.
boxplot(Temp ~ Month, data=airquality)
Temp is a quantitative variable (numeric vector) from the airquality dataset. The ~ tells R that you want one boxplot of the quantitative variable (Temp) for each group found in the qualitative variable (Month). Month is a qualitative variable from the airquality dataset, in this case a “numeric vector” defining months by 5, 6, 7, 8, and 9. The comma is required to start specifying additional arguments for the boxplot() function. data= tells R that the Temp and Month variables are located in the airquality dataset. Without this, R will not know where to find Temp and Month and the command will give an error. Functions always end with a closing parenthesis.
boxplot(Temp ~ Month, data=airquality, names=c("May","June","July","Aug","Sep"))
Temp ~ Month asks for one boxplot of the quantitative variable Temp for each group in the qualitative variable Month, and data=airquality tells R that both variables are located in the airquality dataset. A comma is required before each additional argument. names= tells R what labels to place on the x-axis below each boxplot. Functions always end with a closing parenthesis.
boxplot(Temp ~ Month, data=airquality, xlab="Month of the Year", ylab="Temperature", main="La Guardia Airport Daily Temperatures", col="wheat")
The first part of this code was explained in the previous example. A comma separates each additional argument given to a function. xlab= stands for “x label” and specifies the text to print under the x-axis. ylab= stands for “y label” and specifies the text to print next to the y-axis. main= stands for the “main label” of the plot, which is placed at the top center of the plot. The desired text must always be contained in quotes. col= stands for the “color” of the plot. The color name "wheat" is an available color in R; type colors() in the R Console to see more options. The color name must always be placed in quotes. Functions always end with a closing parenthesis.
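The examples above rely on a numeric vector (Month) acting as a qualitative variable. One alternative, sketched below, is to convert the month codes into a labelled factor first, so the group labels travel with the data instead of being supplied through names=. The copy aq and the column MonthName are hypothetical names introduced only for this illustration.

```r
# Work on a copy so the built-in dataset is left unchanged
aq <- airquality

# Convert the numeric month codes (5 to 9) into a labelled factor
# (MonthName is a column name made up for this example)
aq$MonthName <- factor(aq$Month, levels = 5:9,
                       labels = c("May", "June", "July", "Aug", "Sep"))

# One boxplot per level of the factor; the axis labels come from the factor
boxplot(Temp ~ MonthName, data = aq,
        xlab = "Month of the Year", ylab = "Temperature",
        main = "La Guardia Airport Daily Temperatures", col = "wheat")

# Number of daily observations in each group
table(aq$MonthName)
```

Checking the group sizes with table() is a quick way to confirm that each boxplot is based on enough observations for the five-number summary to be meaningful.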
To make a boxplot in R using the ggplot approach, first ensure
library(ggplot2)
is loaded. Then,
ggplot(data, aes(x=groupsColumn, y=dataColumn)) +
geom_boxplot()
data is the name of your dataset.
groupsColumn is a column of data from your dataset that is qualitative and represents the groups that should each have a boxplot.
dataColumn is a column of data from your dataset that is quantitative.
aes(x= , y= ) is how you tell the ggplot to make the x-axis have the values in your groupsColumn and the y-axis become your dataColumn. Note that if groupsColumn is not a factor, use factor(groupsColumn) instead.
geom_boxplot() causes the ggplot to become a boxplot.
Example Code
ggplot(airquality, aes(y=Temp)) +
  geom_boxplot()
ggplot is an R function used to create a framework for a graphic that will have elements added to it with the + sign. The opening parenthesis must touch the last letter of the function name. airquality is a dataset; type View(airquality) in R to see it. The comma separates the arguments given to the function; the space after the comma is not required, it just looks nice. The aes, or “aesthetics,” function tells the ggplot how it should appear, including what the y-axis should become. Here y=Temp declares that Temp will become the y-axis of the graphic. Each function ends with its own closing parenthesis. The addition symbol + adds further elements to the ggplot, and geom_boxplot() causes the ggplot to become a boxplot. There are many other “geom_” functions that could be used.
ggplot(airquality, aes(x=factor(Month), y=Temp)) +
  geom_boxplot(fill="skyblue", color="black")
ggplot(airquality, aes( )) creates the framework for the graphic using the airquality dataset. x=factor(Month) declares which variable will become the x-axis of the graphic. Since Month is “numeric” we must use factor(Month) instead of just Month. y=Temp declares which variable will become the y-axis. The + sign adds further elements to the ggplot, and geom_boxplot() causes the ggplot to become a boxplot. Inside it, fill= controls the color of the insides of each box in the boxplot and color= controls the color of the edges of each box.
ggplot(airquality, aes(x=factor(Month), y=Temp)) +
  geom_boxplot(fill="skyblue", color="black") +
  labs(title="La Guardia Airport Daily Mean Temperature", x="Month of the Year", y="Daily Mean Temperature")
ggplot(airquality, aes( )) creates the framework for the graphic using the airquality dataset. x=factor(Month) declares which variable will become the x-axis; since Month is “numeric” we must use factor(Month) instead of just Month. y=Temp declares which variable will become the y-axis. Each + sign adds a further element to the ggplot. geom_boxplot() causes the ggplot to become a boxplot, with fill= controlling the color of the insides of each box and color= the color of the edges of each box. The labs function adds labels to the plot: title= controls the main title at the top of the graphic, x= controls the x-label, and y= controls the y-label.
Understanding how a boxplot is created is the best way to understand what the boxplot shows.
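As a rough illustration of how a boxplot is built, the sketch below computes the five-number summary for the airquality temperatures and then marks those values on the plot. It uses the base R functions fivenum() and boxplot.stats(); the variable names (temps, five, stats, box_length, lower_limit, upper_limit) are chosen here for illustration.

```r
temps <- airquality$Temp

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(temps)

# boxplot.stats() returns what boxplot() draws: the whisker ends and hinges
# in $stats, and any points lying beyond the whiskers in $out
stats <- boxplot.stats(temps)

# By default the whiskers reach the most extreme data points that are no
# more than 1.5 times the length of the box away from the box
box_length <- five[4] - five[2]
lower_limit <- five[2] - 1.5 * box_length
upper_limit <- five[4] + 1.5 * box_length

boxplot(temps, horizontal = TRUE, xlab = "Temperature")

# Dotted lines mark the five values the boxplot is built from
abline(v = stats$stats, lty = 3)
```

Any observation outside lower_limit and upper_limit would be drawn as a separate point rather than included in a whisker, which is how a boxplot flags possible outliers.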
1 Quantitative Variable | 2+ Groups
Dot plots depict the actual values of each data point. They are best for small sample sizes or for datasets where there are lots of repeated values. Histograms or boxplots are better alternatives for large sample sizes when there are few repeated values. Dot plots are also great for comparing the distribution of data across several groups or categories.
To make a dot plot in Base R use the code:
stripchart(object)
For side-by-side dotplots:
stripchart(object ~ group, data=NameOfYourData)
object must be a quantitative (or ordinal) variable, what R refers to as a “numeric vector.”
group is a qualitative variable, which in R can be either a “character vector” or a “factor.”
NameOfYourData is the name of the dataset containing object and group.
Example Code
stripchart(airquality$Temp, method="stack")
stripchart is the R function used to create a dot plot. The opening parenthesis begins the function call and must touch the last letter of the function name. airquality is a dataset; type View(airquality) in R to see it. The $ allows us to access any variable from the airquality dataset, and Temp is a quantitative variable (numeric vector) from that dataset. The comma is required to start specifying additional arguments for the function. method= allows us to choose from the options "overplot", "jitter", and "stack". The "stack" option stacks multiple points that occur at the same location on top of each other. You can try the code yourself to see what "overplot" and "jitter" do. The closing parenthesis ends the function call.
stripchart(Temp ~ Month, data=airquality, method="stack")
Temp is a quantitative variable (numeric vector) from the airquality dataset. The ~ tells R that you want a dot plot of the quantitative variable (Temp) for each group found in the qualitative variable (Month). Month is a qualitative variable from the airquality dataset, in this case a “numeric vector” defining months by 5, 6, 7, 8, and 9. A comma is required before each additional argument. data= tells R that the Temp and Month variables are located in the airquality dataset. Without this, R will not know where to find Temp and Month and the command will give an error. method="stack" stacks multiple points that occur at the same location on top of each other. Functions always end with a closing parenthesis.
stripchart(Temp ~ Month, data=airquality, method="stack", ylab="Month of the Year", xlab="Temperature", main="La Guardia Airport Daily Temperatures", col="sienna", pch=16)
The first three parts of this code were explained in the example directly above this one. A comma separates each additional argument given to a function. ylab= stands for “y label” and specifies the text to print next to the y-axis. xlab= stands for “x label” and specifies the text to print below the x-axis. main= stands for the “main label” of the plot, which is placed at the top center of the plot. The desired text must always be contained in quotes. col= stands for the “color” of the plot. The color name "sienna" is an available color in R; type colors() in the R Console to see more options. The color name must always be placed in quotes. pch= stands for the “plotting character” of the plot. This plot uses the filled circle (option 16) as the plotting character. The options are 0, 1, 2, …, 25. Type ?pch in the R Console, and scroll about halfway down the help file to see what each option does. Functions always end with a closing parenthesis.
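To see what the three method= options do without running each one separately, the same data can be drawn once per method in a single window. This is a small sketch using base R only; temps, n_repeats, m, and old_par are names chosen here for illustration.

```r
temps <- airquality$Temp

# Repeated values are what make the choice of method matter
n_repeats <- sum(duplicated(temps))

# Draw the same data three times, one panel per method
old_par <- par(mfrow = c(3, 1))
for (m in c("overplot", "jitter", "stack")) {
  stripchart(temps, method = m, pch = 16, main = m, xlab = "Temperature")
}
par(old_par)  # restore the previous plotting layout
```

With "overplot" the repeated temperatures are hidden behind one another, "jitter" spreads them out randomly, and "stack" piles them up so the repeats can be counted.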
To make a dot plot in R using the ggplot approach, first ensure:
library(ggplot2)
is loaded. Then,
ggplot(data, aes(x=groupsColumn, y=dataColumn)) +
geom_dotplot()
data is the name of your dataset.
groupsColumn is a column of data from your dataset that is qualitative and represents the groups that should each have a dot plot.
dataColumn is a column of data from your dataset that is quantitative.
aes(x= , y= ) is how you tell the ggplot to make the x-axis have the values in your groupsColumn and the y-axis become your dataColumn. Note that if groupsColumn is not a factor, use factor(groupsColumn) instead.
geom_dotplot() causes the ggplot to become a dot plot.
Example Code
ggplot(airquality, aes(x=Temp)) +
  geom_dotplot()
ggplot is an R function used to create a framework for a graphic that will have elements added to it with the + sign. The opening parenthesis must touch the last letter of the function name. airquality is a dataset; type View(airquality) in R to see it. The comma separates the arguments given to the function; the space after the comma is not required, it just looks nice. The aes, or “aesthetics,” function tells the ggplot how it should appear, including what the x-axis should become. Here x=Temp declares that Temp will become the x-axis of the graphic. Each function ends with its own closing parenthesis. The addition symbol + adds further elements to the ggplot, and geom_dotplot() causes the ggplot to become a dot plot. There are many other “geom_” functions that could be used.
ggplot(airquality, aes(x=factor(Month), y=Temp)) +
  geom_dotplot(binaxis = "y", stackdir = "up", position = "dodge", dotsize = 0.75, binwidth = 0.5)
ggplot(airquality, aes( )) creates the framework for the graphic using the airquality dataset. x=factor(Month) declares which variable will become the x-axis; factor(Month) changes Month, which is numeric, into categories. y=Temp declares which variable will become the y-axis. The + sign adds further elements to the ggplot, and geom_dotplot() causes the ggplot to become a dot plot. Inside it, binaxis = "y" tells the function that the y=Temp statement should be used as the quantitative data. stackdir = "up" causes the dots to be stacked on top of each other. position = "dodge" causes the dots to not overlap, i.e., to “dodge each other.” dotsize = 0.75 controls the size of the dots; you can make them larger with numbers greater than 1 and smaller with numbers less than 1. binwidth = 0.5 controls how the dots are grouped, similar to the bins in a histogram.
ggplot(airquality, aes(x=factor(Month), y=Temp)) +
  coord_flip() +
  geom_dotplot(binaxis = "y", stackdir = "up", position = "dodge", dotsize = 0.75, binwidth = 0.5) +
  labs(title="La Guardia Airport Daily Mean Temperature", x="Month of the Year", y="Daily Mean Temperature")
ggplot(airquality, aes( )) creates the framework for the graphic, with factor(Month) turning the numeric Month into categories for the x-axis and Temp on the y-axis. Each + sign adds a further element to the ggplot. coord_flip() causes the ggplot to reverse the axes when drawing the plot. However, all arguments must be given as if the plot were to be drawn without coord_flip(); coord_flip() is then applied. geom_dotplot() causes the ggplot to become a dot plot: binaxis = "y" tells the function that the y=Temp statement should be used as the quantitative data, stackdir = "up" stacks the dots on top of each other, position = "dodge" keeps the dots from overlapping, dotsize = 0.75 controls the size of the dots, and binwidth = 0.5 controls how the dots are grouped, similar to the bins in a histogram. The labs function adds labels to the plot: title= controls the main title at the top of the graphic, x= controls the x-label, and y= controls the y-label.
2 Quantitative Variables
Scatterplots depict the actual values of the data points, which are \((x,y)\) pairs. They work well for small or large sample sizes and show the correlation between the two variables well. They should be used in linear regression contexts whenever possible.
To make a scatterplot in R use the code:
plot(y ~ x, data=NameOfYourData)
y is the quantitative response variable, i.e., “numeric vector.”
x is the quantitative explanatory variable, i.e., “numeric vector.”
NameOfYourData is the name of the dataset containing y and x.
Note: plot(object) where object is a “numeric vector” will create a time series plot, which is sometimes useful.
Example Code
plot(airquality$Temp, type="l")
plot is the R function used to create a scatterplot, or in this case a time series plot because only one quantitative variable is being supplied to the function. The opening parenthesis begins the function call and must touch the last letter of the function name. airquality is a dataset; type View(airquality) in R to see it. The $ allows us to access any variable from the airquality dataset, and Temp is a quantitative variable (numeric vector) from that dataset. The comma is required to start specifying additional arguments for the function. type= allows us to choose from the options "p" for points, "l" for lines, and "b" for both. There are also other options that could be chosen; type ?plot in the R Console to learn about them. The closing parenthesis ends the function call.
plot(Temp ~ Wind, data=airquality, pch=8)
plot is the R function used to create a scatterplot. Temp is a quantitative variable (numeric vector) from the airquality dataset that is being used as the response variable (y-axis) for this plot. The ~ tells R that you want a scatterplot with the quantitative variable Temp on the y-axis and the quantitative variable Wind on the x-axis. Wind is a quantitative variable (numeric vector) from the airquality dataset that is being used as the explanatory variable (x-axis). A comma is required before each additional argument. data= tells R that the Temp and Wind variables are located in the airquality dataset. Without this, R will not know where to find Temp and Wind and the command will give an error. pch= stands for the “plotting character” of the plot. This plot uses the star shape (option 8) as the plotting character. The options are 0, 1, 2, …, 25. Type ?pch in the R Console, and scroll about halfway down the help file to see what each option does. Functions always end with a closing parenthesis.
plot(Temp ~ Wind This part of the code was explained already in the example code directly above this one. ,
The “,” is required to start specifying additional commands for the function. data=airquality data= is used to tell R that the “Temp” and “Wind” variables are located in the airquality dataset. Without this, R will not know where to find “Temp” and “Wind” and the command will give an error. ,
The “,” is required to start specifying additional commands for the function. xlab=“Daily Wind Speed (mph)” xlab= stands for “x label.” Use it to specify the text to print on the plot below the x-axis. The desired text must always be contained in quotes. , The comma is used to separate each additional command to a function. ylab=“Temperature” ylab= stands for “y label.” Use it to specify the text to print on the plot next to the y-axis. The desired text must always be contained in quotes. , The comma is used to separate each additional command to a function. main=“La Guardia Airport (May - Sep)” main= stands for the “main label” of the plot, which is placed at the top center of the plot. The desired text must always be contained in quotes. , The comma is used to separate each additional command to a function. col=“ivory3” col= stands for the “color” of the plot. The color name “ivory3” is an available color in R. Type colors() in the R Console to see more options. The color name must always be placed in quotes. , The comma is used to separate each additional command to a function. pch=18 pch= stands for the “plotting character” of the plot. This plot uses the filled diamond (option 18) as the plotting character. The options are 0, 1, 2, …, 25. Type ?pch in the R Console, and scroll down the help file half way to see what each option does. ) Functions always end with a closing parenthesis.
Press Enter to run the code. … Click to View Output.
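For reference, the last example above reads as follows once the pieces are put together. The second call is an equivalent way to draw the same picture without the formula; it is included only as a sketch of how y ~ x with data= maps onto plot(x, y), and the names x and y are made up for illustration.

```r
# Formula interface: response ~ explanatory, with data= telling R where to look.
plot(Temp ~ Wind, data = airquality,
     xlab = "Daily Wind Speed (mph)",
     ylab = "Temperature",
     main = "La Guardia Airport (May - Sep)",
     col = "ivory3",
     pch = 18)

# Equivalent call without a formula: x (explanatory) comes first, then y (response).
x <- airquality$Wind
y <- airquality$Temp
plot(x, y,
     xlab = "Daily Wind Speed (mph)",
     ylab = "Temperature",
     main = "La Guardia Airport (May - Sep)",
     col = "ivory3",
     pch = 18)
```

Note the order swap: the formula lists the response first (Temp ~ Wind), while plot(x, y) lists the explanatory variable first.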
pch Options
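One way to see the pch options without opening the help file is to draw them. The sketch below plots each plotting character from 0 to 25 with its number underneath. The grid layout and the names pch_values, grid_x, and grid_y are choices made for this illustration only.

```r
# Draw each plotting character 0 through 25 once, labeled with its number.
pch_values <- 0:25
grid_x <- pch_values %% 13          # column position: 0..12
grid_y <- 2 - pch_values %/% 13     # row position: 2 for codes 0-12, 1 for codes 13-25

plot(grid_x, grid_y,
     pch = pch_values,
     cex = 2,                        # enlarge the symbols
     col = "black",
     bg = "skyblue",                 # fill color, only used by symbols 21-25
     xlim = c(-0.5, 12.5), ylim = c(0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "pch options 0 to 25")
text(grid_x, grid_y - 0.3, labels = pch_values)  # print the code under each symbol
```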
To make a scatterplot in R using the ggplot approach, first ensure:
library(ggplot2)
is loaded. Then,
ggplot(data, aes(x=dataColumn1, y=dataColumn2)) +
geom_point()
data
is the name of your dataset.
dataColumn1
is a column of data from your dataset that is quantitative and will be used as the explanatory variable.
dataColumn2
is a column of data from your dataset that is quantitative and will be used as the response variable.
aes(x= , y=)
is how you tell ggplot to put the values of dataColumn1 on the x-axis and the values of dataColumn2 on the y-axis.
geom_point()
causes the ggplot to become a scatterplot.
Example Code
ggplot An R function “ggplot” used to create a framework for a graphic that will have elements added to it with the +
sign. ( Parenthesis to begin the function. Must touch the last letter of the function. airquality “airquality” is a dataset. Type “View(airquality)” in R to see it. , The comma allows us to specify optional commands to the function. The space after the comma is not required. It just looks nice. aes( The aes
or “aesthetics” function allows you to tell the ggplot how it should appear. This includes things like what the x-axis or y-axis should become. x=Wind, “x=” declares which variable will become the x-axis of the graphic, the explanatory variable. y=Temp “y=” declares which variable will become the y-axis of the graphic. )
Closing parenthesis for the aes function. )
Closing parenthesis for the ggplot function. + The addition symbol +
is used to add further elements to the ggplot.
geom_point( The “geom_point()” function causes the ggplot to become a scatterplot. There are many other “geom_” functions that could be used. )
Closing parenthesis for the geom_point function.
Press Enter to run the code. … Click to View Output.
ggplot An R function “ggplot” used to create a framework for a graphic that will have elements added to it with the +
sign. ( Parenthesis to begin the function. Must touch the last letter of the function. airquality “airquality” is a dataset. Type “View(airquality)” in R to see it. , The comma allows us to specify optional commands to the function. The space after the comma is not required. It just looks nice. aes( The aes
or “aesthetics” function allows you to tell the ggplot how it should appear. This includes things like what the x-axis or y-axis should become. x=Wind, “x=” declares which variable will become the x-axis of the graphic, the explanatory variable. y=Temp “y=” declares which variable will become the y-axis of the graphic. )
Closing parenthesis for the aes function. )
Closing parenthesis for the ggplot function. + The addition symbol +
is used to add further elements to the ggplot.
geom_point( The “geom_point()” function causes the ggplot to become a scatterplot. There are many other “geom_” functions that could be used. color = “ivory3”, Controls the color of the dots. pch = 18 Controls the type of plotting character to be used in the plot. )
Closing parenthesis for the geom_point function. + The addition symbol +
is used to add further elements to the ggplot.
labs( The “labs” function is used to add labels to the plot, like a main title, x-label and y-label. title=“La Guardia Airport (May - Sep)”, The “title=” command allows you to control the main title at the top of the graphic. x=“Daily Average Wind Speed (mph)”, The “x=” command allows you to control the x-label of the graphic. y=“Daily Mean Temperature” The “y=” command allows you to control the y-label of the graphic. )
Closing parenthesis for the labs function. + The addition symbol +
is used to add further elements to the ggplot.
theme_bw()
Changes the “theme” or look of the plot to “black” and “white”.
Press Enter to run the code. … Click to View Output.
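Put together, the second ggplot example above could be written as the sketch below. Storing the plot in an object named p is an addition for illustration; the original example draws the plot directly. This assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Build the plot layer by layer; each + adds one element.
p <- ggplot(airquality, aes(x = Wind, y = Temp)) +
  geom_point(color = "ivory3", pch = 18) +
  labs(title = "La Guardia Airport (May - Sep)",
       x = "Daily Average Wind Speed (mph)",
       y = "Daily Mean Temperature") +
  theme_bw()

p  # typing the object's name draws the plot
```

Saving the plot as an object makes it easy to add more elements later, for example p + another layer, without retyping the whole call.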
To make a scatterplot in R using the plotly approach, first ensure:
library(plotly)
is loaded. Then,
plot_ly(data, x= ~dataColumn1, y= ~dataColumn2)
data
is the name of your dataset.
dataColumn1
is a column of data from your dataset that is quantitative and will be used as the explanatory variable.
dataColumn2
is a column of data from your dataset that is quantitative and will be used as the response variable.
Example Code
library(mosaicData)  # KidsFeet comes from the mosaicData package (also loaded by library(mosaic))
plot_ly(KidsFeet,
x= ~length,
y= ~width,
color= ~sex,
size= ~birthmonth,
text= ~paste("Name:", name, "\n", "Birth-Month:", birthmonth),
colors=c("skyblue","hotpink")) %>%
layout(title="KidsFeet dataset",
xaxis=list(title="Length of the longer foot in cm"),
yaxis=list(title="Width of the longer foot in cm"))
1 (or 2) Qualitative Variable(s)
Depicts the number of occurrences for each category, or level, of the qualitative variable. Similar to a histogram, but there is no natural way to order the bars, hence the white space between each bar. It is called a Pareto chart if the bars are ordered from tallest to shortest. Clustered and stacked bar charts are often used to display information for two qualitative variables simultaneously.
To make a bar chart in R use the code:
barplot(heights)
heights
must be a “numeric vector” that contains the heights for each bar that will be drawn in the plot.
Note: both the c() and table() functions can be used to specify the heights. The example codes below demonstrate.
Example Code
Using the c() function.
barplot barplot is an R function used to create a bar chart. ( Parenthesis to begin the barplot function. Must touch the last letter of the function. c c is an R function used to concatenate a list of values together into a “vector.” It is being used here to specify the heights of the 4 bars in the bar plot. ( Parenthesis to begin the c function. Must touch the last letter of the function. 10,5,28,3 This list of numbers will be joined together into a single “vector.” There is no limit on the number of entries that can be put into such a list. )
Closing parenthesis for the c() function. ,
The “,” is required to start specifying additional commands for the barplot function. col=“gray24” col= stands for the “color” of the plot. The color name “gray24” is an available color in R. Type colors() in the R Console to see more options. The color name must always be placed in quotes. )
Closing parenthesis for the barplot function.
Press Enter to run the code. … Click to View Output.
barplot barplot is an R function used to create a bar chart. ( Parenthesis to begin the barplot function. Must touch the last letter of the function. c c is an R function used to concatenate a list of values together into a “vector.” It is being used here to specify the heights of the 4 bars in the bar plot. ( Parenthesis to begin the c function. Must touch the last letter of the function. Pigs=10,Cats=5,Dogs=28,Roosters=3 This named list of numbers will be joined together into a single “vector.” There is no limit on the number of entries that can be put into such a list. Notice how the names show up as the labels for each bin in the bar chart. )
Closing parenthesis for the c() function. ,
The “,” is required to start specifying additional commands for the barplot function. col=“gray44” col= stands for the “color” of the plot. The color name “gray44” is an available color in R. Type colors() in the R Console to see more options. The color name must always be placed in quotes. )
Closing parenthesis for the barplot function.
Press Enter to run the code. … Click to View Output.
barplot( barplot is an R function used to create a bar chart. rbind( rbind stands for “row bind” and is a function that joins together different c() vectors to make them become rows of a table. `Farm 1`=c(Pigs=10,Cats=5,Dogs=28,Roosters=3) Notice how this c() vector of named values is being named “Farm 1.” The tick marks ` ` are required to specify a name of a vector that has a space in it. If the name was just Farm1 (without a space) then the tick marks would not be needed. Since `Farm 1` is the first vector in the rbind() function, it will become the first row of the resulting table that rbind() will create. ,
The “,” is required to specify additional c() vectors for the rbind() function. `Farm 2`=c(Pigs=15,Cats=3,Dogs=8,Roosters=1) Notice how this c() vector of named values is being named “Farm 2.” It will become the second row of the table created by rbind(). )
Closing parenthesis for the rbind() function. ,
The “,” is required to specify additional commands for the barplot function. col=c(“gray84”,“gray44”) col= stands for the “color” of the plot. Here two colors: “gray84” and “gray44” are being passed to the col= option by using the c() function. Notice how these two colors are used in the resulting bar chart. ,
The “,” is required to specify additional commands for the barplot function. beside=TRUE beside= can be set to either TRUE or FALSE. When it is TRUE, the bars are clustered side-by-side. When it is set to FALSE, the bars are stacked on top of each other. Typically, beside=TRUE is preferred. ,
The “,” is required to specify additional commands for the barplot function. legend.text=TRUE legend.text=TRUE allows for the legend to be placed on the barplot. )
Closing parenthesis for the barplot function.
Press Enter to run the code. … Click to View Output.
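It can help to look at the table that rbind() builds before handing it to barplot(). The sketch below stores that table under the name heights (a name chosen here for illustration) and then draws the same clustered bar chart as the example above.

```r
# Each c() vector becomes one row; the names become the column labels.
heights <- rbind(`Farm 1` = c(Pigs = 10, Cats = 5, Dogs = 28, Roosters = 3),
                 `Farm 2` = c(Pigs = 15, Cats = 3, Dogs = 8,  Roosters = 1))

heights       # a table with one row per farm and one column per animal
dim(heights)  # number of rows, then number of columns

# barplot() draws one cluster per column and one bar per row within each cluster.
barplot(heights,
        col = c("gray84", "gray44"),
        beside = TRUE,
        legend.text = TRUE)
```

Because the rows are the farms, the legend shows “Farm 1” and “Farm 2,” and the animal names label the clusters along the x-axis.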
Using the table() function.
barplot barplot is an R function used to create a bar chart. ( Parenthesis to begin the function. Must touch the last letter of the function. table table is an R function used to tabulate how many times each value occurs in a given dataset. It is being used here to specify the heights of the bars in the bar chart. ( Parenthesis to begin the function. Must touch the last letter of the function. mtcars “mtcars” is a dataset. Type “View(mtcars)” in R to see it. $ The $ allows us to access any variable from the mtcars dataset. cyl “cyl” is a qualitative variable (in this case actually a numeric vector acting as a qualitative variable) from the “mtcars” dataset. It represents the number of cylinders the vehicle’s engine has. )
Closing parenthesis for the table() function. ,
The “,” is required to start specifying additional commands for the barplot function. col=“cornsilk” col= stands for the “color” of the plot. The color name “cornsilk” is an available color in R. Type colors() in the R Console to see more options. The color name must always be placed in quotes. )
Closing parenthesis for the barplot function.
Press Enter to run the code. … Click to View Output.
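To see what table() hands to barplot(), it can help to print the counts first. The sketch below does that for the cyl column and then sorts the counts from largest to smallest, which gives the Pareto ordering described at the start of this section. The sort() step and the names cyl_counts and pareto_counts are additions for illustration.

```r
# Count how many cars have 4, 6, and 8 cylinders.
cyl_counts <- table(mtcars$cyl)
cyl_counts

# Pareto chart: same bars, ordered from tallest to shortest.
pareto_counts <- sort(cyl_counts, decreasing = TRUE)
barplot(pareto_counts,
        col = "cornsilk",
        xlab = "Cylinders")
```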
barplot( barplot is an R function used to create a bar chart. table( table is an R function used to tabulate how many times each pair of values occurs in a given dataset. It is being used here to specify the heights of the bars in this clustered bar chart. mtcars$am “mtcars” is a dataset and the $ sign is being used to access the “am” variable from that dataset. Note that “am” is being used as a qualitative variable, but is actually a numeric vector acting as a qualitative variable. It denotes whether the vehicle is an automatic (0) or manual (1) transmission. ,
The “,” is required to specify additional variables for the table() function. mtcars$cyl “mtcars” is a dataset and the $ sign is being used to access the “cyl” variable from that dataset. The “cyl” variable gives the cylinders of the vehicle’s engine as either 4, 6, or 8. So even though it is numeric, it can be used as a qualitative variable. )
Closing parenthesis for the table() function. ,
The “,” is required to start specifying additional commands for the barplot function. beside=TRUE
beside= is an optional command to the barplot() function. When TRUE, the bars are placed next to each other. When FALSE, the bars are stacked on top of each other. ,
The “,” is required to specify additional commands for the barplot function. col=c(“firebrick”,“snow1”) col= stands for the “color” of the plot. The colors of “firebrick” and “snow1” are being passed to the col= option using the c() function. ,
The “,” is required to specify additional commands for the barplot function. legend.text=TRUE legend.text=TRUE allows for the legend to be placed on the barplot. ,
The “,” is required to specify additional commands for the barplot function. xlab=“Cylinders” xlab= stands for “x label.” Use it to specify the text to print on the plot below the x-axis. The desired text must always be contained in quotes. )
Closing parenthesis for the barplot function.
Press Enter to run the code. … Click to View Output.
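The clustered example above can also be written in two steps, which makes the two-way table visible before it is plotted. In this sketch the name counts and the row labels “automatic” and “manual” are additions for illustration; they replace the 0 and 1 codes so the legend explains itself.

```r
# Two-way table: rows are transmission type (am), columns are cylinders (cyl).
counts <- table(am = mtcars$am, cyl = mtcars$cyl)
counts

# Optional: swap the 0/1 codes for readable labels (0 = automatic, 1 = manual).
rownames(counts) <- c("automatic", "manual")

barplot(counts,
        beside = TRUE,
        col = c("firebrick", "snow1"),
        legend.text = TRUE,
        xlab = "Cylinders")
```

The first variable given to table() becomes the rows (the bars within each cluster) and the second becomes the columns (the clusters), so swapping the two variables changes which one appears on the x-axis.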
To make a bar chart in R using the ggplot approach, first ensure:
library(ggplot2)
is loaded. Then,
ggplot(data, aes(x=groupsColumn, y=countsColumn)) +
geom_bar(stat="identity")
data
is the name of your dataset.
groupsColumn
is a column of data from your dataset that is qualitative and represents the groups that should each have a bar in the barplot.
countsColumn
is a column of data from your dataset that contains the counts of how many times each group has been observed.
aes(x= , y=)
is how you tell ggplot to put the values of groupsColumn on the x-axis and the values of countsColumn on the y-axis. Note: if groupsColumn is not a factor, use factor(groupsColumn) instead.
geom_bar(stat="identity")
causes the ggplot to become a bar chart with bar heights taken from countsColumn. Without stat="identity", geom_bar() counts rows itself and gives an error when a y= column is supplied.
Example Code
Manually building the counts data.
FarmAnimals <- data.frame(animal = c(“pigs”,“cats”,“dogs”,“Roosters”), count = c(10,5,28,3)) This code creates a data set manually called FarmAnimals using the data.frame() function. Notice that there are two columns in this dataset: “animal” and “count”.
ggplot An R function “ggplot” used to create a framework for a graphic that will have elements added to it with the +
sign. ( Parenthesis to begin the function. Must touch the last letter of the function. FarmAnimals “FarmAnimals” is a dataset we just created. Type “View(FarmAnimals)” in R after running the above code to see it. , The comma allows us to specify optional commands to the function. The space after the comma is not required. It just looks nice. aes( The aes
or “aesthetics” function allows you to tell the ggplot how it should appear. This includes things like what the x-axis or y-axis should become. animal, Declares which variable will become the x-axis of the graphic, the explanatory variable. count, Declares which variable will become the y-axis of the graphic. )
Closing parenthesis for the aes function. )
Closing parenthesis for the ggplot function. + The addition symbol +
is used to add further elements to the ggplot.
geom_col( ) The “geom_col()” function is being used here instead of “geom_bar()” because this is a very simple bar chart for just one groups column whose counts are already stored in the “count” column.
Press Enter to run the code. … Click to View Output.
FarmAnimals <- data.frame(animal = c(“pigs”,“pigs”,“cats”,“cats”,“dogs”,“dogs”,“Roosters”,“Roosters”), count = c(6,4,2,3,18,10,2,1), farm = c(“farm1”,“farm2”,“farm1”,“farm2”,“farm1”,“farm2”,“farm1”,“farm2”)) This code creates a data set manually called FarmAnimals using the data.frame() function. Notice that there are three columns in this dataset: “animal”, “count”, and “farm”.
ggplot An R function “ggplot” used to create a framework for a graphic that will have elements added to it with the +
sign. ( Parenthesis to begin the function. Must touch the last letter of the function. FarmAnimals “FarmAnimals” is a dataset we just created. Type “View(FarmAnimals)” in R after running the above code to see it. , The comma allows us to specify optional commands to the function. The space after the comma is not required. It just looks nice. aes( The aes
or “aesthetics” function allows you to tell the ggplot how it should appear. This includes things like what the x-axis or y-axis should become. x = animal, Declares which variable will become the x-axis of the graphic, the explanatory variable. y = count, Declares which variable will become the y-axis of the graphic. fill = farm, Declares which variable will determine the fill color of the bars, so each farm gets its own color. )
Closing parenthesis for the aes function. )
Closing parenthesis for the ggplot function. + The addition symbol +
is used to add further elements to the ggplot.
geom_bar( The “geom_bar” function tells the ggplot() to become a bar chart. stat = “identity”,
Tells the ggplot to use the counts as listed in the counts column. position = “dodge”,
Causes the bars in the barchart to be side-by-side rather than stacked. color = “black”,
Controls the colors of the borders of the bars in the plot. )
Closing parenthesis for the geom_bar() function.
Press Enter to run the code. … Click to View Output.
ggplot An R function “ggplot” used to create a framework for a graphic that will have elements added to it with the +
sign. ( Parenthesis to begin the function. Must touch the last letter of the function. mtcars “mtcars” is a dataset in R. Type “View(mtcars)” in R to see it. , The comma allows us to specify optional commands to the function. The space after the comma is not required. It just looks nice. aes( The aes
or “aesthetics” function allows you to tell the ggplot how it should appear. This includes things like what the x-axis or y-axis should become. x = factor(cyl) Declares which variable will become the x-axis of the graphic. Use factor(columnName) when the column consists of numbers to turn it into groups. )
Closing parenthesis for the aes function. )
Closing parenthesis for the ggplot function. + The addition symbol +
is used to add further elements to the ggplot.
geom_bar( The “geom_bar” function tells the ggplot() to become a bar chart. fill = “cornsilk”,
Controls the colors of the insides of the bars in the plot. color = “black” Controls the colors of the borders of the bars in the plot. )
Closing parenthesis for the geom_bar() function.
Press Enter to run the code. … Click to View Output.
ggplot An R function “ggplot” used to create a framework for a graphic that will have elements added to it with the +
sign. ( Parenthesis to begin the function. Must touch the last letter of the function. mtcars “mtcars” is a dataset in R. Type “View(mtcars)” in R to see it. , The comma allows us to specify optional commands to the function. The space after the comma is not required. It just looks nice. aes( The aes
or “aesthetics” function allows you to tell the ggplot how it should appear. This includes things like what the x-axis or y-axis should become. x = factor(cyl), Declares which variable will become the x-axis of the graphic. Use factor(columnName) when the column consists of numbers to turn it into groups. fill = factor(am), Declares which variable will determine the fill color of the bars. Use factor(columnName) when the column consists of numbers to turn it into groups. )
Closing parenthesis for the aes function. )
Closing parenthesis for the ggplot function. + The addition symbol +
is used to add further elements to the ggplot.
geom_bar( The “geom_bar” function tells the ggplot() to become a bar chart. position = “dodge”,
Causes the bars to be side-by-side instead of stacked. color = “black” Controls the colors of the borders of the bars in the plot. )
Closing parenthesis for the geom_bar() function. + The addition symbol +
is used to add further elements to the ggplot.
labs(x=“Cylinders”) The “labs” function is being used to add a title to the x-axis only. title=“main title” and y=“y title” could also be used.
Press Enter to run the code. … Click to View Output.
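The ggplot examples above start from two different kinds of data: raw data, where geom_bar() does the counting, and pre-counted data, where the bar heights are supplied directly. A minimal sketch of both routes for the same variable follows. The names cyl_counts, p_raw, and p_counted are made up for illustration, and the sketch assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Route 1: raw data. geom_bar() counts the rows in each group for you.
p_raw <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "cornsilk", color = "black") +
  labs(x = "Cylinders")

# Route 2: counts computed first. table() counts the groups, and
# as.data.frame() turns the result into columns named cyl and Freq.
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))

p_counted <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_col(fill = "cornsilk", color = "black") +  # geom_col() uses y as the bar height
  labs(x = "Cylinders", y = "count")

p_raw      # draw the first plot
p_counted  # draw the second plot
```

Both plots should show the same three bars. Which route to use depends on whether your dataset has one row per observation (Route 1) or one row per group with a counts column (Route 2).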
Not yet available.
Creativity Required
Sometimes no standard plot sufficiently describes the data. In these cases, the only guideline is the one stated originally, “the graphical depiction of data should communicate the truth the data has to offer about the situation of interest.”
You should add links to examples you find of interesting plots made in R.
Here is the R Code for the graphic to the left: