R4DS Chapter 2 Exercises

Published

October 15, 2022

2.2.4 Exercises

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0          ✔ purrr   0.3.5     
✔ tibble  3.1.8          ✔ dplyr   1.0.10    
✔ tidyr   1.2.1.9001     ✔ stringr 1.5.0     
✔ readr   2.1.3          ✔ forcats 0.5.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
  1. Run ggplot(data = mpg). What do you see?
ggplot(data = mpg)

This creates the background of the plot, but because no layers were added with a geom function, nothing is drawn.

The result is an empty plot.
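You can confirm that the plot is empty because it has no layers by inspecting the plot object instead of printing it. This is a minimal sketch. It relies on a ggplot object storing its layers in a list element called layers, which is an internal detail that could change between ggplot2 versions.

```r
library(ggplot2)

# Build the plot object without drawing it
p <- ggplot(data = mpg)

# No geom_*() call was added, so the list of layers is empty
length(p$layers)
#> [1] 0
```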

  2. How many rows are in mpg? How many columns?
nrow(mpg)
[1] 234

There are 234 rows.

We can also use glimpse(), which reports both the number of rows (234) and the number of columns (11):

glimpse(mpg)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
$ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
$ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
$ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
$ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
$ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
$ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
$ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
$ class        <chr> "compact", "compact", "compact", "compact", "compact", "c…
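The question also asks for the number of columns. Here is a minimal sketch using base R's ncol() and dim(), which work on tibbles the same way they work on data frames:

```r
library(ggplot2)

# Number of columns (variables)
ncol(mpg)
#> [1] 11

# Rows and columns together
dim(mpg)
#> [1] 234  11
```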
  3. What does the drv variable describe? Read the help for ?mpg to find out.
?mpg

drv describes “the type of drive train, where f = front-wheel drive, r = rear wheel drive, 4 = 4wd”.
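You can check that these three codes are the only values in the data by counting the rows for each code. This is a minimal sketch using dplyr's count(). The exact counts are left to the console output.

```r
library(ggplot2)
library(dplyr)

# Distinct drive-train codes and how many cars have each one
drv_counts <- mpg %>% count(drv)
drv_counts
```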

  4. Make a scatterplot of hwy vs cyl.
ggplot(data = mpg) +
  geom_point(aes(x = cyl, y = hwy))

  5. What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
ggplot(data = mpg) +
  geom_point(aes(x = class, y = drv))

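A scatterplot is not a useful display of these variables because both drv and class are categorical. Scatterplots work best when both x and y are continuous and most (x, y) values are unique. Here, many cars share the same (class, drv) pair, so their points are drawn on top of each other and the plot hides how many cars fall into each combination.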
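The overplotting is easy to see in numbers. This minimal sketch checks the column types and then compares the number of distinct (class, drv) pairs with the number of rows. Every row that shares a pair is drawn at the same point.

```r
library(ggplot2)
library(dplyr)

# Both variables are stored as character vectors, i.e. they are categorical
sapply(mpg[c("class", "drv")], class)

# Distinct (class, drv) pairs versus number of cars:
# all cars with the same pair are plotted at the same location
n_pairs <- nrow(distinct(mpg, class, drv))
n_pairs
nrow(mpg)
```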

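However, there are better ways to plot two categorical variables. One is geom_count(), which counts the observations at each (x, y) location and maps that count to point size. Another is geom_tile() applied to counts computed with count():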

ggplot(mpg, aes(x = class, y = drv)) +
  geom_count()
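To see what geom_count() computes, you can inspect the data that ggplot2 builds for the layer. This is a minimal sketch using layer_data(). It assumes geom_count() uses stat_sum(), which adds a count column n for each (x, y) location.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = drv)) +
  geom_count()

# Data computed for the first layer: one row per occupied (x, y) location,
# with n holding the number of cars at that location
counts <- layer_data(p)
head(counts[c("x", "y", "n", "size")])
```

Because each car falls into exactly one location, the n column should sum to the number of rows in mpg.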

mpg %>%
  count(class, drv) %>%
  ggplot(aes(x = class, y = drv)) +
    geom_tile(mapping = aes(fill = n))

The previous plot has many missing tiles. They represent combinations of class and drv that never occur in the data. These values are not unknown: they are (class, drv) pairs where n = 0. The complete() function from the tidyr package adds rows to a data frame for missing combinations of columns, and fill = list(n = 0) sets the count in those new rows to zero:

mpg %>%
  count(class, drv) %>%
  complete(class, drv, fill = list(n = 0)) %>%
  ggplot(aes(x = class, y = drv)) +
    geom_tile(mapping = aes(fill = n))
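The effect of complete() is easier to follow on a tiny table. This minimal sketch uses a hypothetical toy tibble (columns a, b and n are made up for illustration) in which one combination never occurs.

```r
library(tibble)
library(tidyr)

# Hypothetical toy count table: the ("y", "q") combination is missing
counts <- tibble(
  a = c("x", "x", "y"),
  b = c("p", "q", "p"),
  n = c(1L, 2L, 3L)
)

# complete() adds a row for every missing combination of a and b;
# fill sets n to 0 in the added row instead of leaving it NA
completed <- complete(counts, a, b, fill = list(n = 0L))
completed
```

The result has four rows, one for each combination of a and b. The added ("y", "q") row has n = 0, which is exactly what fills the empty tiles in the plot above.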

2.3.1 Exercises

  1. What’s gone wrong with this code? Why are the points not blue?
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

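The argument color = "blue" is placed inside aes(), so ggplot2 treats it as an aesthetic mapping (a link from a variable to a visual property) rather than a fixed colour. ggplot2 treats "blue" as a constant variable with a single value and passes it through the default colour scale. Every point is therefore drawn in that scale's first colour, a reddish hue rather than blue, and a legend entry labelled "blue" is added.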

To make the points blue, set color as a fixed aesthetic outside aes():

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

Now every point is drawn in blue.
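You can check the difference between the two versions without looking at the plots. This minimal sketch uses layer_data(), which returns the data ggplot2 computes for a layer, including the final colour assigned to each point in the colour column.

```r
library(ggplot2)

# "blue" inside aes(): treated as a one-level variable and passed through the colour scale
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# "blue" outside aes(): a fixed setting applied to every point
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Compare the colours ggplot2 actually assigns to the points
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_fixed)$colour)
```

The first call should return a single hex colour from the default scale. The second should return only "blue".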