starwars
## # A tibble: 87 x 14
##   name  height  mass hair_~1 skin_~2 eye_c~3 birth~4 sex   gender
##   <chr>  <int> <dbl> <chr>   <chr>   <chr>     <dbl> <chr> <chr>
## 1 Luke~    172    77 blond   fair    blue       19   male  mascu~
## 2 C-3PO    167    75 <NA>    gold    yellow    112   none  mascu~
## 3 R2-D2     96    32 <NA>    white,~ red        33   none  mascu~
## 4 Dart~    202   136 none    white   yellow     41.9 male  mascu~
## 5 Leia~    150    49 brown   light   brown      19   fema~ femin~
## 6 Owen~    178   120 brown,~ light   blue       52   male  mascu~
## # ... with 81 more rows, 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>,
## #   starships <list>, and abbreviated variable names
## #   1: hair_color, 2: skin_color, 3: eye_color, 4: birth_year
Take a glimpse at the data:
glimpse(starwars)
## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth V~
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 1~
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, ~
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, gr~
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "lig~
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", ~
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, N~
## $ sex        <chr> "male", "none", "none", "male", "female", "m~
## $ gender     <chr> "masculine", "masculine", "masculine", "masc~
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine",~
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human",~
## $ films      <list> <"The Empire Strikes Back", "Revenge of the~
## $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <~
## $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TI~
How many rows and columns does this dataset have? What does each row represent? What does each column represent?
?starwars
How many rows and columns does this dataset have?
nrow(starwars) # number of rows
## [1] 87
ncol(starwars) # number of columns
## [1] 14
dim(starwars) # dimensions (rows, columns)
## [1] 87 14
How would you describe the relationship between mass and height of Star Wars characters? What other variables would help us understand data points that don't follow the overall trend? Who is the not-so-tall but really chubby character?
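One way to start on the last question without a plot is to rank the characters by mass. The following is a minimal sketch, assuming the `starwars` data comes from the dplyr package; the object name `heaviest` is only illustrative.

```r
library(dplyr)  # provides starwars, filter(), arrange(), select()

# Drop characters with unknown mass, sort heaviest first,
# and keep a few columns for context.
heaviest <- starwars %>%
  filter(!is.na(mass)) %>%
  arrange(desc(mass)) %>%
  select(name, height, mass, species) %>%
  head(5)

heaviest
```

Comparing the `height` and `mass` columns in the first few rows should make any character who is far heavier than their height suggests stand out.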
"The simple graph has brought more information to the data analyst's mind than any other device." — John Tukey
gg in "ggplot2" stands for Grammar of Graphics. A grammar of graphics is a tool that enables us to concisely describe the components of a graphic.
Source: BloggoType
ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point() +
  labs(title = "Mass vs. height of Starwars characters",
       x = "Height (cm)", y = "Weight (kg)")
## Warning: Removed 28 rows containing missing values
## (`geom_point()`).
Suppressing warnings on subsequent slides to save space.
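The warning comes from rows where `height` or `mass` is missing, since a point needs both coordinates to be drawn. As a rough check, assuming the dplyr version of `starwars`, the rows that cannot be plotted can be counted directly; `n_missing` is a name chosen here for illustration.

```r
library(dplyr)  # provides starwars and filter()

# Count rows where either plotted variable is NA;
# geom_point() drops exactly these rows.
n_missing <- starwars %>%
  filter(is.na(height) | is.na(mass)) %>%
  nrow()

n_missing
```

The result should agree with the number of removed rows reported in the warning.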
ggplot() is the main function in ggplot2
ggplot(data = [dataset],
       mapping = aes(x = [x-variable], y = [y-variable])) +
  geom_xxx() +
  other options
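As an illustration of filling in this template, the sketch below reuses the mass vs. height plot and maps one more variable to color. The choice of `gender` is only an assumption made for the example; any other column of `starwars` could be tried in its place.

```r
library(ggplot2)
library(dplyr)  # provides the starwars dataset

# Template filled in: dataset, x and y variables, a geom, and labels.
# color = gender is an illustrative extra aesthetic, not part of the original plot.
p <- ggplot(data = starwars,
            mapping = aes(x = height, y = mass, color = gender)) +
  geom_point() +
  labs(title = "Mass vs. height of Starwars characters",
       x = "Height (cm)", y = "Weight (kg)", color = "Gender")

p
```

Coloring points by a third variable is one way to explore which other variables help explain points that fall away from the overall trend.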
library(tidyverse)
Do you see anything out of the ordinary?
How are people reporting lower vs. higher values of FB visits?