Data types



Introduction to Data Science with R and Tidyverse

based on datasciencebox.org


Why should you care about data types?


Example: Cat lovers

A survey asked respondents their name and number of cats. The instructions said to enter the number of cats as a numerical value.

library(tidyverse)

cat_lovers <- read_csv("data/cat-lovers.csv")
## # A tibble: 60 x 3
##   name           number_of_cats handedness
##   <chr>          <chr>          <chr>
## 1 Bernice Warren 0              left
## 2 Woodrow Stone  0              left
## 3 Willie Bass    1              left
## 4 Tyrone Estrada 3              left
## 5 Alex Daniels   3              left
## 6 Jane Bates     2              left
## # ... with 54 more rows

Oh why won't you work?!

cat_lovers %>%
  summarise(mean_cats = mean(number_of_cats))
## Warning: There was 1 warning in `summarise()`.
## i In argument: `mean_cats = mean(number_of_cats)`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## # A tibble: 1 x 1
##   mean_cats
##       <dbl>
## 1        NA

?mean


Oh why will you still not work??!!

cat_lovers %>%
  summarise(mean_cats = mean(number_of_cats, na.rm = TRUE))
## Warning: There was 1 warning in `summarise()`.
## i In argument: `mean_cats = mean(number_of_cats, na.rm = TRUE)`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## # A tibble: 1 x 1
##   mean_cats
##       <dbl>
## 1        NA

Take a breath and look at your data

What is the type of the number_of_cats variable?

glimpse(cat_lovers)
## Rows: 60
## Columns: 3
## $ name           <chr> "Bernice Warren", "Woodrow Stone", "Will~
## $ number_of_cats <chr> "0", "0", "1", "3", "3", "2", "1", "1", ~
## $ handedness     <chr> "left", "left", "left", "left", "left", ~

Let's take another look

Check out the responses from "Ginger Clark" and "Doug Bass"
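
One way to spot responses like these without scrolling through every row is to ask which entries fail to convert to a number. The sketch below uses a small made-up tibble (`responses`, with invented names and answers) rather than the real survey file, so treat it as an illustration of the idea only.

```r
library(dplyr)

# Hypothetical stand-in for cat_lovers: names and answers are invented
responses <- tibble(
  name = c("Respondent A", "Respondent B", "Respondent C"),
  number_of_cats = c("0", "two", "3")
)

# as.numeric() turns text it cannot parse into NA (with a warning,
# silenced here), so the NA rows are the ones that need attention
problem_rows <- responses %>%
  filter(is.na(suppressWarnings(as.numeric(number_of_cats))))

problem_rows
```

The same filter() call should work on cat_lovers itself, assuming the column names shown in the glimpse() output.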


Sometimes you might need to babysit your respondents

cat_lovers %>%
  mutate(number_of_cats = case_when(
    name == "Ginger Clark" ~ 2,
    name == "Doug Bass"    ~ 3,
    TRUE                   ~ as.numeric(number_of_cats)
  )) %>%
  summarise(mean_cats = mean(number_of_cats))
## Warning: There was 1 warning in `mutate()`.
## i In argument: `number_of_cats = case_when(...)`.
## Caused by warning:
## ! NAs introduced by coercion
## # A tibble: 1 x 1
##   mean_cats
##       <dbl>
## 1     0.833

You always need to respect data types

cat_lovers %>%
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats
    ),
    number_of_cats = as.numeric(number_of_cats)
  ) %>%
  summarise(mean_cats = mean(number_of_cats))
## # A tibble: 1 x 1
##   mean_cats
##       <dbl>
## 1     0.833

Now that we know what we're doing...

cat_lovers <- cat_lovers %>%
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats
    ),
    number_of_cats = as.numeric(number_of_cats)
  )

Moral of the story

  • If your data does not behave how you expect it to, type coercion upon reading in the data might be the reason.
  • Go in and investigate your data, apply the fix, save your data, live happily ever after.
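
A minimal sketch of the "save your data" step, assuming the readr functions write_csv() and read_csv() with an explicit col_types argument. The tibble `cleaned` and the temporary file path are made up for illustration. Declaring the column types when reading the file back means readr does not have to guess them.

```r
library(readr)
library(tibble)

# Hypothetical cleaned data: names and values are invented
cleaned <- tibble(
  name = c("Respondent A", "Respondent B"),
  number_of_cats = c(0, 2)
)

# Save to a temporary file (a real project would use its own path)
path <- tempfile(fileext = ".csv")
write_csv(cleaned, path)

# Read it back, stating the expected type of each column up front
reread <- read_csv(
  path,
  col_types = cols(
    name = col_character(),
    number_of_cats = col_double()
  )
)

typeof(reread$number_of_cats)
```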

...now that we have a good motivation for learning about data types in R


let's learn about data types in R!


Data types


Data types in R

  • logical
  • double
  • integer
  • character
  • and some more, but we won't be focusing on those

Logical & character

logical - boolean values TRUE and FALSE

typeof(TRUE)
## [1] "logical"

character - character strings

typeof("hello")
## [1] "character"

Double & integer

double - floating point numerical values (default numerical type)

typeof(1.335)
## [1] "double"
typeof(7)
## [1] "double"

integer - integer numerical values (indicated with an L)

typeof(7L)
## [1] "integer"
typeof(1:3)
## [1] "integer"
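
A few more checks along the same lines, as a quick sketch using only base R. They show how the two numeric types behave when mixed in arithmetic.

```r
is.double(7)      # TRUE: a number typed without L is a double
is.integer(7)     # FALSE
is.integer(7L)    # TRUE

typeof(7L + 1L)   # "integer": integer arithmetic stays integer
typeof(7L + 1.5)  # "double": mixing with a double gives a double
typeof(7L / 2L)   # "double": division always returns a double

as.integer(7.9)   # 7: converting to integer drops the fractional part
```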

Concatenation

Vectors can be constructed using the c() function.

c(1, 2, 3)
## [1] 1 2 3
c("Hello", "World!")
## [1] "Hello" "World!"
c(c("hi", "hello"), c("bye", "jello"))
## [1] "hi" "hello" "bye" "jello"

Converting between types

with intention...

x <- 1:3
x
## [1] 1 2 3
typeof(x)
## [1] "integer"
y <- as.character(x)
y
## [1] "1" "2" "3"
typeof(y)
## [1] "character"

Converting between types

with intention...

x <- c(TRUE, FALSE)
x
## [1] TRUE FALSE
typeof(x)
## [1] "logical"
y <- as.numeric(x)
y
## [1] 1 0
typeof(y)
## [1] "double"
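
Explicit conversion does not always succeed. The sketch below, in base R only, shows what happens when a value cannot be converted: the result is NA. This is the same effect behind the "NAs introduced by coercion" warning in the cat lovers example.

```r
ok <- as.integer("7")
ok
typeof(ok)                # "integer"

# Text that is not a number becomes NA (R warns about this;
# the warning is silenced here)
bad <- suppressWarnings(as.numeric("three"))
bad                       # NA

as.logical("TRUE")        # TRUE
as.logical("yes")         # NA: "yes" is not a recognised logical string
as.logical(0:2)           # FALSE TRUE TRUE: zero is FALSE, nonzero is TRUE
```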

Converting between types

without intention...

R will happily convert between various types without complaint when different types of data are concatenated in a vector, and that's not always a great thing!

c(1, "Hello")
## [1] "1" "Hello"
c(FALSE, 3L)
## [1] 0 3
c(1.2, 3L)
## [1] 1.2 3.0
c(2L, "two")
## [1] "2" "two"
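
The results above follow a fixed order: when types are mixed in c(), everything is converted to the most flexible type present, with logical < integer < double < character. A short sketch in base R:

```r
typeof(c(TRUE, 2L))             # "integer"
typeof(c(TRUE, 2L, 3.5))        # "double"
typeof(c(TRUE, 2L, 3.5, "a"))   # "character"

# Once a single string is present, every element becomes a string
mixed <- c(TRUE, 2L, 3.5, "a")
mixed                           # "TRUE" "2" "3.5" "a"
```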

Explicit vs. implicit coercion

Let's give formal names to what we've seen so far:

  • Explicit coercion is when you call a function like as.logical(), as.numeric(), as.integer(), as.double(), or as.character()

  • Implicit coercion happens when you use a vector in a specific context that expects a certain type of vector
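
A short sketch of implicit coercion outside of c(), using base R only. Arithmetic functions expect numbers, so logical values are treated as 1 and 0. Comparisons between a number and a string convert the number to a string first, which can give surprising answers.

```r
x <- c(TRUE, FALSE, TRUE)

sum(x)      # 2: TRUE counts as 1, FALSE as 0
mean(x)     # 0.6666667: the proportion of TRUE values
TRUE + TRUE # 2

1 == "1"    # TRUE: 1 is converted to "1" before comparing
"10" < 9    # TRUE: compared as the strings "10" and "9", not as numbers
```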


Special values

  • NA: Not available
  • NaN: Not a number
  • Inf: Positive infinity
  • -Inf: Negative infinity
pi / 0
## [1] Inf
0 / 0
## [1] NaN
1/0 - 1/0
## [1] NaN
1/0 + 1/0
## [1] Inf
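
To test for these values, use the dedicated helper functions rather than ==. The sketch below uses base R only and also shows why na.rm = TRUE matters once a column really is numeric.

```r
is.na(NA)                             # TRUE
is.na(NaN)                            # TRUE: NaN also counts as missing
is.nan(NA)                            # FALSE
is.finite(c(1, Inf, -Inf, NA, NaN))   # TRUE FALSE FALSE FALSE FALSE

# Comparisons with NA are themselves unknown
NA > 1                                # NA
NA == NA                              # NA

# A single NA makes the mean NA unless it is removed first
cats <- c(1, NA, 3)
mean(cats)                            # NA
mean(cats, na.rm = TRUE)              # 2
```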