### Data Types and Basic Operations

Data Types

- R has five basic or “atomic” classes of objects:
  - character
  - numeric (real numbers)
  - integer
  - complex
  - logical (TRUE/FALSE)

- A vector can only contain objects of the same class. A matrix is a vector with a dimension attribute (nrow, ncol).
- A list can contain objects of different classes. A data frame stores tabular data and can hold a different class of object in each column.
- Factors are used to represent categorical data.
- Missing values are denoted by NA, or by NaN for undefined mathematical operations. The function is.na() tests whether objects are NA, while is.nan() tests for NaN. NA values have a class too, so there are integer NA, character NA, etc.
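A quick way to check these classes is class(). The short sketch below (the variable names are only illustrative) shows each atomic class, what happens when values of different classes are combined with c() (R silently coerces them to one common class rather than mixing them), and a list that keeps mixed classes side by side.

```r
# class() reports the class of a value
class("a")     # "character"
class(1.5)     # "numeric"
class(1L)      # "integer" (the L suffix marks an integer literal)
class(2 + 3i)  # "complex"
class(TRUE)    # "logical"

# A vector holds a single class, so c() coerces mixed input
# to the most general class present
y <- c(1.7, "a")
class(y)       # "character"; y is c("1.7", "a")
z <- c(TRUE, 2)
z              # 1 2 (TRUE became 1)

# A list keeps each element's own class
l <- list(1, "a", TRUE, 1 + 4i)
sapply(l, class)  # "numeric" "character" "logical" "complex"
```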

Attributes

R objects can have attributes:

- names, dimnames
- dimensions (e.g. matrices, arrays)
- class
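- length (strictly a property of every object rather than a stored attribute, so attributes() does not report it)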
- other user-defined attributes/metadata

Attributes of an object can be accessed using the attributes() function.
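For example, a matrix carries a dim attribute, a plain vector has none, and attributes can be added with names() or attr(). The attribute name units below is made up purely for illustration.

```r
m <- matrix(1:4, nrow = 2)
attributes(m)   # a list with one element, $dim, equal to c(2, 2)

x <- 1:3
attributes(x)   # NULL: a plain vector has no attributes

names(x) <- c("a", "b", "c")
attr(x, "units") <- "cm"   # hypothetical user-defined attribute
attributes(x)   # now lists $names and $units

length(x)       # 3, yet length is not stored among the attributes
```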

Entering Input

At the R prompt we type expressions. The <- symbol is the assignment operator.

```r
> x <- 1
> print(x)
[1] 1
> x
[1] 1
> msg <- "hello"
```

Creating Vectors

The c() function can be used to create vectors of objects by concatenating things together.

```r
> x <- c(0.5, 0.6)       ## numeric
> x <- c(TRUE, FALSE)    ## logical
> x <- c(T, F)           ## logical
> x <- c("a", "b", "c")  ## character
> x <- 9:29              ## integer
> x <- c(1+0i, 2+4i)     ## complex
```

You can also use the vector() function to initialize a vector of a given mode and length; its elements are filled with the default value for that mode (0 for numeric).

```r
> x <- vector("numeric", length = 10)
> x
 [1] 0 0 0 0 0 0 0 0 0 0
```
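The first argument of vector() is the mode, and each mode has its own fill value. R also provides shorthand constructors such as numeric(), character() and logical() that do the same job.

```r
vector("character", length = 3)  # "" "" ""
vector("logical", length = 2)    # FALSE FALSE
vector("integer", length = 2)    # 0 0 (integer zeros)
vector("list", length = 2)       # a list of two NULL elements
numeric(3)                       # 0 0 0, same as vector("numeric", 3)
```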

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

```r
> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
```
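Coercion between numeric-like classes follows fixed rules worth knowing: converting a double to an integer truncates toward zero rather than rounding, logical values become 1 and 0, and any non-zero number becomes TRUE.

```r
as.integer(c(3.9, -3.9))    # 3 -3 (truncation, not rounding)
as.numeric(c(TRUE, FALSE))  # 1 0
as.logical(c(0, 2.5, -1))   # FALSE TRUE TRUE (any non-zero number is TRUE)
```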

Sometimes R cannot figure out how to coerce an object, and the result is NAs, often accompanied by a warning.

```r
> x <- c("a", "b", "c")
> as.numeric(x)
Warning: NAs introduced by coercion
[1] NA NA NA
> as.logical(x)
[1] NA NA NA
> as.complex(x)
Warning: NAs introduced by coercion
[1] NA NA NA
```
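When only some elements fail to convert, one common pattern is to check the result with is.na() to find the offending inputs. The sketch below wraps the conversion in suppressWarnings() only because the failures are checked explicitly afterwards; the input values are invented for illustration. Note that as.logical() accepts only a fixed set of strings (such as "TRUE", "true", "T", "FALSE", "false" and "F"); anything else becomes NA.

```r
x <- c("1", "2.5", "abc", "4")
num <- suppressWarnings(as.numeric(x))
num              # 1.0 2.5  NA 4.0
x[is.na(num)]    # "abc": the input that could not be converted

as.logical(c("TRUE", "false", "T", "yes"))  # TRUE FALSE TRUE NA
```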

Matrices

By default, matrices are filled column-wise: entries start in the “upper left” corner and run down the columns.

```r
> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
```
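Two related points: the byrow = TRUE argument of matrix() fills the matrix row by row instead, and because a matrix is just a vector with a dim attribute, assigning dim() to a plain vector turns it into a matrix.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

v <- 1:6
dim(v) <- c(2, 3)   # adding a dim attribute makes v a 2 x 3 matrix
identical(v, matrix(1:6, nrow = 2, ncol = 3))  # TRUE
```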

Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.

```r
> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
```

Factors

Factors are used to represent categorical data and can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label. Using factors with labels is better than using integers because factors are self-describing: a variable with values “Male” and “Female” is more informative than one with values 1 and 2.

Factor objects can be created with the factor() function.

```r
> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x
[1] yes yes no  yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
> ## See the underlying representation of factor
> unclass(x)
[1] 2 2 1 2 1
attr(,"levels")
[1] "no"  "yes"
```
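By default the levels of a character-based factor are sorted, which is why no comes before yes in the output above. The levels argument sets the order explicitly, and ordered = TRUE creates an ordered factor whose values can be compared by level order. The sizes data below is invented for illustration.

```r
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(x)       # "yes" "no"
as.integer(x)   # 1 1 2 1 2: codes now follow the new level order

sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes < "large" # TRUE FALSE TRUE: comparisons use the level order
```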

Missing Values

Missing values are denoted by NA, or by NaN for undefined mathematical operations.

- is.na() tests whether objects are NA
- is.nan() tests for NaN
- NA values have a class too, so there are integer NA, character NA, etc.
- A NaN value is also NA, but the converse is not true

```r
> ## Create a vector with NAs in it
> x <- c(1, 2, NA, 10, 3)
> ## Return a logical vector indicating which elements are NA
> is.na(x)
[1] FALSE FALSE  TRUE FALSE FALSE
> ## Return a logical vector indicating which elements are NaN
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
> ## Now create a vector with both NA and NaN values
> x <- c(1, 2, NaN, NA, 4)
> is.na(x)
[1] FALSE FALSE  TRUE  TRUE FALSE
> is.nan(x)
[1] FALSE FALSE  TRUE FALSE FALSE
```
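A few further properties of missing values: a bare NA is logical, typed versions such as NA_integer_ exist for the other classes, 0 / 0 produces NaN, and most summary functions return NA unless told to drop missing values with na.rm = TRUE. The variable name nan below is only illustrative.

```r
class(NA)              # "logical"
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"

nan <- 0 / 0           # NaN: an undefined operation
is.na(nan)             # TRUE: NaN also counts as NA
is.nan(NA)             # FALSE: but NA is not NaN

x <- c(1, 2, NA, 10, 3)
mean(x)                # NA
mean(x, na.rm = TRUE)  # 4, the mean of 1, 2, 10 and 3
sum(is.na(x))          # 1 missing value
```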

Data Frames

Data frames are used to store tabular data in R. Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows. Unlike matrices, data frames can store different classes of objects in each column.

```r
> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
```
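Because a data frame is a list of equal-length columns, list tools work on it directly. The sketch below rebuilds the same x data frame, then shows each column's class, extracts one column with $, and keeps only the rows where bar is TRUE.

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
is.list(x)         # TRUE: a data frame is a list
length(x)          # 2, one element per column
sapply(x, class)   # foo: "integer", bar: "logical"
x$foo              # 1 2 3 4
x[x$bar, ]         # the first two rows, where bar is TRUE
```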