Introduction to R Programming

Data Types and Basic Operations

Data Types

  • R has five basic or “atomic” classes of objects:
  1. character
  2. numeric (real numbers)
  3. integer
  4. complex
  5. logical (TRUE/FALSE)
  • A vector can only contain objects of the same class. A matrix is a vector with a dimension attribute (nrow, ncol).
  • A list can contain objects of different classes. A data frame stores tabular data and can hold a different class of object in each column.
  • Factors are used to represent categorical data.
  • Missing values are denoted by NA; NaN denotes the result of an undefined mathematical operation. is.na() tests whether values are NA, and is.nan() tests for NaN. NA values also have a class, so there are integer NA, character NA, and so on.
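
As a minimal sketch of the classes listed above, class() can be called on a literal of each kind. The variable names here are chosen only for illustration.

```r
# One value of each atomic class
chr <- "hello"
num <- 3.14
int <- 2L        # the L suffix requests an integer rather than a numeric
cpx <- 1+2i
lgl <- TRUE

class(chr)   # "character"
class(num)   # "numeric"
class(int)   # "integer"
class(cpx)   # "complex"
class(lgl)   # "logical"

# A list, unlike a vector, can hold objects of different classes
lst <- list(1, "a", TRUE)
sapply(lst, class)   # "numeric" "character" "logical"
```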

Attributes

R objects can have attributes, such as:

  • names, dimnames
  • dimensions (e.g. matrices, arrays)
  • class
  • length
  • other user-defined attributes/metadata

Attributes of an object can be accessed using the attributes() function.
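
A short sketch of inspecting attributes follows. The "units" attribute is a made-up piece of metadata, used only to show that user-defined attributes are possible.

```r
# A matrix carries a "dim" attribute
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)    # a list with one element, $dim, holding 2 and 3

# A named vector carries a "names" attribute
x <- c(a = 1, b = 2)
attributes(x)    # a list with one element, $names

# attr() reads or sets a single attribute, including user-defined metadata
attr(x, "units") <- "cm"   # "units" is a hypothetical attribute name
names(attributes(x))       # now includes both "names" and "units"
```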

Entering Input

At the R prompt we type expressions. The <- symbol is the assignment operator.
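
For example, the following expressions could be typed at the prompt. The variable names are arbitrary.

```r
x <- 1          # assign the value 1 to x; nothing is printed
print(x)        # explicit printing
x               # typing a name alone auto-prints its value
msg <- "hello"  # character values are assigned the same way
y <- 1:20       # the : operator creates an integer sequence
y
```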

Creating Vectors

The c() function can be used to create vectors of objects by concatenating things together.

You can also use the vector() function to initialize an empty vector.
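
A brief sketch of both approaches, with one vector per atomic class:

```r
x <- c(0.5, 0.6)        # numeric
y <- c(TRUE, FALSE)     # logical
z <- c("a", "b", "c")   # character
w <- 9:29               # integer
v <- c(1+0i, 2+4i)      # complex

# vector() takes a mode and a length; numeric vectors are filled with zeros
u <- vector("numeric", length = 10)
u
```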

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

Sometimes, R can’t figure out how to coerce an object and this can result in NAs being produced.
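
The following sketch shows coercions that succeed and ones that produce NAs:

```r
x <- 0:6
class(x)         # "integer"
as.numeric(x)    # the same values, stored as numeric
as.logical(x)    # 0 becomes FALSE, every other number becomes TRUE
as.character(x)  # "0" "1" "2" "3" "4" "5" "6"

# Letters have no numeric or logical interpretation, so the result is NA
y <- c("a", "b", "c")
as.numeric(y)    # NA NA NA, with a warning
as.logical(y)    # NA NA NA
```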

Matrices

Matrices are constructed column-wise, so entries can be thought of as starting in the “upper left” corner and running down the columns.
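
A small sketch of the column-wise fill. The second part assumes that setting a dim attribute on a plain vector is an acceptable alternative way to build a matrix.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
dim(m)   # 2 3

# A matrix can also be made by adding a dim attribute to a vector
v <- 1:10
dim(v) <- c(2, 5)
v
```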

Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.
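
For instance, binding two vectors of equal length:

```r
x <- 1:3
y <- 10:12

cb <- cbind(x, y)   # x and y become columns: 3 rows, 2 columns
rb <- rbind(x, y)   # x and y become rows: 2 rows, 3 columns
cb
rb
```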

Factors

Factors are used to represent categorical data and can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label. Using factors with labels is better than using integers because factors are self-describing. Having a variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.

Factor objects can be created with the factor() function.
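
A minimal sketch with made-up "yes"/"no" data, showing the labels, the underlying integer codes, and how the level order can be set:

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))
x
table(x)      # counts per level: no 2, yes 3
unclass(x)    # underlying integer codes 2 2 1 2 1, plus a "levels" attribute

# Levels are sorted alphabetically by default; the levels argument sets the order
x2 <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(x2)    # "yes" "no"

# ordered = TRUE creates an ordered factor
x3 <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)
is.ordered(x3)   # TRUE
```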

Missing Values

Missing values are denoted by NA, or by NaN for undefined mathematical operations.

• is.na() tests whether the elements of an object are NA.

• is.nan() tests whether the elements of an object are NaN.

• NA values also have a class, so there are integer NA, character NA, and so on.

• A NaN value is also NA, but the converse is not true.
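
The points above can be sketched as follows:

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)    # FALSE FALSE  TRUE FALSE FALSE
is.nan(x)   # FALSE FALSE FALSE FALSE FALSE

# NaN counts as NA, but NA does not count as NaN
y <- c(1, 2, NaN, NA, 4)
is.na(y)    # FALSE FALSE  TRUE  TRUE FALSE
is.nan(y)   # FALSE FALSE  TRUE FALSE FALSE

# Typed NA constants
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"

# An undefined mathematical operation
0 / 0                  # NaN
```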

Data Frames

Data frames are used to store tabular data in R. Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows. Unlike matrices, data frames can store different classes of objects in each column.
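
A small sketch with two columns of different classes. The column names foo and bar are placeholders.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
df
nrow(df)            # 4
ncol(df)            # 2
sapply(df, class)   # foo is "integer", bar is "logical"

# A data frame is a list, so each column can be extracted as a list element
df$foo
is.list(df)         # TRUE
```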
