You can convert an object to another data type with the functions below, but a conversion may lose information (for example, the decimal part of a number). Do this with caution!
x <- pi
print(x)
[1] 3.141593
x_int <- as.integer(x)
print(x_int)
[1] 3
Common conversion functions include:
as.integer(): Convert to integer.
as.numeric(): Convert to numeric (float).
as.character(): Convert to character.
as.logical(): Convert to logical (boolean).
as.Date(): Convert to date.
as.factor(): Convert to factor (categorical variable).
as.list(): Convert to list.
as.matrix(): Convert to matrix.
as.data.frame(): Convert to data frame.
as.vector(): Convert to vector.
as.complex(): Convert to complex number.
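The following is a small sketch of where conversions can silently lose or change information. The input values are made up for illustration, and `suppressWarnings()` is used only to hide the coercion warning that `as.numeric()` emits for non-numeric strings.

```r
# Non-numeric strings become NA when converted to numbers
chr <- c("1", "2.5", "abc")
num <- suppressWarnings(as.numeric(chr))  # as.numeric() warns about "abc"
print(num)  # 1.0 2.5 NA

# as.integer() truncates toward zero; it does not round
print(as.integer(c(2.9, -2.9)))  # 2 -2

# Pitfall: as.numeric() on a factor returns the internal level codes
f <- as.factor(c("10", "20", "10"))
print(as.numeric(f))                # 1 2 1
print(as.numeric(as.character(f)))  # 10 20 10 (convert to character first)

# Numbers to logical: 0 is FALSE, any other number is TRUE
print(as.logical(c(0, 1, 2)))      # FALSE TRUE TRUE
print(as.numeric(c(TRUE, FALSE)))  # 1 0
```

The factor case is the most common surprise: convert to character first if you want the labels rather than the codes.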
1.2 Operators
Unary: takes a single operand, e.g., -x (negation), !x (logical negation).
Binary: takes two operands, e.g., x + y (addition), x - y (subtraction), x * y (multiplication), x / y (division).
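A quick sketch of the unary and binary operators listed above, using arbitrary example values:

```r
x <- 7
y <- 2
print(-x)     # unary negation: -7
print(!TRUE)  # unary logical negation: FALSE
print(x + y)  # 9
print(x - y)  # 5
print(x * y)  # 14
print(x / y)  # 3.5 (/ always returns a double)
# Binary operators also work element-wise on vectors
print(c(1, 2, 3) * 10)  # 10 20 30
```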
1.2.1 Comparison Operators
Comparison operators compare two objects and return logical values. E.g., x == y (equal), x != y (not equal), x < y (less than), x > y (greater than), x <= y (less than or equal to), x >= y (greater than or equal to).
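Comparisons are vectorized, and `==` tests exact equality, which matters for floating-point numbers. The sketch below uses made-up values; `all.equal()` is base R's tolerance-based comparison, wrapped in `isTRUE()` because it returns a description string rather than `FALSE` when values differ.

```r
x <- c(1, 5, 10)
print(x > 4)                # FALSE TRUE TRUE (one result per element)
print(x == 5)               # FALSE TRUE FALSE
print("apple" != "banana")  # TRUE

# Floating-point caveat: 0.1 + 0.2 is not stored exactly as 0.3
print(0.1 + 0.2 == 0.3)                   # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))  # TRUE (equal within a tolerance)
```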
1.2.2 Logical Operators
Logical operators are used to combine or manipulate logical values (TRUE or FALSE). E.g., x & y (logical AND), x | y (logical OR), !x (logical NOT).
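Note that & and | are vectorized: they combine two logical vectors element by element and return a vector. In contrast, && and || are not vectorized; they expect single TRUE/FALSE values (as in an if() condition) and evaluate the right-hand side only when needed. Hence x | y and x || y behave differently.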
x <- c(TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, FALSE)
x | y   # [1] TRUE FALSE FALSE
x || y  # Error in R >= 4.3.0: || requires length-one arguments
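A minimal sketch of how the two families are typically used, with illustrative vectors: `&`/`|` for element-wise work, and `&&`/`||` for single conditions, reducing vectors to one value first with `any()` or `all()`.

```r
x <- c(TRUE, FALSE, FALSE)
y <- c(FALSE, FALSE, TRUE)
print(x & y)  # element-wise AND: FALSE FALSE FALSE
print(x | y)  # element-wise OR:  TRUE FALSE TRUE

# && and || need length-one values, so reduce the vectors first
if (any(x) && !all(y)) {
  print("at least one x is TRUE and not every y is TRUE")
}

# || short-circuits: the right side is not evaluated when the left is TRUE
print(TRUE || stop("never evaluated"))  # TRUE
```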
1.3 Indexing
Indexing is a way to access or modify specific elements of a data structure. In R, you index vectors and matrices with square brackets [], and you extract a column of a data frame (or an element of a list) with the $ operator. Note that indices in R start from 1, unlike some other programming languages such as Python, where they start from 0.
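A short sketch of the common indexing forms, using made-up data: positive positions, negative positions (which drop elements), logical conditions, [row, column] for matrices, and $ for data frame columns.

```r
v <- c(10, 20, 30, 40, 50)
print(v[1])        # 10 (the first element: R is 1-based)
print(v[c(2, 4)])  # 20 40
print(v[-1])       # 20 30 40 50 (a negative index drops that element)
print(v[v > 25])   # 30 40 50 (logical indexing)

m <- matrix(1:6, nrow = 2)  # filled column by column
print(m[2, 3])              # 6 (row 2, column 3)

df <- data.frame(name = c("a", "b"), score = c(90, 85))
print(df$score)             # 90 85

v[2] <- 99                  # indexing also works on the left of <-
print(v)                    # 10 99 30 40 50
```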
1.4 Naming
In R, you can assign names to the elements of an object (such as a vector or a list) using the names() function. Names make your code more readable and let you access elements by name instead of by position.
A good practice is to use _ (underscore) to separate words in variable names, e.g., my_variable. This makes the code more readable and easier to understand.
# Assign names to a vector
temp <- c(20, 30, 27, 31, 45)
names(temp) <- c("Mon", "Tues", "Wed", "Thurs", "Fri")
print(temp)
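Once a vector has names, elements can be selected by name rather than by position. This sketch reuses the `temp` vector defined above; `unname()` is base R and simply drops the names from the result.

```r
temp <- c(20, 30, 27, 31, 45)
names(temp) <- c("Mon", "Tues", "Wed", "Thurs", "Fri")

print(temp["Wed"])           # the named element Wed = 27
print(temp[c("Mon", "Fri")]) # several elements at once, names kept
print(unname(temp["Wed"]))   # 27 without its name
print(names(temp))           # "Mon" "Tues" "Wed" "Thurs" "Fri"
```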
One may define an array or a matrix in R using the array() or matrix() functions, respectively. An array is a multi-dimensional data structure, while a matrix is a two-dimensional array.
# Create a 1-dimensional array
array_1d <- array(1:10, dim = 10)
array_1d
[1] 1 2 3 4 5 6 7 8 9 10
# Create a 2-dimensional array
array_2d <- array(1:12, dim = c(4, 3))
array_2d
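A sketch comparing matrix() with array(), with illustrative values: both fill column by column by default, matrix() accepts `byrow = TRUE` to fill row by row, and a matrix is itself a 2-D array. The last lines build a 3-D array, indexed as [row, column, layer].

```r
# matrix() fills column by column unless byrow = TRUE
m_col <- matrix(1:6, nrow = 2)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m_col[1, ])  # 1 3 5
print(m_row[1, ])  # 1 2 3

# A matrix is a 2-dimensional array
print(is.array(m_col))  # TRUE
print(dim(m_col))       # 2 3

# A 3-dimensional array: 2 rows, 3 columns, 2 layers
arr_3d <- array(1:12, dim = c(2, 3, 2))
print(arr_3d[2, 3, 2])  # 12
```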
A key-value pair associates a key with its corresponding value. In R, key-value structures can be implemented with named vectors, lists, or data frames; lists and data frames are the most common. A value can be extracted by providing its corresponding key:
## Now providing a key - Tues
### First way
list_temp[["Tues"]]
[1] 32
### Second way
list_temp$Tues
[1] 32
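A self-contained sketch of a named list used as a key-value store. The `person` list and its values are hypothetical; the point is that `[[ ]]` accepts a key stored in a variable (while `$` takes a literal name), that assigning to a key adds or updates it, and that assigning `NULL` removes it.

```r
# A named list maps keys (names) to values of any type
person <- list(name = "Ada", age = 36, langs = c("R", "Python"))
print(person[["age"]])  # 36
print(person$name)      # "Ada"

# [[ ]] can take the key from a variable; $ cannot
key <- "langs"
print(person[[key]])    # "R" "Python"

# Add or update a value by assigning to its key
person$age <- 37
person[["city"]] <- "London"
print(names(person))    # "name" "age" "langs" "city"

# Remove a key by assigning NULL
person$city <- NULL
print("city" %in% names(person))  # FALSE
```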
1.7 Data Frame
A data frame is a two-dimensional, tabular data structure in R whose columns can hold different types of variables (numeric, character, factor, etc.), while all values within a single column share one type. It is similar to a spreadsheet or an SQL table.
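A minimal sketch of creating and working with a data frame in base R. The cities and temperatures are made-up illustration data; the example shows mixed column types, column access with $, row filtering with [rows, columns], and adding a derived column.

```r
df <- data.frame(
  city   = c("Oslo", "Lima", "Cairo"),  # character column
  temp_c = c(4.5, 19.0, 22.3),          # numeric column
  rainy  = c(TRUE, FALSE, FALSE)        # logical column
)
str(df)           # one line per column, showing its type

print(nrow(df))   # 3
print(df$temp_c)  # 4.5 19.0 22.3

# Rows where temp_c > 10, column "city"
print(df[df$temp_c > 10, "city"])  # "Lima" "Cairo"

# Add a derived column
df$temp_f <- df$temp_c * 9 / 5 + 32
print(df)
```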
The tidyverse is a collection of open-source R packages, introduced by Hadley Wickham and his team, that “share an underlying design philosophy, grammar, and data structures” built around the idea of tidy data. Characteristic features of tidyverse packages include extensive use of non-standard evaluation and encouragement of piping.
## Load all tidyverse packages
library(tidyverse)
## Or load specific packages in the tidy family
library(dplyr)    # Data manipulation
library(ggplot2)  # Data visualization
library(readr)    # Data import
library(tibble)   # Tidy data frames
library(tidyr)    # Data tidying
# ...
1.9 Pipe
The pipe operator, either the native |> (available since R 4.1.0) or %>% (from the magrittr package), is a powerful tool that lets you chain multiple operations together in a clear and concise way. It takes the output of one function and passes it as the first argument to the next function.
For example, we can write
set.seed(777)
x <- rnorm(5)
## Without using pipe
print(round(mean(x), 2))
[1] 0.37
## Using pipe
x |>
  mean() |>    # apply the mean function
  round(2) |>  # round to 2 decimal places
  print()
[1] 0.37
Without the pipe, applying several functions to the same object produces nested calls that must be read from the inside out, which makes the code harder to read and maintain. With the pipe, the operations read from top to bottom in the order they are applied, so the code is easier to understand and modify.
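The sketch below shows that `x |> f(y)` is the same call as `f(x, y)`, and how to send the piped value to an argument other than the first. The data are illustrative, and the `_` placeholder assumes R 4.2.0 or later, where it must be passed as a named argument (here `data = _` for `lm()`).

```r
# x |> f(y) is the same call as f(x, y)
v <- c(3.14159, 2.71828)
print(identical(v |> round(2), round(v, 2)))  # TRUE

# A chain reads top to bottom in the order the steps run
total <- c(1, 4, 9) |>
  sqrt() |>  # 1 2 3
  sum()      # 6
print(total)

# Piping into a non-first argument with the placeholder _ (R >= 4.2.0)
df <- data.frame(a = 1:3, b = c(2, 4, 6))
fit <- df |> lm(b ~ a, data = _)
print(coef(fit))  # slope for a is 2 (up to floating-point error)
```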
1.9.1 Some rules
|> should always have a space before it and should typically be the last thing on a line. This simplifies adding new steps, reorganizing existing ones, and modifying elements within each step.
Note that the packages in the tidyverse are designed to work with the pipe operator. The notable exception is ggplot2, which adds layers with + rather than |> (although you can still pipe a data frame into ggplot()).