ONCOSTAT & BBE Team
Training course R

Module 1: Introduction to the R programming language and the RStudio interface

Nusaïbah IBRAHIMI

Friday, June 20, 2025

Table of contents

  • Interface RStudio

  • Operator

  • Object

  • Function

R you ready?

Interface RStudio

First look at RStudio

The Console

Difference between SAS and R

Unlike SAS, R does not require a ; at the end of each instruction.

If an error occurs, execution stops and an error message is displayed in the Console.

The Console

1+2
#> [1] 3
result <- 1 + 2       # creation of a variable called result and equal to 3
result                # to display the value of the variable result
#> [1] 3
result = result + 1   # changes the value: the previous value is overwritten
print(result)         # another way to display the value with the function print()
#> [1] 4

Assignment operators

<- or =

Warning

The code typed in the Console is not saved when RStudio is closed.

We therefore suggest saving it in an R script.

The Environment

Tip

Think of the global environment as our workspace (the equivalent of the WORK library in SAS).

The Environment

To remove objects from the environment, use the function rm(): list the object names between the parentheses, separated by commas.

ls() # lists all existing objects in the current environment
#> [1] "result"
rm(result) # rm() function removes objects from the environment
ls()
#> character(0)
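
As a minimal sketch of the same idea with several objects, the lines below create two throwaway objects (the names tmp_a and tmp_b are hypothetical), check that one of them exists, then remove both in a single call to rm(). The last line is commented out on purpose, because it would empty the whole environment.

```r
# Hypothetical throwaway objects, created only for this illustration
tmp_a <- 1
tmp_b <- "two"

exists("tmp_a")    # TRUE: the object is in the environment
rm(tmp_a, tmp_b)   # removes both objects in one call
exists("tmp_a")    # FALSE: the object is gone

# rm(list = ls())  # would remove ALL objects of the environment (not run here)
```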

R script

# script .R 
toto = c(1,2,3,4,5) # function c() forms a vector of elements of a common type 
toto1 = 1:5         # creates a sequence of integers from 1 to 5
toto2 = seq( from = 1, to = 5, by = 1)  # function seq() creates a sequence of numbers. The 3rd argument is the increment of the sequence

To execute one or several lines of code, place the cursor on the line (or select the lines) and press CTRL+ENTER, or click on the Run button.

To run the whole script, select everything with CTRL+A then press CTRL+ENTER, or click on the Source button.

R script

Tips to make your code readable: (1/2)

  • comment or document your code by using # or CTRL + SHIFT + C

    • # only is for comments

    • # Title1 ---- a title of level 1 or CTRL + SHIFT + R

    • ## Title2 ---- a title of level 2

    • ### Title3 ---- a title of level 3

      The outline is displayed next to the Source item.

R script

Tips to make your code readable: (2/2)

  • separate elements of your code with white-space

  • use informative and concise names

  • use indentation by using CTRL + I

    Tip

    Apply the 3 following steps often:

    1. select all, CTRL + A;

    2. indent the code, CTRL + I;

    3. save the code, CTRL + S.
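
The tips above can be sketched on a very small script. The section titles, the object names and the ages are hypothetical and only serve the illustration: the script uses a level 1 and a level 2 title, spaces around operators and informative names.

```r
# Data ----
ages <- c(34, 51, 47, 29)   # hypothetical ages, in years

## Summary ----
mean_age <- mean(ages)      # informative name, spaces around <-
mean_age
#> [1] 40.25
```

In RStudio, the two titles ending with ---- appear in the outline of the script.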

R script

When the code is running, the code lines and the outputs are displayed in the Console.

Some outputs are also displayed in the Viewer pane, the Plots pane or the Source pane.

toto == toto1  # == is a comparison operator that tests equality, element by element
#> [1] TRUE TRUE TRUE TRUE TRUE

toto1[3:5] # the square brackets [] select elements of a vector
#> [1] 3 4 5

toto2[6] # toto2 = c(1, 2, 3, 4, 5)
#> [1] NA

plot(x = toto, y = 2* toto1)

R script

# creation of a new object: a dataframe 
# dataframe is a data structure constructed with rows and columns
population=data.frame(SUBJID=1:5,
                      AGE=sample(25:60,size = 5,replace = TRUE), # sampling with function sample()
                      PAYS=c("FR","ESP","UK","UK","MAR")) # vector of characters
population # displayed on the Console
#>   SUBJID AGE PAYS
#> 1      1  36   FR
#> 2      2  37  ESP
#> 3      3  44   UK
#> 4      4  52   UK
#> 5      5  44  MAR
View(population) # overview of the table in a new window

library(dplyr)      # provides the pipe %>%
library(crosstable) # provides crosstable() and af()
population %>% 
  crosstable(cols = c("AGE","PAYS")) %>% 
  af()

                value
AGE
  Min / Max     36.0 / 52.0
  Med [IQR]     44.0 [37.0;44.0]
  Mean (std)    42.6 (6.5)
  N (NA)        5 (0)
PAYS
  ESP           1 (20.00%)
  FR            1 (20.00%)
  MAR           1 (20.00%)
  UK            2 (40.00%)

Plots & Viewer

Figures and tables are displayed in the Plots tab or the Viewer tab.

From these tabs, you can zoom in or save the output as a PDF or as an image.

Packages

Main packages to know

Packages are shareable collections of code, data and documentation.

They can be hosted on CRAN (Comprehensive R Archive Network), or they can be downloaded from GitHub or as a .zip archive.

One of the most popular: tidyverse. It is a collection of R packages designed for working with data.

The most common “core” tidyverse packages are:

  • readr, for data import;

  • ggplot2, for data visualization;

  • dplyr, for data manipulation;

  • tidyr, for data tidying;

  • purrr, for functional programming;

  • tibble, for tibbles, a modern re-imagining of dataframes;

  • stringr, for string manipulation;

  • forcats, for working with factors (categorical data).

Installation of a package

Note

A package only needs to be installed once. It is then stored in your R library, and you can find it in the Packages tab.

By code (in a .R script or in the Console):

  • install.packages() function if the package is on CRAN

  • install_github() function (from the remotes or devtools package) if the package is on GitHub

By point-and-click:

  1. click on the box Install in the Packages tab

  2. specify if it is from the CRAN or package archive file (.zip, .tar.gz) if it is not available on the CRAN

  3. write the package name

  4. click on Install

Loading a package

To load a package:

  • search for the package name in the Packages tab and check the box next to it;

  • or run the code library("package_name").

install.packages("stringr") # to run only once
# str_trim(string = " Paris, the City of Love  ",side = "both")
# fails at this point: the package is installed but not loaded yet, so str_trim() is not found
library("stringr") # manipulation of characters
str_trim(string = " Paris, the City of Love  ",side = "both")
#> [1] "Paris, the City of Love"
str_squish(string = " Paris,  the City of Love   and   the City of Light ")
#> [1] "Paris, the City of Love and the City of Light"

Warning

Packages need to be loaded in each R session.
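
A minimal sketch of how a script can check that a package is available before loading it. The package name stats is chosen only because it ships with R, so the example runs anywhere; replace it with the package you need. requireNamespace() returns TRUE or FALSE instead of raising an error.

```r
pkg <- "stats"   # assumption for the example: a package shipped with R

# TRUE if the package is installed, FALSE otherwise (no error is raised)
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  library(pkg, character.only = TRUE)   # load the package for this session
}
is_installed
#> [1] TRUE
```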

Help

Help on a package

To get the package documentation:

Click on the package name, in the Packages tab.

It takes us to the Help tab.

Alternatively,

Type this command into the Console

help(package = "stringr")

Help on a package

vignette(package = "stringr") # displays the list of vignettes from the package

Help on a function

To get help on a function:

  • from the Packages tab, click on the package name then on the function name to see the help file

  • type the function name in the search bar of the Help tab

  • run one of the following lines of code in the Console

help("str_to_lower") 
help(str_to_lower) 
?str_to_lower

Help on a function

Warning

Two packages may contain a function with the same name but different arguments.

Tip

To avoid the conflict, specify the package name followed by :: and the name of the function.

Example:

dplyr::select() # selects variables of a data table 
MASS::select()  # helper for ridge regression fits (objects returned by MASS::lm.ridge())
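
The effect of :: can be sketched without installing anything. In the lines below, a home-made function (hypothetical, for illustration only) reuses the name median, which already exists in the stats package shipped with R. Without a prefix, R calls the home-made version; with stats::, it calls the original one.

```r
# A home-made function that reuses the name of an existing one (illustration only)
median <- function(x) "my own median"

own_result   <- median(c(1, 5, 9))          # calls the function defined above
stats_result <- stats::median(c(1, 5, 9))   # explicitly calls median() from stats

own_result
#> [1] "my own median"
stats_result
#> [1] 5

rm(median)   # remove the home-made version so that median() behaves normally again
```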

Operator

Comparison operators

Description               SAS       R
Less than                 <         <
Greater than              >         >
Less than or equal to     <=        <=
Greater than or equal to  >=        >=
Equal                     =         ==
Different                 ^=, ne    !=
Not x                     not(x)    !x
Belongs to                in        %in%
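
A short sketch of the R column of this table. The values of age and country are hypothetical. Each comparison returns a logical value, and %in% returns one logical value per element of the vector on its left.

```r
age <- 42                      # hypothetical value
age < 50                       # TRUE
age >= 42                      # TRUE
age == 40                      # FALSE
age != 40                      # TRUE
!(age > 50)                    # TRUE

country <- c("FR", "UK", "MAR")
in_list <- country %in% c("FR", "ESP")
in_list
#> [1]  TRUE FALSE FALSE
```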

Logical operators

Description  SAS  R
AND          &    &
OR           |    |
Negation     ^    !
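
As an illustration, the logical operators can combine several comparisons. The vectors age and sex below are hypothetical. The operators work element by element, so the result has one logical value per item.

```r
age <- c(25, 40, 61)      # hypothetical values
sex <- c("F", "M", "F")

both   <- age > 30 & sex == "F"   # AND: both conditions must be TRUE
either <- age > 30 | sex == "F"   # OR: at least one condition must be TRUE
not_30 <- !(age > 30)             # negation

both
#> [1] FALSE FALSE  TRUE
either
#> [1] TRUE TRUE TRUE
not_30
#> [1]  TRUE FALSE FALSE
```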

Arithmetic operators

Description      SAS                  R
Usual operators  +, -, *, /           +, -, *, /
Exponent         **                   ^
Minimum          a><b or Min(a, b)    min(a,b)
Maximum          a<>b or Max(a, b)    max(a,b)

R also accepts ** as an older notation for the exponent, but ^ is the usual one.
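
A quick sketch of the R column, with arbitrary numbers chosen for the example:

```r
7 / 2        # 3.5
2^3          # 8
2**3         # 8, same result as 2^3
min(4, 9)    # 4
max(4, 9)    # 9

power_result <- 2^3
power_result
#> [1] 8
```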

Object

Creation of object

Objects are containers for storing values.

To assign a value to an object, use the assignment operators <- or = .

To print the value of the object, just type its name in the Console.

# we create 2 numerical objects
term1 = 50
term2 <- 60

sum <- term1 + term2 # same as using function sum() such as sum(term1, term2)

sum # or print(sum)
#> [1] 110

Type of objects

Important

Data type is an important concept. Objects can store data of different types, and different types support different operations.

3 main types of objects:

  • vector;

  • dataframe;

  • list.
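
To check which of these three types an object belongs to, one option is the function class(). The sketch below uses hypothetical object names and values.

```r
v <- c(1.5, 2, 3)                                      # vector
d <- data.frame(id = 1:2, city = c("Paris", "Lyon"))   # dataframe
l <- list(v, d)                                        # list

class(v)
#> [1] "numeric"
class(d)
#> [1] "data.frame"
class(l)
#> [1] "list"
```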

Vector

A one-dimensional object storing elements of the same type.

There are many ways to create vectors, but the usual one is to use the function c().

vec_num = c(1,2,3,4,5) # double
vec_num
#> [1] 1 2 3 4 5
vec_num_same = 1:5 # integer
print(vec_num_same)
#> [1] 1 2 3 4 5
vec_char = c("Toulouse", "Paris", "Bordeaux", "Lyon", 'Marseille')  # character in quotation marks (simple or double)
vec_seq = seq(from = 0, to = 20, by = 2.5)  # function seq() generates a sequence
vec_rep = rep(x = c("I","You","He/She/It", "We", "You", "They"), 
              times = 3) # function rep() repeats the values of x the number of times given by times
vec_rep
#>  [1] "I"         "You"       "He/She/It" "We"        "You"       "They"      "I"         "You"      
#>  [9] "He/She/It" "We"        "You"       "They"      "I"         "You"       "He/She/It" "We"       
#> [17] "You"       "They"
vec_factor = factor(x = c(1,1,0,0,1),
                    levels = c(0,1), 
                    labels = c("Alive","Dead")) # factor
levels(vec_factor)
#> [1] "Alive" "Dead"
vec_factor[2]="Death" # warning: "Death" is not a level of the factor, so NA is generated
vec_factor[2]="Dead"  # works: "Dead" is an existing level
is.vector(vec_char)  # returns a logical value: TRUE if the argument is a vector, FALSE otherwise
#> [1] TRUE

Vector

Each item is indexed from 1 to the number of items of the vector.

The function length() computes the length of the vector.

To select an item, use the square brackets [] with the index of the item.

vec = seq(from = 0, to = 20, by = 5) # 0 5 10 15 20
length(vec)  
#> [1] 5
vec[2] 
#> [1] 5
vec[-3] # prints the vector without the third item
#> [1]  0  5 15 20
vec[c(1,2,4,5)] # prints the elements of index 1, 2, 4 and 5 of the vector
#> [1]  0  5 15 20
vec[3] = 100 # you can change the value of a vector
vec
#> [1]   0   5 100  15  20
vec[6] = 25 # you can add a new item by assigning a value to a new index
vec
#> [1]   0   5 100  15  20  25
vec[11] = 50
vec 
#>  [1]   0   5 100  15  20  25  NA  NA  NA  NA  50

Vector

Note

NA: missing value (Not Available) in a variable or as the result of a function

NaN: Not a Number, the result of an undefined operation such as 0/0

NULL: empty object

-Inf, Inf: negative or positive infinity, for example the result of dividing a non-zero number by 0

Caution

The names of these particular values are “reserved” by R, and cannot be used as variable names. This is also the case for booleans TRUE, FALSE.

is.na(vec) # function is.na() tests, item by item, whether the value is NA
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE

# TRUE = 8  # would raise an error: TRUE is a reserved word
100/0
#> [1] Inf
c()  # empty vector 
#> NULL
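
A few more lines, as a sketch, to tell these special values apart. Each one has its own test function, and many functions such as mean() return NA as soon as one value is missing unless na.rm = TRUE is set.

```r
0 / 0                             # NaN: undefined operation
-100 / 0                          # -Inf
is.nan(0 / 0)                     # TRUE
is.na(NaN)                        # TRUE: NaN also counts as a missing value
is.infinite(1 / 0)                # TRUE
is.null(NULL)                     # TRUE

mean(c(1, NA, 3))                 # NA: one value is missing
mean(c(1, NA, 3), na.rm = TRUE)   # 2: the missing value is ignored
```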

Vector

Warning

The vector contains only elements of the same type.

If a character value is added to a numerical vector, the whole vector is converted into a character vector.

print(vec)
#>  [1]   0   5 100  15  20  25  NA  NA  NA  NA  50
vec[5] ="last"
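
The conversion can be observed with class(). The sketch below uses a separate, hypothetical vector num_vec so that it can be run on its own.

```r
num_vec <- c(0, 5, 100)
class(num_vec)
#> [1] "numeric"

num_vec[3] <- "last"   # a character value replaces a number
num_vec                # all the items are now character strings
#> [1] "0"    "5"    "last"
class(num_vec)
#> [1] "character"
```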

Vector

Exercise

vec_char = c("Toulouse", "PSG", "Bordeaux", "Lyon", "Marseille","Londres")  

length(vec_char) # what is the result?

# what does the command below display?
vec_char[5] 

# which command replaces the item PSG with Paris?

# create a derived vector without the last item. Display it in 2 ways.

# what does the command below display?
vec_char[8]

Vector

Correction

vec_char = c("Toulouse", "PSG", "Bordeaux", "Lyon", "Marseille","Londres")  

length(vec_char) # what is the result?
#> [1] 6

# what does the command below display?
vec_char[5] 
#> [1] "Marseille"

# which command replaces the item PSG with Paris?
vec_char[2]="Paris"

# create a derived vector without the last item. Display it in 2 ways.
vec_derived1 = vec_char[-6]
vec_derived1
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille"

vec_derived2 = vec_char[c(1,2,3,4,5)]
print(vec_derived2)
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille"

# what does the command below display?
vec_char[8]
#> [1] NA

Dataframe

A two-dimensional table whose columns (variables) can be of different types.

To create a dataframe, use the function data.frame().

For an overview of the table:

  • View() which gives you access to the table;

  • str() which displays the structure of the dataframe.

dim() returns the dimensions (number of rows, number of columns) of the dataframe.

nrow() and ncol() return respectively the number of rows and columns.

df = data.frame("name" = c("Albert Einstein","Martin Luther","Nelson Mandela","William Shakespeare", "Ludwig van Beethoven"),
                "gender" = rep(x = 1, times = 5))
df
#>                   name gender
#> 1      Albert Einstein      1
#> 2        Martin Luther      1
#> 3       Nelson Mandela      1
#> 4  William Shakespeare      1
#> 5 Ludwig van Beethoven      1
dim(df)
#> [1] 5 2

is.data.frame(df) # function returns a logical value: TRUE if is a dataframe, FALSE otherwise
#> [1] TRUE
View(df) # overview of the dataframe
str(df)
#> 'data.frame':    5 obs. of  2 variables:
#>  $ name  : chr  "Albert Einstein" "Martin Luther" "Nelson Mandela" "William Shakespeare" ...
#>  $ gender: num  1 1 1 1 1

Note

Remember that the created objects are saved in the global environment.

Dataframe

To select an item, use the square brackets [,] with the row index and the column index (or the name of the variable), separated by a comma, because there are 2 dimensions to indicate.

To select a row (or observation), use the square brackets [,] with the index of the row followed by a comma.

To select a column (a variable), use:

  • the dollar symbol $ followed by the name of the variable;

  • or the square brackets [,] with the name of the variable in quotes, or the index of the column, preceded by a comma.

df[,"name"] # "Albert Einstein","Martin Luther","Nelson Mandela","William Shakespeare", "Ludwig van Beethoven"
#> [1] "Albert Einstein"      "Martin Luther"        "Nelson Mandela"       "William Shakespeare" 
#> [5] "Ludwig van Beethoven"

df$gender = factor(x = df$gender, levels = c(0,1), labels = c("Woman", "Man"))

df[3,1] # row 3, column 1
#> [1] "Nelson Mandela"
df[1,] 
#>              name gender
#> 1 Albert Einstein    Man
# "Albert Einstein" "Man"

df[df$name!= "Martin Luther" & df$name!="Nelson Mandela",]
#>                   name gender
#> 1      Albert Einstein    Man
#> 4  William Shakespeare    Man
#> 5 Ludwig van Beethoven    Man
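
The same selection rules, sketched on a small hypothetical dataframe called patients (the names and values are invented for the example). The last line combines a condition on the rows with a selection of columns.

```r
# Hypothetical dataframe, for illustration only
patients <- data.frame(id      = 1:4,
                       age     = c(34, 51, 47, 29),
                       country = c("FR", "UK", "FR", "MAR"))

patients$age                     # one column, returned as a vector
patients[2, "age"]               # one item: row 2, column age
#> [1] 51

older <- patients[patients$age > 40, ]   # rows meeting a condition, all columns
nrow(older)
#> [1] 2

# rows from FR or MAR, restricted to two columns
patients[patients$country %in% c("FR", "MAR"), c("id", "country")]
```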

Dataframe

To add a new variable, use the square brackets [,] with the name of the new variable, or the dollar sign $ followed by the name.

To add a new observation, you can use the square brackets [,] with the index following the last observation, and assign the vector of elements.

bind_rows(), from the dplyr package, combines a sequence of vectors or dataframes by rows.

# Adding the country of birth

df$countryBirth = c("Germany","USA","South-Africa","UK","Germany")

df[,"countryBirth"]
#> [1] "Germany"      "USA"          "South-Africa" "UK"           "Germany"

# Adding a new observation

df[6,] = c("Marie Curie", "Woman", "France")
nrow(df) 
#> [1] 6

library(dplyr) # bind_rows() comes from the dplyr package
df = bind_rows(df, 
               c("name" = "Malala Yousafzai", "gender" = "Woman", "countryBirth" = "Pakistan"))

Caution

With the square brackets [,], the variable names must be written in quotes, whereas with $ quotes are not necessary.
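
A minimal sketch of both ways of adding a variable, on a hypothetical dataframe. For the new observation, it uses rbind() from base R as an alternative to bind_rows(); this is an assumption of the example, not the method shown on the slide.

```r
# Hypothetical dataframe, for illustration only
patients <- data.frame(id = 1:2, country = c("FR", "UK"))

patients$age <- c(34, 51)                 # new variable with $, no quotes
patients[, "smoker"] <- c(TRUE, FALSE)    # new variable with [,], name in quotes

# new observation: a one-row dataframe with the same column names
new_row  <- data.frame(id = 3, country = "MAR", age = 47, smoker = FALSE)
patients <- rbind(patients, new_row)

dim(patients)
#> [1] 3 4
```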

List

Lists are special objects whose elements can be of any kind (including other lists).

To create a list, use list() and write the individual elements between the parentheses.

listing = list("object1"=vec_char,
               "object2"=df)

For an overview of the list, use the function str().

str(listing)
#> List of 2
#>  $ object1: chr [1:6] "Toulouse" "Paris" "Bordeaux" "Lyon" ...
#>  $ object2:'data.frame': 7 obs. of  3 variables:
#>   ..$ name        : chr [1:7] "Albert Einstein" "Martin Luther" "Nelson Mandela" "William Shakespeare" ...
#>   ..$ gender      : chr [1:7] "Man" "Man" "Man" "Man" ...
#>   ..$ countryBirth: chr [1:7] "Germany" "USA" "South-Africa" "UK" ...

List

There are several ways to access an element of a list:

# by position number, using the double square brackets `[[]]` 
listing[[2]] 
#>                   name gender countryBirth
#> 1      Albert Einstein    Man      Germany
#> 2        Martin Luther    Man          USA
#> 3       Nelson Mandela    Man South-Africa
#> 4  William Shakespeare    Man           UK
#> 5 Ludwig van Beethoven    Man      Germany
#> 6          Marie Curie  Woman       France
#> 7     Malala Yousafzai  Woman     Pakistan

# by name, using the double square brackets `[[]]`  
listing[["object1"]] 
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille" "Londres"

# by name, using the operator `$` (common method) 
listing$object1
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille" "Londres"
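
One point worth a short sketch is the difference between single and double square brackets on a list. With the hypothetical list below, [[ ]] returns the element itself, whereas [ ] returns a smaller list that still contains the element.

```r
my_list <- list(cities = c("Paris", "Lyon"), n = 2)   # hypothetical list

class(my_list[["cities"]])   # the element itself: a character vector
#> [1] "character"
class(my_list["cities"])     # a list of length 1 containing the element
#> [1] "list"
my_list$n
#> [1] 2
```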

Function

Usual functions

min(), max(), mean(), median() – return the minimum / maximum / mean / median value of a numeric vector

sum() – returns the sum of a numeric vector

range() – returns the minimum and maximum values of a numeric vector

abs() – returns the absolute value of a number

str() – shows the structure of an R object

print() – displays an R object on the console

head() – displays the first rows (by default the 6 first ones) of a table

ncol(), nrow() – return the number of columns or rows of a matrix or a dataframe

colnames() – returns the column names of a matrix or a dataframe

length() – returns the number of items in an R object (a vector, a list, etc.)

cat() – concatenates and prints its arguments

nchar() – returns the number of characters in a character object

sort() – sorts a vector in ascending or descending (decreasing=TRUE) order

exists() – returns TRUE or FALSE depending on whether or not a variable is defined in the R environment.
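
Several of these functions can be tried on a single numeric vector. The vector x below is hypothetical and only serves the example.

```r
x <- c(12, -3, 7, 7, 20)   # hypothetical values

min(x)                     # -3
max(x)                     # 20
mean(x)                    # 8.6
median(x)                  # 7
sum(x)                     # 43
range(x)                   # -3 20
length(x)                  # 5
sort(x, decreasing = TRUE) # 20 12 7 7 -3
abs(-3)                    # 3
nchar("Paris")             # 5
exists("x")                # TRUE
cat("Mean:", mean(x), "\n")
#> Mean: 8.6
```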

Creation of a new function

Function name

This is the name of the function object that will be stored in the R environment after the function definition and used for calling that function.

It should be concise but clear and meaningful so that the user who reads our code can easily understand what exactly this function does.

Good Practice

Variable and function names should be lowercase.

Use an underscore _ to separate words within a name.

Generally, variable names should be nouns and function names should be verbs.

Warning

A name cannot begin with a number.

# Good
day_one
day_1

# Bad
first_day_of_the_month   # too long 
DayOne  # uppercase letters
dayone  # hard to read: the words are not separated
djm1  # not clear

Function parameters (arguments)

The arguments are the variables placed inside the parentheses of the function definition and separated by commas. They are set to actual values each time we call the function.

quadratic_polynomial <- function(x, a, b, c){
  y = a * x^2 + b*x + c
  return(y)
}
quadratic_polynomial(x = 1, a = 2, b = 3, c = 4)
#> [1] 9

It’s possible, even though rarely useful, for a function to have no parameters.

iteration <- function(){
  print("New iteration")
}

iteration()
#> [1] "New iteration"

Function parameters (arguments)

Also, some parameters can be set to default values inside the function definition, which then can be reset when calling the function.

quadratic_polynomial <- function(x, a=2, b=1, c=10){
  y = a * x^2 + b*x + c
  return(y)
}
quadratic_polynomial(x = -1, a = 3) # b and c keep their default values
#> [1] 12

Good practice

When calling a function, name each argument explicitly.
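
The sketch below redefines quadratic_polynomial() with the same default values and compares a few calls. With named arguments the order does not matter, whereas without names the values are matched by position, which is harder to read and easier to get wrong.

```r
quadratic_polynomial <- function(x, a = 2, b = 1, c = 10){
  y = a * x^2 + b*x + c
  return(y)
}

quadratic_polynomial(x = 2)          # all defaults: 2*4 + 1*2 + 10
#> [1] 20
quadratic_polynomial(x = 2, c = 0)   # named arguments
#> [1] 10
quadratic_polynomial(c = 0, x = 2)   # same call, different order
#> [1] 10
quadratic_polynomial(2, 0, 3)        # by position: x = 2, a = 0, b = 3
#> [1] 16
```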

Function body

The function body is a set of commands inside the curly braces {} that are run in a predefined order every time we call the function.

prop_tab <- function(v, decimales = 1) {  # decimales: number of decimal places kept by round()
  tri <- table(v)
  effectif_total <- length(v)
  tri <- tri / effectif_total * 100
  tri <- round(tri, decimales)
  tri
}
vec <- c("rouge", "vert", "vert", "bleu", "rouge")
prop_tab(vec)
#> v
#>  bleu rouge  vert 
#>    20    40    40

An explicit return() statement is usually not necessary: a function returns the value of the last evaluated expression.
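
As an illustration, the two first functions below (hypothetical names) give the same result with and without return(). The third one shows a case where return() is useful: leaving the function early.

```r
square_implicit <- function(x){
  x^2                      # last evaluated expression: returned automatically
}

square_explicit <- function(x){
  return(x^2)              # same result, stated explicitly
}

sign_label <- function(x){
  if (x < 0) {
    return("negative")     # early exit: the next line is not evaluated
  }
  "non-negative"
}

square_implicit(4)
#> [1] 16
square_explicit(4)
#> [1] 16
sign_label(-4)
#> [1] "negative"
sign_label(3)
#> [1] "non-negative"
```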

Function body

When a function needs to return more than one result, group the results in a single object, such as a list, and return that object.

check_sqrt <- function(x){
  if (x >= 0){    # no warning
    value = paste("The value is positive", x, sep = ":")
    result = sqrt(x)
  } else {
    value = paste0("Warning! the value is negative:", x)
    result = NaN
  }
  output = list("note" = value, "result" = result)
  return(output)
}
 
check_sqrt(x = 100)
#> $note
#> [1] "The value is positive:100"
#> 
#> $result
#> [1] 10

Important

The return() function can return only a single R object.

Output

To call a function, write its name followed by the necessary arguments inside the parentheses.

When calling a function, we usually assign the result of this operation to an object, to be able to use it later.

x = check_sqrt(x = 25)  # x is a list of a character (note) and the value (result)

quadratic_polynomial(x$result,a = -1, b = -1, c = 4) 
#> [1] -26

We R done