Module 1: Introduction to the R programming language and the RStudio interface
Nusaïbah IBRAHIMI
Friday, June 20, 2025
Table of contents
RStudio interface
Operator
Object
Function
R you ready?
RStudio interface
First look at RStudio
The Console
Difference between SAS and R
No need to put ; at the end of each instruction.
When an error occurs, R stops executing the code.
1 + 2
#> [1] 3
result <- 1 + 2  # create a variable called result, equal to 3
result  # display the value of the variable result
#> [1] 3
result = result + 1  # change the value; this overwrites the previous value
print(result)  # another way to display the value, with the function print()
#> [1] 4
Assignment operators
<- or =
Warning
Code typed in the Console is not saved when RStudio is closed.
We therefore suggest saving it in an R script.
The Environment
Tip
Think of the global environment as our workspace (the Work folder in SAS)
To remove objects from the environment, use the function rm(): list the object names between the parentheses, separated by commas.
ls()  # list all existing objects in the current environment
#> [1] "result"
rm(result)  # rm() removes objects from the environment
ls()
#> character(0)
R script
# script .R
toto = c(1, 2, 3, 4, 5)  # function c() forms a vector of elements of a common type
toto1 = 1:5  # creates a sequence of integers from 1 to 5
toto2 = seq(from = 1, to = 5, by = 1)  # function seq() creates a sequence of numbers; the 3rd argument is the increment
To execute one or more lines, press CTRL+ENTER on the line(s) of interest or click the Run button.
To run all the code, press CTRL+A to select it and then CTRL+ENTER, or click the Source button.
Tips to make your code readable: (1/2)
comment or document your code with # or CTRL + SHIFT + C
a single # marks a comment
# Title1 ---- creates a level-1 section title (shortcut: CTRL + SHIFT + R)
## Title2 ---- creates a level-2 section title
### Title3 ---- creates a level-3 section title
The outline is displayed next to the Source item.
Tips to make your code readable: (2/2)
separate elements of your code with white-space
use informative and concise names
indent your code with CTRL + I
Tip
Apply the following 3 steps often:
select all, CTRL + A;
indent the code, CTRL + I;
save the code, CTRL + S.
When code is run, the code lines and their outputs are displayed in the Console.
Some outputs are also displayed in the Viewer pane, the Plots pane or the Source pane.
toto == toto1  # == is a comparison operator that tests equality
#> [1] TRUE TRUE TRUE TRUE TRUE
toto1[3:5]  # square brackets select elements of a vector
#> [1] 3 4 5
toto2[6]  # toto2 = c(1, 2, 3, 4, 5), so there is no 6th element
#> [1] NA
plot(x = toto, y = 2 * toto1)
# creation of a new object: a dataframe
# a dataframe is a data structure made of rows and columns
population = data.frame(
  SUBJID = 1:5,
  AGE = sample(25:60, size = 5, replace = TRUE),  # sampling with function sample()
  PAYS = c("FR", "ESP", "UK", "UK", "MAR")  # vector of characters
)
population  # displayed in the Console
#>   SUBJID AGE PAYS
#> 1      1  36   FR
#> 2      2  37  ESP
#> 3      3  44   UK
#> 4      4  52   UK
#> 5      5  44  MAR
View(population)  # overview of the table in a new window
library(dplyr)  # provides the pipe operator %>%
library(crosstable)  # provides crosstable() and af()
population %>%
  crosstable(cols = c("AGE", "PAYS")) %>%
  af()
                   value
AGE   Min / Max    36.0 / 52.0
      Med [IQR]    44.0 [37.0;44.0]
      Mean (std)   42.6 (6.5)
      N (NA)       5 (0)
PAYS  ESP          1 (20.00%)
      FR           1 (20.00%)
      MAR          1 (20.00%)
      UK           2 (40.00%)
Plots & Viewer
Figures and tables are displayed in the Plots tab or the Viewer tab.
From either tab, you can zoom in or save the output as a PDF or as an image.
Packages
Main packages to know
Packages are shareable collections of code, data and documentation.
They can be hosted on CRAN (the Comprehensive R Archive Network), or downloaded from GitHub or as a .zip archive.
One of the most popular: tidyverse. It is a collection of R packages designed for working with data.
The most common “core” tidyverse packages are:
readr, for data import;
ggplot2, for data visualization;
dplyr, for data manipulation;
tidyr, for data tidying;
purrr, for functional programming;
tibble, for tibbles, a modern re-imagining of dataframes;
stringr, for string manipulation;
forcats, for working with factors (categorical data).
Installation of a package
Note
A package only needs to be installed once. It is then stored on your computer and listed in the Packages tab.
with code (in an .R script or in the Console):
install.packages() function if the package is on CRAN
install_github() function (from the remotes or devtools package) if the package is on GitHub
point and click:
click the Install button in the Packages tab
choose CRAN as the source, or a package archive file (.zip, .tar.gz) if the package is not available on CRAN
type the package name
click Install
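Because a package only needs to be installed once, a script can first check whether it is already available. The sketch below shows one possible way to do this; the helper name is_installed() is hypothetical (it is not part of R), requireNamespace() is base R, and the install.packages() call is left commented out so that nothing is downloaded when the example runs.

```r
# Hypothetical helper: TRUE if the package can be found in the library
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

is_installed("stats")  # "stats" ships with R
#> [1] TRUE

# Install only when the package is missing (uncomment to use):
# if (!is_installed("stringr")) install.packages("stringr")
```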
Loading a package
To load a package:
search for the package name in the Packages tab and tick the box next to it;
otherwise, run the code library("package_name").
install.packages("stringr")  # only needed once
# calling str_trim() before library("stringr") would fail with "could not find function"
library("stringr")  # string manipulation
str_trim(string = " Paris, the City of Love ", side = "both")
#> [1] "Paris, the City of Love"
str_squish(string = " Paris, the City of Love and the City of Light ")
#> [1] "Paris, the City of Love and the City of Light"
Warning
Packages need to be loaded in each R session.
Help
Help on a package
To get the package documentation:
Click on the package name, in the Packages tab.
It takes us to the Help tab.
Alternatively,
Type this command into the Console
help(package ="stringr")
vignette(package ="stringr") # displays the list of vignettes from the package
Help on a function
To get help on a function:
from the Packages tab, open the package documentation and click the function name to see its help file
type the function name in the search bar of the Help tab
Two packages may export a function with the same name but different arguments.
Tip
To avoid the conflict, specify the package name followed by :: and the function name.
Example:
dplyr::select()  # select variables of a data table
MASS::select()  # helper for ridge regression fits made with lm.ridge()
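The :: notation also works without loading the package first. A minimal sketch using only packages shipped with R (the vector ages is made up for the illustration):

```r
ages <- c(36, 37, 44, 52, 44)  # made-up values

# Call functions through their package name; no library() call is needed
stats::median(ages)
#> [1] 44
base::mean(ages)
#> [1] 42.6
```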
Operator
Comparison operators
Description                 SAS       R
Less than                   <         <
Greater than                >         >
Less than or equal to       <=        <=
Greater than or equal to    >=        >=
Equal                       =         ==
Different                   ^=, ne    !=
Not x                       not(x)    !x
Belongs to                  in        %in%
Logical operators
Description    SAS    R
AND            &      &
OR             |      |
Negation       ^      !
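In R, the comparison and logical operators listed above work element by element on vectors. The short sketch below illustrates this with two made-up vectors, age and pays, which are assumptions for the example only.

```r
age <- c(36, 37, 44, 52, 44)               # made-up ages
pays <- c("FR", "ESP", "UK", "UK", "MAR")  # made-up country codes

age >= 44                  # comparison, element by element
#> [1] FALSE FALSE  TRUE  TRUE  TRUE
pays != "UK"               # different from
#> [1]  TRUE  TRUE FALSE FALSE  TRUE
pays %in% c("FR", "ESP")   # belongs to
#> [1]  TRUE  TRUE FALSE FALSE FALSE
age >= 44 & pays == "UK"   # AND
#> [1] FALSE FALSE  TRUE  TRUE FALSE
age < 37 | pays == "MAR"   # OR
#> [1]  TRUE FALSE FALSE FALSE  TRUE
!(pays == "UK")            # negation
#> [1]  TRUE  TRUE FALSE FALSE  TRUE
```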
Arithmetic operators
Description        SAS                  R
Usual operators    +, -, *, /           +, -, *, /
Exponent           **                   ^
Minimum            a<>b or Min(a, b)    min(a,b)
Maximum            a><b or Max(a, b)    max(a,b)
In R, ** is an older way of writing the exponent; it is still accepted.
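As a quick check of the R column, the sketch below applies the arithmetic operators to two made-up values a and b.

```r
a <- 7
b <- 2

a / b      # division
#> [1] 3.5
a^b        # exponent
#> [1] 49
a**b       # older spelling of the exponent, same result
#> [1] 49
min(a, b)  # minimum
#> [1] 2
max(a, b)  # maximum
#> [1] 7
```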
Object
Creation of object
Objects are containers for storing values.
To assign a value to an object, use the assignment operators <- or = .
To print the value of the object, just type its name in the Console.
# we create 2 numerical objects
term1 = 50
term2 <- 60
total <- term1 + term2  # same as using the function sum(): sum(term1, term2)
total  # or print(total)
#> [1] 110
Type of objects
Important
Data type is an important concept. Objects can store data of different types, and each type supports different operations.
3 main types of objects:
vector;
dataframe;
list.
Vector
A one-dimensional object storing elements of the same type.
There are many ways to create vectors, but the usual one is to use the function c().
vec_num = c(1, 2, 3, 4, 5)  # double
vec_num
#> [1] 1 2 3 4 5
vec_num_same = 1:5  # integer
print(vec_num_same)
#> [1] 1 2 3 4 5
vec_char = c("Toulouse", "Paris", "Bordeaux", "Lyon", 'Marseille')  # characters in quotation marks (single or double)
vec_seq = seq(from = 0, to = 20, by = 2.5)  # function seq() generates a sequence
vec_rep = rep(x = c("I", "You", "He/She/It", "We", "You", "They"), times = 3)  # function rep() repeats x the given number of times
vec_rep
#>  [1] "I"         "You"       "He/She/It" "We"        "You"       "They"      "I"         "You"
#>  [9] "He/She/It" "We"        "You"       "They"      "I"         "You"       "He/She/It" "We"
#> [17] "You"       "They"
vec_factor = factor(x = c(1, 1, 0, 0, 1), levels = c(0, 1), labels = c("Alive", "Dead"))  # factor
levels(vec_factor)
#> [1] "Alive" "Dead"
vec_factor[2] = "Death"  # "Death" is not an existing level: R warns and stores NA
vec_factor[2] = "Dead"  # "Dead" is an existing level, so the assignment works
is.vector(vec_char)  # returns a logical value: TRUE if the argument is a vector, FALSE otherwise
#> [1] TRUE
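The type of the elements stored in a vector can be checked with typeof(). The sketch below uses small made-up vectors; the object name status is an assumption for the example only.

```r
typeof(c(1, 2, 3))          # numbers written with c() are stored as doubles
#> [1] "double"
typeof(1:3)                 # the : operator creates integers
#> [1] "integer"
typeof(c("Lyon", "Paris"))
#> [1] "character"
typeof(c(TRUE, FALSE))
#> [1] "logical"

status <- factor(c("Dead", "Alive", "Dead"))  # made-up factor
class(status)               # a factor stores categories ...
#> [1] "factor"
typeof(status)              # ... as integer codes internally
#> [1] "integer"
```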
Each item is indexed from 1 to the number of items of the vector.
The function length() computes the length of the vector.
To select an item, use square brackets [] containing the index of the item.
vec = seq(from = 0, to = 20, by = 5)  # 0 5 10 15 20
length(vec)
#> [1] 5
vec[2]
#> [1] 5
vec[-3]  # prints the vector without the third item
#> [1]  0  5 15 20
vec[c(1, 2, 4, 5)]  # prints the elements at index 1, 2, 4 and 5 of the vector
#> [1]  0  5 15 20
vec[3] = 100  # you can change a value of the vector
vec
#> [1]   0   5 100  15  20
vec[6] = 25  # you can add a new item by assigning a value to a new index
vec
#> [1]   0   5 100  15  20  25
vec[11] = 50  # the skipped positions 7 to 10 are filled with NA
vec
#> [1]   0   5 100  15  20  25  NA  NA  NA  NA  50
Note
NA: missing value (Not Available), in a variable or as the result of a function
NaN: Not a Number, the result of an undefined operation such as 0/0
NULL: the empty object
-Inf, Inf: negative or positive infinity, for example when a non-zero number is divided by 0
Caution
The names of these special values are “reserved” by R and cannot be used as variable names. This is also the case for the booleans TRUE and FALSE.
is.na(vec)  # is.na() tells, for each item of vec, whether it is NA
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
# TRUE = 8  # error: TRUE is a reserved name
100 / 0
#> [1] Inf
c()  # empty vector
#> NULL
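The special values can be produced and tested directly. The sketch below is a small illustration; the vector ages is made up for the example.

```r
0 / 0                     # undefined result
#> [1] NaN
-1 / 0                    # a negative number divided by zero
#> [1] -Inf
is.nan(0 / 0)
#> [1] TRUE
is.null(c())
#> [1] TRUE

ages <- c(36, NA, 44)     # made-up values with one missing age
mean(ages)                # NA propagates through computations
#> [1] NA
mean(ages, na.rm = TRUE)  # na.rm = TRUE ignores missing values
#> [1] 40
sum(is.na(ages))          # number of missing values
#> [1] 1
```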
Warning
A vector contains only elements of the same type.
If a character value is added to a numeric vector, the whole vector is converted into a character vector.
print(vec)
#> [1]   0   5 100  15  20  25  NA  NA  NA  NA  50
vec[5] = "last"  # vec is now a character vector
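The conversion can be observed with class(). The sketch below uses a separate, made-up vector vec_demo so that it runs on its own.

```r
vec_demo <- c(0, 5, 100)  # made-up numeric vector
class(vec_demo)
#> [1] "numeric"

vec_demo[2] <- "last"     # adding a character value converts every element
vec_demo
#> [1] "0"    "last" "100"
class(vec_demo)
#> [1] "character"

# Converting back gives NA for the elements that are not numbers
suppressWarnings(as.numeric(vec_demo))
#> [1]   0  NA 100
```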
Exercise
vec_char = c("Toulouse", "PSG", "Bordeaux", "Lyon", "Marseille", "Londres")
length(vec_char)  # what is the result?
# what does the command below display?
vec_char[5]
# which command replaces the item PSG with Paris?
# create a derived vector without the last item. Display it in 2 ways.
# what does the command below display?
vec_char[8]
Correction
vec_char = c("Toulouse", "PSG", "Bordeaux", "Lyon", "Marseille", "Londres")
length(vec_char)  # what is the result?
#> [1] 6
# what does the command below display?
vec_char[5]
#> [1] "Marseille"
# which command replaces the item PSG with Paris?
vec_char[2] = "Paris"
# create a derived vector without the last item. Display it in 2 ways.
vec_derived1 = vec_char[-6]
vec_derived1
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille"
vec_derived2 = vec_char[c(1, 2, 3, 4, 5)]
print(vec_derived2)
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille"
# what does the command below display?
vec_char[8]
#> [1] NA
Dataframe
A two-dimensional table whose variables (columns) can be of different types.
To create a dataframe, use the function data.frame().
For an overview of the table:
View(), which opens the table in a viewer;
str(), which lists the attributes of the dataframe.
dim() returns the dimensions (number of rows, number of columns) of the dataframe.
nrow() and ncol() return the number of rows and the number of columns, respectively.
df = data.frame(
  "name" = c("Albert Einstein", "Martin Luther", "Nelson Mandela", "William Shakespeare", "Ludwig van Beethoven"),
  "gender" = rep(x = 1, times = 5)
)
df
#>                   name gender
#> 1      Albert Einstein      1
#> 2        Martin Luther      1
#> 3       Nelson Mandela      1
#> 4  William Shakespeare      1
#> 5 Ludwig van Beethoven      1
dim(df)
#> [1] 5 2
is.data.frame(df)  # returns a logical value: TRUE if the argument is a dataframe, FALSE otherwise
#> [1] TRUE
View(df)  # overview of the dataframe
str(df)
#> 'data.frame': 5 obs. of 2 variables:
#>  $ name  : chr "Albert Einstein" "Martin Luther" "Nelson Mandela" "William Shakespeare" ...
#>  $ gender: num 1 1 1 1 1
Note
Remember that the created objects are stored in the global environment.
To select an item, use square brackets [,] with the row index, then the column index or the name of the variable, separated by a comma, because there are 2 dimensions to indicate.
To select a row (or observation), use square brackets [,] with the index of the row followed by a comma.
To select a column (a variable), use:
the dollar symbol $ followed by the name of the variable;
or square brackets [,] with a comma followed by the name of the variable in quotes or the index of the column.
df[, "name"]
#> [1] "Albert Einstein"      "Martin Luther"        "Nelson Mandela"       "William Shakespeare"
#> [5] "Ludwig van Beethoven"
df$gender = factor(x = df$gender, levels = c(0, 1), labels = c("Woman", "Man"))
df[3, 1]  # row 3, column 1
#> [1] "Nelson Mandela"
df[1, ]  # first row: "Albert Einstein" "Man"
#>              name gender
#> 1 Albert Einstein    Man
df[df$name != "Martin Luther" & df$name != "Nelson Mandela", ]
#>                   name gender
#> 1      Albert Einstein    Man
#> 4  William Shakespeare    Man
#> 5 Ludwig van Beethoven    Man
To add a new variable, use square brackets [,] with the name of the new variable, or the dollar sign $ followed by the name.
To add a new observation, you can use [,] with the next free row number and assign the vector of values.
bind_rows(), from the dplyr package, combines a sequence of vectors or dataframes by rows.
# Adding the country of birth
df$countryBirth = c("Germany", "USA", "South-Africa", "UK", "Germany")
df[, "countryBirth"]
#> [1] "Germany"      "USA"          "South-Africa" "UK"           "Germany"
# Adding a new observation
df[6, ] = c("Marie Curie", "Woman", "France")
nrow(df)
#> [1] 6
library(dplyr)  # provides bind_rows()
df = bind_rows(df, c("name" = "Malala Yousafzai", "gender" = "Woman", "countryBirth" = "Pakistan"))
Caution
With [,], variable names must be written in quotes, whereas with $ quotes are not necessary.
List
Lists are special objects whose elements can be of any kind (including other lists).
To create a list, use list() and write the individual elements between the parentheses.
listing = list("object1" = vec_char, "object2" = df)
For an overview of the list, use the function str().
There are several ways to access an element of the list:
# by position number, using double square brackets `[[]]`
listing[[2]]
#>                   name gender countryBirth
#> 1      Albert Einstein    Man      Germany
#> 2        Martin Luther    Man          USA
#> 3       Nelson Mandela    Man South-Africa
#> 4  William Shakespeare    Man           UK
#> 5 Ludwig van Beethoven    Man      Germany
#> 6          Marie Curie  Woman       France
#> 7     Malala Yousafzai  Woman     Pakistan
# by name, using double square brackets `[[]]`
listing[["object1"]]
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille" "Londres"
# by name, using the operator `$` (common method)
listing$object1
#> [1] "Toulouse"  "Paris"     "Bordeaux"  "Lyon"      "Marseille" "Londres"
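Since list elements can themselves be lists, the access operators can be chained. The sketch below uses a small made-up list called trip (an assumption for the example) to show str() and nested access.

```r
# Made-up list mixing a vector, a number and a nested list
trip <- list(
  cities = c("Toulouse", "Paris"),
  days = 3,
  contact = list(name = "Marie", country = "France")
)

str(trip)                  # compact overview of the structure
names(trip)
#> [1] "cities"  "days"    "contact"
trip$cities[2]             # second element of the vector stored under "cities"
#> [1] "Paris"
trip[["contact"]]$country  # element of a list stored inside the list
#> [1] "France"
length(trip)
#> [1] 3
```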
Function
Usual functions
min(), max(), mean(), median() – return the minimum / maximum / mean / median value of a numeric vector
sum() – returns the sum of a numeric vector
range() – returns the minimum and maximum values of a numeric vector
abs() – returns the absolute value of a number
str() – shows the structure of an R object
print() – displays an R object on the console
head() – displays the first rows (by default the 6 first ones) of a table
ncol(), nrow() – return the number of columns or rows of a matrix or a dataframe
colnames() – returns the column names of a matrix or a dataframe
length() – returns the number of items in an R object (a vector, a list, etc.)
cat() – concatenates and prints its arguments
nchar() – returns the number of characters in a character object
sort() – sorts a vector in ascending or descending (decreasing=TRUE) order
exists() – returns TRUE or FALSE depending on whether or not a variable is defined in the R environment.
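A few of these functions applied to a small vector give a feel for their output. The vector ages below is made up for the illustration.

```r
ages <- c(36, 37, 44, 52, 44)  # made-up numeric vector

range(ages)
#> [1] 36 52
median(ages)
#> [1] 44
sort(ages, decreasing = TRUE)
#> [1] 52 44 44 37 36
length(ages)
#> [1] 5
nchar("Toulouse")
#> [1] 8
abs(-3.5)
#> [1] 3.5
exists("ages")
#> [1] TRUE
cat("Mean age:", mean(ages), "\n")
#> Mean age: 42.6
```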
Creation of a new function
Function name
This is the name of the function object that will be stored in the R environment after the function definition and used for calling that function.
It should be concise but clear and meaningful so that the user who reads our code can easily understand what exactly this function does.
Good Practice
Variable and function names should be lowercase.
Use an underscore _ to separate words within a name.
Generally, variable names should be nouns and function names should be verbs.
Warning
A name cannot begin with a number.
# Good
day_one
day_1
# Bad
first_day_of_the_month  # too long
DayOne  # uppercase in the middle
dayone  # hard to read
djm1  # not clear
Function parameters (arguments)
The arguments are the variables listed inside the parentheses of the function definition, separated by commas. They are set to actual values each time we call the function.
quadratic_polynomial <- function(x, a, b, c) {
  y = a * x^2 + b * x + c
  return(y)
}
quadratic_polynomial(x = 1, a = 2, b = 3, c = 4)
#> [1] 9
It’s possible, even though rarely useful, for a function to have no parameters.
iteration <- function() {
  print("New iteration")
}
iteration()
#> [1] "New iteration"
Also, some parameters can be set to default values inside the function definition, which then can be reset when calling the function.
quadratic_polynomial <- function(x, a = 2, b = 1, c = 10) {
  y = a * x^2 + b * x + c
  return(y)
}
quadratic_polynomial(x = -1, a = 3)  # b and c keep their default values
#> [1] 12
Good practice
Name each argument explicitly when calling a function.
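The sketch below illustrates why naming arguments is safer than relying on their position. It repeats the definition of quadratic_polynomial() with default values so that the example runs on its own.

```r
# Copy of the function with default values, so that the example is self-contained
quadratic_polynomial <- function(x, a = 2, b = 1, c = 10) {
  a * x^2 + b * x + c
}

# Named arguments: the order does not matter and the call documents itself
quadratic_polynomial(c = 4, x = 1, b = 3, a = 2)
#> [1] 9

# Positional arguments: values are matched in the order x, a, b, c
quadratic_polynomial(1, 2, 3, 4)
#> [1] 9

# Swapping two unnamed values silently changes the result
quadratic_polynomial(2, 1, 3, 4)
#> [1] 14
```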
Function body
The function body is a set of commands inside the curly braces {} that are run in a predefined order every time we call the function.
prop_tab <- function(v, decimales = 1) {  # decimales: number of decimal places
  tri <- table(v)
  effectif_total <- length(v)
  tri <- tri / effectif_total * 100  # percentages
  tri <- round(tri, decimales)
  tri  # the value of the last expression is returned
}
vec <- c("rouge", "vert", "vert", "bleu", "rouge")
prop_tab(vec)
#> v
#>  bleu rouge  vert
#>    20    40    40
It usually isn’t necessary to explicitly include the return statement when defining a function: R returns the value of the last evaluated expression.
However, when a function must return more than one result, the results have to be combined into a single object, such as a list, which is then returned.
check_sqrt <- function(x) {
  if (x >= 0) {  # no warning
    value = paste("The value is positive", x, sep = ":")
    result = sqrt(x)
  } else {
    value = paste0("Warning! the value is negative:", x)
    result = NaN
  }
  output = list("note" = value, "result" = result)
  return(output)
}
check_sqrt(x = 100)
#> $note
#> [1] "The value is positive:100"
#>
#> $result
#> [1] 10
Important
The return() function can return only a single R object.
Output
To call a function, write its name followed by the necessary arguments inside the parentheses.
When calling a function, we usually assign the result to an object so that we can use it later.
x = check_sqrt(x = 25)  # x is a list containing a character string (note) and the value (result)
quadratic_polynomial(x$result, a = -1, b = -1, c = 4)
#> [1] -26