the power of packages in R

One of the wonderful things about R / RStudio is the many, many packages which expand the capabilities of R.

There is a suite of packages that I use most weeks. These include ggplot2 (for plotting), data.table (for fast data manipulation) and foreign (for reading in GIS data (.dbf) tables).

This video should walk you through the differences between the library() and require() methods of loading packages. Hope it helps!

using packages in R

To use a package in R, there is a two-step process involved.

  1. You need to download and install the package
  2. You need to tell R / RStudio that you want to use that package in this session.

Step 1: downloading and installing packages

To download and install the package, you need to type in the R console:

install.packages("put_the_package_name_in_here_in_quotes")

So, for example, if we want to install the package called data.table, we would type:

install.packages("data.table")
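
If you re-run a script often, you may not want to reinstall a package every time. As a rough sketch (the helper name missing_packages is my own invention for illustration, not something built into R), you could check which packages are missing first, and only install those:

```r
# Hypothetical helper: returns the names in `pkgs` that cannot be loaded.
# requireNamespace() returns TRUE or FALSE and does not attach the package.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1),
               quietly = TRUE, USE.NAMES = FALSE)
  pkgs[!ok]
}

to_install <- missing_packages(c("stats", "data.table"))
print(to_install)

# Only install what is missing (uncomment to actually download):
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that the sketch can be run without downloading anything.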

Step 2: loading the package for use in this R session

To load the package for use in this session, you have two options, both of which you will no doubt have seen.

require(the_package_name_with_NO_quotes)

or

library(the_package_name_with_NO_quotes)
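
One thing worth knowing: a package name given without quotes is taken literally, so it cannot be a variable. If the name is stored in a variable, both functions accept character.only = TRUE. A small sketch, using the tools package purely as a stand-in (it ships with R, so nothing needs installing):

```r
pkg <- "tools"   # stand-in package name held in a variable

# library(pkg) would look for a package literally called "pkg";
# character.only = TRUE tells R to use the value of the variable instead.
library(pkg, character.only = TRUE)

# check that the package is now on the search list
print("package:tools" %in% search())
```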

Why are there two options?

It seems to me that in R, there are always multiple ways of doing things. In this case, when we are loading packages, you have a choice of two functions: library() and require().

Both functions will load the package and attach it to the search list. However, there are a few differences in exactly what they do when the package cannot be loaded. One is in what they print out on screen. Another is whether the rest of your code keeps running.

  1. library() will stop with an error message.
  2. If the library() line is in another script, no code after this point will run if there is an error.
  3. require() will print out a warning message instead of an error, and return FALSE.
  4. If the require() line is in another script, the code after this point will continue to run.
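
To see the difference without stopping your session, here is a minimal sketch that captures what each function does with a fictional package name. The tryCatch() and suppressWarnings() wrappers are only there so the example runs cleanly from top to bottom; they are not needed in normal use.

```r
pkg <- "kids_love_strawberries"  # fictional package, not installed

# require() returns FALSE (with a warning, silenced here) when loading fails
require_result <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)
print(require_result)

# library() signals an error when loading fails; catch it to inspect the message
library_result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) paste("error:", conditionMessage(e))
)
print(library_result)
```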

So, should I use require() or library()?

If you are just executing code in the console, or developing code or scripts, I strongly recommend that you use the library() function.

If the package you are loading does not exist, library() will ERROR, and let you know. These error messages are obvious, and they can flag potential problems early on. This can save you hours of headaches, wondering why your code is not working.

However, if you are writing functions, require() is normally the preferred choice. require() is designed for use inside other functions, and will NOT ERROR if there is a problem with the package. Instead, it gives a warning and returns FALSE, so the function can check the result and decide what to do next.
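
Because require() returns TRUE or FALSE, a function can check that value before carrying on. A minimal sketch, assuming a made-up function name (summarise_with) and using the tools package only as a stand-in for a real dependency:

```r
# Hypothetical function: only does its work if the package it needs loads.
summarise_with <- function(pkg, x) {
  loaded <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!loaded) {
    message("Package '", pkg, "' is not available, returning NULL.")
    return(NULL)
  }
  summary(x)  # placeholder for work that would use the package
}

summarise_with("tools", c(1, 2, 3))                   # runs the summary
summarise_with("kids_love_strawberries", c(1, 2, 3))  # message, then NULL
```

Whether to return NULL, stop with your own error, or fall back to something else is a design choice; the point is only that the return value of require() should be checked rather than ignored.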

For me, the most obvious difference between library() and require() shows up when you are running code in a function or a script.

As an example, I have two scripts which load some packages, including a non-existent package called “kids_love_strawberries”.

One script uses library() to load the packages, the other uses require().

script 1: library_test.R

# library_test.R

print("1. this line occurs BEFORE the library statement")

library(ggplot2) # this is a real package, and should load fine

library(kids_love_strawberries) # this is a fictional package, so will not load

print("2. this line occurs AFTER the failed library statement")

# some additional code to run

x <- 100
y <- x*2

print(paste0(" x = ", x))
print(paste0(" y = ", y))

script 2: require_test.R

# require_test.R

print("1. this line occurs BEFORE the require statement")

require(ggplot2) # this is a real package, and should load fine

require(kids_love_strawberries) # this is a fictional package, so will not load

print("2. this line occurs AFTER the failed require statement")

# some additional code to run

x <- 100
y <- x*2

print(paste0(" x = ", x))
print(paste0(" y = ", y))

script 3: the master script

Then we have one more script that runs both of these scripts. The two scripts above are called using the source() function.

# ensure you set the working directory to the folder where you have saved the two files below.

setwd("C:/Users/tim/blog_resources/R/scripts") 

# this will install the ggplot2 package
install.packages("ggplot2")  

# if library() fails, it STOPS the script. No more lines of that script are run after the library() function

source("library_test.R") 

# if require() fails, it CONTINUES to run the script after the require() function

source("require_test.R") 
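
One caveat: if you run the master script in one go (for example by calling source() on it) rather than line by line, the error from library_test.R would also stop the master script, and require_test.R would never be reached. One possible way around this, sketched here with a made-up helper called run_script and a temporary file standing in for library_test.R, is to wrap each source() call in tryCatch():

```r
# Hypothetical helper: run a script, report an error instead of stopping.
run_script <- function(path) {
  tryCatch(
    {
      source(path)
      TRUE
    },
    error = function(e) {
      message("Stopped in ", basename(path), ": ", conditionMessage(e))
      FALSE
    }
  )
}

# A temporary stand-in for library_test.R
script <- tempfile(fileext = ".R")
writeLines(c(
  'print("before the library statement")',
  'library(kids_love_strawberries)',
  'print("after the library statement")'
), script)

finished <- run_script(script)
print(finished)
```

The failed script still stops at the library() line, but the calling script gets FALSE back and can carry on to the next source() call.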

So, what happens when we run these two scripts?

The result: library()

> source("library_test.R")  # if library() fails, it STOPS the script. No more lines of that script are run after the library() function
[1] "1. this line occurs BEFORE the library statement"
Error in library(kids_love_strawberries) : 
  there is no package called ‘kids_love_strawberries’

  1. This made no mention of the ggplot2 package, which loaded successfully.
  2. It reports an ERROR – no package called “kids_love_strawberries”
  3. It runs no more lines of code.

The result: require()

> source("require_test.R")  # if require() fails, it CONTINUES to run the script after the require() function
[1] "1. this line occurs BEFORE the require statement"
Loading required package: kids_love_strawberries
[1] "2. this line occurs AFTER the failed require statement"
[1] " x = 100"
[1] " y = 200"
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘kids_love_strawberries’

  1. This made no mention of the ggplot2 package, which had already been loaded by library_test.R (otherwise require() would have printed "Loading required package: ggplot2").
  2. It gives a WARNING – no package called “kids_love_strawberries”
  3. It continues to run the lines of code which follow the failed require() function.

I hope this has helped you gain a bit more clarity on the difference between library() and require(). The video above may help too.

Do let me know if you have any questions, or need any help.
