9 The apply family
Whenever you’re using a for loop, you may want to revise your code to see whether you can use the lapply function instead. Learn all about this intuitive way of applying a function over a list or a vector, and how to use its variants, sapply and vapply.
9.1 Use lapply with your own function
As Filip explained in the instructional video, you can use lapply() on your own functions as well. You just need to code a new function and make sure it is available in the workspace. After that, you can use the function inside lapply() just as you did with base R functions.
In the previous exercise you already used lapply() once to convert the information about your favorite pioneering statisticians to a list of vectors composed of two character strings. Let’s write some code to select the names and the birth years separately.
The sample code already includes code that defined select_first(), that takes a vector as input and returns the first element of this vector.
Instructions 100 XP
- Apply select_first() over the elements of split_low with lapply() and assign the result to a new variable names.
- Next, write a function select_second() that does the exact same thing for the second element of an inputted vector.
- Finally, apply the select_second() function over split_low and assign the output to the variable years.
ex_0.32.R
# Code from previous exercise:
pioneers <-
c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)
# Write function select_first()
select_first <- function(x) {
x[1]
}
# Apply select_first() over split_low: names
names <- lapply(split_low, select_first)
# Write function select_second()
select_second <- function(x) {
x[2]
}
# Apply select_second() over split_low: years
years <- lapply(split_low, select_second)## lapply and anonymous functions Writing your own functions and then using them inside lapply() is quite an accomplishment! But defining functions to use them only once is kind of overkill, isn’t it? That’s why you can use so-called anonymous functions in R.
Previously, you learned that functions in R are objects in their own right. This means that they aren’t automatically bound to a name. When you create a function, you can use the assignment operator to give the function a name. It’s perfectly possible, however, to not give the function a name. This is called an anonymous function:
# Named function
triple <- function(x) { 3 * x }
# Anonymous function with same implementation
function(x) { 3 * x }
# Use anonymous function inside lapply()
lapply(list(1,2,3), function(x) { 3 * x })split_low is defined for you.
Instructions 100 XP
- Transform the first call of
lapply()such that it uses an anonymous function that does the same thing. - In a similar fashion, convert the second call of
lapplyto use an anonymous version of theselect_second()function. - Remove both the definitions of
select_first()andselect_second(), as they are no longer useful.
ex_033.R
# split_low has been created for you
split_low
# Transform: use anonymous function inside lapply
names <- lapply(split_low, function(x){ x[1] })
# Transform: use anonymous function inside lapply
years <- lapply(split_low, function(x){ x[2] })## Use lapply with additional arguments
In the video, the triple() function was transformed to the multiply() function to allow for a more generic approach. lapply() provides a way to handle functions that require more than one argument, such as the multiply() function:
multiply <- function(x, factor) {
x * factor
}
lapply(list(1,2,3), multiply, factor = 3)On the right we’ve included a generic version of the select functions that you’ve coded earlier: select_el(). It takes a vector as its first argument, and an index as its second argument. It returns the vector’s element at the specified index.
Instructions 100 XP
Use lapply() twice to call select_el() over all elements in split_low: once with the index equal to 1 and a second time with the index equal to 2. Assign the result to names and years, respectively.
ex_034.R
# Definition of split_low
pioneers <-
c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)
# Generic select function
select_el <- function(x, index) {
x[index]
}
# Use lapply() twice on split_low: names and years
names <- lapply(split_low, select_el, 1)
years <- lapply(split_low, select_el, 2)9.2 Apply functions that return NULL
In all of the previous exercises, it was assumed that the functions that were applied over vectors and lists actually returned a meaningful result. For example, the tolower() function simply returns the strings with the characters in lowercase. This won’t always be the case. Suppose you want to display the structure of every element of a list. You could use the str() function for this, which returns NULL:
lapply(list(1, "a", TRUE), str)This call actually returns a list, the same size as the input list, containing all NULL values. On the other hand calling
str(TRUE)on its own prints only the structure of the logical to the console, not NULL. That’s because str() uses invisible() behind the scenes, which returns an invisible copy of the return value, NULL in this case. This prevents it from being printed when the result of str() is not assigned.
What will the following code chunk return (split_low is already available in the workspace)? Try to reason about the result before simply executing it in the console!
lapply(split_low, function(x) {
if (nchar(x[1]) > 5) {
return(NULL)
} else {
return(x[2])
}
})9.3 How to use sapply
You can use sapply() similar to how you used lapply(). The first argument of sapply() is the list or vector X over which you want to apply a function, FUN. Potential additional arguments to this function are specified afterwards (...):
sapply(X, FUN, ...)In the next couple of exercises, you’ll be working with the variable temp, that contains temperature measurements for 7 days. temp is a list of length 7, where each element is a vector of length 5, representing 5 measurements on a given day. This variable has already been defined in the workspace: type str(temp) to see its structure.
9.3.1 Instructions 100 XP
- Use
lapply()to calculate the minimum (built-in functionmin()) of the temperature measurements for every day. - Do the same thing but this time with
sapply(). See how the output differs. - Use
lapply()to compute the the maximum (max()) temperature for each day. Again, usesapply()to solve the same question and see howlapply()andsapply()differ.
ex_035.R
# temp has already been defined in the workspace
# Use lapply() to find each day's minimum temperature
lapply(temp, min)
# Use sapply() to find each day's minimum temperature
sapply(temp, min)
# Use lapply() to find each day's maximum temperature
lapply(temp, max)
# Use sapply() to find each day's maximum temperature
sapply(temp, max)9.4 sapply with your own function
Like lapply(), sapply() allows you to use self-defined functions and apply them over a vector or a list:
sapply(X, FUN, ...)Here, FUN can be one of R’s built-in functions, but it can also be a function you wrote. This self-written function can be defined before hand, or can be inserted directly as an anonymous function.
Instructions 100 XP
- Finish the definition of
extremes_avg(): it takes a vector of temperatures and calculates the average of the minimum and maximum temperatures of the vector. - Next, use this function inside
sapply()to apply it over the vectors insidetemp. - Use the same function over
tempwithlapply()and see how the outputs differ.
ex_036.R
# temp is already defined in the workspace
# Finish function definition of extremes_avg
extremes_avg <- function(x) {
( min(x) + max(x) ) / 2
}
# Apply extremes_avg() over temp using sapply()
sapply(temp, extremes_avg)
# Apply extremes_avg() over temp using lapply()
lapply(temp, extremes_avg)9.5 sapply with function returning vector
In the previous exercises, you’ve seen how sapply() simplifies the list that lapply() would return by turning it into a vector. But what if the function you’re applying over a list or a vector returns a vector of length greater than 1? If you don’t remember from the video, don’t waste more time in the valley of ignorance and head over to the instructions!
Instructions 100 XP
- Finish the definition of the
extremes()function. It takes a vector of numerical values and returns a vector containing the minimum and maximum values of a given vector, with the names"min"and"max", respectively. - Apply this function over the vector temp using
sapply(). - Finally, apply this function over the vector temp using
lapply()as well.
ex_037.R
# temp is already available in the workspace
# Create a function that returns min and max of a vector: extremes
extremes <- function(x) {
c(min = min(x), max = max(x))
}
# Apply extremes() over temp with sapply()
sapply(temp, extremes)
# Apply extremes() over temp with lapply()
lapply(temp, extremes)9.6 sapply can’t simplify, now what?
It seems like we’ve hit the jackpot with sapply(). On all of the examples so far, sapply() was able to nicely simplify the rather bulky output of lapply(). But, as with life, there are things you can’t simplify. How does sapply() react?
We already created a function, below_zero(), that takes a vector of numerical values and returns a vector that only contains the values that are strictly below zero.
Instructions 100 XP
- Apply
below_zero()overtempusingsapply()and store the result infreezing_s. - Apply
below_zero()over temp usinglapply(). Save the resulting list in a variablefreezing_l. - Compare
freezing_stofreezing_lusing theidentical()function.
ex_038.R
# temp is already prepared for you in the workspace
# Definition of below_zero()
below_zero <- function(x) {
return(x[x < 0])
}
# Apply below_zero over temp using sapply(): freezing_s
freezing_s <- sapply(temp, below_zero)
# Apply below_zero over temp using lapply(): freezing_l
freezing_l <- lapply(temp, below_zero)
# Are freezing_s and freezing_l identical?
identical(freezing_s, freezing_s)9.7 sapply with functions that return NULL
You already have some apply tricks under your sleeve, but you’re surely hungry for some more, aren’t you? In this exercise, you’ll see how sapply() reacts when it is used to apply a function that returns NULL over a vector or a list.
A function print_info(), that takes a vector and prints the average of this vector, has already been created for you. It uses the cat() function.
Instructions 100 XP
- Apply
print_info()over the contents of temp withsapply(). - Repeat this process with
lapply(). Do you notice the difference?
ex_039.R
# temp is already available in the workspace
# Definition of print_info()
print_info <- function(x) {
cat("The average temperature is", mean(x), "\n")
}
# Apply print_info() over temp using sapply()
sapply(temp, print_info)
# Apply print_info() over temp using lapply()
lapply(temp, print_info)9.8 Reverse engineering sapply
sapply(list(runif (10), runif (10)),
function(x) c(min = min(x), mean = mean(x), max = max(x)))Without going straight to the console to run the code, try to reason through which of the following statements are correct and why.
sapply()can’t simplify the result thatlapply()would return, and thus returns a list of vectors.- This code generates a matrix with 3 rows and 2 columns.
- The function that is used inside sapply() is anonymous.
- The resulting data structure does not contain any names.
Select the option that lists all correct statements.
answer : (2) and (3)
9.9 Use vapply
Before you get your hands dirty with the third and last apply function that you’ll learn about in this intermediate R course, let’s take a look at its syntax. The function is called vapply(), and it has the following syntax:
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)Over the elements inside X, the function FUN is applied. The FUN.VALUE argument expects a template for the return argument of this function FUN. USE.NAMES is TRUE by default; in this case vapply() tries to generate a named array, if possible.
For the next set of exercises, you’ll be working on the temp list again, that contains 7 numerical vectors of length 5. We also coded a function basics() that takes a vector, and returns a named vector of length 3, containing the minimum, mean and maximum value of the vector respectively.
Instructions 100 XP
Apply the function basics() over the list of temperatures, temp, using vapply(). This time, you can use numeric(3) to specify the FUN.VALUE argument.
ex_040.R
# temp is already available in the workspace
# Definition of basics()
basics <- function(x) {
c(min = min(x), mean = mean(x), max = max(x))
}
# Apply basics() over temp using vapply()
vapply(temp, basics, numeric(3))9.10 Use vapply (2)
So far you’ve seen that vapply() mimics the behavior of sapply() if everything goes according to plan. But what if it doesn’t?
In the video, Filip showed you that there are cases where the structure of the output of the function you want to apply, FUN, does not correspond to the template you specify in FUN.VALUE. In that case, vapply() will throw an error that informs you about the misalignment between expected and actual output.
Instructions 100 XP
- Inspect the pre-loaded code and try to run it. If you haven’t changed anything, an error should pop up. That’s because
vapply()still expectsbasics()to return a vector of length 3. The error message gives you an indication of what’s wrong. - Try to fix the error by editing the
vapply()command.
ex_041.R
# temp is already available in the workspace
# Definition of the basics() function
basics <- function(x) {
c(min = min(x), mean = mean(x), median = median(x), max = max(x))
}
# Fix the error:
vapply(temp, basics, numeric(4))9.11 From sapply to vapply
As highlighted before, vapply() can be considered a more robust version of sapply(), because you explicitly restrict the output of the function you want to apply. Converting your sapply() expressions in your own R scripts to vapply() expressions is therefore a good practice (and also a breeze!).
Instructions 100 XP
Convert all the sapply() expressions on the right to their vapply() counterparts. Their results should be exactly the same; you’re only adding robustness. You’ll need the templates numeric(1) and logical(1).
ex_042.R
# temp is already defined in the workspace
# Convert to vapply() expression
vapply(temp, max, numeric(1))
# Convert to vapply() expression
vapply(temp, function(x, y) { mean(x) > y }, y = 5, logical(1))