13  All About Arguments

13.1 NULL defaults

The cut() function used by cut_by_quantile() can automatically provide sensible labels for each category. The code to generate these labels is pretty complicated, so rather than appearing in the function signature directly, its labels argument defaults to NULL, and the calculation details are shown on the ?cut help page.

Instructions 100 XP

Update the definition of cut_by_quantile() so that the labels argument defaults to NULL. Remove the labels argument from the call to cut_by_quantile().

ex_08.R
# Set the default for labels to NULL
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels=NULL, 
interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the labels argument from the call
cut_by_quantile(
  n_visits,
  interval_type = "(lo, hi]"
)

13.2 Categorical defaults

When cutting up a numeric vector, you need to worry about what happens if a value lands exactly on a boundary. You can either put this value into a category of the lower interval or the higher interval. That is, you can choose your intervals to include values at the top boundary but not the bottom (in mathematical terminology, “open on the left, closed on the right”, or (lo, hi]). Or you can choose the opposite (“closed on the left, open on the right”, or [lo, hi)). cut_by_quantile() should allow these two choices.

The pattern for categorical defaults is:

function(cat_arg = c("choice1", "choice2")) { 
    cat_arg \<- match.arg(cat_arg)
}

Free hint: In the console, type head(rank) to see the start of rank()’s definition, and look at the ties.method argument.

Instructions 100 XP

Update the signature of cut_by_quantile() so that the interval_type argument can be “(lo, hi ]” or ” [lo, hi)“. Note the space after each comma. Update the body of cut_by_quantile() to match the interval_type argument. Remove the interval_type argument from the call to cut_by_quantile().

ex_09.R
# Set the categories for interval_type to "(lo, hi]" and "[lo, hi)"
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL, 
interval_type=  c("(lo, hi]", "[lo, hi)")) {
  # Match the interval_type argument
  interval_type <- match.arg(interval_type)
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}
# Remove the interval_type argument from the call
cut_by_quantile(n_visits)

13.3 Harmonic mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocal of the data. That is the harmonic mean is often used to average ratio data. You’ll be using it on the price/earnings ratio of stocks in the Standard and Poor’s 500 index, provided as std_and_poor500. Price/earnings ratio is a measure of how expensive a stock is.

The dplyr package is loaded.

Instructions 100 XP

  • Look at std_and_poor500 (you’ll need this later). Write a function, get_reciprocal, to get the reciprocal of an input x. Its only argument should be x, and it should return one over x.

  • Write a function, calc_harmonic_mean(), that calculates the harmonic mean of its only input, x.

  • Using std_and_poor500, group by sector, and summarize to calculate the harmonic mean of the price/earning ratios in the pe_ratio column.

ex_10.R
# Look at the Standard and Poor 500 data
glimpse(std_and_poor500)

# Write a function to calculate the reciprocal
get_reciprocal <- function(x){
  1/x 
}

# From previous step
get_reciprocal <- function(x) {
  1 / x
}

# Write a function to calculate the harmonic mean
calc_harmonic_mean <- function(x) {
  x %>%
    get_reciprocal() %>%
    mean() %>%
    get_reciprocal()
}

# From previous steps
get_reciprocal <- function(x) {
  1 / x
}
calc_harmonic_mean <- function(x) {
  x %>%
    get_reciprocal() %>%
    mean() %>%
    get_reciprocal()
}

std_and_poor500 %>% 
  # Group by sector
  group_by(sector) %>% 
  # Summarize, calculating harmonic mean of P/E ratio
  summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio))

13.4 Dealing with missing values

In the last exercise, many sectors had an NA value for the harmonic mean. It would be useful for your function to be able to remove missing values before calculating.

Rather than writing your own code for this, you can outsource this functionality to mean().

The dplyr package is loaded.

Instructions 100 XP

  • Modify the signature and body of calc_harmonic_mean() so it has an na.rm argument, defaulting to false, that gets passed to mean().

  • Using std_and_poor500, group by sector, and summarize to calculate the harmonic mean of the price/earning ratios in the pe_ratio column, removing missing values.

ex_11.R
# Add an na.rm arg with a default, and pass it to mean()
calc_harmonic_mean <- function(x,na.rm = FALSE) {
  x %>%
    get_reciprocal() %>%
    mean(na.rm =na.rm) %>%
    get_reciprocal()
}

# From previous step
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

std_and_poor500 %>% 
  # Group by sector
  group_by(sector) %>% 
  # Summarize, calculating harmonic mean of P/E ratio
  summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio,
   na.rm = TRUE))

13.5 Passing arguments with ...

Rather than explicitly giving calc_harmonic_mean() and na.rm argument, you can use ... to simply “pass other arguments” to mean().

The dplyr package is loaded.

Instructions 100 XP

  • Replace the na.rm argument with … in the signature and body of calc_harmonic_mean().

  • Using std_and_poor500, group by sector, and summarize to calculate the harmonic mean of the price/earning ratios in the pe_ratio column, removing missing values.

ex_12.R
# Swap na.rm arg for ... in signature and body
calc_harmonic_mean <- function(x, ...) {
  x %>%
    get_reciprocal() %>%
    mean(...) %>%
    get_reciprocal()
}
calc_harmonic_mean(x = c(1, NA, 3, NA, 5))

calc_harmonic_mean <- function(x, ...) {
  x %>%
    get_reciprocal() %>%
    mean(...) %>%
    get_reciprocal()
}

std_and_poor500 %>% 
  # Group by sector
  group_by(sector) %>% 
  # Summarize, calculating harmonic mean of P/E ratio
  summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio,
   na.rm = TRUE))

13.6 Throwing errors with bad arguments

If a user provides a bad input to a function, the best course of action is to throw an error letting them know. The two rules are

Throw the error message as soon as you realize there is a problem (typically at the start of the function). Make the error message easily understandable. You can use the assert\_\*() functions from assertive to check inputs and throw errors when they fail.

Instructions 100 XP

  • Add a line to the body of calc_harmonic_mean() to assert that x is numeric.
  • Look at what happens when you pass a character argument to calc_harmonic_mean().
ex_13.R
alc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Assert that x is numeric
  assert_is_numeric(x)
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it strings
calc_harmonic_mean(std_and_poor500$sector)

13.7 Custom error logic

Sometimes the assert_*() functions in assertive don’t give the most informative error message. For example, the assertions that check if a number is in a numeric range will tell the user that a value is out of range, but the won’t say why that’s a problem. In that case, you can use the is \_\*() functions in conjunction with messages, warnings, or errors to define custom feedback.

The harmonic mean only makes sense when x has all positive values. (Try calculating the harmonic mean of one and minus one to see why.) Make sure your users know this!

13.7.1 Instructions 100 XP

  • If any values of x are non-positive (ignoring NAs) then throw an error.
  • Look at what happens when you pass a character argument to calc_harmonic_mean().
ex_14.R
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  assert_is_numeric(x)
  # Check if any values of x are non-positive
  if(any(is_non_positive(x), na.rm = TRUE)) {
    # Throw an error
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it negative numbers
calc_harmonic_mean(std_and_poor500$pe_ratio - 20)

13.8 Fixing function arguments

The harmonic mean function is almost complete. However, you still need to provide some checks on the na.rm argument. This time, rather than throwing errors when the input is in an incorrect form, you are going to try to fix it.

na.rm should be a logical vector with one element (that is, TRUE, or FALSE).

The assertive package is loaded for you.

Instructions 100 XP

  • Update calc_harmonic_mean() to fix the na.rm argument by using use_first() to select the first na.rm element, and coerce_to() to change it to logical.
ex_15.R
# Update the function definition to fix the na.rm argument
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  assert_is_numeric(x)
  if(any(is_non_positive(x), na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Use the first value of na.rm, and coerce to logical
  na.rm <- coerce_to(use_first(na.rm), target_class  = "logical")
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it malformed na.rm
calc_harmonic_mean(std_and_poor500$pe_ratio, na.rm = 1:5)