35 Left and Right Joins
Learn two more mutating joins, the left and right join, which are mirror images of each other! You’ll learn use cases for each type of join as you explore parts and colors of LEGO themes. Then, you’ll explore how to join tables to themselves to understand the hierarchy of LEGO themes in the data.
35.1 Left joining two sets by part and color
You has been learned how to left join two LEGO sets. Now you’ll practice your ability to do this looking at two new sets: the Millennium Falcon and Star Destroyer sets. We’ve created these for you and they have been preloaded for you:
millennium_falcon <- inventory_parts_joined %>%
filter(set_num == "7965-1")
star_destroyer <- inventory_parts_joined %>%
filter(set_num == "75190-1")Instructions 100 XP
ex_006.R
# Combine the star_destroyer and millennium_falcon tables
millennium_falcon %>%
left_join(
star_destroyer,
by = c("part_num", "color_id"),
suffix = c("_falcon", "_star_destroyer")
)35.2 Left joining two sets by color
In the last exercise, you joined two sets based on their part and color. What if you joined the datasets by color alone? As with the last exercise, the Millennium Falcon and Star Destroyer sets have been created and preloaded for you:
millennium_falcon <- inventory_parts_joined %>%
filter(set_num == "7965-1")
star_destroyer <- inventory_parts_joined %>%
filter(set_num == "75190-1")Instructions 100 XP
- Sum the
quantitycolumn by color_id in the Millennium Falcon dataset.
ex_007.R
# Aggregate Millennium Falcon for the total quantity in each part
millennium_falcon_colors <- millennium_falcon %>%
group_by(color_id) %>%
summarize(total_quantity = sum(quantity))
# Aggregate Star Destroyer for the total quantity in each part
star_destroyer_colors <- star_destroyer %>%
group_by(color_id) %>%
summarize(total_quantity = sum(quantity))
# Left join the Millennium Falcon colors to the Star Destroyer colors
millennium_falcon_colors %>%
left_join(
star_destroyer_colors,
by = c("color_id"),
suffix =c(
"_falcon",
"_star_destroyer"
)
)35.3 Finding an observation that doesn’t have a match
Left joins are really great for testing your assumptions about a data set and ensuring your data has integrity.
For example, the inventories table has a version column, for when a LEGO kit gets some kind of change or upgrade. It would be fair to assume that all sets (which joins well with inventories) would have at least a version 1. But let’s test this assumption out in the following exercise.
Instructions 100 XP
- Use a
left_jointo join togethersetsandinventory_version_1using their common column. filterfor where the version column isNAusingis.na.
ex_008.R
inventory_version_1 <- inventories %>%
filter(version == 1)
# Join versions to sets
sets %>%
left_join(
inventory_version_1,
by = c("set_num")
) %>%
# Filter for where version is na
filter(is.na(version))35.4 Which join is best?
Sometimes you’ll want to do some processing before you do a join, and prioritize keeping the second (right) table’s rows instead. In this case, a right join is for you.
In this exercise, we’ll count the part_cat_id from parts, before using a right_join to join with part_categories. The reason we do this is because we don’t only want to know the count of part_cat_id in parts, but we also want to know if there are any part_cat_ids not present in parts.
Instructions 100 XP
ex_009.R
parts %>%
# Count the part_cat_id
count(part_cat_id) %>%
# Right join part_categories
right_join(
part_categories,
by=c("part_cat_id" = "id" )
)35.5 Counting part colors
Sometimes you’ll want to do some processing before you do a join, and prioritize keeping the second (right) table’s rows instead. In this case, a right join is for you.
In this exercise, we’ll count the part_cat_id from parts, before using a right_join to join with part_categories. The reason we do this is because we don’t only want to know the count of part_cat_id in parts, but we also want to know if there are any part_cat_ids not present in parts.
Instructions 100 XP
ex_010.R
parts %>%
count(part_cat_id) %>%
right_join(part_categories, by = c("part_cat_id" = "id")) %>%
# Filter for NA
filter(is.na(n))35.6 Cleaning up your count
In both left and right joins, there is the opportunity for there to be NA values in the resulting table. Fortunately, the replace_na function can turn those NAs into meaningful values.
In the last exercise, we saw that the n column had NAs after the right_join. Let’s use the replace_na column, which takes a list of column names and the values with which NAs should be replaced, to clean up our table.
Instructions 100 XP
Use replace_na to replace NAs in the n column with the value 0.
ex_011.R
parts %>%
count(part_cat_id) %>%
right_join(part_categories, by = c("part_cat_id" = "id")) %>%
# Use replace_na to replace missing values in the n column
replace_na(list(n = 0))35.7 Joining themes to their children
Tables can be joined to themselves!
In the themes table, which is available for you to inspect in the console, you’ll notice there is both an id column and a parent_id column. Keeping that in mind, you can join the themes table to itself to determine the parent- child relationships that exist for different themes.
In the videos, you saw themes joined to their own parents. In this exercise, you’ll try a similar approach of joining themes to their own children, which is similar but reversed. Let’s try this out to discover what children the theme "Harry Potter" has.
Instructions 100 XP
- Inner join
themesto their own children, resulting in the suffixes"_parent"and"_child", respectively. - Filter this table to find the children of the
"Harry Potter"theme.
ex_012.R
themes %>%
# Inner join the themes table
inner_join(
themes,
by = c("id" = "parent_id" ),
suffix = c( "_parent", "_child")
) %>%
# Filter for the "Harry Potter" parent name
filter(name_parent == "Harry Potter")35.8 Joining themes to their grandchildren
We can go a step further than looking at themes and their children. Some themes actually have grandchildren: their children’s children.
Here, we can inner join themes to a filtered version of itself again to establish a connection between our last join’s children and their children.
Instructions 100 XP
- Use another inner join to combine themes again with itself.
- Be sure to use the suffixes “_parent” and “_grandchild” so the columns in the resulting table are clear.
- Update the by argument to specify the correct columns to join on. If you’re unsure of what columns to join on, it might help to look at the result of the first join to get a feel for it.
ex_013.R
# Join themes to itself again to find the grandchild relationships
themes %>%
inner_join(
themes,
by = c("id" = "parent_id"),
suffix = c("_parent", "_child")) %>%
inner_join(
themes,
by = c("id_child" = "parent_id"),
suffix = c("_parent", "_grandchild")
)35.9 Left join a table to itself
So far, you’ve been inner joining a table to itself in order to find the children of themes like "Harry Potter" or "The Lord of the Rings".
But some themes might not have any children at all, which means they won’t be included in the inner join. As you’ve learned in this chapter, you can identify those with a left_join and a filter().
Instructions 100 XP
- Left join the themes table to its own children, with the suffixes
_parentand_childrespectively. - Filter the result of the join to find themes that have no children.
ex_014.R
themes %>%
# Left join the themes table to its own children
left_join(
themes,
by = c("id" = "parent_id"),
suffix = c("_parent", "_child")
) %>%
# Filter for themes that have no child themes
filter(is.na(id_child))