20  Reproducible Excel work with XLConnect

Beyond importing data from Excel, you can take things one step further with XLConnect. Learn all about it and bridge the gap between R and Excel.

20.1 Connect to a workbook

When working with XLConnect, the first step will be to load a workbook in your R session with loadWorkbook(); this function will build a “bridge” between your Excel file and your R session.

In this and the following exercises, you will continue to work with urbanpop.xlsx (view data folder), containing urban population data throughout time.

Instructions 100 XP

  • Load the XLConnect package using library(); it is already installed on DataCamp’s servers.
  • Use loadWorkbook() to build a connection to the "urbanpop.xlsx" file in R. Call the workbook my_book.
  • Print out the class of my_book. What does this tell you?
ex_023.R
# urbanpop.xlsx is available in your working directory
# Load the XLConnect package
library(XLConnect)

# Build connection to urbanpop.xlsx: my_book
my_book <- loadWorkbook("urbanpop.xlsx")

# Print out the class of my_book
class(my_book)

20.2 List and read Excel sheets

Just as readxl and gdata, you can use XLConnect to import data from Excel file into R.

To list the sheets in an Excel file, use getSheets(). To actually import data from a sheet, you can use readWorksheet(). Both functions require an XLConnect workbook object as the first argument.

You’ll again be working with urbanpop.xlsx. The my_book object that links to this Excel file has already been created.

Instructions 100 XP

  • Print out the sheets of the Excel file that my_book links to.
  • Import the second sheet in my_book as a data frame. Print it out.
ex_024.R
# XLConnect is already available

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("urbanpop.xlsx")

# List the sheets in my_book
getSheets(my_book)

# Import the second sheet in my_book
readWorksheet(my_book, sheet = "1967-1974")

20.3 Customize readWorksheet

To get a clear overview about urbanpop.xlsx (view) without having to open up the Excel file, you can execute the following code:

my_book <- loadWorkbook("urbanpop.xlsx")
sheets <- getSheets(my_book)
all <- lapply(sheets, readWorksheet, object = my_book)
str(all)

Suppose we’re only interested in urban population data of the years 1968, 1969 and 1970. The data for these years is in the columns 3, 4, and 5 of the second sheet. Only selecting these columns will leave us in the dark about the actual countries the figures belong to.

Instructions 100 XP

  • Extend the readWorksheet() command with the startCol and endCol arguments to only import the columns 3, 4, and 5 of the second sheet. -urbanpop_sel no longer contains information about the countries now. Can you write another readWorksheet() command that imports only the first column from the second sheet? Store the resulting data frame as countries. -Use cbind() to paste together countries and urbanpop_sel, in this order. Store the result as selection.
ex_025.R
# XLConnect is already available

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("urbanpop.xlsx")

# Import columns 3, 4, and 5 from second sheet in my_book: urbanpop_sel
urbanpop_sel <- 
    readWorksheet(
        my_book,
        sheet = 2,
        startCol = 3,
        endCol = 5
    )

# Import first column from second sheet in my_book: 
# countries
countries <- 
    readWorksheet(
        my_book,
        sheet = 2,
        startCol = 1,
        endCol = 1
    )
# cbind() urbanpop_sel and countries together: selection
selection <- cbind(countries, urbanpop_sel)

20.4 Add worksheet

Where readxl and gdata were only able to import Excel data, XLConnect's approach of providing an actual interface to an Excel file makes it able to edit your Excel files from inside R. In this exercise, you’ll create a new sheet. In the next exercise, you’ll populate the sheet with data, and save the results in a new Excel file.

You’ll continue to work with urbanpop.xlsx. The my_book object that links to this Excel file is already available.

Instructions 100 XP

  • Use createSheet(), to create a new sheet in my_book, named "data_summary".
  • Use getSheets() to verify that my_book now represents an Excel file with four sheets.
ex_026.R
# XLConnect is already available

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("urbanpop.xlsx")

# Add a worksheet to my_book, named "data_summary"
createSheet(my_book, "data_summary")

# Use getSheets() on my_book
getSheets(my_book)

20.5 Populate worksheet

The first step of creating a sheet is done; let’s populate it with some data now! summ, a data frame with some summary statistics on the two Excel sheets is already coded so you can take it from there.

Instructions 100 XP

  • Use writeWorksheet() to populate the "data_summary" sheet with the summ data frame.
  • Call saveWorkbook() to store the adapted Excel workbook as a new file, "summary.xlsx"
ex_026.R
# XLConnect is already available

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("urbanpop.xlsx")

# Add a worksheet to my_book, named "data_summary"
createSheet(my_book, "data_summary")

# Create data frame: summ
sheets <- getSheets(my_book)[1:3]
dims <- sapply(
    sheets,
    function(x) dim(readWorksheet(my_book, sheet = x)),
    USE.NAMES = FALSE
)
summ <- data.frame(sheets = sheets,
                   nrows = dims[1, ],
                   ncols = dims[2, ])

# Add data in summ to "data_summary" sheet
writeWorksheet(my_book, summ, sheet = "data_summary")

# Save workbook as summary.xlsx
saveWorkbook(my_book, "summary.xlsx")

20.6 Renaming sheets

Come to think of it, “data_summary” is not an ideal name. As the summary of these excel sheets is always data-related, you simply want to name the sheet “summary”.

The code to build a connection to "urbanpop.xlsx" and create my_book is already provided for you. It refers to an Excel file with 4 sheets: the three data sheets, and the “data_summary” sheet.

Instructions 100 XP

  • Use renameSheet() to rename the fourth sheet to "summary".
  • Next, call getSheets() on my_book to print out the sheet names.
  • Finally, make sure to actually save the my_book object to a new Excel file, "renamed.xlsx".
ex_027.R
# Build connection to urbanpop.xlsx: my_book
my_book <- loadWorkbook("urbanpop.xlsx")
# Rename "data_summary" sheet to "summary"
renameSheet(my_book, "data_summary", "summary")
# Print out sheets of my_book
getSheets(my_book)
# Save workbook to "renamed.xlsx"
saveWorkbook(my_book, "renamed.xlsx")

20.7 Removing sheets

After presenting the new Excel sheet to your peers, it appears not everybody is a big fan. Why summarize sheets and store the info in Excel if all the information is implicitly available? To hell with it, just remove the entire fourth sheet!

Instructions 100 XP

  • Load the XLConnect package.
  • Build a connection to “renamed.xlsx”, the Excel file that you’ve built in the previous exercise; it’s available in your working directory. Store this connection as my_book.
  • Use removeSheet() to remove the fourth sheet from my_book. The sheet name is "summary". Recall that removeSheet() accepts either the index or the name of the sheet as the second argument.
  • Save the resulting workbook, my_book, to a file "clean.xlsx".
ex_027.R
# Load the XLConnect package

library(XLConnect)
# Build connection to renamed.xlsx: my_book
my_book <- loadWorkbook("renamed.xlsx")

# Remove the fourth sheet
removeSheet(my_book, "summary")

# Save workbook to "clean.xlsx"
saveWorkbook(my_book, "clean.xlsx")