21  Importing data from databases (Part 1)

Many companies store their information in relational databases. The R community has also developed R packages to get data from these architectures. You’ll learn how to connect to a database and how to retrieve data from it.

21.1 Establish a connection

The first step to import data from a SQL database is creating a connection to it. As Filip explained, you need different packages depending on the database you want to connect to. All of these packages do this in a uniform way, as specified in the DBI package.

dbConnect() creates a connection between your R session and a SQL database. The first argument has to be a DBIdriver object, that specifies how connections are made and how data is mapped between R and the database. Specifically for MySQL databases, you can build such a driver with RMySQL::MySQL().

If the MySQL database is a remote database hosted on a server, you’ll also have to specify the following arguments in dbConnect(): dbname, host, port, user and password. Most of these details have already been provided.

Instructions 100 XP

  • Load the DBI library, which is already installed on DataCamp’s servers.
  • Edit the dbConnect() call to connect to the MySQL database. Change the port argument (3306) and user argument ("student").
ex_001.R
# Load the DBI package
library(DBI)

# Edit dbConnect() call
con <- dbConnect(RMySQL::MySQL(), 
                 dbname = "tweater", 
                 host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com", 
                 port = 3306,
                 user = "student",
                 password = "datacamp")

21.2 List the database tables

After you’ve successfully connected to a remote MySQL database, the next step is to see what tables the database contains. You can do this with the dbListTables() function. As you might remember from the video, this function requires the connection object as an input, and outputs a character vector with the table names.

Instructions 100 XP

  • Add code to create a vector tables, that contains the tables in the tweater database. You can connect to this database through the con object.

  • Display the structure of tables; what’s the class of this vector?

ex_002.R
# Load the DBI package
library(DBI)
# Connect to the MySQL database: con
con <- dbConnect(RMySQL::MySQL(), 
                 dbname = "tweater", 
                 host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com", 
                 port = 3306,
                 user = "student",
                 password = "datacamp")
# Build a vector of table names: tables
tables <- dbListTables(con)
# Display structure of tables
str(tables)
class(tables)

21.3 Import users

As you might have guessed by now, the database contains data on a more tasty version of Twitter, namely Tweater. Users can post tweats with short recipes for delicious snacks. People can comment on these tweats. There are three tables: users, tweats, and comments that have relations among them. Which ones, you ask? You’ll discover in a moment!

Let’s start by importing the data on the users into your R session. You do this with the ‘dbReadTable()’ function. Simply pass it the connection object (‘con’), followed by the name of the table you want to import. The resulting object is a standard R data frame.

Instructions 100 XP

  • Add code that imports the “users” table from the tweater database and store the resulting data frame as users.
  • Print the users data frame.
ex_003.R
# Load the DBI package
library(DBI)

# Connect to the MySQL database: con
con <- dbConnect(RMySQL::MySQL(), 
                 dbname = "tweater", 
                 host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com", 
                 port = 3306,
                 user = "student",
                 password = "datacamp")

# Import the users table from tweater: users
users <- dbReadTable(con, "users")

# Print users
users

21.4 Import all tables

Next to the users, we’re also interested in the tweats and comments tables. However, separate dbReadTable() calls for each and every one of the tables in your database would mean a lot of code duplication. Remember about the lapply() function? You can use it again here! A connection is already coded for you, as well as a vector table_names, containing the names of all the tables in the database. ### Instructions 100 XP {.unnumbered}

  • Finish the lapply() function to import the users, tweats and comments tables in a single call. The result, a list of data frames, will be stored in the variable tables.
  • Print tables to check if you got it right.
ex_004.R
# Load the DBI package
library(DBI)

# Connect to the MySQL database: con
con <- dbConnect(RMySQL::MySQL(), 
                 dbname = "tweater", 
                 host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com", 
                 port = 3306,
                 user = "student",
                 password = "datacamp")

# Get table names
table_names <- dbListTables(con)

# Import all tables
tables <- lapply(table_names, dbReadTable, conn = con)

# Print out tables
tables