Exercises for Chapter 9
In these exercises, we use again the database dbadvanced
and the tables we have created in the chapter. Make sure that this database exists exactly as described in the chapter. Alternatively, you can regenerate this database with
library(RPostgres)
db <- dbConnect(Postgres(),
dbname = "dbadvanced",
user = "postgres",
password = "pgpasswd"
)
dbExecute(db, "DROP TABLE IF EXISTS elections")
dbExecute(db, "DROP TABLE IF EXISTS parties")
dbExecute(db, "DROP TABLE IF EXISTS first_elections")
elections <- read.csv(file.path("ch09", "elections.csv"), header = T)
elections$election_date <- as.Date(elections$election_date)
dbWriteTable(db, "elections", elections)
dbExecute(db, "ALTER TABLE elections
ADD COLUMN seat_share real,
ADD COLUMN year integer")
dbExecute(db, "UPDATE elections
SET seat_share=seats::real / seats_total,
year=extract(year from election_date)")
parties <- read.csv(file.path("ch09", "parties.csv"), header = T)
dbWriteTable(db, "parties", parties)
populist <- read.csv(file.path("ch09", "populist.csv"), header = T)
dbWriteTable(db, "populist", populist)
dbExecute(db, "ALTER TABLE parties ADD COLUMN populist integer")
dbExecute(
db,
"UPDATE parties
SET populist = populist.populist
FROM populist
WHERE parties.party_id = populist.parlgov_id"
)
dbExecute(db, "ALTER TABLE parties ADD PRIMARY KEY (party_id)")
dbExecute(db, "ALTER TABLE elections ADD FOREIGN KEY (party_id) REFERENCES parties (party_id)")
Exercise 1: Grouping and Aggregation with Joined Tables
The PopuList dataset that we have used in the Chapter also includes information about whether a party is considered to be “far left” or “far right”. In this exercise, we want to find out which of these extremist parties have the highest electoral experience, measured by the number of elections they have participated in. As a first step, merge the respective variables farleft
and farright
from the PopuList into the parties
table. Next, use grouping and aggregation to retrieve the number of elections that parties of either type have participated in. Can you order this list such that the parties with the most electoral experience appear at the top?
Exercise 2: Foreign Keys and Joins If Tables Do Not Match
Rather than merging values from populist
into parties
as new attributes, somebody suggests that we could simply join the two tables when we need them, and add a foreign key relationship such that referential integrity is maintained. What foreign key do you need for this? What happens if you try to set up a foreign key, and why?
It seems that there is a number of entries in the PopuList dataset that cause problems. Can you find these entries with a JOIN? Since the standard join retrieves only those records with matching combinations, we will have to use a different type of join. Use only a single SELECT statement for this!
Exercise 3: Saving Intermediate Results
Let us assume that for a new project, we want to know which election results parties achieve when they first run. For this, we need to flag those election results in elections
where a party first shows up in an election. All the data we need for this is already contained in the elections
table.
Write a short SQL statement that shows you the party_id
and the election_date
when the respective party first ran.
Create a separate table first_elections
, which contains the data created in the previous step. Hint: You can create a new table from a SELECT statement using CREATE TABLE xyz AS <select statement>
. This creates a copy of the result and stores it as a new table.
Finally, we want to update the elections
table such that it gets a new boolean variable first_election
, which indicates whether the party first ran in the given election. Hint: as described in the chapter, create the variable first, and then UPDATE it from the first_elections
table.
What is the average election result that parties achieve in their first elections?
Exercise 4: Cascading
Our elections
table depends on parties
, since the latter contains information on the parties participating in the elections. This is why in the chapter, we have defined party_id
to be a foreign key in election
, since it points to the parties
table. As we have seen in the chapter, the database system ensures that these relationships remain intact at the level of the individual records - for example, a record that is still referenced from the other table cannot be deleted. However, the DBMS also maintains these relationships at the level of entire tables. Try deleting the parties
table with DROP TABLE
. What happens, and why?
PostgreSQL gives you a specific recommendation for dealing with this problem: using the CASCADE
keyword. Change your statement such that is uses this keyword. What exactly does it do? Does it delete the elections
table? If not, what do you think has happened?