Exercises for Chapter 8
In these exercises, we use again the database we have created in the chapter. If you need to re-generate this database, you can do so with
library(RPostgres)
db <- dbConnect(Postgres(),
dbname = "dbintro",
user = "postgres",
password = "pgpasswd"
)
dbExecute(db, "CREATE TABLE elections (
election_id integer,
country_name varchar,
election_date date,
party_id integer,
vote_share real,
seats integer,
seats_total integer
)")
elections <- read.csv(file.path("ch08", "elections.csv"))
dbAppendTable(db, "elections", elections)
dbExecute(db, "ALTER TABLE elections ADD year integer")
dbExecute(db, "UPDATE elections SET year = extract(year from election_date)")
dbExecute(db, "ALTER TABLE elections ADD seat_share real")
dbExecute(db, "UPDATE elections SET seat_share=seats::real / seats_total")
Exercise 1: Categorizing SQL Statements
Recall that there are three different types of operations we can carry out on data tables: data definition, data manipulation and data extraction. Which of the three categories do the SQL commands that we have introduced in this chapter belong to? Assign CREATE TABLE, ALTER TABLE, DELETE, INSERT, UPDATE and SELECT to the correct category.
Exercise 2: Computing the (Effective) Number of Parties
In this exercise, we want to compute the number of parties in parliament after each election. We use again the elections
from ParlGov table we have created in the chapter. Start by computing the number of entries per election with a simple aggregation statement in SQL. The PostgreSQL documentation on aggregation functions may be helpful here. Does this give us the result we want? If not, how can fix the statement?
The problem with a simple count of the parties is that this ignores their size. To remedy this, Laakso and Taagepera (1979) introduce a measure called the “effective number of parties”, which is the inverted sum of the squared seat shares of the parties in parliament. Can you compute the effective number of parties for each election in our table with an SQL aggregation statement? How do you deal with parties with no seats that are present in the data?
Exercise 3: Adding a Binary Variable
We want flag all parties that had a majority in parliament after an election, i.e. more than 50% of the seats. To this end, add a new boolean variable to the elections table called majority
. Since most parties are not majority parties, we want the initial value of this variable to be FALSE
. Can you specify this when you create the new variable?
Next, recode the majority
variable such that it takes the value true
for all parties with a majority in parliament.
One problem with our new variable is that it also contains valid values for those parties not represented in parliament. To fix this, update the variable such that its values are missing for parties without seats in parliament.
Finally, delete the new variable again, such that we preserve the original elections
table.
Exercise 4: Finding Duplicates
In this exercise, we use aggregation in SQL to find duplicates in a table. Import the elections-duplicates.csv
dataset located in the ex08
folder as a new table duplicates
in your database. Make sure to leave the original elections table intact. As a first step, we count how often each combination of field values occurs in the table. How can you do this with a simple aggregation statement?
The duplicates in our table are those where the above computation returns counts greater than 1. Is there a simple way to display only the duplicates, rather than having to go through the entire result? Hint: The HAVING
clause may be of use here, see the PostgreSQL documentation. What is the difference between WHERE
and HAVING
?