Exercises for Chapter 7
For these exercises, we again need the datasets used in Chapter 7 of the book. You can import and prepare them using the code presented in the chapter.
The Polity IV dataset:
library(readxl)
polity <- read_excel(file.path("ch07", "polity.xls"))
The WID:
library(tidyverse)
library(countrycode)
wid <- read_csv(file.path("ch07", "inequality.csv"), na = "")
wid <- wid %>% select(country, year, value)
wid <- wid %>% rename(p90p100 = value)
wid <- wid %>% mutate(ccode = countrycode(country, "iso2c", "cown"))
wid <- wid %>% mutate(ccode = ifelse(country == "RS", 345, ccode))
Exercise 1: Creating New Variables and Temporal Lags
Sometimes, a binary categorization of regimes into autocracies and democracies may be too simple. The Polity project proposes an alternative classification of regimes as “autocracies” (-10 to -6), “anocracies” (-5 to +5 and three special values: -66, -77 and -88), and “democracies” (+6 to +10). Note that this categorization is based on the polity variable in the dataset, not the polity2 variable. Add a new variable regimetype to the dataset that implements this categorization. Use numeric codes for the regime types (0 for autocracies, 1 for anocracies, and 2 for democracies). The case_when() function is useful for this. Next, add a one-year lag of this variable to the data. In which cases did a country move from an autocracy straight to a democratic system, skipping the anocracy category?
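As a hedged illustration of the two building blocks named above, the following sketch applies case_when() and a grouped lag() to a small made-up table. The table toy and its values are hypothetical and not taken from the Polity dataset, and the lag is only a true one-year lag under the assumption that each country has one row per consecutive year.

```r
library(dplyr)

# Hypothetical toy data, NOT from the Polity dataset
toy <- tibble(
  ccode  = c(1, 1, 1, 2, 2),
  year   = c(2000, 2001, 2002, 2000, 2001),
  polity = c(-7, 8, -66, 3, 7)
)

toy <- toy %>%
  mutate(regimetype = case_when(
    polity %in% c(-66, -77, -88) ~ 1,  # special values are treated as anocracies
    polity <= -6 ~ 0,                  # autocracies
    polity <= 5 ~ 1,                   # anocracies
    TRUE ~ 2                           # democracies
  )) %>%
  arrange(ccode, year) %>%
  group_by(ccode) %>%                  # lag within each country only
  mutate(regimetype_lag = lag(regimetype)) %>%
  ungroup()

# Rows where the previous year was an autocracy and the current one a democracy
transitions <- toy %>%
  filter(regimetype_lag == 0, regimetype == 2)
```

The order of the conditions matters: case_when() uses the first condition that matches, so the special values have to be checked before the numeric ranges.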
Exercise 2: Grouping and Aggregation
Suppose we want to select those countries from the Polity dataset that were either democracies (polity2 >= 6) or non-democracies (polity2 < 6) over the entire observation period. How can you do this in a single tidyverse statement? Hint: use a simple grouping/aggregation procedure. Make sure you exclude country-years for which polity2 is missing.
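One possible shape of such a statement is sketched below on made-up data. The table toy is hypothetical, and the grouped filter() with all() is just one of several ways to express the idea; a summarise() step followed by a filter would work as well.

```r
library(dplyr)

# Hypothetical toy data, NOT from the Polity dataset
toy <- tibble(
  ccode   = c(1, 1, 2, 2, 3, 3),
  year    = c(2000, 2001, 2000, 2001, 2000, 2001),
  polity2 = c(8, 9, 2, 7, -3, NA)
)

stable <- toy %>%
  filter(!is.na(polity2)) %>%          # drop country-years without a score
  group_by(ccode) %>%
  # keep a country only if all of its remaining years fall on the same side
  filter(all(polity2 >= 6) | all(polity2 < 6)) %>%
  ungroup()
```

In this toy example, country 2 is dropped because it crosses the threshold, while country 3 is kept because its only non-missing year is below 6.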
Exercise 3: Joins with More Complex Join Conditions
A different way to study the evolution of inequality over time is to compute the year in which each country attained its maximum level of inequality. First, for each country, compute its maximum inequality according to the WID. Store these values in a separate table wid_max. Second, determine the year in which each country reached this maximum level. You can do this by joining wid_max and the original WID. What attributes should you base your join on? Consult the documentation to find out how to specify more complex join conditions. What potential problem do you see in the data? How would you solve it?
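The sketch below shows the general pattern of joining on more than one column, using made-up data. The tables toy and toy_max and their values are hypothetical. It assumes that ties (the same maximum reached in several years) are one problem worth checking for, and that keeping the earliest year is an acceptable way to resolve them; the actual data may raise other issues.

```r
library(dplyr)

# Hypothetical toy data, NOT from the WID
toy <- tibble(
  country = c("AA", "AA", "AA", "BB", "BB"),
  year    = c(2000, 2001, 2002, 2000, 2001),
  p90p100 = c(0.30, 0.35, 0.35, 0.40, 0.38)
)

# Step 1: one row per country with its maximum value
toy_max <- toy %>%
  group_by(country) %>%
  summarise(p90p100 = max(p90p100, na.rm = TRUE))

# Step 2: join back on country AND value to recover the year(s)
peak_years <- toy_max %>%
  inner_join(toy, by = c("country", "p90p100"))

# Country AA reaches its maximum twice, so the join returns two rows for it.
# One way to resolve this is to keep the first year only.
first_peak <- peak_years %>%
  group_by(country) %>%
  summarise(year = min(year))
```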
Exercise 4: Self-joining a Dataset
In this exercise, we want to identify those countries with the most extreme fluctuations in inequality, in other words, where the difference between two levels of inequality is particularly large. We want to measure fluctuations that occur within arbitrary periods of five years. The easiest way to do this is to use the Cartesian Product method, using the same table twice. Create pairs of observations for the same country that are at most five years apart. We can then sort these pairs in decreasing order of the difference in inequality to get the result we want. Hint: you will need to consult the documentation for the dplyr join functions to find out how to create a Cartesian Product (which the documentation refers to as a “cross-join”). Think about which combinations of rows from the two tables you need to keep!
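The self-join idea can be sketched on made-up data as follows. The table toy, the suffixes _a and _b, and the column change are hypothetical names chosen for illustration. The sketch assumes dplyr 1.1.0 or later, which provides cross_join(); in older versions, a cross-join is requested by passing by = character() to a regular join function.

```r
library(dplyr)

# Hypothetical toy data, NOT from the WID
toy <- tibble(
  country = c("AA", "AA", "AA", "BB", "BB"),
  year    = c(2000, 2003, 2010, 2000, 2004),
  p90p100 = c(0.30, 0.36, 0.50, 0.40, 0.41)
)

pairs <- cross_join(toy, toy, suffix = c("_a", "_b")) %>%
  filter(
    country_a == country_b,      # same country on both sides
    year_b > year_a,             # keep each pair once, drop self-pairs
    year_b - year_a <= 5         # at most five years apart
  ) %>%
  mutate(change = abs(p90p100_b - p90p100_a)) %>%
  arrange(desc(change))
```

Requiring year_b > year_a avoids counting every pair twice and excludes pairs of a row with itself. Note that a cross-join grows with the square of the number of rows, so filtering early matters on larger tables.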