Exercises for Chapter 10

In these exercises, we use again the database dbtuning and the survey table we have created in the chapter, but without the indexes for the city and year variables. You can regenerate this table with

library(RPostgres)

db <- dbConnect(Postgres(),
  dbname = "dbtuning",
  user = "postgres",
  password = "pgpasswd"
)
dbExecute(db, "DROP TABLE IF EXISTS survey")
dbExecute(db, "CREATE TABLE survey AS
          (SELECT * FROM
            generate_series(1,100) AS person,
            generate_series(1,1000) AS city,
            generate_series(1970, 2020) AS year,
            random() AS result)")

We also create a cities table with additional (random) data about the 1000 cities in the survey:

dbExecute(db, "DROP TABLE IF EXISTS cities")
dbExecute(db, "CREATE TABLE cities AS
          (SELECT * FROM
            generate_series(1,1000) AS id,
            random() AS city_var)")

Exercise 1: Automatically Creating Identifiers

As we have seen, it is highly recommended to have unique identifiers for the records in a table. These identifiers are often integer numbers. In this exercise, we will show how PostgreSQL helps you create and maintain these identifiers. This is done by adding a field of type serial to a table. A serial is an integer field, which is automatically populated by the database with unique values. Let’s see how this works. Add a new column id to the survey table that is of type serial. Next, check the values of the column by computing is minimum and maximum. Now, add a new record to the table (you can choose arbitrary values). When you do so, simply omit the value for the new id field! Finally, retrieve the newly inserted record and check what id value it has.

Exercise 2: Indexed and Non-indexed Joins between Tables

In this exercise, we explore how indexes affect joins between tables. As described above, we have a cities table that contains additional data about the cities that our survey respondents live in. We would like to join the survey data with the city-level data based on the city field in the survey table, which corresponds to the id field in the cities table. For each of the following joins, write a simple SELECT statement that counts the number of rows in the result (SELECT count(*) FROM ... JOIN ...) and measure the execution time.

Join the survey table with cities
Create a primary key on the id field in cities, and execute the join again. What do you observe?
Configure the database such that city in the survey table is now a foreign key that points to id in cities, and execute the join.

How does the execution time change during steps 1-3? Do you have an explanation for this?

Exercise 3: Indexing Text Fields

In this exercise, we explore if PostgreSQL’s default indexes also help up speed up searching text fields. For this, add a new column respondent_name of type VARCHAR column to the survey table. We fill this column with random strings. One way to do this is to use an UPDATE statement and assign the random strings as follows: respondent_name = md5(random()::text). Then, for each of the following operations, write a simple SELECT statement that counts the number of rows in the result (SELECT count(*) FROM ... WHERE ...) and measure the execution time:

Respondent names that contain the string abc
Respondent names that match the string abc exactly

For this, recall that the LIKE operator may be useful.

Next, create an index on respondent_name, repeat the two operations, and measure again the execution time. What do you observe? Where is the index most helpful?

Exercise 4: Database Privileges at the Column Level

In this exercise, we will deal again with database privileges that we have covered in the chapter. Recall that we have set up an additional user other, and we gave this user the privilege to SELECT and to UPDATE the surveys table. At the end of the chapter, we revoked all these privileges, so the user can no longer access the table in any way:

dbExecute(db, "REVOKE ALL PRIVILEGES ON survey FROM other")

Let us now assume that user other serves as a research assistant in a project, and you would like this person to update the survey data. However, you would like to limit the changes this user can perform to a single column, such that the existing table remains unchanged. Can you use the PostgreSQL online documentation to find out how to do this? Hint: Since UPDATE requires the user to also reference the column names, you need to also grant SELECT privileges. As in the chapter, make sure to open again a second connection db1, which connects as user other.