Exercises for Chapter 10
In these exercises, we use again the database dbtuning
and the survey
table we have created in the chapter, but without the indexes for the city
and year
variables. You can regenerate this table with
library(RPostgres)
db <- dbConnect(Postgres(),
dbname = "dbtuning",
user = "postgres",
password = "pgpasswd"
)
dbExecute(db, "DROP TABLE IF EXISTS survey")
dbExecute(db, "CREATE TABLE survey AS
(SELECT * FROM
generate_series(1,100) AS person,
generate_series(1,1000) AS city,
generate_series(1970, 2020) AS year,
random() AS result)")
We also create a cities
table with additional (random) data about the 1000 cities in the survey:
dbExecute(db, "DROP TABLE IF EXISTS cities")
dbExecute(db, "CREATE TABLE cities AS
(SELECT * FROM
generate_series(1,1000) AS id,
random() AS city_var)")
Exercise 1: Automatically Creating Identifiers
As we have seen, it is highly recommended to have unique identifiers for the records in a table. These identifiers are often integer numbers. In this exercise, we will show how PostgreSQL helps you create and maintain these identifiers. This is done by adding a field of type serial
to a table. A serial
is an integer field, which is automatically populated by the database with unique values. Let’s see how this works. Add a new column id
to the survey
table that is of type serial
. Next, check the values of the column by computing is minimum and maximum. Now, add a new record to the table (you can choose arbitrary values). When you do so, simply omit the value for the new id
field! Finally, retrieve the newly inserted record and check what id
value it has.
Exercise 2: Indexed and Non-indexed Joins between Tables
In this exercise, we explore how indexes affect joins between tables. As described above, we have a cities
table that contains additional data about the cities that our survey respondents live in. We would like to join the survey data with the city-level data based on the city
field in the survey
table, which corresponds to the id
field in the cities
table. For each of the following joins, write a simple SELECT
statement that counts the number of rows in the result (SELECT count(*) FROM ... JOIN ...
) and measure the execution time.
- Join the
survey
table withcities
- Create a primary key on the
id
field incities
, and execute the join again. What do you observe? - Configure the database such that
city
in thesurvey
table is now a foreign key that points toid
incities
, and execute the join.
How does the execution time change during steps 1-3? Do you have an explanation for this?
Exercise 3: Indexing Text Fields
In this exercise, we explore if PostgreSQL’s default indexes also help up speed up searching text fields. For this, add a new column respondent_name
of type VARCHAR
column to the survey
table. We fill this column with random strings. One way to do this is to use an UPDATE
statement and assign the random strings as follows: respondent_name = md5(random()::text)
. Then, for each of the following operations, write a simple SELECT
statement that counts the number of rows in the result (SELECT count(*) FROM ... WHERE ...
) and measure the execution time:
- Respondent names that contain the string
abc
- Respondent names that match the string
abc
exactly
For this, recall that the LIKE
operator may be useful.
Next, create an index on respondent_name
, repeat the two operations, and measure again the execution time. What do you observe? Where is the index most helpful?
Exercise 4: Database Privileges at the Column Level
In this exercise, we will deal again with database privileges that we have covered in the chapter. Recall that we have set up an additional user other
, and we gave this user the privilege to SELECT and to UPDATE the surveys
table. At the end of the chapter, we revoked all these privileges, so the user can no longer access the table in any way:
dbExecute(db, "REVOKE ALL PRIVILEGES ON survey FROM other")
Let us now assume that user other
serves as a research assistant in a project, and you would like this person to update the survey data. However, you would like to limit the changes this user can perform to a single column, such that the existing table remains unchanged. Can you use the PostgreSQL online documentation to find out how to do this? Hint: Since UPDATE requires the user to also reference the column names, you need to also grant SELECT privileges. As in the chapter, make sure to open again a second connection db1
, which connects as user other
.