Data Management for Social Scientists: From Files to Databases

Last updated: 29 April 2025

Welcome to the companion website for the book! To get started, there are two steps you need to complete:

Set up the necessary software
Download the pre-configured project and the data examples.

Both steps are described below. Before you start working with this material, please go through Chapter 2 in the book to complete the project configuration!

Step 1: Software Setup

The book relies on three software tools: the R statistical toolkit, the RStudio development environment for R, and the PostgreSQL relational database system. Please refer to the following documents for detailed instructions on how to install these tools on your system.

Step 2: Download Project Configuration and Data Examples

The zipped archive containing the project configuration and the data examples is available at the following URL:

https://github.com/nilsbw/dmbook-setup/archive/refs/heads/main.zip

Simply unzip the archive and move all the files to a location of your choice. This location will be your working directory. Some important notes:
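If you prefer to download and unpack the archive from the R console instead of by hand, the following is a minimal sketch using base R's download.file() and unzip(). The target folder ~/dmbook is only an example (an assumption; choose any location you like). GitHub archives typically unpack into a single top-level folder (presumably dmbook-setup-main here), so the sketch lists what was extracted instead of hard-coding that name.

```r
# Minimal sketch: download the companion archive and unpack it with base R.
# The target folder "~/dmbook" is an example only; pick any location you like.
url <- "https://github.com/nilsbw/dmbook-setup/archive/refs/heads/main.zip"
target_dir <- path.expand("~/dmbook")

zip_path <- tempfile(fileext = ".zip")
# mode = "wb" writes the file in binary mode, which matters on Windows
download.file(url, destfile = zip_path, mode = "wb")

dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
# unzip() invisibly returns the paths of all extracted files
extracted <- unzip(zip_path, exdir = target_dir)

# Show the top-level folder(s) created by the archive
print(list.dirs(target_dir, recursive = FALSE))
```

The folder that contains dmbook.Rproj is the one to use as your working directory.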

Project File

The dmbook.Rproj file contains the project configuration to be used with RStudio.

Project Environment

The R project is configured to use the extension packages required for the book. This configuration relies on the renv package and on the information stored in the renv.lock file. You do not need to do anything with this file: Chapter 2 in the book describes how to install the required packages automatically with renv. If, for some reason, this does not work, you can run the code in install-all-packages.R to install the packages manually.
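Chapter 2 remains the authoritative description of this step. As a rough sketch of what it involves, assuming the working directory is the project folder (for example, after opening dmbook.Rproj in RStudio), renv::restore() installs the package versions recorded in renv.lock, and install-all-packages.R serves as the manual fallback. Wrapping the call in tryCatch() and falling back automatically is a choice made for this sketch, not necessarily the procedure used in the book.

```r
# Sketch only: restore the package versions recorded in renv.lock.
# Assumes the working directory is the project folder containing renv.lock.
restored <- tryCatch({
  renv::restore(prompt = FALSE)  # prompt = FALSE skips the confirmation dialog
  TRUE
}, error = function(e) {
  message("renv::restore() failed: ", conditionMessage(e))
  FALSE
})

# Fallback: install the packages manually with the script shipped in the project
if (!restored) {
  if (file.exists("install-all-packages.R")) {
    source("install-all-packages.R")
  } else {
    message("install-all-packages.R not found; is the working directory the project folder?")
  }
}
```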

Note that for many of the required packages, the project does not use the most recent versions. This is intentional, because it helps to keep the R environment aligned with the package versions discussed in the book. When you start a new R project, however, make sure to use a separate R environment with the most recent package versions.

Data

The downloaded archive also contains the data files needed for the code and exercises presented in the book. Data files are organized by chapter (ch04, etc.). For each chapter, the references for the datasets are listed in the REFERENCES.md file.

The data preparation is documented in prepare-data.R. Readers of the book do not need to run this script, because the required data files are already included in this repository. The script downloads each external dataset to a subfolder of the raw directory, applies the necessary modifications, and copies the final files to one of the data subdirectories (ch04, etc.).

Many of the original datasets were modified to facilitate their presentation in the book. Modifications include dropping variables or cases, renaming files, and changing file formats. See the data preparation script for the modifications applied to each dataset.
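To make the workflow of the data preparation script more concrete, here is a hypothetical sketch of the same pattern: download a file into a subfolder of raw, modify it, and write the result to a chapter folder. The function prepare_dataset(), its arguments, the folder names and the example modifications (dropping cases and variables, renaming the output file) are assumptions for illustration and are not taken from prepare-data.R. The example at the end uses a small local CSV file through a file:// URL, so it runs without a network connection.

```r
# Hypothetical sketch of the raw -> modify -> chapter-folder pattern.
# Names, arguments and modifications are illustrative assumptions,
# not code from prepare-data.R.
prepare_dataset <- function(url, raw_dir, out_dir, out_name,
                            keep_vars, keep_rows = NULL) {
  # 1. Download the external file into a subfolder of the raw directory
  dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
  raw_file <- file.path(raw_dir, basename(url))
  download.file(url, destfile = raw_file, mode = "wb", quiet = TRUE)

  # 2. Apply modifications: keep selected cases, then selected variables
  d <- read.csv(raw_file)
  if (!is.null(keep_rows)) d <- d[keep_rows(d), , drop = FALSE]
  d <- d[, keep_vars, drop = FALSE]

  # 3. Write the final file, possibly under a new name, to the chapter folder
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, out_name)
  write.csv(d, out_file, row.names = FALSE)
  out_file
}

# Example: a small local CSV stands in for an external dataset
src <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:4, age = c(25, 40, 17, 63), income = c(10, 20, 30, 40)),
          src, row.names = FALSE)
root <- tempdir()
out <- prepare_dataset(
  url = paste0("file://", normalizePath(src)),
  raw_dir = file.path(root, "raw", "example"),
  out_dir = file.path(root, "ch04"),
  out_name = "example.csv",
  keep_vars = c("id", "age"),
  keep_rows = function(d) d$age >= 18  # drop cases: keep adults only
)
result <- read.csv(out)
print(result)
```

Keeping the untouched download in raw and writing only the modified copy to the chapter folder means every modification can be re-run from the original file.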

Issues

If you encounter an error in the data or configuration provided here, please file an issue at https://github.com/nilsbw/dmbook-setup/issues.