Data Management for Social Scientists: From Files to Databases
Last updated: 25 July 2023
Welcome to the companion website for the book! To get started, there are two steps you need to complete:
- Set up the necessary software.
- Download the pre-configured project and the data examples.
Both steps are described below. Before you start working with this material, please go through Chapter 2 in the book to complete the project configuration!
Step 1: Software Setup
The book relies on several software tools: the R statistical toolkit, the RStudio development environment for R, and the PostgreSQL relational database system. Please refer to the following documents for detailed instructions on how to install these tools on your system.
Step 2: Download Project Configuration and Data Examples
The zipped archive containing the project configuration and the data examples is available at
Simply unzip the archive and move all the files to a location of your choice. This will be your working directory. Some important notes:
The dmbook.Rproj file contains the project configuration to be used with RStudio.
The R project is configured such that it comes with a list of extension packages you need for the book. This configuration uses the renv package and requires the information stored in the renv.lock file. You do not need to do anything with this file; Chapter 2 in the book describes how to automatically install the required packages with renv. If, for some reason, this does not work, you can run the code in install-all-packages.R to install the packages manually.
Note that for many of the required packages, the project does not use the most recent versions. This is intentional, because it helps to keep the R environment aligned with the package versions discussed in the book. When you start a new R project, however, make sure to use a separate R environment with the most recent package versions.
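As a rough sketch of the two installation routes described above, the following R snippet first tries renv and then falls back to the manual script. It assumes that the working directory is the unzipped project folder and that renv::restore() is the relevant call; Chapter 2 in the book remains the authoritative description of the setup.

```r
# Sketch only: run from the project folder that contains renv.lock.
# Assumption: renv::restore() installs the package versions recorded in
# renv.lock; Chapter 2 in the book may describe this step differently.
restored <- FALSE

if (file.exists("renv.lock") && requireNamespace("renv", quietly = TRUE)) {
  restored <- tryCatch({
    renv::restore(prompt = FALSE)  # install the versions pinned in renv.lock
    TRUE
  }, error = function(e) {
    message("renv::restore() failed: ", conditionMessage(e))
    FALSE
  })
}

# Fallback named on this page: install the packages manually.
if (!restored && file.exists("install-all-packages.R")) {
  source("install-all-packages.R")
}
```

If neither renv.lock nor install-all-packages.R is found, the snippet does nothing, which usually means the working directory is not the project folder.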
The downloaded archive also contains the necessary data files for the code and exercises presented in the book. Data files are organized by chapter (ch04 etc.). For each chapter, the references to the datasets are provided in the
The data preparation is documented in prepare-data.R. For readers of this book, there is no need to run this file, since the required data files can be obtained directly from this repository. The script downloads each external dataset to a subfolder under the raw directory, applies the necessary modifications, and copies the final files to one of the data subdirectories (
Many of the original datasets are modified to facilitate presentation in the book. Modifications include the dropping of variables or cases, the renaming of files, or changes in the file format. See the code in the data preparation script for the modifications applied to the data.
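To illustrate the kinds of modifications listed above, here is a minimal sketch in base R. The data frame, its column names, and the output file name are made up for illustration and do not correspond to any dataset in the book; see prepare-data.R for the steps actually applied.

```r
# Hypothetical stand-in for a raw dataset (not one of the book's files).
raw <- data.frame(
  country = c("A", "B", "C"),
  year    = c(2000, 2001, 2002),
  gdp     = c(1.0, 2.5, 3.1),
  notes   = c("x", "y", "z")
)

# Drop cases (rows before 2001) and variables (the notes column).
prepared <- raw[raw$year >= 2001, c("country", "year", "gdp")]

# Write the result under a new name and in CSV format, into a
# chapter-style subdirectory of a temporary folder.
out_dir <- file.path(tempdir(), "ch04")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "example.csv")
write.csv(prepared, out_file, row.names = FALSE)
```

Writing to tempdir() keeps the sketch from touching the real data folders in the working directory.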
If you encounter an error in the data and configuration provided here, please file an issue at https://github.com/nilsbw/dmbook-setup/issues.