Exercises for Chapter 4

Exercise 1: Problems in CSV files

In the repository for the exercises for this chapter, you will find three modified versions of the capital distance dataset used above: file1.csv, file2.csv and file3.csv. Importing these files with read.csv() causes problems, because they do not conform to a proper CSV format. For each file, find out what the problems are and how you can fix them. Sometimes you need to edit the CSV file manually, but in some cases you can adjust the arguments of read.csv() so that the original file remains unchanged (which is usually preferable).
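As a rough illustration of the second approach (adjusting the import call rather than the file), the sketch below builds a small, made-up file that uses semicolons as separators. The file contents and column names are hypothetical and are not taken from the exercise files, whose problems may well be different.

```r
# Hypothetical example: a semicolon-separated file (not one of the exercise files)
tmp <- tempfile(fileext = ".csv")
writeLines(c("city;distance", "Berlin;100", "Paris;250"), tmp)

# With default settings, read.csv() splits on commas,
# so each line ends up in a single column
wrong <- read.csv(tmp)
ncol(wrong)

# Passing the separator to read.csv() leaves the file itself untouched
fixed <- read.csv(tmp, sep = ";")
str(fixed)
```

Other arguments of read.csv(), such as header, skip, quote and dec, can be adjusted in the same way when a file deviates from the default format in other respects.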

Exercise 2: File encodings and special characters

In the file turkish-cities.csv, you will find a list of the ten largest cities in Turkey and their populations, taken from the GeoNames project. The city names are given both in English transliteration and in Turkish, the latter of which includes a number of special characters. However, this file uses an unknown encoding, which causes many of the city names to become garbled when you import the file using read.csv(). Can you find out what the file encoding is? Can you adjust the import function so that the file is read correctly? Hint: the guess_encoding() function can point you in the right direction (the ISO standard), but this standard consists of several parts. You need to find out yourself which part is the correct one for Turkish, since the function fails to recognize it properly.
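The mechanics of declaring an encoding at import time can be sketched with a made-up file. The example below writes a single German city name in ISO-8859-1 (Latin-1), which is deliberately not the encoding needed for the Turkish file, and then reads it back with the fileEncoding argument of read.csv(). The file contents are hypothetical, and the sketch assumes an R session running in a UTF-8 locale.

```r
# Hypothetical example: a small file written byte by byte in ISO-8859-1,
# where the byte 0xfc stands for the letter "u" with umlaut
tmp <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("name,population\nZ"), as.raw(0xfc), charToRaw("rich,400000\n")),
  tmp
)

# Declaring the file's encoding lets R convert the text while reading
# (assumes the session itself uses UTF-8)
cities <- read.csv(tmp, fileEncoding = "ISO-8859-1")
cities
```

For the exercise, the value passed to fileEncoding would be the part of the ISO standard that covers Turkish; the names your system accepts can be listed with iconvlist().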

Exercise 3: Labeling variables

In this exercise, we again use the UN Security Council membership data from Chapter 4, stored in the file unsc-membership.xls. Load the data, label the variables (the labelled package is useful here) and save the file in Stata format. Then re-open it and check whether the labels are still there (or check with Stata if you have a license).
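The labeling and round trip can be sketched on a toy data frame. The column names and label texts below are hypothetical and do not reflect the actual structure of unsc-membership.xls, and the use of the haven package for writing and reading Stata files is an assumption about the tooling.

```r
library(labelled)  # var_label() for variable labels
library(haven)     # write_dta() / read_dta(); assumed here for Stata files

# Hypothetical stand-in for the membership data
membership <- data.frame(
  country = c("A", "B"),
  year = c(1946L, 1947L)
)

# Attach a label to each variable
var_label(membership) <- list(
  country = "Country name",
  year = "Year of membership"
)

# Save in Stata format, re-open, and inspect the labels
tmp <- tempfile(fileext = ".dta")
write_dta(membership, tmp)
reread <- read_dta(tmp)
var_label(reread)
```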

Exercise 4: Guessing file types

In the repository for the exercises for this chapter, you will find three files without a proper file extension: unknown-file1, unknown-file2 and unknown-file3. All of them have the same content (a simple table with three columns and six rows). The purpose of this exercise is to find out what the file types are, and to import them correctly into R using the appropriate functions introduced in the chapter. Sometimes it helps to look at the file in RStudio’s text editor, which can give you clues about the file type. If this fails, you can try the different import functions we have discussed in the chapter. While files should have a proper extension to clearly indicate the file type, is this necessary for R’s import functions to work properly?
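Besides opening a file in a text editor, its first lines or bytes can also be inspected from within R. The sketch below does this for a made-up file without an extension; the file and its contents are hypothetical and say nothing about the types of the three exercise files.

```r
# Hypothetical example: a file without an extension, created here as plain CSV
tmp <- tempfile()
write.csv(data.frame(a = 1:2, b = c("x", "y")), tmp, row.names = FALSE)

# Text-based formats usually show readable content in their first lines
first_lines <- readLines(tmp, n = 2)
first_lines

# Binary formats are easier to judge from their leading bytes,
# which often contain a recognizable signature
first_bytes <- readBin(tmp, what = "raw", n = 8)
first_bytes
```

If readLines() returns unreadable characters or a warning, that is a hint that the file is in a binary format and that the leading bytes are the better clue.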