Tutorial: Using R to Convert File Types
Carolina Data Desk recently received a request from a reporter who wanted to convert .dta files (data files used with Stata software) to .csv files. In this post, we’ll walk through how to do this using R.
Reading in the Existing Files
Our first order of business is reading our existing data into R. R has several built-in functions for reading in data files. If you type “read.” in RStudio, you will see a list of suggested functions like read.csv and read.delim. If you are trying to convert .csv, .txt, or other file formats covered in the list of read functions, you’re in good shape.
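To make the built-in readers concrete, here is a minimal sketch that writes two tiny files to a temporary location and reads them back with read.csv and read.delim. The data, the variable names, and the temporary files are all made up for illustration; in practice you would pass the path to your own file.

```r
# Create two small example files in a temporary location (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,2.5", "2,4.0"), csv_path)

tsv_path <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue", "1\t2.5", "2\t4.0"), tsv_path)

# read.csv expects comma-separated values; read.delim expects tab-separated values.
# Both ship with base R, so no extra packages are needed.
scores_csv <- read.csv(csv_path)
scores_tab <- read.delim(tsv_path)

# Look at the structure of what was read in.
str(scores_csv)
str(scores_tab)
```

Both calls return a data frame, which is the object the rest of this tutorial works with.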
If you are wondering why we mentioned converting .dta files when there is no read.dta function in base R, don’t worry! We can add this function by loading the foreign package. Because R is an open-source programming language, developers create add-ons called packages that provide additional functionality. To use this package, you first have to install it with the command install.packages("foreign"). You only need to install a package once; when you want to use it later in a new R session, load it with the command library(foreign). Now, when you type “read.” in RStudio, read.dta appears in the list of suggestions.
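The install-once, load-every-session pattern can be sketched as follows. The requireNamespace check is our own addition rather than part of the original request: it skips the install when the package is already on your machine, which is often the case because foreign ships with many R installations.

```r
# Install foreign only if it is not already available.
# (Installing requires a CRAN mirror and an internet connection.)
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}

# Load the package for the current session.
library(foreign)

# Confirm that read.dta is now available.
print(exists("read.dta"))
```

One caveat worth knowing: the foreign documentation describes read.dta as reading Stata version 5 to 12 files, so a file saved by a newer Stata release may need a different reader, such as read_dta from the haven package.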
To read in an existing file, you just need to run the command read.dta("filename.dta"). Depending on where your file is stored and what working directory RStudio is using, you may need to include the entire file path, for example read.dta("/Users/Documents/filename.dta"). Run on its own, the read command just prints the data in the console, so you need to save the result to a variable using R’s assignment operator, <-. It’s probably best if this variable name is similar to the file name to avoid confusion later. The full command for reading in a file and assigning it to a variable should look like this:
variable_name <- read.dta("/Users/Documents/filename.dta")
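If the read fails, the usual culprit is the file path. The sketch below shows one way to check the working directory and the path before reading. So that it runs anywhere, it first creates a stand-in .dta file in a temporary folder with write.dta (also from foreign); the data and the name filename_data are hypothetical, and you would point dta_path at your own file instead.

```r
library(foreign)

# Build a stand-in .dta file so the example runs anywhere (hypothetical data).
dta_path <- file.path(tempdir(), "filename.dta")
write.dta(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)), dta_path)

# Relative file names are resolved against the working directory.
print(getwd())

# Confirm the file exists at the path we are about to read.
print(file.exists(dta_path))

# Read the file and keep the result in a variable named after the file.
filename_data <- read.dta(dta_path)

# Preview the first rows instead of printing the whole data set.
head(filename_data)
```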
Exporting to a Different File Format
Now that we’ve imported our data into R, one step remains to finish the conversion: exporting it with the write.csv function. At a minimum, this function takes two inputs: a variable containing the data you want to export, and the name you want to give your exported file. The variable is the one we assigned in the first step when we imported the data, and the file name can be whatever you want it to be. I would generally recommend keeping the same file name as the original, with the new .csv extension, so you will know exactly where the data came from, but use your discretion. The command you will use should look something like this:
write.csv(variable_name, "/Users/Documents/filename.csv")
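Here is a small, self-contained sketch of the export step. The data frame stands in for whatever you read in earlier, and the temporary output path is hypothetical. The row.names = FALSE argument is an optional extra we are suggesting, not a requirement: without it, write.csv adds a leading column of row numbers to the file.

```r
# Stand-in for the data frame read in during the previous step (hypothetical data).
variable_name <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write to a temporary folder so the example does not touch your own files.
csv_path <- file.path(tempdir(), "filename.csv")

# row.names = FALSE leaves out the column of row numbers that write.csv adds by default.
write.csv(variable_name, csv_path, row.names = FALSE)

# Read the file back in to confirm the export worked.
check <- read.csv(csv_path)
print(check)
```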
That’s it! You can repeat this for other files and other file types as necessary. And while all of this can be done in the console of RStudio, you can also save your commands in an R script if you want to keep a record of what you’ve done.
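If you have many files to convert, the same two commands can be wrapped in a short script. The sketch below is one possible approach rather than the only one: convert_dta_folder is a hypothetical helper we wrote for this example, and it assumes every .dta file in the folder can be read by read.dta. The demo at the bottom creates its own small files in a temporary folder so it can run anywhere.

```r
library(foreign)

# Hypothetical helper: convert every .dta file in a folder to a .csv file
# with the same base name, saved in the same folder.
convert_dta_folder <- function(folder) {
  dta_files <- list.files(folder, pattern = "\\.dta$", full.names = TRUE)
  csv_files <- sub("\\.dta$", ".csv", dta_files)
  for (i in seq_along(dta_files)) {
    data <- read.dta(dta_files[i])
    write.csv(data, csv_files[i], row.names = FALSE)
  }
  # Return the paths of the files that were written, without printing them.
  invisible(csv_files)
}

# Demo with two stand-in files in a temporary folder (hypothetical data).
demo_folder <- file.path(tempdir(), "dta_demo")
dir.create(demo_folder, showWarnings = FALSE)
write.dta(data.frame(id = 1:2), file.path(demo_folder, "first.dta"))
write.dta(data.frame(id = 3:4), file.path(demo_folder, "second.dta"))

converted <- convert_dta_folder(demo_folder)
print(basename(converted))
```

Saving something like this as an R script gives you both a record of what you did and a tool you can rerun the next time a batch of files arrives.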