Importing SAV files into R with haven

In this short article I introduce the haven package for R and show how to use it to import SPSS data stored in SAV format. In previous posts I demonstrated how to read data into R from formats such as csv, json, xlsx and xml. I have also written articles on using sqlite3 from R and from Python, including how to connect Python to an sqlite3 database.

Analysts can use the haven package in R to read and process data from SAS, SPSS and Stata. The package provides a separate function for each format:

- read_sas() for reading data in SAS format
- read_sav() for reading data in SAV format, from SPSS
- read_dta() for reading data in dta format, from Stata
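As a minimal sketch of how these readers are called, the snippet below writes a small made-up data frame to temporary SPSS and Stata files with haven's write_sav() and write_dta(), then reads each file back with the matching reader. The data frame and its column names (id, score) are assumptions for illustration only and are not part of the data set used in this article.

```r
library(haven)

# Hypothetical example data; column names are made up for illustration.
df <- data.frame(id = c(1, 2, 3), score = c(4.5, 3.2, 5.0))

# Write the same data to temporary SPSS (.sav) and Stata (.dta) files.
sav_path <- tempfile(fileext = ".sav")
dta_path <- tempfile(fileext = ".dta")
write_sav(df, sav_path)
write_dta(df, dta_path)

# Read each file back with the reader that matches its format.
from_spss  <- read_sav(sav_path)
from_stata <- read_dta(dta_path)

# Both readers return a tibble with the original columns.
names(from_spss)
names(from_stata)
```

read_sas() is called the same way, with the path to a SAS data file as its first argument.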

Below I load the haven package (installing it first if necessary) and read in an example SAV file using the read_sav() function.

#install.packages("haven")
library(haven)
## Warning: package 'haven' was built under R version 3.6.3
data <- read_sav("data.sav")
head(data)
## # A tibble: 6 x 35
##   CollectionID Country Countrycode Location Region Site  Latitude Longitude
##          <dbl> <chr>   <chr>       <chr>    <chr>  <chr>    <dbl>     <dbl>
## 1        63079 France  FR          Thiverv~ Yveli~ N/A       48.9      1.92
## 2        63081 France  FR          Monchy ~ Somme  N/A       49.8      3.05
## 3        63086 France  FR          Goudelin Côtes~ N/A       48.5     -3.02
## 4        63089 France  FR          Fiefs    Pas-d~ N/A       50.4      2.33
## 5        63090 France  FR          Villene~ Esson~ N/A       48.4      2.25
## 6        63099 France  FR          Tatingh~ Pas-d~ N/A       50.7      2.21
## # ... with 27 more variables: Long_BIN <dbl>, Lat_BIN <dbl>,
## #   Long_Lat_Bin <dbl>, Envi_BIN <chr>, Cultivar <chr>, Geneticgroup <S3:
## #   haven_labelled>, Racename <chr>, Yr_complexity <dbl>, miss_Yr <dbl>,
## #   Yr1 <dbl>, Yr2 <dbl>, Yr3 <dbl>, Yr4 <dbl>, Yr6 <dbl>, Yr7 <dbl>,
## #   Yr8 <dbl>, Yr9 <dbl>, Yr10 <dbl>, Yr15 <dbl>, Yr17 <dbl>, Yr24 <dbl>,
## #   Yr25 <dbl>, Yr27 <dbl>, Yr32 <dbl>, YrSp <dbl>, YrAvS <dbl>,
## #   YrAmb <dbl>

The data are read in as a tibble, as demonstrated below.

class(data)
## [1] "tbl_df"     "tbl"        "data.frame"

When haven reads data from, for example, a SAV file, variables with value labels are imported as vectors of class haven_labelled (visible for the Geneticgroup column in the output above), which keep the underlying values together with their labels. This preserves the original semantics. Labelled vectors can be turned into factors using as_factor().
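The following sketch illustrates the conversion on a small hand-made vector rather than on the article's data set. It builds a labelled vector with haven's labelled() constructor and converts it with as_factor(). The values and the labels "low" and "high" are assumptions chosen for illustration.

```r
library(haven)

# Hypothetical labelled vector: the values 1 and 2 carry the labels "low" and "high".
x <- labelled(c(1, 2, 1, 2), labels = c(low = 1, high = 2))

# The underlying values stay numeric; the labels are stored as an attribute.
class(x)
attr(x, "labels")

# as_factor() replaces each value with its label and returns an ordinary factor.
f <- as_factor(x)
levels(f)
```

Keeping the labelled vector until the analysis stage, and converting only then, avoids losing the original numeric codes too early.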

The haven package does NOT convert character vectors into factors.

When reading in data with the haven package, dates and times are converted to R's date and time classes (for example Date and POSIXct).
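Both behaviours can be checked with a small round trip. The sketch below writes a made-up data frame with a character column and a Date column to a temporary SAV file and reads it back. The column names (name, visited) and the values are assumptions for illustration only.

```r
library(haven)

# Hypothetical data with one character column and one Date column.
df <- data.frame(
  name    = c("a", "b"),
  visited = as.Date(c("2020-01-15", "2020-02-20")),
  stringsAsFactors = FALSE
)

# Round trip through a temporary SPSS file.
path <- tempfile(fileext = ".sav")
write_sav(df, path)
back <- read_sav(path)

# The character column is still character (not a factor),
# and the date column comes back as an R Date.
class(back$name)
class(back$visited)
```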
