Making a Wrapper to Import and Plot Data

To make research reproducible, we have to automate the way R performs tasks.

Paulo Ferreira Naibert https://github.com/pfnaibert/
2020-08-18

Last Updated 2020-09-03

In my last post, I showed how to import a .csv file and plot the data. That meant running each step outlined in the post by hand. In this post, I will write wrapper functions that automate the importing, formatting, and plotting.

Import Wrapper

Recall that I already downloaded the data and saved the only column I need to a .csv file. Our wrapper function is:


import_gdp <- function(filename)
{

# import csv
data  <- read.csv(filename)

# delete the last 2 lines
data  <- head(data, -2)

# save dates (first column)
dates <- data[,1]

# drop the first column, keeping only the GDP values
data  <- data[,-1]

# convert date labels such as "1980 T1" into "1980:Q1"
dates <- gsub(" T", ":Q", dates)

# give names to values
names(data) <- dates

# `ts` object: quarterly, from 1980:Q1 to 2014:Q3
data.ts <- ts(data, start=c(1980, 1), end=c(2014, 3), frequency=4)

return(data.ts)
}
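The date handling inside import_gdp() can be checked on its own. The sketch below assumes the raw labels follow the pattern "1980 T1" (T for trimestre, Portuguese for quarter), which is what the gsub() call and the printed output imply; the four values are the first four observations of the series printed below.

```r
# Raw labels, assumed to follow the "YYYY Tq" pattern of the original file
raw_dates <- c("1980 T1", "1980 T2", "1980 T3", "1980 T4")
values    <- c(70.6, 76.3, 75.5, 73.2)

# Same substitution as in import_gdp(): "1980 T1" becomes "1980:Q1"
qtr_labels <- gsub(" T", ":Q", raw_dates)
names(values) <- qtr_labels

# ts() keeps the names and attaches the quarterly time attributes
values_ts <- ts(values, start = c(1980, 1), frequency = 4)
print(values_ts)
```

Because ts() keeps the names, head(y) can show labels such as 1980:Q1, while printing the whole series uses the Qtr1 to Qtr4 table layout.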

Now I can use import_gdp() to import the file and format it the way I want in a single call:


# check loaded objects
ls()

[1] "import_gdp"

# import data
y <- import_gdp("../../data/PIB_BASE_2000.csv")

# quick visualization
head(y)

1980:Q1 1980:Q2 1980:Q3 1980:Q4 1981:Q1 1981:Q2 
   70.6    76.3    75.5    73.2    70.5    74.3 

# print whole series
y

      Qtr1  Qtr2  Qtr3  Qtr4
1980  70.6  76.3  75.5  73.2
1981  70.5  74.3  71.0  67.3
1982  67.3  74.8  73.7  69.5
1983  65.2  71.9  71.1  68.8
1984  68.0  75.3  74.9  73.8
1985  72.6  79.4  81.8  81.1
1986  77.9  85.4  88.3  86.9
1987  83.8  91.1  88.8  86.7
1988  83.8  90.8  90.8  84.8
1989  81.5  93.8  95.5  90.4
1990  83.6  85.2  91.8  85.0
1991  83.5  87.7  91.0  87.0
1992  80.8  85.9  90.2  90.6
1993  84.7  89.7  94.7  94.7
1994  87.6  91.6  99.9 103.9
1995  96.5  99.8 101.7 102.1
1996  95.6 100.8 107.8 104.4
1997  99.2 105.8 109.5 107.9
1998 100.0 107.4 109.4 105.8
1999 100.5 106.5 108.3 108.2
2000 105.3 110.7 112.9 112.9
2001 109.0 113.3 113.2 112.2
2002 109.1 115.4 117.4 117.6
2003 111.7 116.4 118.1 118.6
2004 116.4 123.6 125.5 125.9
2005 121.2 129.0 128.2 128.5
2006 126.5 131.5 134.3 134.8
2007 133.0 139.9 142.4 143.8
2008 141.4 148.9 152.5 145.1
2009 137.5 145.4 150.3 152.9
2010 150.4 158.1 160.7 161.0
2011 156.8 163.3 164.1 163.2
2012 158.0 164.2 165.7 166.2
2013 161.0 169.9 169.7 169.8
2014 164.1 168.4 169.3      

# print series time
time(y)

        Qtr1    Qtr2    Qtr3    Qtr4
1980 1980.00 1980.25 1980.50 1980.75
1981 1981.00 1981.25 1981.50 1981.75
1982 1982.00 1982.25 1982.50 1982.75
1983 1983.00 1983.25 1983.50 1983.75
1984 1984.00 1984.25 1984.50 1984.75
1985 1985.00 1985.25 1985.50 1985.75
1986 1986.00 1986.25 1986.50 1986.75
1987 1987.00 1987.25 1987.50 1987.75
1988 1988.00 1988.25 1988.50 1988.75
1989 1989.00 1989.25 1989.50 1989.75
1990 1990.00 1990.25 1990.50 1990.75
1991 1991.00 1991.25 1991.50 1991.75
1992 1992.00 1992.25 1992.50 1992.75
1993 1993.00 1993.25 1993.50 1993.75
1994 1994.00 1994.25 1994.50 1994.75
1995 1995.00 1995.25 1995.50 1995.75
1996 1996.00 1996.25 1996.50 1996.75
1997 1997.00 1997.25 1997.50 1997.75
1998 1998.00 1998.25 1998.50 1998.75
1999 1999.00 1999.25 1999.50 1999.75
2000 2000.00 2000.25 2000.50 2000.75
2001 2001.00 2001.25 2001.50 2001.75
2002 2002.00 2002.25 2002.50 2002.75
2003 2003.00 2003.25 2003.50 2003.75
2004 2004.00 2004.25 2004.50 2004.75
2005 2005.00 2005.25 2005.50 2005.75
2006 2006.00 2006.25 2006.50 2006.75
2007 2007.00 2007.25 2007.50 2007.75
2008 2008.00 2008.25 2008.50 2008.75
2009 2009.00 2009.25 2009.50 2009.75
2010 2010.00 2010.25 2010.50 2010.75
2011 2011.00 2011.25 2011.50 2011.75
2012 2012.00 2012.25 2012.50 2012.75
2013 2013.00 2013.25 2013.50 2013.75
2014 2014.00 2014.25 2014.50        

Extensions

In this function, I could have added arguments for the dates and the frequency of the data. I won't do this today, but I do intend to write a follow-up post on how to scrape the Brazilian GDP data from the SIDRA-IBGE site.
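A version with those arguments could look like the sketch below. import_series() is a hypothetical name, and the file layout (a label column, a value column and a fixed number of footer lines) is assumed from import_gdp(). It also leaves out the end argument on purpose: when ts() receives both start and end and the number of rows does not match, it silently truncates or recycles the data instead of raising an error.

```r
# Hypothetical generalization of import_gdp(): start date, frequency and
# number of footer lines become arguments; ts() infers the end date.
import_series <- function(filename, start = c(1980, 1), frequency = 4,
                          footer = 2)
{
  # import csv and drop the footer lines
  data <- read.csv(filename)
  if (footer > 0) data <- head(data, -footer)

  # assumed layout: first column holds date labels, second column the values
  dates  <- data[, 1]
  values <- data[, 2]
  names(values) <- gsub(" T", ":Q", dates)

  # no `end`: ts() computes it as start + (n - 1) / frequency
  ts(values, start = start, frequency = frequency)
}

# Usage with a small temporary file in the assumed layout
# (hypothetical header and footer lines, first five observations of the series)
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,gdp",
             "1980 T1,70.6", "1980 T2,76.3", "1980 T3,75.5",
             "1980 T4,73.2", "1981 T1,70.5",
             "footer line 1,", "footer line 2,"), tmp)
z <- import_series(tmp)
end(z)  # last observation: 1981, quarter 1
```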

Plot Wrapper

I can also write a wrapper for the plotting script:


plot_gdp <- function(data, rec_dates)
{

# blank plot (title: "Quarterly Chronology of Brazilian Business Cycles")
plot(NULL,
 main = "Cronologia Trimestral dos Ciclos de Negócios Brasileiros",
 xlim = c(1980, 2015), ylim = c(60,200),
 xlab = "", ylab = "")

# recession rectangles: rec_dates holds (start, end) pairs
for(i in seq_len(length(rec_dates) %/% 2)) rect(rec_dates[2*i - 1], 1, rec_dates[2*i], 250, col="gray", border=TRUE)

# gdp line, drawn on top of the rectangles
lines(data, col="blue", lwd=1.5)
}
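plot_gdp() hard-codes the axis limits and the title. A variant that takes the limits from the series itself could look like the sketch below; plot_gdp_auto() and its main argument are hypothetical, and the rectangles use par("usr") so they always span the full height of the plot region instead of relying on the fixed values 1 and 250.

```r
# Hypothetical variant of plot_gdp(): axis limits come from the data
plot_gdp_auto <- function(data, rec_dates, main = "")
{
  # x limits from the time index, y limits from the observed values
  xlim <- range(time(data))
  ylim <- range(data, na.rm = TRUE)

  # blank plot with the computed limits
  plot(NULL, main = main, xlim = xlim, ylim = ylim, xlab = "", ylab = "")

  # par("usr") holds the plot region as c(x1, x2, y1, y2)
  usr <- par("usr")

  # one gray rectangle per (start, end) pair, spanning the full height
  for (i in seq_len(length(rec_dates) %/% 2))
    rect(rec_dates[2 * i - 1], usr[3], rec_dates[2 * i], usr[4], col = "gray")

  # series drawn last so it sits on top of the rectangles
  lines(data, col = "blue", lwd = 1.5)

  # return the limits invisibly so they can be inspected
  invisible(list(xlim = xlim, ylim = ylim))
}
```

With y and the numeric recession dates built in the next step, the call would be plot_gdp_auto(y, newrecs).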

The plot_gdp() function uses rec_dates to draw the recession rectangles: odd elements are start dates and even elements are end dates, given as numbers on the same scale as time(y). So I must create a vector of recession limit dates, which I take from this document.


# function to convert "YYYY:Qq" strings into numeric dates, e.g. "1981:Q2" becomes 1981.25
qtr2num <- function(data) return( as.numeric(substring(data, 1, 4)) - 1/4 + as.numeric( substring(data, 7) )/4 )

# recession limits as (start, end) pairs of quarters

recs <- c( "1981:Q1", "1983:Q1",
 "1987:Q3", "1988:Q3",
 "1989:Q3", "1992:Q1",
 "1995:Q2", "1995:Q3",
 "1998:Q1", "1999:Q1",
 "2001:Q2", "2001:Q4",
 "2003:Q1", "2003:Q2",
 "2008:Q4", "2009:Q1" )

saveRDS(recs, "../../data/gdp-recessions.rds")

newrecs <- qtr2num(recs)
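qtr2num() is meant to produce the same decimal times that time(y) printed above, where quarter q of year Y sits at Y + (q - 1)/4; for example, "2008:Q4" becomes 2008.75. A quick self-contained check on a toy quarterly series (not the real data), repeating qtr2num() so the block runs on its own:

```r
# same helper as above: "YYYY:Qq" -> YYYY - 1/4 + q/4 = YYYY + (q - 1)/4
qtr2num <- function(data) return( as.numeric(substring(data, 1, 4)) - 1/4 + as.numeric( substring(data, 7) )/4 )

# toy quarterly series covering 1980:Q1 to 1981:Q4
toy <- ts(1:8, start = c(1980, 1), frequency = 4)

# labels in the same format import_gdp() produces
qtr_labels <- paste0(rep(1980:1981, each = 4), ":Q", rep(1:4, times = 2))

# qtr2num() should reproduce time(toy)
all.equal(qtr2num(qtr_labels), as.numeric(time(toy)))  # TRUE
```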

Now I can make the plot with a single line:


# graph with wrapper
plot_gdp(y, newrecs)

Because rec_dates accepts any vector of (start, end) pairs, not just recession limits, I can also draw expansion rectangles instead. Again, I take the dates from this document.


# expansion limits as (start, end) pairs of quarters
exps <- c( "1983:Q2", "1987:Q2",
 "1989:Q1", "1989:Q2",
 "1992:Q2", "1995:Q1",
 "1995:Q4", "1997:Q4",
 "1999:Q2", "2001:Q1",
 "2002:Q1", "2002:Q4",
 "2003:Q3", "2008:Q3",
 "2009:Q2", "2014:Q1" )

saveRDS(exps, "../../data/gdp-expansions.rds")

newexps <- qtr2num(exps)

# graph with wrapper
plot_gdp(y, newexps)

Error Catching

Notice that these functions have no error checking; I left it out because it is beyond the scope of this post. If my goal were a more general function, I would definitely have to add input checks and error handling.
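As an illustration of what such checks might look like, here is a minimal sketch for the date vectors passed to qtr2num() and plot_gdp(). check_rec_dates() is a hypothetical helper, not part of the code above; it stops with a message instead of silently drawing misplaced rectangles.

```r
# Hypothetical input checks for the recession/expansion date vectors
check_rec_dates <- function(dates)
{
  # every element must look like "YYYY:Qq" with q between 1 and 4
  if (!all(grepl("^[0-9]{4}:Q[1-4]$", dates)))
    stop("dates must have the form 'YYYY:Qq'")

  # dates must come in (start, end) pairs
  if (length(dates) %% 2 != 0)
    stop("dates must come in (start, end) pairs")

  # same conversion rule as qtr2num(): year + (quarter - 1)/4
  num <- as.numeric(substring(dates, 1, 4)) +
         (as.numeric(substring(dates, 7)) - 1) / 4

  # odd elements are starts, even elements are ends
  starts <- num[c(TRUE, FALSE)]
  ends   <- num[c(FALSE, TRUE)]
  if (any(starts > ends))
    stop("each start date must not come after its end date")

  # return the numeric dates invisibly
  invisible(num)
}
```

Both recs and exps as defined in this post pass these checks, so check_rec_dates(recs) returns the same numbers as qtr2num(recs).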

Final Remarks

To make research reproducible, we have to automate the way R executes tasks. In this post, I did this with wrapper functions.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.