To make research reproducible, we have to automate the way R performs tasks.
In my last post, I showed how to import a .csv file and plot the data. That meant running each of the steps outlined in the post by hand. In this post, I will write a wrapper function to automate the importing and formatting.
Recall that I have already downloaded the data and saved the only data column I need, along with its dates, in a .csv file. The wrapper function is:
import_gdp <- function(filename)
{
  # import csv
  data <- read.csv(filename)
  # delete the last 2 lines (footer)
  data <- head(data, -2)
  # save dates (first column)
  dates <- data[, 1]
  # delete first column; with one value column left, data becomes a plain vector
  data <- data[, -1]
  # rearrange strings in dates: "1980 T1" -> "1980:Q1"
  dates <- gsub(" T", ":Q", dates)
  # give names to values
  names(data) <- dates
  # `ts` object: quarterly data from 1980:Q1 to 2014:Q3
  data.ts <- ts(data, start = c(1980, 1), end = c(2014, 3), frequency = 4)
  return(data.ts)
}
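To see what the reshaping steps inside import_gdp() do, here is a small sketch on a toy data frame holding the first three values printed below. The layout (date strings such as "1980 T1" in the first column, one value column, two footer rows) is an assumption inferred from the gsub() and head(data, -2) calls, not checked against the actual file.

```r
# Toy rows mimicking the assumed file layout: dates, one value column, two footer rows
toy <- data.frame(
  date  = c("1980 T1", "1980 T2", "1980 T3", "footer 1", "footer 2"),
  value = c(70.6, 76.3, 75.5, NA, NA),
  stringsAsFactors = FALSE
)
toy <- head(toy, -2)              # drop the two footer rows
dates <- toy[, 1]                 # first column: date labels
values <- toy[, -1]               # drop the date column; one column left, so a plain numeric vector
dates <- gsub(" T", ":Q", dates)  # "1980 T1" -> "1980:Q1"
names(values) <- dates
values                            # named vector: 1980:Q1 = 70.6, 1980:Q2 = 76.3, 1980:Q3 = 75.5
```

This is why the comment in the function says "column": `data[, -1]` removes the date column, and because only one column remains, R simplifies the result to a vector that can be given names.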
Now I can use import_gdp() to import the file and format it the way I want:
# check loaded objects
ls()
[1] "import_gdp"
# import data
y <- import_gdp("../../data/PIB_BASE_2000.csv")
# quick visualization
head(y)
1980:Q1 1980:Q2 1980:Q3 1980:Q4 1981:Q1 1981:Q2
70.6 76.3 75.5 73.2 70.5 74.3
# print whole series
y
Qtr1 Qtr2 Qtr3 Qtr4
1980 70.6 76.3 75.5 73.2
1981 70.5 74.3 71.0 67.3
1982 67.3 74.8 73.7 69.5
1983 65.2 71.9 71.1 68.8
1984 68.0 75.3 74.9 73.8
1985 72.6 79.4 81.8 81.1
1986 77.9 85.4 88.3 86.9
1987 83.8 91.1 88.8 86.7
1988 83.8 90.8 90.8 84.8
1989 81.5 93.8 95.5 90.4
1990 83.6 85.2 91.8 85.0
1991 83.5 87.7 91.0 87.0
1992 80.8 85.9 90.2 90.6
1993 84.7 89.7 94.7 94.7
1994 87.6 91.6 99.9 103.9
1995 96.5 99.8 101.7 102.1
1996 95.6 100.8 107.8 104.4
1997 99.2 105.8 109.5 107.9
1998 100.0 107.4 109.4 105.8
1999 100.5 106.5 108.3 108.2
2000 105.3 110.7 112.9 112.9
2001 109.0 113.3 113.2 112.2
2002 109.1 115.4 117.4 117.6
2003 111.7 116.4 118.1 118.6
2004 116.4 123.6 125.5 125.9
2005 121.2 129.0 128.2 128.5
2006 126.5 131.5 134.3 134.8
2007 133.0 139.9 142.4 143.8
2008 141.4 148.9 152.5 145.1
2009 137.5 145.4 150.3 152.9
2010 150.4 158.1 160.7 161.0
2011 156.8 163.3 164.1 163.2
2012 158.0 164.2 165.7 166.2
2013 161.0 169.9 169.7 169.8
2014 164.1 168.4 169.3
# print series time
time(y)
Qtr1 Qtr2 Qtr3 Qtr4
1980 1980.00 1980.25 1980.50 1980.75
1981 1981.00 1981.25 1981.50 1981.75
1982 1982.00 1982.25 1982.50 1982.75
1983 1983.00 1983.25 1983.50 1983.75
1984 1984.00 1984.25 1984.50 1984.75
1985 1985.00 1985.25 1985.50 1985.75
1986 1986.00 1986.25 1986.50 1986.75
1987 1987.00 1987.25 1987.50 1987.75
1988 1988.00 1988.25 1988.50 1988.75
1989 1989.00 1989.25 1989.50 1989.75
1990 1990.00 1990.25 1990.50 1990.75
1991 1991.00 1991.25 1991.50 1991.75
1992 1992.00 1992.25 1992.50 1992.75
1993 1993.00 1993.25 1993.50 1993.75
1994 1994.00 1994.25 1994.50 1994.75
1995 1995.00 1995.25 1995.50 1995.75
1996 1996.00 1996.25 1996.50 1996.75
1997 1997.00 1997.25 1997.50 1997.75
1998 1998.00 1998.25 1998.50 1998.75
1999 1999.00 1999.25 1999.50 1999.75
2000 2000.00 2000.25 2000.50 2000.75
2001 2001.00 2001.25 2001.50 2001.75
2002 2002.00 2002.25 2002.50 2002.75
2003 2003.00 2003.25 2003.50 2003.75
2004 2004.00 2004.25 2004.50 2004.75
2005 2005.00 2005.25 2005.50 2005.75
2006 2006.00 2006.25 2006.50 2006.75
2007 2007.00 2007.25 2007.50 2007.75
2008 2008.00 2008.25 2008.50 2008.75
2009 2009.00 2009.25 2009.50 2009.75
2010 2010.00 2010.25 2010.50 2010.75
2011 2011.00 2011.25 2011.50 2011.75
2012 2012.00 2012.25 2012.50 2012.75
2013 2013.00 2013.25 2013.50 2013.75
2014 2014.00 2014.25 2014.50
In this function, I could have added arguments for the dates and the frequency of the data. I won't do that today, but I do intend to write a follow-up post on how to scrape Brazilian GDP data from the SIDRA-IBGE site.
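As a sketch of that idea, the variant below takes the start date, frequency, and number of footer rows as arguments and lets ts() infer the end date from the length of the series. The name import_gdp2() and its `footer` argument are hypothetical, and it assumes the same two-column layout (dates, then values) as the original file.

```r
# Hypothetical generalization of import_gdp(): dates and frequency are arguments
import_gdp2 <- function(filename, start = c(1980, 1), frequency = 4, footer = 2) {
  data <- read.csv(filename, stringsAsFactors = FALSE)
  # drop footer rows (e.g. source notes) at the bottom of the file
  if (footer > 0) data <- head(data, -footer)
  # assumed label format: "1980 T1" -> "1980:Q1"
  dates <- gsub(" T", ":Q", data[, 1])
  values <- data[, 2]
  names(values) <- dates
  # no `end` argument: ts() works it out from length(values)
  ts(values, start = start, frequency = frequency)
}
```

Dropping the hard-coded `end = c(2014, 3)` means the same function keeps working when a newer file adds quarters.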
I can also write a wrapper for the plotting script.
plot_gdp <- function(data, rec_dates)
{
  # blank plot (title: "Quarterly Chronology of Brazilian Business Cycles")
  plot(NULL,
       main = "Cronologia Trimestral dos Ciclos de Negócios Brasileiros",
       xlim = c(1980, 2015), ylim = c(60, 200),
       xlab = "", ylab = "")
  # recession rectangles: rec_dates holds (start, end) pairs
  for (i in seq_len(length(rec_dates) %/% 2))
    rect(rec_dates[2*i - 1], 1, rec_dates[2*i], 250, col = "gray", border = TRUE)
  # gdp line
  lines(data, col = "blue", lwd = 1.5)
}
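Since rect() accepts vectors for its four coordinates, the loop can also be replaced by a single call. The sketch below (plot_gdp2() is a hypothetical name, not part of the original code) splits rec_dates into starts and ends by position and checks that the limits come in pairs.

```r
# Hypothetical variant of plot_gdp(): same plot, one vectorized rect() call
plot_gdp2 <- function(data, rec_dates) {
  # limits come in (start, end) pairs, so the vector must have even length
  stopifnot(length(rec_dates) %% 2 == 0)
  starts <- rec_dates[c(TRUE, FALSE)]  # odd positions: period starts
  ends   <- rec_dates[c(FALSE, TRUE)]  # even positions: period ends
  plot(NULL,
       main = "Cronologia Trimestral dos Ciclos de Negócios Brasileiros",
       xlim = c(1980, 2015), ylim = c(60, 200),
       xlab = "", ylab = "")
  # one call draws every rectangle
  rect(starts, 1, ends, 250, col = "gray", border = TRUE)
  lines(data, col = "blue", lwd = 1.5)
}
```

The logical index `c(TRUE, FALSE)` is recycled along rec_dates, so it picks the 1st, 3rd, 5th, ... elements without computing indices by hand.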
The plot_gdp() function uses the rec_dates argument to draw the recession rectangles, so I need a vector with the start and end dates of each recession. We use this document to get the dates.
# function to convert "YYYY:Qn" labels into decimal years, e.g. "1987:Q3" -> 1987.5
qtr2num <- function(data) {
  as.numeric(substring(data, 1, 4)) - 1/4 + as.numeric(substring(data, 7)) / 4
}
# recession limits as (start, end) pairs of quarter labels
recs <- c( "1981:Q1", "1983:Q1",
"1987:Q3", "1988:Q3",
"1989:Q3", "1992:Q1",
"1995:Q2", "1995:Q3",
"1998:Q1", "1999:Q1",
"2001:Q2", "2001:Q4",
"2003:Q1", "2003:Q2",
"2008:Q4", "2009:Q1" )
saveRDS(recs, "../../data/gdp-recessions.rds")
newrecs <- qtr2num(recs)
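qtr2num() relies on fixed character positions: the year in characters 1 to 4 and the quarter digit in character 7. A quick check confirms it maps labels onto the same decimal times that time(y) prints (Q1 is .00, Q3 is .50). The regex-based qtr2num_re() is a hypothetical alternative, sketched here, that validates the "YYYY:Qn" format instead of trusting the positions.

```r
# qtr2num() as defined above, repeated so this snippet runs on its own
qtr2num <- function(data) {
  as.numeric(substring(data, 1, 4)) - 1/4 + as.numeric(substring(data, 7)) / 4
}

# Hypothetical alternative: parse "YYYY:Qn" with regular expressions
qtr2num_re <- function(x) {
  # refuse anything that is not exactly "YYYY:Q1" ... "YYYY:Q4"
  stopifnot(all(grepl("^[0-9]{4}:Q[1-4]$", x)))
  year <- as.numeric(sub(":Q[1-4]$", "", x))  # "1987:Q3" -> 1987
  qtr  <- as.numeric(sub("^[0-9]{4}:Q", "", x))  # "1987:Q3" -> 3
  year + (qtr - 1) / 4
}

qtr2num(c("1981:Q1", "1987:Q3"))     # 1981 and 1987.5
qtr2num_re(c("1981:Q1", "1987:Q3"))  # 1981 and 1987.5
```

Both give the same numbers for well-formed labels; the regex version stops with an error on a label like "1981Q1", where the positional version would silently return a wrong value.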
Now I can make the plot with a single line:
# graph with wrapper
plot_gdp(y, newrecs)
Because rec_dates is just a vector of (start, end) pairs, I can also draw expansion (instead of recession) rectangles. Again, we use this document to get the dates.
# expansion limits as (start, end) pairs of quarter labels
exps <- c( "1983:Q2", "1987:Q2",
"1989:Q1", "1989:Q2",
"1992:Q2", "1995:Q1",
"1995:Q4", "1997:Q4",
"1999:Q2", "2001:Q1",
"2002:Q1", "2002:Q4",
"2003:Q3", "2008:Q3",
"2009:Q2", "2014:Q1" )
saveRDS(exps, "../../data/gdp-expansions.rds")
newexps <- qtr2num(exps)
# graph with wrapper
plot_gdp(y, newexps)
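The saveRDS() calls store the limit labels so that a later script can reuse them without retyping. A minimal sketch of the round trip, using a temporary file instead of the post's ../../data/ paths:

```r
# Save a label vector and read it back, as a later script would
recs <- c("1981:Q1", "1983:Q1", "1987:Q3", "1988:Q3")
path <- tempfile(fileext = ".rds")
saveRDS(recs, path)
recs_again <- readRDS(path)
identical(recs, recs_again)  # TRUE
```

readRDS() returns the object itself rather than restoring it under its old name, so the reloading script chooses its own variable name.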
Notice that these functions do no error handling; I left it out because it would be beyond the scope of this post. If my goal were a more general function, I would definitely have to include input checks and error handling.
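For a sense of what such checks could look like, here is a sketch of a defensive version of the import step. The name import_gdp_safe() and its messages are hypothetical. The row-count check matters because ts(), given both `start` and `end`, recycles or truncates the data when the lengths disagree instead of stopping.

```r
# Hypothetical defensive wrapper around the import steps
import_gdp_safe <- function(filename, start = c(1980, 1), end = c(2014, 3), frequency = 4) {
  if (!file.exists(filename)) {
    stop("file not found: ", filename)
  }
  # turn read errors into a message that names the file
  data <- tryCatch(
    read.csv(filename, stringsAsFactors = FALSE),
    error = function(e) stop("could not read ", filename, ": ", conditionMessage(e))
  )
  data <- head(data, -2)  # drop the two footer rows
  if (ncol(data) != 2) {
    stop("expected 2 columns (dates and values), got ", ncol(data))
  }
  # number of observations implied by start and end
  n_expected <- (end[1] - start[1]) * frequency + (end[2] - start[2]) + 1
  if (nrow(data) != n_expected) {
    stop("expected ", n_expected, " rows, got ", nrow(data))
  }
  values <- data[, 2]
  names(values) <- gsub(" T", ":Q", data[, 1])  # "1980 T1" -> "1980:Q1"
  ts(values, start = start, end = end, frequency = frequency)
}
```

With the defaults, the function expects 139 quarterly rows (1980:Q1 to 2014:Q3) and stops with a readable message otherwise.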
To make research reproducible, we have to automate the way R executes tasks. In this post, I did this with wrapper functions.