Graphing the Brazilian GDP in R
In this post, we want to make a plot similar to the figure in this PDF.
I have already downloaded the data and saved the only column I will need in a .csv file. Next, I will import the data into R and take a quick look at it.
# Import
PIB <- read.csv("../../data/PIB_BASE_2000.csv")
# info about PIB
str(PIB)
'data.frame': 141 obs. of 2 variables:
$ Data : chr "1980 T1" "1980 T2" "1980 T3" "1980 T4" ...
$ PIB...preços.de.mercado...índice.encadeado..média.1995...100....ref..2000.......Instituto.Brasileiro.de.Geografia.e.Estatística..Sistema.de.Contas.Nacionais.Trimestrais.Referência.2000..IBGE.SCN.2000.Trim.....SCN4_PIBPM4: num 70.6 76.3 75.5 73.2 70.5 74.3 71 67.3 67.3 74.8 ...
# quick visualizations
head(PIB)
Data
1 1980 T1
2 1980 T2
3 1980 T3
4 1980 T4
5 1981 T1
6 1981 T2
PIB...preços.de.mercado...índice.encadeado..média.1995...100....ref..2000.......Instituto.Brasileiro.de.Geografia.e.Estatística..Sistema.de.Contas.Nacionais.Trimestrais.Referência.2000..IBGE.SCN.2000.Trim.....SCN4_PIBPM4
1 70.6
2 76.3
3 75.5
4 73.2
5 70.5
6 74.3
tail(PIB)
Data
136 2013 T4
137 2014 T1
138 2014 T2
139 2014 T3
140
141 Fonte: IPEADATA.
PIB...preços.de.mercado...índice.encadeado..média.1995...100....ref..2000.......Instituto.Brasileiro.de.Geografia.e.Estatística..Sistema.de.Contas.Nacionais.Trimestrais.Referência.2000..IBGE.SCN.2000.Trim.....SCN4_PIBPM4
136 169.8
137 164.1
138 168.4
139 169.3
140 NA
141 NA
We can see that we don’t need the last two lines and that the first column holds the dates. (Yes, in PT-BR "date" is called "data"; also, in case you are wondering, "data" is called "dados", which is just the Latin translation. Anyway, moving on.) We are going to save the dates in a vector called dates and delete the first column.
# trim the last two lines
PIB <- head(PIB, -2)
# get the first column and name it "dates"
dates <- PIB[,1]
# delete the first column
PIB <- PIB[,-1]
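The head(PIB, -2) call above assumes the file always ends with exactly two non-data lines. A slightly more defensive sketch, run here on a small made-up data frame, drops every row whose value is NA instead. The column name value is hypothetical, chosen only to avoid the long original header.

```r
# Toy stand-in for the tail of PIB; "value" is a hypothetical column name
toy <- data.frame(Data  = c("2014 T2", "2014 T3", "", "Fonte: IPEADATA."),
                  value = c(168.4, 169.3, NA, NA))

# keep only rows that actually carry an observation
clean <- toy[!is.na(toy$value), ]
nrow(clean)
```

This would also keep working if the source file gained or lost a footer line, at the cost of silently dropping any genuine missing observation in the middle of the series.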
After deleting the first column, we can see that PIB changed class:
# PIB is a different class now
str(PIB)
num [1:139] 70.6 76.3 75.5 73.2 70.5 74.3 71 67.3 67.3 74.8 ...
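The class change happens because subsetting a data frame down to a single column simplifies the result to a plain vector by default. The sketch below shows both behaviours on a made-up data frame (the names toy and value are illustrative, not from the post); drop = FALSE is the standard argument that keeps the data frame.

```r
# Toy two-column data frame; values copied from the output above
toy <- data.frame(Data  = c("1980 T1", "1980 T2", "1980 T3"),
                  value = c(70.6, 76.3, 75.5))

as_vector <- toy[, -1]                # one column left: simplified to a vector
as_frame  <- toy[, -1, drop = FALSE]  # drop = FALSE keeps the data.frame

class(as_vector)
class(as_frame)
```

For this post the plain numeric vector is what we want, since ts() accepts it directly.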
Before we use the dates as names for the values, let’s format them more to our liking.
# rearrange strings in dates
head(dates)
[1] "1980 T1" "1980 T2" "1980 T3" "1980 T4" "1981 T1" "1981 T2"
dates <- gsub(" T", ":Q", dates)
head(dates)
[1] "1980:Q1" "1980:Q2" "1980:Q3" "1980:Q4" "1981:Q1" "1981:Q2"
Note that PIB is a numeric vector, so we set names, not rownames.
# give names to values
names(PIB) <- dates
# visualize again
head(PIB)
1980:Q1 1980:Q2 1980:Q3 1980:Q4 1981:Q1 1981:Q2
70.6 76.3 75.5 73.2 70.5 74.3
tail(PIB)
2013:Q2 2013:Q3 2013:Q4 2014:Q1 2014:Q2 2014:Q3
169.9 169.7 169.8 164.1 168.4 169.3
Now we can plot our data.
plot(PIB, type = "l")
This plot doesn’t have the correct x labels. We will correct that by using a ts object.
ts object
It is pretty straightforward to make a ts object.
PIB.ts <- ts(PIB, start = c(1980, 1), end = c(2014, 3), frequency = 4)
Now, we plot PIB.ts.
# graph it
plot(PIB.ts)
And the x labels are correct.
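The labels come out right because a ts object carries its own numeric time axis. As a small sketch on a toy series (the first six values of the index; the names x, tt and w are illustrative), time() exposes that axis and window() subsets by date rather than by position. Here end is left out of ts(), which then infers it from the length of the data.

```r
# Toy quarterly series: the first six values of the index shown earlier
x <- ts(c(70.6, 76.3, 75.5, 73.2, 70.5, 74.3), start = c(1980, 1), frequency = 4)

# time() returns the numeric time axis; each quarter is a step of 0.25
tt <- as.numeric(time(x))
tt

# window() subsets by date: 1980 Q3 through 1981 Q1
w <- window(x, start = c(1980, 3), end = c(1981, 1))
w
```

On this scale the second quarter of 1980 sits at 1980.25, which is the kind of coordinate base graphics functions expect on the x axis.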
Now we get the dates to put the recession bars in the graph (yes, manually).
# get recession limits in dates
recs <- c("1981:Q1", "1983:Q1", "1987:Q3", "1988:Q3", "1989:Q3", "1992:Q1", "1995:Q2", "1995:Q3", "1998:Q1", "1999:Q1", "2001:Q2", "2001:Q4", "2003:Q1", "2003:Q2", "2008:Q4", "2009:Q1" )
And we can make a nicer display with kable() from the knitr package.
# TABLE
library(knitr)
tab.rec <- t( matrix( recs, nrow=2 ) )
colnames(tab.rec) <- c( "begin", "end")
kable(tab.rec)
begin | end |
---|---|
1981:Q1 | 1983:Q1 |
1987:Q3 | 1988:Q3 |
1989:Q3 | 1992:Q1 |
1995:Q2 | 1995:Q3 |
1998:Q1 | 1999:Q1 |
2001:Q2 | 2001:Q4 |
2003:Q1 | 2003:Q2 |
2008:Q4 | 2009:Q1 |
The dates in recs are correct and more intuitive; however, rect() and abline() cannot understand that format, so we will use numeric dates.
# get recession limits in numeric dates
qtr2num <- function(data) return( as.numeric(substring(data, 1, 4)) - 1/4 + as.numeric( substring(data, 7) )/4 )
recs2 <- qtr2num(recs)
recs2
[1] 1981.00 1983.00 1987.50 1988.50 1989.50 1992.00 1995.25 1995.50
[9] 1998.00 1999.00 2001.25 2001.75 2003.00 2003.25 2008.75 2009.00
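The conversion is year + (quarter - 1)/4, so Q1 maps to .00 and Q4 to .75. Since qtr2num() relies on fixed character positions, a variant that validates the format first may be safer. The following is only a sketch; qtr2num_checked is a hypothetical name, not part of the original post.

```r
# Hypothetical helper: same arithmetic as qtr2num(), plus a format check
qtr2num_checked <- function(x) {
  ok <- grepl("^[0-9]{4}:Q[1-4]$", x)
  if (!all(ok)) {
    stop("unexpected date format: ", paste(x[!ok], collapse = ", "))
  }
  year    <- as.numeric(substr(x, 1, 4))
  quarter <- as.numeric(substr(x, 7, 7))
  year + (quarter - 1) / 4
}

res <- qtr2num_checked(c("1981:Q1", "1995:Q2", "2008:Q4"))
res
```

A string still in the raw "1981 T1" form would stop with an error here instead of quietly producing NA.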
Or we can get the series indices of the recession dates.
# get recession limits in indices
idx <- unlist( sapply(recs, function(x) which(x==dates) ) )
names(idx) <- recs2; idx
1981 1983 1987.5 1988.5 1989.5 1992 1995.25 1995.5
5 13 31 35 39 49 62 63
1998 1999 2001.25 2001.75 2003 2003.25 2008.75 2009
73 77 86 88 93 94 116 117
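As an alternative sketch for the index lookup, base R's match() does the same job in one vectorised call. The difference shows up when a date is missing: match() returns NA in that position, while the sapply()/which() combination silently drops it. The toy vectors below are made up for illustration, and "1999:Q9" is deliberately absent.

```r
# Toy vectors; names ending in _toy are illustrative only
dates_toy <- c("1980:Q1", "1980:Q2", "1980:Q3", "1980:Q4", "1981:Q1", "1981:Q2")
recs_toy  <- c("1980:Q2", "1981:Q1", "1999:Q9")  # last one is not in dates_toy

idx_match <- match(recs_toy, dates_toy)  # NA marks the date that was not found
idx_which <- unlist(sapply(recs_toy, function(x) which(x == dates_toy)))  # drops it

idx_match
idx_which
```

With begin/end pairs, a dropped element would shift every later pair, so the NA is arguably the more useful failure signal.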
Now that we have the recession limit dates, we plot the data with some more options, and use the rect() function to add the recession bars.
# graph it
plot(PIB.ts, xlim = c(1980, 2015), ylim=c(60,200), col="blue", lwd=1.5, main ="Cronologia Trimestral dos Ciclos de Negócios Brasileiros", xlab = "", ylab="")
for(i in 1:NROW(tab.rec)) rect(recs2[1+2*(i-1)], 1, recs2[2*i], 250, col="gray", border=TRUE)
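Welp, that didn’t go according to plan! The gray rectangles are drawn after the series, so they cover the line.
OK, we will use plot(NULL, ...) to set up an empty plot first, then draw the bars, and finally add the line on top: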
# graph it
plot(NULL, xlim = c(1980, 2015), ylim=c(60,200), main ="Cronologia Trimestral dos Ciclos de Negócios Brasileiros", xlab = "", ylab="")
for(i in 1:NROW(tab.rec)) rect(recs2[1+2*(i-1)], 1, recs2[2*i], 250, col="gray", border=TRUE)
# ylim is already set by plot(); lines() does not take it
lines(PIB.ts, col="blue", lwd=1.5)
We did it!
Not quite. Our data is not seasonally adjusted, and our series ends in 2014:Q3, while the original graph ends in 2020:Q1. However, this is just a matter of getting the data and plugging it into the functions. For now, our work is done.
In the next post I intend to make wrappers of the functions used to automate the whole thing.
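One possible shape for such a wrapper, purely as a sketch: plot_with_bars() and its arguments are hypothetical and not taken from the post. It assumes the bar limits are already numeric dates, relies on rect() being vectorised to draw all bars in one call, and uses par("usr") so the bars span exactly the plotting region.

```r
# Hypothetical wrapper; name and arguments are assumptions, not the author's code
plot_with_bars <- function(series, begin, end, bar_col = "gray", ...) {
  stopifnot(length(begin) == length(end))
  # empty canvas sized to the series
  plot(NULL,
       xlim = range(as.numeric(time(series))),
       ylim = range(as.numeric(series), na.rm = TRUE),
       xlab = "", ylab = "", ...)
  usr <- par("usr")  # plot region limits: c(xmin, xmax, ymin, ymax)
  # bars first, so the series ends up on top
  rect(begin, usr[3], end, usr[4], col = bar_col, border = NA)
  lines(series, col = "blue", lwd = 1.5)
  box()
  invisible(usr)
}

# Toy series (first eight values of the index) and two made-up shaded spans
x <- ts(c(70.6, 76.3, 75.5, 73.2, 70.5, 74.3, 71.0, 67.3),
        start = c(1980, 1), frequency = 4)
usr <- plot_with_bars(x, begin = c(1980.50, 1981.25), end = c(1980.75, 1981.50),
                      main = "Toy example")
```

The shaded spans here are invented for the toy series and are not real recession dates.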
If you see mistakes or want to suggest changes, please create an issue on the source repository.