I have already introduced various interfaces in R and Python for querying data directly from within a script. Examples covered by me inlcuded the OECD database and its interface in R, i.e. the OECD package in R. I also e.g. covered twitteR, Google trends and the EUROSTAT R-package.
In this post I will use another package: comtradr. This package is an interface and provides acces to the UN Comtrade Database API and it has a focus on commodities. It allows you to query economic import and export data. This makes it a very useful package for high-level economic supply chain analysis or high-level strategic foreign trade analysis.
As in any data analysis I will facilitate a basic workflow that comprises: – Obtain the data: Querying data (in this case from a database via an interface) – Wrangle with the data: checking, Cleaning and adjusting dataset for visualization and other analysis – Explore and understand the data: Analyze and visualize data
Obtaining data: Query inter-country trade data in R with comtradr
In preparation of my data query I need to understand what I can query in the database. The comtradr package comprises multiple functions for this. I first need to find a code; I can query related data as soon as I have the code.
In this example I want to query commodity related import and export data. More specifically, I am interested in crude oil. The comtradr R-packages includes the ct_commodity_lookup look-up function for looking up query codes related to commodities. I use it to search for codes related to crude oil.
# import comtradr R-package
library(comtradr)
# search for commodit codes: oil - see result (display result)
ct_commodity_lookup("crude")
## $crude
## [1] "150510 - Animal fats and oils; wool grease, crude"
## [2] "150590 - Animal fats and oils; wool grease (other than crude) and fatty substances derived therefrom (including lanolin)"
## [3] "150710 - Vegetable oils; soya-bean oil and its fractions, crude, whether or not degummed, not chemically modified"
## [4] "150790 - Vegetable oils; soya-bean oil and its fractions, other than crude, whether or not refined, but not chemically modified"
## [5] "150810 - Vegetable oils; ground-nut oil and its fractions, crude, not chemically modified"
## [6] "150890 - Vegetable oils; ground-nut oil and its fractions, other than crude, whether or not refined, but not chemically modified"
## [7] "151110 - Vegetable oils; palm oil and its fractions, crude, not chemically modified"
## [8] "151190 - Vegetable oils; palm oil and its fractions, other than crude, whether or not refined, but not chemically modified"
## [9] "151211 - Vegetable oils; sunflower seed or safflower oil and their fractions, crude, not chemically modified"
## [10] "151219 - Vegetable oils; sunflower seed or safflower oil and their fractions, other than crude, whether or not refined, but not chemically modified"
## [11] "151221 - Vegetable oils; cotton-seed oil and its fractions; crude, whether or not gossypol has been removed, not chemically modified"
## [12] "151229 - Vegetable oils; cotton-seed oil and its fractions, other than crude, whether or not refined, but not chemically modified"
## [13] "151311 - Vegetable oils; coconut (copra) oil and its fractions, crude, not chemically modified"
## [14] "151319 - Vegetable oils; coconut (copra) oil and its fractions, other than crude, whether or not refined, but not chemically modified"
## [15] "151321 - Vegetable oils; palm kernel or babassu oil and their fractions, crude, not chemically modified"
## [16] "151329 - Vegetable oils; palm kernel or babassu oil and their fractions, other than crude, whether or not refined, but not chemically modified"
## [17] "151410 - Vegetable oils; rape, colza or mustard oil and their fractions, crude, not chemically modified"
## [18] "151411 - Vegetable oils; low erucic acid rape or colza oil and its fractions, crude"
## [19] "151419 - Vegetable oils; low erucic acid rape or colza oil and its fractions, other than crude, but not chemically modified"
## [20] "151490 - Vegetable oils; rape, colza or mustard oil and their fractions, other than crude, whether or not refined, but not chemically modified"
## [21] "151491 - Vegetable oils; excluding low erucic acid rape or colza oil and its fractions, crude"
## [22] "151499 - Vegetable oils; excluding low erucic acid rape or colza oil and its fractions, other than crude, but not chemically modified"
## [23] "151511 - Vegetable oils; linseed oil and its fractions, crude, not chemically modified"
## [24] "151519 - Vegetable oils; linseed oil and its fractions, other than crude, whether or not refined, but not chemically modified"
## [25] "151521 - Vegetable oils; maize (corn) oil and its fractions, crude, not chemically modified"
## [26] "151529 - Vegetable oils; maize (corn) oil and its fractions, other than crude, whether or not refined, but not chemically modified"
## [27] "1520 - Glycerol, crude; glycerol waters and glycerol lyes"
## [28] "152000 - Glycerol, crude; glycerol waters and glycerol lyes"
## [29] "152010 - Glycerol (glycerine); crude; glycerol waters and glycerol lyes"
## [30] "250310 - Sulphur; crude or unrefined"
## [31] "250390 - Sulphur; other kinds excluding crude, unrefined, sublimed, precipitated and colloidal"
## [32] "250621 - Quartzite; crude or roughly trimmed"
## [33] "250629 - Quartzite; cut, by sawing or otherwise, into blocks or slabs of a rectangular (including square) shape, (excluding crude or roughly trimmed)"
## [34] "251311 - Pumice stone; crude or in irregular pieces, including crushed pumice (bimskies), whether or not heat-treated"
## [35] "251321 - Emery, corundum, garnet and other abrasives; natural, crude or in irregular pieces, whether or not heat-treated"
## [36] "251329 - Emery, corundum, garnet and other abrasives; natural, in other forms excluding crude or in irregular pieces, whether or not heat-treated"
## [37] "251511 - Marble and travertine; having a specific gravity of 2.5 or more, crude or roughly trimmed by sawing or otherwise, into blocks or slabs of a rectangular (including square) shape"
## [38] "251611 - Granite; crude or roughly trimmed"
## [39] "251621 - Sandstone; crude or roughly trimmed"
## [40] "252510 - Mica; crude and rifted into sheets or splittings"
## [41] "2709 - Petroleum oils and oils obtained from bituminous minerals; crude"
## [42] "270900 - Oils; petroleum oils and oils obtained from bituminous minerals, crude"
## [43] "2710 - Petroleum oils, oils from bituminous minerals, not crude; preparations n.e.s. containing less than 70% petroleum oils, oils from bituminous minerals; these being the basic constituents of the preparations"
## [44] "271000 - Oils; petroleum oils and oils obtained from bituminous minerals, not crude; preparations n.e.s., containing by weight 70% or more of petroleum oils or oils obtained from bituminous minerals"
## [45] "271011 - Petroleum oils and oils from bituminous minerals, not crude or waste oils; preparations n.e.c. with 70% or more (weight),of petroleum oils or oils from bituminous minerals; being the basic constituents of the preparations: light oils and preparations"
## [46] "271019 - Petroleum oils and oils from bituminous minerals, not containing biodiesel, not crude, not waste oils; preparations n.e.c, containing by weight 70% or more of petroleum oils or oils from bituminous minerals; not light oils and preparations"
## [47] "271020 - Petroleum oils and oils from bituminous minerals, containing biodiesel, not crude, not waste oils; preparations n.e.c, containing by weight 70% or more of petroleum oils or oils from bituminous minerals"
## [48] "271091 - Waste Oils; of petroleum or obtained from bituminous minerals, not crude; and preparations n.e.c., weight 70% or preparations of the same, containing polychlorinated biphenyls (PCBs), polychorinated terphenyls (PCTs) or polybrominated biphenyls (PBBs)"
## [49] "271099 - Waste Oils;of petroleum or obtained from bituminous minerals, not crude;and preparations n.e.c., weight 70% or preparations of the same, not containing polychlorinated biphenyls (PCBs), polychorinated terphenyls (PCTs) or polybrominated biphenyls (PBBs)"
## [50] "310410 - Fertilizers, mineral or chemical; potassic, carnallite, sylvite and other crude natural potassium salts"
## [51] "3805 - Gum, wood or sulphate turpentine, other turpenic oils; crude dipentene; sulphite turpentine, other crude para-cymene; pine oil containing alpha-terpineol as the main constituent"
## [52] "380590 - Terpenic oils; produced by the treatment of coniferous woods, crude dipentene, sulphite turpentine and other crude para-cymene, n.e.s. in heading no. 3805"
## [53] "390931 - Amino-resins; n.e.c. in heading no. 3909, in primary forms, poly(methylene phenyl isocyanate) (Crude MDI, polymeric MDI)"
## [54] "390939 - Amino-resins; n.e.c. in heading no. 3909, in primary forms, other than poly(methylene phenyl isocyanate) (Crude MDI, polymeric MDI)"
Above is a long list of all API keys available for data queries and that are somehow related to “crude” and commodities. I will use API key 270900 for oils, petroleum oils and oils from bituminous minerals – in crude form. I will use this API to facilitate my data query using the comtradr R-package.
Querying crude oil data using relevant codes
For executing my crude oil comtradr query I can use the ct_search() function. It allows me to specify certain query parameters: – reporters: The countries for which data is retrieved – partners: Countries trading with the reporters – trade_direction: Direction of trade, specified in a character vector – freq: “Annual” or “monthly” (default is “annual”) – start_date & end_date: For setting time frame of analysis – commod_codes: Character vector with API query codes – max_rec: Maximum numer of record to be returned by the API – type: “goods” or “services” – url: Allows for customization of the base of the url string for accessing the Comtrade API; default value is “https://comtrade.un.org/api?”.
Let us focus on exports (trade_direction), from China (reporter) to South Korea and Japan. I set the parameter values accordingly:
# get relevant code for search term "crude"
codes = ct_commodity_lookup("crude",return_code=TRUE,return_char = TRUE)
# pull data for specific query
data = ct_search(reporters = "China",
partners = c("Rep. of Korea", "Japan"),
trade_direction = "exports",
commod_codes = codes[42])
# print head of data frame
head(data)
## classification year period period_desc aggregate_level is_leaf_code
## 1 H5 2017 2017 2017 6 1
## 2 H5 2017 2017 2017 6 1
## 3 H5 2018 2018 2018 6 1
## 4 H5 2018 2018 2018 6 1
## 5 H5 2019 2019 2019 6 1
## 6 H5 2020 2020 2020 6 1
## trade_flow_code trade_flow reporter_code reporter reporter_iso partner_code
## 1 2 Export 156 China CHN 392
## 2 2 Export 156 China CHN 410
## 3 2 Export 156 China CHN 392
## 4 2 Export 156 China CHN 410
## 5 2 Export 156 China CHN 410
## 6 2 Export 156 China CHN 392
## partner partner_iso second_partner_code second_partner
## 1 Japan JPN NA <NA>
## 2 Rep. of Korea KOR NA <NA>
## 3 Japan JPN NA <NA>
## 4 Rep. of Korea KOR NA <NA>
## 5 Rep. of Korea KOR NA <NA>
## 6 Japan JPN NA <NA>
## second_partner_iso customs_proc_code customs mode_of_transport_code
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA>
## mode_of_transport commodity_code
## 1 <NA> 270900
## 2 <NA> 270900
## 3 <NA> 270900
## 4 <NA> 270900
## 5 <NA> 270900
## 6 <NA> 270900
## commodity
## 1 Oils; petroleum oils and oils obtained from bituminous minerals, crude
## 2 Oils; petroleum oils and oils obtained from bituminous minerals, crude
## 3 Oils; petroleum oils and oils obtained from bituminous minerals, crude
## 4 Oils; petroleum oils and oils obtained from bituminous minerals, crude
## 5 Oils; petroleum oils and oils obtained from bituminous minerals, crude
## 6 Oils; petroleum oils and oils obtained from bituminous minerals, crude
## qty_unit_code qty_unit alt_qty_unit_code alt_qty_unit qty
## 1 8 Weight in kilograms NA <NA> 1895250000
## 2 8 Weight in kilograms NA <NA> 1089580000
## 3 8 Weight in kilograms NA <NA> 1519590500
## 4 8 Weight in kilograms NA <NA> 419468300
## 5 8 Weight in kilograms NA <NA> 317288400
## 6 8 Weight in kilograms NA <NA> 87962600
## alt_qty netweight_kg gross_weight_kg trade_value_usd cif_trade_value_usd
## 1 NA 1895250000 NA 703947696 NA
## 2 NA 1089580000 NA 379329744 NA
## 3 NA 1519590500 NA 721135596 NA
## 4 NA 419468300 NA 200784884 NA
## 5 NA 317288400 NA 148865369 NA
## 6 NA 87962600 NA 28996092 NA
## fob_trade_value_usd flag
## 1 NA 0
## 2 NA 0
## 3 NA 0
## 4 NA 0
## 5 NA 0
## 6 NA 0
Visualizing crude oil export development using ggplot2
The dataframe obtained includes a lot of variables. For this post I focus on the following variables: – partner – year – trade value in USD – reporter
In below lines of R-code I select these variables from the dataframe using dplyr. I then check for missing values. E.g., for some years, no exports were reported to one of the countries.
For a ggplot2-visualization missing values like this are not an issue. However, I want to finish this example of with a gganimate animation. For this I need to replace missing entries with a value. In this case, if no export was reported for a given partner and year, I add a row in the dataframe for that year and partner and set trade value to be 0 for that row. One I have done so I can proceed to plot the content of the dataframe in a line chart, usign ggplot2.
# import relevant libraries
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# identify years with missing entries or with missing entries for just one of the two series
# dplyr operation - selecting only relevant new columns
data = select(data,year,trade_value_usd,partner,reporter)
for(i in min(data$year):max(data$year)){
subset = filter(data,year==i)
if(nrow(subset)<2){
if(subset$partner == "Japan"){
data[nrow(data)+1,] = subset[1,]
data$trade_value_usd[nrow(data)] = 0
data$partner[nrow(data)] = "Rep. of Korea"
}else{ # Rep. of Korea
data[nrow(data)+1,] = subset[1,]
data$trade_value_usd[nrow(data)] = 0
data$partner[nrow(data)] = "Japan"
}
}
}
data = dplyr::arrange(data,year)
# create plot
plot = ggplot(data, aes(x=year,
y=`trade_value_usd`/1000000,
color = factor(`partner`),
group = partner)) +
geom_point(size=3.5) +
geom_line(size=1) +
scale_x_continuous(limits = c(min(data$year), max(data$year)),
breaks = seq.int(min(data$year), max(data$year), 2)) +
scale_color_manual(values = c("blue", "red"),
name = "Destination\nCountry") +
scale_shape_discrete(name = "Destination\nCountry") +
labs(title = "Chinese crude oil export development (SK and JP)") +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
theme(axis.title = element_text(size=14))+
theme(plot.title = element_text(size=16))+
ylab("Trade value [mio. USD]") +
xlab("Year")
plot
Animate data with ggplot2 and gganimate
A line chart of this kind can, as it has been created with ggpplot2, be animated. All you need for this is gganimate, another package in R. I turn the visualization into an animation in the lines of code documented below. I save the animation as .gif-file.
library(gganimate)
plot + transition_reveal(year)
anim_save("comtradr.gif")
If you are interested in creating animation in R with ggplot2 and gganimate you can also check my other post on spatial data animation with ggmap and gganimate. ggmap is ggplot2-based, and is thus compatible with gganimate.
If you are interested in how to obtain data in R, without having to interact with csv-files, then you can also check out some other interfaces in R that I have documented on this blog. For example, I have covered oecd in R, eurstats in R, various automotive industry production and sales data bases, fredr in R and many other packages.
Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python
Leave a Reply