There is something I haven’t written about yet, mainly because I’ve been a little bit busy. Since mid-September I’ve been working at the European Forest Institute, EFI on a task related to the Forest Products Trade Flow Database, a project that the EFI has been working on since almost EFI foundation in 1995.
The main aim of the database is to track the trade of forest products around the World, and produce meaningful reports for the general public and of course the experts. To achieve that, it extracts and clean trade flows from the UN Comtrade database that contains data on trade since the ~80s.
UN Comtrade is a great resource that been helping to understand global trade, even more in the nowadays globalized world, and probably it has been an essential tool for developing trade policy internationally and locally. However, since it relies on the self reporting of the involved trading countries, sometimes the data isn’t totally reliable and it has to be checked and cleaned. Here, is were the work of EFI comes into place, developing a tool to clean and serve the part related to forest products.
EFI has developed a collection of scrips —basically a package— in R to download, clean and produce reports about the flows, on which —at lest on my opinion— the previous developers have done a really outstanding job. Yet, the project hasn’t been under active development for a while and some parts of the code need to be updated and some others need to be debugged. So I have had some fun with the code and I’ll probably have a little bit more in the near future.
The first part of my work, was mainly trying to understand how the process of cleaning and downloading was taking place. The downloading part doesn’t have a lot of mystery —it juts the use of an API— but it can be challenging —and it is— when we are talking about this volume of data. The cleaning has more complication, and in some cases it take a lot of guessing, comparing and extrapolation. This doesn’t make the final product less accurate, but absolutely the opposite. You have to think, that in some cases, some of the entries are completely missing1 missing and in others are totally wrong, so all the processes that take place are aiming not only to complete the database but to make it more accurate. You can read more about it in the database site.
I hope to write a little bit more in the future about the project because it’s really interesting. All related to data is interesting, even more if it’s forest data.
Soon, more to come!
Each flow of data has a mirror one. Since each country reports it own data, some countries report the import and others report the export flow. There is when the madness happens. Different countries, different people, different value, different way of reporting, different measurements,… ↩