Global Terrorism Database

Many data visualisations are presented only as a final polished version, which gives little insight into how they were created. In this post I will try to articulate my thought process, mistakes and notes along the way, to convey the realistic, iterative workflow behind producing a plot.

This exercise is a quick data visualisation of the Global Terrorism Database from the National Consortium for the Study of Terrorism and Responses to Terrorism (START). My initial idea is to plot each attack event as a dot on a world map. It would be good to group the dots by type of attack and ultimately create an animated GIF of attacks over time.

I’m not too fussed about accuracy. The data set is quite complicated and I can’t be bothered reading all the documentation; this is more of a minimum viable prototype to show the thought process behind an analysis.

Data

The data used is the Global Terrorism Database, which contains more than 170,000 terrorist attacks worldwide from 1970 to 2016.

The data was downloaded from Kaggle as a zip file containing globalterrorismdb_0617dist.csv.

I placed this data in my ./raw directory. It has also been saved in my GitHub repo for reproducibility.

Import

# setup 
library(tidyverse)
library(lubridate)
library(summarytools)
library(ggmap)
library(ggthemes)
library(forcats)
library(maps)
library(animation)

# unzip  
unzip('./raw/globalterrorismdb_0617dist.csv.zip', exdir = './raw')

# read in csv
events <- read_csv('./raw/globalterrorismdb_0617dist.csv')

Analysis

EDA

There are 135 variables in the database, with varying information on each attack event. For the purposes of this analysis I am limiting myself to the spatio-temporal visualisation aspect, so the variables I will select are:

  • eventid
  • iyear
  • imonth
  • iday
  • latitude
  • longitude
  • attacktype1_txt
# getting rid of unneeded vars
eventlocs <- events %>% 
  select(eventid, iyear, imonth, iday, longitude, latitude, attacktype1_txt)

head(eventlocs)
## # A tibble: 6 x 7
##        eventid iyear imonth  iday longitude latitude attacktype1_txt      
##          <dbl> <int>  <int> <int>     <dbl>    <dbl> <chr>                
## 1 197000000001  1970      7     2    - 70.0     18.5 Assassination        
## 2 197000000002  1970      0     0    - 99.1     19.4 Hostage Taking (Kidn…
## 3 197001000001  1970      1     0     121       15.5 Assassination        
## 4 197001000002  1970      1     0      23.7     38.0 Bombing/Explosion    
## 5 197001000003  1970      1     0     130       33.6 Facility/Infrastruct…
## 6 197001010002  1970      1     1    - 89.2     37.0 Armed Assault

This looks okay. I’ll change eventid to a character variable, as it’s an identifier rather than a true numeric value. I will also combine the date columns into a proper date with lubridate.

# a date col might be useful to order the animation
eventloc_wdate <- eventlocs %>% 
  mutate(eventid = as.character(eventid),
         date = paste(iday, imonth, iyear),
         date2 = dmy(date))
## Warning: 891 failed to parse.

There are 891 records that don’t parse. On inspection, this is because day and month are coded as zero when (presumably) the exact date is not known. I will default these to 1, as I’m not too concerned with exact granularity for this quick visualisation.

# zero day/month means unknown: default to 1 so dmy() can parse it
eventloc_wdate <- eventlocs %>% 
  mutate(eventid = as.character(eventid),
         iday = ifelse(iday == 0, 1, iday),
         imonth = ifelse(imonth == 0, 1, imonth),
         date = paste(iday, imonth, iyear),
         date = dmy(date))
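To make the zero-coding problem concrete, here is a small base R sketch on a few made-up rows. The data frame `toy` and the helper `to_date()` are hypothetical, not GTD data. It swaps lubridate’s `dmy()` for base `as.Date()` with an explicit format, but the idea is the same as above. A day or month of 0 is not a valid calendar date, so it parses to NA until it is defaulted to 1.

```r
# Hypothetical rows mimicking the GTD coding, where 0 means "day/month unknown"
toy <- data.frame(iyear  = c(1970L, 1970L, 1970L),
                  imonth = c(7L, 0L, 1L),
                  iday   = c(2L, 0L, 0L))

# Hypothetical helper: build ISO-style date strings and parse with an explicit format
to_date <- function(y, m, d) {
  as.Date(sprintf("%04d-%02d-%02d", y, m, d), format = "%Y-%m-%d")
}

# Naive parse: a month or day of 0 is not a real date, so those rows become NA
naive <- to_date(toy$iyear, toy$imonth, toy$iday)

# Default unknown (zero) day and month to 1, then parse again
toy$iday   <- ifelse(toy$iday == 0L, 1L, toy$iday)
toy$imonth <- ifelse(toy$imonth == 0L, 1L, toy$imonth)
fixed <- to_date(toy$iyear, toy$imonth, toy$iday)

print(naive)
print(fixed)
```

The trade-off is the same as in the real data. Every parse now succeeds, but events with an unknown day pile up on the 1st of the month, and events with an unknown month pile up in January.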

Taking a look at the data we have:

dfSummary(eventloc_wdate, style='grid', plain.ascii = FALSE, graph.col = FALSE)
## ## Data Frame Summary   
## **eventloc_wdate**   
## **N:** 170350   
## 
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | No   | Variable          | Stats / Values                           | Freqs (% of Valid)   | Valid    | Missing   |
## +======+===================+==========================================+======================+==========+===========+
## | 1    | eventid           | 1. 197000000001   \                      | 1 (0.0%)   \         | 170350   | 0         |
## |      | [character]       | 2. 197000000002   \                      | 1 (0.0%)   \         | (100%)   | (0%)      |
## |      |                   | 3. 197001000001   \                      | 1 (0.0%)   \         |          |           |
## |      |                   | 4. 197001000002   \                      | 1 (0.0%)   \         |          |           |
## |      |                   | 5. 197001000003   \                      | 1 (0.0%)   \         |          |           |
## |      |                   | 6. 197001010002   \                      | 1 (0.0%)   \         |          |           |
## |      |                   | 7. 197001020001   \                      | 1 (0.0%)   \         |          |           |
## |      |                   | 8. 197001020002   \                      | 1 (0.0%)   \         |          |           |
## |      |                   | 9. 197001020003   \                      | 1 (0.0%)   \         |          |           |
## |      |                   | 10. 197001030001   \                     | 1 (0.0%)   \         |          |           |
## |      |                   | [ 170340 others ]                        | 170340 (0.0%)        |          |           |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | 2    | iyear             | mean (sd) : 2001.71 (13.14)   \          | 46 distinct val.     | 170350   | 0         |
## |      | [integer]         | min < med < max :   \                    |                      | (100%)   | (0%)      |
## |      |                   | 1970 < 2007 < 2016   \                   |                      |          |           |
## |      |                   | IQR (CV) : 24 (0.01)                     |                      |          |           |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | 3    | imonth            | mean (sd) : 6.47 (3.39)   \              | 12 distinct val.     | 170350   | 0         |
## |      | [numeric]         | min < med < max :   \                    |                      | (100%)   | (0%)      |
## |      |                   | 1 < 6 < 12   \                           |                      |          |           |
## |      |                   | IQR (CV) : 5 (0.52)                      |                      |          |           |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | 4    | iday              | mean (sd) : 15.47 (8.81)   \             | 31 distinct val.     | 170350   | 0         |
## |      | [numeric]         | min < med < max :   \                    |                      | (100%)   | (0%)      |
## |      |                   | 1 < 15 < 31   \                          |                      |          |           |
## |      |                   | IQR (CV) : 15 (0.57)                     |                      |          |           |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | 5    | longitude         | mean (sd) : 26.35 (58.57)   \            | 60602 distinct val.  | 165744   | 4606      |
## |      | [numeric]         | min < med < max :   \                    |                      | (97.3%)  | (2.7%)    |
## |      |                   | -176.18 < 43.13 < 179.37   \             |                      |          |           |
## |      |                   | IQR (CV) : 66.06 (2.22)                  |                      |          |           |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | 6    | latitude          | mean (sd) : 23.4 (18.84)   \             | 61028 distinct val.  | 165744   | 4606      |
## |      | [numeric]         | min < med < max :   \                    |                      | (97.3%)  | (2.7%)    |
## |      |                   | -53.15 < 31.47 < 74.63   \               |                      |          |           |
## |      |                   | IQR (CV) : 23.48 (0.81)                  |                      |          |           |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | 7    | attacktype1_txt   | 1. Armed Assault   \                     | 40223 (23.6%)   \    | 170350   | 0         |
## |      | [character]       | 2. Assassination   \                     | 18402 (10.8%)   \    | (100%)   | (0%)      |
## |      |                   | 3. Bombing/Explosion   \                 | 83073 (48.8%)   \    |          |           |
## |      |                   | 4. Facility/Infrastructure Attack   \    | 9581 ( 5.6%)   \     |          |           |
## |      |                   | 5. Hijacking   \                         | 598 ( 0.4%)   \      |          |           |
## |      |                   | 6. Hostage Taking (Barricade Incident)   | 902 ( 0.5%)   \      |          |           |
## |      |                   | 7. Hostage Taking (Kidnapping)   \       | 10233 ( 6.0%)   \    |          |           |
## |      |                   | 8. Unarmed Assault   \                   | 913 ( 0.5%)   \      |          |           |
## |      |                   | 9. Unknown                               | 6425 ( 3.8%)         |          |           |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+
## | 8    | date              |                                          | 15712 distinct val.  | 170350   | 0         |
## |      | [Date]            |                                          |                      | (100%)   | (0%)      |
## +------+-------------------+------------------------------------------+----------------------+----------+-----------+

About 2.7% of records (4,606) have no latitude or longitude. Given we are just doing a basic cartographic exploration, let’s drop these records. Everything else looks okay.

event_plot <- eventloc_wdate %>% 
  filter(complete.cases(.))
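`complete.cases(.)` drops a row if *any* column is missing. That is fine here, because the summary shows only longitude and latitude have gaps. However, if more columns were kept later, it could silently drop extra rows. Here is a minimal base R sketch of filtering on the coordinate columns only and reporting the share dropped. The toy data frame and its `note` column are hypothetical.

```r
# Hypothetical toy data: row 2 lacks coordinates, row 3 lacks only an unrelated field
toy <- data.frame(longitude = c(-70.0, NA, 23.7),
                  latitude  = c(18.5, NA, 38.0),
                  note      = c("a", "b", NA))

# Keep rows that have both coordinates, regardless of other columns
has_coords <- !is.na(toy$longitude) & !is.na(toy$latitude)
located <- toy[has_coords, ]

# complete.cases() would also drop row 3 because of the missing 'note'
all_complete <- toy[complete.cases(toy), ]

# Share of rows without coordinates (the real data gives roughly 0.027)
share_dropped <- mean(!has_coords)
share_dropped
```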

Plot

First I’ll attempt to use the ggmap package to download a base map of the earth, then use the predictable ggplot plotting grammar. I recall there are other packages with baked-in ‘world’ maps, but I’m not sure if they are the right solution for this.

# lets take a look at a base map 
base <- get_map("world", zoom = 1)
ggmap(base)

Worth a shot, but I don’t really like this type of map. I also need to think about how I’m going to limit its extent.

After some googling I found a comment that helped: use a bounding box with Stamen Maps.

# Stamen terrain map, with bounding box
base <- get_stamenmap(bbox = c(left = -180, 
                               bottom = -80, 
                               right = 179.9999, 
                               top = 85), 
                      zoom = 3,
                      force = TRUE)


ggmap(base) 

This looks good. Next I’ll try trimming the bounding box to get rid of Antarctica, changing to black and white for contrast, and using a map theme from the ggthemes package.

# Stamen terrain map, with bounding box
base <- get_stamenmap(maptype = 'terrain-background', 
                      bbox = c(left = -180, 
                               bottom = -60, 
                               right = 179.9999, 
                               top = 85), 
                      zoom = 3,
                      color = "bw",
                      force = TRUE)

ggmap(base) +
  theme_map()
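Here is my guess at why the bounding box uses `right = 179.9999` rather than 180. This is an assumption on my part, and I haven’t checked ggmap’s internals. It comes down to how standard Web Mercator (“slippy map”) tiles are indexed. At zoom z the world is 2^z tiles across, and a longitude of exactly 180 falls on the index just past the last tile. The sketch below uses the standard tile formulas. The function names `tile_x()` and `tile_y()` are hypothetical.

```r
# Standard Web Mercator tile indices (OSM-style tile scheme)
tile_x <- function(lon, zoom) {
  floor((lon + 180) / 360 * 2^zoom)
}

tile_y <- function(lat, zoom) {
  lat_rad <- lat * pi / 180
  floor((1 - asinh(tan(lat_rad)) / pi) / 2 * 2^zoom)
}

zoom <- 3  # 2^3 = 8 tiles across, so valid x indices are 0..7

xs <- tile_x(c(-180, 179.9999, 180), zoom)  # 180 lands on index 8, one past the edge
ys <- tile_y(c(85, -60, -80), zoom)         # rows for top, new bottom, old bottom
xs
ys
```

Under these assumptions, moving the bottom edge from -80 to -60 shrinks the tile rows needed from 0 to 7 down to 0 to 5. The two southernmost rows, which are mostly Southern Ocean and Antarctica, drop out.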

This looks heaps better. Let’s try putting the points for each terrorist attack in a layer on top.

# attack events as points  
ggmap(base) +
  geom_point(aes(longitude, latitude), data = event_plot) +
  theme_map()

This seems to have worked. I need to try and colour the dots by attack type.

# attack events as points, by attack type  
ggmap(base) +
  geom_point(aes(longitude, latitude, colour = attacktype1_txt), data = event_plot) +
  theme_map()
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead

It’s hard to tell the classes apart because there are so many levels. I might try to group some of them together using the forcats package.

# recoding attack type  
event_plot <- event_plot %>% 
  mutate(attack.type = fct_recode(attacktype1_txt,
                                  Assault.Assass = "Assassination",
                                  Assault.Assass ="Armed Assault",
                                  Assault.Assass ="Unarmed Assault",
                                  Explosion = "Bombing/Explosion", 
                                  Unknown = "Unknown",
                                  Hostage = "Hostage Taking (Barricade Incident)",
                                  Hostage = "Hostage Taking (Kidnapping)",
                                  Hostage = "Hijacking",
                                  Facility.Attack = "Facility/Infrastructure Attack"
                                  )) %>% 
  arrange(iyear)
  
event_plot %>% count(attack.type)
## # A tibble: 5 x 2
##   attack.type         n
##   <fct>           <int>
## 1 Assault.Assass  57391
## 2 Explosion       81665
## 3 Facility.Attack  9416
## 4 Hostage         11195
## 5 Unknown          6077
ggmap(base) +
  geom_point(aes(longitude, 
                 latitude, 
                 colour = attack.type), 
             data = event_plot) +
  theme_map() 
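The recode is a many-to-one mapping from nine GTD labels to five groups. For comparison, here is a sketch of the same grouping in base R. It uses the list form of `levels<-`, where each new level lists the old labels it absorbs, on a hypothetical vector holding one of each label. forcats’ `fct_collapse()` takes the same shape of argument, which may read a little tidier than repeating names in `fct_recode()`.

```r
# Hypothetical vector with one of each original GTD attack-type label
attack <- factor(c("Assassination", "Armed Assault", "Unarmed Assault",
                   "Bombing/Explosion", "Hijacking",
                   "Hostage Taking (Kidnapping)",
                   "Hostage Taking (Barricade Incident)",
                   "Facility/Infrastructure Attack", "Unknown"))

# Collapse nine levels into five; each list element names a new level
# and gives the old labels that map onto it
levels(attack) <- list(
  Assault.Assass  = c("Assassination", "Armed Assault", "Unarmed Assault"),
  Explosion       = "Bombing/Explosion",
  Facility.Attack = "Facility/Infrastructure Attack",
  Hostage         = c("Hostage Taking (Barricade Incident)",
                      "Hostage Taking (Kidnapping)", "Hijacking"),
  Unknown         = "Unknown"
)

table(attack)
```

One thing to watch with the list form: any old label left out of the list becomes NA, so it is worth checking `anyNA()` afterwards.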

This is better, but there is overplotting due to the number of data points. I can try:

  • Reducing point size
  • Adding transparency with the alpha channel
  • Using another geom, such as geom_count, to summarise overlapping points
# using geom_count and alpha  
ggmap(base) +
  geom_count(aes(longitude, 
                 latitude, 
                 colour = attack.type), 
             data = event_plot,
             alpha = 0.3) +
  guides(colour = guide_legend(override.aes = list(size=3)),
         size = "none") +
  theme_map() 
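As far as I understand it, geom_count (via stat_sum) counts the observations at each distinct position within each colour group and maps that count to point size. Here is a rough base R sketch of that aggregation on hypothetical toy coordinates. It shows why events recorded at identical coordinates become one larger dot rather than a stack of identical ones.

```r
# Hypothetical toy events: two explosions share exactly the same coordinates
ev <- data.frame(longitude   = c(44.4, 44.4, 44.4, -77.0),
                 latitude    = c(33.3, 33.3, 33.3, -12.0),
                 attack.type = c("Explosion", "Explosion", "Hostage", "Assault.Assass"))

# Count events per unique (longitude, latitude, attack.type) combination;
# a count like this is what drives point size in geom_count
ev$n <- 1L
counts <- aggregate(n ~ longitude + latitude + attack.type, data = ev, FUN = sum)

counts[order(-counts$n), ]
```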

I’m finding the map a little hard to interpret because of the terrain background. If we switch to the simpler maps package approach, we can get a basic grey world map to work with.

map <- borders("world", colour="gray50", fill="gray50") 

ggplot() +   
  map +
  geom_count(aes(longitude, 
                 latitude, 
                 colour = attack.type), 
             data = event_plot,
             alpha = 0.3) +
  theme_map() 

This seems a little clearer, and I also think this map projection is more suitable. Now to touch up the map with titles, a source caption and legend formatting.

map <- borders("world", colour="grey20", fill= "grey20") 

ggplot() +   
  map +
  geom_count(aes(longitude, 
                 latitude, 
                 colour = attack.type), 
             data = event_plot,
             alpha = 0.1) +
  guides(colour = guide_legend(override.aes = list(size=3, 
                                                   alpha = 1), 
                               title = "Attack Type"),
         size = "none") +
  labs(title = "Global Terrorism Database",
       subtitle = "More than 170,000 terrorist attacks worldwide 1970-2016",
       caption = "Source: National Consortium for the Study of Terrorism and Responses to Terrorism (START). (2017). Global Terrorism Database [Data file]. Retrieved from https://www.kaggle.com/START-UMD/gtd") +
  theme_map() 

This looks good; the transparency works better against the darker background.
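Here is some intuition for why a low alpha such as 0.1 works. With ordinary “over” compositing, n overlapping points, each drawn with opacity α, stack to a combined opacity of 1 - (1 - α)^n. Isolated events therefore stay faint while dense clusters saturate. A quick check of that formula follows. The helper name `stacked_opacity()` is hypothetical.

```r
# Combined opacity of n stacked points, each drawn with the same alpha,
# under standard "over" alpha compositing
stacked_opacity <- function(n, alpha) 1 - (1 - alpha)^n

n <- c(1, 5, 10, 50)
op <- stacked_opacity(n, alpha = 0.1)
round(op, 3)
```

At α = 0.1 a single event sits at 10% opacity, about ten overlapping events reach roughly 65%, and fifty are close to fully opaque. That is why the concentrated regions stand out while scattered events recede.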

You can really see concentrated events in Central and South America, with more assault-type attacks in Peru than in Colombia. Northern Europe is dominated by facility and infrastructure attacks, while Eastern Europe and Central Asia report more explosion events.

A further step could be looping through each year to create an animated GIF from 1970 onwards.

I tried using the gganimate package but got stuck on an error I couldn’t resolve, so I used the animation package instead, thanks to the help of a post on Stack Overflow.

# creating animated GIF

# base map
map <- borders("world", colour="grey20", fill= "grey20")  

# years in the data set, in chronological order (one GIF frame per year)
years <- sort(unique(event_plot$iyear))
npoints <- length(years)

# function to create one plot per year
plotfoo <- function(){
  for(i in 1:npoints){
    
    take_df <- event_plot %>% 
      filter(iyear == years[i])
    
    p <- ggplot() +
      map +
      geom_point(aes(longitude, latitude, colour = attack.type), data = take_df, alpha = 0.1) +
      guides(colour = guide_legend(override.aes = list(size=3, alpha = 1), title = "Attack Type"), size = "none") +
      labs(title = "Global Terrorism Database - More than 170,000 terrorist attacks worldwide 1970-2016",
           subtitle = years[i],
           # explicit "\n" line break; a raw line break inside the string also embeds the indentation
           caption = paste("Source: National Consortium for the Study of Terrorism and Responses to Terrorism (START). (2017). Global Terrorism",
                           "Database [Data file]. Retrieved from https://www.kaggle.com/START-UMD/gtd",
                           sep = "\n")) +
      theme_map() 
    
    print(p)
  }
}

# create GIF (saveGIF needs ImageMagick or GraphicsMagick installed); restore old options afterwards
oopt <- ani.options(nmax = npoints)  
saveGIF(plotfoo(), interval = 0.5, ani.width = 600)
ani.options(oopt)
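The loop takes one frame per distinct year. The summary earlier showed 46 distinct values of iyear across 1970-2016, a 47-year span, so one year gets no frame at all. An alternative way to build the per-year subsets is base R’s `split()`. It partitions the data once and returns the pieces named and ordered by year, regardless of the row order of the input. Below is a sketch on a hypothetical toy data frame. The per-frame plotting is only indicated by a comment.

```r
# Hypothetical toy events, deliberately out of year order
ev <- data.frame(iyear     = c(1971L, 1970L, 1971L, 1972L),
                 longitude = c(10, 20, 30, 40),
                 latitude  = c(1, 2, 3, 4))

# One data frame per year; split() orders the pieces by the factor levels of iyear
frames <- split(ev, ev$iyear)

# Number of events in each frame
events_per_year <- vapply(frames, nrow, integer(1))

for (yr in names(frames)) {
  # In the real loop, this is where one ggplot per year would be built and printed
  message(yr, ": ", nrow(frames[[yr]]), " event(s)")
}
```

If the empty year should still appear as a blank frame, the years could be taken from `seq(1970, 2016)` instead, with an empty data frame used where a year has no events.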

References

National Consortium for the Study of Terrorism and Responses to Terrorism (START). (2017). Global Terrorism Database [Data file]. Retrieved from https://www.kaggle.com/START-UMD/gtd

Session Info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2       animation_2.5      maps_3.2.0        
##  [4] ggthemes_3.2.0     ggmap_2.7          summarytools_0.8.1
##  [7] lubridate_1.7.1    forcats_0.2.0      stringr_1.2.0     
## [10] dplyr_0.7.4        purrr_0.2.4        readr_1.1.1       
## [13] tidyr_0.8.0        tibble_1.4.2       ggplot2_2.2.1     
## [16] tidyverse_1.2.1   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.15       lattice_0.20-34    png_0.1-7         
##  [4] utf8_1.1.3         assertthat_0.2.0   digest_0.6.15     
##  [7] psych_1.6.6        R6_2.2.2           cellranger_1.1.0  
## [10] plyr_1.8.4         evaluate_0.9       httr_1.3.1        
## [13] pillar_1.1.0       RgoogleMaps_1.4.1  rlang_0.1.6       
## [16] lazyeval_0.2.1     readxl_1.0.0       rstudioapi_0.7    
## [19] geosphere_1.5-7    rmarkdown_1.1      labeling_0.3      
## [22] proto_1.0.0        pander_0.6.0       RCurl_1.95-4.8    
## [25] munsell_0.4.3      broom_0.4.3        compiler_3.4.3    
## [28] modelr_0.1.1       pkgconfig_2.0.1    mnormt_1.5-4      
## [31] htmltools_0.3.6    codetools_0.2-14   matrixStats_0.53.0
## [34] crayon_1.3.4       bitops_1.0-6       grid_3.4.3        
## [37] nlme_3.1-128       jsonlite_1.5       gtable_0.2.0      
## [40] magrittr_1.5       formatR_1.2.1      scales_0.5.0      
## [43] cli_1.0.0          stringi_1.1.6      mapproj_1.2-5     
## [46] pryr_0.1.3         reshape2_1.4.3     sp_1.2-7          
## [49] xml2_1.2.0         rapportools_1.0    rjson_0.2.15      
## [52] tools_3.4.3        glue_1.2.0         hms_0.4.1         
## [55] jpeg_0.1-8         parallel_3.4.3     yaml_2.1.13       
## [58] colorspace_1.3-2   rvest_0.3.2        knitr_1.14        
## [61] bindr_0.1          haven_1.1.1