Emojis are now fully supported in {ggplot2} thanks to the {ragg} package. Read more about it here: Modern Text Features in R.
This code has been lightly revised to make sure it works as of 2018-12-16.
With Reporters Without Borders coming out with its 2017 World Press Freedom Index in the same week as Hadley Wickham coming out with the emo(ji) package, I had no choice but to explore both of them at the same time.
Disclaimer! This post is not an exercise in statistical inference but rather a proof of concept of how to use the emo(ji) package with ggplot2.
Loading packages
library(hrbrthemes)
library(tidyverse)
library(stringr)
library(tibble)
# remotes::install_github('hadley/emo')
library(emo)
The hrbrthemes package is not necessary for this project, but it provides some of my personal favorite ggplot2 themes.
Gathering data
Here we collect the data from Reporters Without Borders, emoji flags, and The World Bank (so we have something to plot against).
2017 World Press Freedom Index
We load the 2017 World Press Freedom Index data (direct download link) as usual.
(freedom_index <- read_csv("https://rsf.org/sites/default/files/index_format_upload_2017-v2_1_0.csv"))
## # A tibble: 180 x 12
## ISO Rank FR_Country EN_country ES_country `Underlying situation score …
## <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 NOR 1 Norvège Norway Noruega 760
## 2 SWE 2 Suede Sweden Suecia 759
## 3 FIN 3 Finlande Finland Finlandia 892
## 4 DNK 4 Danemark Denmark Dinamarca 1036
## 5 NLD 5 Pays-Bas Netherlands Países Bajos 963
## 6 CRI 6 Costa Rica Costa Rica Costa Rica 1193
## 7 CHE 7 Suisse Switzerland Suiza 1213
## 8 JAM 8 Jamaïque Jamaica Jamaica 1273
## 9 BEL 9 Belgique Belgium Bélgica 1247
## 10 ISL 10 Islande Iceland Islandia 1303
## # … with 170 more rows, and 6 more variables: Abuse score 2016 <chr>,
## # Overall Score 2016 <dbl>, Progression RANK <dbl>, Rank 2015 <dbl>,
## # Score 2015 <dbl>, Zone <chr>
We see that a total of 180 countries have a score (Overall Score 2016).
GDP per capita
To have something meaningful to compare the freedom index against, I found some data on GDP per capita, mostly because I figured it would cover quite a lot of the countries in the freedom index. The data come from The World Bank (direct download link), and we load them in as usual.
(gdp_pcap <- read_csv("API_NY.GDP.PCAP.CD_DS2_en_csv_v2.csv", skip = 3))
## # A tibble: 264 x 62
## `Country Name` `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Aruba ABW GDP per capita … NY.GDP.PCAP.CD NA NA
## 2 Afghanistan AFG GDP per capita … NY.GDP.PCAP.CD 59.8 59.9
## 3 Angola AGO GDP per capita … NY.GDP.PCAP.CD NA NA
## 4 Albania ALB GDP per capita … NY.GDP.PCAP.CD NA NA
## 5 Andorra AND GDP per capita … NY.GDP.PCAP.CD NA NA
## 6 Arab World ARB GDP per capita … NY.GDP.PCAP.CD NA NA
## 7 United Arab E… ARE GDP per capita … NY.GDP.PCAP.CD NA NA
## 8 Argentina ARG GDP per capita … NY.GDP.PCAP.CD NA NA
## 9 Armenia ARM GDP per capita … NY.GDP.PCAP.CD NA NA
## 10 American Samoa ASM GDP per capita … NY.GDP.PCAP.CD NA NA
## # … with 254 more rows, and 56 more variables: 1962 <dbl>, 1963 <dbl>,
## # 1964 <dbl>, 1965 <dbl>, 1966 <dbl>, 1967 <dbl>, 1968 <dbl>, 1969 <dbl>,
## # 1970 <dbl>, 1971 <dbl>, 1972 <dbl>, 1973 <dbl>, 1974 <dbl>, 1975 <dbl>,
## # 1976 <dbl>, 1977 <dbl>, 1978 <dbl>, 1979 <dbl>, 1980 <dbl>, 1981 <dbl>,
## # 1982 <dbl>, 1983 <dbl>, 1984 <dbl>, 1985 <dbl>, 1986 <dbl>, 1987 <dbl>,
## # 1988 <dbl>, 1989 <dbl>, 1990 <dbl>, 1991 <dbl>, 1992 <dbl>, 1993 <dbl>,
## # 1994 <dbl>, 1995 <dbl>, 1996 <dbl>, 1997 <dbl>, 1998 <dbl>, 1999 <dbl>,
## # 2000 <dbl>, 2001 <dbl>, 2002 <dbl>, 2003 <dbl>, 2004 <dbl>, 2005 <dbl>,
## # 2006 <dbl>, 2007 <dbl>, 2008 <dbl>, 2009 <dbl>, 2010 <dbl>, 2011 <dbl>,
## # 2012 <dbl>, 2013 <dbl>, 2014 <dbl>, 2015 <dbl>, 2016 <dbl>, X62 <lgl>
The data have quite a few variables, but for now we will focus on the 2015 variable, as 2016 appears to be empty. Now that we have two datasets to combine, a natural question is whether gdp_pcap has information matching our 180 countries. With the following bit of code, we join the two datasets by the ISO alpha-3 code available in both and select the countries that don’t have a value for the year 2015.
left_join(freedom_index, gdp_pcap, by = c("ISO" = "Country Code")) %>%
  filter(is.na(`2015`)) %>%
  select(EN_country)
## # A tibble: 12 x 1
## EN_country
## <chr>
## 1 Liechtenstein
## 2 Andorra
## 3 OECS
## 4 Taiwan
## 5 Papua New Guinea
## 6 Cyprus North
## 7 Kosovo
## 8 Venezuela
## 9 Libya
## 10 Syrian Arab Republic
## 11 Eritrea
## 12 Democratic People's Republic of Korea
That leaves us with 168 countries (180 minus the 12 listed above). I could have looked for the data for these countries, but that is outside the scope of this post.
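As an aside, the same kind of check can be written with dplyr's anti_join(), which returns the rows of the first table that have no match in the second. The sketch below uses small made-up tibbles rather than the real data: index_toy and gdp_toy are hypothetical stand-ins, and the GDP numbers are invented. Rows with a missing 2015 value are dropped first so that they count as unmatched, mirroring the filter used above.

```r
library(dplyr)

# Toy stand-ins for the two datasets (hypothetical names, invented values)
index_toy <- tibble(
  ISO = c("NOR", "SWE", "TWN", "VEN"),
  EN_country = c("Norway", "Sweden", "Taiwan", "Venezuela")
)
gdp_toy <- tibble(
  `Country Code` = c("NOR", "SWE", "VEN"),
  `2015` = c(100, 200, NA)
)

# Keep only GDP rows that actually have a 2015 value,
# then list the countries that find no partner in that table
unmatched <- anti_join(
  index_toy,
  filter(gdp_toy, !is.na(`2015`)),
  by = c("ISO" = "Country Code")
)

unmatched$EN_country
```

Compared with a left_join() followed by filter(is.na(...)), this form avoids carrying all of the GDP columns along just to throw them away again.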
Flag emoji
I would like to use the different flag emojis
## 🇦🇨🇦🇩🇦🇪🇦🇫🇦🇬🇦🇮🇦🇱🇦🇲🇦🇴🇦🇶🇦🇷🇦🇸🇦🇹🇦🇺🇦🇼🇦🇽🇦🇿🇧🇦🇧🇧🇧🇩🇧🇪🇧🇫🇧🇬🇧🇭🇧🇮🇧🇯🇧🇱🇧🇲🇧🇳🇧🇴🇧🇶🇧🇷🇧🇸🇧🇹🇧🇻🇧🇼🇧🇾🇧🇿🇨🇦🇨🇨🇨🇩🇨🇫🇨🇬🇨🇭🇨🇮🇨🇰🇨🇱🇨🇲🇨🇳🇨🇴🇨🇵🇨🇷🇨🇺🇨🇻🇨🇼🇨🇽🇨🇾🇨🇿🇩🇪🇩🇪🇩🇬🇩🇯🇩🇰🇩🇲🇩🇴🇩🇿🇪🇦🇪🇨🇪🇪🇪🇬🇪🇭🇪🇷🇪🇸🇪🇹🇪🇺🇫🇮🇫🇯🇫🇰🇫🇲🇫🇴🇫🇷🇬🇦🇬🇧🇬🇧🇬🇩🇬🇪🇬🇫🇬🇬🇬🇭🇬🇮🇬🇱🇬🇲🇬🇳🇬🇵🇬🇶🇬🇷🇬🇸🇬🇹🇬🇺🇬🇼🇬🇾🇭🇰🇭🇲🇭🇳🇭🇷🇭🇹🇭🇺🇮🇨🇮🇩🇮🇪🇮🇱🇮🇲🇮🇳🇮🇴🇮🇶🇮🇷🇮🇸🇮🇹🇯🇪🇯🇲🇯🇴🇯🇵🇰🇪🇰🇬🇰🇭🇰🇮🇰🇲🇰🇳🇰🇵🇰🇷🇰🇼🇰🇾🇰🇿🇱🇦🇱🇧🇱🇨🇱🇮🇱🇰🇱🇷🇱🇸🇱🇹🇱🇺🇱🇻🇱🇾🇲🇦🇲🇨🇲🇩🇲🇪🇲🇫🇲🇬🇲🇭🇲🇰🇲🇱🇲🇲🇲🇳🇲🇴🇲🇵🇲🇶🇲🇷🇲🇸🇲🇹🇲🇺🇲🇻🇲🇼🇲🇽🇲🇾🇲🇿🇳🇦🇳🇨🇳🇪🇳🇫🇳🇬🇳🇮🇳🇱🇳🇴🇳🇵🇳🇷🇳🇺🇳🇿🇴🇲🇵🇦🇵🇪🇵🇫🇵🇬🇵🇭🇵🇰🇵🇱🇵🇲🇵🇳🇵🇷🇵🇸🇵🇹🇵🇼🇵🇾🇶🇦🇷🇪🇷🇴🇷🇸🇷🇺🇷🇼🇸🇦🇸🇧🇸🇨🇸🇩🇸🇪🇸🇬🇸🇭🇸🇮🇸🇯🇸🇰🇸🇱🇸🇲🇸🇳🇸🇴🇸🇷🇸🇸🇸🇹🇸🇻🇸🇽🇸🇾🇸🇿🇹🇦🇹🇨🇹🇩🇹🇫🇹🇬🇹🇭🇹🇯🇹🇰🇹🇱🇹🇲🇹🇳🇹🇴🇹🇷🇹🇹🇹🇻🇹🇼🇹🇿🇺🇦🇺🇬🇺🇲🇺🇳🇺🇸🇺🇸🇺🇾🇺🇿🇻🇦🇻🇨🇻🇪🇻🇬🇻🇮🇻🇳🇻🇺🇼🇫🇼🇸🇽🇰🇾🇪🇾🇹🇿🇦🇿🇲🇿🇼🏴🏴🏴
all of which can be found with the new emo(ji) package
emo::ji_find("flag")
## # A tibble: 264 x 2
## name emoji
## <chr> <chr>
## 1 Ascension_Island 🇦🇨
## 2 andorra 🇦🇩
## 3 united_arab_emirates 🇦🇪
## 4 afghanistan 🇦🇫
## 5 antigua_barbuda 🇦🇬
## 6 anguilla 🇦🇮
## 7 albania 🇦🇱
## 8 armenia 🇦🇲
## 9 angola 🇦🇴
## 10 antarctica 🇦🇶
## # … with 254 more rows
We first notice that two of the entries ("japan" and "crossed flags") are not country flags, and that the names of the countries are not in the same format as what we have from earlier. So we drop those two entries, replace the underscores with spaces, and convert our country names to lowercase before joining, this time by country name. Again we check for missed joins.
left_join(freedom_index, gdp_pcap, by = c("ISO" = "Country Code")) %>%
  mutate(EN_country = tolower(EN_country)) %>%
  left_join(emo::ji_find("flag") %>%
              mutate(name = str_replace_all(name, "_", " ")) %>%
              filter(name != "japan", name != "crossed flags"),
            by = c("EN_country" = "name")) %>%
  filter(!is.na(`2015`)) %>%
  filter(is.na(emoji)) %>%
  select(EN_country)
## # A tibble: 22 x 1
## EN_country
## <chr>
## 1 germany
## 2 spain
## 3 trinidad and tobago
## 4 france
## 5 united kingdom
## 6 united states
## 7 italy
## 8 south korea
## 9 bosnia and herzegovina
## 10 japan
## # … with 12 more rows
That is quite a few. It turns out that the naming convention for the emoji names is not entirely consistent; for example, "de" is used instead of "germany". To keep the later code clean, we make a new emoji tibble with all the changes.
newemoji <- emo::ji_find("flag") %>%
  mutate(name = str_replace_all(string = name,
                                pattern = "_",
                                replacement = " ")) %>%
  filter(name != "japan", name != "crossed flags") %>%
  mutate(name = str_replace(name, "^de$", "germany"),
         name = str_replace(name, "^es$", "spain"),
         name = str_replace(name, "^trinidad tobago$", "trinidad and tobago"),
         name = str_replace(name, "^fr$", "france"),
         name = str_replace(name, "^uk$", "united kingdom"),
         name = str_replace(name, "^us$", "united states"),
         name = str_replace(name, "^it$", "italy"),
         name = str_replace(name, "^kr$", "south korea"),
         name = str_replace(name, "^bosnia herzegovina$", "bosnia and herzegovina"),
         name = str_replace(name, "^guinea bissau$", "guinea-bissau"),
         name = str_replace(name, "^cote divoire$", "ivory coast"),
         name = str_replace(name, "^timor leste$", "east timor"),
         name = str_replace(name, "^congo brazzaville$", "congo"),
         name = str_replace(name, "^palestinian territories$", "palestine"),
         name = str_replace(name, "^ru$", "russian federation"),
         name = str_replace(name, "^congo kinshasa$", "the democratic republic of the congo"),
         name = str_replace(name, "^tr$", "turkey"),
         name = str_replace(name, "^brunei$", "brunei darussalam"),
         name = str_replace(name, "^laos$", "lao people's democratic republic"),
         name = str_replace(name, "^cn$", "china"),
         name = str_replace(name, "^jp$", "japan"))
newemoji
## # A tibble: 264 x 2
## name emoji
## <chr> <chr>
## 1 Ascension Island 🇦🇨
## 2 andorra 🇦🇩
## 3 united arab emirates 🇦🇪
## 4 afghanistan 🇦🇫
## 5 antigua barbuda 🇦🇬
## 6 anguilla 🇦🇮
## 7 albania 🇦🇱
## 8 armenia 🇦🇲
## 9 angola 🇦🇴
## 10 antarctica 🇦🇶
## # … with 254 more rows
Plotting it all with ggplot2
Now with all the preparation done we do a naive first plot.
left_join(freedom_index, gdp_pcap, by = c("ISO" = "Country Code")) %>%
  mutate(EN_country = tolower(EN_country)) %>%
  left_join(newemoji, by = c("EN_country" = "name")) %>%
  ggplot(aes(x = `2015`, y = `Overall Score 2016`)) +
  geom_text(aes(label = emoji))
## Warning: Removed 14 rows containing missing values (geom_text).
But wait, we have a couple of problems:
- The emojis don’t show up.
- The freedom score is 100 times larger than the actual value.
- The GDP per capita values are quite skewed.
None of these problems is too great for us. It turns out that R’s graphical devices don’t support the AppleColorEmoji font, and we can work around that by saving the plot as an SVG file. We also divide the score by 100 and apply a simple log transformation to the GDP per capita axis.
Our final plot is thus the following:
left_join(freedom_index, gdp_pcap, by = c("ISO" = "Country Code")) %>%
  mutate(EN_country = tolower(EN_country),
         `Overall Score 2016` = `Overall Score 2016` / 100) %>%
  left_join(newemoji, by = c("EN_country" = "name")) %>%
  ggplot(aes(x = `2015`, y = `Overall Score 2016`)) +
  stat_smooth(method = "lm", color = "grey", se = FALSE) +
  geom_text(aes(label = emoji)) +
  scale_x_log10() +
  annotation_logticks(sides = "b") +
  theme_ipsum() +
  labs(x = "GDP per capita (current US$)", y = "2017 World Press Freedom Index",
       title = "Countries with high GDP per capita\ntend to have low Freedom Index",
       subtitle = "Visualized with emojis")
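As noted at the top of the post, the {ragg} package now handles emoji. A hedged sketch of writing a plot to PNG through ragg::agg_png() follows. The data frame toy and its numbers are made up purely for illustration, the output goes to a temporary file, and whether the flags actually render depends on an emoji font being installed on the system.

```r
library(ggplot2)

# Toy data standing in for the joined dataset (all values are invented)
toy <- data.frame(
  gdp = c(1000, 10000, 50000),
  score = c(40, 25, 10),
  flag = c("🇩🇰", "🇳🇴", "🇸🇪")
)

p <- ggplot(toy, aes(x = gdp, y = score)) +
  geom_text(aes(label = flag)) +
  scale_x_log10()

# Open the ragg PNG device explicitly, draw the plot, then close the device
out <- file.path(tempdir(), "emoji-plot.png")
ragg::agg_png(out, width = 6, height = 4, units = "in", res = 300)
print(p)
invisible(dev.off())

file.exists(out)
```

Opening the device by hand keeps the sketch independent of how a particular ggplot2 version's ggsave() forwards its arguments to a custom device.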
session information
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.5 (2021-03-31)
 os       macOS Big Sur 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Pacific/Honolulu
 date     2021-07-02

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date       lib source
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)
 backports     1.2.1      2020-12-09 [1] CRAN (R 4.0.2)
 blogdown      1.3.2      2021-06-06 [1] Github (rstudio/blogdown@00a2090)
 bookdown      0.22       2021-04-22 [1] CRAN (R 4.0.2)
 broom         0.7.6      2021-04-05 [1] CRAN (R 4.0.2)
 bslib         0.2.4.9003 2021-05-05 [1] Github (rstudio/bslib@ba6a80d)
 cellranger    1.1.0      2016-07-27 [1] CRAN (R 4.0.0)
 cli           2.5.0      2021-04-26 [1] CRAN (R 4.0.2)
 clipr         0.7.1      2020-10-08 [1] CRAN (R 4.0.2)
 codetools     0.2-18     2020-11-04 [1] CRAN (R 4.0.5)
 colorspace    2.0-1      2021-05-04 [1] CRAN (R 4.0.2)
 crayon        1.4.1      2021-02-08 [1] CRAN (R 4.0.2)
 DBI           1.1.1      2021-01-15 [1] CRAN (R 4.0.2)
 dbplyr        2.1.1      2021-04-06 [1] CRAN (R 4.0.2)
 desc          1.3.0      2021-03-05 [1] CRAN (R 4.0.2)
 details     * 0.2.1      2020-01-12 [1] CRAN (R 4.0.0)
 digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)
 dplyr       * 1.0.7      2021-06-18 [1] CRAN (R 4.0.2)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.0.2)
 emo         * 0.0.0.9000 2020-05-12 [1] Github (hadley/emo@3f03b11)
 evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)
 extrafont     0.17       2014-12-08 [1] CRAN (R 4.0.0)
 extrafontdb   1.0        2012-06-11 [1] CRAN (R 4.0.0)
 fansi         0.5.0      2021-05-25 [1] CRAN (R 4.0.2)
 farver        2.1.0      2021-02-28 [1] CRAN (R 4.0.2)
 forcats     * 0.5.1      2021-01-27 [1] CRAN (R 4.0.2)
 fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
 gdtools       0.2.3      2021-01-06 [1] CRAN (R 4.0.2)
 generics      0.1.0      2020-10-31 [1] CRAN (R 4.0.2)
 ggplot2     * 3.3.3      2020-12-30 [1] CRAN (R 4.0.2)
 glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)
 gtable        0.3.0      2019-03-25 [1] CRAN (R 4.0.0)
 haven         2.4.1      2021-04-23 [1] CRAN (R 4.0.2)
 highr         0.9        2021-04-16 [1] CRAN (R 4.0.2)
 hms           1.1.0      2021-05-17 [1] CRAN (R 4.0.2)
 hrbrthemes  * 0.8.0      2020-03-06 [1] CRAN (R 4.0.2)
 htmltools     0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)
 httr          1.4.2      2020-07-20 [1] CRAN (R 4.0.2)
 jquerylib     0.1.4      2021-04-26 [1] CRAN (R 4.0.2)
 jsonlite      1.7.2      2020-12-09 [1] CRAN (R 4.0.2)
 knitr       * 1.33       2021-04-24 [1] CRAN (R 4.0.2)
 labeling      0.4.2      2020-10-20 [1] CRAN (R 4.0.2)
 lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.0.2)
 lubridate     1.7.10     2021-02-26 [1] CRAN (R 4.0.2)
 magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)
 modelr        0.1.8      2020-05-19 [1] CRAN (R 4.0.0)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 4.0.0)
 pillar        1.6.1      2021-05-16 [1] CRAN (R 4.0.2)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.0)
 png           0.1-7      2013-12-03 [1] CRAN (R 4.0.0)
 purrr       * 0.3.4      2020-04-17 [1] CRAN (R 4.0.0)
 R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)
 Rcpp          1.0.6      2021-01-15 [1] CRAN (R 4.0.2)
 readr       * 1.4.0      2020-10-05 [1] CRAN (R 4.0.2)
 readxl        1.3.1      2019-03-13 [1] CRAN (R 4.0.2)
 reprex        2.0.0      2021-04-02 [1] CRAN (R 4.0.2)
 rlang         0.4.11     2021-04-30 [1] CRAN (R 4.0.2)
 rmarkdown     2.8.6      2021-06-06 [1] Github (rstudio/rmarkdown@9dc5d97)
 rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.0.2)
 rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.0.2)
 Rttf2pt1      1.3.8      2020-01-10 [1] CRAN (R 4.0.0)
 rvest         1.0.0      2021-03-09 [1] CRAN (R 4.0.2)
 sass          0.3.1.9003 2021-05-05 [1] Github (rstudio/sass@6166162)
 scales        1.1.1      2020-05-11 [1] CRAN (R 4.0.0)
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)
 stringi       1.6.2      2021-05-17 [1] CRAN (R 4.0.2)
 stringr     * 1.4.0      2019-02-10 [1] CRAN (R 4.0.0)
 systemfonts   1.0.1      2021-02-09 [1] CRAN (R 4.0.2)
 tibble      * 3.1.2      2021-05-16 [1] CRAN (R 4.0.2)
 tidyr       * 1.1.3      2021-03-03 [1] CRAN (R 4.0.2)
 tidyselect    1.1.1      2021-04-30 [1] CRAN (R 4.0.2)
 tidyverse   * 1.3.1      2021-04-15 [1] CRAN (R 4.0.2)
 utf8          1.2.1      2021-03-12 [1] CRAN (R 4.0.2)
 vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.0.2)
 withr         2.4.2      2021-04-18 [1] CRAN (R 4.0.2)
 xfun          0.23       2021-05-15 [1] CRAN (R 4.0.2)
 xml2          1.3.2      2020-04-23 [1] CRAN (R 4.0.0)
 yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library