Hans Rosling is one of the most popular data scientists on the web. His original TED talk drew wide attention when it came out. For this week's case study we are going to create some graphics using his formatted data. Note that we need to remove Kuwait from the data (discussion on this).
In this exercise you will recreate the two graphics shown below using the gapminder dataset from library(gapminder) (get them to match as closely as you can). Specific instructions/steps are listed in the ‘Detailed Steps’ section.
1. Use library(ggplot2); library(gapminder); library(dplyr) to load the necessary packages.
2. Use filter() to remove “Kuwait” from the gapminder dataset for the reasons noted in the background.
3. Use ggplot() and theme_bw() to duplicate the first plot using the filtered dataset (without Kuwait).
   - Use the aesthetic mapping (aes()) to color by continent and adjust the size of the points with size=pop/100000. Remember that if you adjust the data like this you will also need to update the legend later.
   - Use scale_y_continuous(trans = "sqrt") to get the correct scale on the y-axis.
   - Use facet_wrap(~year,nrow=1) to divide the plot into separate panels.
   - Use labs() to specify more informative x, y, size, and color keys.
   - The full call will look something like this:
         ggplot(...) + geom_point() + facet_wrap(...) + scale_y_continuous(...) +
           theme_bw() + labs(...)
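
The steps for the first plot could be combined roughly as follows. This is only a sketch: mapping lifeExp to x and gdpPercap to y is an assumption (the target figure is not reproduced in the text), and gapminder_filtered, plot1, and the label strings are placeholder names. Recent ggplot2 releases may print a deprecation notice for trans and suggest transform instead.

```r
library(ggplot2)
library(gapminder)
library(dplyr)

# Drop Kuwait, as the assignment asks
gapminder_filtered <- gapminder %>%
  filter(country != "Kuwait")

# Assumed axes: life expectancy on x, GDP per capita on y
plot1 <- ggplot(gapminder_filtered,
                aes(x = lifeExp, y = gdpPercap,
                    color = continent, size = pop / 100000)) +
  geom_point() +
  facet_wrap(~year, nrow = 1) +            # one panel per year, single row
  scale_y_continuous(trans = "sqrt") +     # square-root y-axis
  theme_bw() +
  # Placeholder labels; the size key reflects the pop / 100000 rescaling
  labs(x = "Life Expectancy", y = "GDP per capita",
       size = "Population (100k)", color = "Continent")

# print(plot1) draws the figure
```

Because size is mapped to pop / 100000, the size key is relabeled in labs() so the legend still describes what is drawn.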
4. Use group_by() to group by continent and year.
5. Use summarize() with the commands below to calculate the data for the black continent-average line on the second plot:
       gdpPercapweighted = weighted.mean(x = gdpPercap, w = pop)
       pop = sum(as.numeric(pop))
   Save the result as gapminder_continent.
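
A minimal sketch of this summary step is shown below. Starting from the Kuwait-free data and dropping the grouping with .groups = "drop" are assumptions here, not requirements stated in the steps.

```r
library(gapminder)
library(dplyr)

gapminder_continent <- gapminder %>%
  filter(country != "Kuwait") %>%          # assumption: summarize the filtered data
  group_by(continent, year) %>%
  summarize(
    # Population-weighted mean GDP per capita; computed before pop is overwritten
    gdpPercapweighted = weighted.mean(x = gdpPercap, w = pop),
    # as.numeric() avoids integer overflow when summing large populations
    pop = sum(as.numeric(pop)),
    .groups = "drop"
  )

head(gapminder_continent)
```

The order of the two summaries matters: the weighted mean uses the per-country pop, so it is calculated before pop is replaced by the continent total.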
6. Use ggplot() and theme_bw() to duplicate the second plot. In this plot you will add elements from both the raw gapminder dataset and your dataset summarized by continent. You will need to use the new data you summarized to add the black lines and dots showing the continent average, so the call will look something like this:
       ggplot(gapminder,...) + geom_line() + geom_point() + geom_line(data=newdata,...) +
         geom_point(data=newdata,...) + facet_wrap() + theme_bw() + labs(...)
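
One way the second plot might be assembled is sketched here. Putting year on the x-axis, faceting by continent, grouping the country lines with group = country, and all label strings are assumptions, since the target figure is not reproduced in the text. The summary is recomputed so the block runs on its own.

```r
library(ggplot2)
library(gapminder)
library(dplyr)

gapminder_filtered <- gapminder %>% filter(country != "Kuwait")

# Continent-level summary used for the black average line and dots
gapminder_continent <- gapminder_filtered %>%
  group_by(continent, year) %>%
  summarize(gdpPercapweighted = weighted.mean(x = gdpPercap, w = pop),
            pop = sum(as.numeric(pop)),
            .groups = "drop")

plot2 <- ggplot(gapminder_filtered, aes(x = year, y = gdpPercap)) +
  # Country-level layers: one colored line and set of points per country
  geom_line(aes(color = continent, group = country)) +
  geom_point(aes(color = continent, size = pop / 100000)) +
  # Continent-level layers drawn in black from the summarized data
  geom_line(data = gapminder_continent,
            aes(y = gdpPercapweighted), color = "black") +
  geom_point(data = gapminder_continent,
             aes(y = gdpPercapweighted, size = pop / 100000), color = "black") +
  facet_wrap(~continent, nrow = 1) +       # assumed: one panel per continent
  theme_bw() +
  labs(x = "Year", y = "GDP per capita",
       size = "Population (100k)", color = "Continent")
```

A plot object like plot2 can later be passed to ggsave(), for example ggsave("plot2.png", plot2, width = 15, units = "in"), where the file name is a placeholder.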
7. Use ggsave() or png() to save each plot as a .png with a width of 15 inches.
8. Click Source to confirm that your script runs from start to finish without errors and saves the graphics.
9. Save your script as case_study_03.R or case_study_03.Rmd (if you are starting to play with R Markdown) in your course repository for week 3.

It’s possible to do some data aggregation like this ‘within’ a ggplot call using stat_summary() and friends. See here for more details.
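
As a rough illustration of that idea, the sketch below lets stat_summary() compute a per-year mean inside the plotting call. Note that this is a plain (unweighted) mean of gdpPercap, so it will not match the population-weighted average computed with summarize(); the axes, the faceting by continent, and the object names are assumptions.

```r
library(ggplot2)
library(gapminder)
library(dplyr)

gapminder_filtered <- gapminder %>% filter(country != "Kuwait")

plot_stat <- ggplot(gapminder_filtered, aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country, color = continent)) +
  # stat_summary() applies mean() to gdpPercap at each year within each panel
  stat_summary(fun = mean, geom = "line", color = "black") +
  stat_summary(fun = mean, geom = "point", color = "black") +
  facet_wrap(~continent, nrow = 1) +
  theme_bw()
```

This avoids building a separate summary table, at the cost of less control over how the average is defined.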
Adapted from BYU M335 Data Science Course