Making a Better Map Part 2: ggplot and Tigris
Continuing Where we Left Off
In Part One of this series we worked on getting data using the TidyCensus package to recreate the map below from a recent New York Times Story. Now we’re going to actually do the mapping bit.
It Floats…kind of
So this should be pretty straightforward: we’ve got a dataframe with shapefiles called bay_area_housing. It’s got one column with a shapefile of each census tract in the bay area and another with a ratio of dwellings where more than one person resides per room. Using the geom_sf() function we can make a basic map with just a couple lines of code.
ggplot(data = bay_area_housing) +
geom_sf(color = NA, aes(geometry = geometry, fill = ratio_over_1))
It’s not pretty. At all. And it’s nothing like the map we are trying to replicate. In no particular order, if we want to recreate the map from the NYT, we’re going to need:
1) Roads
2) The landmass outside of our chosen area
3) City Names
4) A better color palette
5) Water. And that water should be blue!
So how do we get all of that?
Enter Tigris
By an almost obscene stroke of luck (and by luck I mean the hard work of the authors), the Tigris library can help us get almost all of these. Seriously, getting all of the data we need for roads, landmass and water is just this simple:
library(tigris)
my_state = 'CA'
my_county = c('San Francisco','San Mateo','Santa Cruz','Santa Clara','Alameda','Contra Costa','Los Angeles')
roads_data = primary_secondary_roads(my_state) %>%
filter(RTTYP %in% c('U','S','I')) #this will limit the roads to just highways.
landmass_data = counties(cb = TRUE, state = my_state)
water_data = area_water(my_state,my_county)
Let’s see what we can muster now with these supporting layers. Note that I’m moving my data arguments into each individual geom_sf() call since we’re pulling from more than one data frame. Also since these layers will stack on top of eachother, the order that we feed them into ggplot is very important.
my_map = ggplot() +
theme_void() + #whiteout
geom_sf(data = landmass_data, fill = '#e6e6e6', color = NA,aes(geometry = geometry)) + #grey landmass
geom_sf(data = bay_area_housing, color = NA, aes(geometry = geometry, fill = ratio_over_1)) + #our census map
geom_sf(data = water_data, fill = '#cce6ff',color = NA, aes(geometry = geometry)) + #add water
theme(panel.background = element_rect(fill = '#cce6ff', color = NA)) + #blue background to mesh with water
geom_sf(data = roads_data, color = 'white', aes(geometry = geometry)) + #add roads
NULL
my_map
Crap. We’ve got what’s probably a pretty decent map, but it’s for the whole state. This kind of makes sense actually, since some of our calls to Tigris only specify a state and not a county, and ggplot doesn’t know that we only care to frame the parts of the data that are included in our bay_area_housing layer. Luckily there is a way around this. Using the coord_sf() function we can set x and y limits based on our original dataframe. This is kind of tricky since shapefiles don’t behave the way a lot of other R objects do but we can extract them like so using the st_bbox() function contained in the sf library.
limits = st_bbox(bay_area_housing$geometry)
limits
## xmin ymin xmax ymax
## -123.01392 36.85065 -121.20823 38.09988
Now lets try again with our new limited frame.
my_map = my_map +
coord_sf(xlim = c(as.numeric(limits$xmin),as.numeric(limits$xmax)),
ylim = c(as.numeric(limits$ymin),as.numeric(limits$ymax)))
my_map
Last Touches
Okay! This is starting to look more professional. We definitely need to get a color scheme that’s a bit more expressive and get our legend a bit more legible. There’s lots of palettes in the viridis library that can make your maps much more legible and accessible but in the spirit of recreating the original map, I’m going to hand pick a gradient to match the levels in the original, while also getting my legend cleaned up.
my_map = my_map +
labs(fill = 'Share of housing\nunits with more\nthan one person\nper room') + #label legend
theme(legend.justification = c(0,0), legend.position = c(0.02,0.04)) + # move legend to bottom left
scale_fill_gradient(low = '#ffff99', high = '#b10026', na.value = '#ffffcc' ) # custom fill palette
my_map
Very nice. All that’s left to do is add our cities and we’ll be all set. I found that one area where Tigris isn’t great is getting simple points for the locations of cities (it really just deals with polygons), but there is a maps package that has built in data for US cities. Lets bring in that data and add in the last layer of our map. In order to keep things from getting too crowded we’ll add a cutoff for cities that are under a certain size.
library(maps)
cities_data = us.cities %>%
filter(country.etc %in% my_state & pop > 75000) %>%
mutate(fixed_name = str_replace(name,country.etc,'')) # this will drop the state name from the name column
my_map +
geom_text(color = 'black',data = cities_data ,check_overlap = TRUE ,aes(x = long, y = lat, label = fixed_name))
Wow…lets just take minute to recall our journey here. Nicely done!
We Hunger for More Maps
Ok! We’ve recreated the original map. Remember though, one of the goals of this project was to be able to make a tool that we can use to create similar visualizations for ANYWHERE in the US.
Well…if we’re gonna be running the same code several times over, this sounds like a job for a function. This next part is gonna involve a lot of code we’ve already gone over but the basic idea is to make two functions: one to pull the data we need to make the map and one to actually render it. First the data puller: This is going to let us grab our underlying info from the TidyCensus, Tigris and Maps package.
pull_data = function(my_state, my_county){
acs = load_variables(2018, 'acs5', cache = TRUE)
concepts = acs %>%
filter(concept == 'TENURE BY OCCUPANTS PER ROOM')
concepts_string = concepts %>%
select(1) %>%
pull() # this creates a character vector of just the concepts we want
raw = get_acs(geography = 'tract', variables = concepts_string, year = 2018, state = my_state,
county = my_county,geometry = TRUE, cache = TRUE)
fixed_data = concepts %>%
select(name, label) %>%
right_join(raw, by = c('name' = 'variable')) %>%
select(2,3,5,7) %>%
spread(key = label, value = estimate, -4)
names(fixed_data) = make.names(names(fixed_data))
fill_data = fixed_data %>%
mutate(
total_under_1 = Estimate..Total..Owner.occupied..0.50.or.less.occupants.per.room + Estimate..Total..Owner.occupied..0.51.to.1.00.occupants.per.room + Estimate..Total..Renter.occupied..0.50.or.less.occupants.per.room + Estimate..Total..Renter.occupied..0.51.to.1.00.occupants.per.room,
total_over_1 = Estimate..Total..Owner.occupied..1.01.to.1.50.occupants.per.room + Estimate..Total..Owner.occupied..1.51.to.2.00.occupants.per.room + Estimate..Total..Owner.occupied..2.01.or.more.occupants.per.room + Estimate..Total..Renter.occupied..1.01.to.1.50.occupants.per.room + Estimate..Total..Renter.occupied..1.51.to.2.00.occupants.per.room + Estimate..Total..Renter.occupied..2.01.or.more.occupants.per.room
) %>%
select(GEOID,geometry ,total_under_1, total_over_1, Estimate..Total) %>%
mutate(ratio_over_1 = total_over_1/ Estimate..Total)
roads_data = primary_secondary_roads(my_state) %>%
filter(RTTYP %in% c('U','S','I'))
counties_data = counties(cb = TRUE)
water_data = area_water(my_state,my_county)
cities_data = us.cities %>%
filter(country.etc %in% my_state) %>%
mutate(fixed_name = str_replace(name,country.etc,'')) # this will drop the state name from the name column
return(list(fill = fill_data, roads = roads_data, counties = counties_data, water = water_data, cities = cities_data))
}
Second is the actual mapper. I’ve got two extra parameters in here to make this more flexible for various geographies. One allows you to specify a population cutoff for which cities will be included on the map. Another lets you specify which corner of the map to stick the legend.
mapper = function(input_data, city_size_floor = 100000000, legend_position = 'bottom left'){ #accepts list object from pull_data function
map_frame = input_data$fill
limits = st_bbox(map_frame$geometry)
roads = input_data$roads
county = input_data$counties
water = input_data$water
cities = input_data$cities %>%
filter(pop > city_size_floor & long > as.numeric(limits$xmin) & long < as.numeric(limits$xmax) &
lat>(as.numeric(limits$ymin) & lat < as.numeric(limits$ymax)))
legend_placer = function(position = 'bottom left'){
if(position == 'bottom left'){
ret = (list(c(0,0),c(0.02,0.04)))
}else if(position == 'top left'){
ret =(list(c(0,1),c(0.02,0.96)))
}else if (position == 'bottom right'){
ret=(list(c(1,0),c(0.98,0.04)))
}else if(position == 'top right'){
ret=(list(c(1,1),c(0.98,0.96)))
}else(stop('invalid input: must supply one of: bottom left, bottom right, top left, top right'))
return(theme(legend.justification = ret[[1]], legend.position = ret[[2]]))
}
ggplot() +
theme_void() + #whiteout
legend_placer(legend_position) +
labs(fill = 'Share of housing\nunits with more\nthan one person\nper room')+ #label legend
geom_sf(data = county, alpha = 1, fill = '#e6e6e6', color = NA,aes(geometry = geometry)) + #grey landmass
geom_sf(data = map_frame, alpha = 1,color = NA, aes(geometry = geometry, fill = ratio_over_1)) + #our census map
scale_fill_gradient(low = '#ffff99', high = '#b10026', na.value = '#ffffcc' ) + # custom fill palette
geom_sf(data = water, fill = '#cce6ff',color = NA, aes(geometry = geometry)) + #add water
theme(panel.background = element_rect(fill = '#cce6ff', color = NA)) + #blue background to mesh with water
geom_sf(data = roads, color = 'white', aes(geometry = geometry)) + #add roads
geom_text(color = 'black',data = cities,check_overlap = TRUE ,aes(x = long, y = lat, label = fixed_name)) + #add city names
coord_sf(xlim = c(as.numeric(limits$xmin),as.numeric(limits$xmax)), ylim = c(as.numeric(limits$ymin),as.numeric(limits$ymax))) + #set framing(very important)
NULL
}
So lets give it a whirl: let’s start with New York City:
nyc = pull_data('NY',c('Bronx','New York','Queens','Kings'))
mapper(nyc, legend_position = 'top left')
How about my hometown?
atx = pull_data('TX',c('Travis','Williamson','Hays','Caldwell','Bastrop'))
mapper(atx, 40000, legend_position = 'top left')
I think we’ve done it. If you want to see maps for your own city or region, feel free to copy my functions and run them locally. If you find a way to improve on them let me know!