Tagged with RColorBrewer

geom_point Legend with Custom Colors in ggplot

Previously, I showed how to make line segments using ggplot.

Working from that previous example, there are only a few things we need to change to add custom colors to our plot and legend in ggplot.

First, we'll add the colors of our choice. I'll do this using RColorBrewer, but you can choose whatever method you'd like.

library(RColorBrewer)
colors <- brewer.pal(8, "Dark2")

The next section will be exactly the same as the previous example, except for removing the scale_color_discrete line to make way for the scale_color_manual we'll be adding later.

library(ggplot2)

data <- as.data.frame(USPersonalExpenditure) # data from package datasets
data$Category <- as.character(rownames(USPersonalExpenditure)) # this makes things simpler later

ggplot(data,
    aes(x = Expenditure,
        y = Category)) +
labs(x = "Expenditure",
    y = "Category") +
geom_segment(aes(x = `1940`, # backticks let aes() refer to columns whose names are numbers
        y = Category,
        xend = `1960`,
        yend = Category),
    size = 1) +
geom_point(aes(x = `1940`,
        color = "1940"), # these can be any string, they just need to be unique identifiers
    size = 4,
    shape = 15) +
geom_point(aes(x = `1960`,
        color = "1960"),
    size = 4,
    shape = 15) +
theme(legend.position = "bottom") +

And finally, we'll add a scale_color_manual line to our plot. We need to define the legend's name, its labels, and the colors to use.

scale_color_manual(name = "Year", # or name = element_blank()
    labels = c(1940, 1960),
    values = colors)
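If you'd rather not depend on the order of the colors, one option is to pass scale_color_manual a named vector, so each color is matched to its identifier string by name. This is only a sketch of that idea: the names year_colors and p are my own, and the plot is a trimmed-down version of the one above.

```r
library(ggplot2)
library(RColorBrewer)

# Name each color after the identifier string used in aes(color = ...)
year_colors <- setNames(brewer.pal(8, "Dark2")[1:2], c("1940", "1960"))

data <- as.data.frame(USPersonalExpenditure) # data from package datasets
data$Category <- rownames(USPersonalExpenditure)

p <- ggplot(data, aes(y = Category)) +
    geom_segment(aes(x = `1940`, xend = `1960`, yend = Category)) +
    geom_point(aes(x = `1940`, color = "1940"), size = 4, shape = 15) +
    geom_point(aes(x = `1960`, color = "1960"), size = 4, shape = 15) +
    labs(x = "Expenditure", y = "Category") +
    theme(legend.position = "bottom") +
    scale_color_manual(name = "Year", values = year_colors) # matched by name

print(p)
```

With a named vector, reordering the geom_point layers shouldn't change which year gets which color.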

And here's our final plot, complete with whatever custom colors we've chosen in both the plot and legend:

geom_point in ggplot with custom colors in the graph and legend

I've updated the gist from the previous post to also include a file that has custom colors.


Shapefiles in R

Let's learn how to use shapefiles in R. This will allow us to map data for complicated areas or jurisdictions like ZIP codes or school districts. For the United States, many shapefiles are available from the [Census Bureau](http://www.census.gov/geo/www/tiger/tgrshp2010/tgrshp2010.html). Our example will map U.S. national parks.

First, download the U.S. Parks and Protected Lands shape files from Natural Earth. We'll be using the ne_10m_parks_and_protected_lands_area.shp file.

Next, start working in R. First, we'll load maptools and read in the shapefile:

# load up area shape file:
library(maptools)
area <- readShapePoly("ne_10m_parks_and_protected_lands_area.shp")

# # or file.choose:
# area <- readShapePoly(file.choose())

Next we can set the colors we want to use. And then we can set up our basemap.

library(RColorBrewer)
colors <- brewer.pal(9, "BuGn")

library(ggmap)
mapImage <- get_map(location = c(lon = -118, lat = 37.5),
    color = "color",
    source = "osm",
    # maptype = "terrain",
    zoom = 6)

Next, we can use the fortify function from the ggplot2 package. This converts the crazy shape file with all its nested attributes into a data frame that ggmap will know what to do with.

area.points <- fortify(area)

Finally, we can map our shape files!

ggmap(mapImage) +
    geom_polygon(aes(x = long,
            y = lat,
            group = group),
        data = area.points,
        color = colors[9],
        fill = colors[6],
        alpha = 0.5) +
labs(x = "Longitude",
    y = "Latitude")

National Parks and Protected Lands in California and Nevada

Same figure, with a Stamen terrain basemap with ColorBrewer palette "RdPu"


Stacked Bar Charts in R

Reshape Long to Wide

Let's use the Loblolly dataset from the datasets package. These data track the growth of some loblolly pine trees.

> Loblolly[1:10,]
   height age Seed
1    4.51   3  301
15  10.89   5  301
29  28.72  10  301
43  41.74  15  301
57  52.70  20  301
71  60.92  25  301
2    4.55   3  303
16  10.92   5  303
30  29.07  10  303
44  42.83  15  303

First, we need to convert the data to wide form, so each age (i.e. 3, 5, 10, 15, 20, 25) will have its own variable.

wide <- reshape(Loblolly,
    v.names = "height",
    timevar = "age",
    idvar = "Seed",
    direction = "wide")

> wide[1:5,]
  Seed height.3 height.5 height.10 height.15 height.20 height.25
1  301     4.51    10.89     28.72     41.74     52.70     60.92
2  303     4.55    10.92     29.07     42.83     53.88     63.39
3  305     4.79    11.37     30.21     44.40     55.82     64.10
4  307     3.91     9.48     25.66     39.07     50.78     59.07
5  309     4.81    11.20     28.66     41.66     53.31     63.05
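Before moving on, it can be worth checking that the reshape did what we expected. The following is a small sanity check of my own (the names n_seeds, n_ages, seed_301, and wide_301 are just for illustration): the wide data should have one row per seed source and one height column per age.

```r
wide <- reshape(Loblolly,
    v.names = "height",
    timevar = "age",
    idvar = "Seed",
    direction = "wide")

# One row per seed source, one height column per age, plus the Seed column
n_seeds <- length(unique(Loblolly$Seed))
n_ages <- length(unique(Loblolly$age))
stopifnot(nrow(wide) == n_seeds,
    ncol(wide) == 1 + n_ages)

# Spot check: the heights for seed 301 should match the long data
seed_301 <- Loblolly[Loblolly$Seed == "301", ]
wide_301 <- unlist(wide[wide$Seed == "301", paste0("height.", seed_301$age)])
stopifnot(isTRUE(all.equal(unname(wide_301), seed_301$height)))
```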

Create Variables

Then we want to create new columns showing how much each tree has grown between data points. For example, instead of knowing a tree's height at age 10, we want to know how much it's grown between age 5 and age 10, so that can be a bar in our graph.

wide$h0.3 <- wide$height.3
wide$h3.5 <- wide$height.5 - wide$height.3
wide$h5.10 <- wide$height.10 - wide$height.5
wide$h10.15 <- wide$height.15 - wide$height.10
wide$h15.20 <- wide$height.20 - wide$height.15
wide$h20.25 <- wide$height.25 - wide$height.20
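If you have many time points, typing out each subtraction gets tedious. As a rough alternative (not the approach used in the rest of this post), diff can compute all the increments at once. The names heights and growth are my own.

```r
wide <- reshape(Loblolly,
    v.names = "height",
    timevar = "age",
    idvar = "Seed",
    direction = "wide")

# Matrix of heights, one row per seed and one column per age
heights <- as.matrix(wide[, paste0("height.", c(3, 5, 10, 15, 20, 25))])

# Prepend a column of zeros so the first increment is the height at age 3,
# then take differences between neighboring columns within each row
growth <- t(apply(cbind(0, heights), 1, diff))
colnames(growth) <- c("h0.3", "h3.5", "h5.10", "h10.15", "h15.20", "h20.25")

# The increments for each tree should add back up to its final height
stopifnot(isTRUE(all.equal(unname(rowSums(growth)), wide$height.25)))
```

The growth matrix holds the same values as the six h columns above, so t(growth) could be passed to barplot in place of t(wide[,8:13]).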

Plot Stacked Bar Chart

Finally, we want to plot all the new data points:

library(RColorBrewer)
sequential <- brewer.pal(6, "BuGn")
barplot(t(wide[,8:13]),
    names.arg = wide$Seed, # x-axis labels
    cex.names = 0.7, # makes x-axis labels small enough to show all
    col = sequential, # colors
    xlab = "Seed Source",
    ylab = "Height, Feet",
    xlim = c(0,20), # these two lines allow space for the legend
    width = 1) # these two lines allow space for the legend
legend("bottomright", 
    legend = c("20-25", "15-20", "10-15", "5-10", "3-5", "0-3"), #in order from top to bottom
    fill = sequential[6:1], # 6:1 reorders so legend order matches graph
    title = "Years")

Stacked bar chart

If you decide you'd rather have clustered bars instead of stacked bars, you can just add the option beside = TRUE to the barplot.
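Here's a sketch of what the clustered version might look like. To keep it self-contained I rebuild the growth increments in a compact way (the names heights, growth, and mids are mine), and I let barplot pick its own x-axis limits, so the legend placement is only a guess.

```r
library(RColorBrewer)
sequential <- brewer.pal(6, "BuGn")

wide <- reshape(Loblolly,
    v.names = "height",
    timevar = "age",
    idvar = "Seed",
    direction = "wide")

# Growth between measurements, one row per seed source
heights <- as.matrix(wide[, 2:7])
growth <- t(apply(cbind(0, heights), 1, diff))

# beside = TRUE draws the six periods side by side instead of stacked
mids <- barplot(t(growth),
    beside = TRUE,
    names.arg = as.character(wide$Seed),
    cex.names = 0.7,
    col = sequential,
    xlab = "Seed Source",
    ylab = "Growth, Feet")
legend("topright",
    legend = c("0-3", "3-5", "5-10", "10-15", "15-20", "20-25"),
    fill = sequential,
    title = "Years")
```

With beside = TRUE, barplot returns a matrix of bar midpoints (one column per seed source) instead of a vector, which is handy if you want to add labels afterwards.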

The full code is available in a gist.


Palettes in R

In its simplest form, a palette in R is simply a vector of colors. This vector can include hex triplets or R color names.

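You can view the default palette with palette(). The output below comes from a version of R before 4.0.0; newer versions use a different default palette and print hex codes for most of the entries: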

> palette("default") # you'll only need this line if you've previously changed the palette from the default
> palette()
[1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow"
[8] "gray"

Defining your own palettes

If you want to make your own palette, you can just create your own vector of colors. It's fine for your vector to include a mixture of hex triplets and R color names. You can use the palette function above, but generally it's best to just store each palette as a standard vector. For one thing, you can use more than one palette that way. Here's how you can define your own palette:

colors <- c("#A7A7A7",
    "dodgerblue",
    "firebrick",
    "forestgreen",
    "gold")

Now let's try using our palette. For now let's just color each bar of a histogram. This is a silly example, but I think it's the easiest way to show how to get R to utilize your palette. In the following example, there are six bars, but only five colors. You can see that R will cycle through your palette to fill all the shapes.

hist(discoveries,
    col = colors)
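To see what's going on under the hood, here's a small sketch that spells out both ideas: named colors and hex triplets end up as the same kind of RGB values, and the palette is recycled to cover every bar. I've used my_palette instead of colors so the vector doesn't share a name with the built-in colors() function, and the other variable names are mine as well.

```r
my_palette <- c("#A7A7A7",
    "dodgerblue",
    "firebrick",
    "forestgreen",
    "gold")

# Color names and hex triplets both resolve to red/green/blue values
rgb_values <- col2rgb(my_palette)
hex_values <- rgb(t(rgb_values), maxColorValue = 255)

# Count the bars without drawing anything
h <- hist(discoveries, plot = FALSE)
n_bars <- length(h$counts)

# Recycle the palette explicitly, the same way col = my_palette would
bar_colors <- rep_len(my_palette, n_bars)
hist(discoveries,
    col = bar_colors)
```

Printing bar_colors shows which color each bar gets once the palette wraps around.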

One color per bar

A more sensible use of color is to use a different color for each of a number of summary statistics:

colors <- c("#A7A7A7",
    "dodgerblue",
    "firebrick",
    "forestgreen",
    "gold")
hist(discoveries,
    col = colors[1])
abline(v = mean(discoveries),
    col = colors[2],
    lwd = 5)
abline(v = median(discoveries),
    col = colors[3],
    lwd = 5)
abline(v = min(discoveries),
    col = colors[4],
    lwd = 5)
abline(v = max(discoveries),
    col = colors[5],
    lwd = 5)
legend(x = "topright", # location of legend within plot area
    legend = c("Mean", "Median", "Minimum", "Maximum"),
    col = colors[2:5],
    lwd = 5)

Summary statistics

Predefined palettes: default R palettes

The package grDevices (you probably already have this loaded) contains a number of palettes.

?rainbow
rainbowcols <- rainbow(6)
hist(discoveries,
    col = rainbowcols)

Rainbow palette

For rainbow, you can adjust the saturation and value. For example:

rainbowcols <- rainbow(6,
    s = 0.5)
hist(discoveries,
    col = rainbowcols)
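The value can be changed the same way through the v argument. Here's a quick side-by-side sketch; the variable names and the particular settings of 0.5 and 0.6 are arbitrary choices of mine.

```r
vivid <- rainbow(6)           # defaults: s = 1, v = 1
pastel <- rainbow(6, s = 0.5) # lower saturation
dark <- rainbow(6, v = 0.6)   # lower value (darker)

# Draw the three versions next to each other
op <- par(mfrow = c(1, 3))
hist(discoveries, col = vivid, main = "s = 1, v = 1")
hist(discoveries, col = pastel, main = "s = 0.5")
hist(discoveries, col = dark, main = "v = 0.6")
par(op) # restore the previous plot layout
```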

Rainbow palette desaturated

heatcols <- heat.colors(6)
hist(discoveries,
    col = heatcols)

heat.colors palette

As well as rainbow and heat.colors, there are also terrain.colors, topo.colors, and cm.colors.
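All five functions take the number of colors as their first argument, so it's easy to compare them. This is just one way to draw some swatches; the names palette_functions and swatches are mine.

```r
palette_functions <- list(rainbow = rainbow,
    heat.colors = heat.colors,
    terrain.colors = terrain.colors,
    topo.colors = topo.colors,
    cm.colors = cm.colors)

# Six colors from each palette, one column per palette
swatches <- sapply(palette_functions, function(f) f(6))

# One row of swatches per palette, labeled in the left margin
op <- par(mfrow = c(length(palette_functions), 1),
    mar = c(1, 8, 1, 1))
for (name in names(palette_functions)) {
    barplot(rep(1, 6),
        col = swatches[, name],
        axes = FALSE,
        space = 0,
        border = NA)
    mtext(name, side = 2, las = 1, line = 1)
}
par(op) # restore the previous plot settings
```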

Predefined palettes: RColorBrewer

library(RColorBrewer)
display.brewer.all()

The above lines will show us all the RColorBrewer palettes (output shown below). The top section of palettes are sequential, the middle section are qualitative, and the lower section are diverging. Here is some information about how to choose a good palette.

RColorBrewer palettes

RColorBrewer works a little differently from how we've defined palettes previously. We'll have to use brewer.pal to create a palette.

library(RColorBrewer)
darkcols <- brewer.pal(8, "Dark2")
hist(discoveries,
    col = darkcols)

Dark2 color palette

Even though we have to tell brewer.pal how many colors we want, we won't necessarily need to use all of those colors later. We can still pick individual colors from the vector, as we did previously. When we pass the full palette to a col argument, the number of colors in the palette matters more, but even there we can use just a subset of the palette:

darkcols <- brewer.pal(8, "Dark2")
hist(discoveries,
    col = darkcols[1:2])
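Going the other way, each RColorBrewer palette has a maximum number of colors. If you need more than that, one possible workaround (my suggestion, not something RColorBrewer does for you) is to interpolate between the palette's colors with colorRampPalette from grDevices. The in-between colors of a qualitative palette like Dark2 won't be as distinct as the originals, so treat this as a rough sketch.

```r
library(RColorBrewer)

# brewer.pal.info lists the largest size available for each palette
max_dark2 <- brewer.pal.info["Dark2", "maxcolors"]
darkcols <- brewer.pal(max_dark2, "Dark2")

# Interpolate between the palette's colors to get 12 colors
more_darkcols <- colorRampPalette(darkcols)(12)
hist(discoveries,
    col = more_darkcols)
```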

Dark 2 color palette, two colors

Here's the code from this post.

Now that we're familiar with making our own palettes and using the built-in palettes in grDevices and RColorBrewer, I'm planning a future post about a more practical (but also more complicated) example of using palettes: making maps.
