Tagged with legend

Custom Legend in R

This particular custom legend was designed with three purposes:

  • To effectively bin values based on a theoretical minimum and maximum value for that variable (e.g. -1 and 1 or 0 and 100)
  • To use a different interval notation than the default
  • To handle NA values

Even though this particular legend was designed with those needs, it should be simple to extrapolate from that to build legends based on other criteria.

Standard Legend

For this post, I'll be assuming you've looked through the Oregon map tutorial or have other experience making legends in R. If not, you'll probably want to work through that tutorial first. It's an awesome tutorial.

Let's start by creating a map with a standard legend, then move on to customization.

First, we'll load the packages we need and the data from OIdata:

library(OIdata)
library(maps) # provides map(); loaded explicitly in case it isn't already attached
library(RColorBrewer)
library(classInt)

# load state data from OIdata package:
data(state)

Next we want to set some constants. This will save us a bunch of typing and will make the code easier to read, especially once we start creating a custom legend. Also, it will allow us to easily change the values if we want a different number of bins or a different min and max.

In this example, we're assuming we have a theoretical minimum and maximum and want to determine our choropleth bins based on that.

nclr <- 8 # number of bins
min <- 0 # theoretical minimum
max <- 100 # theoretical maximum
breaks <- (max - min) / nclr
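To make the bin arithmetic concrete, here is a small base-R sketch of the break points these constants produce. The names lo, hi, n_bins and bin_width are my own stand-ins for min, max, nclr and breaks, chosen only so the example doesn't reuse the names of R's built-in min() and max() functions.

```r
# Stand-ins for the constants above (names are illustrative, not from the post)
n_bins <- 8      # number of bins
lo <- 0          # theoretical minimum
hi <- 100        # theoretical maximum

# Width of each bin, and the break points that bound the bins
bin_width <- (hi - lo) / n_bins
fixed_breaks <- seq(lo, hi, by = bin_width)

# n_bins bins always need n_bins + 1 break points
print(bin_width)      # 12.5
print(fixed_breaks)   # 0.0 12.5 25.0 37.5 50.0 62.5 75.0 87.5 100.0
```

Changing n_bins, lo or hi is then enough to get a different set of evenly spaced bins.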

Next, we'll set up our choropleth colors (this should look familiar from the Oregon tutorial):

# set up colors:
plotclr <- brewer.pal(nclr, "Oranges")
plotvar <- state$coal
class <- classIntervals(plotvar,
    nclr,
    style = "fixed",
    fixedBreaks = seq(min, max, breaks))
colcode <- findColours(class, 
    plotclr)
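If you're curious what the binning step is doing, here is a rough base-R approximation using findInterval(). This is only a sketch of the idea, not what classInt does internally, and the sample values and the orange color ramp are made up for illustration (the ramp is not the Brewer "Oranges" palette).

```r
# Hypothetical sample values, including edge cases and an NA
values <- c(0, 12.4, 12.5, 50, 99.9, 100, NA)

# Same fixed break points as above: 0, 12.5, ..., 100
fixed_breaks <- seq(0, 100, by = 12.5)

# Bin index for each value: intervals are closed on the left,
# and rightmost.closed = TRUE puts the maximum (100) in the last bin
bin_index <- findInterval(values, fixed_breaks, rightmost.closed = TRUE)

# An illustrative 8-color light-to-dark ramp
ramp <- colorRampPalette(c("#FFF5EB", "#7F2704"))(8)

# Look up one color per value; NA values get NA (no color)
bin_colors <- ramp[bin_index]

print(bin_index)   # 1 1 2 5 8 8 NA
```

Note that the NA value ends up with no color at all, which is exactly the gap the custom legend below deals with.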

And now let's map the data:

# map data:
map("state", # base
    col = "gray80",
    fill = TRUE,
    lty = 0)
map("state", # data
    col = colcode,
    fill = TRUE,
    lty = 0,
    add = TRUE)
map("state", # border
    col = "gray",
    lwd = 1.4,
    lty = 1,
    add = TRUE)

And finally let's add our default legend:

legend("bottomleft", # position
    legend = names(attr(colcode, "table")), 
    title = "Percent",
    fill = attr(colcode, "palette"),
    cex = 0.56,
    bty = "n") # border

Here's the output of this code (see map-standard-legend.R in the gist):

Percent of power coming from coal sources (standard legend)

Custom Legend

Next we want to add a few lines here and there to enhance the legend.

For starters, let's deal with NA values. We don't have any in this particular dataset, but if we did, we would have seen they were left as the base color of the map and not included in the legend.

After the earlier code that sets up the colors, we should add the color for NAs. It's important that these lines go after all the other setup code, or the wrong colors will be mapped.

# set up colors:
plotclr <- brewer.pal(nclr, "Oranges")
plotvar <- state$coal
class <- classIntervals(plotvar,
    nclr,
    style = "fixed",
    fixedBreaks = seq(min, max, breaks))
colcode <- findColours(class, 
    plotclr)
NAColor <- "gray80"
plotclr <- c(plotclr, NAColor)
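Here is a tiny base-R sketch of why the order matters. The colors and values are hypothetical, and filling in the NA color directly (instead of letting the base layer show through) is my own variation, not something the code above does.

```r
# Hypothetical data: one value is missing
values <- c(10, NA, 95)

# Hypothetical bin colors already assigned to the non-missing values
bin_colors <- c("#FEE6CE", NA, "#A63603")

# Only once the bin colors are settled do we bring in the NA color
na_color <- "gray80"

# Variation: give NA regions the NA color explicitly
fill_colors <- bin_colors
fill_colors[is.na(values)] <- na_color

# The legend swatches are the bin colors followed by the NA color
legend_fill <- c("#FEE6CE", "#A63603", na_color)

print(fill_colors)   # "#FEE6CE" "gray80" "#A63603"
```

If the NA color were appended to the palette before the bins were colored, it would be treated as one more bin color, which is the mix-up the ordering avoids.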

We also want to let the map know to have our NA color as the default color, so the map will use that instead of having those areas be transparent:

# map data:
map("state", # base
    col = NAColor,
    fill = TRUE,
    lty = 0)
map("state", # data
    col = colcode,
    fill = TRUE,
    lty = 0,
    add = TRUE)
map("state", # border
    col = "gray",
    lwd = 1.4,
    lty = 1,
    add = TRUE)

Next, we want to set up the legend text. For all but the last interval, we want it to say i ≤ n < (i + breaks). The last interval should be i ≤ n ≤ (i + breaks). This can be accomplished with:

# set legend text:
legendText <- c()
for(i in seq(min, max - (max - min) / nclr, (max - min) / nclr)) {
    if (i == max(seq(min, max - (max - min) / nclr, (max - min) / nclr))) {
        legendText <- c(legendText, paste(round(i,3), "\u2264 n \u2264", round(i + (max - min) / nclr,3)))
    } else {
        legendText <- c(legendText, paste(round(i,3), "\u2264 n <", round(i + (max - min) / nclr,3)))
    }
}

But we also want to include NAs in the legend, so we need to add a line:

# set legend text:
legendText <- c()
for(i in seq(min, max - (max - min) / nclr, (max - min) / nclr)) {
    if (i == max(seq(min, max - (max - min) / nclr, (max - min) / nclr))) {
        legendText <- c(legendText, paste(round(i,3), "\u2264 n \u2264", round(i + (max - min) / nclr,3)))
        if (!is.na(NAColor)) legendText <- c(legendText, "NA")
    } else {
        legendText <- c(legendText, paste(round(i,3), "\u2264 n <", round(i + (max - min) / nclr,3)))
    }
}
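If you'd rather avoid the loop, the same labels can be built with vectorized base-R calls. This is just an alternative sketch. The names lo, hi, n_bins and bin_width are my own stand-ins for min, max, nclr and breaks.

```r
# Stand-ins for the constants (names are illustrative)
n_bins <- 8
lo <- 0
hi <- 100
bin_width <- (hi - lo) / n_bins

# Lower and upper bound of every bin
lower <- seq(lo, hi - bin_width, by = bin_width)
upper <- lower + bin_width

# "<" for every bin except the last, which is closed with "\u2264"
upper_sign <- c(rep("<", n_bins - 1), "\u2264")

# Build all interval labels at once, then append the NA label
legend_text <- paste(round(lower, 3), "\u2264 n", upper_sign, round(upper, 3))
legend_text <- c(legend_text, "NA")

print(legend_text)
```

The result has one label per bin plus "NA", so it lines up with a palette that has the NA color appended at the end.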

And finally we need to add the legend to the map:

legend("bottomleft", # position
    legend = legendText, 
    title = "Percent",
    fill = plotclr,
    cex = 0.56,
    bty = "n") # border

The new map (see map-new-legend.R) meets all the criteria we started with, which the standard legend did not.

Percent of power coming from coal sources (custom legend)

Code is available in a gist.


Using Line Segments to Compare Values in R

Sometimes you want to create a graph that will allow the viewer to see in one glance:

  • The original value of a variable
  • The new value of the variable
  • The change between old and new

One way I like to do this is with geom_segment and geom_point from the ggplot2 package.

First, let's load ggplot2 and our data:

library(ggplot2)

data <- as.data.frame(USPersonalExpenditure) # data from package datasets
data$Category <- as.character(rownames(USPersonalExpenditure)) # this makes things simpler later

Next, we'll set up our plot and axes:

ggplot(data,
    aes(y = Category)) +
labs(x = "Expenditure",
    y = "Category") +

For geom_segment, we need to provide four variables. (Sometimes two of the four will be the same, like in this case.) x and y provide the start points, and xend and yend provide the endpoints.

In this case, we want to show the change between 1940 and 1960 for each category. Therefore our variables are the following:

  • x: "1940"
  • y: Category
  • xend: "1960"
  • yend: Category

geom_segment(aes(x = `1940`,
    y = Category,
    xend = `1960`,
    yend = Category),
    size = 1) +

Next, we want to plot points for the 1940 and 1960 values. We could do the same for the 1945, 1950, and 1955 values, if we wanted to.

geom_point(aes(x = `1940`,
    color = "1940"),
    size = 4, shape = 15) +
geom_point(aes(x = `1960`,
    color = "1960"),
    size = 4, shape = 15) +

Finally, we'll finish up by touching up the legend for the plot:

scale_color_discrete(name = "Year") +
theme(legend.position = "bottom")

geom_segment, then geom_point
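Since the pieces above are shown one at a time, here they are assembled into a single runnable sketch. I've renamed the data frame to expenditure and referred to the year columns with backticks inside aes(), which is my own tidy-up. I've also left the segment width at its default. Otherwise this follows the steps above.

```r
library(ggplot2)

# Same data preparation as above (data frame renamed for clarity)
expenditure <- as.data.frame(USPersonalExpenditure)
expenditure$Category <- rownames(USPersonalExpenditure)

# Segments first, then points, so the points are drawn on top
p <- ggplot(expenditure, aes(y = Category)) +
    labs(x = "Expenditure", y = "Category") +
    geom_segment(aes(x = `1940`, xend = `1960`, yend = Category)) +
    geom_point(aes(x = `1940`, color = "1940"), size = 4, shape = 15) +
    geom_point(aes(x = `1960`, color = "1960"), size = 4, shape = 15) +
    scale_color_discrete(name = "Year") +
    theme(legend.position = "bottom")

print(p)
```

Storing the plot in p before printing it makes it easy to add or swap layers while experimenting.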

The order of geom_segment and the geom_points matters. The first geom line in the code will get plotted first. Therefore, if you want the points displayed over the segments, put the segments first in the code. Likewise, if you want the segments displayed over the points, put the points first in the code.

For example, we could change the middle section of the code to:

geom_point(aes(x = `1940`,
    color = "1940"),
    size = 4, shape = 15) +
geom_point(aes(x = `1960`,
    color = "1960"),
    size = 4, shape = 15) +

geom_segment(aes(x = `1940`,
    y = Category,
    xend = `1960`,
    yend = Category),
    size = 1) +

And the output would look like:

geom_point, then geom_segment

Similarly, if you have points that will be overlapping, make sure you think about which of the point lines you want R to plot first.

The code is available in a gist.


Stacked Bar Charts in R

Reshape Long to Wide

Let's use the Loblolly dataset from the datasets package. These data track the growth of some loblolly pine trees.

> Loblolly[1:10,]
   height age Seed
1    4.51   3  301
15  10.89   5  301
29  28.72  10  301
43  41.74  15  301
57  52.70  20  301
71  60.92  25  301
2    4.55   3  303
16  10.92   5  303
30  29.07  10  303
44  42.83  15  303

First, we need to convert the data to wide form, so each age (i.e. 3, 5, 10, 15, 20, 25) will have its own variable.

wide <- reshape(Loblolly,
    v.names = "height",
    timevar = "age",
    idvar = "Seed",
    direction = "wide")

> wide[1:5,]
  Seed height.3 height.5 height.10 height.15 height.20 height.25
1  301     4.51    10.89     28.72     41.74     52.70     60.92
2  303     4.55    10.92     29.07     42.83     53.88     63.39
3  305     4.79    11.37     30.21     44.40     55.82     64.10
4  307     3.91     9.48     25.66     39.07     50.78     59.07
5  309     4.81    11.20     28.66     41.66     53.31     63.05

Create Variables

Then we want to create new columns showing how much each tree has grown between data points. For example, instead of knowing a tree's height at age 10, we want to know how much it's grown between age 5 and age 10, so that can be a bar in our graph.

wide$h0.3 <- wide$height.3
wide$h3.5 <- wide$height.5 - wide$height.3
wide$h5.10 <- wide$height.10 - wide$height.5
wide$h10.15 <- wide$height.15 - wide$height.10
wide$h15.20 <- wide$height.20 - wide$height.15
wide$h20.25 <- wide$height.25 - wide$height.20
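If you have many time points, writing one line per interval gets tedious. As a sketch of an alternative, the same growth values can be computed with diff() across the height columns. The growth matrix and the as.data.frame() call are my own additions. The column names match the ones created above.

```r
# Rebuild the wide data (as.data.frame() drops Loblolly's extra classes)
wide <- reshape(as.data.frame(Loblolly),
    v.names = "height",
    timevar = "age",
    idvar = "Seed",
    direction = "wide")

# Heights at each age, one row per seed source
ages <- c(3, 5, 10, 15, 20, 25)
heights <- as.matrix(wide[, paste0("height.", ages)])

# First column is growth from 0 to 3; the rest are row-wise differences
growth <- cbind(heights[, 1], t(apply(heights, 1, diff)))
colnames(growth) <- c("h0.3", "h3.5", "h5.10", "h10.15", "h15.20", "h20.25")

# Sanity check: the increments for each tree add up to its final height
stopifnot(isTRUE(all.equal(unname(rowSums(growth)),
    unname(heights[, "height.25"]))))
```

The transposed matrix t(growth) could then be passed to barplot() in place of the selected columns.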

Plot Stacked Bar Chart

Finally, we want to plot all the new data points:

library(RColorBrewer)
sequential <- brewer.pal(6, "BuGn")
barplot(t(wide[,8:13]),
    names.arg = wide$Seed, # x-axis labels
    cex.names = 0.7, # makes x-axis labels small enough to show all
    col = sequential, # colors
    xlab = "Seed Source",
    ylab = "Height, Feet",
    xlim = c(0,20), # these two lines allow space for the legend
    width = 1) # these two lines allow space for the legend
legend("bottomright", 
    legend = c("20-25", "15-20", "10-15", "5-10", "3-5", "0-3"), #in order from top to bottom
    fill = sequential[6:1], # 6:1 reorders so legend order matches graph
    title = "Years")

Stacked bar chart

If you decide you'd rather have clustered bars instead of stacked bars, you can just add the option beside = TRUE to the barplot.
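As a minimal sketch of the difference, here is a small matrix plotted both ways. The numbers are the first three growth intervals for seed sources 301 and 303, worked out from the heights printed above. The variable names are my own.

```r
# Growth (feet) for the first three intervals of two seed sources
growth <- matrix(c(4.51, 6.38, 17.83,
                   4.55, 6.37, 18.15),
    nrow = 3,
    dimnames = list(c("0-3", "3-5", "5-10"), c("301", "303")))

# Stacked: one bar per column, intervals stacked on top of each other
stacked_mids <- barplot(growth, legend.text = TRUE)

# Clustered: one bar per interval, grouped by column
clustered_mids <- barplot(growth, beside = TRUE, legend.text = TRUE)
```

barplot() returns the bar midpoints, which is handy if you want to add labels: a vector for stacked bars, and a matrix with one entry per bar for clustered bars.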

The full code is available in a gist.


Sorting Within Lattice Graphics in R

Default

By default, lattice sorts the observations by the axis values, starting at the bottom left.

For example,

library(lattice)
colors <- c("#1B9E77", "#D95F02", "#7570B3")
dotplot(rownames(mtcars) ~ mpg,
    data = mtcars,
    col = colors[1],
    pch = 1)

produces:

Default lattice dotplot

(Note: The rownames(mtcars) bit is just because of how this data set is stored. For your data, you might just type the variable name (model, for example) instead.)

Graphing one variable, sorting by another

If we want to show the same data, but we want to sort by another variable (or the same variable, for that matter), we can just add reorder(yvar, sortvar). For example, to sort by the number of cylinders, we could write:

dotplot(reorder(rownames(mtcars), cyl) ~ mpg,
    data = mtcars,
    col = colors[1],
    pch = 1)

Sorted by number of cylinders
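To see what reorder() is doing behind the scenes, it can help to look at the factor levels it produces, since lattice draws the categories in level order. This is a small base-R sketch. The names model, by_cyl and by_cyl_desc are my own.

```r
# Car models are stored as row names in mtcars
model <- rownames(mtcars)

# reorder() returns a factor whose levels are sorted by the second argument
by_cyl <- reorder(model, mtcars$cyl)

# Negating the sort variable reverses the order
by_cyl_desc <- reorder(model, -mtcars$cyl)

# Cylinder counts in level order: ascending, then descending
cyl_ascending <- mtcars[levels(by_cyl), "cyl"]
cyl_descending <- mtcars[levels(by_cyl_desc), "cyl"]

print(cyl_ascending)
print(cyl_descending)
```

The data itself is untouched; only the order of the factor levels changes, and that is what the plot follows.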

Graphing two variables

To better show how this works, let's graph cyl alongside mpg, so we can see how it is sorting:

dotplot(reorder(rownames(mtcars), cyl) ~ mpg + cyl,
    data = mtcars,
    col = colors,
    pch = c(1, 0))

Graph of mpg and cyl, sorted by cyl

Reverse order

We can also sort in reverse order, by adding a "-" before the variable name:

dotplot(reorder(rownames(mtcars), -cyl) ~ mpg + cyl,
    data = mtcars,
    col = colors,
    pch = c(1, 0))

Graph of mpg and cyl, sorted by cyl, reversed

Adding a legend

We can also add a legend:

dotplot(reorder(rownames(mtcars), -cyl) ~ mpg + cyl,
    data = mtcars,
    xlab = "",
    col = colors,
    pch = c(1, 0),
    key = list(points = list(col = colors[1:2], pch = c(1, 0)),
        text = list(c("Miles per gallon", "Number of cylinders")),
        space = "bottom"))

With legend

Other lattice types

The same technique will work with other lattice graphs, such as barchart, bwplot, and stripplot.
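For instance, here is a minimal sketch of the same reorder() trick applied to barchart. The bar color is just an arbitrary choice for the example.

```r
library(lattice)

# Horizontal bars of mpg, with car models sorted by number of cylinders
p <- barchart(reorder(rownames(mtcars), cyl) ~ mpg,
    data = mtcars,
    col = "#1B9E77")

print(p)
```

Only the function name changes; the formula with reorder() stays the same as in the dotplot examples.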

Full code available as a gist.
