Tagged with abline

Line Breaks Between Words in Axis Labels in ggplot in R

Sometimes when plotting factor variables in R, the graphics can look pretty messy thanks to long factor levels. If the level attributes have multiple words, there is an easy fix to this that often makes the axis labels look much cleaner.

Without Line Breaks

Here's the messy-looking example:

No line breaks in axis labels

And here's the code for the messy-looking example:

library(OIdata)
data(birds)
library(ggplot2)

ggplot(birds,
    aes(x = effect,
        y = speed)) +
geom_boxplot()

With Line Breaks

We can use regular expressions to add line breaks to the factor levels by substituting any spaces with line breaks:

library(OIdata)
data(birds)
library(ggplot2)

levels(birds$effect) <- gsub(" ", "\n", levels(birds$effect))
ggplot(birds,
    aes(x = effect,
        y = speed)) +
geom_boxplot()

Line breaks in axis labels

Just one line made the plot look much better. Because the change is made to the factor levels themselves, it carries over to anything else you build from that variable, such as other plots or a table.
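To see what that one line does without installing OIdata, here is a minimal sketch on a made-up factor. The level names below are hypothetical stand-ins, not the actual levels of birds$effect. It shows that the substitution changes the levels themselves, which is why a table built from the same variable picks up the new labels too.

```r
# Hypothetical factor standing in for birds$effect (made-up levels)
effect <- factor(c("No damage", "Caused damage", "No damage",
                   "Engine shut down"))

# Levels before the substitution
print(levels(effect))

# Replace every space in the level names with a line break
levels(effect) <- gsub(" ", "\n", levels(effect))

# Levels after the substitution now contain "\n"
print(levels(effect))

# Anything built from the factor, such as a table, uses the new labels
counts <- table(effect)
print(names(counts))
```

Only the labels change; the underlying observations and their counts stay the same.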

Horizontal Boxes

Here we can see the difference in a box plot with horizontal boxes. It's up to you to decide which style looks better:

No line breaks in axis labels

Line breaks in axis labels

library(OIdata)
data(birds)
library(ggplot2)

levels(birds$effect) <- gsub(" ", "\n", levels(birds$effect))
ggplot(birds,
    aes(x = effect,
        y = speed)) +
geom_boxplot() + 
coord_flip()

Just a note: if you're not using ggplot, the multi-line axis labels might overflow into the graph.

The code is available in a gist.

Citations and Further Reading

In a comment, Jason Bryer mentioned that you can also break the lines by using a set character width instead of breaking at every space. Here's the code he suggested:

sapply(strwrap(as.character(value), width=25, simplify=FALSE), paste, collapse="\n")
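As a rough illustration of how that suggestion behaves, here is a small sketch applied to two made-up labels. The contents of value are hypothetical; in practice it would hold the factor levels you want to wrap.

```r
# Hypothetical labels; in practice these would be the factor levels
value <- c("No damage reported",
           "Substantial damage to the aircraft engine")

# strwrap() splits each label into pieces at a target width of 25;
# simplify = FALSE keeps one character vector of pieces per label.
# paste(collapse = "\n") then joins each label's pieces with line breaks.
wrapped <- sapply(strwrap(as.character(value), width = 25, simplify = FALSE),
                  paste, collapse = "\n")

# Print the wrapped labels, separated by a divider
cat(wrapped, sep = "\n---\n")
```

Short labels are left on one line, while longer ones are broken only where they would exceed the width. The result could then be assigned back with levels(), just like the gsub() version.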


Adding Measures of Central Tendency to Histograms in R

Building on the basic histogram with a density plot, we can add measures of central tendency (in this case, mean and median) and a legend.

Like last time, we'll use the beaver data from the datasets package.

hist(beaver1$temp, # histogram
    col = "peachpuff", # fill color of the bars
    border = "black", 
    prob = TRUE, # show densities instead of frequencies
    xlim = c(36,38.5),
    ylim = c(0,3),
    xlab = "Temperature",
    main = "Beaver #1")
lines(density(beaver1$temp), # density plot
    lwd = 2, # thickness of line
    col = "chocolate3")

Next we'll add a line for the mean:

abline(v = mean(beaver1$temp),
    col = "royalblue",
    lwd = 2)

And a line for the median:

abline(v = median(beaver1$temp),
    col = "red",
    lwd = 2)

And then we can also add a legend, so it will be easy to tell which line is which.

legend(x = "topright", # location of legend within plot area
    c("Density plot", "Mean", "Median"),
    col = c("chocolate3", "royalblue", "red"),
    lwd = c(2, 2, 2))

All of this together gives us the following graphic:

Beaver #1 central tendency

In this example, the mean and median are very close, as we can see by using mean() and median().

> mean(beaver1$temp)
[1] 36.86219
> median(beaver1$temp)
[1] 36.87

As in the previous post, we can graph beaver1 and beaver2 together by adding a layout line and changing the limits of x and y. The full code for this is available in a gist.
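The gist itself is not reproduced here, so the following is only a sketch of what a side-by-side version might look like. It assumes the layout line is a call to layout() with a one-row, two-column matrix, and it wraps the plotting steps in a hypothetical helper, plot_temps(), that is not part of the original post. The axis limits are the ones used for beaver1 above and are assumed to be wide enough for both beavers.

```r
# Hypothetical helper (not from the original post): draws the histogram,
# density curve, mean and median lines, and legend for one set of
# temperatures, and invisibly returns the mean and median
plot_temps <- function(temp, title) {
    hist(temp,
        col = "peachpuff",
        border = "black",
        freq = FALSE, # show densities instead of frequencies
        xlim = c(36, 38.5), # assumed to cover both beavers
        ylim = c(0, 3),
        xlab = "Temperature",
        main = title)
    lines(density(temp), lwd = 2, col = "chocolate3")
    abline(v = mean(temp), col = "royalblue", lwd = 2)
    abline(v = median(temp), col = "red", lwd = 2)
    legend(x = "topright",
        legend = c("Density plot", "Mean", "Median"),
        col = c("chocolate3", "royalblue", "red"),
        lwd = c(2, 2, 2))
    invisible(c(mean = mean(temp), median = median(temp)))
}

layout(matrix(1:2, nrow = 1)) # assumed layout: two panels side by side
stats1 <- plot_temps(beaver1$temp, "Beaver #1")
stats2 <- plot_temps(beaver2$temp, "Beaver #2")
layout(1) # return to a single panel
```

Using the same x and y limits in both panels keeps the two histograms directly comparable.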

Here's the output from that code:

Beaver #1 and #2 central tendency
