(Update: The functions introduced here is deprecated now. Please use
gghighlight(), which is far nicer one)
Suppose we have a data that has too many series like this:
set.seed(2) d <- purrr::map_dfr( letters, ~ data.frame(idx = 1:400, value = cumsum(runif(400, -1, 1)), type = ., stringsAsFactors = FALSE))
For such data, it is almost impossible to identify a series by its colour as their differences are so subtle.
library(ggplot2) ggplot(d) + geom_line(aes(idx, value, colour = type))
Highlight lines with ggplot2 + dplyr
So, I am motivated to filter data and map colour only on that, using dplyr:
library(dplyr, warn.conflicts = FALSE) d_filtered <- d %>% group_by(type) %>% filter(max(value) > 20) %>% ungroup() ggplot() + # draw the original data series with grey geom_line(aes(idx, value, group = type), data = d, colour = alpha("grey", 0.7)) + # colourise only the filtered data geom_line(aes(idx, value, colour = type), data = d_filtered)
But, what if I want to change the threshold in predicate (
max(.data$value) > 20) and highlight other series as well? It’s a bit tiresome to type all the code above again every time I replace
20 with some other value.
Highlight lines with gghighlight
gghighlight_line() is the one for lines. The code equivalent to above (and more) can be this few lines:
library(gghighlight) gghighlight_line(d, aes(idx, value, colour = type), predicate = max(value) > 20)
gghighlight_*() returns a ggplot object, it is fully customizable just as we usually do with ggplot2 like custom themes and facetting.
library(ggplot2) gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20) + theme_minimal()
gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20) + facet_wrap(~ type)
predicate per group, more precisely,
dplyr::summarise(). So if the predicate expression returns multiple values per group, it ends up with an error like this:
gghighlight_line(d, aes(idx, value, colour = type), value > 20) #> Error in summarise_impl(.data, dots): Column `predicate..........` must be length 1 (a summary value), not 400
Highlight points with gghighlight
gghighlight_point() highlight points. While
predicate by grouped calculation (
dplyr::group_by()), by default, this function evaluates it by ungrouped calculation.
set.seed(19) d2 <- sample_n(d, 100L) gghighlight_point(d2, aes(idx, value), value > 10) #> Warning in gghighlight_point(d2, aes(idx, value), value > 10): Using type #> as label for now, but please provide the label_key explicity!
As the job is done without grouping, it’s better to provide
gghighlight_point() a proper key for label, though it tries to choose proper one automatically. Specifying
label_key = type will stop the warning above:
gghighlight_point(d2, aes(idx, value), value > 10, label_key = type)
You can control whether to do things with grouping by
use_group_by argument. If this set to
predicate by grouped calculation.
gghighlight_point(d2, aes(idx, value, colour = type), max(value) > 15, label_key = type, use_group_by = TRUE)
(Does “non-logical predicate” make sense…? Due to my poor English skill, I couldn’t come up with a good term other than this. Any suggestions are wellcome.)
By the way, to construct a predicate expression like bellow, we need to determine a threshold (in this example,
20). But it is difficult to choose a nice one before we draw plots. This is a chicken or the egg situation.
max(value) > 20
gghiglight_*() allows predicates that will be evaluated into non-logical values. The result value will be used to sort data, and the top
max_highlight data points/series will be highlighted. For example:
gghighlight_line(d, aes(idx, value, colour = type), predicate = max(value), max_highlight = 6)
Seems cool? gghighlight is good to explore data by changing a threshlold little by little. But, the internals are not so efficient, as it does almost the same calculation everytime you execute
gghighlight_*(), which may get slower when it works with larger data. Consider doing this by using vanilla dplyr to filter data.
gghighlight package is a tool to highlight charactaristic data series among too many ones. Please try!
Bug reports or feature requests are welcome! -> https://github.com/yutannihilation/gghighlight/issues