How to create color-coded calendars in R

Track your goals with color-coded calendars created with R packages ggplot2 and ggcal


A color-coded calendar can be a quick and easy way to see whether you’re achieving a daily goal. Did you meet a daily business metric like sales or social-media posts? Or, how are you doing with personal goals, like exercising every day? With one glance, you can get a feel for how you’ve been doing. It’s great for tracking those New Year’s resolutions—and a whole lot more.

R can help. For this example, I’ll create a calendar that tracks daily exercise—more specifically, whether you did cardio, did strength training, or rested each day.

You need to get your data before you can visualize it. For simple manual data entry, I usually use Microsoft Excel or Google Sheets. (As much of an R enthusiast as I am, R generally isn’t ideal for data entry.) 

One way to set up the spreadsheet is with two columns: one for Day and another for Activity. What I don’t want to do, however, is enter freeform text into a column where R expects specific categories. Even if I’ll remember that the exact format for strength training is “strength training” and not “weights,” there’s always the risk of typos. So, I suggest either creating a form for spreadsheet data entry or adding data validation to the category column (in this case, Activity).

Figure: Acceptable data-entry options in Excel (Sharon Machlis/IDG)

For a task like this, I prefer data validation to complicating things with a separate form. An easy way to set up the validation is to create a column of acceptable options in another tab (in this case, Cardio and Strength Training). Next, select the cells where you want to restrict data entry: in this case, the whole Activity column except for the header.

Then choose Data Validation in the Excel Data ribbon, select List as the type of data to allow, and enter the cells with the acceptable options in the Source field. Now you can enter the data you want to use in R.

Figure: Excel data validation (Sharon Machlis/IDG)
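Even with validation in the spreadsheet, a quick check on the R side can catch a stray value after import. The sketch below is only an illustration: the allowed vector and the find_bad_activities() helper are hypothetical names that are not part of the original workflow, the sample values are made up, and blank cells (NA) are assumed to mean rest days.

```r
# Hypothetical helper: returns any Activity values that are not in the
# allowed list. NA (a blank cell, assumed here to mean a rest day) is ignored.
allowed <- c("Cardio", "Strength Training")

find_bad_activities <- function(activity, allowed) {
  setdiff(unique(activity[!is.na(activity)]), allowed)
}

# Made-up sample values, including one entry that is not an allowed category
activity <- c("Cardio", "Strength Training", NA, "weights")
bad <- find_bad_activities(activity, allowed)
print(bad)
```

An empty result means every non-blank entry matches one of the allowed categories.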

To make an easy color-coded calendar, I’ll use the ggplot2 library and the ggcal package by Jay Jacobs on GitHub. I’ll also load dplyr, because I almost always end up using dplyr, whatever I’m doing; readxl to read the spreadsheet; and lubridate to work with dates.

Install the ggcal package if it’s not yet on your system with devtools::install_github("jayjacobs/ggcal") or remotes::install_github("jayjacobs/ggcal").

Here’s code to load needed packages and import data from a spreadsheet called tracker.xlsx into an R object called daily_exercise:

library(ggplot2)
library(ggcal)
library(dplyr)
library(readxl)
library(lubridate)
daily_exercise <- readxl::read_xlsx("tracker.xlsx", col_types = c("date", "text"))

If you want to follow along with the sample data I’m using but don’t want to set up a tracker.xlsx spreadsheet right now, there’s code to create that initial daily_exercise object at the end of this article. (You’ll need the tibble package installed.)

The readxl package imports dates as POSIXct objects, but the ggcal function wants them as Dates. You’ll need to change the column class with daily_exercise <- mutate(daily_exercise, Day = as.Date(Day)).
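To see what that conversion does, here is a minimal base R sketch on a single value. It assumes the date-time is stored in UTC, as in the sample data at the end of this article; the variable names day_posix and day_date are made up for illustration.

```r
# A date-time value in the same form as the article's sample data
day_posix <- as.POSIXct("2019-01-01", tz = "UTC")
print(class(day_posix))

# Convert to Date; tz is stated explicitly so the calendar day is taken
# in the same time zone the value was stored in
day_date <- as.Date(day_posix, tz = "UTC")
print(class(day_date))
print(day_date)
```

Passing tz explicitly is a cautious habit rather than a requirement here: for values stored in another time zone, the calendar day can differ depending on which zone is used for the conversion.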

The daily_exercise data frame only has a few days of the month. If you want an entire month’s calendar to print, you’ll need to fill in the rest of the month with additional code. Here’s one way to do that (explanation below the code):

last_day_in_file <- max(daily_exercise$Day)
end_this_month <- as.Date(cut(last_day_in_file, "month")) + months(1) - 1
alldates <- data.frame(Day = seq.Date(min(daily_exercise$Day), end_this_month, by = "1 day"))
daily_exercise <- left_join(alldates, daily_exercise, by = "Day")

Line 1 finds the latest date in the data frame. Line 2 calculates the last day of the month for that date, in a bit of a roundabout way. First, I calculate the first day of the month for that last date in the file (that would be January 1 for any date in January) and set it to be a Date class. I then add one month to the result; in this case, the value is February 1 for any date in January. I don’t want February 1, though; I want one day earlier than that. So I subtract 1 (which means one day), and then I’ve got the end of the month. (Why the detour? It’s a lot easier to find the beginning of a month, which is always the first, than the end of a month, which can be the 28th, 29th, 30th, or 31st.)
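The same first-of-month, plus one month, minus one day logic can be tried on its own. This is a sketch using base R only: the end_of_month() helper is a hypothetical name, and it swaps lubridate’s months(1) for a two-element seq() by month, so it is an alternative to the line above rather than a copy of it.

```r
# Hypothetical helper: last day of the month containing date x (base R only)
end_of_month <- function(x) {
  # First day of x's month, e.g. 2019-01-01 for any date in January 2019
  first_of_month <- as.Date(cut(x, "month"))
  # First day of the following month
  first_of_next <- seq(first_of_month, by = "month", length.out = 2)[2]
  # One day earlier is the end of x's month
  first_of_next - 1
}

print(end_of_month(as.Date("2019-01-08")))  # "2019-01-31"
print(end_of_month(as.Date("2020-02-10")))  # "2020-02-29" (leap year)
```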

Line 3 generates all dates starting with the earliest date in my data and ending with the end of the month that we just calculated. I can use base R’s seq.Date() function, creating a sequence incrementing by 1 day. I store that in a new data frame with one column.

Why did I create a data frame of 1 column instead of a vector? Because now I can use a dplyr left_join() to combine the two data frames. A left join keeps everything in the left, or first, data frame (in this case alldates) and merges it with a second data frame (daily_exercise) by a common column (Day).  Now, the data is ready for ggcal.
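To make the effect of that join concrete, here is a small sketch with made-up data. It uses base R’s merge() with all.x = TRUE as a stand-in for dplyr’s left_join(), so it runs without extra packages; the logged and filled names are hypothetical.

```r
# Two recorded days (made-up values in the article's format)
logged <- data.frame(
  Day = as.Date(c("2019-01-01", "2019-01-02")),
  Activity = c("Cardio", "Strength Training")
)

# Every day from Jan. 1 through Jan. 5, as a one-column data frame
alldates <- data.frame(
  Day = seq(as.Date("2019-01-01"), as.Date("2019-01-05"), by = "1 day")
)

# all.x = TRUE keeps every row of alldates, like a left join;
# days with no entry in logged get NA for Activity
filled <- merge(alldates, logged, by = "Day", all.x = TRUE)
print(filled)
```

The result has one row per calendar day, with NA in Activity for the three days that were never logged. Those NA values are what show up as the empty blocks on the calendar.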

The syntax for the ggcal function is ggcal(myDateVector, myDataVector)—in other words, dates as the first argument and values as the second argument. The values can be categories, like we’re using now, or numbers, if you want a calendar heatmap. Run 

ggcal(daily_exercise$Day, daily_exercise$Activity)

and you should see a color-coded calendar visualization with ggplot2 default colors.

Figure: A color-coded calendar created with the ggcal package using default ggplot2 colors (Sharon Machlis/IDG)

Customize colors

If you want to set your own color scheme, you can use the same functions you’d use for other ggplot2 visualizations. For example, below I used scale_fill_manual() and added a legend name, color values for each category, and a lighter grey color for NA values. That last theme() line adds back a title for the legend.

ggcal(daily_exercise$Day, daily_exercise$Activity) +
  scale_fill_manual(
    name = "Exercise",
    values = c(
      "Cardio" = "steelblue",
      "Strength Training" = "forestgreen"
    ),
    na.value = "grey88"
  ) +
  theme(legend.title = element_text())

Figure: A color-coded calendar with customized colors (Sharon Machlis/IDG)

Calendar heatmap

I set up another Excel worksheet that includes minutes in addition to categories for daily exercise, so I can demonstrate a calendar heatmap. Code to create that second daily_exercise object is at the end of the article.

I process that data for ggcal in the same way that I did for the first version: changing the Day column to Date objects and merging it with my alldates data frame to fill in blank values for the rest of the current month.

daily_exercise <- mutate(daily_exercise, Day = as.Date(Day))
daily_exercise <- left_join(alldates, daily_exercise, by = "Day")

Here’s what a heatmap of minutes looks like with ggcal defaults:

ggcal(daily_exercise$Day, daily_exercise$Minutes)

Figure: A calendar heatmap with ggcal and ggplot2 default colors (Sharon Machlis/IDG)

I’d rather have the darkest color for the highest number of minutes, though, not the lowest. And, I’d like a lighter gray for the empty blocks. Here’s code for that:

ggcal(daily_exercise$Day, daily_exercise$Minutes) +
  scale_fill_gradient(low = "#f7fbff", high = "#08519c", na.value = "grey75")

Figure: A ggcal heatmap with a color palette going from light for low numbers to dark for high numbers (Sharon Machlis/IDG)

Other ggplot2 customizations work as well, such as the scale_fill_distiller() function to use an RColorBrewer palette for continuous, numerical data. Below, I use a yellow-to-orange-to-red palette. 

ggcal(daily_exercise$Day, daily_exercise$Minutes) +
  scale_fill_distiller(palette = "YlOrRd", na.value = "grey75")

Figure: A calendar heatmap created with ggcal and an RColorBrewer palette (Sharon Machlis/IDG)
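A note on palette order: scale_fill_distiller() takes a direction argument, and ggplot2 documents its default as -1, which reverses the Brewer palette. If the light and dark ends look swapped on your calendar, setting direction = 1 may help. The sketch below is an illustration under assumptions: it uses a plain geom_tile() plot with made-up data as a stand-in for ggcal output, and the df and p names are hypothetical. The same scale call should be addable to a ggcal() plot, though that is not tested here.

```r
library(ggplot2)

# Made-up data: one tile per day, with NA for a day that has no entry
df <- data.frame(
  Day = as.Date("2019-01-01") + 0:4,
  Minutes = c(40, 35, NA, 60, 25)
)

# direction = 1 keeps the palette in its RColorBrewer order, so low values
# are pale yellow and high values are dark red; NA tiles are grey75
p <- ggplot(df, aes(x = Day, y = 1, fill = Minutes)) +
  geom_tile(color = "white") +
  scale_fill_distiller(palette = "YlOrRd", direction = 1, na.value = "grey75")
```

Call print(p) to draw the plot and compare it with the version that leaves direction at its default.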

Check out the video at the top of this story to see all of this in action! And for more R tips, head to the Do More With R video page.

Code to create the first daily_exercise object

# datapasta::df_paste() is not needed here; the call below builds the data directly
daily_exercise <- tibble::tibble(
  Day = as.POSIXct(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                     "2019-01-05", "2019-01-06", "2019-01-07", "2019-01-08"), tz = "UTC"),
  Activity = c("Cardio", "Strength Training", "Cardio", "Cardio", NA,
               "Strength Training", "Cardio", "Cardio")
)

Code to create the second daily_exercise object with minutes

daily_exercise <- tibble::tibble(
  Day = as.POSIXct(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                     "2019-01-05", "2019-01-06", "2019-01-07", "2019-01-08"), tz = "UTC"),
  Activity = c("Cardio", "Strength Training", "Cardio", "Cardio", NA,
               "Strength Training", "Cardio", "Cardio"),
  Minutes = c(40, 35, 30, 60, 0, 25, 45, 40)
)

Copyright © 2019 IDG Communications, Inc.