
Heat map visualization of sick day trends in Finland with R, ggplot2 and Google Correlate

Inspired by Margintale’s post “ggplot2 Time Series Heatmaps” and by Google Flu Trends, I decided to use a heat map to visualize sick days logged by Finnish users of HeiaHeia.com.

I exported the data from our database, filtered by country (Finnish users only), as a tab-separated file with the first line as the header. Its three columns contain the date, the count of sick days logged on that date, and the count of Finnish users in the service on that date.

date count(*) user_cnt
2011-01-01 123 12345
2011-01-02 456 67890
...

Below is the R source code for plotting the heat map. I made some small changes to the original code:

  • data normalization (line 9): this is specific to the data used in this example
  • days of the week have to be 1..7, not 0..6 as returned by $wday (line 19): dat$weekday = as.numeric(format(as.POSIXlt(dat$date),"%u"))
  • date format (line 31): the week-of-year calculation required converting the date to POSIX: dat$week <- as.numeric(format(as.POSIXlt(dat$date),"%W"))
  • custom header for the legend (line 39): adding + labs(fill="per user per day") lets you customize the legend header

require(zoo)
require(ggplot2)
require(plyr)

dat<-read.csv("~/data/sick_days_per_day.txt",header=TRUE,sep="\t")
colnames(dat) <- c("date", "count", "user_cnt")

# normalize data by number of users on each date
dat$norm_count <- dat$count / dat$user_cnt

# facet by year ~ month; each subgraph shows week-of-month versus weekday. The year is simple:
dat$year<-as.numeric(as.POSIXlt(dat$date)$year+1900)
dat$month<-as.numeric(as.POSIXlt(dat$date)$mon+1)

# turn months into ordered factors to control the appearance/ordering in the presentation
dat$monthf<-factor(dat$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)

# the day of week is again easily found
dat$weekday = as.numeric(format(as.POSIXlt(dat$date),"%u"))

# again turn into factors to control appearance/abbreviation and ordering
# I use the reverse function rev here to order the week top down in the graph
# you can cut it out to reverse week order
dat$weekdayf<-factor(dat$weekday,levels=rev(1:7),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")),ordered=TRUE)

# the monthweek part is a bit trickier - first a factor which cuts the data into month chunks
dat$yearmonth<-as.yearmon(dat$date)
dat$yearmonthf<-factor(dat$yearmonth)

# then find the "week of year" for each day
dat$week <- as.numeric(format(as.POSIXlt(dat$date),"%W"))

# and now for each monthblock we normalize the week to start at 1
dat<-ddply(dat,.(yearmonthf),transform,monthweek=1+week-min(week))

# Now for the plot (fill refers to the column by name; ggtitle() replaces opts(), which was removed from ggplot2)
P<- ggplot(dat, aes(monthweek, weekdayf, fill = norm_count)) +
 geom_tile(colour = "white") + facet_grid(year~monthf) + scale_fill_gradient(low="green", high="red") +
 ggtitle("Time-Series Calendar Heatmap - HeiaHeia.com sick days logged") + xlab("Week of Month") + ylab("") + labs(fill="per user per day")
P
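
The date-derived columns are the part of the script that is easiest to get wrong, so here is a minimal sketch of how they behave on a handful of dates. The dates are hypothetical and chosen only for illustration (2012-01-01 was a Sunday), and `ave()` stands in for the `ddply` call used in the script; the result should be the same week-of-month value.

```r
# Hypothetical dates, chosen only to illustrate the calendar fields
d  <- as.Date(c("2012-01-01", "2012-01-02", "2012-01-08", "2012-01-31", "2012-02-01"))
lt <- as.POSIXlt(d)

wday_raw <- lt$wday                        # 0..6, with Sunday = 0
wday_iso <- as.numeric(format(lt, "%u"))   # 1..7, with Monday = 1
week     <- as.numeric(format(lt, "%W"))   # week of year, weeks start on Monday

# week of month: shift the week numbers so that each month starts at 1
month     <- format(lt, "%Y-%m")
monthweek <- 1 + week - ave(week, month, FUN = min)

print(data.frame(date = d, wday_raw, wday_iso, week, monthweek))
```

Because `%W` counts the days before the first Monday of the year as week 0, the subtraction of the per-month minimum is what keeps the first column of every facet at 1.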

Here are the results. Green indicates the healthiest days, with the lowest number of sick days logged per user; red indicates the worst days, with the highest number of sick days logged per user. It’s quite clear that there are seasonal peaks around February, and that 2011 was a lot worse than 2012 (one should note that January and February of 2011 were exceptionally cold in Finland). This matches the coverage in the national press quite well: Flu season reaching peak (Feb 2012), Employers grapple with sick leaves brought by flu wave (Feb 2012).

It’s interesting that fewer sick days are logged on weekends than on work days, and that the traditional holiday month of July is the healthiest month of all.
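
The weekend and July observations can also be checked numerically rather than by eye. The following is only a sketch: the data frame is randomly generated toy data in the same shape as `dat` (the counts and the constant `user_cnt` are made up), so the printed averages say nothing about the real HeiaHeia numbers.

```r
# Hypothetical toy data in the same shape as the post's `dat`; all values are made up
set.seed(1)
dat <- data.frame(date = seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day"))
dat$user_cnt   <- 10000
dat$count      <- rpois(nrow(dat), lambda = 50)
dat$norm_count <- dat$count / dat$user_cnt

dat$weekday    <- as.numeric(format(dat$date, "%u"))  # 1 = Monday .. 7 = Sunday
dat$is_weekend <- dat$weekday >= 6
dat$month      <- as.numeric(format(dat$date, "%m"))

# mean sick days per user per day, work days versus weekends and by month
weekend_means <- tapply(dat$norm_count, dat$is_weekend, mean)
month_means   <- tapply(dat$norm_count, dat$month, mean)

print(weekend_means)
print(month_means)
```

With the real data, the `TRUE` entry of `weekend_means` should come out lower than the `FALSE` entry, and the entry for month 7 should be the smallest of `month_means`, if the heat map is read correctly.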


[Figure: calendar heat map of HeiaHeia.com sick days logged per user per day, faceted by year and month]

To get a more formal validation of the data logged by HeiaHeia users, I used the Google Correlate lab tool to check that the heat map results make sense. I uploaded a weekly time series of sick days per user and plotted its correlation with Google search queries for “kuumeen hoito” (“treatment of fever” in Finnish).
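
The post does not show how the daily data were turned into a weekly series, so the following is only one possible sketch. It assumes weeks starting on Monday and uses made-up daily values; the column names `week_start` and `per_user` are hypothetical, and the exact upload format the tool expected is not covered here.

```r
# Hypothetical daily data (four full weeks starting on a Monday); values are made up
set.seed(2)
daily <- data.frame(date = seq(as.Date("2012-01-02"), by = "day", length.out = 28))
daily$count    <- rpois(28, lambda = 50)
daily$user_cnt <- 10000

# label each day with the Monday that starts its week
daily$week_start <- as.Date(as.character(cut(daily$date, breaks = "week")))

# sum sick days and user-days per week, then normalize
weekly <- aggregate(cbind(count, user_cnt) ~ week_start, data = daily, FUN = sum)
weekly$per_user <- weekly$count / weekly$user_cnt

print(weekly)
```

Dividing the weekly sum of sick days by the weekly sum of user counts gives an average per user per day for that week, which keeps the weekly series on the same scale as the daily heat map.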


[Figure: Google Correlate plot of weekly HeiaHeia sick days per user against search activity for “kuumeen hoito”]

The Pearson correlation coefficient r between the HeiaHeia sick-days time series and Google search activity (both expressed in units of σ, that is, normalized so that the mean is 0 and the standard deviation is 1) is 0.8257, which is a pretty good match.
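
For readers who want to reproduce this kind of check locally, here is a minimal sketch of the calculation in R. The two vectors are made-up numbers, not the HeiaHeia or Google series, and `z` is a hypothetical helper name for the normalization described above.

```r
# Hypothetical weekly series, made up for illustration only
sick   <- c(0.8, 1.1, 1.9, 2.4, 1.6, 0.9, 0.7, 0.6)
search <- c(10, 14, 22, 25, 20, 12, 11, 9)

# normalize to mean 0 and standard deviation 1
z <- function(x) (x - mean(x)) / sd(x)

# Pearson r as the averaged product of the normalized series (n - 1 in the denominator)
r_manual  <- sum(z(sick) * z(search)) / (length(sick) - 1)
r_builtin <- cor(sick, search, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

The two values agree, and since Pearson's r is unchanged by shifting and rescaling either series, the normalization matters for plotting the two curves on one axis rather than for the coefficient itself.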



