
Heat map visualization of sick day trends in Finland with R, ggplot2 and Google Correlate

Inspired by Margintale’s post “ggplot2 Time Series Heatmaps” and by Google Flu Trends, I decided to use a heat map to visualize sick days logged by Finnish users of HeiaHeia.com.

I got the data from our database, filtering results by country (Finnish users only), in tab-separated format with the first line as the header. The three columns contain the date, the count of sick days logged on that date, and the count of Finnish users in the service on that date.

date count(*) user_cnt
2011-01-01 123 12345
2011-01-02 456 67890
...
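
If you want to try the script without the original export, here is a minimal sketch that writes a file with the same three-column, tab-separated layout. Every value in it is made up: the user growth, the seasonal shape and the Poisson noise are assumptions for illustration only, not HeiaHeia data, and the file goes to a temporary directory rather than ~/data.

```r
# Synthetic stand-in for sick_days_per_day.txt: all values are made up.
set.seed(42)
dates <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")

# Hypothetical user base growing linearly over the two years
user_cnt <- round(seq(10000, 20000, length.out = length(dates)))

# Hypothetical seasonal shape: more sick days in winter, fewer in summer
doy <- as.numeric(format(dates, "%j"))
rate <- 0.01 + 0.008 * cos(2 * pi * (doy - 45) / 365)
sick <- rpois(length(dates), lambda = rate * user_cnt)

sample_dat <- data.frame(date = format(dates), count = sick, user_cnt = user_cnt)
names(sample_dat)[2] <- "count(*)"   # same header as the database export

out_file <- file.path(tempdir(), "sick_days_per_day.txt")
write.table(sample_dat, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back the same way the plotting script does
check <- read.csv(out_file, header = TRUE, sep = "\t")
head(check)
```

Point read.csv in the plotting script at out_file to run it against this synthetic data.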

Below is R source code for plotting the heat map. I made some small changes to the original code:

  • data normalization (line 9): this is specific to the data used in this example
  • days of the week have to be 1..7, not 0..6 as returned by $wday (line 19): dat$weekday = as.numeric(format(as.POSIXlt(dat$date),"%u"))
  • date format (line 31): the week-of-year calculation required converting the date to POSIX: dat$week <- as.numeric(format(as.POSIXlt(dat$date),"%W"))
  • custom header for the legend (line 39): adding + labs(fill="per user per day") lets you customize the legend header

require(zoo)
require(ggplot2)
require(plyr)

dat<-read.csv("~/data/sick_days_per_day.txt",header=TRUE,sep="\t")
colnames(dat) <- c("date", "count", "user_cnt")

# normalize data by number of users on each date
dat$norm_count <- dat$count / dat$user_cnt

# facet by year ~ month; each subgraph shows week-of-month versus weekday. The year is simple:
dat$year<-as.numeric(as.POSIXlt(dat$date)$year+1900)
dat$month<-as.numeric(as.POSIXlt(dat$date)$mon+1)

# turn months into ordered factors to control the appearance/ordering in the presentation
dat$monthf<-factor(dat$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)

# the day of week is again easily found
dat$weekday = as.numeric(format(as.POSIXlt(dat$date),"%u"))

# again turn into factors to control appearance/abbreviation and ordering
# I use the reverse function rev here to order the week top down in the graph
# you can cut it out to reverse week order
dat$weekdayf<-factor(dat$weekday,levels=rev(1:7),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")),ordered=TRUE)

# the monthweek part is a bit trickier - first a factor which cuts the data into month chunks
dat$yearmonth<-as.yearmon(dat$date)
dat$yearmonthf<-factor(dat$yearmonth)

# then find the "week of year" for each day
dat$week <- as.numeric(format(as.POSIXlt(dat$date),"%W"))

# and now for each monthblock we normalize the week to start at 1
dat<-ddply(dat,.(yearmonthf),transform,monthweek=1+week-min(week))

# Now for the plot
P<- ggplot(dat, aes(monthweek, weekdayf, fill = norm_count)) +
 geom_tile(colour = "white") + facet_grid(year~monthf) + scale_fill_gradient(low="green", high="red") +
 ggtitle("Time-Series Calendar Heatmap - HeiaHeia.com sick days logged") + xlab("Week of Month") + ylab("") + labs(fill="per user per day")
P
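
To see what the weekday, week and monthweek columns actually hold, here is a small base-R sketch for a handful of dates around a month boundary. It uses ave() where the script uses ddply, so it needs no packages; that is a substitution for illustration, not what the script does. The demo frame and the wday0 column are names made up for this example.

```r
# A Saturday, a Sunday, two Mondays and a Tuesday around the Jan/Feb boundary
d <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-31", "2011-02-01"))

demo <- data.frame(
  date    = d,
  wday0   = as.POSIXlt(d)$wday,            # 0..6, Sunday = 0
  weekday = as.numeric(format(d, "%u")),   # 1..7, Monday = 1
  week    = as.numeric(format(d, "%W"))    # week of year, weeks start on Monday
)

# Restart the week counter at 1 inside every month
demo$monthweek <- 1 + demo$week -
  ave(demo$week, format(demo$date, "%Y-%m"), FUN = min)

print(demo)
```

Sunday 2011-01-02 comes out as 0 in wday0 but 7 in weekday, which is why the script uses %u. The last two rows fall in the same week of the year but get monthweek 6 and 1, because the counter restarts with each month.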

Here are the results. Green marks the healthiest days, with the fewest sick days logged per user; red marks the worst days, with the most. It’s quite clear that there are seasonal peaks around February, and that 2011 was a lot worse than 2012 (note that January and February of 2011 were exceptionally cold in Finland). This matches the coverage in the national press quite well: Flu season reaching peak (Feb’2012), Employers grapple with sick leaves brought by flu wave (Feb’2012).

It’s interesting that fewer sick days are logged on weekends than on work days, and that the traditional holiday month of July is the healthiest month of all.
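
The weekend observation can also be checked numerically rather than by eye. The sketch below averages the per-user value by day type with aggregate(). The toy frame and its values are made up; on real data you would run the same aggregation on the dat frame built by the plotting script.

```r
# Made-up values for two full weeks, starting on Monday 2012-07-02
toy <- data.frame(
  date = as.Date("2012-07-02") + 0:13,
  norm_count = c(0.010, 0.011, 0.012, 0.011, 0.010, 0.004, 0.003,
                 0.009, 0.010, 0.011, 0.010, 0.009, 0.003, 0.004)
)

# Same weekday encoding as the plotting script: 1 = Monday ... 7 = Sunday
toy$weekday  <- as.numeric(format(toy$date, "%u"))
toy$day_type <- ifelse(toy$weekday >= 6, "weekend", "workday")

# Mean sick days per user per day, split by day type
by_type <- aggregate(norm_count ~ day_type, data = toy, FUN = mean)
by_type
```

Swapping day_type for a month column gives the per-month comparison in the same way.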



To get a more formal validation of the data logged by HeiaHeia users, I used the Google Correlate lab tool to check that the heat map results make sense. I uploaded the weekly time series of sick days per user and plotted its correlation with Google search queries for “kuumeen hoito” (Finnish for “treatment of fever”).
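
The post does not show how the daily values were rolled up into a weekly series, so the following is only one plausible way to do it in base R: label each date with the Monday that starts its week and average the per-user values within each week. The daily frame and its numbers are made up, and the file format the upload expects is not covered here.

```r
# Made-up daily values standing in for dat$date, dat$count and dat$user_cnt
daily <- data.frame(
  date = as.Date("2012-01-02") + 0:20,   # three full weeks, starting on a Monday
  count = rep(c(120, 130, 125, 110, 100, 40, 35), times = 3),
  user_cnt = rep(c(10000, 10100, 10200), each = 7)
)
daily$norm_count <- daily$count / daily$user_cnt

# cut() labels each date with the Monday that starts its week
daily$week_start <- as.character(cut(daily$date, breaks = "week"))

# One row per week: mean of the daily per-user values
weekly <- aggregate(norm_count ~ week_start, data = daily, FUN = mean)
weekly
```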



The Pearson correlation coefficient r between the HeiaHeia sick days time series and Google search activity (both normalized so that the mean is 0 and the standard deviation σ is 1) is 0.8257, which is a pretty good match.
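
For readers who want to reproduce this kind of check locally, here is a minimal sketch of the calculation in base R. The two series are made up and will not give 0.8257; the point is only to show the normalization to mean 0 and standard deviation 1, and that Pearson's r is the same before and after it.

```r
# Two made-up weekly series; the real inputs would be the weekly sick-day
# rate and the search activity for the chosen query
sick_rate <- c(0.004, 0.006, 0.011, 0.013, 0.009, 0.005, 0.003, 0.004)
searches  <- c(20, 35, 60, 58, 50, 22, 15, 25)

# Normalize a series to mean 0 and standard deviation 1
z <- function(x) (x - mean(x)) / sd(x)
zs <- z(sick_rate)
zq <- z(searches)

r_raw <- cor(sick_rate, searches)          # Pearson is the default method
r_z   <- cor(zs, zq)                       # unchanged by the normalization
r_sum <- sum(zs * zq) / (length(zs) - 1)   # same value from the definition

c(r_raw, r_z, r_sum)
```

All three numbers agree, so the normalization matters for plotting the two series on a common scale, not for the value of r.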


19 Comments

  10. John Nicholas wrote:

    could you post the file “sick_days_per_day.txt” or something like it?

    Friday, August 30, 2013 at 7:52 | Permalink

2 Trackbacks/Pingbacks

  1. […] example, or even mental health issues. One Finnish data junkie, for example, has used the data to compare it to sick days taken. It correlates. . . […]

