According to Evan Weaver from Twitter, a typical production Rails app on Ruby 1.8 can recover 20% to 40% of user CPU simply by adjusting the Ruby garbage collector settings. In August I set out to verify that statement on the HeiaHeia servers. The results exceeded my expectations: the time to execute the application tests locally decreased by 46%, and CPU utilisation on the production servers decreased by almost 40%.
But let’s start from the beginning. I should say right away that we at HeiaHeia use Ruby Enterprise Edition (REE), so I didn’t have to apply the patches to the Ruby source code that Evan talks about in his post. Before analysing current GC usage, it is useful to read the brilliant overview of Ruby garbage collection by Joe Damato. It will help you understand what happens next. It is also useful to read the REE documentation on garbage collector performance tuning.
As with any optimisation, it is good to get reference metrics first, so that you know you actually improved something by changing the settings and didn’t make it worse. In this case I measured:
- number of garbage collector calls when loading HeiaHeia-feed
- local tests execution time (unit + functional)
- application server CPU load and average response time
To measure the number of garbage collector calls per page I used Scrap, a nice tool by Chris Heald. Chris also describes the tuning process in great detail, so I’m not going to repeat it here; just go and read his blog.
To measure local test execution time I just ran rake test 5 times and took the average of all runs.
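The averaging step above can be sketched as a small shell helper. This is only an illustration: `average_runtime` is a hypothetical name, and it measures whole seconds of wall-clock time, which is coarse but adequate for a test suite that runs for minutes.

```shell
#!/bin/bash
# Sketch: run a command several times and print the mean wall-clock time
# in whole seconds. average_runtime is a hypothetical helper name.
average_runtime() {
  local runs=$1; shift
  local total=0 start end i
  for ((i = 0; i < runs; i++)); do
    start=$(date +%s)
    # Ignore the command's exit status; only the elapsed time is of interest.
    "$@" > /dev/null 2>&1 || true
    end=$(date +%s)
    total=$((total + end - start))
  done
  echo $((total / runs))
}

# Usage (assumes the current directory is the Rails project root):
# average_runtime 5 rake test
```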
To measure application server CPU load and average response time I used NewRelic (the free version should be enough for this measurement).
When I first loaded the feed page with Scrap enabled I saw 36 GC cycles, and Ruby spent 1.12s in them (these figures are from the development server, so the response time is high). After playing a bit with the settings and using Scrap to monitor the number of GC cycles and the unused heap after each allocation, I ended up with the same settings that Twitter uses in production:
export RUBY_HEAP_MIN_SLOTS=500000
export RUBY_HEAP_SLOTS_INCREMENT=250000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=50000000
These settings reduced the number of GC cycles from 36 to 7, and Ruby now spent only 0.62s in GC instead of 1.12s when loading the feed page (again, load times are higher on our development server).
After introducing the same settings on my local machine, the project tests took only 148s, down from 274s before optimisation: a whopping 46% improvement.
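To try these values on a single command without changing the rest of the shell session, something like the following sketch can be used. `run_with_gc_settings` is a hypothetical helper name; the variable values are the ones listed above.

```shell
#!/bin/bash
# Sketch: run one command with the GC settings applied only to that command.
# run_with_gc_settings is a hypothetical helper, not part of REE or Scrap.
run_with_gc_settings() {
  env \
    RUBY_HEAP_MIN_SLOTS=500000 \
    RUBY_HEAP_SLOTS_INCREMENT=250000 \
    RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 \
    RUBY_GC_MALLOC_LIMIT=50000000 \
    "$@"
}

# Usage (assumes the current directory is the Rails project root):
# run_with_gc_settings rake test
```

Because the variables are passed through `env`, they do not leak into the calling shell, which makes before/after comparisons easier.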
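The percentage quoted for the test run can be checked with a quick calculation. The helper name `percent_decrease` is made up for this sketch; the inputs are the two timings given in the text.

```shell
#!/bin/bash
# Sketch: relative decrease between a "before" and an "after" measurement,
# rounded to a whole percent. percent_decrease is a hypothetical helper name.
percent_decrease() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

percent_decrease 274 148   # test suite time in seconds: prints 46
```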
We have multiple identical application servers, so I introduced the new settings only on one of the application servers, to compare the results on a live system (during low traffic hours). Here’s the picture from NewRelic:
Server   Instances    Apdex  Resp. time  Throughput  CPU   Memory
hh-app1  3 Instances  0.86   407 ms      30 rpm      14 %  349 MB
hh-app2  3 Instances  0.91   311 ms      30 rpm      8 %   413 MB
hh-app2 had the optimised garbage collector settings. With the same throughput, its CPU load was only 8% versus 14% with the non-optimised GC, and its average response time was 311 ms versus 407 ms. The improvement came at the cost of increased memory consumption (413 MB vs 349 MB). However, the better response time and lower CPU load proved to be much more important than memory consumption, so I rolled out the new settings on all production servers.
Making nginx use the GC settings when spawning Passenger instances is easy, and is well described by Chris. Here are instructions that work nicely with the server and Nginx setup on Ubuntu that I described in earlier posts.
Create the file /usr/local/bin/ruby-with-env (as root) that sets the GC settings in the environment and then launches Ruby:
#!/bin/bash
export RUBY_HEAP_MIN_SLOTS=1500000
export RUBY_HEAP_SLOTS_INCREMENT=500000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=50000000
exec "/usr/local/bin/ruby" "$@"
Then make the wrapper executable:
sudo chmod a+x /usr/local/bin/ruby-with-env
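The wrapper works because variables exported before `exec` are inherited by the program that replaces the shell. The sketch below shows that mechanism with a throwaway wrapper around `printenv` rather than the real Ruby binary, so it runs on a machine without Ruby; `demo_wrapper_env` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: variables exported in a wrapper script reach the exec'ed program.
# demo_wrapper_env is a hypothetical helper; the temporary wrapper execs
# printenv instead of /usr/local/bin/ruby.
demo_wrapper_env() {
  local wrapper
  wrapper=$(mktemp)
  cat > "$wrapper" <<'EOF'
#!/bin/bash
export RUBY_GC_MALLOC_LIMIT=50000000
exec printenv "$@"
EOF
  # Run via bash so the temporary file does not need the executable bit.
  bash "$wrapper" RUBY_GC_MALLOC_LIMIT
  rm -f "$wrapper"
}

demo_wrapper_env   # prints 50000000
```

On the server itself, a similar check against the real wrapper would be `/usr/local/bin/ruby-with-env -e 'puts ENV["RUBY_HEAP_MIN_SLOTS"]'`, which should print 1500000 if the file was created as shown above.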