Increasing Ruby interpreter performance by adjusting garbage collector settings

According to Evan Weaver from Twitter, a typical production Rails app on Ruby 1.8 can recover 20% to 40% of user CPU simply by adjusting the Ruby garbage collector settings. In August I set out to verify that statement on the HeiaHeia servers. The results exceeded my expectations: the time to execute the application tests locally decreased by 46%, and on the production servers CPU utilisation decreased by almost 40%.

But let’s start from the beginning. I should say right away that we at HeiaHeia use Ruby Enterprise Edition (REE), so I didn’t have to apply the patches to the Ruby source code that Evan talks about in his post. Before analysing current GC usage, it is useful to read the brilliant overview of Ruby garbage collection by Joe Damato. It will help you understand what happens next. It is also useful to read the REE documentation on garbage collector performance tuning.

As with any optimisation, it is good to collect reference metrics first, so that you know you actually improved something by changing the settings and didn’t make it worse. In this case I measured:

  • number of garbage collector calls when loading HeiaHeia-feed
  • local tests execution time (unit + functional)
  • application server CPU load and average response time

To measure the number of garbage collector calls per page I used Scrap, a nice tool by Chris Heald. Chris also describes the tuning process in great detail, so I’m not going to repeat it here; just go and read his blog.

To measure local test execution time I ran rake test 5 times and took the average of all runs.
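The averaging step can be sketched in a few lines of Ruby. This is only an illustration, not the script used for the measurements in this post: the helper name average_runtime is made up, and it relies on Benchmark from the standard library and Array#sum, which need a newer Ruby than the 1.8 interpreter discussed here.

```ruby
require "benchmark"

# Hypothetical helper: run the given block `runs` times and return the
# mean wall-clock time in seconds.
def average_runtime(runs)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.sum / runs
end

# Example usage (commented out because it would start the test suite):
#   avg = average_runtime(5) { system("rake", "test") }
#   puts format("average over 5 runs: %.1fs", avg)
```

Run it once before and once after changing the GC settings, on an otherwise idle machine, so the two averages are comparable.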

To measure application server CPU load and average response time I used NewRelic (the free version should be enough for this measurement).

When I first loaded the feed page with Scrap enabled I saw 36 GC cycles, and Ruby spent 1.12s in them (these figures are from the development server, so the response time is long). After experimenting with the settings and using Scrap to monitor the number of GC cycles and the unused heap after each allocation, I ended up with the same settings that Twitter uses in production:

export RUBY_HEAP_MIN_SLOTS=500000
export RUBY_HEAP_SLOTS_INCREMENT=250000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=50000000

These settings reduced the number of GC cycles from 36 to 7, and Ruby now spent only 0.62s in GC instead of 1.12s when loading the feed page (again, load times are longer on our development server).
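If Scrap is not available, a rough version of the same two numbers (GC cycles and time spent in GC) can be collected by hand. The sketch below is an assumption-laden illustration rather than what Scrap does internally: it uses GC.count and GC::Profiler, which exist in Ruby 1.9 and later but not in the Ruby 1.8 / REE interpreter this post is about, and the helper name gc_stats_for is made up.

```ruby
# Hypothetical helper: report how many GC runs happened while the block
# executed and how many seconds the profiler attributes to GC.
def gc_stats_for
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count
  yield
  { runs: GC.count - runs_before, gc_seconds: GC::Profiler.total_time }
ensure
  GC::Profiler.disable
end

# Example: allocate a lot of short-lived strings, then force one GC run.
stats = gc_stats_for do
  200_000.times.map { |i| "item #{i}" }
  GC.start
end
puts "GC runs: #{stats[:runs]}, time in GC: #{stats[:gc_seconds]}s"
```

Wrapping a single request or test case in such a block before and after changing the settings gives a comparison similar to the 36 vs 7 cycles above.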

After introducing the same settings on my local machine, the project tests took only 148s, down from 274s before optimisation: a whopping 46% improvement.

We have multiple identical application servers, so I introduced the new settings only on one of the application servers, to compare the results on a live system (during low traffic hours). Here’s the picture from NewRelic:

Server     Instances      Apdex    Resp. time    Throughput    CPU     Memory
hh-app1    3 Instances    0.86     407 ms        30 rpm        14 %    349 MB
hh-app2    3 Instances    0.91     311 ms        30 rpm        8 %     413 MB

hh-app2 had the optimised garbage collector settings. With the same throughput, CPU load was only 8% vs 14% with the non-optimised GC, and response time dropped from 407 ms to 311 ms. That improvement came at the cost of increased memory consumption (413 MB vs 349 MB). Faster responses and lower CPU load proved to be a lot more important than memory consumption, so I rolled out the new settings on all production servers.
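For reference, the relative changes behind these figures can be worked out with a one-line helper. This is just the arithmetic applied to the numbers quoted in this post; the function name percent_reduction is made up, and the NewRelic values are a single snapshot rather than a long-term average.

```ruby
# Percentage by which `after` is lower than `before`, rounded to one
# decimal place. A negative result means the value went up.
def percent_reduction(before, after)
  ((before - after) * 100.0 / before).round(1)
end

puts percent_reduction(274, 148)  # local test time in seconds  -> 46.0
puts percent_reduction(14, 8)     # CPU load in the snapshot    -> 42.9
puts percent_reduction(407, 311)  # response time in ms         -> 23.6
puts percent_reduction(349, 413)  # memory in MB (an increase)  -> -18.3
```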

Making Nginx use the GC settings when spawning Passenger instances is easy, and is well described by Chris. Here are instructions that work nicely with the server and Nginx setup on Ubuntu that I described in earlier posts.

Create the file /usr/local/bin/ruby-with-env (as root). It sets the GC settings in the environment and then launches Ruby:

#!/bin/bash
export RUBY_HEAP_MIN_SLOTS=1500000
export RUBY_HEAP_SLOTS_INCREMENT=500000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=50000000
exec "/usr/local/bin/ruby" "$@"

Make this file executable by all:

sudo chmod a+x /usr/local/bin/ruby-with-env

Now tell Passenger to use that file instead of launching Ruby directly. Edit /opt/nginx/conf/nginx.conf and replace

passenger_ruby /usr/local/bin/ruby;

with

passenger_ruby /usr/local/bin/ruby-with-env;

Now restart Nginx, and you’ve got yourself a faster Ruby!
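To check that the wrapper really passes the variables on, a small script can print what the Ruby process sees in its environment. This is a minimal sketch: the file name check_gc_env.rb and the helper gc_env_report are made up for illustration, and it only confirms that the variables reach the process, not that the interpreter honours them.

```ruby
# The four GC variables set by the wrapper script above.
GC_VARS = %w[
  RUBY_HEAP_MIN_SLOTS
  RUBY_HEAP_SLOTS_INCREMENT
  RUBY_HEAP_SLOTS_GROWTH_FACTOR
  RUBY_GC_MALLOC_LIMIT
].freeze

# Build one "NAME=value" line per variable; missing ones are flagged.
def gc_env_report(env = ENV)
  GC_VARS.map { |name| "#{name}=#{env.fetch(name, '(not set)')}" }
end

puts gc_env_report
```

Saved as check_gc_env.rb, it can be run through the wrapper with /usr/local/bin/ruby-with-env check_gc_env.rb and, for comparison, directly with /usr/local/bin/ruby check_gc_env.rb.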

Comments

One response to “Increasing Ruby interpreter performance by adjusting garbage collector settings”

  1. Ruby GC tuning run my specs 1.5-2times faster. see Basecamp, Twitter and default. Use ruby enterprise to test https://gist.github.com/865706
