REE segfaults when Rails application has too many localisation files

We ran into an interesting problem: at some point our Rails application started failing occasionally because REE segfaulted on startup. Even starting the console with ‘script/console production’ occasionally crashed with a segfault. As the application grew and new features were added, the segfaults became more and more frequent. The crashes did not occur in any single place, so it was not clear how to tackle the problem.

Examples of crashes we observed:

/vendor/rails/actionpack/lib/action_controller/routing/route.rb:205):2:
   [BUG] Segmentation fault
/opt/ruby-enterprise-1.8.7-2011.03/lib/ruby/1.8/yaml.rb:133: 
   [BUG] Segmentation fault
/vendor/rails/activesupport/lib/active_support/vendor/i18n-0.3.7/i18n/
   backend/base.rb:257: [BUG] Segmentation fault
/vendor/rails/actionpack/lib/action_view/template.rb:226: [BUG] Segmentation fault
/opt/ruby-enterprise-1.8.7-2011.03/lib/ruby/gems/1.8/gems/pauldix-sax-machine-0.0.14/
   lib/sax-machine/sax_document.rb:30: [BUG] Segmentation fault
/vendor/rails/activesupport/lib/active_support/memoizable.rb:32: [BUG] Segmentation fault

After banging my head against the wall for a week I found a solution (two, in fact) and a likely cause of the segfaults. Two “suspects”, a lack of available memory and an incorrect version of libxml, were ruled out. The actual cause appears to be the total size of the localisation files in config/locales that are loaded on startup:

$ du -shb config/locales
1665858    config/locales
$ cd config/locales
$ find . -type f | wc -l
805

So ~1.6 MB in 805 files gives occasional segfaults. Adding another 200 KB of localisation files made script/console segfault on every startup.
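
The same measurement can be taken from Ruby, which is handy for tracking the number as the application grows. This is a minimal sketch, not part of the original investigation: the helper name locale_footprint is made up for illustration, and because it sums file sizes only, its total can differ slightly from `du -shb`, which also counts the directory entries themselves.

```ruby
require 'find'

# Hypothetical helper: walk a directory tree, sum the sizes of all
# regular files and count them. Returns [total_bytes, file_count].
def locale_footprint(dir)
  total_bytes = 0
  file_count = 0
  Find.find(dir) do |path|
    next unless File.file?(path) # skip directories and other non-files
    total_bytes += File.size(path)
    file_count += 1
  end
  [total_bytes, file_count]
end

# Example use from the Rails root:
#   bytes, files = locale_footprint('config/locales')
#   puts "#{files} files, #{bytes} bytes"
```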

Now I’ve found two workarounds for this problem.

1. Recompile REE with the --no-tcmalloc flag

./ruby-enterprise-1.8.7-2011.03/installer --no-tcmalloc

Note that on 64-bit platforms tcmalloc is disabled by default.
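
After rebuilding, it may be worth confirming that the interpreter really runs without tcmalloc. The sketch below is one possible check rather than anything REE provides: it assumes a Linux host (it reads /proc/self/maps) and assumes tcmalloc is loaded as a shared library whose file name contains "tcmalloc". A statically linked allocator would not show up this way. The method name tcmalloc_mapped? is hypothetical.

```ruby
# Hypothetical check: does any line of a /proc/<pid>/maps listing
# mention a library with "tcmalloc" in its name?
def tcmalloc_mapped?(maps_text)
  maps_text.each_line.any? { |line| line.include?('tcmalloc') }
end

# Inspect the current process (Linux only; silently skipped elsewhere).
maps_path = '/proc/self/maps'
if File.readable?(maps_path)
  if tcmalloc_mapped?(File.read(maps_path))
    puts 'tcmalloc is mapped into this process'
  else
    puts 'no tcmalloc mapping found'
  end
end
```

Running this with the rebuilt ruby binary should report no mapping if the assumptions above hold.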

2. Enable large pages feature in tcmalloc

This is described in tcmalloc documentation as: “Internally, tcmalloc divides its memory into “pages.”  The default page size is chosen to minimize memory use by reducing fragmentation. The cost is that keeping track of these pages can cost tcmalloc time. We’ve added a new, experimental flag to tcmalloc that enables a larger page size.  In general, this will increase the memory needs of applications using tcmalloc.  However, in many cases it will speed up the applications as well, particularly if they allocate and free a lot of memory.  We’ve seen average speedups of 3-5% on Google applications.”

There’s a warning that “this feature is still very experimental”, but it does solve the problem with too many localisation files.

To compile REE with tcmalloc large pages enabled, I edited ruby-enterprise-1.8.7-2011.03/source/distro/google-perftools-1.7/src/common.h and replaced

#if defined(TCMALLOC_LARGE_PAGES)
static const size_t kPageShift  = 15;
static const size_t kNumClasses = 95;
static const size_t kMaxThreadCacheSize = 4 << 20;
#else
static const size_t kPageShift  = 12;
static const size_t kNumClasses = 61;
static const size_t kMaxThreadCacheSize = 2 << 20;
#endif

with

static const size_t kPageShift  = 15;
static const size_t kNumClasses = 95;
static const size_t kMaxThreadCacheSize = 4 << 20;
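
To make the constants concrete: kPageShift is a power-of-two exponent, so the edit raises the tcmalloc page size from 4 KiB to 32 KiB and the maximum thread cache size from 2 MiB to 4 MiB. The following sketch only decodes the numbers quoted from common.h. The SETTINGS table and page_size helper are illustrative names, and nothing here claims to explain why the larger pages avoid the crash.

```ruby
# Values copied from the two branches of the #if in common.h.
SETTINGS = {
  :default     => { :page_shift => 12, :max_thread_cache => 2 << 20 },
  :large_pages => { :page_shift => 15, :max_thread_cache => 4 << 20 }
}

# A page shift of n means pages of 2**n bytes.
def page_size(page_shift)
  1 << page_shift
end

[:default, :large_pages].each do |name|
  s = SETTINGS[name]
  puts "#{name}: page = #{page_size(s[:page_shift])} bytes, " \
       "max thread cache = #{s[:max_thread_cache] / (1 << 20)} MiB"
end
# prints:
#   default: page = 4096 bytes, max thread cache = 2 MiB
#   large_pages: page = 32768 bytes, max thread cache = 4 MiB
```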

On production servers I opted for no tcmalloc for now – but I hope there’ll be a better way to deal with this issue soon.

