We ran into an interesting problem: at some point our Rails application started failing occasionally because of REE segfaults on startup. Even starting the console with ‘script/console production’ occasionally failed with an REE segfault. As the application grew and new features were added, the segfaults happened more and more often. The crashes did not occur in any single place, so there was no clear understanding of how to tackle the problem.
Examples of crashes we observed:
    /vendor/rails/actionpack/lib/action_controller/routing/route.rb:205):2: [BUG] Segmentation fault
    /opt/ruby-enterprise-1.8.7-2011.03/lib/ruby/1.8/yaml.rb:133: [BUG] Segmentation fault
    /vendor/rails/activesupport/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:257: [BUG] Segmentation fault
    /vendor/rails/actionpack/lib/action_view/template.rb:226: [BUG] Segmentation fault
    /opt/ruby-enterprise-1.8.7-2011.03/lib/ruby/gems/1.8/gems/pauldix-sax-machine-0.0.14/lib/sax-machine/sax_document.rb:30: [BUG] Segmentation fault
    /vendor/rails/activesupport/lib/active_support/memoizable.rb:32: [BUG] Segmentation fault
After banging my head against the wall for a week, I found a solution (two, in fact) and what seems to be the likely cause of the segfaults. Two “suspects”, lack of available memory and an incorrect libxml version, were ruled out. The actual cause appears to be the total size of the localisation files in config/locales that are loaded on startup:
    $ du -shb config/locales
    1665858    config/locales

    $ cd config/locales
    $ find . -type f | wc -l
    805
So about 1.6 MiB (1,665,858 bytes) spread over 805 files gave occasional segfaults. Adding another 200 KB of localisation files made script/console segfault on every startup.
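If you want to keep an eye on this number as the application grows, a small Ruby helper can report the same two figures. This is a minimal sketch, not something from our actual setup: the method name `locale_footprint` is made up for illustration, and it sums only regular files, so its total will come out slightly below the `du -b` figure, which also counts the directories themselves.

```ruby
require "find"

# Hypothetical helper: returns [total_bytes, file_count] for all regular
# files under +dir+, similar to `du -sb` plus `find . -type f | wc -l`.
# Directory entries themselves are not counted, unlike du.
def locale_footprint(dir)
  total = 0
  count = 0
  Find.find(dir) do |path|
    next unless File.file?(path) # skip directories and other non-regular entries
    total += File.size(path)
    count += 1
  end
  [total, count]
end
```

Run from the application root, `bytes, files = locale_footprint("config/locales")` returns a byte total and a file count; for the directory above that should be roughly 1.6 MiB in 805 files.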
I have found two workarounds for this problem.
1. Recompile REE with the --no-tcmalloc flag
Note that on 64-bit platforms tcmalloc is disabled by default.
2. Enable the large pages feature in tcmalloc
The tcmalloc documentation describes it as follows: “Internally, tcmalloc divides its memory into “pages.” The default page size is chosen to minimize memory use by reducing fragmentation. The cost is that keeping track of these pages can cost tcmalloc time. We’ve added a new, experimental flag to tcmalloc that enables a larger page size. In general, this will increase the memory needs of applications using tcmalloc. However, in many cases it will speed up the applications as well, particularly if they allocate and free a lot of memory. We’ve seen average speedups of 3-5% on Google applications.”
The documentation warns that “this feature is still very experimental”, but it does solve the problem with too many localisation files.
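To put the documented trade-off into numbers: the google-perftools 1.7 header src/common.h (quoted further down) uses kPageShift = 12 by default and 15 when TCMALLOC_LARGE_PAGES is defined, i.e. 4 KiB versus 32 KiB pages, so the same heap is tracked with eight times fewer pages. The Ruby sketch below only does that arithmetic; the constant names are copied from the header, while the 64 MiB heap is an arbitrary example, not a measurement from our application.

```ruby
# Constants as they appear in google-perftools-1.7/src/common.h.
TCMALLOC_CONFIGS = {
  default:     { page_shift: 12, num_classes: 61, max_thread_cache: 2 << 20 },
  large_pages: { page_shift: 15, num_classes: 95, max_thread_cache: 4 << 20 },
}.freeze

# Page size in bytes: 1 << kPageShift.
def page_size(config)
  1 << config[:page_shift]
end

# Number of pages needed to cover +bytes+ of heap, rounded up.
def pages_for(bytes, config)
  size = page_size(config)
  (bytes + size - 1) / size
end

example_heap = 64 << 20 # arbitrary 64 MiB example, not a measured value
TCMALLOC_CONFIGS.each do |name, cfg|
  printf("%-11s page: %5d bytes, thread cache: %7d bytes, pages for 64 MiB: %d\n",
         name, page_size(cfg), cfg[:max_thread_cache], pages_for(example_heap, cfg))
end
```

For a 64 MiB heap this gives 16384 pages with the default settings and 2048 with large pages, while the per-thread cache limit doubles from 2 MiB to 4 MiB. That is consistent with the documentation: less bookkeeping, at the cost of higher memory use.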
To compile REE with tcmalloc large pages enabled, I edited ruby-enterprise-1.8.7-2011.03/source/distro/google-perftools-1.7/src/common.h and replaced
    #if defined(TCMALLOC_LARGE_PAGES)
    static const size_t kPageShift = 15;
    static const size_t kNumClasses = 95;
    static const size_t kMaxThreadCacheSize = 4 << 20;
    #else
    static const size_t kPageShift = 12;
    static const size_t kNumClasses = 61;
    static const size_t kMaxThreadCacheSize = 2 << 20;
    #endif

with

    // Large-page settings used unconditionally: 32 KiB pages (1 << 15)
    static const size_t kPageShift = 15;
    static const size_t kNumClasses = 95;
    static const size_t kMaxThreadCacheSize = 4 << 20;
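If this edit has to be repeated on several build machines, it can be scripted. The Ruby sketch below is a hypothetical helper (`force_large_pages` and `patch_common_h!` are names made up for this example) that does the same thing as the manual edit: it keeps the body of the `#if defined(TCMALLOC_LARGE_PAGES)` branch and drops the `#else` branch and the `#endif`. It assumes the block looks as quoted above, with no nested `#if`/`#else` inside it, and raises an error if it cannot find the block.

```ruby
# Matches the TCMALLOC_LARGE_PAGES conditional from common.h and captures the
# body of its first (large-pages) branch. Assumes no nested #if/#else inside.
LARGE_PAGES_BLOCK = /^#if defined\(TCMALLOC_LARGE_PAGES\)[^\n]*\n(.*?)^#else[^\n]*\n.*?^#endif[^\n]*(?:\n|\z)/m

# Returns +source+ with the conditional replaced by its large-pages branch.
def force_large_pages(source)
  match = LARGE_PAGES_BLOCK.match(source)
  raise ArgumentError, "TCMALLOC_LARGE_PAGES block not found" unless match

  match.pre_match + match[1] + match.post_match
end

# Hypothetical helper: patches the header in place, e.g.
# patch_common_h!("ruby-enterprise-1.8.7-2011.03/source/distro/google-perftools-1.7/src/common.h")
def patch_common_h!(path)
  File.write(path, force_large_pages(File.read(path)))
end
```

The idea is to run it once against the unpacked REE source and then build REE as usual.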
On production servers I have opted to run without tcmalloc for now, but I hope there will be a better way to deal with this issue soon.