Why do we have stack overflows?

Stefan O'Rear stefanor at cox.net
Thu May 3 21:21:19 EDT 2007


On Thu, May 03, 2007 at 05:36:45PM -0700, Brandon Michael Moore wrote:
> On Thu, May 03, 2007 at 04:59:58PM -0700, John Meacham wrote:
> > I believe it is because a stack cannot be garbage collected, and must be
> > traversed as roots for every garbage collection. I don't think there are
> > any issues with a huge stack per se, but it does not play nice with
> > garbage collection so may hurt your performance and memory usage in
> > unforeseen ways.
> 
> Isn't it just the top of the stack that has to be treadted as a root?
> (maybe you need to walk the stack to find exception handlers and so on.)
> Maybe it shouldn't be so much worse than a heap. The Chicken Scheme
> system allocates everything on the C stack, and runs some sort of
> compacting collector when it is about to fill.

GHC uses a simple exponential-backoff algorithm for handling stacks.
When the stack pointer passes the stack limit, the thread yields to
the scheduler, where the stack size is doubled, and the old stack is
moved.  Perhaps instead we could modify the algorithm such that up to
16K stack size the behaivor is the same, but use linked lists for
larger? 

1. Allocate a new chunk, of size 16KB.

2. Copy the topmost 1KB of stack to the new block.  This decreases
   storage efficiency slightly (but not much in time - memcpy is
   several times faster than anything the current ghc code generator
   can generate), but it avoids a nasty corner case (stack size
   fluctuating across 0 mod 16K) by acting as a form of hysteresis.

3. Create a special frame at the bottom of the new stack chunk that
   returns into a stack underflow handler, thus avoiding the need for
   yet another conditional. 

Yes, 16KB unrolled linked lists are virtually as fast as flat byte
arrays; see Data.ByteString.Lazy if you want proof. 

The only hard part appears to be step 3, as it requires finding an
appropriate place to insert the trampoline frame; it seems plausible
that stack frames are sufficiently self describing to make this a
simple exercise of loops, but it could be much much harder. 

With this change GHC stacks could fill the whole heap with little more
performance degradation than ordinary objects already give. 

Stefan


More information about the Glasgow-haskell-users mailing list