GHC Weekly News - 2015/08/06

bgamari - 2015-08-25

Hello *,

Here is a rather belated Weekly News which I found sitting nearly done on my work-queue. I hope this will make for a good read despite its age. The next edition of the Weekly News will be posted soon.

Warnings for missed specialization opportunities

Simon Peyton Jones recently [a4261549afaee56b00fbea1b4bc1a07c95e60929 introduced] a warning in master to alert users when the compiler was unable to specialize an imported binding despite it being marked as INLINEABLE. This change was motivated by #10720, where the reporter observed poor runtime performance despite taking care to ensure his binding could be inlined. Up until now, ensuring that the compiler’s optimizations meet the user’s expectation would require a careful look at the produced Core. With this change the user is notified of exactly where the compiler had to stop specializing, along with a helpful hint on where to add a INLINABLE pragma.

Ticky-Ticky profiling

Recently I have been looking into breathing life back into GHC’s ticky-ticky profiling mechanism. When enabled, ticky-ticky maintains low-level counters of various runtime-system events. These include closure entries, updates, and allocations. While ticky doesn’t provide nearly the detail that the cost-center profiler allows, it is invisible to the Core-to-Core optimization passes and has minimal runtime overhead (manifested as a bit more memory traffic due to counter updates). For this reason, the ticky-ticky profiler can be a useful tool for those working on the Core simplifier.

Sadly, ticky-ticky has fallen into quite a state of disrepair in recent years as the runtime system and native code generator have evolved. As the beginning of an effort to resuscitate the ticky-ticky profiler I’ve started putting together a list of the counters currently implemented and whether they can be expected to do something useful. Evaluating the functionality of these counters is non-trivial, however, so this will be an on-going effort.

One of our goals is to eventually do a systematic comparison of the heap allocation numbers produced by the ticky-ticky profiler, the cost-center profiler, and ticky-ticky. While this will help validate some of the more coarse-grained counters exposed by ticky, most of them will need a more thorough read-through of the runtime system to verify.

integer-gmp Performance

Since the 7.10.2 release much of my effort has been devoted to characterizing the performance of various benchmarks over various GHC versions. This is part of an effort to find places where we have regressed in the past few versions. One product of this effort is a complete comparison of results from our nofib benchmark suite ranging from 7.4.2 to 7.10.1.

The good news is there are essentially no disastrous regressions. Moreover, on the mean runtimes are over 10% faster than they were in 7.4.2. There are, however, a few cases which have regressed. The runtime of the integer test, for instance, has increased by 7%. Looking at the trend across versions, it becomes apparent that the regression began with 7.10.1.

One of the improvements that was introduced with 7.10 was a rewrite of the integer-gmp library, which this benchmark tests heavily. To isolate this potential cause, I recompiled GHC 7.10.1 with the old integer-gmp-0.5. Comparing 7.10.1 with the two integer-gmp versions reveals a 4% increase in allocations.

While we can’t necessarily attribute all of the runtime increase to these allocations, they are something that should be addressed if possible. Herbert Valerio Riedel, the author of the integer-gmp rewrite, believes that the cause may be due to the tendency for the rewrite to initially allocate a conservatively-sized backing ByteArray# for results. This leads to increased allocations due to the reallocations that are later required to accommodate larger-than-expected results.

While being more liberal in the initial allocation sizes would solve the reallocation issue, this approach may substantially increase working-set sizes and heap fragmentation for integer-heavy workloads. For this reason, Herbert will be looking into exploiting a feature of our heap allocator. Heap allocations in GHC occur by bumping a pointer into an allocation block. Not only is this a very efficient means of allocating, it potentially allows one to efficiently grow an existing allocation. In this case, if we allocate a buffer and soon after realize that our request was too small we can simply bump the heap pointer by the size deficit, so long as no other allocations have occurred since our initial allocation. We can do this since we know that the memory after the heap pointer is available; we merely need to ensure that the current block we are allocating into is large enough.

Simon Marlow and Herbert will be investigating this possibility in the coming weeks.

== D924: mapM_ and traverse_

As discussed in the most recent Weekly News, one issue on our plate at the moment is Phab:D924, which attempted to patch up two remaining facets of the Applicative-Monad Proposal,

  1. Remove the override of mapM for the [] Traversal instance
  2. Rewrite mapM_ in terms of traverse_

While (1) seems like an obvious cleanup, (2) is a bit tricky. As noted last time, traverse_ appears to give rise to non-linear behavior in this context.

akio has contributed an insightful analysis shedding light on the cause of this behavior. Given that the quadratic behavior is intrinsic to the Applicative formulation, we’ll be sending this matter back to the Core Libraries Committee to inform their future design decisions.

That is all for this week!

Cheers,

Ben