Rewriting Haskell Strings

A rewrite system for fusing operations over ByteStrings

or... How to make ByteStrings even faster!

Duncan Coutts, Don Stewart and Roman Leshchinskiy

Programming Tools Group, Oxford University Computing Laboratory
Computer Science & Engineering, University of New South Wales

Data.ByteString

A quick recap:

Representation

[Char]

ByteString

Data.ByteString.Lazy

Like Data.ByteString but lazy (and therefore cooler!)

Fusion

(also known as deforestation)

The purpose

Fusion

Existing fusion systems

build/fold fusion

destroy/unfoldr fusion

Functional array fusion

Functional array fusion

Limitations

We want something more flexible

Unfoldr again

Consider unfoldr again:

unfoldr :: (s -> Maybe (a, s)) -> s -> [a]

Take (b -> Maybe (a,b)) and b as the representation of an abstract sequence:

data Stream a = forall s. Stream (s -> Step s a) s data Step s a = Done | Yield a s | Skip s

Streams

Operations on Streams

We can convert many common list functions to work on streams

mapS :: (a -> b) -> Stream a -> Stream b mapS f (Stream next s n) = Stream next' s n where next' s = case next s of Done -> Done Yield x s' -> Yield (f x) s' Skip s' -> Skip s'

Which then can be lifted to work on arrays:

map :: (a -> b) -> Array a -> Array b map f = write . mapS f . read

Fusion on Streams

Eliminate intermediate conversions to arrays with the rule

read . write = id

For example:

filter p . map f = write . filterS p . read . write . mapS f . read = write . filterS p . mapS f . read

Fusion on Streams

Producers and consumers are also fusable

foldl' k z = foldlS' k z . read replicate x n = write . replicateS x n

For example:

foldl' k z . filter p . map f = foldlS' k z . read . write . filterS p . read . write . mapS f . read = foldlS' k z . filterS p . mapS f . read

Example benchmark: String

Haskell String version:

main = print . foldl' hash 5381 . map toLower . filter isAlpha =<< readFile f where hash h c = h * 33 + ord c

Example benchmark: ByteString

Lazy ByteString version:

import Data.ByteString.Lazy.Char8 as B main = print . B.foldl' hash 5381 . B.map toLower . B.filter isAlpha =<< B.readFile f where hash h c = h * 33 + ord c

Example benchmark: Naive C

#include <stdio.h> void main () { long h = 5381; int c; while ((c = getchar()) != EOF) if (isalpha(c)) h = h * 33 + tolower(c); printf("%li\n", h); }

Example benchmark: Fast C

#include <stdio.h> void main() { long h = 5381; char buf[4096]; int i, count; while (count = fread(&buf, 1, 4096, stdin)) for (i = 0; i < count; i++) if (isalpha(buf[i])) h = h * 33 + tolower (buf[i]); printf("%li\n", h); }

Example benchmark: results

Comparison with Strings and C

Traversal direction

Folds

Arrays give a new perspective on folds

Traversal direction again

New formulation

Bidirectional traversals

Bidirectional fusion

Future directions

Practical applications

Fusion improvements