Converting things to and from binary

George Russell ger@tzi.de
Mon, 19 May 2003 19:55:11 +0200


Sorry, I know there was an ongoing discussion on this topic somewhere, but I
can't find it, so I'll have to hope this list is the most appropriate.

The general problem to solve is "How do we convert things to and from a
binary format, so that they can be efficiently written to and read from
disk?"  I have solved this (incompletely) twice myself, and I expect Glasgow
Haskell has solved it too; there really ought to be a standard solution.

From the deficiencies of my incomplete solutions I conclude:
(1) you don't want to force everything to go via single Haskell
characters.  This is horrendously inefficient when what you want is
to blast in and out large quantities of binary data, and of course
that's precisely where you probably want efficiency.
(2) you don't want a binary converter only to be able to write to
Handles.  I've found myself that it's useful to be able to
convert to (for example) vectors of bytes.

Now my idea is that this be implemented using monads.  For example,
consider the problem of writing out data to be converted into
binary.  We might consider there to be two primitive operations,
based on the types Byte (a single byte) and Bytes (an array of bytes).
(I'm not sure exactly what these types should actually be in GHC.)

The consumer of binary data should provide a record like

data WriteBinaryData m = WriteBinaryData {
    writeByte :: Byte -> m (),
    writeBytes :: Bytes -> m ()
    }

Then something which can be written out in a binary form would instance
the class

class HasBinaryWrite a where
    writeA :: Monad m => WriteBinaryData m -> a -> m ()

The advantage of this approach is that an instance of HasBinaryWrite can
be written both to a file (m = IO) and to a byte vector.  Also this should
be fairly efficient I think, and it should be easy to build up more
complex instances of HasBinaryWrite from simpler ones.
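
To make the write side concrete, here is a minimal sketch, not a
definitive implementation.  It assumes Byte = Word8 and Bytes = a strict
ByteString from the bytestring package, which is only one possible answer
to the open question of what these types should be.  The names Collect,
toHandle, toCollect and encode are hypothetical helpers introduced for
illustration, as is the big-endian layout chosen for Word16.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word16)
import System.IO (Handle)
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL

-- Assumption: concrete choices for the two primitive types.
type Byte  = Word8
type Bytes = B.ByteString

data WriteBinaryData m = WriteBinaryData {
    writeByte  :: Byte -> m (),
    writeBytes :: Bytes -> m ()
    }

class HasBinaryWrite a where
    writeA :: Monad m => WriteBinaryData m -> a -> m ()

-- Consumer 1 (m = IO): write straight to a Handle.
toHandle :: Handle -> WriteBinaryData IO
toHandle h = WriteBinaryData {
    writeByte  = B.hPut h . B.singleton,
    writeBytes = B.hPut h
    }

-- Consumer 2: a small hand-written writer monad that accumulates output
-- in a Builder, so no file or Handle is involved.
newtype Collect a = Collect { runCollect :: (a, BB.Builder) }

instance Functor Collect where
    fmap f (Collect (a, w)) = Collect (f a, w)

instance Applicative Collect where
    pure a = Collect (a, mempty)
    Collect (f, w1) <*> Collect (a, w2) = Collect (f a, w1 <> w2)

instance Monad Collect where
    Collect (a, w1) >>= k =
        let Collect (b, w2) = k a in Collect (b, w1 <> w2)

toCollect :: WriteBinaryData Collect
toCollect = WriteBinaryData {
    writeByte  = \b  -> Collect ((), BB.word8 b),
    writeBytes = \bs -> Collect ((), BB.byteString bs)
    }

-- Primitive instance: a single byte.
instance HasBinaryWrite Word8 where
    writeA wb = writeByte wb

-- Assumed layout: high byte first; fromIntegral keeps the low 8 bits.
instance HasBinaryWrite Word16 where
    writeA wb w = do
        writeByte wb (fromIntegral (w `shiftR` 8))
        writeByte wb (fromIntegral w)

-- A more complex instance built from simpler ones.
instance (HasBinaryWrite a, HasBinaryWrite b) => HasBinaryWrite (a, b) where
    writeA wb (a, b) = writeA wb a >> writeA wb b

-- Run any instance against the in-memory consumer.
encode :: HasBinaryWrite a => a -> Bytes
encode a = BL.toStrict (BB.toLazyByteString (snd (runCollect (writeA toCollect a))))
```

The same instance then serves both consumers: writeA (toHandle h) x
writes x to the handle h, while encode x yields the bytes in memory.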

The converse functions would likewise use monads:

data ReadBinaryData m = ReadBinaryData {
    readByte :: m Byte,
    readBytes :: Int -> m Bytes
    }

class HasBinaryRead a where
    readA :: Monad m => ReadBinaryData m -> m a

and again we can instance this either for files or for byte vectors, and
again building up more complex instances should be easy enough.
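
The read side can be sketched the same way.  Again this is only an
illustration under stated assumptions: Byte = Word8, Bytes = a strict
ByteString, and a big-endian layout for Word16.  The names Parser,
fromHandle, fromBytes and decode are hypothetical.  The proposal above
does not say what should happen when the input runs out; this sketch
assumes failure is reported as Nothing for the in-memory source and as an
IO error for the Handle source.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)
import System.IO (Handle)
import qualified Data.ByteString as B

-- Assumption: concrete choices for the two primitive types.
type Byte  = Word8
type Bytes = B.ByteString

data ReadBinaryData m = ReadBinaryData {
    readByte  :: m Byte,
    readBytes :: Int -> m Bytes
    }

class HasBinaryRead a where
    readA :: Monad m => ReadBinaryData m -> m a

-- Source 1 (m = IO): read from a Handle, failing on short reads.
fromHandle :: Handle -> ReadBinaryData IO
fromHandle h = ReadBinaryData {
    readByte  = B.head <$> exactly 1,
    readBytes = exactly
    }
  where
    exactly n = do
        bs <- B.hGet h n
        if B.length bs < n
            then ioError (userError "unexpected end of input")
            else return bs

-- Source 2: a hand-written state monad over the remaining input,
-- returning Nothing when the input is too short.
newtype Parser a = Parser { runParser :: Bytes -> Maybe (a, Bytes) }

instance Functor Parser where
    fmap f (Parser p) = Parser $ \s -> case p s of
        Nothing        -> Nothing
        Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
    pure a = Parser $ \s -> Just (a, s)
    Parser pf <*> Parser pa = Parser $ \s -> case pf s of
        Nothing        -> Nothing
        Just (f, rest) -> case pa rest of
            Nothing         -> Nothing
            Just (a, rest') -> Just (f a, rest')

instance Monad Parser where
    Parser p >>= k = Parser $ \s -> case p s of
        Nothing        -> Nothing
        Just (a, rest) -> runParser (k a) rest

fromBytes :: ReadBinaryData Parser
fromBytes = ReadBinaryData {
    readByte  = Parser B.uncons,
    readBytes = \n -> Parser $ \s ->
        if n >= 0 && B.length s >= n then Just (B.splitAt n s) else Nothing
    }

-- Primitive instance: a single byte.
instance HasBinaryRead Word8 where
    readA rb = readByte rb

-- Assumed layout: high byte first.
instance HasBinaryRead Word16 where
    readA rb = do
        hi <- readByte rb
        lo <- readByte rb
        return (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)

-- A more complex instance built from simpler ones.
instance (HasBinaryRead a, HasBinaryRead b) => HasBinaryRead (a, b) where
    readA rb = do
        a <- readA rb
        b <- readA rb
        return (a, b)

-- Run any instance against the in-memory source; trailing input is ignored.
decode :: HasBinaryRead a => Bytes -> Maybe a
decode = fmap fst . runParser (readA fromBytes)
```

Here readA (fromHandle h) reads a value from the handle h, and decode
reads the same value from bytes already in memory.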

I wonder if this is the best way of doing this kind of thing, and if
so should it be implemented and put in the standard libraries?
Also, what should Byte and Bytes actually be?