Pure File Reading (was: Dealing with configuration data)

Hal Daume III hdaume@ISI.EDU
Fri, 11 Oct 2002 08:03:46 -0700 (PDT)


This message seems to have been lost and I'd like to try to breathe some
life into it.

First, a question: could such "readFilePure" functions be implemented on
TOP of the current IO module (perhaps in IO.Pure or something).  Of
course, one could do something like:
  readFileOnce :: FilePath -> Maybe String
  readFileOnce = unsafePerformIO ....
  {-# NOINLINE readFileOnce #-}
but this is the sort of thing we're trying to get away from anyway.  There
doesn't (to me, at least) seem to be an obvious way to do this.  It seems
to be the sort of thing that requires compiler support.  In that case, would
any of the compiler implementers be willing to tackle such a thing?  On
the other hand, if there's a way to do it on top of what already exists, I
would be more than happy to implement it if someone were to point me in
the right direction...
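For concreteness, here is one way the unsafePerformIO version could look once the memo table is made explicit. This is a sketch under stated assumptions, not a proposed library API: `memoTable` and `readStrict` are hypothetical names, the program is assumed to be single-threaded, and the result is only meaningful if the file does not change during the run.

```haskell
import Control.Exception (IOException, evaluate, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global memo table, created once via the NOINLINE trick.
memoTable :: IORef [(FilePath, Maybe String)]
memoTable = unsafePerformIO (newIORef [])
{-# NOINLINE memoTable #-}

-- Read the whole file strictly, so any I/O error surfaces here and the
-- lazily-read handle reaches EOF (and is closed) before we return.
readStrict :: FilePath -> IO String
readStrict path = do
  s <- readFile path
  _ <- evaluate (length s)
  return s

-- Sketch of a "pure" read: consult the memo table first, so each path is
-- read at most once per run. Assumes single-threaded use and an unchanging file.
readFileOnce :: FilePath -> Maybe String
readFileOnce path = unsafePerformIO $ do
  table <- readIORef memoTable
  case lookup path table of
    Just cached -> return cached
    Nothing -> do
      r <- try (readStrict path) :: IO (Either IOException String)
      let result = either (const Nothing) Just r
      modifyIORef memoTable ((path, result) :)
      return result
{-# NOINLINE readFileOnce #-}
```

Because of the memo table, a second call with the same path returns the first result even if the file has since changed on disk, which is exactly the "read once" behaviour, and also why the unchanging-file assumption matters.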

> The point is that the use of this function will typically
> happen at the beginning of a program, when reading the
> configuration file(s). When all this has happened, the
> function readFileOnce, and its memo table, will be garbage
> collected.

I like this, and it works for configuration files, but I have another
problem I would like to solve with this whole ...Once business which does
not fit into this model.

I have a large database-like file which essentially contains an index at
the beginning.  When you want to look up something, you binary search for
the term in the index, find the position of the entity you want, seek to
that location and then read a specified amount.
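The lookup described above might be sketched in Haskell as follows. The index layout is an assumption for illustration: it is taken to be already parsed into an array of (key, byte offset, record length) triples sorted by key, and `Entry`, `findEntry` and `lookupRecord` are hypothetical names. Only the binary search is pure; the seek and the read still live in IO.

```haskell
import Control.Monad (replicateM)
import Data.Array (Array, bounds, listArray, (!))
import System.IO

-- Hypothetical index entry: key, byte offset of the record, record length.
type Entry = (String, Integer, Int)

-- Pure binary search over an index sorted by key.
findEntry :: Array Int Entry -> String -> Maybe (Integer, Int)
findEntry idx key = go lo hi
  where
    (lo, hi) = bounds idx
    go l h
      | l > h = Nothing
      | otherwise =
          let m = (l + h) `div` 2
              (k, off, len) = idx ! m
          in case compare key k of
               EQ -> Just (off, len)
               LT -> go l (m - 1)
               GT -> go (m + 1) h

-- Find the record's position, seek there, and read exactly len characters.
lookupRecord :: Handle -> Array Int Entry -> String -> IO (Maybe String)
lookupRecord h idx key =
  case findEntry idx key of
    Nothing -> return Nothing
    Just (off, len) -> do
      hSeek h AbsoluteSeek off
      Just <$> replicateM len (hGetChar h)
```

The handle is assumed to come from `openBinaryFile`, so that each `Char` read corresponds to one byte of the file and the offsets in the index line up.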

The way I have this currently set up is that everything in my program is
embedded in the IO monad because

  1) the database is huge and I cannot store it all in memory
  2) usually only about 100 of the 250000 entries are queried per run,
     but which entries those are changes from run to run

Unfortunately, this means all my functions are monadic.  However, there's
no reason for them to be (in a sense): they are perfectly pure.  In fact,
I don't even have write access to the database :), but no one would ever
change it anyway.

So while I like the 'readFileOnce' and variants, I think that if someone
is serious about this '...Once' stuff, we should have more or less the
entire reading portion of the IO library in pure form for cases like
this.
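As a rough illustration of what one such pure reading function might look like, here is a hypothetical `readSlicePure` that reads `len` bytes starting at `offset`. It is built on `unsafePerformIO`, so it is only sound under the assumption made above that the file never changes while the program runs; it is a sketch, not an existing library function.

```haskell
import Control.Exception (IOException, bracket, try)
import Control.Monad (replicateM)
import System.IO
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical pure counterpart of "open, seek, read len bytes, close".
-- Returns Nothing on any I/O error, including reading past end of file.
-- Only sound if the file is never modified during the run.
readSlicePure :: FilePath -> Integer -> Int -> Maybe String
readSlicePure path offset len = unsafePerformIO $ do
  r <- try (bracket (openBinaryFile path ReadMode) hClose readSlice)
         :: IO (Either IOException String)
  return (either (const Nothing) Just r)
  where
    -- replicateM runs every hGetChar before bracket closes the handle.
    readSlice h = do
      hSeek h AbsoluteSeek offset
      replicateM len (hGetChar h)
{-# NOINLINE readSlicePure #-}
```

With something along these lines, a database lookup could be written as a pure function of the key, at the cost of reopening the file on every call.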

Thoughts?

 - Hal