The Foreign Function Interface (FFI) has two purposes: it enables (1) to describe in Haskell the interface to foreign language functionality and (2) to use from foreign code Haskell routines. More generally, its aim is to support the implementation of programs in a mixture of Haskell and other languages such that the source code is portable across different implementations of Haskell and non-Haskell systems as well as independent of the architecture and operating system.
The Haskell FFI currently only specifies the interaction between Haskell code and foreign code that follows the C calling convention. However, the design of the FFI is such that it enables the modular extension of the present definition to include the calling conventions of other programming languages, such as C++ and Java. A precise definition of the support for those languages is expected to be included in later versions of the language. The second major omission is the definition of the interaction with multithreading in the foreign language and, in particular, the treatment of thread-local state, and so these details are currently implementation-defined.
The core of the present specification is independent of the foreign language that is used in conjunction with Haskell. However, there are two areas where FFI specifications must become language specific: (1) the specification of external names and (2) the marshalling of the basic types of a foreign language. As an example of the former, consider that in C [9] a simple identifier is sufficient to identify an object, while Java [5], in general, requires a qualified name in conjunction with argument and result types to resolve possible overloading. Regarding the second point, consider that many languages do not specify the exact representation of some basic types. For example the type int in C may be 16, 32, or 64 bits wide. Similarly, Haskell guarantees only that Int covers at least the range [−229,229 − 1] (Section 6.4). As a consequence, to reliably represent values of C’s int in Haskell, we have to introduce a new type CInt, which is guaranteed to match the representation of int.
The specification of external names, dependent on a calling convention, is described in Section 8.5, whereas the marshalling of the basic types in dependence on a foreign language is described in Section 8.6.
For a given Haskell system, we define the Haskell context to be the execution context of the abstract machine on which the Haskell system is based. This includes the heap, stacks, and the registers of the abstract machine and their mapping onto a concrete architecture. We call any other execution context an external context. Generally, we cannot assume any compatibility between the data formats and calling conventions between the Haskell context and a given external context, except where Haskell explicitly prescribes a specific data format.
The principal goal of a foreign function interface is to provide a programmable interface between the Haskell context and external contexts. As a result Haskell threads can access data in external contexts and invoke functions that are executed in an external context as well as vice versa. In the rest of this definition, external contexts are usually identified by a calling convention.
Given that many external languages support static types, the question arises whether the consistency of Haskell types with the types of the external language can be enforced for foreign functions. Unfortunately, this is, in general, not possible without a significant investment on the part of the implementor of the Haskell system (i.e., without implementing a dedicated type checker). For example, in the case of the C calling convention, the only other approach would be to generate a C prototype from the Haskell type and leave it to the C compiler to match this prototype with the prototype that is specified in a C header file for the imported function. However, the Haskell type is lacking some information that would be required to pursue this route. In particular, the Haskell type does not contain any information as to when const modifiers have to be emitted.
As a consequence, this definition does not require the Haskell system to check consistency with foreign types. Nevertheless, Haskell systems are encouraged to provide any cross language consistency checks that can be implemented with reasonable effort.
The FFI reserves a single keyword foreign, and a set of special identifiers. The latter have a special meaning only within foreign declarations, but may be used as ordinary identifiers elsewhere.
The special identifiers ccall, cplusplus, dotnet, jvm, and stdcall are defined to denote calling conventions. However, a concrete implementation of the FFI is free to support additional, system-specific calling conventions whose name is not explicitly listed here.
To refer to objects of an external C context, we introduce the following phrases:
chname | → | {chchar} . h | (C header filename) |
cid | → | letter {letter | ascDigit} | (C identifier) |
chchar | → | letter | ascSymbol⟨&⟩ | |
letter | → | ascSmall | ascLarge | _ |
The range of lexemes that are admissible for chname is a subset of those permitted as arguments to the #include directive in C. In particular, a file name chname must end in the suffix .h. The lexemes produced by cid coincide with those allowed as C identifiers, as specified in [9].
The syntax of foreign declarations is as follows:
topdecl | → | foreign fdecl | |
fdecl | → | import callconv [safety] impent var :: ftype | (define variable) |
| | export callconv expent var :: ftype | (expose variable) | |
callconv | → | ccall | stdcall | cplusplus | (calling convention) |
| | jvm | dotnet | ||
| | system-specific calling conventions | ||
impent | → | [string] | |
expent | → | [string] | |
safety | → | unsafe | safe |
There are two flavours of foreign declarations: import and export declarations. An import declaration makes an external entity, i.e., a function or memory location defined in an external context, available in the Haskell context. Conversely, an export declaration defines a function of the Haskell context as an external entity in an external context. Consequently, the two types of declarations differ in that an import declaration defines a new variable, whereas an export declaration uses a variable that is already defined in the Haskell module.
The external context that contains the external entity is determined by the calling convention given in the foreign declaration. Consequently, the exact form of the specification of the external entity is dependent on both the calling convention and on whether it appears in an import declaration (as impent) or in an export declaration (as expent). To provide syntactic uniformity in the presence of different calling conventions, it is guaranteed that the description of an external entity lexically appears as a Haskell string lexeme. The only exception is where this string would be the empty string (i.e., be of the form ""); in this case, the string may be omitted in its entirety.
The binary interface to an external entity on a given architecture is determined by a calling convention. It often depends on the programming language in which the external entity is implemented, but usually is more dependent on the system for which the external entity has been compiled.
As an example of how the calling convention is dominated by the system rather than the programming language, consider that an entity compiled to byte code for the Java Virtual Machine (JVM) [11] needs to be invoked by the rules of the JVM rather than that of the source language in which it is implemented (the entity might be implemented in Oberon, for example).
Any implementation of the Haskell FFI must at least implement the C calling convention denoted by ccall. All other calling conventions are optional. Generally, the set of calling conventions is open, i.e., individual implementations may elect to support additional calling conventions. In addition to ccall, Table 8.1 specifies a range of identifiers for common calling conventions.
Identifier | Represented calling convention |
ccall | Calling convention of the standard C compiler on a system |
cplusplus | Calling convention of the standard C++ compiler on a system |
dotnet | Calling convention of the .net platform |
jvm | Calling convention of the Java Virtual Machine |
stdcall | Calling convention of the Win32 API (matches Pascal conventions) |
Implementations need not implement all of these conventions, but if any is implemented, it must use the listed name. For any other calling convention, implementations are free to choose a suitable name.
Only the semantics of the calling conventions ccall and stdcall are defined herein; more calling conventions may be added in future versions of Haskell.
It should be noted that the code generated by a Haskell system to implement a particular calling convention may vary widely with the target code of that system. For example, the calling convention jvm will be trivial to implement for a Haskell compiler generating Java code, whereas for a Haskell compiler generating C code, the Java Native Interface (JNI) [10] has to be targeted.
The following types constitute the set of basic foreign types:
A Haskell system that implements the FFI needs to be able to pass these types between the Haskell and the external context as function arguments and results.
Foreign types are produced according to the following grammar:
ftype | → | frtype | |
| | fatype → ftype | ||
frtype | → | fatype | |
| | () | ||
fatype | → | qtycon atype1 … atypek | (k ≥ 0) |
A foreign type is the Haskell type of an external entity. Only a subset of Haskell’s types are permissible as foreign types, as only a restricted set of types can be canonically transferred between the Haskell context and an external context. A foreign type has the form at1 -> -> atn -> rt where n ≥ 0. It implies that the arity of the external entity is n.
External functions are strict in all arguments.
Marshallable foreign types. The argument types ati produced by fatype must be marshallable foreign types; that is, either
newtype T a1 … an = N t
and
Consequently, in order for a type defined by newtype to be used in a foreign declaration outside of the module that defines it, the type must not be exported abstractly. The module Foreign.C.Types that defines the Haskell equivalents for C types follows this convention; see Chapter 28.
Marshallable foreign result types. The result type rt produced by frtype must be a marshallable foreign result type; that is, either
newtype T a1 … an = N t
and
Generally, an import declaration has the form foreign import c e v :: t which declares the variable v of type t to be defined externally. Moreover, it specifies that v is evaluated by executing the external entity identified by the string e using calling convention c. The precise form of e depends on the calling convention and is detailed in Section 8.5. If a variable v is defined by an import declaration, no other top-level declaration for v is allowed in the same module. For example, the declaration
introduces the function cstrlen, which invokes the external function strlen using the standard C calling convention. Some external entities can be imported as pure functions; for example,
Such a declaration asserts that the external entity is a true function; i.e., when applied to the same argument values, it always produces the same result.
Whether a particular form of external entity places a constraint on the Haskell type with which it can be imported is defined in Section 8.5. Although, some forms of external entities restrict the set of Haskell types that are permissible, the system can generally not guarantee the consistency between the Haskell type given in an import declaration and the argument and result types of the external entity. It is the responsibility of the programmer to ensure this consistency.
Optionally, an import declaration can specify, after the calling convention, the safety level that should be used when invoking an external entity. A safe call is less efficient, but guarantees to leave the Haskell system in a state that allows callbacks from the external code. In contrast, an unsafe call, while carrying less overhead, must not trigger a callback into the Haskell system. If it does, the system behaviour is undefined. The default for an invocation is to be safe. Note that a callback into the Haskell system implies that a garbage collection might be triggered after an external entity was called, but before this call returns. Consequently, objects other than stable pointers (cf. Section 36) may be moved or garbage collected by the storage manager.
The general form of export declarations is foreign export c e v :: t Such a declaration enables external access to v, which may be a value, field name, or class method that is declared on the top-level of the same module or imported. Moreover, the Haskell system defines the external entity described by the string e, which may be used by external code using the calling convention c; an external invocation of the external entity e is translated into evaluation of v. The type t must be an instance of the type of v. For example, we may have
If an evaluation triggered by an external invocation of an exported Haskell value returns with an exception, the system behaviour is undefined. Thus, Haskell exceptions have to be caught within Haskell and explicitly marshalled to the foreign code.
Each foreign declaration has to specify the external entity that is accessed or provided by that declaration. The syntax and semantics of the notation that is required to uniquely determine an external entity depends heavily on the calling convention by which this entity is accessed. For example, for the calling convention ccall, a global label is sufficient. However, to uniquely identify a method in the calling convention jvm, type information has to be provided. For the latter, there is a choice between the Java source-level syntax of types and the syntax expected by JNI—but, clearly, the syntax of the specification of an external entity depends on the calling convention and may be non-trivial.
Consequently, the FFI does not fix a general syntax for denoting external entities, but requires both impent and expent to take the form of a Haskell string literal. The formation rules for the values of these strings depend on the calling convention and a Haskell system implementing a particular calling convention will have to parse these strings in accordance with the calling convention.
Defining impent and expent to take the form of a string implies that all information that is needed to statically analyse the Haskell program is separated from the information needed to generate the code interacting with the foreign language. This is, in particular, helpful for tools processing Haskell source code. When ignoring the entity information provided by impent or expent, foreign import and export declarations are still sufficient to infer identifier definition and use information as well as type information.
For more complex calling conventions, there is a choice between the user-level syntax for identifying entities (e.g., Java or C++) and the system-level syntax (e.g., the type syntax of JNI or mangled C++, respectively). If such a choice exists, the user-level syntax is preferred. Not only because it is more user friendly, but also because the system-level syntax may not be entirely independent of the particular implementation of the foreign language.
The following defines the syntax for specifying external entities and their semantics for the calling conventions ccall and stdcall. Other calling conventions from Table 8.1 are expected to be defined in future versions of Haskell.
The following defines the structure of external entities for foreign declarations under the ccall calling convention for both import and export declarations separately. Afterwards additional constraints on the type of foreign functions are defined.
The FFI covers only access to C functions and global variables. There are no mechanisms to access other entities of C programs. In particular, there is no support for accessing pre-processor symbols from Haskell, which includes #defined constants. Access from Haskell to such entities is the domain of language-specific tools, which provide added convenience over the plain FFI as defined here.
Import Declarations For import declarations, the syntax for the specification of external entities under the ccall calling convention is as follows:
impent | → | " [static] [chname] [&] [cid] " | (static function or address) |
| | " dynamic " | (stub factory importing addresses) | |
| | " wrapper " | (stub factory exporting thunks) |
The first alternative either imports a static function cid or, if & precedes the identifier, a static address. If cid is omitted, it defaults to the name of the imported Haskell variable. The optional filename chname specifies a C header file, where the intended meaning is that the header file declares the C entity identified by cid. In particular, when the Haskell system compiles Haskell to C code, the directive
#include "chname"
needs to be placed into any generated C file that refers to the foreign entity before the first occurrence of that entity in the generated C file.
The second and third alternative, identified by the keywords dynamic and wrapper, respectively, import stub functions that have to be generated by the Haskell system. In the case of dynamic, the stub converts C function pointers into Haskell functions; and conversely, in the case of wrapper, the stub converts Haskell thunks to C function pointers. If neither of the specifiers static, dynamic, or wrapper is given, static is assumed. The specifier static is nevertheless needed to import C routines that are named dynamic or wrapper.
It should be noted that a static foreign declaration that does not import an address (i.e., where & is not used in the specification of the external entity) always refers to a C function, even if the Haskell type is non-functional. For example,
refers to a pure C function foo with no arguments that returns an integer value. Similarly, if the type is IO CInt, the declaration refers to an impure nullary function. If a Haskell program needs to access a C variable bar of integer type,
must be used to obtain a pointer referring to the variable. The variable can be read and updated using the routines provided by the module Foreign.Storable (cf. Section 37).
Export Declarations External entities in ccall export declarations are of the form
expent | → | " [cid] " |
The optional C identifier cid defines the external name by which the exported Haskell variable is accessible in C. If it is omitted, the external name defaults to the name of the exported Haskell variable.
Constraints on Foreign Function Types In the case of import declaration, there are, depending on the kind of import declaration, constraints regarding the admissible Haskell type that the variable defined in the import may have. These constraints are specified in the following.
As an example, consider
This declaration imports the system() function whose prototype is available from stdlib.h.
As an example, consider
It imports the address of the variable errno, which is of the C type int.
As an example, consider
The stub factory mkFun converts any pointer to a C function that gets an integer value as its only argument and does not have a return value into a corresponding Haskell function.
As an example, consider
The stub factory mkCallback turns any Haskell computation of type IO () into a C function pointer that can be passed to C routines, which can call back into the Haskell context by invoking the referenced function.
Specification of Header Files A C header specified in an import declaration is always included by #include "chname". There is no explicit support for #include <chname> style inclusion. The ISO C99 [7] standard guarantees that any search path that would be used for a #include <chname> is also used for #include "chname" and it is guaranteed that these paths are searched after all paths that are unique to #include "chname". Furthermore, we require that chname ends in .h to make parsing of the specification of external entities unambiguous.
The specification of include files has been kept to a minimum on purpose. Libraries often require a multitude of include directives, some of which may be system-dependent. Any design that attempts to cover all possible configurations would introduce significant complexity. Moreover, in the current design, a custom include file can be specified that uses the standard C preprocessor features to include all relevant headers.
Header files have no impact on the semantics of a foreign call, and whether an implementation uses the header file or not is implementation-defined. However, as some implementations may require a header file that supplies a correct prototype for external functions in order to generate correct code, portable FFI code must include suitable header files.
C Argument Promotion The argument passing conventions of C are dependent on whether a function prototype for the called functions is in scope at a call site. In particular, if no function prototype is in scope, default argument promotion is applied to integral and floating types. In general, it cannot be expected from a Haskell system that it is aware of whether a given C function was compiled with or without a function prototype being in scope. For the sake of portability, we thus require that a Haskell system generally implements calls to C functions as well as C stubs for Haskell functions as if a function prototype for the called function is in scope.
This convention implies that the onus for ensuring the match between C and Haskell code is placed on the FFI user. In particular, when a C function that was compiled without a prototype is called from Haskell, the Haskell signature at the corresponding foreign import declaration must use the types after argument promotion. For example, consider the following C function definition, which lacks a prototype:
The lack of a prototype implies that a C compiler will apply default argument promotion to the parameter a, and thus, foo will expect to receive a value of type double, not float. Hence, the correct foreign import declaration is
In contrast, a C function compiled with the prototype
requires
A similar situation arises in the case of foreign export declarations that use types that would be altered under the C default argument promotion rules. When calling such Haskell functions from C, a function prototype matching the signature provided in the foreign export declaration must be in scope; otherwise, the C compiler will erroneously apply the promotion rules to all function arguments.
Note that for a C function defined to accept a variable number of arguments, all arguments beyond the explicitly typed arguments suffer argument promotion. However, because C permits the calling convention to be different for such functions, a Haskell system will, in general, not be able to make use of variable argument functions. Hence, their use is deprecated in portable code.
The specification of external entities under the stdcall calling convention is identical to that for standard C calls. The two calling conventions only differ in the generated code.
In addition to the language extension discussed in previous sections, the FFI includes a set of standard libraries, which ease portable use of foreign functions as well as marshalling of compound structures. Generally, the marshalling of Haskell structures into a foreign representation and vice versa can be implemented in either Haskell or the foreign language. At least where the foreign language is at a significantly lower level, e.g. C, there are good reasons for doing the marshalling in Haskell:
Consequently, the Haskell FFI emphasises Haskell-side marshalling.
The interface to the marshalling libraries is provided by the module Foreign (Chapter 24) plus a language-dependent module per supported language. In particular, the standard requires the availability of the module Foreign.C (Chapter 25), which simplifies portable interfacing with external C code. Language-dependent modules, such as Foreign.C, generally provide Haskell types representing the basic types of the foreign language using a representation that is compatible with the foreign types as implemented by the default implementation of the foreign language on the present architecture. This is especially important for languages where the standard leaves some aspects of the implementation of basic types open. For example, in C, the size of the various integral types is not fixed. Thus, to represent C interfaces faithfully in Haskell, for each integral type in C, we need to have an integral type in Haskell that is guaranteed to have the same size as the corresponding C type.
C symbol | Haskell symbol | Constraint on concrete C type |
HsChar | Char | integral type |
HsInt | Int | signed integral type, ≥ 30 bit |
HsInt8 | Int8 | signed integral type, 8 bit; int8_t if available |
HsInt16 | Int16 | signed integral type, 16 bit; int16_t if available |
HsInt32 | Int32 | signed integral type, 32 bit; int32_t if available |
HsInt64 | Int64 | signed integral type, 64 bit; int64_t if available |
HsWord8 | Word8 | unsigned integral type, 8 bit; uint8_t if available |
HsWord16 | Word16 | unsigned integral type, 16 bit; uint16_t if available |
HsWord32 | Word32 | unsigned integral type, 32 bit; uint32_t if available |
HsWord64 | Word64 | unsigned integral type, 64 bit; uint64_t if available |
HsFloat | Float | floating point type |
HsDouble | Double | floating point type |
HsBool | Bool | int |
HsPtr | Ptr a | (void ⋆) |
HsFunPtr | FunPtr a | (void (⋆)(void)) |
HsStablePtr | StablePtr a | (void ⋆) |
CPP symbol | Haskell value | Description |
HS_CHAR_MIN | minBound :: Char |
|
HS_CHAR_MAX | maxBound :: Char |
|
HS_INT_MIN | minBound :: Int |
|
HS_INT_MAX | maxBound :: Int |
|
HS_INT8_MIN | minBound :: Int8 |
|
HS_INT8_MAX | maxBound :: Int8 |
|
HS_INT16_MIN | minBound :: Int16 |
|
HS_INT16_MAX | maxBound :: Int16 |
|
HS_INT32_MIN | minBound :: Int32 |
|
HS_INT32_MAX | maxBound :: Int32 |
|
HS_INT64_MIN | minBound :: Int64 |
|
HS_INT64_MAX | maxBound :: Int64 |
|
HS_WORD8_MAX | maxBound :: Word8 |
|
HS_WORD16_MAX | maxBound :: Word16 |
|
HS_WORD32_MAX | maxBound :: Word32 |
|
HS_WORD64_MAX | maxBound :: Word64 |
|
HS_FLOAT_RADIX | floatRadix :: Float |
|
HS_FLOAT_ROUND | n/a | rounding style as per [7] |
HS_FLOAT_EPSILON | n/a | difference between 1 and the least value greater than 1 as per [7] |
HS_DOUBLE_EPSILON | n/a | (as above) |
HS_FLOAT_DIG | n/a | number of decimal digits as per [7] |
HS_DOUBLE_DIG | n/a | (as above) |
HS_FLOAT_MANT_DIG | floatDigits :: Float |
|
HS_DOUBLE_MANT_DIG | floatDigits :: Double |
|
HS_FLOAT_MIN | n/a | minimum floating point number as per [7] |
HS_DOUBLE_MIN | n/a | (as above) |
HS_FLOAT_MIN_EXP | fst . floatRange :: Float |
|
HS_DOUBLE_MIN_EXP | fst . floatRange :: Double |
|
HS_FLOAT_MIN_10_EXP | n/a | minimum decimal exponent as per [7] |
HS_DOUBLE_MIN_10_EXP | n/a | (as above) |
HS_FLOAT_MAX | n/a | maximum floating point number as per [7] |
HS_DOUBLE_MAX | n/a | (as above) |
HS_FLOAT_MAX_EXP | snd . floatRange :: Float |
|
HS_DOUBLE_MAX_EXP | snd . floatRange :: Double |
|
HS_FLOAT_MAX_10_EXP | n/a | maximum decimal exponent as per [7] |
HS_DOUBLE_MAX_10_EXP | n/a | (as above) |
HS_BOOL_FALSE | False |
|
HS_BOOL_TRUE | True |
|
Every Haskell system that implements the FFI needs to provide a C header file named HsFFI.h that defines the C symbols listed in Tables 8.2 and 8.3. Table 8.2 table lists symbols that represent types together with the Haskell type that they represent and any constraints that are placed on the concrete C types that implement these symbols. When a C type HsT represents a Haskell type T, the occurrence of T in a foreign function declaration should be matched by HsT in the corresponding C function prototype. Indeed, where the Haskell system translates Haskell to C code that invokes foreignimported C routines, such prototypes need to be provided and included via the header that can be specified in external entity strings for foreign C functions (cf. Section 8.5.1); otherwise, the system behaviour is undefined. It is guaranteed that the Haskell value nullPtr is mapped to (HsPtr) NULL in C and nullFunPtr is mapped to (HsFunPtr) NULL and vice versa.
Table 8.3 contains symbols characterising the range and precision of the types from Table 8.2. Where available, the table states the corresponding Haskell values. All C symbols, with the exception of HS_FLOAT_ROUND are constants that are suitable for use in #if preprocessing directives. Note that there is only one rounding style (HS_FLOAT_ROUND) and one radix (HS_FLOAT_RADIX), as this is all that is supported by ISO C [7].
Moreover, an implementation that does not support 64 bit integral types on the C side should implement HsInt64 and HsWord64 as a structure. In this case, the bounds HS_INT64_MIN, HS_INT64_MAX, and HS_WORD64_MAX are undefined.
In addition, to the symbols from Table 8.2 and 8.3, the header HsFFI.h must also contain the following prototypes:
These routines are useful for mixed language programs, where the main application is implemented in a foreign language that accesses routines implemented in Haskell. The function hs_init() initialises the Haskell system and provides it with the available command line arguments. Upon return, the arguments solely intended for the Haskell runtime system are removed (i.e., the values that argc and argv point to may have changed). This function must be called during program startup before any Haskell function is invoked; otherwise, the system behaviour is undefined. Conversely, the Haskell system is deinitialised by a call to hs_exit(). Multiple invocations of hs_init() are permitted, provided that they are followed by an equal number of calls to hs_exit() and that the first call to hs_exit() is after the last call to hs_init(). In addition to nested calls to hs_init(), the Haskell system may be de-initialised with hs_exit() and be re-initialised with hs_init() at a later point in time. This ensures that repeated initialisation due to multiple libraries being implemented in Haskell is covered.
The Haskell system will ignore the command line arguments passed to the second and any following calls to hs_init(). Moreover, hs_init() may be called with NULL for both argc and argv, signalling the absence of command line arguments.
The function hs_set_argv() sets the values returned by the functions getProgName and getArgs of the module System.Environment (Section 39). This function may only be invoked after hs_init(). Moreover, if hs_set_argv() is called at all, this call must precede the first invocation of getProgName and getArgs. Note that the separation of hs_init() and hs_set_argv() is essential in cases where in addition to the Haskell system other libraries that process command line arguments during initialisation are used.
The function hs_perform_gc() advises the Haskell storage manager to perform a garbage collection, where the storage manager makes an effort to releases all unreachable objects. This function must not be invoked from C functions that are imported unsafe into Haskell code nor may it be used from a finalizer.
Finally, hs_free_stable_ptr() and hs_free_fun_ptr() are the C counterparts of the Haskell functions freeStablePtr and freeHaskellFunPtr.