Chapter 16
Data.Char

module Data.Char (  
    Char,  String,  isControl,  isSpace,  isLower,  isUpper,  isAlpha,  
    isAlphaNum,  isPrint,  isDigit,  isOctDigit,  isHexDigit,  isLetter,  
    isMark,  isNumber,  isPunctuation,  isSymbol,  isSeparator,  isAscii,  
    isLatin1,  isAsciiUpper,  isAsciiLower,  
    GeneralCategory(UppercaseLetter,  
                    LowercaseLetter,  
                    TitlecaseLetter,  
                    ModifierLetter,  
                    OtherLetter,  
                    NonSpacingMark,  
                    SpacingCombiningMark,  
                    EnclosingMark,  
                    DecimalNumber,  
                    LetterNumber,  
                    OtherNumber,  
                    ConnectorPunctuation,  
                    DashPunctuation,  
                    OpenPunctuation,  
                    ClosePunctuation,  
                    InitialQuote,  
                    FinalQuote,  
                    OtherPunctuation,  
                    MathSymbol,  
                    CurrencySymbol,  
                    ModifierSymbol,  
                    OtherSymbol,  
                    Space,  
                    LineSeparator,  
                    ParagraphSeparator,  
                    Control,  
                    Format,  
                    Surrogate,  
                    PrivateUse,  
                    NotAssigned),  
    generalCategory,  toUpper,  toLower,  toTitle,  digitToInt,  intToDigit,  
    ord,  chr,  showLitChar,  lexLitChar,  readLitChar  
  ) where

16.1 Characters and strings

data Char
The character type Char is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) characters (see http://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type Char.

To convert a Char to or from the corresponding Int value defined by Unicode, use Prelude.fromEnum and Prelude.toEnum from the Prelude.Enum class, respectively (or equivalently ord and chr).
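A small sketch of the conversions just described: ord agrees with fromEnum, chr agrees with toEnum, and a round trip returns the original character. The names codeOfA, charOf97, roundTrips and codePointRange are illustrative only; the upper bound 0x10FFFF is what GHC uses for maxBound :: Char.

```haskell
import Data.Char (chr, ord)

-- ord and fromEnum agree: both give the Unicode code point of a Char.
codeOfA :: (Int, Int)
codeOfA = (ord 'A', fromEnum 'A')   -- (65, 65)

-- chr and toEnum agree in the other direction.
charOf97 :: (Char, Char)
charOf97 = (chr 97, toEnum 97)      -- ('a', 'a')

-- Converting to Int and back returns the original character.
roundTrips :: Char -> Bool
roundTrips c = chr (ord c) == c

-- The Bounded instance spans code points 0 to 0x10FFFF (as in GHC).
codePointRange :: (Int, Int)
codePointRange = (ord minBound, ord maxBound)
```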

instance Bounded Char
instance Enum Char
instance Eq Char
instance Ord Char
instance Read Char
instance Show Char
instance Ix Char
instance Storable Char

type String = [Char]
A String is a list of characters. String constants in Haskell are values of type String.
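Because String is only a synonym for [Char], the ordinary list functions and list patterns work on strings directly. A minimal sketch; greeting, shout and firstChar are hypothetical helpers, not part of Data.Char.

```haskell
import Data.Char (toUpper)

-- A list of Chars is the same value as the corresponding string literal.
greeting :: String
greeting = ['h', 'i']

-- map and (++) are plain list functions applied to a String.
shout :: String -> String
shout s = map toUpper s ++ "!"

-- Pattern matching on a String is pattern matching on a list.
firstChar :: String -> Maybe Char
firstChar []      = Nothing
firstChar (c : _) = Just c
```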

16.2 Character classification

Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).

isControl :: Char -> Bool
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isSpace :: Char -> Bool
Returns True for any Unicode space character, and the control characters \t, \n, \r, \f, \v.
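A typical use of isSpace is trimming whitespace from both ends of a string. The helper trim below is an illustrative sketch, not a Data.Char function. Since isSpace covers Unicode space characters, it also treats U+00A0 NO-BREAK SPACE as whitespace.

```haskell
import Data.Char (isSpace)

-- Strip leading and trailing characters that satisfy isSpace.
-- 'trim' is an illustrative helper, not part of Data.Char.
trim :: String -> String
trim = reverse . dropWhile isSpace . reverse . dropWhile isSpace
```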

isLower :: Char -> Bool
Selects lower-case alphabetic Unicode characters (letters).

isUpper :: Char -> Bool
Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
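The title-case ligature mentioned above can be inspected directly. This sketch assumes U+01C8 (LATIN CAPITAL LETTER L WITH SMALL LETTER J, general category Lt) as the example; caseFacts is a hypothetical helper. It shows isUpper accepting the title-case letter while isLower rejects it.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isLower, isUpper)

-- U+01C8 is the single-character title-case ligature "Lj".
ljTitle :: Char
ljTitle = '\x01C8'

-- (general category, isUpper, isLower) for a character.
caseFacts :: Char -> (GeneralCategory, Bool, Bool)
caseFacts c = (generalCategory c, isUpper c, isLower c)
```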

isAlpha :: Char -> Bool
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to Data.Char.isLetter.

isAlphaNum :: Char -> Bool
Selects alphabetic or numeric digit Unicode characters.

Note that numeric digits outside the ASCII range are selected by this function but not by isDigit. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
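To see the difference, take a decimal digit from outside ASCII. This sketch assumes U+0663 ARABIC-INDIC DIGIT THREE (category Nd) as the example; classify is a hypothetical helper returning the results of isAlphaNum, isDigit and isNumber.

```haskell
import Data.Char (isAlphaNum, isDigit, isNumber)

-- U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside ASCII.
arabicThree :: Char
arabicThree = '\x0663'

-- (isAlphaNum, isDigit, isNumber) for a character.
classify :: Char -> (Bool, Bool, Bool)
classify c = (isAlphaNum c, isDigit c, isNumber c)
```

Here classify arabicThree is (True, False, True), whereas the ASCII '3' satisfies all three predicates.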

isPrint :: Char -> Bool
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

isDigit :: Char -> Bool
Selects ASCII digits, i.e. '0'..'9'.

isOctDigit :: Char -> Bool
Selects ASCII octal digits, i.e. '0'..'7'.

isHexDigit :: Char -> Bool
Selects ASCII hexadecimal digits, i.e. '0'..'9', 'a'..'f', 'A'..'F'.
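The three ASCII digit predicates make simple validators for numeric literals. A minimal sketch; allDigits and the three is...Literal names are hypothetical helpers. Note that a "0x" prefix is rejected, because 'x' is not a hexadecimal digit.

```haskell
import Data.Char (isDigit, isHexDigit, isOctDigit)

-- True when the string is non-empty and every character satisfies p.
-- 'allDigits' is an illustrative helper, not part of Data.Char.
allDigits :: (Char -> Bool) -> String -> Bool
allDigits p s = not (null s) && all p s

isDecimalLiteral, isOctalLiteral, isHexLiteral :: String -> Bool
isDecimalLiteral = allDigits isDigit
isOctalLiteral   = allDigits isOctDigit
isHexLiteral     = allDigits isHexDigit
```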

isLetter :: Char -> Bool
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to Data.Char.isAlpha.

isMark :: Char -> Bool
Selects Unicode mark characters, e.g. accents and the like, which combine with preceding letters.

isNumber :: Char -> Bool
Selects Unicode numeric characters, including digits from various scripts, Roman numerals, etc.

isPunctuation :: Char -> Bool
Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

isSymbol :: Char -> Bool
Selects Unicode symbol characters, including mathematical and currency symbols.

isSeparator :: Char -> Bool
Selects Unicode space and separator characters.
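The predicates above each select a union of general categories, and those unions do not overlap, so they can be combined into a coarse classifier. A sketch; coarseClass is a hypothetical helper, and the sample characters (U+0301 COMBINING ACUTE ACCENT, U+216B ROMAN NUMERAL TWELVE) are chosen for illustration.

```haskell
import Data.Char (isLetter, isMark, isNumber, isPunctuation, isSeparator, isSymbol)

-- Name the coarse Unicode group a character belongs to.
-- The groups are disjoint, so the order of the guards does not matter.
-- 'coarseClass' is an illustrative helper, not part of Data.Char.
coarseClass :: Char -> String
coarseClass c
  | isLetter c      = "letter"
  | isMark c        = "mark"
  | isNumber c      = "number"
  | isPunctuation c = "punctuation"
  | isSymbol c      = "symbol"
  | isSeparator c   = "separator"
  | otherwise       = "other"
```

For example, a tab is "other" (it is a control character, not a separator), even though isSpace accepts it.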

16.2.1 Subranges

isAscii :: Char -> Bool
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

isAsciiUpper :: Char -> Bool
Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.

isAsciiLower :: Char -> Bool
Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.
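A sketch contrasting the subrange predicates with the general ones, using U+00E9 LATIN SMALL LETTER E WITH ACUTE as an example of a Latin-1 letter outside ASCII. The names eAcute, rangeFacts and asciiLettersOnly are hypothetical.

```haskell
import Data.Char (isAscii, isAsciiLower, isAsciiUpper, isLatin1, isLower)

-- U+00E9 LATIN SMALL LETTER E WITH ACUTE: in Latin-1 but not in ASCII.
eAcute :: Char
eAcute = '\xE9'

-- (isAscii, isLatin1, isLower, isAsciiLower) for a character.
rangeFacts :: Char -> (Bool, Bool, Bool, Bool)
rangeFacts c = (isAscii c, isLatin1 c, isLower c, isAsciiLower c)

-- Keep only ASCII letters, dropping accented and non-Latin letters.
asciiLettersOnly :: String -> String
asciiLettersOnly = filter (\c -> isAsciiUpper c || isAsciiLower c)
```

Here rangeFacts eAcute is (False, True, True, False): the character is a lower-case Latin-1 letter, but not an ASCII one.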

16.2.2 Unicode general categories

data GeneralCategory
  = UppercaseLetter       -- Lu: Letter, Uppercase
  | LowercaseLetter       -- Ll: Letter, Lowercase
  | TitlecaseLetter       -- Lt: Letter, Titlecase
  | ModifierLetter        -- Lm: Letter, Modifier
  | OtherLetter           -- Lo: Letter, Other
  | NonSpacingMark        -- Mn: Mark, Non-Spacing
  | SpacingCombiningMark  -- Mc: Mark, Spacing Combining
  | EnclosingMark         -- Me: Mark, Enclosing
  | DecimalNumber         -- Nd: Number, Decimal
  | LetterNumber          -- Nl: Number, Letter
  | OtherNumber           -- No: Number, Other
  | ConnectorPunctuation  -- Pc: Punctuation, Connector
  | DashPunctuation       -- Pd: Punctuation, Dash
  | OpenPunctuation       -- Ps: Punctuation, Open
  | ClosePunctuation      -- Pe: Punctuation, Close
  | InitialQuote          -- Pi: Punctuation, Initial quote
  | FinalQuote            -- Pf: Punctuation, Final quote
  | OtherPunctuation      -- Po: Punctuation, Other
  | MathSymbol            -- Sm: Symbol, Math
  | CurrencySymbol        -- Sc: Symbol, Currency
  | ModifierSymbol        -- Sk: Symbol, Modifier
  | OtherSymbol           -- So: Symbol, Other
  | Space                 -- Zs: Separator, Space
  | LineSeparator         -- Zl: Separator, Line
  | ParagraphSeparator    -- Zp: Separator, Paragraph
  | Control               -- Cc: Other, Control
  | Format                -- Cf: Other, Format
  | Surrogate             -- Cs: Other, Surrogate
  | PrivateUse            -- Co: Other, Private Use
  | NotAssigned           -- Cn: Other, Not Assigned

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard.

instance Bounded GeneralCategory
instance Enum GeneralCategory
instance Eq GeneralCategory
instance Ord GeneralCategory
instance Read GeneralCategory
instance Show GeneralCategory
instance Ix GeneralCategory

generalCategory :: Char -> GeneralCategory
The Unicode general category of the character.
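Because GeneralCategory has Bounded and Enum instances, every category can be enumerated in the order listed above, and generalCategory can drive simple counting. A sketch; allCategories and countCategory are hypothetical names.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- All 30 categories, from UppercaseLetter to NotAssigned.
allCategories :: [GeneralCategory]
allCategories = [minBound .. maxBound]

-- Count the characters of a string that fall into a given category.
-- 'countCategory' is an illustrative helper, not part of Data.Char.
countCategory :: GeneralCategory -> String -> Int
countCategory cat = length . filter ((== cat) . generalCategory)
```

For instance, countCategory UppercaseLetter "Hello World" counts the 'H' and the 'W'.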

16.3 Case conversion

toUpper :: Char -> Char
Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char
Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char
Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
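Case conversion works one character at a time, so characters without a single-character mapping pass through unchanged; for example, toUpper leaves U+00DF LATIN SMALL LETTER SHARP S as it is. A sketch; upperWord and capitalise are hypothetical helpers. The ligature U+01C6 is used to show toTitle and toUpper diverging.

```haskell
import Data.Char (toLower, toTitle, toUpper)

-- Upper-case every character; non-letters are returned unchanged.
upperWord :: String -> String
upperWord = map toUpper

-- Title-case the first character and lower-case the rest.
-- 'capitalise' is an illustrative helper, not part of Data.Char.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toTitle c : map toLower cs
```

For the ligature U+01C6 (dž), toUpper gives U+01C4 (DŽ) while toTitle gives U+01C5 (Dž).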

16.4 Single digit characters

digitToInt :: Char -> Int
Convert a single digit Char to the corresponding Int. This function fails unless its argument satisfies isHexDigit, but recognises both upper and lower-case hexadecimal digits (i.e. '0'..'9', 'a'..'f', 'A'..'F').
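Folding digitToInt over a string gives a small hexadecimal parser. This sketch checks isHexDigit first, so digitToInt is never applied to an argument it rejects; parseHex is a hypothetical helper (the Numeric module's readHex is the library route).

```haskell
import Data.Char (digitToInt, isHexDigit)

-- Parse a non-empty string of hexadecimal digits (either case) into an Int.
-- Returns Nothing if the string is empty or any character fails isHexDigit.
-- 'parseHex' is an illustrative helper, not part of Data.Char.
parseHex :: String -> Maybe Int
parseHex s
  | null s || not (all isHexDigit s) = Nothing
  | otherwise                        = Just (foldl step 0 s)
  where
    step acc c = acc * 16 + digitToInt c
```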

intToDigit :: Int -> Char
Convert an Int in the range 0..15 to the corresponding single digit Char. This function fails on other inputs, and generates lower-case hexadecimal digits.
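In the other direction, intToDigit can render a number digit by digit in any base up to 16. A sketch; showInBase is a hypothetical helper, and the standard library offers Numeric.showIntAtBase for the same job. Letters come out in lower case, as described above.

```haskell
import Data.Char (intToDigit)

-- Render a non-negative Int in a base between 2 and 16.
-- 'showInBase' is an illustrative helper, not part of Data.Char.
showInBase :: Int -> Int -> String
showInBase base n
  | base < 2 || base > 16 = error "showInBase: base must be in 2..16"
  | n < 0                 = error "showInBase: negative input"
  | n < base              = [intToDigit n]
  | otherwise             = showInBase base (n `div` base) ++ [intToDigit (n `mod` base)]
```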

16.5 Numeric representations

ord :: Char -> Int
The Prelude.fromEnum method restricted to the type Data.Char.Char.

chr :: Int -> Char
The Prelude.toEnum method restricted to the type Data.Char.Char.
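Arithmetic on code points via ord and chr is the usual way to implement simple character transformations. A sketch of a Caesar shift over ASCII letters; caesar is a hypothetical helper, and other characters are left unchanged.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, ord)

-- Shift ASCII letters by k places, wrapping around within the alphabet.
-- 'caesar' is an illustrative helper, not part of Data.Char.
caesar :: Int -> String -> String
caesar k = map shift
  where
    shift c
      | isAsciiUpper c = rotate 'A' c
      | isAsciiLower c = rotate 'a' c
      | otherwise      = c
    -- `mod` with a positive divisor is non-negative, so negative k also works.
    rotate base c = chr (ord base + (ord c - ord base + k) `mod` 26)
```

For example, caesar 3 "Hello, World!" gives "Khoor, Zruog!", and caesar (-3) undoes it.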

16.6 String representations

showLitChar :: Char -> ShowS
Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:

 showLitChar '\n' s  =  "\\n" ++ s
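Since showLitChar returns a ShowS, escaping a whole string is a foldr that threads the rest of the output through each call. A sketch; escape is a hypothetical helper. When a numeric escape is followed by a digit, showLitChar inserts the empty escape \& so the result still reads back correctly.

```haskell
import Data.Char (showLitChar)

-- Escape every character of a string with Haskell source conventions.
-- 'escape' is an illustrative helper, not part of Data.Char.
escape :: String -> String
escape s = foldr showLitChar "" s
```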

lexLitChar :: ReadS String
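Read the string representation of a single character from the front of the input, using Haskell source-language escape conventions, and return that representation unconverted together with the remaining input. For example: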

 lexLitChar  "\\nHello"  =  [("\\n", "Hello")]
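Applying lexLitChar repeatedly splits a string into the source representations of its characters, leaving escapes as written. A sketch; lexChars is a hypothetical helper that takes the first parse and stops when lexLitChar finds nothing.

```haskell
import Data.Char (lexLitChar)

-- Split a string into the source representations of its characters.
-- 'lexChars' is an illustrative helper, not part of Data.Char.
lexChars :: String -> [String]
lexChars s = case lexLitChar s of
  ((tok, rest) : _) -> tok : lexChars rest
  []                -> []
```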

readLitChar :: ReadS Char
Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:

 readLitChar "\\nHello"  =  [('\n', "Hello")]
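Applying readLitChar repeatedly decodes a whole escaped string. A sketch; unescape is a hypothetical helper that takes the first parse at each step and returns Nothing as soon as a prefix is not a valid character representation.

```haskell
import Data.Char (readLitChar)

-- Decode a string written with Haskell escape conventions.
-- 'unescape' is an illustrative helper, not part of Data.Char.
unescape :: String -> Maybe String
unescape "" = Just ""
unescape s  = case readLitChar s of
  ((c, rest) : _) -> (c :) <$> unescape rest
  []              -> Nothing
```

For example, unescape "\\x41\\66" decodes a hexadecimal and a decimal escape to Just "AB".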