16 Data.Char

Chapter 16
Data.Char

module Data.Char (
    Char,  String,  isControl,  isSpace,  isLower,  isUpper,  isAlpha,
    isAlphaNum,  isPrint,  isDigit,  isOctDigit,  isHexDigit,  isLetter,
    isMark,  isNumber,  isPunctuation,  isSymbol,  isSeparator,  isAscii,
    isLatin1,  isAsciiUpper,  isAsciiLower,
    GeneralCategory(UppercaseLetter,
                    LowercaseLetter,
                    TitlecaseLetter,
                    ModifierLetter,
                    OtherLetter,
                    NonSpacingMark,
                    SpacingCombiningMark,
                    EnclosingMark,
                    DecimalNumber,
                    LetterNumber,
                    OtherNumber,
                    ConnectorPunctuation,
                    DashPunctuation,
                    OpenPunctuation,
                    ClosePunctuation,
                    InitialQuote,
                    FinalQuote,
                    OtherPunctuation,
                    MathSymbol,
                    CurrencySymbol,
                    ModifierSymbol,
                    OtherSymbol,
                    Space,
                    LineSeparator,
                    ParagraphSeparator,
                    Control,
                    Format,
                    Surrogate,
                    PrivateUse,
                    NotAssigned),
    generalCategory,  toUpper,  toLower,  toTitle,  digitToInt,  intToDigit,
    ord,  chr,  showLitChar,  lexLitChar,  readLitChar
  ) where

16.1 Characters and strings

data Char: The character type Char is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) characters (see http://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type Char.
To convert a Char to or from the corresponding Int value defined by Unicode, use Prelude.fromEnum and Prelude.toEnum from the Prelude.Enum class respectively (or equivalently ord and chr).

instance Bounded Char
instance Enum Char
instance Eq Char
instance Ord Char
instance Read Char
instance Show Char
instance Ix Char
instance Storable Char
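
As an illustration of the conversions described above, the following sketch compares the Enum methods with ord and chr. The helper name roundTrips is hypothetical, and the printed upper bound is what GHC reports; other implementations are assumed to agree but this is not checked here.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: converting to Int and back yields the same character,
-- whichever pair of functions is used.
roundTrips :: Char -> Bool
roundTrips c = chr (ord c) == c && toEnum (fromEnum c) == c

main :: IO ()
main = do
  print (fromEnum 'A', ord 'A')      -- (65,65)
  print (toEnum 97 :: Char, chr 97)  -- ('a','a')
  print (roundTrips 'x')             -- True
  print ['a' .. 'e']                 -- "abcde", via the Enum instance
  print (ord maxBound)               -- 1114111 with GHC, i.e. 0x10FFFF
```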

type String = [Char]: A String is a list of characters. String constants in Haskell are values of type String.
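
Because String is only a synonym for [Char], ordinary list functions apply to string constants directly. A small sketch (the names greeting and asList are illustrative only):

```haskell
import Data.Char (toUpper)

greeting :: String
greeting = "hello"

-- The same value written as an explicit list of characters.
asList :: [Char]
asList = ['h', 'e', 'l', 'l', 'o']

main :: IO ()
main = do
  print (greeting == asList)    -- True
  print (length greeting)       -- 5
  print (reverse greeting)      -- "olleh"
  print (map toUpper greeting)  -- "HELLO"
```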

16.2 Character classification

Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).

isControl :: Char -> Bool: Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isSpace :: Char -> Bool: Returns True for any Unicode space character, and the control characters \t, \n, \r, \f, \v.

isLower :: Char -> Bool: Selects lower-case alphabetic Unicode characters (letters).

isUpper :: Char -> Bool: Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.

isAlpha :: Char -> Bool: Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to Data.Char.isLetter.

isAlphaNum :: Char -> Bool: Selects alphabetic or numeric digit Unicode characters.
Note that numeric digits outside the ASCII range are selected by this function but not by isDigit. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
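
The difference between isAlphaNum and isDigit can be seen with a digit from another script. The sketch below uses U+0663 (ARABIC-INDIC DIGIT THREE) as an example character chosen for illustration; the results shown assume an implementation with current Unicode tables, such as GHC.

```haskell
import Data.Char (isAlphaNum, isDigit, isNumber)

-- U+0663 ARABIC-INDIC DIGIT THREE, written as an escape to stay ASCII-only.
arabicThree :: Char
arabicThree = '\x0663'

main :: IO ()
main = do
  print (isDigit '7', isAlphaNum '7')                  -- (True,True)
  print (isDigit arabicThree, isAlphaNum arabicThree)  -- (False,True)
  print (isNumber arabicThree)                         -- True
```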

isPrint :: Char -> Bool: Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

isDigit :: Char -> Bool: Selects ASCII digits, i.e. '0'..'9'.

isOctDigit :: Char -> Bool: Selects ASCII octal digits, i.e. '0'..'7'.

isHexDigit :: Char -> Bool: Selects ASCII hexadecimal digits, i.e. '0'..'9', 'a'..'f', 'A'..'F'.

isLetter :: Char -> Bool: Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to Data.Char.isAlpha.

isMark :: Char -> Bool: Selects Unicode mark characters, e.g. accents and the like, which combine with preceding letters.

isNumber :: Char -> Bool: Selects Unicode numeric characters, including digits from various scripts, Roman numerals, etc.

isPunctuation :: Char -> Bool: Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

isSymbol :: Char -> Bool: Selects Unicode symbol characters, including mathematical and currency symbols.

isSeparator :: Char -> Bool: Selects Unicode space and separator characters.
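
The major classes listed at the start of this section can be told apart with the predicates above. The function classify below is a hypothetical helper written for illustration, not part of Data.Char. The last line shows that isSpace and isSeparator differ on control characters such as '\n'.

```haskell
import Data.Char
  ( isLetter, isMark, isNumber, isPunctuation
  , isSeparator, isSpace, isSymbol )

-- Hypothetical helper: name the major Unicode class of a character.
classify :: Char -> String
classify c
  | isLetter c      = "letter"
  | isMark c        = "mark"
  | isNumber c      = "number"
  | isPunctuation c = "punctuation"
  | isSymbol c      = "symbol"
  | isSeparator c   = "separator"
  | otherwise       = "other"

main :: IO ()
main = do
  -- '\x0301' is COMBINING ACUTE ACCENT, a non-spacing mark.
  mapM_ (putStrLn . classify) ['x', '\x0301', '7', '!', '+', ' ', '\n']
  print (isSpace '\n', isSeparator '\n')  -- (True,False)
```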

16.2.1 Subranges

isAscii :: Char -> Bool: Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool: Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

isAsciiUpper :: Char -> Bool: Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.

isAsciiLower :: Char -> Bool: Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.
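
A brief sketch of how the subrange predicates relate to the general ones, using U+00C9 (LATIN CAPITAL LETTER E WITH ACUTE) as an example of an upper-case letter outside ASCII. The helper asciiOnly is hypothetical.

```haskell
import Data.Char (isAscii, isAsciiUpper, isLatin1, isUpper)

-- U+00C9 LATIN CAPITAL LETTER E WITH ACUTE: in Latin-1 but not in ASCII.
eAcute :: Char
eAcute = '\xC9'

-- Hypothetical helper: does a string consist solely of ASCII characters?
asciiOnly :: String -> Bool
asciiOnly = all isAscii

main :: IO ()
main = do
  print (isUpper 'E', isAsciiUpper 'E')        -- (True,True)
  print (isUpper eAcute, isAsciiUpper eAcute)  -- (True,False)
  print (isAscii eAcute, isLatin1 eAcute)      -- (False,True)
  print (isLatin1 '\x0100')                    -- False
  print (asciiOnly "plain", asciiOnly [eAcute])  -- (True,False)
```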

16.2.2 Unicode general categories

data GeneralCategory

  = UppercaseLetter         Lu: Letter, Uppercase
  | LowercaseLetter         Ll: Letter, Lowercase
  | TitlecaseLetter         Lt: Letter, Titlecase
  | ModifierLetter          Lm: Letter, Modifier
  | OtherLetter             Lo: Letter, Other
  | NonSpacingMark          Mn: Mark, Non-Spacing
  | SpacingCombiningMark    Mc: Mark, Spacing Combining
  | EnclosingMark           Me: Mark, Enclosing
  | DecimalNumber           Nd: Number, Decimal
  | LetterNumber            Nl: Number, Letter
  | OtherNumber             No: Number, Other
  | ConnectorPunctuation    Pc: Punctuation, Connector
  | DashPunctuation         Pd: Punctuation, Dash
  | OpenPunctuation         Ps: Punctuation, Open
  | ClosePunctuation        Pe: Punctuation, Close
  | InitialQuote            Pi: Punctuation, Initial quote
  | FinalQuote              Pf: Punctuation, Final quote
  | OtherPunctuation        Po: Punctuation, Other
  | MathSymbol              Sm: Symbol, Math
  | CurrencySymbol          Sc: Symbol, Currency
  | ModifierSymbol          Sk: Symbol, Modifier
  | OtherSymbol             So: Symbol, Other
  | Space                   Zs: Separator, Space
  | LineSeparator           Zl: Separator, Line
  | ParagraphSeparator      Zp: Separator, Paragraph
  | Control                 Cc: Other, Control
  | Format                  Cf: Other, Format
  | Surrogate               Cs: Other, Surrogate
  | PrivateUse              Co: Other, Private Use
  | NotAssigned             Cn: Other, Not Assigned

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard.

instance Bounded GeneralCategory
instance Enum GeneralCategory
instance Eq GeneralCategory
instance Ord GeneralCategory
instance Read GeneralCategory
instance Show GeneralCategory
instance Ix GeneralCategory

generalCategory :: Char -> GeneralCategory: The Unicode general category of the character.
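
A short sketch of generalCategory on a few ASCII characters. The helper isLetterByCategory is hypothetical; it assumes the Ord instance follows the constructor order listed above, so that the five letter categories come first.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isLetter)

-- Hypothetical helper: the letter categories Lu, Ll, Lt, Lm and Lo are the
-- first five constructors, so a single comparison selects them all.
isLetterByCategory :: Char -> Bool
isLetterByCategory c = generalCategory c <= OtherLetter

main :: IO ()
main = do
  print (generalCategory 'a')   -- LowercaseLetter
  print (generalCategory 'A')   -- UppercaseLetter
  print (generalCategory '1')   -- DecimalNumber
  print (generalCategory ' ')   -- Space
  print (generalCategory '_')   -- ConnectorPunctuation
  print (generalCategory '-')   -- DashPunctuation
  print (generalCategory '+')   -- MathSymbol
  print (generalCategory '$')   -- CurrencySymbol
  print (generalCategory '\n')  -- Control
  print (all (\c -> isLetterByCategory c == isLetter c) "aZ9 +\n")  -- True
```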

16.3 Case conversion

toUpper :: Char -> Char: Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char: Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char: Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
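
The three conversions can be compared on an ordinary letter and on one of the ligature letters for which title case differs from upper case. The sketch uses U+01C6 (LATIN SMALL LETTER DZ WITH CARON), whose upper-case form is U+01C4 and whose title-case form is U+01C5. The function capitalize is a hypothetical helper, not part of Data.Char.

```haskell
import Data.Char (toLower, toTitle, toUpper)

-- Hypothetical helper: title-case the first character, lower-case the rest.
capitalize :: String -> String
capitalize []       = []
capitalize (c : cs) = toTitle c : map toLower cs

main :: IO ()
main = do
  print (toUpper 'a', toLower 'A', toTitle 'a')  -- ('A','a','A')
  print (toUpper '1')                            -- '1', returned unchanged
  print (toUpper '\x01C6' == '\x01C4')           -- True
  print (toTitle '\x01C6' == '\x01C5')           -- True
  print (capitalize "hELLO")                     -- "Hello"
```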

16.4 Single digit characters

digitToInt :: Char -> Int: Convert a single digit Char to the corresponding Int. This function fails unless its argument satisfies isHexDigit, but recognises both upper and lower-case hexadecimal digits (i.e. '0'..'9', 'a'..'f', 'A'..'F').

intToDigit :: Int -> Char: Convert an Int in the range 0..15 to the corresponding single digit Char. This function fails on other inputs, and generates lower-case hexadecimal digits.
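
Since digitToInt fails on characters that are not hexadecimal digits, callers often guard it with isHexDigit. The following is a minimal sketch of that pattern; safeDigitToInt and parseHex are hypothetical names, and overflow of Int on long inputs is not handled.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)

-- Hypothetical helper: a total version of digitToInt.
safeDigitToInt :: Char -> Maybe Int
safeDigitToInt c
  | isHexDigit c = Just (digitToInt c)
  | otherwise    = Nothing

-- Hypothetical helper: parse a non-empty string of hexadecimal digits.
parseHex :: String -> Maybe Int
parseHex "" = Nothing
parseHex s  = foldl step 0 <$> traverse safeDigitToInt s
  where
    step acc d = acc * 16 + d

main :: IO ()
main = do
  print (digitToInt 'f', digitToInt 'F')  -- (15,15)
  print (map intToDigit [0 .. 15])        -- "0123456789abcdef"
  print (parseHex "ff")                   -- Just 255
  print (parseHex "fg")                   -- Nothing
```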

16.5 Numeric representations

ord :: Char -> Int: The Prelude.fromEnum method restricted to the type Data.Char.Char.

chr :: Int -> Char: The Prelude.toEnum method restricted to the type Data.Char.Char.
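
One common use of ord and chr is arithmetic on code points. The sketch below rotates ASCII letters by 13 positions; rot13 is a hypothetical example function, and characters outside the ASCII letters are left unchanged.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, ord)

-- Hypothetical example: rotate ASCII letters by 13 places, leave the rest.
rot13 :: Char -> Char
rot13 c
  | isAsciiLower c = rotate 'a'
  | isAsciiUpper c = rotate 'A'
  | otherwise      = c
  where
    -- Offset within the alphabet, shifted by 13 and wrapped modulo 26.
    rotate base = chr (ord base + (ord c - ord base + 13) `mod` 26)

main :: IO ()
main = do
  putStrLn (map rot13 "Hello, World!")              -- Uryyb, Jbeyq!
  putStrLn (map (rot13 . rot13) "Hello, World!")    -- Hello, World!
```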

16.6 String representations

showLitChar :: Char -> ShowS: Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:

showLitChar '\n' s = "\\n" ++ s

lexLitChar :: ReadS String: Read a string representation of a character, using Haskell source-language escape conventions. For example:

lexLitChar "\\nHello" = [("\\n", "Hello")]

readLitChar :: ReadS Char: Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:

readLitChar "\\nHello" = [('\n', "Hello")]
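
The three functions above can be combined to escape and unescape whole strings. The following is a sketch under simplifying assumptions: escape and unescape are hypothetical helpers, escape does not escape the double-quote character the way show does for strings, and unescape gives up with Nothing on any input that readLitChar cannot read.

```haskell
import Data.Char (lexLitChar, readLitChar, showLitChar)

-- Hypothetical helper: escape every character using showLitChar.
escape :: String -> String
escape = foldr showLitChar ""

-- Hypothetical helper: read escaped characters one at a time.
unescape :: String -> Maybe String
unescape "" = Just ""
unescape s  = case readLitChar s of
  [(c, rest)] -> (c :) <$> unescape rest
  _           -> Nothing

main :: IO ()
main = do
  print (showLitChar '\n' "")     -- "\\n"
  print (lexLitChar "\\nHello")   -- [("\\n","Hello")]
  print (readLitChar "\\nHello")  -- [('\n',"Hello")]
  print (escape "a\tb\n")         -- "a\\tb\\n"
  print (unescape "a\\tb\\n")     -- Just "a\tb\n"
```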
