- [/==============================================================================
- Copyright (C) 2001-2011 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section:char Character Parsers]
- This module includes parsers for single characters: literal chars
- (e.g. `'x'`, `L'x'`), `char_` (single characters, ranges and character
- sets) and the encoding-specific character classifiers (`alnum`, `alpha`,
- `digit`, `xdigit`, etc.).
- [heading Module Header]
-     // forwards to <boost/spirit/home/qi/char.hpp>
-     #include <boost/spirit/include/qi_char.hpp>
- Also, see __include_structure__.
- [/------------------------------------------------------------------------------]
- [section:char Character Parser (`char_`, `lit`)]
- [heading Description]
- The `char_` parser matches single characters. The `char_` parser has an
- associated __char_encoding_namespace__. This is needed when doing basic
- operations such as inhibiting case sensitivity and dealing with
- character ranges.
- There are various forms of `char_`.
- [heading char_]
- The no argument form of `char_` matches any character in the associated
- __char_encoding_namespace__.
-     char_               // matches any character
- [heading char_(ch)]
- The single argument form of `char_` (with a character argument) matches
- the supplied character.
-     char_('x')          // matches 'x'
-     char_(L'x')         // matches L'x'
-     char_(x)            // matches x (a char)
- [heading char_(first, last)]
- With two arguments, `char_` matches a range of characters.
-     char_('a','z')      // lowercase letters
-     char_(L'0',L'9')    // digits
- A range of characters is created from a low-high character pair. Such a
- parser matches a single character that is in the range, including both
- endpoints. Note that the first character must come /before/ the second,
- according to the underlying __char_encoding_namespace__.
- Character mapping is inherently platform dependent. For example, the C++
- standard does not guarantee that `'A' < 'Z'`. That is why Spirit2
- purposely attaches a specific __char_encoding_namespace__ (such as ASCII or
- ISO-8859-1) to the `char_` parser: doing so eliminates such ambiguities.
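As a rough illustration of the inclusive range test, the standalone C++ sketch below models what `char_(first, last)` checks for 8-bit characters. It is not Spirit's implementation. `char_range` and `matches` are hypothetical names, and ASCII character codes are assumed as the encoding.

```cpp
#include <cassert>

// Hypothetical model of the test performed by char_(first, last); not
// Spirit's code. Characters are compared by their codes in one fixed
// encoding (ASCII assumed), read as unsigned char values.
struct char_range
{
    unsigned char first;
    unsigned char last;

    char_range(char f, char l)
      : first(static_cast<unsigned char>(f))
      , last(static_cast<unsigned char>(l))
    {
        // The first character must not come after the second.
        assert(first <= last);
    }

    // Both endpoints belong to the range.
    bool matches(char c) const
    {
        unsigned char const u = static_cast<unsigned char>(c);
        return first <= u && u <= last;
    }
};
```

With `char_range lower('a', 'z')`, both `lower.matches('a')` and `lower.matches('z')` hold, while `lower.matches('A')` does not.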
- [note *Sparse bit vectors*
- To accommodate 16-, 32- and 64-bit characters, the char-set statically
- switches between two implementations: a `std::bitset` when the character
- type is no wider than 8 bits, and otherwise a sparse bit/boolean set that
- uses a sorted vector of disjoint ranges (`range_run`). The set is
- constructed from ranges so that adjacent or overlapping ranges are
- coalesced.
- `range_runs` are very space-economical in situations where there are many
- ranges and only a few individual disjoint values. Searching is O(log n),
- where n is the number of ranges.]
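The range coalescing, the O(log n) lookup and the static switch described in the note can be sketched with the standard library alone. This is a simplified, hypothetical model. `range_set`, `add`, `test` and `char_set_storage` are illustrative names, not Spirit's `range_run` interface. The sketch also assumes `CHAR_BIT == 8`, so that a one-byte character type fits a 256-entry `std::bitset`.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical sparse char-set: a sorted vector of disjoint, non-adjacent
// inclusive ranges [first, last], in the spirit of range_run.
template <typename Char>
class range_set
{
public:
    struct range { Char first; Char last; };

    // Insert [first, last], coalescing it with every overlapping or
    // adjacent range so the vector stays sorted and disjoint.
    void add(Char first, Char last)
    {
        assert(!(last < first));
        // Skip the ranges that end strictly before first - 1.
        auto it = std::lower_bound(runs_.begin(), runs_.end(), first,
            [](range const& r, Char f) { return r.last < f && f - r.last > 1; });
        // Absorb each following range that overlaps or touches [first, last].
        auto end = it;
        while (end != runs_.end() &&
               !(last < end->first && end->first - last > 1))
        {
            first = std::min(first, end->first);
            last = std::max(last, end->last);
            ++end;
        }
        it = runs_.erase(it, end);
        runs_.insert(it, range{first, last});
    }

    // Binary search: O(log n), where n is the number of stored ranges.
    bool test(Char c) const
    {
        auto it = std::lower_bound(runs_.begin(), runs_.end(), c,
            [](range const& r, Char v) { return r.last < v; });
        return it != runs_.end() && !(c < it->first);
    }

    std::size_t size() const { return runs_.size(); }

private:
    std::vector<range> runs_;
};

// Static switch: a 256-entry bitset for one-byte characters, the sparse
// range set for wider ones.
template <typename Char>
using char_set_storage = typename std::conditional<
    sizeof(Char) == 1, std::bitset<256>, range_set<Char>>::type;
```

For example, suppose a `range_set<char32_t>` holds `0-9`, `A-Z` and `a-z`. Adding `U'['`..`U'`'` fills the gap between `'Z'` and `'a'`, and the set collapses to two ranges.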
- [heading char_(def)]
- Lastly, when given a string (a plain C string, a `std::basic_string`,
- etc.), `char_` regards the string as a char-set definition string that
- follows a syntax resembling POSIX-style regular expression character sets
- (except that double quotes delimit the set elements instead of square
- brackets, and there is no special negation `^` character). Examples:
-     char_("a-zA-Z")     // alphabetic characters
-     char_("0-9a-fA-F")  // hexadecimal characters
-     char_("actgACTG")   // DNA identifiers
-     char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E
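The sketch below shows one way a char-set definition string such as `"a-zA-Z"` can be read into ranges. It is a hypothetical model, not Spirit's parser. `parse_char_set` and `in_char_set` are illustrative names, and ASCII codes are assumed. The sketch also treats a trailing `-` as a literal character; that rule is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader for a char-set definition: "x-y" denotes the
// inclusive range x..y, and any other character stands for itself.
// Treating a trailing '-' as a literal is an assumption of this sketch.
std::vector<std::pair<char, char>> parse_char_set(std::string const& def)
{
    std::vector<std::pair<char, char>> ranges;
    std::size_t i = 0;
    while (i < def.size())
    {
        char const ch = def[i++];
        if (i < def.size() && def[i] == '-')
        {
            ++i;                                // skip the '-'
            if (i == def.size())                // "x-" at the end: two literals
            {
                ranges.emplace_back(ch, ch);
                ranges.emplace_back('-', '-');
                break;
            }
            ranges.emplace_back(ch, def[i++]);  // "x-y": an inclusive range
        }
        else
        {
            ranges.emplace_back(ch, ch);        // a single character
        }
    }
    return ranges;
}

// True if c falls into any of the parsed ranges (ASCII codes assumed).
bool in_char_set(std::vector<std::pair<char, char>> const& ranges, char c)
{
    for (auto const& r : ranges)
        if (r.first <= c && c <= r.second)
            return true;
    return false;
}
```

Under these rules, `"0-9a-fA-F"` yields three ranges and `"actgACTG"` yields eight single characters.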
- [heading lit(ch)]
- `lit`, when passed a single character, behaves like the single-argument
- `char_`, except that `lit` does not synthesize an attribute. A plain
- `char` or `wchar_t` is equivalent to a `lit`.
- [note `lit` is reused by both the [qi_lit_string string parsers] and the
- char parsers. In general, a char parser is created when you pass in a
- character and a string parser is created when you pass in a string. The
- exception is when you pass a single-element literal string, e.g.
- `lit("x")`. In this case, we optimize this to create a char parser
- instead of a string parser.]
- Examples:
-     'x'
-     lit('x')
-     lit(L'x')
-     lit(c)              // c is a char
- [heading Header]
-     // forwards to <boost/spirit/home/qi/char/char.hpp>
-     #include <boost/spirit/include/qi_char_.hpp>
- Also, see __include_structure__.
- [heading Namespace]
- [table
- [[Name]]
- [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
- [[`ns::char_`]]
- ]
- In the table above, `ns` represents a __char_encoding_namespace__.
- [heading Model of]
- [:__primitive_parser_concept__]
- [variablelist Notation
- [[`c`, `f`, `l`] [A literal char, e.g. `'x'`, `L'x'` or anything that can be
- converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
- that evaluates to anything that can be converted to a `char`
- or `wchar_t`.]]
- [[`ns`] [A __char_encoding_namespace__.]]
- [[`cs`] [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
- that specifies a char-set definition string following a syntax
- that resembles POSIX-style regular expression character sets
- (without the square brackets and the negation `^` character).]]
- [[`cp`] [A char parser, a char range parser or a char set parser.]]
- ]
- [heading Expression Semantics]
- The semantics of an expression are defined only where they differ from, or
- are not defined in, __primitive_parser_concept__.
- [table
- [[Expression] [Semantics]]
- [[`c`] [Create a char parser from a char, `c`.]]
- [[`lit(c)`] [Create a char parser from a char, `c`.]]
- [[`ns::char_`] [Create a char parser that matches any character in the
- `ns` encoding.]]
- [[`ns::char_(c)`] [Create a char parser with `ns` encoding from a char, `c`.]]
- [[`ns::char_(f, l)`][Create a char-range parser that matches characters from
- range (`f` to `l`, inclusive) with `ns` encoding.]]
- [[`ns::char_(cs)`] [Create a char-set parser with `ns` encoding from a char-set
- definition string, `cs`.]]
- [[`~cp`] [Negate `cp`. The result is a negated char parser that
- matches any character in the `ns` encoding except the
- characters matched by `cp`.]]
- ]
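Negation can be modeled as a predicate that inverts `cp` while still consuming exactly one character. This agrees with the description of `~cp` as matching a character, so a negated char parser fails at the end of the input instead of matching "nothing". The sketch uses hypothetical names (`char_pred`, `negate`, `parse_one`) and a `std::string` as input; it is not Spirit's code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical model of a single-character parser as a predicate on char.
using char_pred = std::function<bool(char)>;

// ~cp: accept exactly the characters that cp rejects.
char_pred negate(char_pred cp)
{
    assert(static_cast<bool>(cp));  // cp must wrap a callable
    return [cp](char c) { return !cp(c); };
}

// One parse step: succeed, and advance pos by one, only if a character is
// available and pred accepts it. At the end of the input there is no
// character to test, so both cp and its negation fail there.
bool parse_one(char_pred const& pred, std::string const& in, std::size_t& pos)
{
    if (pos < in.size() && pred(in[pos]))
    {
        ++pos;
        return true;
    }
    return false;
}
```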
- [heading Attributes]
- [table
- [[Expression] [Attribute]]
- [[`c`] [__unused__, or, if `c` is a __qi_lazy_argument__, the character
- type returned by invoking it.]]
- [[`lit(c)`] [__unused__, or, if `c` is a __qi_lazy_argument__, the character
- type returned by invoking it.]]
- [[`ns::char_`] [The character type of the __char_encoding_namespace__, `ns`.]]
- [[`ns::char_(c)`] [The character type of the __char_encoding_namespace__, `ns`.]]
- [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
- [[`ns::char_(cs)`] [The character type of the __char_encoding_namespace__, `ns`.]]
- [[`~cp`] [The attribute of `cp`.]]
- ]
- [heading Complexity]
- [:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
- `wchar_t`). These have *O(log N)* complexity, where N is the number of
- distinct character ranges in the set.]
- [heading Example]
- [note The test harness for the example(s) below is presented in the
- __qi_basics_examples__ section.]
- Some using declarations:
- [reference_using_declarations_lit_char]
- Basic literals:
- [reference_char_literals]
- Range:
- [reference_char_range]
- Character set:
- [reference_char_set]
- Lazy `char_` using __phoenix__:
- [reference_char_phoenix]
- [endsect] [/ Char]
- [/------------------------------------------------------------------------------]
- [section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]
- [heading Description]
- The library has the full repertoire of single-character parsers for
- character classification. This includes the usual `alnum`, `alpha`,
- `digit`, `xdigit`, etc. parsers. These parsers have an associated
- __char_encoding_namespace__. This is needed when doing basic operations
- such as inhibiting case sensitivity.
- [heading Header]
-     // forwards to <boost/spirit/home/qi/char/char_class.hpp>
-     #include <boost/spirit/include/qi_char_class.hpp>
- Also, see __include_structure__.
- [heading Namespace]
- [table
- [[Name]]
- [[`ns::alnum`]]
- [[`ns::alpha`]]
- [[`ns::blank`]]
- [[`ns::cntrl`]]
- [[`ns::digit`]]
- [[`ns::graph`]]
- [[`ns::lower`]]
- [[`ns::print`]]
- [[`ns::punct`]]
- [[`ns::space`]]
- [[`ns::upper`]]
- [[`ns::xdigit`]]
- ]
- In the table above, `ns` represents a __char_encoding_namespace__.
- [heading Model of]
- [:__primitive_parser_concept__]
- [variablelist Notation
- [[`ns`] [A __char_encoding_namespace__.]]
- ]
- [heading Expression Semantics]
- The semantics of an expression are defined only where they differ from, or
- are not defined in, __primitive_parser_concept__.
- [table
- [[Expression] [Semantics]]
- [[`ns::alnum`] [Matches alphanumeric characters]]
- [[`ns::alpha`] [Matches alphabetic characters]]
- [[`ns::blank`] [Matches spaces or tabs]]
- [[`ns::cntrl`] [Matches control characters]]
- [[`ns::digit`] [Matches numeric digits]]
- [[`ns::graph`] [Matches non-space printing characters]]
- [[`ns::lower`] [Matches lowercase letters]]
- [[`ns::print`] [Matches printable characters]]
- [[`ns::punct`] [Matches punctuation symbols]]
- [[`ns::space`] [Matches white space: spaces, tabs, carriage returns, newlines, vertical tabs and form feeds]]
- [[`ns::upper`] [Matches uppercase letters]]
- [[`ns::xdigit`] [Matches hexadecimal digits]]
- ]
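For 7-bit ASCII input in the default "C" locale, each classifier in the table has a counterpart of the same name in `<cctype>`. The wrappers below (hypothetical `is_*` names) pair them up. Treating the two as equivalent is an assumption of this sketch, not a statement about Spirit's classification tables.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical wrappers pairing each classifier in the table with the
// <cctype> predicate of the same name. The cast to unsigned char is needed
// because passing a negative char value to <cctype> is undefined behaviour.
// The default "C" locale is assumed (no call to std::setlocale).
inline unsigned char uc(char c) { return static_cast<unsigned char>(c); }

bool is_alnum(char c)  { return std::isalnum(uc(c)) != 0; }
bool is_alpha(char c)  { return std::isalpha(uc(c)) != 0; }
bool is_blank(char c)  { return std::isblank(uc(c)) != 0; }
bool is_cntrl(char c)  { return std::iscntrl(uc(c)) != 0; }
bool is_digit(char c)  { return std::isdigit(uc(c)) != 0; }
bool is_graph(char c)  { return std::isgraph(uc(c)) != 0; }
bool is_lower(char c)  { return std::islower(uc(c)) != 0; }
bool is_print(char c)  { return std::isprint(uc(c)) != 0; }
bool is_punct(char c)  { return std::ispunct(uc(c)) != 0; }
bool is_space(char c)  { return std::isspace(uc(c)) != 0; }
bool is_upper(char c)  { return std::isupper(uc(c)) != 0; }
bool is_xdigit(char c) { return std::isxdigit(uc(c)) != 0; }
```

In this model, `is_graph(' ')` is false while `is_print(' ')` is true, which matches the "non-space printing characters" wording for `graph`.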
- [heading Attributes]
- [:The character type of the __char_encoding_namespace__, `ns`.]
- [heading Complexity]
- [:O(N)]
- [heading Example]
- [note The test harness for the example(s) below is presented in the
- __qi_basics_examples__ section.]
- Some using declarations:
- [reference_using_declarations_char_class]
- Basic usage:
- [reference_char_class]
- [endsect] [/ Char Classification]
- [endsect]