[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:char Character Parsers]

This module includes parsers for single characters. Currently, this
module includes literal chars (e.g. `'x'`, `L'x'`), `char_` (single
characters, ranges and character sets) and the encoding specific
character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.).

[heading Module Header]

    // forwards to <boost/spirit/home/qi/char.hpp>
    #include <boost/spirit/include/qi_char.hpp>

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]

[heading Description]

The `char_` parser matches single characters. The `char_` parser has an
associated __char_encoding_namespace__. This is needed when doing basic
operations such as inhibiting case sensitivity and dealing with
character ranges.

There are various forms of `char_`.

[heading char_]

The no argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character

[heading char_(ch)]

The single argument form of `char_` (with a character argument) matches
the supplied character.

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)

[heading char_(first, last)]

`char_` with two arguments matches a range of characters.

    char_('a','z')      // lowercase alphabetic characters
    char_(L'0',L'9')    // digits

A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note that the first character must come /before/ the second,
according to the underlying __char_encoding_namespace__.

Character mapping is inherently platform dependent. The standard does not
guarantee, for example, that `'A' < 'Z'`. That is why, in Spirit2, we
purposely attach a specific __char_encoding_namespace__ (such as ASCII or
ISO-8859-1) to the `char_` parser to eliminate such ambiguities.

[note *Sparse bit vectors*

To accommodate 16/32 and 64 bit characters, the char-set statically
switches from a `std::bitset` implementation when the character type is
not greater than 8 bits, to a sparse bit/boolean set which uses a sorted
vector of disjoint ranges (`range_run`). The set is constructed from
ranges such that adjacent or overlapping ranges are coalesced.

`range_runs` are very space-economical in situations where there are lots
of ranges and a few individual disjoint values. Searching is O(log n)
where n is the number of ranges.]
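
The idea behind a sorted vector of disjoint ranges can be sketched in plain C++. This is not Spirit's actual `range_run`; the class `simple_range_run` and its members are hypothetical, and insertion here is a simple linear merge rather than whatever the library does internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration type, not Spirit's internal range_run.
template <typename Char>
class simple_range_run
{
public:
    struct range { Char first; Char last; };   // inclusive on both ends

    // Insert [lo, hi], coalescing it with ranges it overlaps or touches.
    void set(Char lo, Char hi)
    {
        typedef long long wide;                // avoids overflow on +1
        std::vector<range> merged;
        range current = { lo, hi };
        bool placed = false;
        for (std::size_t i = 0; i < runs_.size(); ++i) {
            range const& r = runs_[i];
            if (wide(r.last) + 1 < wide(current.first)) {
                merged.push_back(r);           // strictly left, not adjacent
            } else if (wide(current.last) + 1 < wide(r.first)) {
                if (!placed) { merged.push_back(current); placed = true; }
                merged.push_back(r);           // strictly right, not adjacent
            } else {
                current.first = std::min(current.first, r.first);
                current.last = std::max(current.last, r.last);
            }
        }
        if (!placed) merged.push_back(current);
        runs_.swap(merged);
    }

    // Binary search over the sorted ranges: O(log n) in the range count.
    bool test(Char ch) const
    {
        typename std::vector<range>::const_iterator it =
            std::lower_bound(runs_.begin(), runs_.end(), ch, ends_before);
        return it != runs_.end() && !(ch < it->first);
    }

    std::size_t size() const { return runs_.size(); }

private:
    static bool ends_before(range const& r, Char const& ch)
    {
        return r.last < ch;
    }

    std::vector<range> runs_;                  // sorted, pairwise disjoint
};
```

Storing only range endpoints is what keeps the set small for wide character types, where a bitset with one bit per code point would be impractical.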

[heading char_(def)]

Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), the string is regarded as a char-set definition string following
a syntax that resembles POSIX-style regular expression character sets
(except that double quotes delimit the set elements instead of square
brackets and there is no special negation `^` character). Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal characters
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E

[heading lit(ch)]

`lit`, when passed a single character, behaves like the single argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.

[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.]

Examples:

    'x'
    lit('x')
    lit(L'x')
    lit(c)              // c is a char

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char.hpp>
    #include <boost/spirit/include/qi_char_.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
    [[`ns::char_`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`c`, `f`, `l`]    [A literal char, e.g. `'x'`, `L'x'` or anything that can be
                         converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
                         that evaluates to anything that can be converted to a `char`
                         or `wchar_t`.]]
    [[`ns`]             [A __char_encoding_namespace__.]]
    [[`cs`]             [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
                         that specifies a char-set definition string following a syntax
                         that resembles POSIX-style regular expression character sets
                         (except the square brackets and the negation `^` character).]]
    [[`cp`]             [A char parser, a char range parser or a char set parser.]]
]

[heading Expression Semantics]

Semantics of an expression is defined only where it differs from, or is
not defined in __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`c`]              [Create a char parser from a char, `c`.]]
    [[`lit(c)`]         [Create a char parser from a char, `c`.]]
    [[`ns::char_`]      [Create a char parser that matches any character in the
                         `ns` encoding.]]
    [[`ns::char_(c)`]   [Create a char parser with `ns` encoding from a char, `c`.]]
    [[`ns::char_(f, l)`][Create a char-range parser that matches characters from
                         range (`f` to `l`, inclusive) with `ns` encoding.]]
    [[`ns::char_(cs)`]  [Create a char-set parser with `ns` encoding from a char-set
                         definition string, `cs`.]]
    [[`~cp`]            [Negate `cp`. The result is a negated char parser that
                         matches any character in the `ns` encoding except the
                         characters matched by `cp`.]]
]

[heading Attributes]

[table
    [[Expression]       [Attribute]]
    [[`c`]              [__unused__ or if `c` is a __qi_lazy_argument__, the character
                         type returned by invoking it.]]
    [[`lit(c)`]         [__unused__ or if `c` is a __qi_lazy_argument__, the character
                         type returned by invoking it.]]
    [[`ns::char_`]      [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(c)`]   [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(cs)`]  [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`~cp`]            [The attribute of `cp`.]]
]

[heading Complexity]

[:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_char]

Basic literals:

[reference_char_literals]

Range:

[reference_char_range]

Character set:

[reference_char_set]

Lazy `char_` using __phoenix__:

[reference_char_phoenix]

[endsect] [/ Char]

[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]

[heading Description]

The library has the full repertoire of single character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char_class.hpp>
    #include <boost/spirit/include/qi_char_class.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`ns::alnum`]]
    [[`ns::alpha`]]
    [[`ns::blank`]]
    [[`ns::cntrl`]]
    [[`ns::digit`]]
    [[`ns::graph`]]
    [[`ns::lower`]]
    [[`ns::print`]]
    [[`ns::punct`]]
    [[`ns::space`]]
    [[`ns::upper`]]
    [[`ns::xdigit`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`ns`]             [A __char_encoding_namespace__.]]
]

[heading Expression Semantics]

Semantics of an expression is defined only where it differs from, or is
not defined in __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`ns::alnum`]      [Matches alpha-numeric characters]]
    [[`ns::alpha`]      [Matches alphabetic characters]]
    [[`ns::blank`]      [Matches spaces or tabs]]
    [[`ns::cntrl`]      [Matches control characters]]
    [[`ns::digit`]      [Matches numeric digits]]
    [[`ns::graph`]      [Matches non-space printing characters]]
    [[`ns::lower`]      [Matches lower case letters]]
    [[`ns::print`]      [Matches printable characters]]
    [[`ns::punct`]      [Matches punctuation symbols]]
    [[`ns::space`]      [Matches spaces, tabs, returns, and newlines]]
    [[`ns::upper`]      [Matches upper case letters]]
    [[`ns::xdigit`]     [Matches hexadecimal digits]]
]

[heading Attributes]

[:The character type of the __char_encoding_namespace__, `ns`.]

[heading Complexity]

[:O(N)]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_char_class]

Basic usage:

[reference_char_class]

[endsect] [/ Char Classification]

[endsect]