[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:string String Parsers]

This module includes parsers for strings. Currently, this module
includes the literal and string parsers and the symbol table.

[heading Module Header]

    // forwards to <boost/spirit/home/qi/string.hpp>
    #include <boost/spirit/include/qi_string.hpp>

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:string String Parsers (`string`, `lit`)]

[heading Description]

The `string` parser matches a string of characters. The `string` parser
is an implicit lexeme: the `skip` parser is not applied in between
characters of the string. The `string` parser has an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity. Examples:

    string("Hello")
    string(L"Hello")
    string(s)         // s is a std::string

`lit`, like `string`, also matches a string of characters. The main
difference is that `lit` does not synthesize an attribute. A plain
string like `"hello"` or a `std::basic_string` is equivalent to a `lit`.
Examples:

    "Hello"
    lit("Hello")
    lit(L"Hello")
    lit(s)            // s is a std::string

[heading Header]

    // forwards to <boost/spirit/home/qi/string/lit.hpp>
    #include <boost/spirit/include/qi_lit.hpp>

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit`]]
    [[`ns::string`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`s`]      [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__.]]
    [[`ns`]     [A __char_encoding_namespace__.]]]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`s`]              [Create a string parser
                         from a string, `s`.]]
    [[`lit(s)`]         [Create a string parser
                         from a string, `s`.]]
    [[`ns::string(s)`]  [Create a string parser with `ns` encoding
                         from a string, `s`.]]
]

[heading Attributes]

[table
    [[Expression]       [Attribute]]
    [[`s`]              [__unused__]]
    [[`lit(s)`]         [__unused__]]
    [[`ns::string(s)`]  [`std::basic_string<T>` where `T`
                         is the underlying character type
                         of `s`.]]
]

[heading Complexity]

[:O(N)]

where `N` is the number of characters in the string to be parsed.

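
The linear bound and the implicit-lexeme behavior can be illustrated with a
minimal sketch. This is not Spirit's implementation: `match_literal` is a
hypothetical helper written only for this illustration, and it assumes a
plain `char` literal with no skipper and no case folding.

```cpp
#include <string>

// Hypothetical helper, not part of Spirit: tries to match `literal`
// at the position `first` within the range [first, last).
// On success, `first` is advanced past the match and true is returned.
// On failure, `first` is left unchanged and false is returned.
template <typename Iterator>
bool match_literal(Iterator& first, Iterator last, std::string const& literal)
{
    Iterator it = first;
    for (char expected : literal)
    {
        // Characters are compared back to back: nothing is skipped
        // between them, which is what "implicit lexeme" means here.
        if (it == last || *it != expected)
            return false;
        ++it;
    }
    first = it; // commit only after the whole literal has matched
    return true;
}
```

Each character of the literal is examined at most once, so the cost grows
linearly with the length of the literal.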
[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_string]

Basic literals:

[reference_string_literals]

From a `std::string`:

[reference_string_std_string]

Lazy strings using __phoenix__:

[reference_string_phoenix]

[endsect] [/ lit/string]

[/------------------------------------------------------------------------------]
[section:symbols Symbols Parser (`symbols`)]

[heading Description]

The class `symbols` implements a symbol table: an associative container
(or map) of key-value pairs where the keys are strings. The `symbols`
class can work efficiently with 8-, 16-, 32- and even 64-bit characters.

Traditionally, symbol table management is maintained separately outside
the grammar through semantic actions. Contrary to standard practice, the
Spirit symbol table class `symbols` is-a parser, an instance of which may
be used anywhere in the grammar specification. It is an example of a
dynamic parser. A dynamic parser is characterized by its ability to
modify its behavior at run time. Initially, an empty `symbols` object
matches nothing. Symbols may be added at any time, dynamically
altering its behavior.

[heading Header]

    // forwards to <boost/spirit/home/qi/string/symbols.hpp>
    #include <boost/spirit/include/qi_symbols.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::qi::symbols`]]
    [[`boost::spirit::qi::tst`]]
    [[`boost::spirit::qi::tst_map`]]
]

[heading Synopsis]

    template <typename Char, typename T, typename Lookup>
    struct symbols;

[heading Template parameters]

[table
    [[Parameter]    [Description]               [Default]]
    [[`Char`]       [The character type
                     of the symbol strings.]    [`char`]]
    [[`T`]          [The data type associated
                     with each symbol.]         [__unused_type__]]
    [[`Lookup`]     [The symbol search
                     implementation.]           [`tst<Char, T>`]]
]

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`Sym`]            [A `symbols` type.]]
    [[`Char`]           [A character type.]]
    [[`T`]              [A data type.]]
    [[`sym`, `sym2`]    [`symbols` objects.]]
    [[`sseq`]           [An __stl__ container of strings.]]
    [[`dseq`]           [An __stl__ container of data with `value_type` `T`.]]
    [[`s1`...`sN`]      [A __string__.]]
    [[`d1`...`dN`]      [Objects of type `T`.]]
    [[`f`]              [A callable function or function object.]]
    [[`f`, `l`]         [`ForwardIterator` first/last pair.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]           [Semantics]]
    [[`Sym()`]              [Construct an empty symbols object named `"symbols"`.]]
    [[`Sym(name)`]          [Construct an empty symbols object named `name`.]]
    [[`Sym(sym2)`]          [Copy construct a symbols object from `sym2` (another `symbols` object).]]
    [[`Sym(sseq)`]          [Construct symbols from `sseq` (an __stl__ container of strings) named `"symbols"`.]]
    [[`Sym(sseq, name)`]    [Construct symbols from `sseq` (an __stl__ container of strings) named `name`.]]
    [[`Sym(sseq, dseq)`]    [Construct symbols from `sseq` and `dseq`
                             (an __stl__ container of strings and an __stl__ container of
                             data with `value_type` `T`), named `"symbols"`.]]
    [[`Sym(sseq, dseq, name)`]  [Construct symbols from `sseq` and `dseq`
                             (an __stl__ container of strings and an __stl__ container of
                             data with `value_type` `T`), named `name`.]]
    [[`sym = sym2`]         [Assign `sym2` to `sym`.]]
    [[`sym = s1, s2, ..., sN`]  [Assign one or more symbols (`s1`...`sN`) to `sym`.]]
    [[`sym += s1, s2, ..., sN`] [Add one or more symbols (`s1`...`sN`) to `sym`.]]
    [[`sym.add(s1)(s2)...(sN)`] [Add one or more symbols (`s1`...`sN`) to `sym`.]]
    [[`sym.add(s1, d1)(s2, d2)...(sN, dN)`]
                            [Add one or more symbols (`s1`...`sN`)
                             with associated data (`d1`...`dN`) to `sym`.]]
    [[`sym -= s1, s2, ..., sN`] [Remove one or more symbols (`s1`...`sN`) from `sym`.]]
    [[`sym.remove(s1)(s2)...(sN)`]  [Remove one or more symbols (`s1`...`sN`) from `sym`.]]
    [[`sym.clear()`]        [Erase all of the symbols in `sym`.]]
    [[`sym.at(s)`]          [Return a reference to the object associated
                             with the symbol `s`. If `sym` does not already
                             contain such an object, `at` inserts the default
                             object `T()`.]]
    [[`sym.find(s)`]        [Return a pointer to the object associated
                             with the symbol `s`. If `sym` does not already
                             contain such an object, `find` returns a null
                             pointer.]]
    [[`sym.prefix_find(f, l)`]  [Return a pointer to the object associated
                             with the longest symbol that matches the beginning
                             of the range `[f, l)`, and update `f` to point
                             to one past the end of that match. If no symbol matches,
                             return a null pointer and leave `f` unchanged.]]
    [[`sym.for_each(f)`]    [For each symbol `s` in `sym` (a
                             `std::basic_string<Char>`) with associated data
                             `d` (an object of type `T`), invoke `f(s, d)`.]]
    [[`sym.name()`]         [Retrieve the current name of the symbols object.]]
    [[`sym.name(name)`]     [Set the current name of the symbols object to be `name`.]]
]

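
The difference between `at` and `find` in the table above is easy to miss:
`at` inserts a default-constructed value on a miss, while `find` never
modifies the table. The sketch below mimics just that contract with a
`std::map`. The `toy_symbols` type is hypothetical and exists only for this
illustration; it is not how `symbols` is implemented.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a symbol table with int data.
// It only mirrors the at/find contract described in the table above.
struct toy_symbols
{
    std::map<std::string, int> table;

    // Like `sym.at(s)`: returns a reference to the data for `s`,
    // inserting the default object `int()` if `s` is not present.
    int& at(std::string const& s)
    {
        return table[s];
    }

    // Like `sym.find(s)`: returns a pointer to the data for `s`,
    // or a null pointer if `s` is not present. Never inserts.
    int* find(std::string const& s)
    {
        std::map<std::string, int>::iterator it = table.find(s);
        return it == table.end() ? nullptr : &it->second;
    }
};
```

In practice this means `find` is the safer choice for a pure query, since a
mistyped key passed to `at` silently adds a new symbol.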
[heading Attributes]

The attribute of `symbols<Char, T>` is `T`.

[heading Complexity]

The default implementation uses a Ternary Search Tree (TST) with
complexity:

[:O(log n+k)]

where `k` is the length of the string to be searched in a TST with `n`
strings.

TSTs are faster than hashing for many typical search problems, especially
when the search interface is iterator based. TSTs are many times faster
than hash tables for unsuccessful searches since mismatches are
discovered earlier, after examining only a few characters. Hash tables
always examine an entire key when searching.

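
To make the early-mismatch behavior concrete, here is a minimal sketch of a
ternary search tree with a longest-prefix lookup. It is an illustration
only: `toy_tst`, its `add` member and its fixed `int` payload are
assumptions of this sketch and do not reflect the actual `tst`
implementation in Spirit.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical minimal ternary search tree mapping strings to int.
class toy_tst
{
    struct node
    {
        char ch;                          // character stored at this node
        bool has_value = false;           // true if a key ends here
        int value = 0;                    // data for the key ending here
        std::unique_ptr<node> lt, eq, gt; // less / equal / greater children
        explicit node(char c) : ch(c) {}
    };

    std::unique_ptr<node> root;

public:
    // Insert a non-empty `key` with associated `value`.
    void add(std::string const& key, int value)
    {
        if (key.empty())
            return;
        std::unique_ptr<node>* p = &root;
        std::size_t i = 0;
        for (;;)
        {
            if (!*p)
                p->reset(new node(key[i]));
            node& n = **p;
            if (key[i] < n.ch)
                p = &n.lt;
            else if (key[i] > n.ch)
                p = &n.gt;
            else if (++i == key.size())
            {
                n.has_value = true;
                n.value = value;
                return;
            }
            else
                p = &n.eq;
        }
    }

    // Find the longest stored key that is a prefix of [first, last).
    // On success, advance `first` past the match and return its data.
    // On failure, leave `first` unchanged and return a null pointer.
    template <typename Iterator>
    int const* prefix_find(Iterator& first, Iterator last) const
    {
        node const* n = root.get();
        int const* found = nullptr;
        Iterator it = first;
        Iterator best = first;
        while (n && it != last)
        {
            if (*it < n->ch)
                n = n->lt.get();
            else if (*it > n->ch)
                n = n->gt.get();
            else
            {
                ++it; // consume one input character
                if (n->has_value)
                {
                    found = &n->value;
                    best = it;
                }
                n = n->eq.get();
            }
        }
        if (found)
            first = best;
        return found;
    }
};
```

The search stops as soon as it reaches a missing child, so an input that
shares no prefix with any stored key is rejected after a handful of
comparisons, without ever reading the rest of the input.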
An alternative implementation, `tst_map`, uses a hybrid hash-map front end
(for the first character) plus a TST. This gives us a complexity of

[:O(1 + log n+k-1)]

This is found to be significantly faster than a plain TST, albeit with
somewhat higher memory usage (each slot in the hash-map is a TST
node). If you require a lot of symbols to be searched, use the `tst_map`
implementation. This can be done by using `tst_map` as the third
template parameter to the `symbols` class:

    symbols<Char, T, tst_map<Char, T> > sym;

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_symbols]

Symbols with data:

[reference_symbols_with_data]

When `symbols` is used for case-insensitive parsing (in a __qi_no_case__
directive), added symbol strings should be in lowercase. Symbol strings
containing one or more uppercase characters will not match any input
when `symbols` is used in a `no_case` directive.

[reference_symbols_with_no_case]

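
One way to picture the lowercase rule is the sketch below. It assumes that
case-insensitive lookup lowercases each input character and compares it
with the stored symbol exactly as written; this is an assumption chosen
because it is consistent with the rule above, not a statement about the
library internals. `matches_no_case` is a hypothetical helper.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: compares `input` with a stored symbol `key`,
// lowercasing only the input side. The stored key is used verbatim.
bool matches_no_case(std::string const& input, std::string const& key)
{
    if (input.size() != key.size())
        return false;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        char lowered = static_cast<char>(
            std::tolower(static_cast<unsigned char>(input[i])));
        if (lowered != key[i])
            return false; // an uppercase char in `key` can never match
    }
    return true;
}
```

Under that assumption, a stored key `"apple"` matches `"APPLE"` and
`"Apple"`, while a stored key `"Apple"` matches nothing at all, because the
lowered input can never produce the uppercase `A`.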
[endsect] [/ symbols]

[endsect] [/ String]