
[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:lexer_semantic_actions Lexer Semantic Actions]

The main task of a lexer normally is to recognize tokens in the input.
Traditionally this has been complemented with the possibility to execute
arbitrary code whenever a certain token has been detected. __lex__ has been
designed to support this mode of operation as well. We borrow from the concept
of semantic actions for parsers (__qi__) and generators (__karma__). Lexer
semantic actions may be attached to any token definition. These are C++
functions or function objects that are called whenever a token definition
successfully recognizes a portion of the input. Given a token definition
`D` and a C++ function `f`, you can make the lexer call `f` whenever `D`
matches some input by attaching `f`:

    D[f]

The expression above links `f` to the token definition `D`. The required
prototype of `f` is:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id, Context& ctx);
[variablelist where:
    [[`Iterator& start`]    [This is the iterator pointing to the beginning of
                            the matched range in the underlying input sequence.
                            The type of the iterator is the same as specified
                            while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`Iterator& end`]      [This is the iterator pointing to the end of the
                            matched range in the underlying input sequence.
                            The type of the iterator is the same as specified
                            while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`pass_flag& matched`] [This value is pre-initialized to `pass_normal`.
                            If the semantic action sets it to `pass_fail`,
                            the lexer behaves as if the token had not been
                            matched in the first place. If the semantic action
                            sets it to `pass_ignore`, the lexer ignores the
                            current token and tries to match the next token
                            from the input.]]
    [[`Idtype& id`]         [This is the token id of type `Idtype` (most of
                            the time this will be a `std::size_t`) for the
                            matched token. The semantic action is allowed to
                            change the value of this token id, influencing the
                            id of the created token.]]
    [[`Context& ctx`]       [This is a reference to a lexer specific,
                            unspecified type, providing the context for the
                            current lexer state. It can be used to access
                            different internal data items and is needed for
                            lexer state control from inside a semantic
                            action.]]
]
When using a plain C++ function as the semantic action, the following
prototypes are allowed as well:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id);
    void f (Iterator& start, Iterator& end, pass_flag& matched);
    void f (Iterator& start, Iterator& end);
    void f ();
[important In order to use lexer semantic actions you need to use the type
           `lexertl::actor_lexer<>` as your lexer class (instead of the type
           `lexertl::lexer<>` as described in earlier examples).]
[heading The context of a lexer semantic action]

The last parameter passed to any lexer semantic action is a reference to an
unspecified type (see the `Context` type in the table above). This type is
unspecified because it depends on the token type returned by the lexer. It is
implemented in the internals of the iterator type exposed by the lexer.
Nevertheless, any context type is expected to expose a couple of functions
that allow influencing the behavior of the lexer. The following table gives
an overview and a short description of the available functionality.
[table Functions exposed by any context passed to a lexer semantic action
    [[Name]     [Description]]
    [[`Iterator const& get_eoi() const`]
     [The function `get_eoi()` may be used to access the end iterator of
      the input stream the lexer has been initialized with.]]
    [[`void more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`Iterator const& less(Iterator const& it, int n)`]
     [The function `less()` returns an iterator positioned to the nth input
      character beyond the current token start iterator (i.e. by passing the
      return value to the parameter `end` it is possible to return all but the
      first n characters of the current token back to the input stream).]]
    [[`bool lookahead(std::size_t id)`]
     [The function `lookahead()` can be used to implement lookahead for lexer
      engines not supporting constructs like flex's `a/b`
      (match `a`, but only when followed by `b`). It invokes the lexer on the
      input following the current token without actually moving forward in the
      input stream. The function returns whether the lexer was able to match a
      token with the given token id `id`.]]
    [[`std::size_t get_state() const` and `void set_state(std::size_t state)`]
     [The functions `get_state()` and `set_state()` may be used to introspect
      and change the current lexer state.]]
    [[`token_value_type get_value() const` and `void set_value(Value const&)`]
     [The functions `get_value()` and `set_value()` may be used to introspect
      and change the current token value.]]
]
[heading Lexer Semantic Actions Using Phoenix]

Even though it is possible to write your own function object implementations
(e.g. using Boost.Lambda or Boost.Bind), the preferred way of defining lexer
semantic actions is to use __phoenix__. In this case you can access the
parameters described above by using the predefined __spirit__ placeholders:
[table Predefined Phoenix placeholders for lexer semantic actions
    [[Placeholder]  [Description]]
    [[`_start`]
     [Refers to the iterator pointing to the beginning of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_end`]
     [Refers to the iterator pointing past the end of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_pass`]
     [References the value signaling the outcome of the semantic action. It
      is pre-initialized to `lex::pass_flags::pass_normal`. If it is set to
      `lex::pass_flags::pass_fail`, the lexer will behave as if no token has
      been matched; if it is set to `lex::pass_flags::pass_ignore`, the lexer
      will ignore the current match and proceed trying to match tokens from
      the input.]]
    [[`_tokenid`]
     [Refers to the token id of the token to be generated. Any modifications
      to this value will be reflected in the generated token.]]
    [[`_val`]
     [Refers to the value the next token will be initialized from. Any
      modifications to this value will be reflected in the generated token.]]
    [[`_state`]
     [Refers to the lexer state the input has been matched in. Any
      modifications to this value will be reflected in the lexer itself (the
      next match will start in the new state). The currently generated token
      is not affected by changes to this variable.]]
    [[`_eoi`]
     [References the end iterator of the overall lexer input. This value
      cannot be changed.]]
]
The context object passed as the last parameter to any lexer semantic action
is not directly accessible while using __phoenix__ expressions. Instead, we
provide predefined Phoenix functions that allow invoking the different support
functions mentioned above. The following table lists the available support
functions and describes their functionality:
[table Support functions usable from Phoenix expressions inside lexer semantic actions
    [[Plain function]   [Phoenix function]  [Description]]
    [[`ctx.more()`]
     [`more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`ctx.less()`]
     [`less(n)`]
     [The function `less()` takes a single integer parameter `n` and returns
      an iterator positioned to the nth input character beyond the current
      token start iterator (i.e. by assigning the return value to the
      placeholder `_end` it is possible to return all but the first `n`
      characters of the current token back to the input stream).]]
    [[`ctx.lookahead()`]
     [`lookahead(std::size_t)` or `lookahead(token_def)`]
     [The function `lookahead()` takes a single parameter specifying the token
      to match in the input. It can be used, for instance, to implement
      lookahead for lexer engines not supporting constructs like flex's `a/b`
      (match `a`, but only when followed by `b`). It invokes the lexer on the
      input following the current token without actually moving forward in the
      input stream. The function returns whether the lexer was able to match
      the specified token.]]
]
[endsect]