operators.html 14 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222
  1. <html>
  2. <head>
  3. <title>Operators</title>
  4. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  5. <link rel="stylesheet" href="theme/style.css" type="text/css">
  6. </head>
  7. <body>
  8. <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
  9. <tr>
  10. <td width="10">
  11. </td>
  12. <td width="85%">
  13. <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Operators</b></font>
  14. </td>
  15. <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
  16. </tr>
  17. </table>
  18. <br>
  19. <table border="0">
  20. <tr>
  21. <td width="10"></td>
  22. <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
  23. <td width="30"><a href="primitives.html"><img src="theme/l_arr.gif" border="0"></a></td>
  24. <td width="30"><a href="numerics.html"><img src="theme/r_arr.gif" border="0"></a></td>
  25. </tr>
  26. </table>
  27. <p>Operators are used as a means for object composition and embedding. Simple
  28. parsers may be composed to form composites through operator overloading, crafted
  29. to approximate the syntax of an Extended Backus-Normal Form (EBNF) variant.
  30. An expression such as:</p>
  31. <pre><code><font color="#000000"> <span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span></font></code></pre>
  32. <p>actually yields a new parser type which is a composite of its operands, a and
  33. b. Taking this example further, if a and b were of type <tt>chlit</tt>&lt;&gt;,
  34. the result would have the composite type:</p>
  35. <pre><code><font color="#000000"> <span class=identifier>alternative</span><span class=special>&lt;</span><span class=identifier>chlit</span><span class=special>&lt;&gt;, </span><span class=identifier>chlit</span><span class=special>&lt;&gt; </span><span class=special>&gt;</span></font></code></pre>
  36. <p> In general, for any binary operator, it will take its two arguments, parser1
  37. and parser2, and create a new composed parser of the form</p>
  38. <pre><code><font color="#000000"> <span class=identifier>op</span><span class=special>&lt;</span><span class=identifier>parser1</span><span class=special>, </span><span class=identifier>parser2</span><span class=special>&gt;</span></font></code></pre>
  39. <p>where parser1 and parser2 can be arbitrarily complex parsers themselves, with
  40. the only limitations being what your compiler imposes. </p>
  41. <h3>Set Operators</h3>
  42. <table width="90%" border="0" align="center">
  43. <tr>
  44. <td class="table_title" colspan="3">Set operators</td>
  45. </tr>
  46. <tr>
  47. <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>|
  48. </span><span class=identifier>b</span></code></td>
  49. <td class="table_cells" width="24%">Union</td>
  50. <td class="table_cells" width="56%">Match a or b. Also referred to as alternative</td>
  51. </tr>
  52. <tr>
  53. <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>&
  54. </span><span class=identifier>b</span></code></td>
  55. <td class="table_cells" width="24%">Intersection</td>
  56. <td class="table_cells" width="56%">Match a and b</td>
  57. </tr>
  58. <tr>
  59. <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>-
  60. </span><span class=identifier>b</span></code></td>
  61. <td class="table_cells" width="24%">Difference</td>
  62. <td class="table_cells" width="56%">Match a but not b. If both match and b's
  63. matched text is shorter than a's matched text, a successful match is made</td>
  64. </tr>
  65. <tr>
  66. <td class="table_cells" width="20%"><code><span class=identifier>a </span><span class=special>^
  67. </span><span class=identifier>b</span></code></td>
  68. <td class="table_cells" width="24%">XOR</td>
  69. <td class="table_cells" width="56%">Match a or b, but not both</td>
  70. </tr>
  71. </table>
  72. <p><b>Short-circuiting</b></p>
  73. <p>Alternative operands are tried one by one on a first come first served basis
  74. starting from the leftmost operand. After a successfully matched alternative
  75. is found, the parser concludes its search, essentially short-circuiting the
  76. search for other potentially viable candidates. This short-circuiting implicitly
  77. gives the highest priority to the leftmost alternative.</p>
  78. <p>Short-circuiting is done in the same manner as C or C++'s logical expressions;
  79. e.g. <tt>if</tt> <tt><span class="operators">(</span>x <span class="operators">&lt;</span>
  80. 3 <span class="operators">||</span> y <span class="operators">&lt;</span> 2<span class="operators">)</span></tt>
  81. where, if <tt>x</tt> evaluates to be less than 3, the <tt>y <span class="operators">&lt;</span>
  82. 2</tt> test is not done at all. In addition to providing an implicit priority
  83. rule for alternatives which is necessary, given the non-deterministic nature
  84. of the Spirit parser compiler, short-circuiting improves the execution time.
  85. If the order of your alternatives is logically irrelevant, strive to put the
  86. (expected) most common choice first for maximum efficiency.</p>
  87. <table width="80%" border="0" align="center">
  88. <tr>
  89. <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Intersections</b><br>
  90. <br>
  91. Some researchers assert that the intersections (e.g. <tt>a &amp; b</tt>)
  92. let us define context sensitive languages (<a href="references.html#intersections">&quot;XBNF&quot;</a>
  93. [citing Leu-Weiner, 1973]). &quot;The theory of defining a language as the
  94. intersection of a finite number of context free languages was developed
  95. by Leu and Weiner in 1973&quot;.<br>
  96. <br>
  97. <b><img src="theme/lens.gif" width="15" height="16"> <b></b>~ Operator</b><br>
  98. <br>
  99. The complement operator <tt>~</tt> was originally put into consideration.
  100. Further understanding of its value and meaning leads us to uncertainty.
  101. The basic problem stems from the fact that <tt>~a</tt> will yield <tt>U-a</tt>,
  102. where <tt>U</tt> is the universal set of all strings. However, where it
  103. makes sense, some parsers can be complemented (see the <a href="primitives.html#negation">primitive
  104. character parsers</a> for examples).</td>
  105. </tr>
  106. </table>
  107. <h3>Sequencing Operators</h3>
  108. <table width="90%" border="0" align="center">
  109. <tr>
  110. <td class="table_title" colspan="3">Sequencing operators</td>
  111. </tr>
  112. <tr>
  113. <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>&gt;&gt;
  114. </span><span class=identifier>b</span></code></td>
  115. <td class="table_cells" width="23%">Sequence</td>
  116. <td class="table_cells" width="56%">Match a and b in sequence</td>
  117. </tr>
  118. <tr>
  119. <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>&&
  120. </span><span class=identifier>b</span></code></td>
  121. <td class="table_cells" width="23%">Sequential-and</td>
  122. <td class="table_cells" width="56%">Sequential-and. Same as above, match a
  123. and b in sequence</td>
  124. </tr>
  125. <tr>
  126. <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>||
  127. </span><span class=identifier>b</span></code></td>
  128. <td class="table_cells" width="23%">Sequential-or</td>
  129. <td class="table_cells" width="56%">Match a or b in sequence</td>
  130. </tr>
  131. </table>
  132. <p>The sequencing operator <tt class="operators">&gt;&gt;</tt> can alternatively
  133. be thought of as the sequential-and operator. The expression <tt>a <span class="operators">&amp;&amp;</span>
  134. b</tt> reads as match a and b in sequence. Continuing this logic, we can also
  135. have a sequential-or operator where the expression <tt>a <span class="operators">||</span>
  136. b</tt> reads as match a or b and in sequence. That is, if both a and b match,
  137. it must be in sequence; this is equivalent to <tt>a &gt;&gt; !b | b</tt>. </p>
  138. <h3>Optional and Loops</h3>
  139. <table width="90%" border="0" align="center">
  140. <tr>
  141. <td class="table_title" colspan="3">Optional and Loops</td>
  142. </tr>
  143. <tr>
  144. <td class="table_cells" width="21%"><code><span class=special>*</span><span class=identifier>a</span></code></td>
  145. <td class="table_cells" width="23%">Kleene star</td>
  146. <td class="table_cells" width="56%">Match a zero (0) or more times</td>
  147. </tr>
  148. <tr>
  149. <td class="table_cells" width="21%"><code><span class=special>+</span><span class=identifier>a</span></code></td>
  150. <td class="table_cells" width="23%">Positive</td>
  151. <td class="table_cells" width="56%">Match a one (1) or more times</td>
  152. </tr>
  153. <tr>
  154. <td class="table_cells" width="21%"><code><span class=special>!</span><span class=identifier>a</span></code></td>
  155. <td class="table_cells" width="23%">Optional</td>
  156. <td class="table_cells" width="56%">Match a zero (0) or one (1) time</td>
  157. </tr>
  158. <tr>
  159. <td class="table_cells" width="21%"><code><span class=identifier>a </span><span class=special>%
  160. </span><span class=identifier>b</span></code></td>
  161. <td class="table_cells" width="23%">List</td>
  162. <td class="table_cells" width="56%">Match a list of one or more repetitions
  163. of a separated by occurrences of b. This is the same as <tt>a &gt;&gt; *(b
  164. &gt;&gt; a)</tt>. Note that <tt>a</tt> must not also match <tt>b</tt></td>
  165. </tr>
  166. </table>
  167. <p><img src="theme/note.gif" width="16" height="16"> If we look more closely,
  168. take note that we generalized the optional expression of the form <tt>!a</tt>
  169. in the same category as loops. This is logical, considering that the optional
  170. matches the expression following it zero (0) or one (1) time. </p>
  171. <p><b>Primitive type operands</b></p>
  172. <p> For binary operators, one of the operands but not both may be a <tt>char</tt>,
  173. <tt> wchar_t</tt>, <tt>char const<span class="operators">*</span></tt> or <tt>wchar_t
  174. const<span class="operators">*</span></tt>. Where P is a parser object, here
  175. are some examples:</p>
  176. <pre><code><span class=identifier> </span><span class=identifier>P </span><span class=special>| </span><span class=literal>'x'
  177. </span><span class=identifier>P </span><span class=special>- </span><span class=identifier>L</span><span class=string>"Hello World"
  178. </span><span class=literal>'x' </span><span class=special>&gt;&gt; </span><span class=identifier>P
  179. </span><span class=string>"bebop" </span><span class=special>&gt;&gt; </span><span class=identifier>P</span></code></pre>
  180. <p>It is important to emphasize that C++ mandates that operators may only be overloaded
  181. if at least one argument is a user-defined type. Typically, in an expression
  182. involving multiple operators, explicitly typing the leftmost operand as a parser
  183. is enough to cause propagation to all the rest of the operands to its right
  184. to be regarded as parsers. Examples:</p>
  185. <pre><code><font color="#000000"><span class=identifier> </span><span class=identifier>r </span><span class=special>= </span><span class=literal>'a' </span><span class=special>| </span><span class=literal>'b' </span><span class=special>| </span><span class=literal>'c' </span><span class=special>| </span><span class=literal>'d'</span><span class=special>; </span><span class=comment>// ill formed
  186. </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>) </span><span class=special>| </span><span class=literal>'b' </span><span class=special>| </span><span class=literal>'c' </span><span class=special>| </span><span class=literal>'d'</span><span class=special>; </span><span class=comment>// OK</span></font></code></pre>
  187. <p>The second case is parsed as follows:</p>
  188. <pre><code><font color="#000000"> r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>char</span><span class=special>&gt; </span><span class=special>| </span><span class=keyword>char</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font>
  189. a <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(</span><span class=identifier>chlit</span><span class=special>&lt;</span><span class=keyword>char</span><span class=special>&gt; </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font>
  190. r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>a</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>) </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font>
  191. b <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(</span><span class=identifier>a </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font>
  192. r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>b</span><span class=special>)) </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font>
  193. c <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(</span><span class=identifier>b </span><span class=special>| </span><span class=keyword>char</span><span class=special>)</span></font>
  194. r <font color="#0000ff"><img src="theme/arrow.gif"> <span class=special>(((</span><span class=identifier>c</span><span class=special>)))</span></font></font></code></pre>
  195. <p><b>Operator precedence and grouping</b></p>
  196. <p>Since we are defining our meta-language in C++, we follow C/C++'s operator
  197. precedence rules. Grouping expressions inside the parentheses override this
  198. (e.g., <tt><span class="operators">*(</span>a <span class="operators">|</span>
  199. b<span class="operators">)</span></tt> reads: match a or b zero (0) or more
  200. times). </p>
  201. <table border="0">
  202. <tr>
  203. <td width="10"></td>
  204. <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
  205. <td width="30"><a href="primitives.html"><img src="theme/l_arr.gif" border="0"></a></td>
  206. <td width="30"><a href="numerics.html"><img src="theme/r_arr.gif" border="0"></a></td>
  207. </tr>
  208. </table>
  209. <br>
  210. <hr size="1">
  211. <p class="copyright">Copyright &copy; 1998-2003 Joel de Guzman<br>
  212. <br>
  213. <font size="2">Use, modification and distribution is subject to the Boost Software
  214. License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
  215. http://www.boost.org/LICENSE_1_0.txt) </font> </p>
  216. <p>&nbsp;</p>
  217. </body>
  218. </html>