<html>
<head>
<title>Character Sets</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>
<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
<tr>
<td width="10">
</td>
<td width="85%">
<font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Character Sets</b></font>
</td>
<td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
</tr>
</table>
<br>
<table border="0">
<tr>
<td width="10"></td>
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
<td width="30"><a href="loops.html"><img src="theme/l_arr.gif" border="0"></a></td>
<td width="30"><a href="confix.html"><img src="theme/r_arr.gif" border="0"></a></td>
</tr>
</table>
<p>The character set <tt>chset</tt> matches a set of characters over a finite
range bounded by the limits of its template parameter <tt>CharT</tt>. This class
is an optimization of a parser that acts on a set of single characters. The
template class is parameterized by the character type <tt>CharT</tt> and can
work efficiently with 8-, 16-, 32- and even 64-bit characters.</p>
<pre><span class=identifier> </span><span class=keyword>template </span><span class=special>&lt;</span><span class=keyword>typename </span><span class=identifier>CharT </span><span class=special>= </span><span class=keyword>char</span><span class=special>&gt;
</span><span class=keyword>class </span><span class=identifier>chset</span><span class=special>;</span></pre>
<p>A <tt>chset</tt> can be constructed from literals (e.g. <tt>'x'</tt>), <tt>ch_p</tt>
or <tt>chlit&lt;&gt;</tt>, <tt>range_p</tt> or <tt>range&lt;&gt;</tt>, <tt>anychar_p</tt>
and <tt>nothing_p</tt> (see <a href="primitives.html">primitives</a>), or copy-constructed
from another <tt>chset</tt>. The <tt>chset</tt> class uses a copy-on-write scheme
so that instances can be passed around cheaply by value.</p>
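<p>To illustrate the copy-on-write idea mentioned above, here is a minimal sketch.
It is not Spirit's implementation: the class name <tt>cow_charset</tt> and its
members are hypothetical, it assumes 8-bit characters stored in a <tt>std::bitset</tt>,
and it assumes single-threaded use. Copies share one bitset until one of them is modified.</p>

```cpp
#include <bitset>
#include <memory>

// Hypothetical illustration type; NOT Spirit's actual chset implementation.
// Assumes 8-bit characters and single-threaded use.
class cow_charset {
public:
    cow_charset() : bits_(std::make_shared<std::bitset<256>>()) {}

    // Copying only copies the shared pointer, so passing by value is cheap.
    cow_charset(const cow_charset&) = default;
    cow_charset& operator=(const cow_charset&) = default;

    bool test(unsigned char ch) const { return bits_->test(ch); }

    // Mutating operations detach from any other owners first.
    void set(unsigned char ch) {
        detach();
        bits_->set(ch);
    }

    // True when both objects still refer to the same underlying bitset.
    bool shares_storage_with(const cow_charset& other) const {
        return bits_ == other.bits_;
    }

private:
    // Make a private copy of the data if it is shared with another instance.
    void detach() {
        if (bits_.use_count() > 1)
            bits_ = std::make_shared<std::bitset<256>>(*bits_);
    }

    std::shared_ptr<std::bitset<256>> bits_;
};
```

<p>In this sketch the cost of copying the 256-bit table is only paid by the first
write after a copy; read-only copies never duplicate the data.</p>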
<table width="80%" border="0" align="center">
<tr>
<td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Sparse
bit vectors</b><br>
<br>
To accommodate 16-, 32- and 64-bit characters, the <tt>chset</tt> class
statically switches from a <tt>std::bitset</tt> implementation, used when the
character type is no wider than 8 bits, to a sparse bit/boolean set that
uses a sorted vector of disjoint ranges (<tt>range_run</tt>). The set is
constructed from ranges such that adjacent or overlapping ranges are coalesced.<br>
<br>
A <tt>range_run</tt> is very space-economical when the set consists of many
ranges and only a few individual disjoint values. Searching is O(log n), where
n is the number of ranges.</td>
</tr>
</table>
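<p>The sorted-vector-of-ranges idea described in the note can be sketched as follows.
This is an illustration only, not Spirit's <tt>range_run</tt> class: the names
<tt>char_range</tt> and <tt>sparse_set</tt> are hypothetical. Insertion coalesces
overlapping or adjacent ranges, and lookup is a binary search over the ranges.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration; NOT Spirit's actual range_run class.
template <typename CharT>
struct char_range {
    CharT first;  // inclusive lower bound
    CharT last;   // inclusive upper bound
};

template <typename CharT>
class sparse_set {
public:
    // Insert [lo, hi], merging with any overlapping or adjacent ranges.
    void set(CharT lo, CharT hi) {
        assert(lo <= hi);
        // True when r ends before `value` with a gap of at least one character.
        auto ends_before = [](const char_range<CharT>& r, CharT value) {
            return r.last < value && r.last + 1 < value;
        };
        // First stored range that overlaps or touches the new one (binary search).
        auto first = std::lower_bound(runs_.begin(), runs_.end(), lo, ends_before);
        auto last = first;
        // Absorb every following range that overlaps or is adjacent to [lo, hi].
        while (last != runs_.end() && !(hi < last->first && hi + 1 < last->first)) {
            lo = std::min(lo, last->first);
            hi = std::max(hi, last->last);
            ++last;
        }
        first = runs_.erase(first, last);
        runs_.insert(first, char_range<CharT>{lo, hi});
    }

    // O(log n) membership test, where n is the number of stored ranges.
    bool test(CharT ch) const {
        auto it = std::lower_bound(
            runs_.begin(), runs_.end(), ch,
            [](const char_range<CharT>& r, CharT value) { return r.last < value; });
        return it != runs_.end() && it->first <= ch;
    }

    std::size_t range_count() const { return runs_.size(); }

private:
    std::vector<char_range<CharT>> runs_;  // sorted, disjoint, non-adjacent
};
```

<p>Because adjacent ranges are merged on insertion, a set such as "all lowercase letters"
occupies a single entry regardless of how many pieces it was built from.</p>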
<p> Examples:<br>
</p>
<pre><span class=identifier> </span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s1</span><span class=special>(</span><span class=literal>'x'</span><span class=special>);
</span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s2</span><span class=special>(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>s1</span><span class=special>);</span></pre>
<p>Optionally, character sets may also be constructed from a definition string
whose syntax resembles POSIX-style regular expression character sets,
except that double quotes delimit the set elements instead of square brackets
and there is no special negation <tt>^</tt> character.</p>
<pre> <span class=identifier>range </span><span class=special>= </span><span class=identifier>anychar_p </span><span class=special>&gt;&gt; </span><span class=literal>'-' </span><span class=special>&gt;&gt; </span><span class=identifier>anychar_p</span><span class=special>;
</span><span class=identifier>set </span><span class=special>= *(</span><span class=identifier>range </span><span class=special>| </span><span class=identifier>anychar_p</span><span class=special>);</span></pre>
<p>Since the set is defined using a C string, the usual C/C++ string literal
syntax rules apply. Examples:<br>
</p>
<pre> <span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s1</span><span class=special>(</span><span class=string>&quot;a-zA-Z&quot;</span><span class=special>); </span><span class=comment>// alphabetic characters
</span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s2</span><span class=special>(</span><span class=string>&quot;0-9a-fA-F&quot;</span><span class=special>); </span><span class=comment>// hexadecimal characters
</span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s3</span><span class=special>(</span><span class=string>&quot;actgACTG&quot;</span><span class=special>); </span><span class=comment>// DNA identifiers
</span><span class=identifier>chset</span><span class=special>&lt;&gt; </span><span class=identifier>s4</span><span class=special>(</span><span class=string>&quot;\x7f\x7e&quot;</span><span class=special>); </span><span class=comment>// Hexadecimal 0x7F and 0x7E</span></pre>
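<p>How such a definition string expands into a set can be sketched with a small
standalone function. The helper <tt>make_set</tt> is hypothetical and not part of
Spirit. It assumes 8-bit characters, treats <tt>x-y</tt> as an inclusive range, and
treats a trailing <tt>-</tt> as a literal character; how Spirit itself handles
such edge cases is not covered here.</p>

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; NOT part of Spirit.
// Expands a definition string such as "a-zA-Z" into a 256-entry bitset.
inline std::bitset<256> make_set(const std::string& definition) {
    std::bitset<256> bits;
    std::size_t i = 0;
    while (i < definition.size()) {
        unsigned char lo = static_cast<unsigned char>(definition[i]);
        // A '-' between two characters denotes an inclusive range.
        if (i + 2 < definition.size() && definition[i + 1] == '-') {
            unsigned char hi = static_cast<unsigned char>(definition[i + 2]);
            // Assumption: a reversed range (lo > hi) simply adds nothing.
            for (unsigned c = lo; c <= hi; ++c) bits.set(c);
            i += 3;
        } else {
            // A single character, including a trailing '-', is taken literally.
            bits.set(lo);
            i += 1;
        }
    }
    return bits;
}
```

<p>With this sketch, <tt>make_set("0-9a-fA-F")</tt> sets the bits of the 22 hexadecimal
digits (on an ASCII-based character set), and <tt>make_set("\x7f\x7e")</tt> sets
exactly the two bits 0x7F and 0x7E.</p>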
<p>The standard Spirit set operators apply (see <a href="operators.html">operators</a>),
plus an additional character-set-specific inverse (negation, <tt>~</tt>) operator:</p>
<table width="90%" border="0" align="center">
<tr>
<td class="table_title" colspan="2">Character set operators</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>~a</b></td>
<td class="table_cells" width="72%">Set inverse</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>a | b</b></td>
<td class="table_cells" width="72%">Set union</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>a &amp; b</b></td>
<td class="table_cells" width="72%">Set intersection</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>a - b</b></td>
<td class="table_cells" width="72%">Set difference</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>a ^ b</b></td>
<td class="table_cells" width="72%">Set xor</td>
</tr>
</table>
<p>where operands <tt>a</tt> and <tt>b</tt> are both <tt>chset</tt>s, or one of the
operands is a literal character, <tt>ch_p</tt> or <tt>chlit</tt>, <tt>range_p</tt> or <tt>range</tt>,
<tt>anychar_p</tt> or <tt>nothing_p</tt>. Special optimized overloads are provided
for <tt>anychar_p</tt> and <tt>nothing_p</tt> operands. A <tt>nothing_p</tt>
operand is converted to an empty set, while an <tt>anychar_p</tt> operand is
converted to a set containing the full range of the character type used
(e.g. 0-255 for unsigned 8-bit chars).</p>
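<p>The semantics of these operators can be illustrated on plain <tt>std::bitset</tt>
values. This is a sketch under the assumption of 8-bit characters; the alias
<tt>charset8</tt> and the helper functions are hypothetical stand-ins, not Spirit's
<tt>chset</tt> operators. The empty and full sets play the roles described for
<tt>nothing_p</tt> and <tt>anychar_p</tt> operands.</p>

```cpp
#include <bitset>

// Hypothetical stand-in for chset<unsigned char>; NOT Spirit's API.
using charset8 = std::bitset<256>;

// Set containing every character in the inclusive range [lo, hi].
inline charset8 range_set(unsigned char lo, unsigned char hi) {
    charset8 s;
    for (unsigned c = lo; c <= hi; ++c) s.set(c);
    return s;
}

inline charset8 empty_set() { return charset8(); }       // role of nothing_p
inline charset8 full_set() { return charset8().set(); }  // role of anychar_p

inline charset8 set_inverse(const charset8& a) { return ~a; }                                  // ~a
inline charset8 set_union(const charset8& a, const charset8& b) { return a | b; }              // a | b
inline charset8 set_intersection(const charset8& a, const charset8& b) { return a & b; }       // a & b
inline charset8 set_difference(const charset8& a, const charset8& b) { return a & ~b; }        // a - b
inline charset8 set_xor(const charset8& a, const charset8& b) { return a ^ b; }                // a ^ b
```

<p>For example, with <tt>lower = range_set('a', 'z')</tt> and
<tt>hex = range_set('0', '9') | range_set('a', 'f')</tt>, the intersection contains
the six characters a to f and the difference <tt>lower - hex</tt> contains the
remaining twenty letters g to z.</p>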
<p>A special case is <tt>~anychar_p</tt>, which yields <tt>nothing_p</tt>, whereas
<tt>~nothing_p</tt> is illegal. Inversion of <tt>anychar_p</tt> is asymmetrical:
a one-way trip, comparable to converting <tt>T*</tt> to <tt>void*</tt>.</p>
<table width="90%" border="0" align="center">
<tr>
<td class="table_title" colspan="2">Special conversions</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>chset&lt;CharT&gt;(nothing_p)</b></td>
<td class="table_cells" width="72%">empty set</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>chset&lt;CharT&gt;(anychar_p)</b></td>
<td class="table_cells" width="72%">full range of CharT (e.g. 0-255 for unsigned
8 bit chars)</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>~anychar_p</b></td>
<td class="table_cells" width="72%">nothing_p</td>
</tr>
<tr>
<td class="table_cells" width="28%"><b>~nothing_p</b></td>
<td class="table_cells" width="72%">illegal</td>
</tr>
</table>
<p></p><table border="0">
<tr>
<td width="10"></td>
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
<td width="30"><a href="loops.html"><img src="theme/l_arr.gif" border="0"></a></td>
<td width="30"><a href="confix.html"><img src="theme/r_arr.gif" border="0"></a></td>
</tr>
</table>
<br>
<hr size="1">
<p class="copyright">Copyright &copy; 1998-2003 Joel de Guzman<br>
<br>
<font size="2">Use, modification and distribution is subject to the Boost Software
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt) </font> </p>
</body>
</html>