<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content=
"HTML Tidy for Windows (vers 1st February 2003), see www.w3.org"
name="generator">
<title>
Quick Start
</title>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>
<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
<tr>
<td width="10"></td>
<td width="85%">
<font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Quick
Start</b></font>
</td>
<td width="112">
<a href="http://spirit.sf.net"><img src="theme/spirit.gif"
width="112" height="48" align="right" border="0"></a>
</td>
</tr>
</table><br>
<table border="0">
<tr>
<td width="10"></td>
<td width="30">
<a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
</td>
<td width="30">
<a href="introduction.html"><img src="theme/l_arr.gif" border="0">
</a>
</td>
<td width="30">
<a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0">
</a>
</td>
</tr>
</table>
<h2>
<b>Why would you want to use Spirit?</b>
</h2>
<p>
Spirit is designed to be a practical parsing tool. At the very least, the
ability to generate a fully working parser from a formal EBNF
specification inlined in C++ significantly reduces development time.
While it may be practical to use a full-blown, stand-alone parser
generator such as YACC or ANTLR when we want to develop a computer
language such as C or Pascal, it is certainly overkill to bring in the
big guns when we wish to write extremely small micro-parsers. At that end
of the spectrum, programmers typically approach the job at hand not as a
formal parsing task but through ad hoc hacks using primitive tools such as
<tt>scanf</tt>. True, there are tools such as regular-expression
libraries (such as <a href=
"http://www.boost.org/libs/regex/index.html">boost regex</a>) or scanners
(such as <a href="http://www.boost.org/libs/tokenizer/index.html">boost
tokenizer</a>), but these tools do not scale well when we need to write
more elaborate parsers. Attempting to write even a moderately complex
parser using these tools leads to code that is hard to understand and
maintain.
</p>
<p>
One prime objective is to make the tool easy to use. When one thinks of a
parser generator, the usual reaction is "it must be big and complex with
a steep learning curve." Not so. Spirit is designed to be fully scalable.
The framework is structured in layers. This permits learning on an
as-needed basis, after learning only the minimal core and basic concepts.
</p>
<p>
For development simplicity and ease in deployment, the entire framework
consists of only header files, with no libraries to link against or
build. Just put the Spirit distribution in your include path, compile and
run. Code size? Very tight. In the quick start example that we shall
present in a short while, the code size is dominated by the instantiation
of the <tt>std::vector</tt> and <tt>std::iostream</tt>.
</p>
<h2>
<b>Trivial Example #1</b></h2>
<p>Create a parser that will parse
a floating-point number.
</p>
<pre><code><font color="#000000"> </font></code><span class="identifier">real_p</span>
</pre>
<p>
(You've got to admit, that's trivial!) The above code actually generates
a Spirit <tt>real_parser</tt> (a built-in parser) which parses a
floating-point number. Take note that parsers that are meant to be used
directly by the user end with "<tt>_p</tt>" in their names as a Spirit
convention. Spirit has many pre-defined parsers, and consistent naming
conventions help you keep from going insane!
</p>
<h2>
<b>Trivial Example #2</b></h2>
<p>
Create a parser that will accept a line consisting of two floating-point
numbers.
</p>
<pre><code><font color="#000000"> </font></code><code><span class=
"identifier">real_p</span> <span class=
"special">&gt;&gt;</span> <span class="identifier">real_p</span></code>
</pre>
<p>
Here you see the familiar floating-point numeric parser
<code><tt>real_p</tt></code> used twice, once for each number. What's
that <tt class="operators">&gt;&gt;</tt> operator doing in there? Well,
they had to be separated by something, and this was chosen as the
"followed by" sequence operator. The above program creates a parser from
two simpler parsers, gluing them together with the sequence operator.
The result is a parser that is a composition of smaller parsers.
Whitespace between numbers can implicitly be consumed depending on how
the parser is invoked (see below).
</p>
<p>
Note: when we combine parsers, we end up with a "bigger" parser, but it's
still a parser. Parsers can get bigger and bigger, nesting more and more,
but whenever you glue two parsers together, you end up with one bigger
parser. This is an important concept.
</p>
<h2>
<b>Trivial Example #3</b></h2>
<p>
Create a parser that will accept an arbitrary number of floating-point
numbers. (Arbitrary means anything from zero to infinity.)
</p>
<pre><code><font color="#000000"> </font></code><code><span class=
"special">*</span><span class="identifier">real_p</span></code>
</pre>
<p>
This is like a regular-expression Kleene Star, though the syntax might
look a bit odd for a C++ programmer not used to seeing the <tt class=
"operators">*</tt> operator overloaded like this. Actually, if you know
regular expressions it may look odd too since the star is <b>before</b>
the expression it modifies. C'est la vie. Blame it on the fact that we
must work with the syntax rules of C++.
</p>
<p>
Any expression that evaluates to a parser may be used with the Kleene
Star. Keep in mind, though, that due to C++ operator precedence rules you
may need to put the expression in parentheses for complex expressions.
The Kleene Star is also known as a Kleene Closure, but we call it the
Star in most places.
</p>
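<p>
To make the idea of composition concrete, here is a minimal sketch of how
<tt>&gt;&gt;</tt> and unary <tt>*</tt> could be overloaded so that
combining parsers yields another parser. It is a toy written for this
page, not Spirit's implementation: the namespace <tt>toy</tt> and the
names <tt>parser</tt>, <tt>digit</tt>, <tt>ch</tt> and
<tt>match_length</tt> are hypothetical, and it matches single digits
rather than real numbers to stay short.
</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

namespace toy {

// A toy parser: given the input and a start position, it returns the
// position just past the match, or npos when it fails.
// (Hypothetical type for illustration only.)
struct parser {
    std::function<std::size_t(const std::string&, std::size_t)> run;
};

const std::size_t npos = std::string::npos;

// Primitive: matches exactly one decimal digit.
inline parser digit() {
    return parser{[](const std::string& s, std::size_t pos) -> std::size_t {
        if (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
            return pos + 1;
        return npos;
    }};
}

// Primitive: matches one specific character.
inline parser ch(char c) {
    return parser{[c](const std::string& s, std::size_t pos) -> std::size_t {
        return (pos < s.size() && s[pos] == c) ? pos + 1 : npos;
    }};
}

// Sequence: "a followed by b". The result is itself a parser.
inline parser operator>>(parser a, parser b) {
    return parser{[a, b](const std::string& s, std::size_t pos) -> std::size_t {
        std::size_t mid = a.run(s, pos);
        return mid == npos ? npos : b.run(s, mid);
    }};
}

// Kleene star: zero or more repetitions of p. It never fails.
inline parser operator*(parser p) {
    return parser{[p](const std::string& s, std::size_t pos) -> std::size_t {
        for (;;) {
            std::size_t next = p.run(s, pos);
            if (next == npos || next == pos)  // stop on failure or empty match
                return pos;
            pos = next;
        }
    }};
}

// Number of characters matched from the start of s, or -1 on failure.
inline long match_length(const parser& p, const std::string& s) {
    std::size_t end = p.run(s, 0);
    assert(end == npos || end <= s.size());
    return end == npos ? -1 : static_cast<long>(end);
}

} // namespace toy
```

<p>
With these definitions, <tt>toy::digit() &gt;&gt; toy::digit()</tt> and
<tt>*toy::digit()</tt> are ordinary C++ expressions whose values are
parsers, so they can be nested to any depth. Note the parentheses needed
when the star applies to a sequence, as in
<tt>*(toy::ch(',') &gt;&gt; toy::digit())</tt>.
</p>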
<h3>
<b><a name="list_of_numbers"></a> Example #4 [A Just Slightly Less Trivial Example]</b>
</h3>
<p>
This example will create a parser that accepts a comma-delimited list of numbers and puts the numbers in a vector.
</p>
<h4><strong> Step 1. Create the parser</strong></h4>
<pre><code><font color="#000000"> </font></code><code><span class=
"identifier">real_p</span> <span class=
"special">&gt;&gt;</span> <span class="special">*(</span><span class=
"identifier">ch_p</span><span class="special">(</span><span class=
"literal">','</span><span class="special">)</span> <span class=
"special">&gt;&gt;</span> <span class=
"identifier">real_p</span><span class="special">)</span></code>
</pre>
<p>
Notice <tt>ch_p(',')</tt>. It is a literal character parser that can
recognize the comma <tt>','</tt>. In this case, the Kleene Star is
modifying a more complex parser, namely, the one generated by the
expression:
</p>
<pre><code><font color="#000000"> </font></code><code><span class=
"special">(</span><span class="identifier">ch_p</span><span class=
"special">(</span><span class="literal">','</span><span class=
"special">)</span> <span class="special">&gt;&gt;</span> <span class=
"identifier">real_p</span><span class="special">)</span></code>
</pre>
<p>
Note that this is a case where the parentheses are necessary. The Kleene
Star encloses the complete expression above.
</p>
<h4>
<b><strong>Step 2. </strong>Using a Parser (now that it's created)</b></h4>
<p>
Now that we have created a parser, how do we use it? Like the result of
any C++ temporary object, we can either store it in a variable, or call
functions directly on it.
</p>
<p>
We'll gloss over some low-level C++ details and just get to the good
stuff.
</p>
<p>
Don't worry about exactly what rules are for now; they will be discussed
later. Suffice it to say that a rule is a placeholder variable that can
hold a parser. If <b><tt>r</tt></b> is a rule, then we store the parser
in it like this:
</p>
<pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
"identifier">r</span> <span class="special">=</span> <span class=
"identifier">real_p</span> <span class=
"special">&gt;&gt; *(</span><span class=
"identifier">ch_p</span><span class="special">(</span><span class=
"literal">','</span><span class="special">) &gt;&gt;</span> <span class=
"identifier">real_p</span><span class="special">);</span></font></code>
</pre>
<p>
Not too exciting, just an assignment like any other C++ expression you've
used for years. The cool thing about storing a parser in a rule is this:
a rule is itself a parser, and now you can refer to it <b>by name</b>.
(In this case the name is <tt><b>r</b></tt>.) Notice that this is now a
full assignment expression, thus we terminate it with a semicolon,
"<tt>;</tt>".
</p>
<p>
That's it. We're done with defining the parser. The next step is
invoking this parser to do its work. There are a couple of ways to do
this. For now, we shall use the free <tt>parse</tt> function that takes
in a <tt>char const*</tt>. The function accepts three arguments:
</p>
<blockquote>
<p>
<img src="theme/bullet.gif" width="12" height="12"> The null-terminated
<tt>const char*</tt> input<br>
<img src="theme/bullet.gif" width="12" height="12"> The parser
object<br>
<img src="theme/bullet.gif" width="12" height="12"> Another parser
called the <b>skip parser</b>
</p>
</blockquote>
<p>
In our example, we wish to skip spaces and tabs. Another parser named
<tt>space_p</tt> is included in Spirit's repertoire of predefined
parsers. It is a very simple parser that simply recognizes whitespace. We
shall use <tt>space_p</tt> as our skip parser. The skip parser is the one
responsible for skipping characters in between parser elements such as
the <tt>real_p</tt> and the <tt>ch_p</tt>.
</p>
<p>
OK, so now let's parse!
</p>
<pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
"identifier">r</span> <span class="special">=</span> <span class=
"identifier">real_p</span> <span class=
"special">&gt;&gt;</span> <span class="special">*(</span><span class=
"identifier">ch_p</span><span class="special">(</span><span class=
"literal">','</span><span class="special">)</span> <span class=
"special">&gt;&gt;</span> <span class=
"identifier">real_p</span><span class="special">);
</span> <span class="identifier"> parse</span><span class=
"special">(</span><span class="identifier">str</span><span class=
"special">,</span> <span class="identifier">r</span><span class=
"special">,</span> <span class="identifier">space_p</span><span class=
"special">)</span> <span class=
"comment">// Not a full statement yet, patience...</span></font></code>
</pre>
<p>
The parse function returns an object (called <tt>parse_info</tt>) that
holds, among other things, the result of the parse. In this example, we
need to know:
</p>
<blockquote>
<p>
<img src="theme/bullet.gif" width="12" height="12"> Did the parser
successfully recognize the input <tt>str</tt>?<br>
<img src="theme/bullet.gif" width="12" height="12"> Did the parser
<b>fully</b> parse and consume the input up to its end?
</p>
</blockquote>
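<p>
The difference between the two questions above is easy to miss, so here
is a hand-written recognizer for the same grammar that reports both
answers separately. It is only a sketch in plain C++, not Spirit code:
<tt>toy_parse_info</tt>, <tt>skip_ws</tt>, <tt>match_real</tt> and
<tt>recognize_numbers</tt> are hypothetical names, and
<tt>std::strtod</tt> stands in for the number parser even though it
accepts a somewhat different syntax (hexadecimal forms and
<tt>inf</tt>/<tt>nan</tt>, for instance). Treating trailing whitespace as
unconsumed is a choice made by this sketch.
</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical stand-in for the information a parse call reports.
struct toy_parse_info {
    const char* stop;  // first character that was not consumed
    bool hit;          // the grammar matched some prefix of the input
    bool full;         // it matched and the whole input was consumed
};

// Skip the characters a whitespace skip parser would consume.
inline const char* skip_ws(const char* p) {
    while (*p != '\0' && std::isspace(static_cast<unsigned char>(*p)))
        ++p;
    return p;
}

// Try to read one real number at p (after skipping whitespace). Returns
// the position just past the number, or a null pointer if there is none.
inline const char* match_real(const char* p) {
    p = skip_ws(p);
    char* end = nullptr;
    std::strtod(p, &end);
    return end == p ? nullptr : end;
}

// Hand-written equivalent of: real >> *(',' >> real), skipping whitespace
// in between the elements.
inline toy_parse_info recognize_numbers(const char* str) {
    assert(str != nullptr);
    toy_parse_info info = { str, false, false };
    const char* p = match_real(str);
    if (p == nullptr)
        return info;                   // not even one number: no match
    for (;;) {
        const char* q = skip_ws(p);
        if (*q != ',')
            break;                     // no comma: the star stops here
        const char* r = match_real(q + 1);
        if (r == nullptr)
            break;                     // comma without a number: back up
        p = r;
    }
    info.hit = true;
    info.stop = p;
    info.full = (*p == '\0');
    return info;
}
```

<p>
For the input <tt>"1,2,x"</tt> this sketch reports a hit (the prefix
<tt>"1,2"</tt> is a valid list) but not a full match, with
<tt>stop</tt> left pointing at the second comma.
</p>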
<p>
To get a complete picture of what we have so far, let us also wrap this
parser inside a function:
</p>
<pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
"keyword">bool
</span> <span class="identifier"> parse_numbers</span><span class=
"special">(</span><span class="keyword">char</span> <span class=
"keyword">const</span><span class="special">*</span> <span class=
"identifier">str</span><span class="special">)
{
</span> <span class="keyword"> return</span> <span class=
"identifier">parse</span><span class="special">(</span><span class=
"identifier">str</span><span class="special">,</span> <span class=
"identifier">real_p</span> <span class=
"special">&gt;&gt;</span> <span class="special">*(</span><span class=
"literal">','</span> <span class="special">&gt;&gt;</span> <span class=
"identifier">real_p</span><span class="special">),</span> <span class=
"identifier">space_p</span><span class="special">).</span><span class=
"identifier">full</span><span class="special">;
}</span></font></code>
</pre>
<p>
Note that in this case we dropped the named rule and inlined the parser
directly in the call to <tt>parse</tt>. Upon calling <tt>parse</tt>, the
expression evaluates into a temporary, unnamed parser which is passed
into the <tt>parse()</tt> function, used, and then destroyed.
</p>
<table border="0" width="80%" align="center">
<tr>
<td class="note_box">
<img src="theme/note.gif" width="16" height="16"><b>char and wchar_t
operands</b><br>
<br>
The careful reader may notice that the parser expression has
<tt class="quotes">','</tt> instead of <tt>ch_p(',')</tt> as the
previous examples did. This is OK due to C++ overloading and
conversion rules. There are <tt>&gt;&gt;</tt> operators that are
overloaded to accept a <tt>char</tt> or <tt>wchar_t</tt> argument on
their left or right (but not both). An operator may be overloaded if
at least one of its parameters is a user-defined type. In this case,
the <tt>real_p</tt> is the 2nd argument to <tt>operator<span class=
"operators">&gt;&gt;</span></tt>, and so the proper overload of
<tt class="operators">&gt;&gt;</tt> is used, converting
<tt class="quotes">','</tt> into a character literal parser.<br>
<br>
The problem with omitting the <tt>ch_p</tt> call should be obvious:
<tt>'a' &gt;&gt; 'b'</tt> is <b>not</b> a Spirit parser, it is a
numeric expression, right-shifting the ASCII (or another encoding)
value of <tt class="quotes">'a'</tt> by the ASCII value of
<tt class="quotes">'b'</tt>. However, both <tt>ch_p('a') &gt;&gt;
'b'</tt> and <tt>'a' &gt;&gt; ch_p('b')</tt> are Spirit sequence
parsers for the letter <tt class="quotes">'a'</tt> followed by
<tt class="quotes">'b'</tt>. You'll get used to it, sooner or
later.
</td>
</tr>
</table>
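<p>
The point of the note can be checked in plain C++ without Spirit at all.
The sketch below only inspects the <em>type</em> of
<tt>'a' &gt;&gt; 'b'</tt>; it deliberately does not evaluate it, because
with ASCII values the shift count (98) exceeds the width of
<tt>int</tt> on common platforms, which makes the shift itself undefined.
The helper <tt>half_of</tt> is a hypothetical name used only to show a
well-defined shift on a character.
</p>

```cpp
#include <cassert>
#include <type_traits>

// decltype does not evaluate its operand, so this only inspects the type.
// Both operands are plain chars, so the built-in shift operator is chosen
// and the result is an ordinary int, not a parser object.
static_assert(std::is_same<decltype('a' >> 'b'), int>::value,
              "'a' >> 'b' is built-in integer arithmetic");

// A well-defined shift for comparison: the character's value shifted
// right by one bit, i.e. divided by two (rounding down).
inline int half_of(char c) {
    assert(c >= 0);  // keep the illustration to non-negative code values
    return c >> 1;
}
```

<p>
No user-defined type takes part in <tt>'a' &gt;&gt; 'b'</tt>, so no
library can overload it. Wrapping either operand in <tt>ch_p</tt> is what
brings Spirit's overloads into play.
</p>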
<p>
Take note that the object returned from the parse function has a member
called <tt>full</tt> which is true if both of our requirements above
are met (i.e. the parser fully parsed the input).
</p>
<h4>
<b> Step 3. Semantic Actions</b></h4>
<p>
Our parser above is really nothing but a recognizer. It answers the
question <i class="quotes">"did the input match our grammar?"</i>, but it
does not remember any data, nor does it perform any side effects.
Remember: we want to put the parsed numbers into a vector. This is done
in an <b>action</b> that is linked to a particular parser. For example,
whenever we parse a real number, we wish to store the parsed number after
a successful match. We now wish to extract information from the parser.
Semantic actions do this. Semantic actions may be attached to any point
in the grammar specification. These actions are C++ functions or functors
that are called whenever a part of the parser successfully recognizes a
portion of the input. Say you have a parser <b>P</b>, and a C++ function
<b>F</b>, you can make the parser call <b>F</b> whenever it matches an
input by attaching <b>F</b>:
</p>
<pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
"identifier">P</span><span class="special">[&amp;</span><span class=
"identifier">F</span><span class="special">]</span></font></code>
</pre>
<p>
Or if <b>F</b> is a function object (a functor):
</p>
<pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
"identifier">P</span><span class="special">[</span><span class=
"identifier">F</span><span class="special">]</span></font></code>
</pre>
<p>
The function/functor signature depends on the type of the parser to which
it is attached. The parser <tt>real_p</tt> passes a single argument: the
parsed number. Thus, if we were to attach a function <b>F</b> to
<tt>real_p</tt>, we need <b>F</b> to be declared as:
</p>
<pre><code> </code><code><span class=
"keyword">void</span> <span class="identifier">F</span><span class=
"special">(</span><span class="keyword">double</span> <span class=
"identifier">n</span><span class="special">);</span></code></pre>
<p>
For our example, however, we can again take advantage of some predefined
semantic functors and functor generators (<img src="theme/lens.gif"
width="15" height="16"> A functor generator is a function that returns
a functor). For our purpose, Spirit has a functor generator
<tt>push_back_a(c)</tt>. In brief, this semantic action, when called,
<b>appends</b> the parsed value it receives from the parser it is
attached to, to the container <tt>c</tt>.
</p>
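<p>
To show what "a function that returns a functor" means in practice, here
is a minimal sketch of a functor generator that appends to a container.
It illustrates the pattern only and is not Spirit's
<tt>push_back_a</tt>: the names <tt>push_back_action</tt>,
<tt>push_back_to</tt> and <tt>fire_for_each</tt> are hypothetical, and
<tt>fire_for_each</tt> merely stands in for a parser calling its attached
action once per matched number.
</p>

```cpp
#include <cassert>
#include <vector>

// The functor: remembers a container and appends whatever value it is
// called with. (Hypothetical name, for illustration only.)
template <typename Container>
class push_back_action {
public:
    explicit push_back_action(Container& c) : c_(&c) {}

    // Called with the parsed value; appends it to the container.
    template <typename Value>
    void operator()(const Value& value) const {
        c_->push_back(value);
    }

private:
    Container* c_;  // not owned; the container must outlive the functor
};

// The functor generator: deduces the container type from its argument
// and returns a functor bound to that container.
template <typename Container>
push_back_action<Container> push_back_to(Container& c) {
    return push_back_action<Container>(c);
}

// Stand-in for a parser invoking its attached action once per match.
template <typename Action>
void fire_for_each(const double* first, const double* last, Action action) {
    assert(first <= last);
    for (; first != last; ++first)
        action(*first);
}
```

<p>
The generator exists so that the caller never has to spell out the
functor's type: writing <tt>push_back_to(v)</tt> is enough, and the
functor keeps only a pointer to <tt>v</tt>, so copies of it all append to
the same container.
</p>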
<p>
Finally, here is our complete comma-separated list parser:
</p>
<pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
"keyword">bool
</span> <span class="identifier">parse_numbers</span><span class=
"special">(</span><span class="keyword">char</span> <span class=
"keyword">const</span><span class="special">*</span> <span class=
"identifier">str</span><span class="special">,</span> <span class=
"identifier">vector</span><span class="special">&lt;</span><span class=
"keyword">double</span><span class=
"special">&gt;&amp;</span> <span class="identifier">v</span><span class=
"special">)
{
</span> <span class="keyword">return</span> <span class=
"identifier">parse</span><span class="special">(</span><span class=
"identifier">str</span><span class="special">,
</span> <span class="comment"> // Begin grammar
</span> <span class="special"> (
</span> <span class="identifier">real_p</span><span class=
"special">[</span><span class="identifier">push_back_a</span><span class=
"special">(</span><span class="identifier">v</span><span class=
"special">)]</span> <span class="special">&gt;&gt;</span> <span class=
"special">*(</span><span class="literal">','</span> <span class=
"special">&gt;&gt;</span> <span class=
"identifier">real_p</span><span class="special">[</span><span class=
"identifier">push_back_a</span><span class="special">(</span><span class=
"identifier">v</span><span class="special">)])
)
</span> <span class="special"> ,
</span> <span class="comment"> // End grammar
</span> <span class="identifier"> space_p</span><span class=
"special">).</span><span class="identifier">full</span><span class="special">;
}</span></font></code>
</pre>
<p>
This is the same parser as above, this time with appropriate semantic
actions attached at strategic places to extract the parsed numbers and
stuff them in the vector <tt>v</tt>. The <tt>parse_numbers</tt> function
returns true when successful.
</p>
<p>
<img src="theme/lens.gif" width="15" height="16"> The full source code
can be <a href="../example/fundamental/number_list.cpp">viewed here</a>.
This is part of the Spirit distribution.
</p>
<table border="0">
<tr>
<td width="10"></td>
<td width="30">
<a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
</td>
<td width="30">
<a href="introduction.html"><img src="theme/l_arr.gif" border="0">
</a>
</td>
<td width="30">
<a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0">
</a>
</td>
</tr>
</table><br>
<hr size="1">
<p class="copyright">
Copyright &copy; 1998-2003 Joel de Guzman<br>
Copyright &copy; 2002 Chris Uzdavinis<br>
<br>
<font size="2">Use, modification and distribution is subject to the
Boost Software License, Version 1.0. (See accompanying file
LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)</font>
</p>
<blockquote>&nbsp;
</blockquote>
</body>
</html>