[/
 / Copyright (c) 2008 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section Semantic Actions and User-Defined Assertions]

[h2 Overview]

Imagine you want to parse an input string and build a `std::map<>` from it. For
something like that, matching a regular expression isn't enough. You want to
/do something/ when parts of your regular expression match. Xpressive lets
you attach semantic actions to parts of your static regular expressions. This
section shows you how.

[h2 Semantic Actions]

Consider the following code, which uses xpressive's semantic actions to parse
a string of word/integer pairs and stuffs them into a `std::map<>`. It is
described below.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, int> result;
        std::string str("aaa=>1 bbb=>23 ccc=>456");

        // Match a word and an integer, separated by =>,
        // and then stuff the result into a std::map<>
        sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
            [ ref(result)[s1] = as<int>(s2) ];

        // Match one or more word/integer pairs, separated
        // by whitespace.
        sregex rx = pair >> *(+_s >> pair);

        if(regex_match(str, rx))
        {
            std::cout << result["aaa"] << '\n';
            std::cout << result["bbb"] << '\n';
            std::cout << result["ccc"] << '\n';
        }

        return 0;
    }

This program prints the following:

[pre
1
23
456
]

The regular expression `pair` has two parts: the pattern and the action. The
pattern says to match a word, capturing it in sub-match 1, and an integer,
capturing it in sub-match 2, separated by `"=>"`. The action is the part in
square brackets: `[ ref(result)[s1] = as<int>(s2) ]`. It says to take sub-match
1 and use it to index into the `result` map, and assign to it the result of
converting sub-match 2 to an integer.

[note To use semantic actions with your static regexes, you must
`#include <boost/xpressive/regex_actions.hpp>`]

How does this work? Just like the rest of the static regular expression, the part
between brackets is an expression template. It encodes the action and executes
it later. The expression `ref(result)` creates a lazy reference to the `result`
object. The larger expression `ref(result)[s1]` is a lazy map index operation.
Later, when this action is executed, `s1` is replaced with the
first _sub_match_. Likewise, when `as<int>(s2)` is executed, `s2` is replaced
with the second _sub_match_. The `as<>` action converts its argument to the
requested type using Boost.Lexical_cast. The effect of the whole action is to
insert a new word/integer pair into the map.

[note There is an important difference between the function `boost::ref()` in
`<boost/ref.hpp>` and `boost::xpressive::ref()` in
`<boost/xpressive/regex_actions.hpp>`. The first returns a plain
`reference_wrapper<>` which behaves in many respects like an ordinary
reference. By contrast, `boost::xpressive::ref()` returns a /lazy/ reference
that you can use in expressions that are executed lazily. That is why we can
say `ref(result)[s1]`, even though `result` doesn't have an `operator[]` that
would accept `s1`.]

In addition to the sub-match placeholders `s1`, `s2`, etc., you can also use
the placeholder `_` within an action to refer back to the string matched by
the sub-expression to which the action is attached. For instance, you can use
the following regex to match a bunch of digits, interpret them as an integer
and assign the result to a local variable:

    int i = 0;
    // Here, _ refers back to all the
    // characters matched by (+_d)
    sregex rex = (+_d)[ ref(i) = as<int>(_) ];

[h3 Lazy Action Execution]

What does it mean, exactly, to attach an action to part of a regular expression
and perform a match? When does the action execute? If the action is part of a
repeated sub-expression, does the action execute once or many times? And if the
sub-expression initially matches, but ultimately fails because the rest of the
regular expression fails to match, is the action executed at all?

The answer is that by default, actions are executed /lazily/. When a sub-expression
matches a string, its action is placed on a queue, along with the current
values of any sub-matches to which the action refers. If the match algorithm
must backtrack, actions are popped off the queue as necessary. Only after the
entire regex has matched successfully are the actions actually executed. They
are executed all at once, in the order in which they were added to the queue,
as the last step before _regex_match_ returns.

For example, consider the following regex that increments a counter whenever
it finds a digit.

    int i = 0;
    std::string str("1!2!3?");
    // count the exciting digits, but not the
    // questionable ones.
    sregex rex = +( _d [ ++ref(i) ] >> '!' );
    regex_search(str, rex);
    assert( i == 2 );

The action `++ref(i)` is queued three times: once for each found digit. But
it is only /executed/ twice: once for each digit that precedes a `'!'`
character. When the `'?'` character is encountered, the match algorithm
backtracks, removing the final action from the queue.

[h3 Immediate Action Execution]

When you want semantic actions to execute immediately, you can wrap the
sub-expression containing the action in a [^[funcref boost::xpressive::keep keep()]].
`keep()` turns off back-tracking for its sub-expression, but it also causes
any actions queued by the sub-expression to execute at the end of the `keep()`.
It is as if the sub-expression in the `keep()` were compiled into an
independent regex object, and matching the `keep()` is like a separate invocation
of `regex_search()`. It matches characters and executes actions but never backtracks
or unwinds. For example, imagine the above example had been written as follows:

    int i = 0;
    std::string str("1!2!3?");
    // count all the digits.
    sregex rex = +( keep( _d [ ++ref(i) ] ) >> '!' );
    regex_search(str, rex);
    assert( i == 3 );

We have wrapped the sub-expression `_d [ ++ref(i) ]` in `keep()`. Now, whenever
this regex matches a digit, the action will be queued and then immediately
executed before we try to match a `'!'` character. In this case, the action
executes three times.

[note Like `keep()`, actions within [^[funcref boost::xpressive::before before()]]
and [^[funcref boost::xpressive::after after()]] are also executed early when their
sub-expressions have matched.]

[h3 Lazy Functions]

So far, we've seen how to write semantic actions consisting of variables and
operators. But what if you want to be able to call a function from a semantic
action? Xpressive provides a mechanism to do this.

The first step is to define a function object type. Here, for instance, is a
function object type that calls `push()` on its argument:

    struct push_impl
    {
        // Result type, needed for tr1::result_of
        typedef void result_type;

        template<typename Sequence, typename Value>
        void operator()(Sequence &seq, Value const &val) const
        {
            seq.push(val);
        }
    };

The next step is to use xpressive's `function<>` template to define a function
object named `push`:

    // Global "push" function object.
    function<push_impl>::type const push = {{}};

The initialization looks a bit odd, but this is because `push` is being
statically initialized. That means it doesn't need to be constructed
at runtime. We can use `push` in semantic actions as follows:

    std::stack<int> ints;
    // Match digits, cast them to an int
    // and push it on the stack.
    sregex rex = (+_d)[push(ref(ints), as<int>(_))];

You'll notice that doing it this way causes member function invocations
to look like ordinary function invocations. You can choose to write your
semantic action in a different way that makes it look a bit more like
a member function call:

    sregex rex = (+_d)[ref(ints)->*push(as<int>(_))];

Xpressive recognizes the use of the `->*` and treats this expression
exactly the same as the one above.

When your function object must return a type that depends on its
arguments, you can use a `result<>` member template instead of the
`result_type` typedef. Here, for example, is a `first` function object
that returns the `first` member of a `std::pair<>` or _sub_match_:

    // Function object that returns the
    // first element of a pair.
    struct first_impl
    {
        template<typename Sig> struct result {};

        template<typename This, typename Pair>
        struct result<This(Pair)>
        {
            typedef typename remove_reference<Pair>
                ::type::first_type type;
        };

        template<typename Pair>
        typename Pair::first_type
        operator()(Pair const &p) const
        {
            return p.first;
        }
    };

    // OK, use as first(s1) to get the begin iterator
    // of the sub-match referred to by s1.
    function<first_impl>::type const first = {{}};

[h3 Referring to Local Variables]

As we've seen in the examples above, we can refer to local variables within
an action using `xpressive::ref()`. Any such variables are held by reference
by the regular expression, and care should be taken to avoid letting those
references dangle. For instance, in the following code, the reference to `i`
is left to dangle when `bad_voodoo()` returns:

    sregex bad_voodoo()
    {
        int i = 0;
        sregex rex = +( _d [ ++ref(i) ] >> '!' );
        // ERROR! rex refers by reference to a local
        // variable, which will dangle after bad_voodoo()
        // returns.
        return rex;
    }

When writing semantic actions, it is your responsibility to make sure that
none of the references dangle. One way to do that would be to make the
variables shared pointers that are held by the regex by value.

    sregex good_voodoo(boost::shared_ptr<int> pi)
    {
        // Use val() to hold the shared_ptr by value:
        sregex rex = +( _d [ ++*val(pi) ] >> '!' );
        // OK, rex holds a reference count to the integer.
        return rex;
    }

In the above code, we use `xpressive::val()` to hold the shared pointer by
value. That's not normally necessary because local variables appearing in
actions are held by value by default, but in this case, it is necessary. Had
we written the action as `++*pi`, it would have executed immediately. That's
because `++*pi` is not an expression template, but `++*val(pi)` is.

It can be tedious to wrap all your variables in `ref()` and `val()` in your
semantic actions. Xpressive provides the `reference<>` and `value<>` templates
to make things easier. The following table shows the equivalencies:

[table reference<> and value<>
[[This ...][... is equivalent to this ...]]
[[``int i = 0;
sregex rex = +( _d [ ++ref(i) ] >> '!' );``][``int i = 0;
reference<int> ri(i);
sregex rex = +( _d [ ++ri ] >> '!' );``]]
[[``boost::shared_ptr<int> pi(new int(0));
sregex rex = +( _d [ ++*val(pi) ] >> '!' );``][``boost::shared_ptr<int> pi(new int(0));
value<boost::shared_ptr<int> > vpi(pi);
sregex rex = +( _d [ ++*vpi ] >> '!' );``]]
]

As you can see, when using `reference<>`, you need to first declare a local
variable and then declare a `reference<>` to it. These two steps can be combined
into one using `local<>`.

[table local<> vs. reference<>
[[This ...][... is equivalent to this ...]]
[[``local<int> i(0);
sregex rex = +( _d [ ++i ] >> '!' );``][``int i = 0;
reference<int> ri(i);
sregex rex = +( _d [ ++ri ] >> '!' );``]]
]

We can use `local<>` to rewrite the above example as follows:

    local<int> i(0);
    std::string str("1!2!3?");
    // count the exciting digits, but not the
    // questionable ones.
    sregex rex = +( _d [ ++i ] >> '!' );
    regex_search(str, rex);
    assert( i.get() == 2 );

Notice that we use `local<>::get()` to access the value of the local
variable. Also, beware that `local<>` can be used to create a dangling
reference, just as `reference<>` can.

[h3 Referring to Non-Local Variables]

At the beginning of this
section, we used a regex with a semantic action to parse a string of
word/integer pairs and stuff them into a `std::map<>`. That required that
the map and the regex be defined together and used before either could
go out of scope. What if we wanted to define the regex once and use it
to fill lots of different maps? We would prefer to pass the map into the
_regex_match_ algorithm rather than embed a reference to it directly in
the regex object. What we can do instead is define a placeholder and use
that in the semantic action instead of the map itself. Later, when we
call one of the regex algorithms, we can bind the placeholder to an actual
map object. The following code shows how.

    // Define a placeholder for a map object:
    placeholder<std::map<std::string, int> > _map;

    // Match a word and an integer, separated by =>,
    // and then stuff the result into a std::map<>
    sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
        [ _map[s1] = as<int>(s2) ];

    // Match one or more word/integer pairs, separated
    // by whitespace.
    sregex rx = pair >> *(+_s >> pair);

    // The string to parse
    std::string str("aaa=>1 bbb=>23 ccc=>456");

    // Here is the actual map to fill in:
    std::map<std::string, int> result;

    // Bind the _map placeholder to the actual map
    smatch what;
    what.let( _map = result );

    // Execute the match and fill in result map
    if(regex_match(str, what, rx))
    {
        std::cout << result["aaa"] << '\n';
        std::cout << result["bbb"] << '\n';
        std::cout << result["ccc"] << '\n';
    }

This program displays:

[pre
1
23
456
]

We use `placeholder<>` here to define `_map`, which stands in for a
`std::map<>` variable. We can use the placeholder in the semantic action as if
it were a map. Then, we define a _match_results_ struct and bind an actual map
to the placeholder with "`what.let( _map = result );`". The _regex_match_ call
behaves as if the placeholder in the semantic action had been replaced with a
reference to `result`.

[note Placeholders in semantic actions are not /actually/ replaced at runtime
with references to variables. The regex object is never mutated in any way
during any of the regex algorithms, so they are safe to use in multiple
threads.]

The syntax for late-bound action arguments is a little different if you are
using _regex_iterator_ or _regex_token_iterator_. The regex iterators accept
an extra constructor parameter for specifying the argument bindings. There is
a `let()` function that you can use to bind variables to their placeholders.
The following code demonstrates how.

    // Define a placeholder for a map object:
    placeholder<std::map<std::string, int> > _map;

    // Match a word and an integer, separated by =>,
    // and then stuff the result into a std::map<>
    sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
        [ _map[s1] = as<int>(s2) ];

    // The string to parse
    std::string str("aaa=>1 bbb=>23 ccc=>456");

    // Here is the actual map to fill in:
    std::map<std::string, int> result;

    // Create a regex_iterator to find all the matches
    sregex_iterator it(str.begin(), str.end(), pair, let(_map=result));
    sregex_iterator end;

    // step through all the matches, and fill in
    // the result map
    while(it != end)
        ++it;

    std::cout << result["aaa"] << '\n';
    std::cout << result["bbb"] << '\n';
    std::cout << result["ccc"] << '\n';

This program displays:

[pre
1
23
456
]

[h2 User-Defined Assertions]

You are probably already familiar with regular expression /assertions/. In
Perl, some examples are the [^^] and [^$] assertions, which you can use to
match the beginning and end of a string, respectively. Xpressive lets you
define your own assertions. A custom assertion is a condition which must be
true at a point in the match in order for the match to succeed. You can check
a custom assertion with xpressive's _check_ function.

There are a couple of ways to define a custom assertion. The simplest is to
use a function object. Let's say that you want to ensure that a sub-expression
matches a sub-string that is either 3 or 6 characters long. The following
struct defines such a predicate:

    // A predicate that is true IFF a sub-match is
    // either 3 or 6 characters long.
    struct three_or_six
    {
        bool operator()(ssub_match const &sub) const
        {
            return sub.length() == 3 || sub.length() == 6;
        }
    };

You can use this predicate within a regular expression as follows:

    // match words of 3 characters or 6 characters.
    sregex rx = (bow >> +_w >> eow)[ check(three_or_six()) ] ;

The above regular expression will find whole words that are either 3 or 6
characters long. The `three_or_six` predicate accepts a _sub_match_ that refers
back to the part of the string matched by the sub-expression to which the
custom assertion is attached.

[note The custom assertion participates in determining whether the match
succeeds or fails. Unlike actions, which execute lazily, custom assertions
execute immediately while the regex engine is searching for a match.]

Custom assertions can also be defined inline using the same syntax as for
semantic actions. Below is the same custom assertion written inline:

    // match words of 3 characters or 6 characters.
    sregex rx = (bow >> +_w >> eow)[ check(length(_)==3 || length(_)==6) ] ;

In the above, `length()` is a lazy function that calls the `length()` member
function of its argument, and `_` is a placeholder that receives the
`sub_match`.

Once you get the hang of writing custom assertions inline, they can be
very powerful. For example, you can write a regular expression that
only matches valid dates (for some suitably liberal definition of the
term ["valid]).

    // Days in each month, January through December
    // (February is allowed 29 days).
    int const days_per_month[] =
        {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    mark_tag month(1), day(2);
    // find a valid date of the form month/day/year.
    sregex date =
        (
            // Month must be between 1 and 12 inclusive
            (month= _d >> !_d)     [ check(as<int>(_) >= 1
                                        && as<int>(_) <= 12) ]
        >>  '/'
            // Day must be between 1 and 31 inclusive
        >>  (day=   _d >> !_d)     [ check(as<int>(_) >= 1
                                        && as<int>(_) <= 31) ]
        >>  '/'
            // Only consider years between 1970 and 2038
        >>  (_d >> _d >> _d >> _d) [ check(as<int>(_) >= 1970
                                        && as<int>(_) <= 2038) ]
        )
        // Ensure the month actually has that many days!
        [ check( ref(days_per_month)[as<int>(month)-1] >= as<int>(day) ) ]
    ;

    smatch what;
    std::string str("99/99/9999 2/30/2006 2/28/2006");

    if(regex_search(str, what, date))
    {
        std::cout << what[0] << std::endl;
    }

The above program prints out the following:

[pre
2/28/2006
]

Notice how the inline custom assertions are used to range-check the values for
the month, day and year. The regular expression doesn't match `"99/99/9999"` or
`"2/30/2006"` because they are not valid dates. (There is no 99th month, and
February doesn't have 30 days.)

[endsect]