- [/==============================================================================
- Copyright (C) 2001-2015 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section Warming up]
- We'll start by showing examples of parser expressions to give you a feel for how
- to build parsers, starting from the simplest parser and building up as we go.
- Compared with EBNF, __spirit__ expressions may seem awkward at first. __spirit__
- relies heavily on operator overloading to accomplish its magic.
- [heading Trivial Example #1 Parsing a number]
- Create a parser that will parse a floating-point number.
- double_
- (You've got to admit, that's trivial!) The above code is actually a Spirit
- floating-point parser (a built-in parser). Spirit has many predefined parsers,
- and consistent naming conventions help keep you from going insane!
- [heading Trivial Example #2 Parsing two numbers]
- Create a parser that will accept a line consisting of two floating-point numbers.
- double_ >> double_
- Here you see the familiar floating-point numeric parser `double_` used twice,
- once for each number. What's that `>>` operator doing in there? Well, the two
- parsers had to be separated by something, and `>>` was chosen as the "followed
- by" sequence operator. The above expression creates a parser from two simpler
- parsers, gluing them together with the sequence operator. The result is a parser
- that is a composition of smaller parsers. Whether whitespace between the numbers
- is consumed depends on how the parser is invoked (see below).
- [note When we combine parsers, we end up with a "bigger" parser, but
- it's still a parser. Parsers can get bigger and bigger, nesting more and more,
- but whenever you glue two parsers together, you end up with one bigger parser.
- This is an important concept.
- ]
- [heading Trivial Example #3 Parsing zero or more numbers]
- Create a parser that will accept zero or more floating-point numbers.
- *double_
- This is like a regular-expression Kleene Star, though the syntax might look a
- bit odd for a C++ programmer not used to seeing the `*` operator overloaded like
- this. Actually, if you know regular expressions it may look odd too, since the
- star comes before the expression it modifies. C'est la vie. Blame it on the
- syntax rules of C++: `*` can only be overloaded as a prefix (unary) operator.
- Any expression that evaluates to a parser may be used with the Kleene Star.
- Keep in mind that C++ operator precedence rules may require you to wrap complex
- expressions in parentheses. The Kleene Star is also known as the Kleene Closure,
- but we call it the Star in most places.
- [heading Trivial Example #4 Parsing a comma-delimited list of numbers]
- This example will create a parser that accepts a comma-delimited list of
- numbers.
- double_ >> *(char_(',') >> double_)
- Notice `char_(',')`. It is a literal character parser that can recognize the
- comma `','`. In this case, the Kleene Star is modifying a more complex parser,
- namely, the one generated by the expression:
- (char_(',') >> double_)
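- Note that this is a case where the parentheses are necessary. The Kleene Star
- applies to the complete expression above; without the parentheses it would
- apply only to `char_(',')`, because unary `*` binds more tightly than `>>`.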
- [heading Let's Parse!]
- We're done defining the parser, so the next step is to invoke it. There are a
- couple of ways to do this. For now, we will use the `phrase_parse` function.
- One overload of this function accepts four arguments:
- # An iterator pointing to the start of the input
- # An iterator pointing to one past the end of the input
- # The parser object
- # Another parser called the skip parser
- In our example, we wish to skip spaces and tabs. Another parser named `space`
- is included in Spirit's repertoire of predefined parsers. It is a very simple
- parser that recognizes any whitespace character, including spaces, tabs, and
- newlines. We will use `space` as our skip parser. The skip parser is
- responsible for skipping characters in between parser elements such as
- `double_` and `char_`.
- Ok, so now let's parse!
- #include <boost/spirit/home/x3.hpp>
- namespace x3 = boost::spirit::x3;
- namespace ascii = boost::spirit::x3::ascii;
- template <typename Iterator>
- bool parse_numbers(Iterator first, Iterator last)
- {
-     using x3::double_;
-     using x3::phrase_parse;
-     using ascii::space;
-     bool r = phrase_parse(
-         first,                          // Start iterator (updated in place)
-         last,                           // End iterator
-         double_ >> *(',' >> double_),   // The parser
-         space                           // The skip parser
-     );
-     if (first != last) // fail if we did not get a full match
-         return false;
-     return r;
- }
- `phrase_parse` returns `true` or `false` depending on the result of the parse.
- It takes the first iterator by reference. On a successful parse, this iterator
- is repositioned to the rightmost position consumed by the parser. If it becomes
- equal to `last`, then we have a full match. If not, then we have a partial
- match. A partial match happens when the parser is only able to parse a portion
- of the input.
- Note that we inlined the parser directly in the call to `phrase_parse`. Upon
- calling `phrase_parse`, the expression evaluates into a temporary, unnamed
- parser, which is passed into `phrase_parse`, used, and then destroyed.
- Here, we opted to make `parse_numbers` generic by making it a function template
- parameterized by the iterator type. By doing so, it can take in data coming from
- any STL-conforming sequence, as long as its iterators are at least forward
- iterators.
- You can find the full cpp file here:
- [@../../../example/x3/num_list/num_list1.cpp num_list1.cpp]
- [note `char` and `wchar_t` operands
- The careful reader may notice that the parser expression has `','` instead of
- `char_(',')` as the previous examples did. This is OK because of C++ overload
- resolution rules. There are `>>` operators that are overloaded to accept a
- `char` or `wchar_t` argument on their left or right (but not both). An operator
- may be overloaded only if at least one of its parameters is a user-defined type.
- In this case, `double_` is the second argument to `operator>>`, so the proper
- overload of `>>` is used, converting `','` into a character literal parser.
- The problem with omitting the `char_` should be obvious: `'a' >> 'b'` is not a
- Spirit parser; it is a numeric expression, right-shifting the ASCII (or other
- encoding) value of `'a'` by the ASCII value of `'b'`. However, both
- `char_('a') >> 'b'` and `'a' >> char_('b')` are Spirit sequence parsers
- for the letter `'a'` followed by `'b'`. You'll get used to it, sooner or later.
- ]
- Finally, note that we test for a full match (i.e., the parser fully parsed the
- input) by checking whether the first iterator, after parsing, is equal to the end
- iterator. You may remove this check if partial matches are acceptable.
- [endsect] [/ Warming up]