123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589 |
- [#manual]
- [section User manual]
- [section What is a parser]
- See the [link parser parser] section of the [link reference reference] for the
- explanation of what a parser is.
- [section The input of the parsers]
- Parsers take a [link string `string`] as input, which represents a string
- for template metaprograms. For example the string `"Hello World!"` can be
- defined the following way:
- string<'H','e','l','l','o',' ','W','o','r','l','d','!'>
- This syntax makes the input of the parsers difficult to read. Metaparse works
- with compilers using C++98, but the input of the parsers has to be defined the
- way it is described above.
- Based on `constexpr`, a feature provided by C++11, Metaparse provides a macro,
- [link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`] for defining strings:
- BOOST_METAPARSE_STRING("Hello World!")
- This defines a [link string `string`] as well, however, it is easier to
- read. The maximum length of the string that can be defined this way is limited,
- however, this limit is configurable. It is specified by the
- `BOOST_METAPARSE_LIMIT_STRING_SIZE` macro.
- [endsect]
- [section Source positions]
- A source position is described using a compile-time data structure. The
- following functions can be used to query it:
- * [link get_col `get_col`]
- * [link get_line `get_line`]
- The beginning of the input is [link start `start`] which requires
- `<boost/metaparse/start.hpp>` to be included.
- [endsect]
- [section Error handling]
- An error is described using a compile-time data structure. It contains
- information about the source position where the error was detected and some
- [link parsing_error_message description] about the error.
- [link debug_parsing_error `debug_parsing_error`] can be used to display the
- error message. Metaparse provides the
- [link BOOST_METAPARSE_DEFINE_ERROR `BOOST_METAPARSE_DEFINE_ERROR`] macro for
- defining simple [link parsing_error_message parsing error message]s.
- [endsect]
- [section Some examples of simple parsers]
- * A parser that parses nothing and always succeeds is
- [link return_ `return_`].
- * A parser that always fails is [link fail `fail`].
- * A parser that parses one character and returns the parsed character as the
- result is [link one_char `one_char`].
- [endsect]
- [section Combining parsers]
- Complex parsers can be built by combining simple parsers. The parser library
- contains a number of parser combinators that build new parsers from already
- existing ones.
- For example
- [link accept_when `accept_when`]`<Parser, Predicate, RejectErrorMsg>` is a
- parser. It uses `Parser` to parse the input. When `Parser` rejects the input,
- the combinator returns the error `Parser` failed with. When `Parser` is
- successful, the combinator validates the result using `Predicate`. If the
- predicate returns true, the combinator accepts the input, otherwise it generates
- an error with the message `RejectErrorMsg`.
- Having [link accept_when `accept_when`], [link one_char `one_char`] can be
- used to build parsers that accept only digit characters, only whitespaces, etc.
- For example [link digit `digit`] accepts only digit characters:
- typedef
- boost::metaparse::accept_when<
- boost::metaparse::one_char,
- boost::metaparse::util::is_digit,
- boost::metaparse::errors::digit_expected
- >
- digit;
- [endsect]
- [section Sequence]
- The result of a successful parsing is some value and the remaining string that
- was not parsed. The remaining string can be processed by another parser. The
- parser library provides a parser combinator, [link sequence `sequence`],
- that takes a number of parsers as arguments and builds a new parser from them
- that:
- * Parses the input using the first parser
- * If parsing succeeds, it parses the remaining string with the second parser
- * It continues applying the parsers in order as long as they succeed
- * If all of them succeed, it returns the list of results
- * If any of the parsers fails, the combinator fails as well and returns the
- error the first failing parser returned with
- [endsect]
- [#repetition]
- [section Repetition]
- It is a common thing to parse a list of things of unknown length. As an example
- let's start with something simple: the text is a list of numbers. For example:
- 11 13 3 21
- We want the result of parsing to be the sum of these values. Metaparse provides
- the [link int_ `int_`] parser we can use to parse one of these numbers.
- Metaparse provides the [link token `token`] combinator to consume the
- whitespaces after the number. So the following parser parses one number and the
- whitespaces after it:
- using int_token = token<int_>;
- The result of parsing is a boxed integer value: the value of the parsed number.
- For example parsing
- [link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`]`("13 ")` gives
- `boost::mpl::int_<13>` as the result.
- Our example input is a list of numbers. Each number can be parsed by
- `int_token`:
- [$images/metaparse/repeated_diag0.png [width 70%]]
- This diagram shows how the repeated application of `int_token` can parse the
- example input. Metaparse provides the [link repeated `repeated`] parser to
- easily implement this. The result of parsing is a typelist: the list of the
- individual numbers.
- [$images/metaparse/repeated_diag1.png [width 70%]]
- This diagram shows how [link repeated `repeated`]`<int_token>` works. It uses
- the `int_token` parser repeatedly and builds a `boost::mpl::vector` from the
- results it provides.
- But we need the sum of these, so we need to summarise the result. We can do this
- by wrapping our parser, [link repeated `repeated`]`<int_token>` with
- [link transform `transform`]. That gives us the opportunity to specify a
- function transforming this typelist to some other value - the sum of the
- elements in our case. Initially let's ignore how to summarise the elements in
- the vector. Let's assume that it can be implemented by a lambda expression and
- use `boost::mpl::lambda<...>::type` representing that lambda expression. Here is
- an example using [link transform `transform`] and this lambda expression:
- using sum_parser =
- transform<
- repeated<int_token>,
- boost::mpl::lambda<...>::type
- >;
- The [link transform `transform`]`<>` parser combinator wraps the
- [link repeated `repeated`]`<int_token>` to build the parser we need. Here is a
- diagram showing how it works:
- [$images/metaparse/repeated_diag2.png [width 70%]]
- As the diagram shows, the
- [link transform `transform`]`<`[link repeated `repeated`]`<int_token>, ...>`
- parser parses the input using [link repeated `repeated`]`<int_token>` and then
- does some processing on the result of parsing.
- Let's implement the missing lambda expression that tells
- [link transform `transform`] how to change the result coming from
- [link repeated `repeated`]`<int_token>`. We can summarise the numbers in a
- typelist by using Boost.MPL's `fold` or `accumulate`. Here is an example doing
- that:
- using sum_op = mpl::lambda<mpl::plus<mpl::_1, mpl::_2>>::type;
-
- using sum_parser =
- transform<
- repeated<int_token>,
- mpl::lambda<
- mpl::fold<mpl::_1, mpl::int_<0>, sum_op>
- >::type
- >;
- Here is an extended version of the above diagram showing what happens here:
- [$images/metaparse/repeated_diag3.png [width 70%]]
- This example parses the input, builds the list of numbers and then loops over it
- and summarises the values. It starts with the second argument of `fold`,
- `int_<0>` and adds every item of the list of numbers (which is the result of
- the parser [link repeated `repeated`]`<int_token>`) one by one.
- [note
- Note that [link transform `transform`] wraps another parser,
- [link repeated `repeated`]`<int_token>` here. It parses the input with that
- parser, gets the result of that parsing and changes that result.
- [link transform `transform`] itself will be a parser returning that updated
- result.
- ]
- [#introducing-foldl]
- [section Introducing foldl]
- It works, however, this is rather inefficient: it has a loop parsing the
- integers one by one, building a typelist and then it loops over this typelist to
- summarise the result. Using template metaprograms in your applications can have
- a serious impact on the compiler's memory usage and the speed of the
- compilation, therefore I recommend being careful with these things.
- Metaparse offers more efficient ways of achieving the same result. You don't
- need two loops: you can merge them together and add every number to your summary
- right after parsing it. Metaparse offers the [link foldl `foldl`] for this.
- With [link foldl `foldl`] you specify:
- * the parser to parse the individual elements of the list
- (which is `int_token` in our example)
- * the initial value used for folding (which is `int_<0>` in our example)
- * the forward operation merging the sub-result we have so far and the value
- coming from the last application of the parser (this was `sum_op` in our
- example)
- Our parser can be implemented this way:
- using better_sum_parser = foldl<int_token, mpl::int_<0>, sum_op>;
- As you can see the implementation of the parser is more compact.
- Here is a diagram showing what happens when you use this parser to parse some
- input:
- [$images/metaparse/foldl_diag1.png [width 70%]]
- As you can see, not only the implementation of the parser is more compact, but
- it achieves the same result by doing less as well. It parses the input by
- applying `int_token` repeatedly, just like the previous solution. But it
- produces the final result without building a typelist as an internal step. Here
- is how it works internally:
- [$images/metaparse/foldl_diag2.png [width 70%]]
- It summarises the results of the repeated `int_token` application using
- `sum_op`. This implementation is more efficient. It accepts an empty string as a
- valid input: the sum of it is `0`. It may be good for you, in which case you are
- done. If you don't wan to accept it, you can use [link foldl1 `foldl1`] instead
- of [link foldl `foldl`]. This is the same, but it rejects empty input.
- (Metaparse offers [link repeated1 `repeated1`] as well if you choose the first
- approach and would like to reject empty string)
- [endsect]
- [#introducing-foldr]
- [section Introducing foldr]
- [note
- Note that if you are reading this manual for the first time, you probably want
- to skip this section and proceed with
- [link introducing-foldl_start_with_parser Introducing foldl_start_with_parser]
- ]
- You might have noticed that Metaparse offers [link foldr `foldr`] as well. The
- difference between [link foldl `foldl`] and [link foldr `foldr`] is the
- direction in which the results are summarised. (`l` stands for ['from the Left]
- and `r` stands for ['from the Right]) Here is a diagram showing how
- `better_sum_parser` works if it is implemented using [link foldr `foldr`]:
- [$images/metaparse/foldr_diag1.png [width 70%]]
- As you can see this is very similar to using [link foldl `foldl`], but the
- results coming out of the individual applications of `int_token` are summarised
- in a right-to-left order. As `sum_op` is addition, it does not affect the end
- result, but in other cases it might.
- [note
- Note that the implementation of [link foldl `foldl`] is more efficient than
- [link foldr `foldr`]. Prefer [link foldl `foldl`] whenever possible.
- ]
- As you might expect it, Metaparse offers [link foldr1 `foldr1`] as well, which
- folds from the right and rejects empty input.
- [endsect]
- [#introducing-foldl_start_with_parser]
- [section Introducing foldl_start_with_parser]
- Let's change the grammar of our little language. Instead of a list of numbers,
- let's expect numbers separated by a `+` symbol. Our example input becomes the
- following:
- BOOST_METAPARSE_STRING("11 + 13 + 3 + 21")
- Parsing it with [link foldl `foldl`] or [link repeated `repeated`] is difficult:
- there has to be a `+` symbol before every element ['except] the first one. None
- of the already introduced repetition constructs offer a way of treating the
- first element in a different way.
- If we forget about the first number for a moment, the rest of the input is
- `"+ 13 + 3 + 21"`. This can easily be parsed by [link foldl `foldl`] (or
- [link repeated `repeated`]):
- using plus_token = token<lit_c<'+'>>;
- using plus_int = last_of<plus_token, int_token>;
-
- using sum_parser2 = foldl<plus_int, int_<0>, sum_op>;
- It uses `plus_int`, that is [link last_of `last_of`]`<plus_token, int_token>`
- as the parser that is used repeatedly to get the numbers. It does the following:
- * Uses `plus_token` to parse the `+` symbol and any whitespace that might follow
- it.
- * Uses then `int_token` to parse the number
- * Combines the above two with [link last_of `last_of`] to use both parsers in
- order and keep only the result of using the second one (the result of parsing
- the `+` symbol is thrown away - we don't care about it).
- This way [link last_of `last_of`]`<plus_token, int_token>` returns the value of
- the number as the result of parsing, just like our previous parser, `int_token`
- did. Because of this, it can be used as a drop-in replacement of `int_token` in
- the previous example and we get a parser for our updated language. Or at least
- for all number except the first one.
- This [link foldl `foldl`] can not parse the first element, because it expects a
- `+` symbol before every number. You might think of making the `+` symbol
- optional in the above approach - don't do that. It makes the parser accept
- `"11 + 13 3 21"` as well as the `+` symbol is now optional ['everywhere].
- What you could do is parsing the first element with `int_token`, the rest of
- the elements with the above [link foldl `foldl`]-based solution and add the
- result of the two. This is left as an exercise to the reader.
- Metaparse offers [link foldl_start_with_parser `foldl_start_with_parser`] to
- implement this. [link foldl_start_with_parser `foldl_start_with_parser`] is the
- same as [link foldl `foldl`]. The difference is that instead of an initial value
- to combine the list elements with it takes an ['initial parser]:
- using plus_token = token<lit_c<'+'>>;
- using plus_int = last_of<plus_token, int_token>;
-
- using sum_parser3 = foldl_start_with_parser<plus_int, int_token, sum_op>;
- [link foldl_start_with_parser `foldl_start_with_parser`] starts with applying
- that initial parser and uses the result it returns as the initial value for
- folding. It does the same as [link foldl `foldl`] after that. The following
- diagram shows how it can be used to parse a list of numbers separated by `+`
- symbols:
- [$images/metaparse/foldl_start_with_parser_diag1.png [width 70%]]
- As the diagram shows, it start parsing the list of numbers with `int_token`,
- uses its value as the starting value for folding (earlier approaches were using
- the value `int_<0>` as this starting value). Then it parses all elements of the
- list by using `plus_int` multiple times.
- [endsect]
- [#introducing-foldr_start_with_parser]
- [section Introducing foldr_start_with_parser]
- [note
- Note that if you are reading this manual for the first time, you probably want
- to skip this section and try creating some parsers using
- [link foldl_start_with_parser `foldl_start_with_parser`] instead.
- ]
- [@foldl_start_with_parser.hpp `foldl_start_with_parser`] has its
- ['from the right] pair,
- [link foldr_start_with_parser `foldr_start_with_parser`]. It uses the same
- elements as [link foldl_start_with_parser `foldl_start_with_parser`] but in a
- different order. Here is a parser for our example language implemented with
- [link foldr_start_with_parser `foldr_start_with_parser`]:
- using plus_token = token<lit_c<'+'>>;
- using int_plus = first_of<int_token, plus_token>;
-
- using sum_parser4 = foldr_start_with_parser<int_plus, int_token, sum_op>;
- Note that it uses `int_plus` instead of `plus_int`. This is because the parser
- the initial value for folding comes from is used after `int_plus` has parsed the
- input as many times as it could. It might sound strange for the first time, but
- the following diagram should help you understand how it works:
- [$images/metaparse/foldr_start_with_parser_diag1.png [width 70%]]
- As you can see, it starts with the parser that is applied repeatedly on the
- input, thus instead of parsing `plus_token int_token` repeatedly, we need to
- parse `int_token plus_token` repeatedly. The last number is not followed by `+`,
- thus `int_plus` fails to parse it and it stops the iteration.
- [link foldr_start_with_parser `foldr_start_with_parser`] then uses the other
- parser, `int_token` to parse the input. It succeeds and the result it returns is
- used as the starting value for folding from the right.
- [note
- Note that as the above description also suggests, the implementation of
- [link foldl_start_with_parser `foldl_start_with_parser`] is more efficient
- than [link foldr_start_with_parser `foldr_start_with_parser`]. Prefer
- [link foldl_start_with_parser `foldl_start_with_parser`] whenever possible.
- ]
- [endsect]
- [#introducing-foldl_reject_incomplete_start_with_parser]
- [section Introducing foldl_reject_incomplete_start_with_parser]
- Using a parser built with
- [link foldl_start_with_parser `foldl_start_with_parser`] we can parse the input
- when the input is correct. However, it is not always the case. Consider the
- following input for example:
- BOOST_METAPARSE_STRING("11 + 13 + 3 + 21 +")
- This is an invalid expression. However, if we parse it using the
- [link foldl_start_with_parser `foldl_start_with_parser`]-based parser presented
- earlier (`sum_parser3`), it accepts the input and the result is `48`. This is
- because [link foldl_start_with_parser `foldl_start_with_parser`] parses the
- input ['as long as it can]. It parses the first`int_token` (`11`) and then it
- starts parsing the `plus_int` elements (`+ 13`, `+ 3`, `+ 21`). After parsing
- all of these, it tries to parse the remaining `" +"` input using `plus_int`
- which fails and therefore
- [link foldl_start_with_parser `foldl_start_with_parser`] stops after `+ 21`.
- The problem is that the parser parses the longest sub-expression starting from
- the beginning, that represents a valid expression. The rest is ignored. The
- parser can be wrapped by [link entire_input `entire_input`] to make sure to
- reject expressions with invalid extra characters at the end, however, that
- won't make the error message useful. ([link entire_input `entire_input`] can
- only tell the author of the invalid expression that after `+ 21` is something
- wrong).
- Metaparse provides
- [link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`],
- which does the same as [link foldl_start_with_parser `foldl_start_with_parser`],
- except that once no further repetitions are found, it checks ['where] the
- repeated parser (in our example `plus_int`) fails. When it can make any progress
- (eg. it finds a `+` symbol), then
- [link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`]
- assumes, that the expression's author intended to make the repetition longer,
- but made a mistake and propagates the error message coming from that last broken
- expression.
- [$images/metaparse/foldl_reject_incomplete_start_with_parser_diag1.png [width 70%]]
- The above diagram shows how
- [link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`]
- parses the example invalid input and how it fails. This can be used for better
- error reporting from the parsers.
- Other folding parsers also have their `f` version. (eg.
- [link foldr_reject_incomplete `foldr_reject_incomplete`],
- [link foldl_reject_incomplete1 `foldl_reject_incomplete1`], etc).
- [endsect]
- [#finding-the-right-folding-parser-combinator]
- [section Finding the right folding parser combinator]
- As you might have noticed, there are a lot of different folding parser
- combinators. To help you find the right one, the following naming convention is
- used:
- [$images/metaparse/folds.png [width 70%]]
- [note
- Note that there is no `foldr_reject_incomplete_start_with_parser`. The `p`
- version of the right-folding parsers applies the special parser, whose result
- is the initial value, after the repeated elements. Therefore, when the parser
- parsing one repeated element fails, `foldr_start_with_parser` would apply that
- special final parser instead of checking how the repeated element's parser
- failed.
- ]
- [endsect]
- [endsect]
- [#result_types]
- [section What can be built from a compile-time string?]
- Parsers built using Metaparse are template metaprograms parsing text (or code)
- at compile-time. Here is a list of things that can be the "result" of parsing:
- * A ['type]. An example for this is a parser parsing a `printf` format string
- and returning the typelist (eg. `boost::mpl::vector`) of the expected
- arguments.
- * A ['constant value]. An example for this is the result of a calculator
- language. See the [link getting_started Getting Started] section for further
- details.
- * A ['runtime object]. A static runtime object can be generated that might be
- used at runtime. An example for this is parsing regular expressions at
- compile-time and building `boost::xpressive::sregex` objects. See the
- `regex` example of Metaparse for an example.
- * A C++ ['function], which might be called at runtime. A C++ function can be
- generated that can be called at runtime. It is good for generating native
- (and optimised) code from EDSLs. See the `compile_to_native_code` example of
- Metaparse as an example for this.
- * A [link metafunction_class ['template metafunction class]]. The result of
- parsing might be a type, which is a
- [link metafunction_class template metafunction class]. This is good for
- building an EDSL for template metaprogramming. See the `meta_hs` example of
- Metaparse as an example for this.
- [endsect]
- [section Grammars]
- Metaparse provides a way to define grammars in a syntax that resembles EBNF. The
- [link grammar `grammar`] template can be used to define a grammar. It can be
- used the following way:
- grammar<BOOST_METAPARSE_STRING("plus_exp")>
- ::import<BOOST_METAPARSE_STRING("int_token"), token<int_>>::type
-
- ::rule<BOOST_METAPARSE_STRING("ws ::= (' ' | '\n' | '\r' | '\t')*")>::type
- ::rule<BOOST_METAPARSE_STRING("plus_token ::= '+' ws"), front<_1>>::type
- ::rule<BOOST_METAPARSE_STRING("plus_exp ::= int_token (plus_token int_token)*"), plus_action>::type
- The code above defines a parser from a grammar definition. The start symbol of
- the grammar is `plus_exp`. The lines beginning with `::rule` define rules.
- Rules optionally have a semantic action, which is a metafunction class that
- transforms the result of parsing after the rule has been applied.
- Existing parsers can be bound to names and be used in the rules by importing
- them. Lines beginning with `::import` bind existing parsers to names.
- The result of a grammar definition is a parser which can be given to other
- parser combinators or be used directly. Given that grammars can import existing
- parsers and build new ones, they are parser combinators as well.
- [endsect]
- [endsect]
- [section Parsing based on `constexpr`]
- Metaparse is based on template metaprogramming, however, C++11 provides
- `constexpr`, which can be used for parsing at compile-time as well. While
- implementing parsers based on `constexpr` is easier for a C++ developer, since
- its syntax resembles the regular syntax of the language, the result of parsing
- has to be a `constexpr` value. Parsers based on template metaprogramming can
- build types as the result of parsing. These types may be boxed `constexpr`
- values but can be metafunction classes, classes with static functions which can
- be called at runtime, etc.
- When a parser built with Metaparse needs a sub-parser for processing a part of
- the input text and generating a `constexpr` value as the result of parsing, one
- can implement the sub-parser based on `constexpr` functions. Metaparse
- can be integrated with them and lift their results into C++ template
- metaprogramming. An example demonstrating this feature can be found among the
- examples (`constexpr_parser`). This capability makes it possible to integrate
- Metaparse with parsing libraries based on `constexpr`.
- [endsect]
- [section What types of grammars can be used?]
- It is possible to write parsers for ['context free grammars] using Metaparse.
- However, this is not the most general category of grammars that can be used. As
- Metaparse is a highly extendable framework, it is not clear what should be
- considered to be the limit of Metaparse itself. For example Metaparse provides
- the [link accept_when `accept_when`] [link parser_combinator parser combinator].
- It can be used to provide arbitrary predicates for enabled/disabling a specific
- rule. One can go as far as providing the Turing machine (as a
- [link metafunction metafunction]) of the entire grammar as a predicate, so one
- can build parsers for ['unrestricted grammars] that can be parsed using a Turing
- machine. Note that such a parser would not be considered to be a parser built
- with Metaparse, however, it is not clear how far a solution might go and still
- be considered using Metaparse.
-
- Metaparse assumes that the parsers are ['deterministic], as they have only "one"
- result. It is of course possible to write parsers and combinators that return a
- set (or list or some other container) of results as that "one" result, but that
- can be considered building a new parser library. There is no clear boundary for
- Metaparse.
-
- Metaparse supports building ['top-down parsers] and ['left-recursion] is not
- supported as it would lead to infinite recursion. ['Right-recursion] is
- supported, however, in most cases the
- [link repetition iterative parser combinators] provide better alternatives.
-
- [endsect]
- [endsect]
|