123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147 |
- [/==============================================================================
- Copyright (C) 2001-2011 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section Rationale]
- [heading Naming]
- Why use the name "Spirit", "Qi" and "Karma"? Because xpressive names
- have a better spirit, brings qi to your software and will enhance your
- karma so they can heal your (con)fusion and make you wave like a phoenix
- from the ashes. (Joachim Faulhaber)
- [heading Type Erasure: From static to dynamic C++]
- Rules straddle the border between static and dynamic C++. In effect, a
- rule transforms compile-time polymorphism (using templates) into
- run-time polymorphism (using virtual functions). This is necessary due
- to C++'s inability to automatically declare a variable of a type deduced
- from an arbitrarily complex expression in the right-hand side (rhs) of
- an assignment. Basically, we want to do something like:
- T rule = an_arbitrarily_complex_expression;
- without having to know or care about the resulting type of the
- right-hand side (rhs) of the assignment expression. This can be done
- with modern C++ 0x compilers using `auto`:
- auto rule = an_arbitrarily_complex_expression;
- Apart from this, we also need a facility to forward declare an unknown
- type:
- T rule;
- ...
- rule = a | b;
- These limitations lead us to this implementation of rules. This comes at
- the expense of the overhead of a type-erased call which is an indirect
- function call that connot be inlined, once through each invocation of a
- rule.
- [heading Multiple declaration]
- Some BNF variants allow multiple declarations of a rule. The
- declarations are taken as alternatives. Example:
- r = a;
- r = b;
- is equivalent to:
- r = a | b;
- Spirit v1.3 allowed this behavior. However, the current version of
- Spirit no longer allows this because experience shows that this behavior
- leads to unwanted gotchas (for instance, it does not allow rules to be
- held in containers). In the current release of Spirit, a second
- assignment to a rule will simply redefine it. The old definition is
- destructed. This follows more closely C++ semantics and is more in line
- with what the user expects the rule to behave.
- [heading Sequencing Syntax]
- The comma operator as in `a, b` seems to be a better candidate,
- syntax-wise. But then the problem is with its precedence. It has the
- lowest precedence in C/C++, which makes it virtually useless.
- Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000"
- talks about overloading whitespace. Such a feature would allow
- juxtapositioning of parser objects exactly as we do in (E)BNF (e.g. a b
- | c instead of a >> b | c). Unfortunately, the article was dated April
- 1, 1998. Oh well.
- [heading Forward iterators]
- In general, the expected iterator is at least a standard conforming
- forward iterator. Forward iterators are needed for backtracking where
- the iterator needs to be saved and restored later. Generally speaking,
- Spirit is a backtracking parser. The implication of this is that at some
- point, the iterator position will have to be saved to allow the parser
- to backtrack to a previous point. Thus, for backtracking to work, the
- framework requires at least a forward iterator.
- There are remedies of course. In cases where we need to use input
- iterators, you can use the __multi_pass__ iterator to make the forward
- iterators.
- Some parsers might require more specialized iterators (bi-directional or
- even random access). Perhaps in the future, deterministic parsers when
- added to the framework, will perform no backtracking and will need just
- a single token lookahead, hence will require input iterators only.
- [heading Exhaustive backtracking and greedy RD]
- Spirit doesn't do exhaustive backtracking like regular expressions are
- expected to. For example:
- *char_('a') >> char_('a');
- will always fail to match because Spirit's Kleene star does not back off
- when the rest of the rule fails to match.
- Actually, there's a solution to this greedy RD problem. Such a scheme is
- discussed in section 6.6.2 of Parsing Techniques: A Practical Guide. The
- trick involves passing a tail parser (in addition to the scanner) to
- each parser. The start parser will then simply be:
- start >> end;
- (end is the start's tail).
- Spirit is greedy --using straight forward, naive RD. It is certainly
- possible to implement the fully backtracking scheme presented above, but
- there will be also certainly be a performance hit. The scheme will
- always try to match all possible parser paths (full parser hierarchy
- traversal) until it reaches a point of certainty, that the whole thing
- matches or fails to match.
- [:Backtracking and Greedy RD
- Spirit is quite consistent and intuitive about when it backtracks and to
- where, although it may not be obvious to those coming from different
- backgrounds. In general, any (sub)parser will, given the same input,
- always match the same portion of the input (or fail to match the input
- at all). This means that Spirit is inherently greedy. Spirit will only
- backtrack when a (sub)parser fails to match the input, and it will
- always backtrack to the next choice point upward (not backward) in the
- parser structure. In other words abb|ab will match `"ab"`, as will
- `a(bb|b)`, but `(ab|a)b` won't because the `(ab|a)` subparser will
- always match the `'b'` after the `'a'` if it is available.
- --Rainer Deyke]
- This is the very nature of __peg__.
- There's a strong preference on "simplicity with all the knobs when you
- need them" approach, right now. On the other hand, the flexibility of
- Spirit makes it possible to have different optional schemes available.
- It might be possible to implement an exhaustive backtracking RD scheme
- as an optional feature in the future.
- [endsect]
|