[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:lexer_quickstart1 Quickstart 1 - A word counter using __lex__]
__lex__ is very modular, following the general building principle of the
__spirit__ libraries: you never pay for features you don't use. It is nicely
integrated with the other parts of __spirit__ but can nevertheless be used
separately to build standalone lexical analyzers.

The first quick-start example describes a standalone application: counting
characters, words, and lines in a file, very similar to what the well-known
Unix command `wc` does (for the full example code see here:
[@../../example/lex/word_count_functor.cpp word_count_functor.cpp]).
[import ../example/lex/word_count_functor.cpp]

[heading Prerequisites]
The only required `#include` specific to /Spirit.Lex/ follows. It is a wrapper
for all definitions necessary to use /Spirit.Lex/ in a standalone fashion, on
top of the __lexertl__ library. Additionally, we `#include` two of the Boost
headers to define `boost::bind()` and `boost::ref()`.

[wcf_includes]

To make the code below more readable we introduce the following namespaces.

[wcf_namespaces]
[heading Defining Tokens]

The most important step in creating a lexer using __lex__ is to define the
tokens to be recognized in the input sequence. This is normally done by
defining the regular expressions describing the matching character sequences,
and optionally their corresponding token ids. Additionally, the defined tokens
need to be associated with an instance of a lexer object as provided by the
library. The following code snippet shows how this can be done using __lex__.

[wcf_token_definition]
[heading Doing the Useful Work]

We will use a setup in which the __lex__ library invokes a given function
after each of the generated tokens is recognized. For this reason we need to
implement a functor taking at least the generated token as an argument and
returning a boolean value that allows the tokenization process to be stopped.
The default token type used in this example carries a token value of type
__boost_iterator_range__`<BaseIterator>` pointing to the matched range in the
underlying input sequence.

[wcf_functor]
All that is left is to write some boilerplate code tying together the pieces
described so far. To simplify this example we call the `lex::tokenize()`
function implemented in __lex__ (for a more detailed description of this
function see here: __fixme__), although we could just as well have written a
loop iterating over the lexer iterators [`first`, `last`).

[heading Pulling Everything Together]

[wcf_main]
[heading Comparing __lex__ with __flex__]

This example was deliberately chosen to be as similar as possible to the
equivalent __flex__ program (see below), which isn't too different from what
has to be written when using __lex__.

[note Interestingly enough, performance comparisons of lexical analyzers
      written using __lex__ with equivalent programs generated by
      __flex__ show that both have comparable execution speeds!
      Generally, thanks to the highly optimized __lexertl__ library and
      due to its carefully designed integration with __spirit__, the
      abstraction penalty to be paid for using __lex__ is negligible.
]
The remaining examples in this tutorial will use more sophisticated features
of __lex__, mainly to allow further simplification of the code to be written,
while maintaining the similarity with corresponding features of __flex__.
__lex__ has been designed to be as similar to __flex__ as possible, which is
why this documentation provides the corresponding __flex__ code for the shown
__lex__ examples almost everywhere. Consequently, here is the __flex__ code
corresponding to the example shown above.

[wcf_flex_version]

[endsect]