- <?xml version="1.0" standalone="yes"?>
- <!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
- "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"
- [
- <!ENTITY % entities SYSTEM "program_options.ent" >
- %entities;
- ]>
- <section id="program_options.overview">
- <title>Library Overview</title>
- <para>In the tutorial section, we saw several examples of library usage.
- Here we will describe the overall library design including the primary
- components and their function.
- </para>
- <para>The library has three main components:
- <itemizedlist>
- <listitem>
- <para>The options description component, which describes the allowed options
- and what to do with the values of the options.
- </para>
- </listitem>
- <listitem>
- <para>The parsers component, which uses this information to find option names
- and values in the input sources and return them.
- </para>
- </listitem>
- <listitem>
- <para>The storage component, which provides the
- interface to access the value of an option. It also converts the string
- representation of values that parsers return into desired C++ types.
- </para>
- </listitem>
- </itemizedlist>
- </para>
- <para>To be a little more concrete, the <code>options_description</code>
- class is from the options description component, the
- <code>parse_command_line</code> function is from the parsers component, and the
- <code>variables_map</code> class is from the storage component. </para>
- <para>In the tutorial we've learned how those components can be used by the
- <code>main</code> function to parse the command line and config
- file. Before going into the details of each component, a few notes about
- the world outside of <code>main</code>.
- </para>
- <para>
- For that outside world, the storage component is the most important. It
- provides a class which stores all option values and that class can be
- freely passed around your program to modules which need access to the
- options. All the other components can be used only in the place where
- the actual parsing is done. However, it might also make sense for the
- individual program modules to describe their options and pass them to the
- main module, which will merge all options. Of course, this is only
- important when the number of options is large and declaring them in one
- place becomes troublesome.
- </para>
- <!--
- <para>The design looks very simple and straight-forward, but it is worth
- noting some important points:
- <itemizedlist>
- <listitem>
- <para>The options description is not tied to specific source. Once
- options are described, all parsers can use that description.</para>
- </listitem>
- <listitem>
- <para>The parsers are intended to be fairly dumb. They just
- split the input into (name, value) pairs, using strings to represent
- names and values. No meaningful processing of values is done.
- </para>
- </listitem>
- <listitem>
- <para>The storage component is focused on storing options values. It
- </para>
- </listitem>
- </itemizedlist>
- </para>
- -->
- <section>
- <title>Options Description Component</title>
- <para>The options description component has three main classes:
- &option_description;, &value_semantic; and &options_description;. The
- first two together describe a single option. The &option_description;
- class contains the option's name, description and a pointer to &value_semantic;,
- which, in turn, knows the type of the option's value and can parse the value,
- apply the default value, and so on. The &options_description; class is a
- container for instances of &option_description;.
- </para>
- <para>For almost every library, those classes could be created in a
- conventional way: that is, you'd create new options using constructors and
- then call the <code>add</code> method of &options_description;. However,
- that's overly verbose for declaring 20 or 30 options. This concern led
- to the creation of the syntax that you've already seen:
- <programlisting>
- options_description desc;
- desc.add_options()
- ("help", "produce help")
- ("optimization", value<int>()->default_value(10), "optimization level")
- ;
- </programlisting>
- </para>
- <para>The call to the <code>value</code> function creates an instance of
- a class derived from the <code>value_semantic</code> class: <code>typed_value</code>.
- That class contains the code to parse
- values of a specific type, and contains a number of methods which can be
- called by the user to specify additional information. (This
- essentially emulates named parameters of the constructor.) Calls to
- <code>operator()</code> on the object returned by <code>add_options</code>
- forward arguments to the constructor of the <code>option_description</code>
- class and add the new instance.
- </para>
- <para>
- Note that in addition to the
- <code>value</code> function, the library provides the <code>bool_switch</code>
- function, and users can write their own functions which return
- other subclasses of <code>value_semantic</code> with
- different behaviour. For the remainder of this section, we'll talk only
- about the <code>value</code> function.
- </para>
- <para>The information about an option is divided into syntactic and
- semantic. Syntactic information includes the name of the option and the
- number of tokens which can be used to specify the value. This
- information is used by parsers to group tokens into (name, value) pairs,
- where value is just a vector of strings
- (<code>std::vector<std::string></code>). The semantic layer
- is responsible for converting the value of the option into more usable C++
- types.
- </para>
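- <para>The split between the two layers can be sketched in plain C++ without
- the library. This is only an illustration under stated assumptions, not the
- library's code: <code>raw_option</code> and <code>to_int</code> are hypothetical
- names. The first holds what a parser would hand back (a name and string
- tokens), the second stands in for the semantic layer.</para>

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for what the syntactic layer produces:
// an option name plus its value as uninterpreted string tokens.
struct raw_option {
    std::string name;
    std::vector<std::string> tokens;
};

// Hypothetical stand-in for the semantic layer: converts the single
// token of an option into an int, or throws if that is not possible.
inline int to_int(const raw_option& opt)
{
    if (opt.tokens.size() != 1)
        throw std::invalid_argument(opt.name + ": expected exactly one token");
    std::istringstream in(opt.tokens.front());
    int result = 0;
    char leftover = 0;
    // The first extraction must succeed and nothing but whitespace may follow.
    if (!(in >> result) || (in >> leftover))
        throw std::invalid_argument(opt.name + ": not an integer");
    return result;
}
```

- <para>The parser side never needs to know that the token is meant to be an
- integer. Only the conversion step does, which is the point of the
- separation.</para>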
- <para>This separation is an important part of the library design. The parsers
- use only the syntactic layer, which takes away some of the freedom to
- use overly complex structures. For example, it's not easy to parse
- syntax like: <screen>calc --expression=1 + 2/3</screen> because it's not
- possible to parse <screen>1 + 2/3</screen> without knowing that it's a C
- expression. With a little help from the user the task becomes trivial,
- and the syntax clear: <screen>calc --expression="1 + 2/3"</screen>
- </para>
- <section>
- <title>Syntactic Information</title>
- <para>The syntactic information is provided by the
- <classname>boost::program_options::options_description</classname> class
- and some methods of the
- <classname>boost::program_options::value_semantic</classname> class
- and includes:
- <itemizedlist>
- <listitem>
- <para>
- name of the option, used to identify the option inside the
- program,
- </para>
- </listitem>
- <listitem>
- <para>
- description of the option, which can be presented to the user,
- </para>
- </listitem>
- <listitem>
- <para>
- the allowed number of source tokens that comprise the option's
- value, which is used during parsing.
- </para>
- </listitem>
- </itemizedlist>
- </para>
- <para>Consider the following example:
- <programlisting>
- options_description desc;
- desc.add_options()
- ("help", "produce help message")
- ("compression", value<string>(), "compression level")
- ("verbose", value<string>()->implicit_value("0"), "verbosity level")
- ("email", value<string>()->multitoken(), "email to send to")
- ;
- </programlisting>
- For the first option, we specify only the name and the
- description. No value can be specified in the parsed source.
- For the second option, the user must specify a value, using a single
- token. For the third option, the user may either provide a single token
- for the value, or no token at all. For the last option, the value can
- span several tokens. For example, the following command line is OK:
- <screen>
- test --help --compression 10 --verbose --email beadle@mars beadle2@mars
- </screen>
- </para>
- <section>
- <title>Description formatting</title>
- <para>
- Sometimes the description can get rather long, for example, when
- several values of an option need separate documentation. Below we
- describe some simple formatting mechanisms you can use.
- </para>
- <para>The description string has one or more paragraphs, separated by
- the newline character ('\n'). When an option is output, the library
- will compute the indentation for the option's description. Each
- paragraph is output as a separate line with that indentation. If
- a paragraph does not fit on one line it is spanned over multiple
- lines (which will have the same indentation).
- </para>
- <para>You may specify an additional indent for the first line by
- inserting spaces at the beginning of a paragraph. For example:
- <programlisting>
- options.add_options()
- ("help", "    A long help msg a long help msg a long help msg a long help "
- "msg a long help msg a long help msg a long help msg a long help msg ")
- ;
- </programlisting>
- will specify a four-space indent for the first line. The output will
- look like:
- <screen>
-   --help                    A long help msg a long
-                         help msg a long help msg
-                         a long help msg a long
-                         help msg a long help msg
-                         a long help msg a long
-                         help msg
- </screen>
- </para>
- <para>When a line is wrapped, you may want an additional
- indent for the wrapped text. This can be done by
- inserting a tabulator character ('\t') at the desired position. For
- example:
- <programlisting>
- options.add_options()
- ("well_formated", "As you can see this is a very well formatted "
- "option description.\n"
- "You can do this for example:\n\n"
- "Values:\n"
- "  Value1: \tdoes this and that, bla bla bla bla "
- "bla bla bla bla bla bla bla bla bla bla bla\n"
- "  Value2: \tdoes something else, bla bla bla bla "
- "bla bla bla bla bla bla bla bla bla bla bla\n\n"
- "    This paragraph has a first line indent only, "
- "bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla");
- </programlisting>
- will produce:
- <screen>
-   --well_formated         As you can see this is a
-                           very well formatted
-                           option description.
-                           You can do this for
-                           example:
-
-                           Values:
-                             Value1: does this and
-                                     that, bla bla
-                                     bla bla bla bla
-                                     bla bla bla bla
-                                     bla bla bla bla
-                                     bla
-                             Value2: does something
-                                     else, bla bla
-                                     bla bla bla bla
-                                     bla bla bla bla
-                                     bla bla bla bla
-                                     bla
-
-                               This paragraph has a
-                           first line indent only,
-                           bla bla bla bla bla bla
-                           bla bla bla bla bla bla
-                           bla bla bla
- </screen>
- The tab character is removed before output. Only one tabulator per
- paragraph is allowed, otherwise an exception of type
- program_options::error is thrown. Finally, the tabulator is ignored if
- it is not on the first line of the paragraph or is on the last
- possible position of the first line.
- </para>
- </section>
- </section>
- <section>
- <title>Semantic Information</title>
- <para>The semantic information is completely provided by the
- <classname>boost::program_options::value_semantic</classname> class. For
- example:
- <programlisting>
- options_description desc;
- desc.add_options()
- ("compression", value<int>()->default_value(10), "compression level")
- ("email", value< vector<string> >()
- ->composing()->notifier(&your_function), "email")
- ;
- </programlisting>
- These declarations specify that the default value of the first option is 10,
- that the second option can appear several times and all instances should
- be merged, and that after parsing is done, the library will call
- the function <code>your_function</code>, passing the value of the
- "email" option as argument.
- </para>
- </section>
- <section>
- <title>Positional Options</title>
- <para>Our definition of option as (name, value) pairs is simple and
- useful, but in one special case of the command line, there's a
- problem. A command line can include a <firstterm>positional option</firstterm>,
- which does not specify any name at all, for example:
- <screen>
- archiver --compression=9 /etc/passwd
- </screen>
- Here, the "/etc/passwd" element does not have any option name.
- </para>
- <para>One solution is to ask users to extract positional options
- themselves and process them as they like. However, there's a nicer approach
- -- provide a method to automatically assign the names for positional
- options, so that the above command line can be interpreted the same way
- as:
- <screen>
- archiver --compression=9 --input-file=/etc/passwd
- </screen>
- </para>
- <para>The &positional_options_desc; class allows the command line
- parser to assign the names. The class specifies how many positional options
- are allowed, and for each allowed option, specifies the name. For example:
- <programlisting>
- positional_options_description pd; pd.add("input-file", 1);
- </programlisting> specifies that the first positional option, and only
- the first, will be given the name "input-file".
- </para>
- <para>It's possible to specify that a number, or even all positional options, be
- given the same name.
- <programlisting>
- positional_options_description pd;
- pd.add("output-file", 2).add("input-file", -1);
- </programlisting>
- In the above example, the first two positional options will be associated
- with name "output-file", and any others with the name "input-file".
- </para>
- <warning>
- <para>The &positional_options_desc; class only specifies translation from
- position to name, and the option name should still be registered with
- an instance of the &options_description; class.</para>
- </warning>
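- <para>The position-to-name rule can be sketched without the library. The
- <code>position_names</code> class below is a hypothetical miniature written for
- illustration; it is not the library's class and only mirrors the rule
- described in the text, where a count of -1 means "all remaining
- positions".</para>

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a position-to-name table.
class position_names {
public:
    // max_count == -1 means "this name applies to all remaining positions".
    position_names& add(const std::string& name, int max_count)
    {
        entries_.push_back(std::make_pair(name, max_count));
        return *this;
    }

    // Returns the name for the given zero-based position, or an empty
    // string when no name was registered for that position.
    std::string name_for_position(std::size_t position) const
    {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            const int count = entries_[i].second;
            if (count < 0 || position < static_cast<std::size_t>(count))
                return entries_[i].first;
            // Skip the positions covered by this entry and try the next one.
            position -= static_cast<std::size_t>(count);
        }
        return std::string();
    }

private:
    std::vector<std::pair<std::string, int> > entries_;
};
```

- <para>With <code>add("output-file", 2).add("input-file", -1)</code>, positions
- 0 and 1 map to "output-file" and every later position maps to
- "input-file".</para>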
- </section>
- <!-- Note that the classes are not modified during parsing -->
- </section>
- <section>
- <title>Parsers Component</title>
- <para>The parsers component splits input sources into (name, value) pairs.
- Each parser looks for possible options and consults the options
- description component to determine if the option is known and how its value
- is specified. In the simplest case, the name is explicitly specified,
- which allows the library to decide if such an option is known. If it is known, the
- &value_semantic; instance determines how the value is specified. (If
- it is not known, an exception is thrown.) Common
- cases are when the value is explicitly specified by the user, and when
- the value cannot be specified by the user, but the presence of the
- option implies some value (for example, <code>true</code>). So, the
- parser checks that the value is specified when needed and not specified
- when not needed, and returns a new (name, value) pair.
- </para>
- <para>
- To invoke a parser you typically call a function, passing the options
- description and command line or config file or something else.
- The results of parsing are returned as an instance of the &parsed_options;
- class. Typically, that object is passed directly to the storage
- component. However, it also can be used directly, or undergo some additional
- processing.
- </para>
- <para>
- There are three exceptions to the above model -- all related to
- traditional usage of the command line. While they require some support
- from the options description component, the additional complexity is
- tolerable.
- <itemizedlist>
- <listitem>
- <para>The name specified on the command line may be
- different from the option name -- it's common to provide a "short option
- name" alias to a longer name. It's also common to allow an abbreviated name
- to be specified on the command line.
- </para>
- </listitem>
- <listitem>
- <para>Sometimes it's desirable to specify a value as several
- tokens. For example, an option "--email-recipient" may be followed
- by several emails, each as a separate command line token. This
- behaviour is supported, though it can lead to parsing ambiguities
- and is not enabled by default.
- </para>
- </listitem>
- <listitem>
- <para>The command line may contain positional options -- elements
- which don't have any name. The command line parser provides a
- mechanism to guess names for such options, as we've seen in the
- tutorial.
- </para>
- </listitem>
- </itemizedlist>
- </para>
- </section>
- <section>
- <title>Storage Component</title>
- <para>The storage component is responsible for:
- <itemizedlist>
- <listitem>
- <para>Storing the final values of an option into a special class and in
- regular variables</para>
- </listitem>
- <listitem>
- <para>Handling priorities among different sources.</para>
- </listitem>
- <listitem>
- <para>Calling user-specified <code>notify</code> functions with the final
- values of options.</para>
- </listitem>
- </itemizedlist>
- </para>
- <para>Let's consider an example:
- <programlisting>
- variables_map vm;
- store(parse_command_line(argc, argv, desc), vm);
- store(parse_config_file("example.cfg", desc), vm);
- notify(vm);
- </programlisting>
- The <code>variables_map</code> class is used to store the option
- values. The two calls to the <code>store</code> function add values
- found on the command line and in the config file. Finally the call to
- the <code>notify</code> function runs the user-specified notify
- functions and stores the values into regular variables, if needed.
- </para>
- <para>The priority is handled in a simple way: the <code>store</code>
- function will not change the value of an option if it's already
- assigned. In this case, if the command line specifies the value for an
- option, any value in the config file is ignored.
- </para>
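- <para>The "first stored value wins" rule can be sketched with a plain
- <code>std::map</code>. The names <code>store_first_wins</code>,
- <code>parsed_pairs</code> and <code>value_map</code> are hypothetical and exist only
- for this illustration; the library's own types are richer than plain
- strings.</para>

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > parsed_pairs;
typedef std::map<std::string, std::string> value_map;

// Hypothetical stand-in for the store step: copies parsed values into
// the map, but leaves any option that already has a value untouched.
inline void store_first_wins(const parsed_pairs& parsed, value_map& vm)
{
    for (parsed_pairs::const_iterator it = parsed.begin(); it != parsed.end(); ++it)
        vm.insert(*it);  // std::map::insert does nothing when the key exists
}
```

- <para>Storing the command line first and the config file second therefore
- gives the command line priority, which matches the order of the two
- <code>store</code> calls in the example above.</para>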
- <warning>
- <para>Don't forget to call the <code>notify</code> function after you've
- stored all parsed values.</para>
- </warning>
- </section>
- <section>
- <title>Specific parsers</title>
- <section>
- <title>Configuration file parser</title>
- <para>The &parse_config_file; function implements parsing
- of simple INI-like configuration files. Configuration file
- syntax is line based:
- </para>
- <itemizedlist>
- <listitem><para>A line in the form:</para>
- <screen>
- <replaceable>name</replaceable>=<replaceable>value</replaceable>
- </screen>
- <para>gives a value to an option.</para>
- </listitem>
- <listitem><para>A line in the form:</para>
- <screen>
- [<replaceable>section name</replaceable>]
- </screen>
- <para>introduces a new section in the configuration file.</para>
- </listitem>
- <listitem><para>The <literal>#</literal> character introduces a
- comment that spans until the end of the line.</para>
- </listitem>
- </itemizedlist>
- <para>The option names are relative to the section names, so
- the following configuration file part:</para>
- <screen>
- [gui.accessibility]
- visual_bell=yes
- </screen>
- <para>is equivalent to</para>
- <screen>
- gui.accessibility.visual_bell=yes
- </screen>
- <para>This works once the option "gui.accessibility.visual_bell" has been added to the options description:</para>
- <programlisting>
- options_description desc;
- desc.add_options()
- ("gui.accessibility.visual_bell", value<string>(), "flash screen for bell")
- ;
- </programlisting>
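- <para>How section names become name prefixes can be sketched with a small
- reader for the syntax described above. This is a simplified illustration, not
- the library's parser: <code>read_ini_like</code> and <code>trim</code> are
- hypothetical helpers, values are kept as strings, and malformed lines are
- skipped silently, which is an assumption of the sketch.</para>

```cpp
#include <map>
#include <sstream>
#include <string>

// Removes leading and trailing spaces, tabs and carriage returns.
inline std::string trim(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Hypothetical, simplified reader: returns fully qualified option
// names mapped to their string values.
inline std::map<std::string, std::string> read_ini_like(const std::string& text)
{
    std::map<std::string, std::string> result;
    std::istringstream in(text);
    std::string prefix;  // current "section." prefix, empty before any section
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // drop comments
        if (line.empty())
            continue;
        if (line[0] == '[' && line[line.size() - 1] == ']') {
            prefix = trim(line.substr(1, line.size() - 2)) + ".";
            continue;
        }
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos)
            continue;  // this sketch silently skips malformed lines
        result[prefix + trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return result;
}
```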
- </section>
- <section>
- <title>Environment variables parser</title>
- <para><firstterm>Environment variables</firstterm> are string variables
- which are available to all programs via the <code>getenv</code> function
- of the C runtime library. The operating system allows initial values to be set
- for a given user, and the values can be further changed on the command
- line. For example, on Windows one can use the
- <filename>autoexec.bat</filename> file or (on recent versions) the
- <filename>Control Panel/System/Advanced/Environment Variables</filename>
- dialog, and on Unix, the <filename>/etc/profile</filename>,
- <filename>~/.profile</filename> and <filename>~/.bash_profile</filename>
- files. Because environment variables can be set for the entire system,
- they are particularly suitable for options which apply to all programs.
- </para>
- <para>The environment variables can be parsed with the
- &parse_environment; function. The function has several overloaded
- versions. The first parameter is always an &options_description;
- instance, and the second specifies which variables must be processed and
- which option names correspond to them. To describe the second
- parameter we need to consider naming conventions for environment
- variables.</para>
- <para>If you have an option that should be specified via environment
- variable, you need to make up the variable's name. To avoid name clashes,
- we suggest that you use a sufficiently unique prefix for environment
- variables. Also, while option names are most likely in lower case,
- environment variables conventionally use upper case. So, for an option
- name <literal>proxy</literal> the environment variable might be called
- <envar>BOOST_PROXY</envar>. During parsing, we need to perform the reverse
- conversion of the names. This is accomplished by passing the chosen
- prefix as the second parameter of the &parse_environment; function.
- Say, if you pass <literal>BOOST_</literal> as the prefix, and there are
- two variables, <envar>CVSROOT</envar> and <envar>BOOST_PROXY</envar>, the
- first variable will be ignored, and the second one will be converted to
- option <literal>proxy</literal>.
- </para>
- <para>The above logic is sufficient in many cases, but it is also
- possible to pass, as the second parameter of the &parse_environment;
- function, any function taking a <code>std::string</code> and returning
- <code>std::string</code>. That function will be called for each
- environment variable and should return either the name of the option, or
- an empty string if the variable should be ignored. An example showing this
- method can be found in "example/env_options.cpp".
- </para>
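- <para>A name-mapping function of the kind described above might look like the
- following sketch. <code>env_to_option</code> is a hypothetical helper, and the
- lower-casing step is an assumption drawn from the naming convention in the
- text (upper-case variables, lower-case options). To use such a function as
- the second parameter, the prefix would have to be fixed or bound in advance,
- so that the function takes and returns a single
- <code>std::string</code>.</para>

```cpp
#include <cctype>
#include <string>

// Hypothetical name mapper: strips the given prefix from an environment
// variable name and lower-cases the rest. Returns an empty string for
// variables that do not start with the prefix, meaning "ignore this one".
inline std::string env_to_option(const std::string& variable,
                                 const std::string& prefix)
{
    if (variable.compare(0, prefix.size(), prefix) != 0)
        return std::string();
    std::string option;
    for (std::string::size_type i = prefix.size(); i < variable.size(); ++i)
        option += static_cast<char>(
            std::tolower(static_cast<unsigned char>(variable[i])));
    return option;
}
```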
- </section>
- </section>
- <section>
- <title>Types</title>
- <para>Everything that is passed in on the command line, as an environment
- variable, or in a config file is a string. For values that need to be used
- as a non-string type, the library will attempt to convert the string to
- the correct type when the value is stored in the variables_map.</para>
- <para>Integers and floating point values are converted using Boost's
- lexical_cast. It will accept integer values such as "41" or "-42". It will
- accept floating point numbers such as "51.1", "-52.1", "53.1234567890" (as
- a double), "54", "55.", ".56", "57.1e5", "58.1E5", ".591e5", "60.1e-5",
- "-61.1e5", "-62.1e-5", etc. Unfortunately, hex, octal, and binary
- representations that are available in C++ literals are not supported by
- lexical_cast, and thus will not work with program_options.</para>
- <para>Booleans are special in that there are multiple ways to specify them.
- Like any other value type, a boolean can be specified as <code>("my-option",
- value<bool>())</code>, and then set as:</para>
- <screen>
- example --my-option=true
- </screen>
- <para>However, more typical is that boolean values are set by the simple
- presence of a switch. This is enabled by &bool_switch; as in <code>
- ("other-option", bool_switch())</code>. This will cause the value to
- default to false and it will become true if the switch is found:</para>
- <screen>
- example --other-option
- </screen>
- <para>When a boolean does take a parameter, there are several options.
- Those that evaluate to true in C++ are: "true", "yes", "on", "1". Those
- that evaluate to false in C++ are: "false", "no", "off", "0". In addition,
- when reading from a config file, the option name with an equal sign and no
- value after it will also evaluate to true.</para>
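- <para>The accepted spellings can be summarised in a small helper. This is a
- sketch under stated assumptions, not the library's validation code:
- <code>parse_bool</code> and its <code>empty_means_true</code> flag are
- hypothetical, and the comparison here is case-sensitive, which may be
- stricter than the library's behaviour.</para>

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maps the spellings listed above to a bool and
// rejects everything else. The empty string is accepted as true only
// when the caller says the value came from "name=" in a config file.
inline bool parse_bool(const std::string& text, bool empty_means_true = false)
{
    if (text.empty() && empty_means_true)
        return true;
    if (text == "true" || text == "yes" || text == "on" || text == "1")
        return true;
    if (text == "false" || text == "no" || text == "off" || text == "0")
        return false;
    throw std::invalid_argument("not a boolean value: '" + text + "'");
}
```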
- </section>
- <section>
- <title>Annotated List of Symbols</title>
- <para>The following table describes all the important symbols in the
- library, for quick access.</para>
- <informaltable pgwide="1">
- <tgroup cols="2">
- <colspec colname='c1'/>
- <colspec colname='c2'/>
- <thead>
- <row>
- <entry>Symbol</entry>
- <entry>Description</entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry namest='c1' nameend='c2'>Options description component</entry>
- </row>
- <row>
- <entry>&options_description;</entry>
- <entry>describes a number of options</entry>
- </row>
- <row>
- <entry>&value;</entry>
- <entry>defines the option's value</entry>
- </row>
- <row>
- <entry namest='c1' nameend='c2'>Parsers component</entry>
- </row>
- <row>
- <entry>&parse_command_line;</entry>
- <entry>parses command line (simplified interface)</entry>
- </row>
- <row>
- <entry>&basic_command_line_parser;</entry>
- <entry>parses command line (extended interface)</entry>
- </row>
- <row>
- <entry>&parse_config_file;</entry>
- <entry>parses config file</entry>
- </row>
- <row>
- <entry>&parse_environment;</entry>
- <entry>parses environment</entry>
- </row>
- <row>
- <entry namest='c1' nameend='c2'>Storage component</entry>
- </row>
- <row>
- <entry>&variables_map;</entry>
- <entry>storage for option values</entry>
- </row>
- </tbody>
- </tgroup>
- </informaltable>
- </section>
- </section>
- <!--
- Local Variables:
- mode: nxml
- sgml-indent-data: t
- sgml-parent-document: ("program_options.xml" "section")
- sgml-set-face: t
- End:
- -->