- +++++++++++++++++++++++++++++++++++++++++++
- Building Hybrid Systems with Boost.Python
- +++++++++++++++++++++++++++++++++++++++++++
- :Author: David Abrahams
- :Contact: dave@boost-consulting.com
- :organization: `Boost Consulting`_
- :date: 2003-05-14
- :Author: Ralf W. Grosse-Kunstleve
- :copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved
- .. contents:: Table of Contents
- .. _`Boost Consulting`: http://www.boost-consulting.com
- ==========
- Abstract
- ==========
- Boost.Python is an open source C++ library which provides a concise
- IDL-like interface for binding C++ classes and functions to
- Python. By leveraging the full power of C++ compile-time introspection
- and recently developed metaprogramming techniques, it achieves this
- entirely in pure C++, without introducing a new syntax.
- Boost.Python's rich set of features and high-level interface make it
- possible to engineer packages from the ground up as hybrid systems,
- giving programmers easy and coherent access to both the efficient
- compile-time polymorphism of C++ and the extremely convenient run-time
- polymorphism of Python.
- ==============
- Introduction
- ==============
- Python and C++ are in many ways as different as two languages could
- be: while C++ is usually compiled to machine-code, Python is
- interpreted. Python's dynamic type system is often cited as the
- foundation of its flexibility, while in C++ static typing is the
- cornerstone of its efficiency. C++ has an intricate and difficult
- compile-time meta-language, while in Python, practically everything
- happens at runtime.
- Yet for many programmers, these very differences mean that Python and
- C++ complement one another perfectly. Performance bottlenecks in
- Python programs can be rewritten in C++ for maximal speed, and
- authors of powerful C++ libraries choose Python as a middleware
- language for its flexible system integration capabilities.
- Furthermore, the surface differences mask some strong similarities:
- * 'C'-family control structures (if, while, for...)
- * Support for object-orientation, functional programming, and generic
- programming (these are both *multi-paradigm* programming languages.)
- * Comprehensive operator overloading facilities, recognizing the
- importance of syntactic variability for readability and
- expressivity.
- * High-level concepts such as collections and iterators.
- * High-level encapsulation facilities (C++: namespaces, Python: modules)
- to support the design of re-usable libraries.
- * Exception-handling for effective management of error conditions.
- * C++ idioms in common use, such as handle/body classes and
- reference-counted smart pointers, mirror Python reference semantics.
- Given Python's rich 'C' interoperability API, it should in principle
- be possible to expose C++ type and function interfaces to Python with
- an analogous interface to their C++ counterparts. However, the
- facilities provided by Python alone for integration with C++ are
- relatively meager. Compared to C++ and Python, 'C' has only very
- rudimentary abstraction facilities, and support for exception-handling
- is completely missing. 'C' extension module writers are required to
- manually manage Python reference counts, which is both annoyingly
- tedious and extremely error-prone. Traditional extension modules also
- tend to contain a great deal of boilerplate code repetition which
- makes them difficult to maintain, especially when wrapping an evolving
- API.
- These limitations have led to the development of a variety of wrapping
- systems. SWIG_ is probably the most popular package for the
- integration of C/C++ and Python. A more recent development is SIP_,
- which was specifically designed for interfacing Python with the Qt_
- graphical user interface library. Both SWIG and SIP introduce their
- own specialized languages for customizing inter-language bindings.
- This has certain advantages, but having to deal with three different
- languages (Python, C/C++ and the interface language) also introduces
- practical and mental difficulties. The CXX_ package demonstrates an
- interesting alternative. It shows that at least some parts of
- Python's 'C' API can be wrapped and presented through a much more
- user-friendly C++ interface. However, unlike SWIG and SIP, CXX does
- not include support for wrapping C++ classes as new Python types.
- The features and goals of Boost.Python_ overlap significantly with
- many of these other systems. That said, Boost.Python attempts to
- maximize convenience and flexibility without introducing a separate
- wrapping language. Instead, it presents the user with a high-level
- C++ interface for wrapping C++ classes and functions, managing much of
- the complexity behind-the-scenes with static metaprogramming.
- Boost.Python also goes beyond the scope of earlier systems by
- providing:
- * Support for C++ virtual functions that can be overridden in Python.
- * Comprehensive lifetime management facilities for low-level C++
- pointers and references.
- * Support for organizing extensions as Python packages,
- with a central registry for inter-language type conversions.
- * A safe and convenient mechanism for tying into Python's powerful
- serialization engine (pickle).
- * Coherence with the rules for handling C++ lvalues and rvalues that
- can only come from a deep understanding of both the Python and C++
- type systems.
- The key insight that sparked the development of Boost.Python is that
- much of the boilerplate code in traditional extension modules could be
- eliminated using C++ compile-time introspection. Each argument of a
- wrapped C++ function must be extracted from a Python object using a
- procedure that depends on the argument type. Similarly, the function's
- return type determines how the return value will be converted from C++
- to Python. Of course, argument and return types are part of each
- function's type, and this is exactly the source from which
- Boost.Python deduces most of the information required.
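To make the idea concrete, here is a minimal sketch in modern C++ (C++17, which postdates this article) of recovering a function's return type and argument types from its type alone. The ``signature`` template and the two example declarations are hypothetical illustrations; they are not Boost.Python's internals, which were written for the compilers of the day.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Primary template: intentionally left undefined.
template <class F>
struct signature;

// Partial specialization for plain function pointers R(*)(Args...).
template <class R, class... Args>
struct signature<R (*)(Args...)>
{
    // The return type decides how a result would be converted back.
    using result_type = R;

    // The number of arguments that would have to be extracted.
    static constexpr std::size_t arity = sizeof...(Args);

    // The type of the N-th argument (counting from zero), which decides
    // how that argument would be extracted from a Python object.
    template <std::size_t N>
    using arg = std::tuple_element_t<N, std::tuple<Args...>>;
};

// Hypothetical functions to inspect; only their types are used, so they
// need no definitions.
char const* greet(unsigned x);
int scale(int factor, double value);
```

For instance, ``signature<decltype(&greet)>::arity`` is 1 and its ``result_type`` is ``char const*``; a wrapper generator can use exactly such facts to choose one argument converter and one result converter.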
- This approach leads to *user guided wrapping*: as much information is
- extracted directly from the source code to be wrapped as is possible
- within the framework of pure C++, and some additional information is
- supplied explicitly by the user. Mostly the guidance is mechanical
- and little real intervention is required. Because the interface
- specification is written in the same full-featured language as the
- code being exposed, the user has unprecedented power available when
- she does need to take control.
- .. _Python: http://www.python.org/
- .. _SWIG: http://www.swig.org/
- .. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php
- .. _Qt: http://www.trolltech.com/
- .. _CXX: http://cxx.sourceforge.net/
- .. _Boost.Python: http://www.boost.org/libs/python/doc
- ===========================
- Boost.Python Design Goals
- ===========================
- The primary goal of Boost.Python is to allow users to expose C++
- classes and functions to Python using nothing more than a C++
- compiler. In broad strokes, the user experience should be one of
- directly manipulating C++ objects from Python.
- However, it's also important not to translate all interfaces *too*
- literally: the idioms of each language must be respected. For
- example, though C++ and Python both have an iterator concept, they are
- expressed very differently. Boost.Python has to be able to bridge the
- interface gap.
- It must be possible to insulate Python users from crashes resulting
- from trivial misuses of C++ interfaces, such as accessing
- already-deleted objects. By the same token, the library should
- insulate C++ users from the low-level Python 'C' API, replacing
- error-prone 'C' interfaces like manual reference-count management and
- raw ``PyObject`` pointers with more-robust alternatives.
- Support for component-based development is crucial, so that C++ types
- exposed in one extension module can be passed to functions exposed in
- another without loss of crucial information like C++ inheritance
- relationships.
- Finally, all wrapping must be *non-intrusive*, without modifying or
- even seeing the original C++ source code. Existing C++ libraries have
- to be wrappable by third parties who only have access to header files
- and binaries.
- ==========================
- Hello Boost.Python World
- ==========================
- And now for a preview of Boost.Python, and how it improves on the raw
- facilities offered by Python. Here's a function we might want to
- expose::
- char const* greet(unsigned x)
- {
- static char const* const msgs[] = { "hello", "Boost.Python", "world!" };
- if (x > 2)
- throw std::range_error("greet: index out of range");
- return msgs[x];
- }
- To wrap this function in standard C++ using the Python 'C' API, we'd
- need something like this::
- #include <Python.h>
- extern "C" // all Python interactions use 'C' linkage and calling convention
- {
-     // Wrapper to handle argument/result conversion and checking
-     PyObject* greet_wrap(PyObject* self, PyObject* args)
-     {
-         int x;
-         if (PyArg_ParseTuple(args, "i", &x))      // extract/check arguments
-         {
-             char const* result = greet(x);        // invoke wrapped function
-             return PyString_FromString(result);   // convert result to Python
-         }
-         return 0;                                 // error occurred
-     }
-     // Table of wrapped functions to be exposed by the module
-     static PyMethodDef methods[] = {
-         { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
-       , { NULL, NULL, 0, NULL } // sentinel
-     };
-     // module initialization function; Python looks for init<modulename>
-     DL_EXPORT(void) inithello()
-     {
-         (void) Py_InitModule("hello", methods); // add the methods to the module
-     }
- }
- Now here's the wrapping code we'd use to expose it with Boost.Python::
- #include <boost/python.hpp>
- using namespace boost::python;
- BOOST_PYTHON_MODULE(hello)
- {
- def("greet", greet, "return one of 3 parts of a greeting");
- }
- and here it is in action::
- >>> import hello
- >>> for x in range(3):
- ...     print hello.greet(x)
- ...
- hello
- Boost.Python
- world!
- Aside from the fact that the 'C' API version is much more verbose,
- it's worth noting a few things that it doesn't handle correctly:
- * The original function accepts an unsigned integer, and the Python
- 'C' API only gives us a way of extracting signed integers. The
- Boost.Python version will raise a Python exception if we try to pass
- a negative number to ``hello.greet``, but the other one will proceed
- to do whatever the C++ implementation does when converting a
- negative integer to unsigned (usually wrapping to some very large
- number), and pass the incorrect translation on to the wrapped
- function.
- * That brings us to the second problem: if the C++ ``greet()``
- function is called with a number greater than 2, it will throw an
- exception. Typically, if a C++ exception propagates across the
- boundary with code generated by a 'C' compiler, it will cause a
- crash. As you can see in the first version, there's no C++
- scaffolding there to prevent this from happening. Functions wrapped
- by Boost.Python automatically include an exception-handling layer
- which protects Python users by translating unhandled C++ exceptions
- into a corresponding Python exception.
- * A slightly more-subtle limitation is that the argument conversion
- used in the Python 'C' API case can only get that integer ``x`` in
- *one way*. ``PyArg_ParseTuple`` can't convert Python ``long`` objects
- (arbitrary-precision integers) which happen to fit in an ``unsigned
- int`` but not in a ``signed long``, nor will it ever handle a
- wrapped C++ class with a user-defined implicit ``operator unsigned
- int()`` conversion. Boost.Python's dynamic type conversion
- registry allows users to add arbitrary conversion methods.
- ==================
- Library Overview
- ==================
- This section outlines some of the library's major features. Except as
- necessary to avoid confusion, details of library implementation are
- omitted.
- ------------------
- Exposing Classes
- ------------------
- C++ classes and structs are exposed with a similarly-terse interface.
- Given::
- struct World
- {
- void set(std::string msg) { this->msg = msg; }
- std::string greet() { return msg; }
- std::string msg;
- };
- The following code will expose it in our extension module::
-
- #include <boost/python.hpp>
- using namespace boost::python;
- BOOST_PYTHON_MODULE(hello)
- {
-     class_<World>("World")
-         .def("greet", &World::greet)
-         .def("set", &World::set)
-         ;
- }
- Although this code has a certain pythonic familiarity, people
- sometimes find the syntax a bit confusing because it doesn't look like
- most of the C++ code they're used to. All the same, this is just
- standard C++. Because of their flexible syntax and operator
- overloading, C++ and Python are great for defining domain-specific
- (sub)languages
- (DSLs), and that's what we've done in Boost.Python. To break it down::
- class_<World>("World")
- constructs an unnamed object of type ``class_<World>`` and passes
- ``"World"`` to its constructor. This creates a new-style Python class
- called ``World`` in the extension module, and associates it with the
- C++ type ``World`` in the Boost.Python type conversion registry. We
- might have also written::
- class_<World> w("World");
- but that would've been more verbose, since we'd have to name ``w``
- again to invoke its ``def()`` member function::
- w.def("greet", &World::greet)
- There's nothing special about the location of the dot for member
- access in the original example: C++ allows any amount of whitespace on
- either side of a token, and placing the dot at the beginning of each
- line allows us to chain as many successive calls to member functions
- as we like with a uniform syntax. The other key fact that allows
- chaining is that ``class_<>`` member functions all return a reference
- to ``*this``.
- So the example is equivalent to::
- class_<World> w("World");
- w.def("greet", &World::greet);
- w.def("set", &World::set);
- It's occasionally useful to be able to break down the components of a
- Boost.Python class wrapper in this way, but the rest of this article
- will stick to the terse syntax.
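The chaining idiom itself does not depend on Boost.Python. The following hypothetical ``class_builder`` merely records method names, but it shows the essential ingredient: each ``def()`` returns a reference to ``*this``, so calls can be strung together on one object, named or unnamed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-simplified stand-in for class_<>: it only records
// the names passed to def().
class class_builder
{
public:
    explicit class_builder(std::string name) : name_(std::move(name)) {}

    // Returning a reference to *this is what allows b.def(...).def(...).
    class_builder& def(std::string method_name)
    {
        methods_.push_back(std::move(method_name));
        return *this;
    }

    std::string const& name() const { return name_; }
    std::vector<std::string> const& methods() const { return methods_; }

private:
    std::string name_;
    std::vector<std::string> methods_;
};
```

Because every call returns the same object, ``class_builder("World").def("greet").def("set")`` registers both names on a single unnamed temporary, which is the same shape as the ``class_<World>`` example.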
- For completeness, here's the wrapped class in use: ::
- >>> import hello
- >>> planet = hello.World()
- >>> planet.set('howdy')
- >>> planet.greet()
- 'howdy'
- Constructors
- ============
- Since our ``World`` class is just a plain ``struct``, it has an
- implicit no-argument (nullary) constructor. Boost.Python exposes the
- nullary constructor by default, which is why we were able to write: ::
- >>> planet = hello.World()
- However, well-designed classes in any language may require constructor
- arguments in order to establish their invariants. Unlike Python,
- where ``__init__`` is just a specially-named method, in C++
- constructors cannot be handled like ordinary member functions. In
- particular, we can't take their address: ``&World::World`` is an
- error. The library provides a different interface for specifying
- constructors. Given::
- struct World
- {
- World(std::string msg); // added constructor
- ...
- we can modify our wrapping code as follows::
- class_<World>("World", init<std::string>())
- ...
- Of course, a C++ class may have additional constructors, and we can
- expose those as well by passing more instances of ``init<...>`` to
- ``def()``::
- class_<World>("World", init<std::string>())
- .def(init<double, double>())
- ...
- Boost.Python allows wrapped functions, member functions, and
- constructors to be overloaded to mirror C++ overloading.
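Because a constructor has no address, its signature has to be conveyed as a type instead. The sketch below illustrates that idea in modern C++ (C++14 or later) with a hypothetical ``init_sig`` tag and ``make_factory`` helper; it is an illustration of the technique under those assumptions, not the actual implementation of ``init<...>``. The two-argument ``World`` constructor is likewise made up for the example.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical tag naming a constructor signature as a type.
template <class... Args>
struct init_sig {};

// Given a class T and a signature tag, return a callable that invokes the
// matching constructor of T; ordinary overload resolution picks it.
template <class T, class... Args>
auto make_factory(init_sig<Args...>)
{
    return [](Args... args) { return std::make_unique<T>(std::move(args)...); };
}

// A class with two constructors (the second is invented for the example).
struct World
{
    explicit World(std::string m) : msg(std::move(m)) {}
    World(double x, double y) : msg(std::to_string(x + y)) {}
    std::string msg;
};
```

Each tag selects a different constructor overload, which mirrors how one ``init<...>`` per signature lets overloaded constructors be exposed side by side.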
- Data Members and Properties
- ===========================
- Any publicly-accessible data members in a C++ class can be easily
- exposed as either ``readonly`` or ``readwrite`` attributes::
- class_<World>("World", init<std::string>())
- .def_readonly("msg", &World::msg)
- ...
- and can be used directly in Python: ::
- >>> planet = hello.World('howdy')
- >>> planet.msg
- 'howdy'
- This does *not* result in adding attributes to the ``World`` instance
- ``__dict__``, which can result in substantial memory savings when
- wrapping large data structures. In fact, no instance ``__dict__``
- will be created at all unless attributes are explicitly added from
- Python. Boost.Python owes this capability to the new Python 2.2 type
- system, in particular the descriptor interface and ``property`` type.
- In C++, publicly-accessible data members are considered a sign of poor
- design because they break encapsulation, and style guides usually
- dictate the use of "getter" and "setter" functions instead. In
- Python, however, ``__getattr__``, ``__setattr__``, and since 2.2,
- ``property`` mean that attribute access is just one more
- well-encapsulated syntactic tool at the programmer's disposal.
- Boost.Python bridges this idiomatic gap by making Python ``property``
- creation directly available to users. If ``msg`` were private, we
- could still expose it as an attribute in Python as follows::
- class_<World>("World", init<std::string>())
- .add_property("msg", &World::greet, &World::set)
- ...
- The example above mirrors the familiar usage of properties in Python
- 2.2+: ::
- >>> class World(object):
- ...     def __init__(self, msg):
- ...         self.__msg = msg
- ...     def greet(self):
- ...         return self.__msg
- ...     def set(self, msg):
- ...         self.__msg = msg
- ...     msg = property(greet, set)
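On the C++ side, both ``def_readonly`` and ``add_property`` are driven by pointers to members. The following sketch, with hypothetical ``property_sketch``, ``from_member`` and ``from_accessors`` helpers, shows how a pointer to a data member, or a getter/setter pair of member functions, can be turned into a pair of callable accessors. Boost.Python's real descriptors are built differently; this only illustrates the pointer-to-member mechanics.

```cpp
#include <functional>
#include <string>
#include <utility>

struct World
{
    void set(std::string msg) { this->msg = msg; }
    std::string greet() { return msg; }
    std::string msg;
};

// Hypothetical accessor pair; a Python property would call get/set.
template <class T, class V>
struct property_sketch
{
    std::function<V(T&)> get;
    std::function<void(T&, V)> set;
};

// From a pointer to data member, in the spirit of
// def_readonly("msg", &World::msg).
template <class T, class V>
property_sketch<T, V> from_member(V T::*pm)
{
    return { [pm](T& self) { return self.*pm; },
             [pm](T& self, V v) { self.*pm = std::move(v); } };
}

// From getter/setter member functions, in the spirit of
// add_property("msg", &World::greet, &World::set).
template <class T, class V>
property_sketch<T, V> from_accessors(V (T::*getter)(), void (T::*setter)(V))
{
    return { [getter](T& self) { return (self.*getter)(); },
             [setter](T& self, V v) { (self.*setter)(std::move(v)); } };
}
```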
- Operator Overloading
- ====================
- The ability to write arithmetic operators for user-defined types has
- been a major factor in the success of both languages for numerical
- computation, and the success of packages like NumPy_ attests to the
- power of exposing operators in extension modules. Boost.Python
- provides a concise mechanism for wrapping operator overloads. The
- example below shows a fragment from a wrapper for the Boost rational
- number library::
- class_<rational<int> >("rational_int")
- .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
- .def("numerator", &rational<int>::numerator)
- .def("denominator", &rational<int>::denominator)
- .def(-self) // __neg__ (unary minus)
- .def(self + self) // __add__ (homogeneous)
- .def(self * self) // __mul__
- .def(self + int()) // __add__ (heterogeneous)
- .def(int() + self) // __radd__
- ...
- The magic is performed using a simplified application of "expression
- templates" [VELD1995]_, a technique originally developed for
- optimization of high-performance matrix algebra expressions. The
- essence is that instead of performing the computation immediately,
- operators are overloaded to construct a type *representing* the
- computation. In matrix algebra, dramatic optimizations are often
- available when the structure of an entire expression can be taken into
- account, rather than evaluating each operation "greedily".
- Boost.Python uses the same technique to build an appropriate Python
- method object based on expressions involving ``self``.
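A stripped-down version of the technique fits in a few lines. In this hypothetical C++17 sketch, ``self`` is an empty placeholder object, and each overloaded operator returns a type that merely records which Python special method it corresponds to; nothing is computed. The names ``self_t``, ``operator_spec`` and the ``*_tag`` types are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical placeholder for "the wrapped object" in operator expressions.
struct self_t {};
inline constexpr self_t self{};

// Tags naming the Python special method each operator maps to.
struct add_tag  { static std::string name() { return "__add__"; } };
struct radd_tag { static std::string name() { return "__radd__"; } };
struct mul_tag  { static std::string name() { return "__mul__"; } };
struct neg_tag  { static std::string name() { return "__neg__"; } };

// The result of an operator expression: a type *describing* the operation.
template <class Tag>
struct operator_spec
{
    static std::string python_name() { return Tag::name(); }
};

// Overloads build descriptions; none of them performs any arithmetic.
inline operator_spec<add_tag>  operator+(self_t, self_t) { return {}; }
inline operator_spec<add_tag>  operator+(self_t, int)    { return {}; }
inline operator_spec<radd_tag> operator+(int, self_t)    { return {}; }
inline operator_spec<mul_tag>  operator*(self_t, self_t) { return {}; }
inline operator_spec<neg_tag>  operator-(self_t)         { return {}; }
```

A ``def()`` overload accepting any ``operator_spec<Tag>`` could then read ``Tag::name()`` to decide under which Python name to register the operation, which is the role the expression types play in the wrapper above.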
- .. _NumPy: http://www.pfdubois.com/numpy/
- Inheritance
- ===========
- C++ inheritance relationships can be represented to Boost.Python by adding
- an optional ``bases<...>`` argument to the ``class_<...>`` template
- parameter list as follows::
- class_<Derived, bases<Base1,Base2> >("Derived")
- ...
- This has two effects:
- 1. When the ``class_<...>`` is created, Python type objects
- corresponding to ``Base1`` and ``Base2`` are looked up in
- Boost.Python's registry, and are used as bases for the new Python
- ``Derived`` type object, so methods exposed for the Python ``Base1``
- and ``Base2`` types are automatically members of the ``Derived``
- type. Because the registry is global, this works correctly even if
- ``Derived`` is exposed in a different module from either of its
- bases.
- 2. C++ conversions from ``Derived`` to its bases are added to the
- Boost.Python registry. Thus wrapped C++ methods expecting (a
- pointer or reference to) an object of either base type can be
- called with an object wrapping a ``Derived`` instance. Wrapped
- member functions of class ``T`` are treated as though they have an
- implicit first argument of ``T&``, so these conversions are
- necessary to allow the base class methods to be called for derived
- objects.
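The second effect can be pictured as a table of pointer adjustments keyed by source and target type. The ``cast_registry`` below is a hypothetical sketch using ``std::type_index``, not Boost.Python's registry. Note that with multiple inheritance the ``Base2`` subobject generally lives at a different address from the ``Derived`` object, which is why a registered ``static_cast`` is needed rather than simply reusing the pointer.

```cpp
#include <functional>
#include <map>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical global table of C++ upcasts, keyed by (from, to) type.
class cast_registry
{
public:
    template <class Derived, class Base>
    void register_upcast()
    {
        casts_[{typeid(Derived), typeid(Base)}] = [](void* p) -> void* {
            // Go through the real types so the compiler adjusts the address.
            return static_cast<Base*>(static_cast<Derived*>(p));
        };
    }

    // Returns nullptr when no conversion has been registered.
    void* convert(void* p, std::type_index from, std::type_index to) const
    {
        if (from == to) return p;
        auto it = casts_.find({from, to});
        return it == casts_.end() ? nullptr : it->second(p);
    }

private:
    std::map<std::pair<std::type_index, std::type_index>,
             std::function<void*(void*)>> casts_;
};

struct Base1 { int b1 = 1; virtual ~Base1() = default; };
struct Base2 { int b2 = 2; virtual ~Base2() = default; };
struct Derived : Base1, Base2 { int d = 3; };
```

After ``register_upcast<Derived, Base2>()``, converting a ``Derived*`` to ``Base2`` yields the adjusted subobject address; conversions that were never registered (for instance a downcast) yield a null pointer.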
- Of course, it's possible to derive new Python classes from wrapped C++
- classes. Because Boost.Python uses the new-style class
- system, that works very much as for the Python built-in types. There
- is one significant detail in which it differs: the built-in types
- generally establish their invariants in their ``__new__`` function, so
- that derived classes do not need to call ``__init__`` on the base
- class before invoking its methods::
- >>> class L(list):
- ...     def __init__(self):
- ...         pass
- ...
- >>> L().reverse()
- >>>
- Because C++ object construction is a one-step operation, C++ instance
- data cannot be constructed until the arguments are available, in the
- ``__init__`` function: ::
- >>> class D(SomeBoostPythonClass):
- ...     def __init__(self):
- ...         pass
- ...
- >>> D().some_boost_python_method()
- Traceback (most recent call last):
- File "<stdin>", line 1, in ?
- TypeError: bad argument type for built-in operation
- This happened because Boost.Python couldn't find instance data of type
- ``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__``
- function masked construction of the base class. It could be corrected
- by either removing ``D``'s ``__init__`` function or having it call
- ``SomeBoostPythonClass.__init__(...)`` explicitly.
- Virtual Functions
- =================
- Deriving new types in Python from extension classes is not very
- interesting unless they can be used polymorphically from C++. In
- other words, Python method implementations should appear to override
- the implementation of C++ virtual functions when called *through base
- class pointers/references from C++*. Since the only way to alter the
- behavior of a virtual function is to override it in a derived class,
- the user must build a special derived class to dispatch a polymorphic
- class' virtual functions::
- //
- // interface to wrap:
- //
- class Base
- {
- public:
- virtual int f(std::string x) { return 42; }
- virtual ~Base() {}
- };
- int calls_f(Base& b, std::string x) { return b.f(x); }
- //
- // Wrapping Code
- //
- // Dispatcher class
- struct BaseWrap : Base
- {
- // Store a pointer to the Python object
- BaseWrap(PyObject* self_) : self(self_) {}
- PyObject* self;
- // Default implementation, for when f is not overridden
- int f_default(std::string x) { return this->Base::f(x); }
- // Dispatch implementation
- int f(std::string x) { return call_method<int>(self, "f", x); }
- };
- ...
- def("calls_f", calls_f);
- class_<Base, BaseWrap>("Base")
- .def("f", &Base::f, &BaseWrap::f_default)
- ;
- Now here's some Python code which demonstrates: ::
- >>> class Derived(Base):
- ... def f(self, s):
- ... return len(s)
- ...
- >>> calls_f(Base(), 'foo')
- 42
- >>> calls_f(Derived(), 'forty-two')
- 9
- Things to notice about the dispatcher class:
- * The key element which allows overriding in Python is the
- ``call_method`` invocation, which uses the same global type
- conversion registry as the C++ function wrapping does to convert its
- arguments from C++ to Python and its return type from Python to C++.
- * Any constructor signatures you wish to wrap must be replicated with
- an initial ``PyObject*`` argument.
- * The dispatcher must store this argument so that it can be used to
- invoke ``call_method``.
- * The ``f_default`` member function is needed when the function being
- exposed is not pure virtual; there's no other way ``Base::f`` can be
- called on an object of type ``BaseWrap``, since it overrides ``f``.
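The dispatch pattern can be exercised without Python at all. In the following sketch, a ``std::function`` stands in for the Python object that would supply an override; that substitution is an assumption made so the code is self-contained, whereas the dispatcher shown above stores a ``PyObject*`` and calls ``call_method``.

```cpp
#include <functional>
#include <string>
#include <utility>

class Base
{
public:
    virtual int f(std::string x) { return 42; }
    virtual ~Base() {}
};

int calls_f(Base& b, std::string x) { return b.f(x); }

// Dispatcher: the optional std::function plays the part of "a Python
// subclass that overrides f" (hypothetical stand-in for the PyObject*).
struct BaseWrap : Base
{
    explicit BaseWrap(std::function<int(std::string)> py_override = nullptr)
        : override_(std::move(py_override)) {}

    // Default implementation, for when f is not overridden.
    int f_default(std::string x) { return this->Base::f(x); }

    // Dispatch implementation: use the override if present.
    int f(std::string x) override
    {
        return override_ ? override_(x) : f_default(x);
    }

private:
    std::function<int(std::string)> override_;
};
```

With no override, ``calls_f`` reaches ``Base::f`` through ``f_default`` and returns 42; with an override that returns the string length, ``calls_f`` on ``"forty-two"`` returns 9, mirroring the Python session above.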
- Deeper Reflection on the Horizon?
- =================================
- Admittedly, this formula is tedious to repeat, especially on a project
- with many polymorphic classes. That it is necessary reflects some
- limitations in C++'s compile-time introspection capabilities: there's
- no way to enumerate the members of a class and find out which are
- virtual functions. At least one very promising project has been
- started to write a front-end which can generate these dispatchers (and
- other wrapping code) automatically from C++ headers.
- Pyste_ is being developed by Bruno da Silva de Oliveira. It builds on
- GCC_XML_, which generates an XML version of GCC's internal program
- representation. Since GCC is a highly-conformant C++ compiler, this
- ensures correct handling of the most-sophisticated template code and
- full access to the underlying type system. In keeping with the
- Boost.Python philosophy, a Pyste interface description is neither
- intrusive on the code being wrapped, nor expressed in some unfamiliar
- language: instead it is a 100% pure Python script. If Pyste is
- successful it will mark a move away from wrapping everything directly
- in C++ for many of our users. It will also allow us the choice to
- shift some of the metaprogram code from C++ to Python. We expect that
- soon, not only our users but the Boost.Python developers themselves
- will be "thinking hybrid" about their own code.
- .. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html
- .. _`Pyste`: http://www.boost.org/libs/python/pyste
- ---------------
- Serialization
- ---------------
- *Serialization* is the process of converting objects in memory to a
- form that can be stored on disk or sent over a network connection. The
- serialized object (most often a plain string) can be retrieved and
- converted back to the original object. A good serialization system will
- automatically convert entire object hierarchies. Python's standard
- ``pickle`` module is just such a system. It leverages the language's strong
- runtime introspection facilities for serializing practically arbitrary
- user-defined objects. With a few simple, non-intrusive provisions, this
- powerful machinery can be extended to also work for wrapped C++ objects.
- Here is an example::
- #include <string>
- struct World
- {
- World(std::string a_msg) : msg(a_msg) {}
- std::string greet() const { return msg; }
- std::string msg;
- };
- #include <boost/python.hpp>
- using namespace boost::python;
- struct World_picklers : pickle_suite
- {
- static tuple
- getinitargs(World const& w) { return make_tuple(w.greet()); }
- };
- BOOST_PYTHON_MODULE(hello)
- {
- class_<World>("World", init<std::string>())
- .def("greet", &World::greet)
- .def_pickle(World_picklers())
- ;
- }
- Now let's create a ``World`` object and put it to rest on disk::
- >>> import hello
- >>> import pickle
- >>> a_world = hello.World("howdy")
- >>> pickle.dump(a_world, open("my_world", "w"))
- In a potentially *different script* on a potentially *different
- computer* with a potentially *different operating system*::
- >>> import pickle
- >>> resurrected_world = pickle.load(open("my_world", "r"))
- >>> resurrected_world.greet()
- 'howdy'
- Of course the ``cPickle`` module can also be used for faster
- processing.
- Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol
- defined in the standard Python documentation. Like a ``__getinitargs__``
- function in Python, the ``pickle_suite``'s ``getinitargs()`` is responsible for
- creating the argument tuple that will be used to reconstruct the pickled
- object. The other elements of the Python pickling protocol,
- ``__getstate__`` and ``__setstate__``, can optionally be provided via C++
- ``getstate`` and ``setstate`` functions. C++'s static type system allows the
- library to ensure at compile-time that nonsensical combinations of
- functions (e.g. getstate without setstate) are not used.
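One way such a compile-time check can be expressed is sketched below with the C++17 ``std::void_t`` detection idiom. The ``has_getstate``/``has_setstate`` traits, ``def_pickle_sketch`` and the three example suites are hypothetical; they only illustrate how a mismatched pair can be turned into a compile error, not how ``def_pickle`` is implemented.

```cpp
#include <string>
#include <type_traits>

// Detect a static member named getstate / setstate.
template <class S, class = void>
struct has_getstate : std::false_type {};
template <class S>
struct has_getstate<S, std::void_t<decltype(&S::getstate)>> : std::true_type {};

template <class S, class = void>
struct has_setstate : std::false_type {};
template <class S>
struct has_setstate<S, std::void_t<decltype(&S::setstate)>> : std::true_type {};

// Reject suites that provide only one half of the pair; report whether
// state pickling is in use.
template <class Suite>
constexpr bool def_pickle_sketch(Suite const&)
{
    static_assert(has_getstate<Suite>::value == has_setstate<Suite>::value,
                  "getstate and setstate must be provided together");
    return has_getstate<Suite>::value;
}

// Example suites (declarations only; their members are never called).
struct args_only  { static std::string getinitargs(); };
struct with_state { static int getstate(); static void setstate(int); };
struct broken     { static int getstate(); };  // def_pickle_sketch(broken{}) fails to compile
```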
- Enabling serialization of more complex C++ objects requires a little
- more work than is shown in the example above. Fortunately the
- ``object`` interface (see next section) greatly helps in keeping the
- code manageable.
- ------------------
- Object interface
- ------------------
- Experienced 'C' language extension module authors will be familiar
- with the ubiquitous ``PyObject*``, manual reference-counting, and the
- need to remember which API calls return "new" (owned) references or
- "borrowed" (raw) references. These constraints are not just
- cumbersome but also a major source of errors, especially in the
- presence of exceptions.
- Boost.Python provides a class ``object`` which automates reference
- counting and provides conversion to Python from C++ objects of
- arbitrary type. This significantly reduces the learning effort for
- prospective extension module writers.
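The core of this automation is ordinary RAII. The sketch below uses a hypothetical ``fake_pyobject`` with its own counter instead of the real ``PyObject`` and ``Py_INCREF``/``Py_DECREF``, so that it compiles without Python; ``handle_sketch`` is likewise hypothetical and only shows how owning exactly one reference per handle makes copies and early exits safe.

```cpp
#include <utility>

// Hypothetical reference-counted object standing in for a PyObject.
struct fake_pyobject
{
    int refcnt = 1;  // a "new" reference starts at one
};

inline void incref(fake_pyobject* p) { ++p->refcnt; }
inline void decref(fake_pyobject* p) { if (--p->refcnt == 0) delete p; }

// Minimal RAII handle: owns one reference, releases it automatically,
// including during stack unwinding.
class handle_sketch
{
public:
    // Adopt a new (owned) reference.
    explicit handle_sketch(fake_pyobject* owned) : p_(owned) {}

    // A borrowed reference is incremented so the handle can own it too.
    static handle_sketch borrowed(fake_pyobject* p)
    {
        incref(p);
        return handle_sketch(p);
    }

    handle_sketch(handle_sketch const& other) : p_(other.p_) { if (p_) incref(p_); }
    handle_sketch(handle_sketch&& other) noexcept : p_(std::exchange(other.p_, nullptr)) {}
    handle_sketch& operator=(handle_sketch other) noexcept
    {
        std::swap(p_, other.p_);  // copy-and-swap; old value released by other
        return *this;
    }
    ~handle_sketch() { if (p_) decref(p_); }

    fake_pyobject* get() const { return p_; }

private:
    fake_pyobject* p_;
};
```

Copying a handle adds a reference and destroying one releases it, so no code path, including one that throws, can forget a decrement.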
- Creating an ``object`` from any other type is extremely simple::
- object s("hello, world"); // s manages a Python string
- ``object`` has templated interactions with all other types, with
- automatic to-python conversions. It happens so naturally that it's
- easily overlooked::
- object ten_Os = 10 * s[4]; // -> "oooooooooo"
- In the example above, ``4`` and ``10`` are converted to Python objects
- before the indexing and multiplication operations are invoked.
- The ``extract<T>`` class template can be used to convert Python objects
- to C++ types::
- double x = extract<double>(o);
- If a conversion in either direction cannot be performed, an
- appropriate exception is thrown at runtime.
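As a rough model of this behavior, the following C++17 sketch uses a ``std::variant`` as a stand-in for a dynamically-typed Python value. ``dyn_value`` and ``extract_sketch`` are hypothetical, and the conversion rule (any stored number converts to an arithmetic ``T`` by ``static_cast``) is a simplification, not Boost.Python's actual rule set.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical dynamically-typed value standing in for a Python object.
using dyn_value = std::variant<long, double, std::string>;

// extract<T>-style conversion: succeed when the stored value can become a
// T, otherwise throw at runtime.
template <class T>
T extract_sketch(dyn_value const& v)
{
    if constexpr (std::is_arithmetic_v<T>)
    {
        // Any stored number converts to an arithmetic T.
        if (auto p = std::get_if<long>(&v))   return static_cast<T>(*p);
        if (auto p = std::get_if<double>(&v)) return static_cast<T>(*p);
    }
    else
    {
        // Other types must match the stored alternative exactly.
        if (auto p = std::get_if<T>(&v)) return *p;
    }
    throw std::invalid_argument("extract_sketch: no conversion available");
}
```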
- The ``object`` type is accompanied by a set of derived types
- that mirror the Python built-in types such as ``list``, ``dict``,
- ``tuple``, etc., as much as possible. This enables convenient
- manipulation of these high-level types from C++::
-
-     dict d;
-     d["some"] = "thing";
-     d["lucky_number"] = 13;
-     list l = d.keys();
-
- This almost looks and works like regular Python code, but it is pure
- C++. Of course we can wrap C++ functions which accept or return
- ``object`` instances.
- =================
- Thinking hybrid
- =================
- Because of the practical and mental difficulties of combining
- programming languages, it is common to settle on a single language at the
- outset of any development effort. For many applications, performance
- considerations dictate the use of a compiled language for the core
- algorithms. Unfortunately, due to the complexity of the static type
- system, the price we pay for runtime performance is often a
- significant increase in development time. Experience shows that
- writing maintainable C++ code usually takes longer and requires *far*
- more hard-earned working experience than developing comparable Python
- code. Even when developers are comfortable working exclusively in
- compiled languages, they often augment their systems with some type of
- ad hoc scripting layer for the benefit of their users, without ever
- availing themselves of the same advantages.
- Boost.Python enables us to *think hybrid*. Python can be used for
- rapidly prototyping a new application; its ease of use and the large
- pool of standard libraries give us a head start on the way to a
- working system. If necessary, the working code can be used to
- discover rate-limiting hotspots. To maximize performance these can
- be reimplemented in C++, together with the Boost.Python bindings
- needed to tie them back into the existing higher-level procedure.
- Of course, this *top-down* approach is less attractive if it is clear
- from the start that many algorithms will eventually have to be
- implemented in C++. Fortunately Boost.Python also enables us to
- pursue a *bottom-up* approach. We have used this approach very
- successfully in the development of a toolbox for scientific
- applications. The toolbox started out mainly as a library of C++
- classes with Boost.Python bindings, and for a while the growth was
- mainly concentrated on the C++ parts. However, as the toolbox becomes
- more complete, more and more newly added functionality can be
- implemented in Python.
- .. image:: images/python_cpp_mix.png
- This figure shows the estimated ratio of newly added C++ and Python
- code over time as new algorithms are implemented. We expect this
- ratio to level out near 70% Python. Being able to solve new problems
- mostly in Python rather than a more difficult statically typed
- language is the return on our investment in Boost.Python. The ability
- to access all of our code from Python allows a broader group of
- developers to use it in the rapid development of new applications.
- =====================
- Development history
- =====================
- The first version of Boost.Python was developed in 2000 by Dave
- Abrahams at Dragon Systems, where he was privileged to have Tim Peters
- as a guide to "The Zen of Python". One of Dave's jobs was to develop
- a Python-based natural language processing system. Since it was
- eventually going to be targeting embedded hardware, it was always
- assumed that the compute-intensive core would be rewritten in C++ to
- optimize speed and memory footprint [#proto]_. The project also wanted to
- test all of its C++ code using Python test scripts [#test]_. The only
- tool we knew of for binding C++ and Python was SWIG_, and at the time
- its handling of C++ was weak. It would be false to claim any deep
- insight into the possible advantages of Boost.Python's approach at
- this point. Dave's interest and expertise in fancy C++ template
- tricks had just reached the point where he could do some real damage,
- and Boost.Python emerged as it did because it filled a need and
- because it seemed like a cool thing to try.
- This early version was aimed at many of the same basic goals we've
- described in this paper, differing most noticeably in having a
- slightly more cumbersome syntax and in its lack of special support for
- operator overloading, pickling, and component-based development.
- These last three features were quickly added by Ullrich Koethe and
- Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors arrived
- on the scene to contribute enhancements like support for nested
- modules and static member functions.
- By early 2001, development had stabilized and few new features were
- being added; however, a disturbing new fact came to light: Ralf had
- begun testing Boost.Python on pre-release versions of a compiler using
- the EDG_ front-end, and the mechanism at the core of Boost.Python
- responsible for handling conversions between Python and C++ types was
- failing to compile. As it turned out, we had been exploiting a very
- common bug in the implementation of all the C++ compilers we had
- tested. We knew that as C++ compilers rapidly became more
- standards-compliant, the library would begin failing on more
- platforms. Unfortunately, because the mechanism was so central to the
- functioning of the library, fixing the problem looked very difficult.
- Fortunately, later that year Lawrence Berkeley and later Lawrence
- Livermore National Labs contracted with `Boost Consulting`_ for support
- and development of Boost.Python, and there was a new opportunity to
- address fundamental issues and ensure a future for the library. A
- redesign effort began with the low level type conversion architecture,
- building in standards-compliance and support for component-based
- development (in contrast to version 1 where conversions had to be
- explicitly imported and exported across module boundaries). A new
- analysis of the relationship between the Python and C++ objects was
- done, resulting in more intuitive handling for C++ lvalues and
- rvalues.
- The emergence of a powerful new type system in Python 2.2 made the
- choice of whether to maintain compatibility with Python 1.5.2 easy:
- the opportunity to throw away a great deal of elaborate code for
- emulating classic Python classes alone was too good to pass up. In
- addition, Python iterators and descriptors provided crucial and
- elegant tools for representing similar C++ constructs. The
- development of the generalized ``object`` interface allowed us to
- further shield C++ programmers from the dangers and syntactic burdens
- of the Python 'C' API. A great number of other features including C++
- exception translation, improved support for overloaded functions, and
- most significantly, CallPolicies for handling pointers and
- references, were added during this period.
- In October 2002, version 2 of Boost.Python was released. Development
- since then has concentrated on improved support for C++ runtime
- polymorphism and smart pointers. Peter Dimov's ingenious
- ``boost::shared_ptr`` design in particular has allowed us to give the
- hybrid developer a consistent interface for moving objects back and
- forth across the language barrier without loss of information. At
- first, we were concerned that the sophistication and complexity of the
- Boost.Python v2 implementation might discourage contributors, but the
- emergence of Pyste_ and several other significant feature
- contributions have laid those fears to rest. Daily questions on the
- Python C++-sig and a backlog of desired improvements show that the
- library is getting used. To us, the future looks bright.
- .. _`EDG`: http://www.edg.com
- =============
- Conclusions
- =============
- Boost.Python achieves seamless interoperability between two rich and
- complementary language environments. Because it leverages template
- metaprogramming to introspect about types and functions, the user
- never has to learn a third syntax: the interface definitions are
- written in concise and maintainable C++. Also, the wrapping system
- doesn't have to parse C++ headers or represent the type system: the
- compiler does that work for us.
- Computationally intensive tasks play to the strengths of C++ and are
- often impossible to implement efficiently in pure Python, while jobs
- like serialization that are trivial in Python can be very difficult in
- pure C++. Given the luxury of building a hybrid software system from
- the ground up, we can approach design with new confidence and power.
- ===========
- Citations
- ===========
- .. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report,
- Vol. 7 No. 5 June 1995, pp. 26-31.
- http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html
- ===========
- Footnotes
- ===========
- .. [#proto] In retrospect, it seems that "thinking hybrid" from the
- ground up might have been better for the NLP system: the
- natural component boundaries defined by the pure Python
- prototype turned out to be inappropriate for getting the
- desired performance and memory footprint out of the C++ core,
- which eventually caused some redesign overhead on the Python
- side when the core was moved to C++.
- .. [#test] We also have some reservations about driving all C++
- testing through a Python interface, unless that's the only way
- it will be ultimately used. Any transition across language
- boundaries with such different object models can mask
- bugs.
- .. [#feature] These features were expressed very differently in v1 of
- Boost.Python.
|