% Copyright Matthew Pulver 2018 - 2019.
% Distributed under the Boost Software License, Version 1.0.
% (See accompanying file LICENSE_1_0.txt or copy at
% https://www.boost.org/LICENSE_1_0.txt)

\documentclass{article}

\usepackage{amsmath}
%\usepackage{mathtools}
\usepackage{amssymb} %\mathbb
\usepackage{array} % m{} column in tabular
\usepackage{csquotes} % displayquote
\usepackage{fancyhdr}
\usepackage{fancyvrb}
\usepackage[margin=0.75in]{geometry}
\usepackage{graphicx}
\usepackage{hyperref}
%\usepackage{listings}
\usepackage{multirow}
\usepackage[super]{nth}
\usepackage{wrapfig}
\usepackage{xcolor}

\hypersetup{%
    colorlinks=false,% hyperlinks will be black
    linkbordercolor=blue,% hyperlink borders will be red
    urlbordercolor=blue,%
    pdfborderstyle={/S/U/W 1}% border style will be underline of width 1pt
}

\pagestyle{fancyplain}
\fancyhf{}
\renewcommand{\headrulewidth}{0pt}
\cfoot[]{\thepage\\
  \scriptsize\color{gray} Copyright \textcopyright\/ Matthew Pulver 2018--2019.
  Distributed under the Boost Software License, Version 1.0.\\
  (See accompanying file LICENSE\_1\_0.txt or copy at
  \url{https://www.boost.org/LICENSE\_1\_0.txt})}

\DeclareMathOperator{\sinc}{sinc}

\begin{document}

\title{Autodiff\\
\large Automatic Differentiation C++ Library}
\author{Matthew Pulver}
\maketitle
%\date{}

%\begin{abstract}
%\end{abstract}

\tableofcontents

%\section{Synopsis}
%\begingroup
%\fontsize{10pt}{10pt}\selectfont
%\begin{verbatim}
% example/synopsis.cpp
%\end{verbatim}
%\endgroup

\newpage

\section{Description}

Autodiff is a header-only C++ library that facilitates the
\href{https://en.wikipedia.org/wiki/Automatic_differentiation}{automatic differentiation} (forward mode) of
mathematical functions of single and multiple variables.

This implementation is based upon the \href{https://en.wikipedia.org/wiki/Taylor_series}{Taylor series} expansion
of an analytic function $f$ at the point $x_0$:
\begin{align*}
f(x_0+\varepsilon) &= f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2
  + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots \\
&= \sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n + O\left(\varepsilon^{N+1}\right).
\end{align*}

The essential idea of autodiff is the substitution of numbers with polynomials in the evaluation of $f(x_0)$. By
substituting the number $x_0$ with the first-order polynomial $x_0+\varepsilon$, and using the same algorithm to
compute $f(x_0+\varepsilon)$, the resulting polynomial in $\varepsilon$ contains the function's derivatives
$f'(x_0)$, $f''(x_0)$, $f'''(x_0)$, ... within the coefficients. Each coefficient is equal to the derivative of
its respective order, divided by the factorial of the order.

In greater detail, assume one is interested in calculating the first $N$ derivatives of $f$ at $x_0$. Without loss
of precision to the calculation of the derivatives, all terms $O\left(\varepsilon^{N+1}\right)$ that include
powers of $\varepsilon$ greater than $N$ can be discarded. (This is due to the fact that each term in a polynomial
depends only upon equal and lower-order terms under arithmetic operations.) Under these truncation rules, $f$
provides a polynomial-to-polynomial transformation:
\[
f \qquad : \qquad x_0+\varepsilon \qquad \mapsto \qquad
\sum_{n=0}^Ny_n\varepsilon^n=\sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n.
\]

C++'s ability to overload operators and functions allows for the creation of a class {\tt fvar}
(\underline{f}orward-mode autodiff \underline{var}iable) that represents polynomials in $\varepsilon$.
Thus the same algorithm $f$ that calculates the numeric value of $y_0=f(x_0)$, when written to accept and return
variables of a generic (template) type, is also used to calculate the polynomial
$\sum_{n=0}^Ny_n\varepsilon^n=f(x_0+\varepsilon)$. The derivatives $f^{(n)}(x_0)$ are then found from the product
of the respective factorial $n!$ and coefficient $y_n$:
\[
\frac{d^nf}{dx^n}(x_0)=n!y_n.
\]

\section{Examples}

\subsection{Example 1: Single-variable derivatives}

\subsubsection{Calculate derivatives of $f(x)=x^4$ at $x=2$.}

In this example, {\tt make\_fvar<double, Order>(2.0)} instantiates the polynomial $2+\varepsilon$. The
{\tt Order=5} means that enough space is allocated (on the stack) to hold a polynomial of up to degree 5 during
the ensuing computation. Internally, this is modeled by a {\tt std::array<double,6>} whose elements
{\tt \{2, 1, 0, 0, 0, 0\}} correspond to the 6 coefficients of the polynomial upon initialization. Its fourth
power, at the end of the computation, is a polynomial with coefficients {\tt y = \{16, 32, 24, 8, 1, 0\}}. The
derivatives are obtained using the formula $f^{(n)}(2)=n!\cdot{\tt y[n]}$.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

template <typename T>
T fourth_power(T const& x) {
  T x4 = x * x;  // retval in operator*() uses x4's memory via NRVO.
  x4 *= x4;      // No copies of x4 are made within operator*=() even when squaring.
  return x4;     // x4 uses y's memory in main() via NRVO.
}

int main() {
  using namespace boost::math::differentiation;

  constexpr unsigned Order = 5;                  // Highest order derivative to be calculated.
  auto const x = make_fvar<double, Order>(2.0);  // Find derivatives at x=2.
  auto const y = fourth_power(x);

  for (unsigned i = 0; i <= Order; ++i)
    std::cout << "y.derivative(" << i << ") = " << y.derivative(i) << std::endl;

  return 0;
}
/* Output:
y.derivative(0) = 16
y.derivative(1) = 32
y.derivative(2) = 48
y.derivative(3) = 48
y.derivative(4) = 24
y.derivative(5) = 0
*/
\end{verbatim}

The above calculates
\begin{alignat*}{3}
{\tt y.derivative(0)} &=& f(2) =&& \left.x^4\right|_{x=2} &= 16\\
{\tt y.derivative(1)} &=& f'(2) =&& \left.4\cdot x^3\right|_{x=2} &= 32\\
{\tt y.derivative(2)} &=& f''(2) =&& \left.4\cdot 3\cdot x^2\right|_{x=2} &= 48\\
{\tt y.derivative(3)} &=& f'''(2) =&& \left.4\cdot 3\cdot2\cdot x\right|_{x=2} &= 48\\
{\tt y.derivative(4)} &=& f^{(4)}(2) =&& 4\cdot 3\cdot2\cdot1 &= 24\\
{\tt y.derivative(5)} &=& f^{(5)}(2) =&& 0 &
\end{alignat*}
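
Since {\tt fourth\_power} is a function template, the very same code also accepts ordinary floating-point
arguments, in which case only the function value $f(x_0)$ is produced. The following minimal illustration prints
{\tt 16}; passing an {\tt fvar} instead of a {\tt double}, as in the example above, is the only change needed to
obtain the derivatives as well.

\begin{verbatim}
#include <iostream>

template <typename T>
T fourth_power(T const& x) {
  T x4 = x * x;
  x4 *= x4;
  return x4;
}

int main() {
  std::cout << fourth_power(2.0) << '\n';  // Prints 16: the value only, no derivatives.
  return 0;
}
\end{verbatim}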
\subsection{Example 2: Multi-variable mixed partial derivatives with multi-precision data type}\label{multivar}

\subsubsection{Calculate
$\frac{\partial^{12}f}{\partial w^{3}\partial x^{2}\partial y^{4}\partial z^{3}}(11,12,13,14)$
with a precision of about 50 decimal digits,\\
where $f(w,x,y,z)=\exp\left(w\sin\left(\frac{x\log(y)}{z}\right)+\sqrt{\frac{wz}{xy}}\right)+\frac{w^2}{\tan(z)}$.}

In this example, {\tt make\_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14)} returns a {\tt std::tuple} of 4
independent {\tt fvar} variables, with values of 11, 12, 13, and 14, whose maximum derivative orders to be
calculated are 3, 2, 4, and 3, respectively. The order of the variables is important, as it is the same order
used when calling {\tt v.derivative(Nw, Nx, Ny, Nz)} in the example below.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <iomanip>
#include <iostream>

using namespace boost::math::differentiation;

template <typename W, typename X, typename Y, typename Z>
promote<W, X, Y, Z> f(const W& w, const X& x, const Y& y, const Z& z) {
  using namespace std;
  return exp(w * sin(x * log(y) / z) + sqrt(w * z / (x * y))) + w * w / tan(z);
}

int main() {
  using float50 = boost::multiprecision::cpp_bin_float_50;

  constexpr unsigned Nw = 3;  // Max order of derivative to calculate for w
  constexpr unsigned Nx = 2;  // Max order of derivative to calculate for x
  constexpr unsigned Ny = 4;  // Max order of derivative to calculate for y
  constexpr unsigned Nz = 3;  // Max order of derivative to calculate for z

  // Declare 4 independent variables together into a std::tuple.
  auto const variables = make_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14);
  auto const& w = std::get<0>(variables);  // Up to Nw derivatives at w=11
  auto const& x = std::get<1>(variables);  // Up to Nx derivatives at x=12
  auto const& y = std::get<2>(variables);  // Up to Ny derivatives at y=13
  auto const& z = std::get<3>(variables);  // Up to Nz derivatives at z=14

  auto const v = f(w, x, y, z);

  // Calculated from Mathematica symbolic differentiation.
  float50 const answer("1976.319600747797717779881875290418720908121189218755");

  std::cout << std::setprecision(std::numeric_limits<float50>::digits10)
            << "mathematica   : " << answer << '\n'
            << "autodiff      : " << v.derivative(Nw, Nx, Ny, Nz) << '\n'
            << std::setprecision(3)
            << "relative error: " << (v.derivative(Nw, Nx, Ny, Nz) / answer - 1) << '\n';
  return 0;
}
/* Output:
mathematica   : 1976.3196007477977177798818752904187209081211892188
autodiff      : 1976.3196007477977177798818752904187209081211892188
relative error: 2.67e-50
*/
\end{verbatim}

\subsection{Example 3: Black-Scholes Option Pricing with Greeks Automatically Calculated}

\subsubsection{Calculate greeks directly from the Black-Scholes pricing function.}

Below is the standard Black-Scholes pricing function written as a function template, where the price, volatility
(sigma), time to expiration (tau) and interest rate are template parameters. This means that any Greek based on
these 4 variables can be calculated using autodiff. The example below calculates delta and gamma, where the
variable of differentiation is only the price. For examples of more exotic greeks, see
{\tt example/black\_scholes.cpp}.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

using namespace boost::math::constants;
using namespace boost::math::differentiation;

// Equations and function/variable names are from
// https://en.wikipedia.org/wiki/Greeks_(finance)#Formulas_for_European_option_Greeks

// Standard normal cumulative distribution function
template <typename X>
X Phi(X const& x) {
  return 0.5 * erfc(-one_div_root_two<X>() * x);
}

enum class CP { call, put };

// Assume zero annual dividend yield (q=0).
template <typename Price, typename Sigma, typename Tau, typename Rate>
promote<Price, Sigma, Tau, Rate> black_scholes_option_price(CP cp, double K, Price const& S,
                                                            Sigma const& sigma, Tau const& tau,
                                                            Rate const& r) {
  using namespace std;
  auto const d1 = (log(S / K) + (r + sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  auto const d2 = (log(S / K) + (r - sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  switch (cp) {
    case CP::call:
      return S * Phi(d1) - exp(-r * tau) * K * Phi(d2);
    case CP::put:
      return exp(-r * tau) * K * Phi(-d2) - S * Phi(-d1);
  }
}

int main() {
  double const K = 100.0;                    // Strike price.
  auto const S = make_fvar<double, 2>(105);  // Stock price.
  double const sigma = 5;                    // Volatility.
  double const tau = 30.0 / 365;             // Time to expiration in years. (30 days).
  double const r = 1.25 / 100;               // Interest rate.
  auto const call_price = black_scholes_option_price(CP::call, K, S, sigma, tau, r);
  auto const put_price = black_scholes_option_price(CP::put, K, S, sigma, tau, r);

  std::cout << "black-scholes call price = " << call_price.derivative(0) << '\n'
            << "black-scholes put price = " << put_price.derivative(0) << '\n'
            << "call delta = " << call_price.derivative(1) << '\n'
            << "put delta = " << put_price.derivative(1) << '\n'
            << "call gamma = " << call_price.derivative(2) << '\n'
            << "put gamma = " << put_price.derivative(2) << '\n';
  return 0;
}
/* Output:
black-scholes call price = 56.5136
black-scholes put price = 51.4109
call delta = 0.773818
put delta = -0.226182
call gamma = 0.00199852
put gamma = 0.00199852
*/
\end{verbatim}

\section{Advantages of Automatic Differentiation}

The above examples illustrate some of the advantages of using autodiff:
\begin{itemize}
\item Elimination of code redundancy. The existence of $N$ separate functions to calculate derivatives is a form
  of code redundancy, with all the liabilities that come with it:
  \begin{itemize}
  \item Changes to one function require $N$ additional changes to other functions. In the \nth{3} example above,
    consider how much larger and inter-dependent the above code base would be if a separate function were written
    for \href{https://en.wikipedia.org/wiki/Greeks\_(finance)#Formulas\_for\_European\_option\_Greeks}{each Greek}
    value.
  \item Dependencies upon a derivative function for a different purpose will break when changes are made to the
    original function. What doesn't need to exist cannot break.
  \item Code bloat, reducing conceptual integrity. Control over the evolution of code is easier/safer when the
    code base is smaller and able to be intuitively grasped.
  \end{itemize}
\item Accuracy of derivatives over finite difference methods. Single-iteration finite difference methods always
  include a $\Delta x$ free variable that must be carefully chosen for each application. If $\Delta x$ is too
  small, then numerical errors become large. If $\Delta x$ is too large, then mathematical errors become large.
  With autodiff, there are no free variables to set and the accuracy of the answer is generally superior to
  finite difference methods even with the best choice of $\Delta x$.
\end{itemize}

\section{Mathematics}

A basic understanding of the underlying mathematics helps to make sense of how the autodiff library is used.

\subsection{Truncated Taylor Series}

Basic calculus courses teach that a real \href{https://en.wikipedia.org/wiki/Analytic_function}{analytic function}
$f : D\rightarrow\mathbb{R}$ is one which can be expressed as a Taylor series at a point
$x_0\in D\subseteq\mathbb{R}$:
\[ f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \cdots \]
One way of thinking about this form is that given the value of an analytic function $f(x_0)$ and its derivatives
$f'(x_0), f''(x_0), f'''(x_0), ...$ evaluated at a point $x_0$, then the value of the function $f(x)$ can be
obtained at any other point $x\in D$ using the above formula.

Let us make the substitution $x=x_0+\varepsilon$ and rewrite the above equation to get:
\[ f(x_0+\varepsilon) = f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2
  + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots \]
Now consider $\varepsilon$ as {\it an abstract algebraic entity that never acquires a numeric value}, much like
one does in basic algebra with variables like $x$ or $y$.
For example, we can still manipulate entities like $xy$ and $(1+2x+3x^2)$ without having to assign specific
numbers to them.

Using this formula, autodiff goes in the other direction. Given a general formula/algorithm for calculating
$f(x_0+\varepsilon)$, the derivatives are obtained from the coefficients of the powers of $\varepsilon$ in the
resulting computation. The general coefficient for $\varepsilon^n$ is
\[\frac{f^{(n)}(x_0)}{n!}.\]
Thus to obtain $f^{(n)}(x_0)$, the coefficient of $\varepsilon^n$ is multiplied by $n!$.

\subsubsection{Example}

Apply the above technique to calculate the derivatives of $f(x)=x^4$ at $x_0=2$.

The first step is to evaluate $f(x_0+\varepsilon)$ and simply go through the calculation/algorithm, treating
$\varepsilon$ as an abstract algebraic entity:
\begin{align*}
f(x_0+\varepsilon) &= f(2+\varepsilon) \\
&= (2+\varepsilon)^4 \\
&= \left(4+4\varepsilon+\varepsilon^2\right)^2 \\
&= 16+32\varepsilon+24\varepsilon^2+8\varepsilon^3+\varepsilon^4.
\end{align*}
Equating the powers of $\varepsilon$ from this result with the above $\varepsilon$-Taylor expansion yields the
following equalities:
\[ f(2) = 16, \qquad f'(2) = 32, \qquad \frac{f''(2)}{2!} = 24, \qquad \frac{f'''(2)}{3!} = 8, \qquad
\frac{f^{(4)}(2)}{4!} = 1, \qquad \frac{f^{(5)}(2)}{5!} = 0. \]
Multiplying both sides by the respective factorials gives
\[ f(2) = 16, \qquad f'(2) = 32, \qquad f''(2) = 48, \qquad f'''(2) = 48, \qquad f^{(4)}(2) = 24, \qquad
f^{(5)}(2) = 0. \]
These values can be directly confirmed by the \href{https://en.wikipedia.org/wiki/Power_rule}{power rule} applied
to $f(x)=x^4$.

\subsection{Arithmetic}

What was essentially done above was to take a formula/algorithm for calculating $f(x_0)$ from a number $x_0$, and
instead apply the same formula/algorithm to a polynomial $x_0+\varepsilon$. Intermediate steps operate on values
of the form
\[ {\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N \]
and the final return value is of this polynomial form as well. In other words, the normal arithmetic operators
$+,-,\times,\div$ applied to numbers $x$ are instead applied to polynomials $\bf x$. Through the overloading of
C++ operators and functions, floating point data types are replaced with data types that represent these
polynomials. More specifically, C++ types such as {\tt double} are replaced with {\tt std::array<double,N+1>},
which holds the above $N+1$ coefficients $x_i$, and is wrapped in a {\tt class} that overloads all of the
arithmetic operators.

The logic of these arithmetic operators simply mirrors that which is applied to polynomials. We'll look at each
of the 4 arithmetic operators in detail.

\subsubsection{Addition}

The addition of polynomials $\bf x$ and $\bf y$ is done component-wise:
\begin{align*}
{\bf z} &= {\bf x} + {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) + \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= \sum_{i=0}^N(x_i+y_i)\varepsilon^i \\
z_i &= x_i + y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Subtraction}

Subtraction follows the same form as addition:
\begin{align*}
{\bf z} &= {\bf x} - {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) - \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= \sum_{i=0}^N(x_i-y_i)\varepsilon^i \\
z_i &= x_i - y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Multiplication}

Multiplication produces higher-order terms:
\begin{align*}
{\bf z} &= {\bf x} \times {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= x_0y_0 + (x_0y_1+x_1y_0)\varepsilon + (x_0y_2+x_1y_1+x_2y_0)\varepsilon^2 + \cdots
  + \left(\sum_{j=0}^Nx_jy_{N-j}\right)\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
&= \sum_{i=0}^N\sum_{j=0}^ix_jy_{i-j}\varepsilon^i + O\left(\varepsilon^{N+1}\right) \\
z_i &= \sum_{j=0}^ix_jy_{i-j} \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of multiplication, terms involving powers of $\varepsilon$ greater than $N$, collectively denoted by
$O\left(\varepsilon^{N+1}\right)$, are simply discarded. Fortunately, the values of $z_i$ for $i\le N$ do not
depend on any of these discarded terms, so there is no loss of precision in the final answer. The only
information that is lost consists of the values of higher-order derivatives, which we are not interested in
anyway. If we were, then we would have simply chosen a larger value of $N$ to begin with.

\subsubsection{Division}

Division is not directly calculated as are the others. Instead, to find the components of
${\bf z}={\bf x}\div{\bf y}$ we require that ${\bf x}={\bf y}\times{\bf z}$. This yields a recursive formula for
the components $z_i$:
\begin{align*}
x_i &= \sum_{j=0}^iy_jz_{i-j} \\
&= y_0z_i + \sum_{j=1}^iy_jz_{i-j} \\
z_i &= \frac{1}{y_0}\left(x_i - \sum_{j=1}^iy_jz_{i-j}\right) \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of division, the values for $z_i$ must be calculated sequentially, since $z_i$ depends on the
previously calculated values $z_0, z_1, ..., z_{i-1}$.
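
As a concrete illustration of the multiplication and division rules, the coefficient formulas above can be coded
directly on plain coefficient arrays. The following sketch is for exposition only; it is not the library's
implementation, and the helper names {\tt multiply} and {\tt divide} are hypothetical:

\begin{verbatim}
#include <array>
#include <cstddef>
#include <iostream>

// A polynomial truncated at order N is stored as its N+1 coefficients, so Size = N+1.

// z_i = sum_{j=0}^{i} x_j*y_{i-j} for i = 0..N; terms of order greater than N are discarded.
template <typename T, std::size_t Size>
std::array<T, Size> multiply(std::array<T, Size> const& x, std::array<T, Size> const& y) {
  std::array<T, Size> z{};  // zero-initialized coefficients
  for (std::size_t i = 0; i < Size; ++i)
    for (std::size_t j = 0; j <= i; ++j)
      z[i] += x[j] * y[i - j];
  return z;
}

// z_i = (x_i - sum_{j=1}^{i} y_j*z_{i-j}) / y_0, computed sequentially since each z_i
// depends on the previously computed z_0, ..., z_{i-1}. Assumes y[0] != 0.
template <typename T, std::size_t Size>
std::array<T, Size> divide(std::array<T, Size> const& x, std::array<T, Size> const& y) {
  std::array<T, Size> z{};
  for (std::size_t i = 0; i < Size; ++i) {
    T num = x[i];
    for (std::size_t j = 1; j <= i; ++j)
      num -= y[j] * z[i - j];
    z[i] = num / y[0];
  }
  return z;
}

int main() {
  std::array<double, 6> const x{2, 1};        // The polynomial 2 + epsilon, with N = 5.
  auto const x2 = multiply(x, x);             // (2 + epsilon)^2
  auto const x4 = multiply(x2, x2);           // (2 + epsilon)^4
  for (double c : x4) std::cout << c << ' ';  // Prints: 16 32 24 8 1 0
  std::cout << '\n';
  auto const q = divide(x4, x2);              // Recovers (2 + epsilon)^2
  for (double c : q) std::cout << c << ' ';   // Prints: 4 4 1 0 0 0
  std::cout << '\n';
  return 0;
}
\end{verbatim}

The printed coefficients match the worked example above:
$(2+\varepsilon)^4 = 16+32\varepsilon+24\varepsilon^2+8\varepsilon^3+\varepsilon^4$, and dividing it by
$(2+\varepsilon)^2$ recovers $4+4\varepsilon+\varepsilon^2$.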
\subsection{General Functions}

Calling standard mathematical functions such as {\tt log()}, {\tt cos()}, etc. should return accurate higher
order derivatives. For example, {\tt exp(x)} may be written internally as a specific \nth{14}-degree polynomial
to approximate $e^x$ when $0<x<1$. The derivatives of such a polynomial approximation, however, do not in general
match the derivatives of the function it approximates, especially at higher orders. Instead, the coefficients
returned by these overloaded functions are computed from the functions' mathematically known derivatives at
$x_0$, applied through the truncated Taylor series described above.

On the usage side, multiple independent variables are declared together with a single call to
{\tt make\_ftuple<>()}; call {\tt make\_ftuple<>()} once and only once. The values in the returned tuple are
independent. Though it is possible to achieve the same result with multiple calls to {\tt make\_fvar}, this is an
easier and less error-prone method. See Section~\ref{multivar} for example usage.

%\section{Usage} % %\subsection{Single Variable} % %To calculate derivatives of a single variable $x$, at a particular value $x_0$, the following must be %specified at compile-time: % %\begin{enumerate} %\item The numeric data type {\tt T} of $x_0$. Examples: {\tt double}, % {\tt boost::multiprecision::cpp\_bin\_float\_50}, etc. %\item The maximum derivative order $M$ that is to be calculated with respect to $x$. %\end{enumerate} %Note that both of these requirements are entirely analogous to declaring and using a {\tt std::array}. {\tt T} %and {\tt N} must be set as compile-time, but which elements in the array are accessed can be determined at run-time, %just as the choice of what derivatives to query in autodiff can be made during run-time. % %To declare and initialize $x$: % %\begin{verbatim} % using namespace boost::math::differentiation; % autodiff_fvar x = make_fvar(x0); %\end{verbatim} %where {\tt x0} is a run-time value of type {\tt T}. Assuming {\tt 0 < M}, this represents the polynomial $x_0 + %\varepsilon$. Internally, the member variable of type {\tt std::array} is {\tt v = \{ x0, 1, 0, 0, ... \}}, %consistent with the above mathematical treatise.
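
For instance, using the {\tt float50} alias and the orders {\tt Nw}, {\tt Nx}, {\tt Ny}, {\tt Nz} from the
Example~2 listing in Section~\ref{multivar}, the two approaches compare as follows. This is only a sketch: the
multiple-{\tt make\_fvar} form assumes the library's variadic {\tt make\_fvar<RealType, Orders...>} overload, in
which each independent variable must be placed by hand in its own $\varepsilon$ dimension.

\begin{verbatim}
  // Preferred: one call to make_ftuple<>() declares all four independent variables.
  auto const variables = make_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14);
  auto const& w = std::get<0>(variables);
  auto const& x = std::get<1>(variables);
  auto const& y = std::get<2>(variables);
  auto const& z = std::get<3>(variables);

  // The same result via multiple make_fvar calls: each variable must be assigned its
  // own epsilon dimension manually, which is easy to get wrong.
  auto const w2 = make_fvar<float50, Nw>(11);
  auto const x2 = make_fvar<float50, 0, Nx>(12);
  auto const y2 = make_fvar<float50, 0, 0, Ny>(13);
  auto const z2 = make_fvar<float50, 0, 0, 0, Nz>(14);
\end{verbatim}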
% %To find the derivatives $f^{(n)}(x_0)$ for $0\le n\le M$ of a function %$f : \mathbb{R}\rightarrow\mathbb{R}$, the function can be represented as a template % %\begin{verbatim} % template % T f(T x); %\end{verbatim} %Using a generic type {\tt T} allows for {\tt x} to be of a regular type such as {\tt double}, but also allows for\\ %{\tt boost::math::differentiation::autodiff\_fvar<>} types. % %Internal calls to mathematical functions must allow for %\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL). Many standard library functions %are overloaded in the {\tt boost::math::differentiation} namespace. For example, instead of calling {\tt std::cos(x)} %from within {\tt f}, include the line {\tt using std::cos;} and call {\tt cos(x)} without a namespace prefix. % %Calling $f$ and retrieving the calculated value and derivatives: % %\begin{verbatim} % using namespace boost::math::differentiation; % autodiff_fvar x = make_fvar(x0); % autodiff_fvar y = f(x); % for (int n=0 ; n<=M ; ++n) % std::cout << "y.derivative("<(x0); // WRONG %\end{verbatim} %Mathematically, this represents $x_0+\varepsilon_y$, since the last template parameter corresponds to the %$y$ variable, and thus the resulting value will be invalid. % %\subsubsection{Type Promotion} % %The previous example can be optimized to save some unnecessary computation, by declaring smaller arrays, %and relying on autodiff's automatic type-promotion: % %\begin{verbatim} % using namespace boost::math::differentiation; % % autodiff_fvar x = make_fvar(x0); % autodiff_fvar y = make_fvar(y0); % autodiff_fvar z = make_fvar(z0); % % autodiff_fvar w = f(x,y,z); % % for (size_t nx=0 ; nx<=Nx ; ++nx) % for (size_t ny=0 ; ny<=Ny ; ++ny) % for (size_t nz=0 ; nz<=Nz ; ++nz) % std::cout << "w.derivative("<