% Copyright Matthew Pulver 2018 - 2019.
% Distributed under the Boost Software License, Version 1.0.
% (See accompanying file LICENSE_1_0.txt or copy at
% https://www.boost.org/LICENSE_1_0.txt)
\documentclass{article}
\usepackage{amsmath} %\usepackage{mathtools}
\usepackage{amssymb} %\mathbb
\usepackage{array} % m{} column in tabular
\usepackage{csquotes} % displayquote
\usepackage{fancyhdr}
\usepackage{fancyvrb}
\usepackage[margin=0.75in]{geometry}
\usepackage{graphicx}
\usepackage{hyperref}
%\usepackage{listings}
\usepackage{multirow}
\usepackage[super]{nth}
\usepackage{wrapfig}
\usepackage{xcolor}
\hypersetup{%
colorlinks=false,% hyperlinks will be black
linkbordercolor=blue,% hyperlink borders will be blue
urlbordercolor=blue,%
pdfborderstyle={/S/U/W 1}% border style will be underline of width 1pt
}
\pagestyle{fancyplain}
\fancyhf{}
\renewcommand{\headrulewidth}{0pt}
\cfoot[]{\thepage\\
\scriptsize\color{gray} Copyright \textcopyright\/ Matthew Pulver 2018--2019.
Distributed under the Boost Software License, Version 1.0.\\
(See accompanying file LICENSE\_1\_0.txt or copy at
\url{https://www.boost.org/LICENSE\_1\_0.txt})}
\DeclareMathOperator{\sinc}{sinc}
\begin{document}
\title{Autodiff\\
\large Automatic Differentiation C++ Library}
\author{Matthew Pulver}
\maketitle
%\date{}
%\begin{abstract}
%\end{abstract}
\tableofcontents
%\section{Synopsis}
%\begingroup
%\fontsize{10pt}{10pt}\selectfont
%\begin{verbatim}
% example/synopsis.cpp
%\end{verbatim}
%\endgroup
\newpage
\section{Description}
Autodiff is a header-only C++ library that facilitates the
\href{https://en.wikipedia.org/wiki/Automatic_differentiation}{automatic differentiation} (forward mode) of
mathematical functions of single and multiple variables.
This implementation is based upon the \href{https://en.wikipedia.org/wiki/Taylor_series}{Taylor series} expansion of
an analytic function $f$ at the point $x_0$:
\begin{align*}
f(x_0+\varepsilon) &= f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots \\
&= \sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n + O\left(\varepsilon^{N+1}\right).
\end{align*}
The essential idea of autodiff is the substitution of numbers with polynomials in the evaluation of $f(x_0)$. By
substituting the number $x_0$ with the first-order polynomial $x_0+\varepsilon$, and using the same algorithm
to compute $f(x_0+\varepsilon)$, the resulting polynomial in $\varepsilon$ contains the function's derivatives
$f'(x_0)$, $f''(x_0)$, $f'''(x_0)$, ... within the coefficients. Each coefficient is equal to the derivative of
its respective order, divided by the factorial of the order.
In greater detail, assume one is interested in calculating the first $N$ derivatives of $f$ at $x_0$. Without loss
of precision to the calculation of the derivatives, all terms $O\left(\varepsilon^{N+1}\right)$ that include powers
of $\varepsilon$ greater than $N$ can be discarded. (This is due to the fact that each term in a polynomial depends
only upon equal and lower-order terms under arithmetic operations.) Under these truncation rules, $f$ provides a
polynomial-to-polynomial transformation:
\[
f \qquad : \qquad x_0+\varepsilon \qquad \mapsto \qquad
\sum_{n=0}^Ny_n\varepsilon^n=\sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n.
\]
C++'s ability to overload operators and functions allows for the creation of a class {\tt fvar}
(\underline{f}orward-mode autodiff \underline{var}iable) that represents polynomials in $\varepsilon$. Thus
the same algorithm $f$ that calculates the numeric value of $y_0=f(x_0)$, when
written to accept and return variables of a generic (template) type, is also used to calculate the polynomial
$\sum_{n=0}^Ny_n\varepsilon^n=f(x_0+\varepsilon)$. The derivatives $f^{(n)}(x_0)$ are then found from the
product of the respective factorial $n!$ and coefficient $y_n$:
\[ \frac{d^nf}{dx^n}(x_0)=n!y_n. \]
\section{Examples}
\subsection{Example 1: Single-variable derivatives}
\subsubsection{Calculate derivatives of $f(x)=x^4$ at $x=2$.}
In this example, {\tt make\_fvar<double, Order>(2.0)} instantiates the polynomial $2+\varepsilon$. Setting
{\tt Order=5} means that enough space is allocated (on the stack) to hold a polynomial of up to degree 5 during the
ensuing computation.
Internally, this is modeled by a {\tt std::array<double,6>} whose elements {\tt \{2, 1, 0, 0, 0, 0\}} correspond
to the 6 coefficients of the polynomial upon initialization. Its fourth power, at the end of the computation, is
a polynomial with coefficients {\tt y = \{16, 32, 24, 8, 1, 0\}}. The derivatives are obtained using the formula
$f^{(n)}(2)=n!\cdot{\tt y[n]}$.
\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

template <typename T>
T fourth_power(T const& x) {
  T x4 = x * x;  // retval in operator*() uses x4's memory via NRVO.
  x4 *= x4;      // No copies of x4 are made within operator*=() even when squaring.
  return x4;     // x4 uses y's memory in main() via NRVO.
}

int main() {
  using namespace boost::math::differentiation;
  constexpr unsigned Order = 5;                  // Highest order derivative to be calculated.
  auto const x = make_fvar<double, Order>(2.0);  // Find derivatives at x=2.
  auto const y = fourth_power(x);
  for (unsigned i = 0; i <= Order; ++i)
    std::cout << "y.derivative(" << i << ") = " << y.derivative(i) << std::endl;
  return 0;
}
/*
Output:
y.derivative(0) = 16
y.derivative(1) = 32
y.derivative(2) = 48
y.derivative(3) = 48
y.derivative(4) = 24
y.derivative(5) = 0
*/
\end{verbatim}
The above calculates
\begin{alignat*}{3}
{\tt y.derivative(0)} &=& f(2) =&& \left.x^4\right|_{x=2} &= 16\\
{\tt y.derivative(1)} &=& f'(2) =&& \left.4\cdot x^3\right|_{x=2} &= 32\\
{\tt y.derivative(2)} &=& f''(2) =&& \left.4\cdot 3\cdot x^2\right|_{x=2} &= 48\\
{\tt y.derivative(3)} &=& f'''(2) =&& \left.4\cdot 3\cdot2\cdot x\right|_{x=2} &= 48\\
{\tt y.derivative(4)} &=& f^{(4)}(2) =&& 4\cdot 3\cdot2\cdot1 &= 24\\
{\tt y.derivative(5)} &=& f^{(5)}(2) =&& 0 &
\end{alignat*}
\subsection{Example 2: Multi-variable mixed partial derivatives with multi-precision data type}\label{multivar}
\subsubsection{Calculate $\frac{\partial^{12}f}{\partial w^{3}\partial x^{2}\partial y^{4}\partial z^{3}}(11,12,13,14)$
with a precision of about 50 decimal digits,\\
where $f(w,x,y,z)=\exp\left(w\sin\left(\frac{x\log(y)}{z}\right)+\sqrt{\frac{wz}{xy}}\right)+\frac{w^2}{\tan(z)}$.}
In this example, {\tt make\_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14)} returns a {\tt std::tuple} of 4
independent {\tt fvar} variables, with values of 11, 12, 13, and 14, whose maximum orders of derivatives to
be calculated are 3, 2, 4, and 3, respectively. The order of the variables is important, as it is the same
order used when calling {\tt v.derivative(Nw, Nx, Ny, Nz)} in the example below.
\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <iomanip>
#include <iostream>

using namespace boost::math::differentiation;

template <typename W, typename X, typename Y, typename Z>
promote<W, X, Y, Z> f(const W& w, const X& x, const Y& y, const Z& z) {
  using namespace std;
  return exp(w * sin(x * log(y) / z) + sqrt(w * z / (x * y))) + w * w / tan(z);
}

int main() {
  using float50 = boost::multiprecision::cpp_bin_float_50;

  constexpr unsigned Nw = 3;  // Max order of derivative to calculate for w
  constexpr unsigned Nx = 2;  // Max order of derivative to calculate for x
  constexpr unsigned Ny = 4;  // Max order of derivative to calculate for y
  constexpr unsigned Nz = 3;  // Max order of derivative to calculate for z
  // Declare 4 independent variables together into a std::tuple.
  auto const variables = make_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14);
  auto const& w = std::get<0>(variables);  // Up to Nw derivatives at w=11
  auto const& x = std::get<1>(variables);  // Up to Nx derivatives at x=12
  auto const& y = std::get<2>(variables);  // Up to Ny derivatives at y=13
  auto const& z = std::get<3>(variables);  // Up to Nz derivatives at z=14
  auto const v = f(w, x, y, z);
  // Calculated from Mathematica symbolic differentiation.
  float50 const answer("1976.319600747797717779881875290418720908121189218755");
  std::cout << std::setprecision(std::numeric_limits<float50>::digits10)
            << "mathematica   : " << answer << '\n'
            << "autodiff      : " << v.derivative(Nw, Nx, Ny, Nz) << '\n'
            << std::setprecision(3)
            << "relative error: " << (v.derivative(Nw, Nx, Ny, Nz) / answer - 1) << '\n';
  return 0;
}
/*
Output:
mathematica   : 1976.3196007477977177798818752904187209081211892188
autodiff      : 1976.3196007477977177798818752904187209081211892188
relative error: 2.67e-50
*/
\end{verbatim}
\subsection{Example 3: Black-Scholes Option Pricing with Greeks Automatically Calculated}
\subsubsection{Calculate Greeks directly from the Black-Scholes pricing function.}
Below is the standard Black-Scholes pricing function written as a function template, where the price, volatility
(sigma), time to expiration (tau), and interest rate are template parameters. This means that any Greek based on
these 4 variables can be calculated using autodiff. The example below calculates delta and gamma, where the variable
of differentiation is only the price. For examples of more exotic Greeks, see {\tt example/black\_scholes.cpp}.
\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

using namespace boost::math::constants;
using namespace boost::math::differentiation;

// Equations and function/variable names are from
// https://en.wikipedia.org/wiki/Greeks_(finance)#Formulas_for_European_option_Greeks

// Standard normal cumulative distribution function
template <typename X>
X Phi(X const& x) {
  return 0.5 * erfc(-one_div_root_two<X>() * x);
}

enum class CP { call, put };

// Assume zero annual dividend yield (q=0).
template <typename Price, typename Sigma, typename Tau, typename Rate>
promote<Price, Sigma, Tau, Rate> black_scholes_option_price(CP cp,
                                                            double K,
                                                            Price const& S,
                                                            Sigma const& sigma,
                                                            Tau const& tau,
                                                            Rate const& r) {
  using namespace std;
  auto const d1 = (log(S / K) + (r + sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  auto const d2 = (log(S / K) + (r - sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  switch (cp) {
    case CP::call:
      return S * Phi(d1) - exp(-r * tau) * K * Phi(d2);
    case CP::put:
      return exp(-r * tau) * K * Phi(-d2) - S * Phi(-d1);
  }
}

int main() {
  double const K = 100.0;                    // Strike price.
  auto const S = make_fvar<double, 2>(105);  // Stock price.
  double const sigma = 5;                    // Volatility.
  double const tau = 30.0 / 365;             // Time to expiration in years. (30 days).
  double const r = 1.25 / 100;               // Interest rate.
  auto const call_price = black_scholes_option_price(CP::call, K, S, sigma, tau, r);
  auto const put_price = black_scholes_option_price(CP::put, K, S, sigma, tau, r);
  std::cout << "black-scholes call price = " << call_price.derivative(0) << '\n'
            << "black-scholes put price  = " << put_price.derivative(0) << '\n'
            << "call delta = " << call_price.derivative(1) << '\n'
            << "put delta  = " << put_price.derivative(1) << '\n'
            << "call gamma = " << call_price.derivative(2) << '\n'
            << "put gamma  = " << put_price.derivative(2) << '\n';
  return 0;
}
/*
Output:
black-scholes call price = 56.5136
black-scholes put price  = 51.4109
call delta = 0.773818
put delta  = -0.226182
call gamma = 0.00199852
put gamma  = 0.00199852
*/
\end{verbatim}
\section{Advantages of Automatic Differentiation}
The above examples illustrate some of the advantages of using autodiff:
\begin{itemize}
\item Elimination of code redundancy. The existence of $N$ separate functions to calculate derivatives is a form
of code redundancy, with all the liabilities that come with it:
\begin{itemize}
\item Changes to one function require $N$ additional changes to other functions. In the \nth{3} example above,
consider how much larger and inter-dependent the above code base would be if a separate function were
written for \href{https://en.wikipedia.org/wiki/Greeks\_(finance)#Formulas\_for\_European\_option\_Greeks}
{each Greek} value.
\item Dependencies upon a derivative function for a different purpose will break when changes are made to
the original function. What doesn't need to exist cannot break.
\item Code bloat, reducing conceptual integrity. Control over the evolution of code is easier/safer when
the code base is smaller and able to be intuitively grasped.
\end{itemize}
\item Accuracy of derivatives over finite difference methods. Single-iteration finite difference methods always
include a $\Delta x$ free variable that must be carefully chosen for each application. If $\Delta x$ is too
small, then numerical errors become large. If $\Delta x$ is too large, then mathematical errors become large.
With autodiff, there are no free variables to set and the accuracy of the answer is generally superior to finite
difference methods even with the best choice of $\Delta x$.
\end{itemize}
\section{Mathematics}
In order for the usage of the autodiff library to make sense, a basic understanding of the mathematics will help.
\subsection{Truncated Taylor Series}
Basic calculus courses teach that a real \href{https://en.wikipedia.org/wiki/Analytic_function}{analytic function}
$f : D\rightarrow\mathbb{R}$ is one which can be expressed as a Taylor series at a point
$x_0\in D\subseteq\mathbb{R}$:
\[
f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \cdots
\]
One way of thinking about this form is that, given the value of an analytic function $f(x_0)$ and its derivatives
$f'(x_0), f''(x_0), f'''(x_0), ...$ evaluated at a point $x_0$, the value of the function
$f(x)$ can be obtained at any other point $x\in D$ using the above formula.
Let us make the substitution $x=x_0+\varepsilon$ and rewrite the above equation to get:
\[
f(x_0+\varepsilon) = f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots
\]
Now consider $\varepsilon$ as {\it an abstract algebraic entity that never acquires a numeric value}, much like
one does in basic algebra with variables like $x$ or $y$. For example, we can still manipulate entities
like $xy$ and $(1+2x+3x^2)$ without having to assign specific numbers to them.
Using this formula, autodiff goes in the other direction. Given a general formula/algorithm for calculating
$f(x_0+\varepsilon)$, the derivatives are obtained from the coefficients of the powers of $\varepsilon$
in the resulting computation. The general coefficient for $\varepsilon^n$ is
\[\frac{f^{(n)}(x_0)}{n!}.\]
Thus to obtain $f^{(n)}(x_0)$, the coefficient of $\varepsilon^n$ is multiplied by $n!$.
\subsubsection{Example}
Apply the above technique to calculate the derivatives of $f(x)=x^4$ at $x_0=2$.
The first step is to evaluate $f(x_0+\varepsilon)$ and simply go through the calculation/algorithm, treating
$\varepsilon$ as an abstract algebraic entity:
\begin{align*}
f(x_0+\varepsilon) &= f(2+\varepsilon) \\
&= (2+\varepsilon)^4 \\
&= \left(4+4\varepsilon+\varepsilon^2\right)^2 \\
&= 16+32\varepsilon+24\varepsilon^2+8\varepsilon^3+\varepsilon^4.
\end{align*}
Equating the powers of $\varepsilon$ from this result with the above $\varepsilon$-Taylor expansion
yields the following equalities:
\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
\frac{f''(2)}{2!} = 24, \qquad
\frac{f'''(2)}{3!} = 8, \qquad
\frac{f^{(4)}(2)}{4!} = 1, \qquad
\frac{f^{(5)}(2)}{5!} = 0.
\]
Multiplying both sides by the respective factorials gives
\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
f''(2) = 48, \qquad
f'''(2) = 48, \qquad
f^{(4)}(2) = 24, \qquad
f^{(5)}(2) = 0.
\]
These values can be directly confirmed by the \href{https://en.wikipedia.org/wiki/Power_rule}{power rule}
applied to $f(x)=x^4$.
\subsection{Arithmetic}
What was essentially done above was to take a formula/algorithm for calculating $f(x_0)$ from a number $x_0$,
and instead apply the same formula/algorithm to a polynomial $x_0+\varepsilon$. Intermediate steps operate on
values of the form
\[
{\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N
\]
and the final return value is of this polynomial form as well. In other words, the normal arithmetic operators
$+,-,\times,\div$ applied to numbers $x$ are instead applied to polynomials $\bf x$. Through the overloading of C++
operators and functions, floating point data types are replaced with data types that represent these polynomials. More
specifically, C++ types such as {\tt double} are replaced with {\tt std::array<double,N+1>}, which holds the above
$N+1$ coefficients $x_i$, and is wrapped in a {\tt class} that overloads all of the arithmetic operators.
The logic of these arithmetic operators simply mirrors that which is applied to polynomials. We'll look at
each of the 4 arithmetic operators in detail.
\subsubsection{Addition}
The addition of polynomials $\bf x$ and $\bf y$ is done component-wise:
\begin{align*}
{\bf z} &= {\bf x} + {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) + \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= \sum_{i=0}^N(x_i+y_i)\varepsilon^i \\
z_i &= x_i + y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
\subsubsection{Subtraction}
Subtraction follows the same form as addition:
\begin{align*}
{\bf z} &= {\bf x} - {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) - \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= \sum_{i=0}^N(x_i-y_i)\varepsilon^i \\
z_i &= x_i - y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
\subsubsection{Multiplication}
Multiplication produces higher-order terms:
\begin{align*}
{\bf z} &= {\bf x} \times {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= x_0y_0 + (x_0y_1+x_1y_0)\varepsilon + (x_0y_2+x_1y_1+x_2y_0)\varepsilon^2 + \cdots +
\left(\sum_{j=0}^Nx_jy_{N-j}\right)\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
&= \sum_{i=0}^N\sum_{j=0}^ix_jy_{i-j}\varepsilon^i + O\left(\varepsilon^{N+1}\right) \\
z_i &= \sum_{j=0}^ix_jy_{i-j} \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of multiplication, terms involving powers of $\varepsilon$ greater than $N$, collectively denoted
by $O\left(\varepsilon^{N+1}\right)$, are simply discarded. Fortunately, the values of $z_i$ for $i\le N$ do not
depend on any of these discarded terms, so there is no loss of precision in the final answer. The only information
lost is the values of higher order derivatives, which we are not interested in anyway. If we were, then
we would simply have chosen a larger value of $N$ to begin with.
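To make the truncated product concrete, here is a minimal standalone sketch (plain {\tt std::array} coefficients,
not the library's internal types) that implements the formula for $z_i$ above; squaring $2+\varepsilon$ twice
reproduces the coefficients {\tt \{16, 32, 24, 8, 1, 0\}} from Example 1:
\begin{Verbatim}[xleftmargin=2em]
#include <array>
#include <cstddef>
#include <iostream>

// Truncated product: z[i] = sum of x[j]*y[i-j] for j = 0..i.
// Powers of epsilon greater than N are simply never formed.
template <std::size_t N>
std::array<double, N + 1> multiply(std::array<double, N + 1> const& x,
                                   std::array<double, N + 1> const& y) {
  std::array<double, N + 1> z{};
  for (std::size_t i = 0; i <= N; ++i)
    for (std::size_t j = 0; j <= i; ++j)
      z[i] += x[j] * y[i - j];
  return z;
}

int main() {
  std::array<double, 6> const x{2, 1, 0, 0, 0, 0};  // 2 + epsilon
  auto const x2 = multiply<5>(x, x);                // (2 + epsilon)^2
  auto const x4 = multiply<5>(x2, x2);              // (2 + epsilon)^4
  for (double c : x4)
    std::cout << c << ' ';  // Output: 16 32 24 8 1 0
  std::cout << '\n';
  return 0;
}
\end{Verbatim}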
\subsubsection{Division}
Division is not directly calculated as are the others. Instead, to find the components of
${\bf z}={\bf x}\div{\bf y}$ we require that ${\bf x}={\bf y}\times{\bf z}$. This yields
a recursive formula for the components $z_i$:
\begin{align*}
x_i &= \sum_{j=0}^iy_jz_{i-j} \\
&= y_0z_i + \sum_{j=1}^iy_jz_{i-j} \\
z_i &= \frac{1}{y_0}\left(x_i - \sum_{j=1}^iy_jz_{i-j}\right) \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of division, the values for $z_i$ must be calculated sequentially, since $z_i$
depends on the previously calculated values $z_0, z_1, ..., z_{i-1}$.
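The same style of sketch works for division, computing the $z_i$ sequentially exactly as in the recursive formula
above. As a check, dividing $1$ by $1+\varepsilon$ must reproduce the truncated geometric series
$1-\varepsilon+\varepsilon^2-\varepsilon^3$:
\begin{Verbatim}[xleftmargin=2em]
#include <array>
#include <cstddef>
#include <iostream>

// Solve x = y*z for z, one coefficient at a time:
// z[i] = (x[i] - sum of y[j]*z[i-j] for j = 1..i) / y[0].
template <std::size_t N>
std::array<double, N + 1> divide(std::array<double, N + 1> const& x,
                                 std::array<double, N + 1> const& y) {
  std::array<double, N + 1> z{};
  for (std::size_t i = 0; i <= N; ++i) {
    double sum = x[i];
    for (std::size_t j = 1; j <= i; ++j)
      sum -= y[j] * z[i - j];
    z[i] = sum / y[0];  // z[i] depends only on z[0], ..., z[i-1].
  }
  return z;
}

int main() {
  std::array<double, 4> const one{1, 0, 0, 0};  // 1
  std::array<double, 4> const y{1, 1, 0, 0};    // 1 + epsilon
  for (double c : divide<3>(one, y))
    std::cout << c << ' ';  // Output: 1 -1 1 -1
  std::cout << '\n';
  return 0;
}
\end{Verbatim}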
\subsection{General Functions}
Calling standard mathematical functions such as {\tt log()}, {\tt cos()}, etc. should return accurate higher
order derivatives. For example, {\tt exp(x)} may be written internally as a specific \nth{14}-degree polynomial to
approximate $e^x$ when $0<x<1$. This would mean that the \nth{15} derivative, and all higher order derivatives, would
be 0; however, we know that $\frac{d^{15}}{dx^{15}}e^x=e^x$. How should such functions, whose derivatives are known,
be written to provide accurate higher order derivatives? The answer again comes back to the function's Taylor series.
To simplify notation, for a given polynomial ${\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+
x_N\varepsilon^N$ define
\[
{\bf x}_\varepsilon = x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N = \sum_{i=1}^Nx_i\varepsilon^i.
\]
This allows for a concise expression of a general function $f$ of $\bf x$:
\begin{align*}
f({\bf x}) &= f(x_0 + {\bf x}_\varepsilon) \\
& = f(x_0) + f'(x_0){\bf x}_\varepsilon + \frac{f''(x_0)}{2!}{\bf x}_\varepsilon^2 + \frac{f'''(x_0)}{3!}{\bf x}_\varepsilon^3 + \cdots + \frac{f^{(N)}(x_0)}{N!}{\bf x}_\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
& = \sum_{i=0}^N\frac{f^{(i)}(x_0)}{i!}{\bf x}_\varepsilon^i + O\left(\varepsilon^{N+1}\right)
\end{align*}
where $\varepsilon$ has been substituted with ${\bf x}_\varepsilon$ in the $\varepsilon$-Taylor series
for $f(x)$. This form gives a recipe for calculating $f({\bf x})$ in general from regular numeric calculations
$f(x_0)$, $f'(x_0)$, $f''(x_0)$, ... and successive powers of the epsilon terms ${\bf x}_\varepsilon$.
For an application in which we are interested in up to $N$ derivatives in $x$, the data structure to hold
this information is an $(N+1)$-element array {\tt v} whose general element is
\[ {\tt v[i]} = \frac{f^{(i)}(x_0)}{i!} \qquad \text{for}\; i\in\{0,1,2,...,N\}. \]
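The recipe itself can also be sketched with the same standalone polynomial type as in the previous sections: form
successive truncated powers of ${\bf x}_\varepsilon$ and accumulate them, weighted by $f^{(i)}(x_0)/i!$. For
$f=\exp$, every derivative at $x_0$ equals $e^{x_0}$, so applying the recipe to ${\bf x}=2+\varepsilon$ must
produce $e^2\left(1+\varepsilon+\frac{1}{2}\varepsilon^2+\frac{1}{6}\varepsilon^3\right)$:
\begin{Verbatim}[xleftmargin=2em]
#include <array>
#include <cmath>
#include <cstddef>
#include <iostream>

template <std::size_t N>
std::array<double, N + 1> multiply(std::array<double, N + 1> const& x,
                                   std::array<double, N + 1> const& y) {
  std::array<double, N + 1> z{};  // Truncated product, as in the previous section.
  for (std::size_t i = 0; i <= N; ++i)
    for (std::size_t j = 0; j <= i; ++j)
      z[i] += x[j] * y[i - j];
  return z;
}

// f(x) = sum over i of dnf(i)/i! * x_eps^i, where dnf(i) == f^(i)(x0).
template <std::size_t N, typename F>
std::array<double, N + 1> compose(std::array<double, N + 1> x, F dnf) {
  x[0] = 0;  // x is now x_eps, the epsilon part of the input.
  std::array<double, N + 1> power{}, result{};
  power[0] = 1;  // x_eps^0
  double factorial = 1;
  for (std::size_t i = 0; i <= N; ++i) {
    if (i != 0) {
      factorial *= i;
      power = multiply<N>(power, x);  // x_eps^i
    }
    for (std::size_t k = 0; k <= N; ++k)
      result[k] += dnf(i) / factorial * power[k];
  }
  return result;
}

int main() {
  std::array<double, 4> const x{2, 1, 0, 0};  // 2 + epsilon
  auto const y = compose<3>(x, [](std::size_t) { return std::exp(2.0); });
  for (double c : y)
    std::cout << c << ' ';  // Output: e^2 * {1, 1, 1/2, 1/6}
  std::cout << '\n';
  return 0;
}
\end{Verbatim}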
\subsection{Multiple Variables}
In C++, the generalization to mixed partial derivatives with multiple independent variables is conveniently achieved
with recursion. To begin to see the recursive pattern, consider a two-variable function $f(x,y)$. Since $x$
and $y$ are independent, they require their own independent epsilons $\varepsilon_x$ and $\varepsilon_y$,
respectively.
Expand $f(x,y)$ for $x=x_0+\varepsilon_x$:
\begin{align*}
f(x_0+\varepsilon_x,y) &= f(x_0,y)
+ \frac{\partial f}{\partial x}(x_0,y)\varepsilon_x
+ \frac{1}{2!}\frac{\partial^2 f}{\partial x^2}(x_0,y)\varepsilon_x^2
+ \frac{1}{3!}\frac{\partial^3 f}{\partial x^3}(x_0,y)\varepsilon_x^3
+ \cdots
+ \frac{1}{M!}\frac{\partial^M f}{\partial x^M}(x_0,y)\varepsilon_x^M
+ O\left(\varepsilon_x^{M+1}\right) \\
&= \sum_{i=0}^M\frac{1}{i!}\frac{\partial^i f}{\partial x^i}(x_0,y)\varepsilon_x^i + O\left(\varepsilon_x^{M+1}\right).
\end{align*}
Next, expand $f(x_0+\varepsilon_x,y)$ for $y=y_0+\varepsilon_y$:
\begin{align*}
f(x_0+\varepsilon_x,y_0+\varepsilon_y) &= \sum_{j=0}^N\frac{1}{j!}\frac{\partial^j}{\partial y^j}
\left(\sum_{i=0}^M\varepsilon_x^i\frac{1}{i!}\frac{\partial^if}{\partial x^i}\right)(x_0,y_0)\varepsilon_y^j
+ O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right) \\
&= \sum_{i=0}^M\sum_{j=0}^N\frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
\varepsilon_x^i\varepsilon_y^j + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right).
\end{align*}
Similar to the single-variable case, for an application in which we are interested in up to $M$ derivatives in
$x$ and $N$ derivatives in $y$, the data structure to hold this information is an $(M+1)\times(N+1)$
array {\tt v} whose element at $(i,j)$ is
\[
{\tt v[i][j]} = \frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
\qquad \text{for}\; (i,j)\in\{0,1,2,...,M\}\times\{0,1,2,...,N\}.
\]
The generalization to additional independent variables follows the same pattern.
\subsubsection{Declaring Multiple Variables}
Internally, independent variables are represented by vectors within orthogonal vector spaces. Because of this,
one must be careful when declaring more than one independent variable so that they do not end up in
parallel vector spaces. This can easily be achieved by following one rule:
\begin{itemize}
\item When declaring more than one independent variable, call {\tt make\_ftuple<>()} once and only once.
\end{itemize}
The values in the returned tuple are independent. Though it is possible to achieve the same result with multiple
calls to {\tt make\_fvar}, this is an easier and less error-prone method. See Section~\ref{multivar} for example
usage.
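For instance, here is a small sketch in the style of Example 2: two independent variables declared with a single
{\tt make\_ftuple} call give the mixed partial derivatives of $f(x,y)=10x^2+50xy+100y^2$, each of which can be
confirmed by inspection:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>
#include <tuple>

using namespace boost::math::differentiation;

template <typename X, typename Y>
promote<X, Y> f(X const& x, Y const& y) {
  return 10 * x * x + 50 * x * y + 100 * y * y;
}

int main() {
  // One call to make_ftuple ensures x and y are truly independent.
  auto const variables = make_ftuple<double, 2, 2>(13, 14);
  auto const& x = std::get<0>(variables);
  auto const& y = std::get<1>(variables);
  auto const z = f(x, y);
  std::cout << z.derivative(2, 0) << '\n'   // d2f/dx2  == 20
            << z.derivative(1, 1) << '\n'   // d2f/dxdy == 50
            << z.derivative(0, 2) << '\n';  // d2f/dy2  == 200
  return 0;
}
\end{Verbatim}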
%\section{Usage}
%
%\subsection{Single Variable}
%
%To calculate derivatives of a single variable $x$, at a particular value $x_0$, the following must be
%specified at compile-time:
%
%\begin{enumerate}
%\item The numeric data type {\tt T} of $x_0$. Examples: {\tt double},
%    {\tt boost::multiprecision::cpp\_bin\_float\_50}, etc.
%\item The maximum derivative order $M$ that is to be calculated with respect to $x$.
%\end{enumerate}
%Note that both of these requirements are entirely analogous to declaring and using a {\tt std::array<T,N>}. {\tt T}
%and {\tt N} must be set as compile-time, but which elements in the array are accessed can be determined at run-time,
%just as the choice of what derivatives to query in autodiff can be made during run-time.
%
%To declare and initialize $x$:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%\end{verbatim}
%where {\tt x0} is a run-time value of type {\tt T}. Assuming {\tt 0 < M}, this represents the polynomial $x_0 +
%\varepsilon$. Internally, the member variable of type {\tt std::array<T,M>} is {\tt v = \{ x0, 1, 0, 0, ... \}},
%consistent with the above mathematical treatise.
%
%To find the derivatives $f^{(n)}(x_0)$ for $0\le n\le M$ of a function
%$f : \mathbb{R}\rightarrow\mathbb{R}$, the function can be represented as a template
%
%\begin{verbatim}
%    template<typename T>
%    T f(T x);
%\end{verbatim}
%Using a generic type {\tt T} allows for {\tt x} to be of a regular type such as {\tt double}, but also allows for\\
%{\tt boost::math::differentiation::autodiff\_fvar<>} types.
%
%Internal calls to mathematical functions must allow for
%\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL). Many standard library functions
%are overloaded in the {\tt boost::math::differentiation} namespace. For example, instead of calling {\tt std::cos(x)}
%from within {\tt f}, include the line {\tt using std::cos;} and call {\tt cos(x)} without a namespace prefix.
%
%Calling $f$ and retrieving the calculated value and derivatives:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%    autodiff_fvar<T,M> y = f(x);
%    for (int n=0 ; n<=M ; ++n)
%        std::cout << "y.derivative("<<n<<") == " << y.derivative(n) << std::endl;
%\end{verbatim}
%{\tt y.derivative(0)} returns the undifferentiated value $f(x_0)$, and {\tt y.derivative(n)} returns $f^{(n)}(x_0)$.
%Casting {\tt y} to type {\tt T} also gives the undifferentiated value. In other words, the following 3 values
%are equal:
%
%\begin{enumerate}
%\item {\tt f(x0)}
%\item {\tt y.derivative(0)}
%\item {\tt static\_cast<T>(y)}
%\end{enumerate}
%
%\subsection{Multiple Variables}
%
%Independent variables are represented in autodiff as independent dimensions within a multi-dimensional array.
%This is perhaps best illustrated with examples. The {\tt namespace boost::math::differentiation} is assumed.
%
%The following instantiates a variable of $x=13$ with up to 3 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,3> x = make_fvar<double,3>(13);
%\end{verbatim}
%This instantiates {\bf an independent} value of $y=14$ with up to 4 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,0,4> y = make_fvar<double,0,4>(14);
%\end{verbatim}
%Combining them together {\bf promotes} their data type automatically to the smallest multidimensional array that
%accommodates both.
%
%\begin{verbatim}
%    // z is promoted to autodiff_fvar<double,3,4>
%    auto z = 10*x*x + 50*x*y + 100*y*y;
%\end{verbatim}
%The object {\tt z} holds a 2-dimensional array, thus {\tt derivative(...)} is a 2-parameter method:
%
%\[
%{\tt z.derivative(i,j)} = \frac{\partial^{i+j}f}{\partial x^i\partial y^j}(13,14)
%    \qquad \text{for}\; (i,j)\in\{0,1,2,3\}\times\{0,1,2,3,4\}.
%\]
%A few values of the result can be confirmed through inspection:
%
%\begin{verbatim}
%    z.derivative(2,0) == 20
%    z.derivative(1,1) == 50
%    z.derivative(0,2) == 200
%\end{verbatim}
%Note how the position of the parameters in {\tt derivative(...)} match how {\tt x} and {\tt y} were declared.
%This will be clarified next.
%
%\subsubsection{Two Rules of Variable Initialization}
%
%In general, there are two rules to keep in mind when dealing with multiple variables:
%
%\begin{enumerate}
%\item Independent variables correspond to parameter position, in both the initialization {\tt make\_fvar<T,...>}
%    and calls to {\tt derivative(...)}.
%\item The last template position in {\tt make\_fvar<T,...>} determines which variable a derivative will be
%    taken with respect to.
%\end{enumerate}
%Both rules are illustrated with an example in which there are 3 independent variables $x,y,z$ and 1 dependent
%variable $w=f(x,y,z)$, though the following code readily generalizes to any number of independent variables, limited
%only by the C++ compiler/memory/platform. The maximum derivative order of each variable is {\tt Nx}, {\tt Ny}, and
%{\tt Nz}, respectively. Then the type for {\tt w} is {\tt boost::math::differentiation::autodiff\_fvar<T,Nx,Ny,Nz>}
%and all possible mixed partial derivatives are available via
%
%\[
%{\tt w.derivative(nx,ny,nz)} =
%    \frac{\partial^{n_x+n_y+n_z}f}{\partial x^{n_x}\partial y^{n_y}\partial z^{n_z} }(x_0,y_0,z_0)
%\]
%for $(n_x,n_y,n_z)\in\{0,1,2,...,N_x\}\times\{0,1,2,...,N_y\}\times\{0,1,2,...,N_z\}$ where $x_0, y_0, z_0$ are
%the numerical values at which the function $f$ and its derivatives are evaluated.
%
%In code:
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    using var = autodiff_fvar<double,Nx,Ny,Nz>; // Nx, Ny, Nz are constexpr size_t.
%
%    var x = make_fvar<double,Nx>(x0);        // x0 is of type double
%    var y = make_fvar<double,Nx,Ny>(y0);     // y0 is of type double
%    var z = make_fvar<double,Nx,Ny,Nz>(z0);  // z0 is of type double
%
%    var w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                          << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%Note how {\tt x}, {\tt y}, and {\tt z} are initialized: the last template parameter determines which variable
%$x, y,$ or $z$ a derivative is taken with respect to. In terms of the $\varepsilon$-polynomials
%above, this determines whether to add $\varepsilon_x, \varepsilon_y,$ or $\varepsilon_z$ to
%$x_0, y_0,$ or $z_0$, respectively.
%
%In contrast, the following initialization of {\tt x} would be INCORRECT:
%
%\begin{verbatim}
%    var x = make_fvar<T,Nx,0>(x0); // WRONG
%\end{verbatim}
%Mathematically, this represents $x_0+\varepsilon_y$, since the last template parameter corresponds to the
%$y$ variable, and thus the resulting value will be invalid.
%
%\subsubsection{Type Promotion}
%
%The previous example can be optimized to save some unnecessary computation, by declaring smaller arrays,
%and relying on autodiff's automatic type-promotion:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    autodiff_fvar<double,Nx> x = make_fvar<double,Nx>(x0);
%    autodiff_fvar<double,0,Ny> y = make_fvar<double,0,Ny>(y0);
%    autodiff_fvar<double,0,0,Nz> z = make_fvar<double,0,0,Nz>(z0);
%
%    autodiff_fvar<double,Nx,Ny,Nz> w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                          << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%For example, if one of the first steps in the computation of $f$ was {\tt z*z}, then a significantly less number of
%multiplications and additions may occur if {\tt z} is declared as {\tt autodiff\_fvar<double,0,0,Nz>} as opposed to \\
%{\tt autodiff\_fvar<double,Nx,Ny,Nz>}. There is no loss of precision with the former, since the extra dimensions
%represent 0 values. Once {\tt z} is combined with {\tt x} and {\tt y} during the computation, the types will be
%promoted as necessary. This is the recommended way to initialize variables in autodiff.
\section{Writing Functions for Autodiff Compatibility}\label{compatibility}
In this section, a general procedure is given for writing new, and transforming existing, C++ mathematical
functions for compatibility with autodiff.
There are 3 categories of functions that require different strategies:
\begin{enumerate}
\item Piecewise-rational functions. These are simply piecewise quotients of polynomials. All that is needed is to
turn the function parameters and return value into generic (template) types. This will then allow the function
to accept and return autodiff's {\tt fvar} types, thereby using autodiff's overloaded arithmetic operators
which calculate the derivatives automatically.
\item Functions that call existing autodiff functions. This is the same as the previous category, but may also
include calls to functions that are in the autodiff library. Examples: {\tt exp()}, {\tt log()}, {\tt tgamma()}, etc.
\item New functions for which the derivatives can be calculated. This is the most general technique, as it
allows for the development of functions which do not fall into the previous two categories.
\end{enumerate}
Functions written in any of these ways may then be added to the autodiff library.
\subsection{Piecewise-Rational Functions}
\[
f(x) = \frac{1}{1+x^2}
\]
By simply writing this as a template function, autodiff can calculate derivatives for it:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iomanip>
#include <iostream>

template <typename T>
T rational(T const& x) {
  return 1 / (1 + x * x);
}

int main() {
  using namespace boost::math::differentiation;
  auto const x = make_fvar<double, 10>(0);
  auto const y = rational(x);
  std::cout << std::setprecision(std::numeric_limits<double>::digits10)
            << "y.derivative(10) = " << y.derivative(10) << std::endl;
  return 0;
}
/*
Output:
y.derivative(10) = -3628800
*/
\end{Verbatim}
As simple as $f(x)$ may seem, its derivatives grow increasingly complex as higher orders are taken.
For example, the \nth{10} derivative has the form
\[
f^{(10)}(x) = -3628800\frac{1 - 55x^2 + 330x^4 - 462x^6 + 165x^8 - 11x^{10}}{(1 + x^2)^{11}}.
\]
At $x=0$ this evaluates to $-3628800=-10!$, in agreement with the program output above.
Derivatives of $f(x)$ are useful, and in fact used, in calculating higher order derivatives of $\arctan(x)$,
for instance, since
\[
\arctan^{(n)}(x) = \left(\frac{d}{dx}\right)^{n-1} \frac{1}{1+x^2}\qquad\text{for}\quad 1\le n.
\]
\subsection{Functions That Call Existing Autodiff Functions}
Many of the standard library math functions are overloaded in autodiff. It is recommended to use
\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL) in order for functions to
be written in a way that is general enough to accommodate standard types ({\tt double}) as well as autodiff types
({\tt fvar}).
\\
Example:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  return 0.5 * std::erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
Though {\tt Phi(x)} is general enough to handle the various fundamental floating point types, this will
not work if {\tt x} is an autodiff {\tt fvar} variable, since {\tt std::erfc} does not include a specialization
for {\tt fvar}. The recommended solution is to remove the namespace prefix {\tt std::} from {\tt erfc}:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  using std::erfc;
  return 0.5 * erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
In this form, when {\tt x} is of type {\tt fvar}, the C++ compiler will search for and find a function {\tt erfc}
within the same namespace as {\tt fvar}, which is in the autodiff library, via ADL. Because of the using-declaration,
it will also call {\tt std::erfc} when {\tt x} is a fundamental type such as {\tt double}.
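As a quick check, here is a sketch of the corrected {\tt Phi} called with an {\tt fvar} argument; the expected
values follow from $\Phi'=\varphi$, where $\varphi(x)=e^{-x^2/2}/\sqrt{2\pi}$ is the standard normal density:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>
#include <iostream>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  using std::erfc;
  return 0.5 * erfc(-one_div_root_two<T>() * x);
}

int main() {
  using namespace boost::math::differentiation;
  auto const x = make_fvar<double, 1>(1.0);
  auto const y = Phi(x);
  std::cout << y.derivative(0) << '\n'   // Phi(1) ~= 0.841345
            << y.derivative(1) << '\n';  // phi(1) ~= 0.241971
  return 0;
}
\end{Verbatim}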
\subsection{New Functions For Which The Derivatives Can Be Calculated}\label{new_functions}
Mathematical functions which do not fall into the previous two categories can be constructed using autodiff helper
functions. This requires a separate function for calculating the derivatives. In case you are asking yourself what
good an autodiff library is if one needs to supply the derivatives, the answer is that the new function will fit
in with the rest of the autodiff library, thereby allowing for the creation of additional functions via all of
the arithmetic operators, plus function composition, which is not readily available without the library.
The example given here is for {\tt cos}:
\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> cos(fvar<RealType, Order> const& cr) {
  using std::cos;
  using std::sin;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = cos(static_cast<root_type>(cr));
  if constexpr (order == 0)
    return fvar<RealType, Order>(d0);
  else {
    root_type const d1 = -sin(static_cast<root_type>(cr));
    root_type const derivatives[4]{d0, d1, -d0, -d1};
    return cr.apply_derivatives(order,
                                [&derivatives](size_t i) { return derivatives[i & 3]; });
  }
}
\end{Verbatim}
This uses the helper function {\tt fvar::apply\_derivatives}, which takes two parameters:
\begin{enumerate}
\item The highest order derivative to be calculated.
\item A function that maps derivative order to derivative value.
\end{enumerate}
The highest order derivative necessary to be calculated is generally equal to {\tt fvar::order\_sum}. In the case
of {\tt sin} and {\tt cos}, the derivatives are cyclical with period 4. Thus it is sufficient to store only these
4 values into an array, and take the derivative order modulo 4 as the index into this array.
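A quick usage check of the overload above, at $x=0$ where the derivatives of $\cos$ cycle through $1, 0, -1, 0$:
\begin{Verbatim}[xleftmargin=2em]
auto const x = make_fvar<double, 4>(0);
auto const y = cos(x);
// y.derivative(0), ..., y.derivative(4) == 1, 0, -1, 0, 1
\end{Verbatim}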
A second helper function, not shown here, is {\tt apply\_coefficients}. This is used in the same way as
{\tt apply\_derivatives}, except that the supplied function calculates coefficients instead of derivatives.
The relationship between a coefficient $C_n$ and derivative $D_n$ for derivative order $n$ is
\[
C_n = \frac{D_n}{n!}.
\]
Internally, {\tt fvar} holds coefficients rather than derivatives, so in case the coefficient values are more readily
available than the derivatives, it can save some unnecessary computation to use {\tt apply\_coefficients}.
See the definition of {\tt atan} for an example.
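For illustration only, here is a hypothetical {\tt my\_exp} written with {\tt apply\_coefficients}, assuming it
takes the same (order, function) parameters as {\tt apply\_derivatives} above. Since every derivative of $e^x$ at
$x_0$ equals $e^{x_0}$, the order-$i$ coefficient is simply $e^{x_0}/i!$:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/special_functions/factorials.hpp>

// A sketch, not the library's own exp implementation.
template <typename RealType, size_t Order>
fvar<RealType, Order> my_exp(fvar<RealType, Order> const& cr) {
  using std::exp;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = exp(static_cast<root_type>(cr));  // e^{x0} == every derivative D_i
  return cr.apply_coefficients(order, [&d0](size_t i) {  // C_i = D_i / i!
    return d0 / boost::math::factorial<root_type>(static_cast<unsigned>(i));
  });
}
\end{Verbatim}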
Both of these helper functions use Horner's method when calculating the resulting polynomial {\tt fvar}. This works
well when the derivatives are finite, but in cases where derivatives are infinite, this can quickly result in NaN
values as the computation progresses. In these cases, one can call non-Horner versions of both functions, which
better ``isolate'' infinite values so that they are less likely to evolve into NaN values.
The four helper functions available for constructing new autodiff functions from known coefficients/derivatives are:
\begin{enumerate}
\item {\tt fvar::apply\_coefficients}
\item {\tt fvar::apply\_coefficients\_nonhorner}
\item {\tt fvar::apply\_derivatives}
\item {\tt fvar::apply\_derivatives\_nonhorner}
\end{enumerate}
\section{Function Writing Guidelines}
At a high level there is one fairly simple principle, loosely and intuitively speaking, for writing functions for
which autodiff can effectively calculate derivatives: \\
{\bf Autodiff Function Principle (AFP)}
\begin{displayquote}
A function whose branches in logic correspond to piecewise analytic calculations over non-singleton intervals,
with smooth transitions between the intervals, and is free of indeterminate forms in the calculated value and
higher order derivatives, will work fine with autodiff.
\end{displayquote}
This could be stated with greater mathematical rigor, but what seems more practical in this case is to give
examples and categories of examples of what works, what doesn't, and how to remedy some of the common problems
that may be encountered. That is the approach taken here.
\subsection{Example 1: $f(x)=\max(0,x)$}
One potential implementation of $f(x)=\max(0,x)$ is:
\begin{verbatim}
template <typename T>
T f(const T& x)
{
  return 0 < x ? x : 0;
}
\end{verbatim}
Though this is consistent with Section~\ref{compatibility}, there are two problems with it:
\begin{enumerate}
\item {\tt f(nan) = 0}. This problem is independent of autodiff, but is worth addressing anyway. If an
indeterminate form arises within a calculation and is input into $f$, then it gets ``covered up'' by
this implementation, leading to an unknowingly incorrect result. It is better for functions in general to
propagate NaN values, so that the user knows something went wrong and doesn't rely on an incorrect result,
and likewise the developer can track down where the NaN originated and remedy it.
\item $f'(0) = 0$ when autodiff is applied. This is because {\tt f} returns 0 as a constant when {\tt x==0}, wiping
out any of the derivatives (or sensitivities) that {\tt x} was holding as an autodiff variable. Instead, let us
apply the AFP and identify the two intervals over which $f$ is defined: $(-\infty,0]\cup(0,\infty)$.
Though the function itself is not analytic at $x=0$, we can attempt somewhat to smooth out this transition
point by averaging the calculation of $f(x)$ at $x=0$ from each interval. If $x<0$ then the result is simply
0, and if $0<x$ then the result is $x$. The average is $\frac{1}{2}(0 + x)$, which will allow autodiff to
calculate $f'(0)=\frac{1}{2}$. This is a more reasonable answer.
\end{enumerate}
A better implementation that resolves both issues is:
\begin{verbatim}
template <typename T>
T f(const T& x)
{
  if (x < 0)
    return 0;
  else if (x == 0)
    return 0.5 * x;
  else
    return x;
}
\end{verbatim}
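With this implementation, the behavior at the boundary point can be verified directly:
\begin{Verbatim}[xleftmargin=2em]
auto const y = f(make_fvar<double, 1>(0));
// y.derivative(0) == 0
// y.derivative(1) == 0.5
\end{Verbatim}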
\subsection{Example 2: $f(x)=\sinc(x)$}
The definition of $\sinc:\mathbb{R}\rightarrow\mathbb{R}$ is
\[
\sinc(x) = \begin{cases}
1 &\text{if}\; x = 0 \\
\frac{\sin(x)}{x} &\text{otherwise.}\end{cases}
\]
A potential implementation is:
\begin{verbatim}
template <typename T>
T sinc(const T& x)
{
  using std::sin;
  return x == 0 ? 1 : sin(x) / x;
}
\end{verbatim}
Though this is again consistent with Section~\ref{compatibility}, and returns correct non-derivative values,
it returns a constant when {\tt x==0}, thereby losing all derivative information contained in {\tt x} and
contributions from $\sinc$. For example, $\sinc''(0)=-\frac{1}{3}$, however {\tt y.derivative(2) == 0} when
{\tt y = sinc(make\_fvar<double,2>(0))} using the above incorrect implementation. Applying the AFP, the intervals
upon which separate branches of logic are applied are $(-\infty,0)\cup[0,0]\cup(0,\infty)$. The violation occurs
due to the singleton interval $[0,0]$, even though a constant function of 1 is technically analytic. The remedy
is to define a custom $\sinc$ overload and add it to the autodiff library. This has been done. Mathematically, it
is well-defined and free of indeterminate forms, as is the \nth{3} expression in the equalities
\[
\frac{1}{x}\sin(x) = \frac{1}{x}\sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n+1}
= \sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n}.
\]
The last series also confirms the value quoted above: the coefficient of $x^2$ is $-\frac{1}{3!}$, so
$\sinc''(0)=2!\cdot\left(-\frac{1}{3!}\right)=-\frac{1}{3}$.
The autodiff library contains helper functions to help write function overloads when the derivatives of a function
are known. This is an advanced feature, and documentation for it may be added at a later time.
For now, it is worth understanding the ways in which indeterminate forms can occur within a mathematical calculation,
and to avoid them when possible by rewriting the function. Table~\ref{3nans} compares 3 types of indeterminate
forms. Assume the product {\tt a*b} is a positive finite value.
\begin{table}[h]
\centering\begin{tabular}{m{7em}||c|c|c}
& $\displaystyle f(x)=\left(\frac{a}{x}\right)\times(bx^2)$
& $\displaystyle g(x)=\left(\frac{a}{x}\right)\times(bx)$
& $\displaystyle h(x)=\left(\frac{a}{x^2}\right)\times(bx)$ \\[0.618em]
\hline\hline
Mathematical\newline Limit
& $\displaystyle\lim_{x\rightarrow0}f(x) = 0$
& $\displaystyle\lim_{x\rightarrow0}g(x) = ab$
& $\displaystyle\lim_{x\rightarrow0}h(x) = \infty$ \\
\hline
Floating Point\newline Arithmetic
& {\tt f(0) = inf*0 = nan} & {\tt g(0) = inf*0 = nan} & {\tt h(0) = inf*0 = nan}
\end{tabular}
\caption{Automatic differentiation does not compute limits.
Indeterminate forms must be simplified manually. (These cases are not meant to be exhaustive.)}\label{3nans}
\end{table}
Indeterminate forms result in NaN values within a calculation. Mathematically, if they occur at locally isolated
points, then we generally prefer the mathematical limit as the result, even if it is infinite. As demonstrated in
Table~\ref{3nans}, depending upon the nature of the indeterminate form, the mathematical limit can be 0 (no matter
the values of $a$ or $b$), or $ab$, or $\infty$, but these 3 cases cannot be distinguished by the floating point
result of {\tt nan}. Floating point arithmetic does not perform limits (directly), and neither does the autodiff
library. Thus it is up to the diligence of the developer to keep a watchful eye over where indeterminate forms
can arise.
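As a small demonstration of this diligence, here is a sketch using the $f$ column of Table~\ref{3nans} with
hypothetical values $a=2$ and $b=3$; simplifying the expression by hand before coding it removes the
indeterminate form and recovers the mathematical limit:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

static constexpr double a = 2, b = 3;  // a*b is positive and finite, as assumed above.

template <typename T>
T f_naive(T const& x) {
  return (a / x) * (b * x * x);  // inf * 0 == nan at x == 0
}

template <typename T>
T f_simplified(T const& x) {
  return a * b * x;  // same function, indeterminate form removed by hand
}

int main() {
  using namespace boost::math::differentiation;
  auto const x = make_fvar<double, 1>(0);
  std::cout << "naive      f(0)  = " << f_naive(x).derivative(0) << '\n'        // nan
            << "simplified f(0)  = " << f_simplified(x).derivative(0) << '\n'   // 0
            << "simplified f'(0) = " << f_simplified(x).derivative(1) << '\n';  // a*b == 6
  return 0;
}
\end{Verbatim}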
\subsection{Example 3: $f(x)=\sqrt x$ and $f'(0)=\infty$}
Working with functions whose higher order derivatives are infinite can very quickly result in {\tt nan} values in
higher order derivatives as the computation progresses, since {\tt inf-inf}, {\tt inf/inf}, and {\tt 0*inf} each
result in {\tt nan}. See Table~\ref{sqrtnan} for an example.
\begin{table}[h]
\centering\begin{tabular}{c||c|c|c|c}
$f(x)$ & $f(0)$ & $f'(0)$ & $f''(0)$ & $f'''(0)$ \\
\hline\hline
{\tt sqrt(x)} & {\tt 0} & {\tt inf} & {\tt -inf} & {\tt inf} \\
\hline
{\tt sqr(sqrt(x)+1)} & {\tt 1} & {\tt inf} & {\tt nan} & {\tt nan} \\
\hline
{\tt x+2*sqrt(x)+1} & {\tt 1} & {\tt inf} & {\tt -inf} & {\tt inf}
\end{tabular}
\caption{Indeterminate forms in higher order derivatives. {\tt sqr(x) == x*x}.}\label{sqrtnan}
\end{table}
Calling the autodiff-overloaded implementation of $f(x)=\sqrt x$ at the value {\tt x==0} results in the
\nth{1} row (after the header row) of Table~\ref{sqrtnan}, as is mathematically correct. The \nth{2} row shows
$f(x)=(\sqrt{x}+1)^2$ resulting in {\tt nan} values for $f''(0)$ and all higher order derivatives. This is due to
the internal arithmetic in which {\tt inf} is added to {\tt -inf} during the squaring, resulting in a {\tt nan}
value for $f''(0)$ and all higher orders. This is typical of {\tt inf} values in autodiff. Where they show up,
they are correct, however they can quickly introduce {\tt nan} values into the computation upon the addition of
oppositely signed {\tt inf} values, division by {\tt inf}, or multiplication by {\tt 0}. It is worth noting that
the infection of {\tt nan} only spreads upward in the order of derivatives, since lower orders do not depend upon
higher orders (which is also why dropping higher order terms in an autodiff computation does not result in any
loss of precision for lower order terms).
The resolution in this case is to manually perform the squaring in the computation, replacing the \nth{2} row
with the \nth{3}: $f(x)=x+2\sqrt{x}+1$. Though mathematically equivalent, it allows autodiff to avoid {\tt nan}
values since $\sqrt x$ is more ``isolated'' in the computation. That is, the {\tt inf} values that unavoidably
show up in the derivatives of {\tt sqrt(x)} for {\tt x==0} do not have the chance to interact with other {\tt inf}
values as they do in the squaring.
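The difference is easy to observe directly; this sketch should reproduce the last two rows of
Table~\ref{sqrtnan} up to the \nth{3} derivative:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

template <typename T>
T sqr(T const& x) {
  return x * x;
}

int main() {
  using namespace boost::math::differentiation;
  auto const x = make_fvar<double, 3>(0);
  auto const bad = sqr(sqrt(x) + 1);      // inf + (-inf) occurs during the squaring.
  auto const good = x + 2 * sqrt(x) + 1;  // sqrt(x) is kept "isolated".
  for (unsigned i = 0; i <= 3; ++i)
    std::cout << "bad.derivative(" << i << ")  = " << bad.derivative(i)
              << ", good.derivative(" << i << ") = " << good.derivative(i) << '\n';
  return 0;
}
\end{Verbatim}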
\subsection{Summary}
The AFP gives a high-level unified guiding principle for writing C++ template functions for which autodiff can
effectively calculate derivatives.
Examples have been given to illustrate some common pitfalls to avoid:
\begin{enumerate}
\item It is not enough for functions to be piecewise continuous. On boundary points between intervals, consider
returning the average expression of both intervals, rather than just one of them. Example: $\max(0,x)$ at $x=0$.
In cases where the limits from both sides must match, and they do not, then {\tt nan} may be a more appropriate
value, depending on the application.
\item Avoid returning individual constant values (e.g. $\sinc(0)=1$). Values must be computed uniformly along
with other values in their local interval. If that is not possible, then the function must be overloaded to
compute the derivatives precisely using the helper functions from Section~\ref{new_functions}.
\item Avoid intermediate indeterminate values in both the value ($\sinc(x)$ at $x=0$) and derivatives
($(\sqrt{x}+1)^2$ at $x=0$). Try to isolate expressions that may contain infinite values/derivatives so
that they do not introduce NaN values into the computation.
\end{enumerate}
\section{Acknowledgments}
\begin{itemize}
\item Kedar Bhat --- C++11 compatibility, Boost Special Functions compatibility testing, codecov integration,
and feedback.
\item Nick Thompson --- Initial feedback and help with Boost integration.
\item John Maddock --- Initial feedback and help with Boost integration.
\end{itemize}
\end{document}