[/===========================================================================
Copyright (c) 2013-2015 Kyle Lutz <kyle.r.lutz@gmail.com>

Distributed under the Boost Software License, Version 1.0
See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt
=============================================================================/]

[section Advanced Topics]

The following topics show advanced features of the Boost Compute library.

[section Vector Data Types]

In addition to the built-in scalar types (e.g. `int` and `float`), OpenCL
also provides vector data types (e.g. `int2` and `float4`). These can be
used with the Boost Compute library on both the host and the device.

Boost.Compute provides typedefs for these types, which take the form
`boost::compute::scalarN_`, where `scalar` is a scalar data type (e.g. `int`,
`float`, `char`) and `N` is the size of the vector. Supported vector sizes
are 2, 4, 8, and 16.

The following example shows how to transfer a set of 3D points, stored as an
array of `float`s on the host, to the device and then calculate the sum of the
point coordinates using the [funcref boost::compute::accumulate accumulate()]
function. The sum is transferred back to the host, and the centroid is computed
by dividing it by the total number of points.

Note that even though the points are in 3D, they are stored as `float4` due to
OpenCL's alignment requirements: a 3-component vector such as `float3` occupies
the same 16 bytes as a `float4`, so the fourth component simply acts as padding.

[import ../example/point_centroid.cpp]
[point_centroid_example]
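
A host-only reference computation can be handy for checking the device result. The following is a minimal sketch that does not use Boost.Compute at all: `Float4` is a hypothetical stand-in for `boost::compute::float4_`, laid out as four `float`s in one 16-byte slot. It sums the points component-wise, as `accumulate()` does on the device, and then divides by the number of points.

```cpp
// Host-only reference for the point-centroid computation: sums 3D points
// stored as padded 4-float structs and divides by the number of points.
// Float4 is a hypothetical stand-in for boost::compute::float4_.
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct alignas(16) Float4 {
    float x, y, z, w; // w is padding for 3D points (kept at 0)
};

// one 16-byte slot per point, like a 4-component OpenCL vector
static_assert(sizeof(Float4) == 16, "Float4 should occupy 16 bytes");

// component-wise addition, the operation accumulate() applies
Float4 add(const Float4 &a, const Float4 &b)
{
    return Float4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// component-wise sum divided by the number of points
Float4 centroid(const std::vector<Float4> &points)
{
    assert(!points.empty());
    Float4 sum = std::accumulate(points.begin(), points.end(),
                                 Float4{0.0f, 0.0f, 0.0f, 0.0f}, add);
    const float n = static_cast<float>(points.size());
    return Float4{sum.x / n, sum.y / n, sum.z / n, sum.w / n};
}
```

Keeping `w` at zero means the padding component stays zero through both the sum and the division, so only `x`, `y` and `z` carry meaning.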

[endsect] [/ vector data types]

[section Custom Functions]

The OpenCL runtime and the Boost Compute library provide a number of built-in
functions such as `sqrt()` and `dot()`, but these are often not sufficient for
solving the problem at hand.

The Boost Compute library provides a few different ways to create custom
functions that can be passed to the provided algorithms such as
[funcref boost::compute::transform transform()] and
[funcref boost::compute::reduce reduce()].

The most basic method is to provide the raw source code for a function:

``
// create a function object from OpenCL C source; the name argument must
// match the name of the function defined in the source string
boost::compute::function<int (int)> add_four =
    boost::compute::make_function_from_source<int (int)>(
        "add_four",
        "int add_four(int x) { return x + 4; }"
    );

// apply add_four to each element of input, writing the results to output
boost::compute::transform(input.begin(), input.end(), output.begin(), add_four, queue);
``

This can also be done more succinctly using the [macroref BOOST_COMPUTE_FUNCTION
BOOST_COMPUTE_FUNCTION()] macro:

``
BOOST_COMPUTE_FUNCTION(int, add_four, (int x),
{
    return x + 4;
});

boost::compute::transform(input.begin(), input.end(), output.begin(), add_four, queue);
``

Also see [@http://kylelutz.blogspot.com/2014/03/custom-opencl-functions-in-c-with.html
"Custom OpenCL functions in C++ with Boost.Compute"] for more details.
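
Because `make_function_from_source()` takes plain strings, the source can also be assembled at run time, for example to bake a parameter into the function body. A minimal sketch, assuming a hypothetical `make_add_n_source()` helper and `FunctionSource` struct (neither is part of Boost.Compute); it only builds the name and source strings:

```cpp
// Hypothetical helper that builds OpenCL C source for an "add N" function at
// run time, suitable for passing to make_function_from_source<int (int)>().
#include <string>

struct FunctionSource {
    std::string name;   // must match the function name inside `source`
    std::string source; // OpenCL C code defining the function
};

FunctionSource make_add_n_source(int n)
{
    // encode n in the name (negative values as "m<abs>"), e.g. add_4, add_m3;
    // the cast to long long avoids overflow when negating INT_MIN
    std::string suffix = n < 0
        ? "m" + std::to_string(-static_cast<long long>(n))
        : std::to_string(n);
    std::string name = "add_" + suffix;
    std::string source =
        "int " + name + "(int x) { return x + " + std::to_string(n) + "; }";
    return FunctionSource{name, source};
}
```

With Boost.Compute available, the pair would then be passed on as `make_function_from_source<int (int)>(s.name, s.source)`. Encoding `n` in the name is a precaution so that two generated variants never define the same symbol if they end up in one program.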

[endsect] [/ custom functions]

[section Custom Types]

Boost.Compute provides the [macroref BOOST_COMPUTE_ADAPT_STRUCT
BOOST_COMPUTE_ADAPT_STRUCT()] macro, which allows a C++ struct or class to be
wrapped and used in OpenCL code.
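
Since the bytes of an adapted struct are shared between the host compiler and the device compiler, it is worth checking its host layout before adapting it. A minimal sketch, assuming a hypothetical `Particle` struct and assuming the adapted type should be trivially copyable with no padding between members (host and device compilers may pad differently). The macro call itself is left as a comment because it needs Boost.Compute:

```cpp
// Host-side layout checks for a struct intended for BOOST_COMPUTE_ADAPT_STRUCT.
// Particle is a hypothetical example type.
#include <cstddef>
#include <type_traits>

struct Particle {
    float x;
    float y;
    int id;
};

// its bytes are copied to the device as-is, so it must be trivially copyable
static_assert(std::is_trivially_copyable<Particle>::value,
              "Particle must be trivially copyable");
// standard layout makes offsetof well-defined for the checks below
static_assert(std::is_standard_layout<Particle>::value,
              "Particle must be standard-layout");

// true when there are no padding bytes between or after the members
constexpr bool particle_is_packed()
{
    return sizeof(Particle) == sizeof(float) + sizeof(float) + sizeof(int)
        && offsetof(Particle, y) == sizeof(float)
        && offsetof(Particle, id) == 2 * sizeof(float);
}

// With Boost.Compute available, the struct would then be adapted at global
// scope, e.g. (not compiled here):
// BOOST_COMPUTE_ADAPT_STRUCT(Particle, Particle, (x, y, id))
```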

[endsect] [/ custom types]

[section Complex Values]

While OpenCL itself doesn't natively support complex data types, the Boost
Compute library provides them.

To use complex values, first include the following header:

``
#include <boost/compute/types/complex.hpp>
``

A vector of complex values can be created like so:

``
// create vector on device
boost::compute::vector<std::complex<float> > vector;

// insert two complex values
vector.push_back(std::complex<float>(1.0f, 3.0f));
vector.push_back(std::complex<float>(2.0f, 4.0f));
``
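
On the host side, `std::complex<float>` has a layout the C++ standard pins down: a value can be accessed as an array of two `float`s, real part first. The sketch below checks this with a hypothetical `interleave()` helper. That Boost.Compute represents `std::complex<float>` on the device as a two-component `float2` is an assumption here, not something this section states.

```cpp
// Host-side check that std::complex<float> is laid out as two floats
// (real part, then imaginary part).
#include <complex>
#include <vector>

// Flatten complex values into interleaved (real, imag) floats by reading
// each element through the array view the C++ standard guarantees.
std::vector<float> interleave(const std::vector<std::complex<float>> &values)
{
    std::vector<float> out;
    out.reserve(values.size() * 2);
    for (const std::complex<float> &z : values) {
        const float *parts = reinterpret_cast<const float *>(&z);
        out.push_back(parts[0]); // real part
        out.push_back(parts[1]); // imaginary part
    }
    return out;
}

// no extra padding: each value is exactly two floats
static_assert(sizeof(std::complex<float>) == 2 * sizeof(float),
              "std::complex<float> should be exactly two floats");
```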

[endsect] [/ complex values]

[section Lambda Expressions]

The lambda expression framework allows functions and predicates to be
defined at the call site of an algorithm.

Lambda expressions use the placeholders `_1` and `_2` to indicate the
arguments. The following declarations bring the lambda placeholders into
the current scope:

``
using boost::compute::lambda::_1;
using boost::compute::lambda::_2;
``

The following examples show how to use lambda expressions along with the
Boost.Compute algorithms to perform more complex operations on the device.

To count the number of odd values in a vector:

``
boost::compute::count_if(vector.begin(), vector.end(), _1 % 2 == 1, queue);
``

To multiply each value in a vector by three and subtract four:

``
boost::compute::transform(vector.begin(), vector.end(), vector.begin(), _1 * 3 - 4, queue);
``

Lambda expressions can also be used to create `function<>` objects:

``
boost::compute::function<int(int)> add_four = _1 + 4;
``
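
A host-side reference with standard C++ algorithms is useful for checking lambda results, and it exposes one subtlety of the odd-value predicate. Signed `%` truncates toward zero in C++11 and in C99, on which OpenCL C is based, so `x % 2` is `-1`, not `1`, for negative odd `x`. The sketch below assumes `int` elements; the helper names are hypothetical, and `x % 2 != 0` (which a device predicate could spell as `_1 % 2 != 0`) is offered as a suggestion, not taken from the library documentation.

```cpp
// Host-side reference for the lambda examples, using standard algorithms.
#include <algorithm>
#include <cstddef>
#include <vector>

// mirrors _1 % 2 == 1: counts only the positive odd values
std::size_t count_odd_mod_eq_1(const std::vector<int> &v)
{
    return static_cast<std::size_t>(
        std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 1; }));
}

// variant that also counts negative odd values
std::size_t count_odd(const std::vector<int> &v)
{
    return static_cast<std::size_t>(
        std::count_if(v.begin(), v.end(), [](int x) { return x % 2 != 0; }));
}

// mirrors _1 * 3 - 4 applied in place
void times_three_minus_four(std::vector<int> &v)
{
    std::transform(v.begin(), v.end(), v.begin(),
                   [](int x) { return x * 3 - 4; });
}
```

For a vector holding `{-3, -1, 0, 1, 2, 3}`, the first predicate counts 2 values and the second counts 4.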

[endsect] [/ lambda expressions]

[section Asynchronous Operations]

A major performance bottleneck in GPGPU applications is memory transfer. This
can be alleviated by overlapping memory transfer with computation. The Boost
Compute library provides the [funcref boost::compute::copy_async copy_async()]
function, which performs an asynchronous memory transfer between the host and
the device.

For example, to initiate a copy from the host to the device and then perform
other actions:

``
// data on the host
std::vector<float> host_vector = ...

// create a vector on the device
boost::compute::vector<float> device_vector(host_vector.size(), context);

// copy data to the device asynchronously
boost::compute::future<void> f = boost::compute::copy_async(
    host_vector.begin(), host_vector.end(), device_vector.begin(), queue
);

// perform other work on the host or device
// (host_vector must stay alive and unmodified until the copy completes)
// ...

// ensure the copy is completed
f.wait();

// use data on the device (e.g. sort)
boost::compute::sort(device_vector.begin(), device_vector.end(), queue);
``
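
[endsect] [/ asynchronous operations]

[section Performance Timing]

The execution time of operations on the device can be measured with OpenCL's
event profiling, which requires a command queue created with profiling enabled.
For example, to measure the time to copy a vector of data from the host to the
device:

[import ../example/time_copy.cpp]
[time_copy_example]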
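
A measured copy time is often more informative as a transfer rate. OpenCL reports `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` as `cl_ulong` device timestamps in nanoseconds, which can be read through the profiling accessors of Boost.Compute's `event` class. The sketch below assumes the two raw timestamps have already been read; `std::uint64_t` stands in for `cl_ulong`, and both helper names are hypothetical.

```cpp
// Turn OpenCL profiling timestamps (nanoseconds) into elapsed time and
// bandwidth. std::uint64_t stands in for cl_ulong.
#include <cassert>
#include <cstdint>

// elapsed time in milliseconds between two nanosecond timestamps
double elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}

// effective transfer rate in GB/s (1 GB = 1e9 bytes); one byte per
// nanosecond equals one gigabyte per second, so no extra scaling is needed
double bandwidth_gb_per_s(std::uint64_t bytes,
                          std::uint64_t start_ns, std::uint64_t end_ns)
{
    assert(end_ns > start_ns);
    return static_cast<double>(bytes) / static_cast<double>(end_ns - start_ns);
}
```

The byte count would be the size of the copied data, e.g. `host_vector.size() * sizeof(int)` for a vector of `int`.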

[endsect] [/ performance timing]

[section OpenCL API Interoperability]

The Boost Compute library is designed to easily interoperate with the OpenCL
API. All of the wrapped classes have conversion operators to their underlying
OpenCL types, which allows them to be passed directly to the OpenCL functions.

For example, to query the number of devices in a context:

``
// create context object
boost::compute::context ctx = boost::compute::system::default_context();

// query number of devices using the OpenCL API
cl_uint num_devices;
clGetContextInfo(ctx, CL_CONTEXT_NUM_DEVICES, sizeof(cl_uint), &num_devices, 0);
std::cout << "num_devices: " << num_devices << std::endl;
``
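
The conversion-operator technique behind this can be sketched without OpenCL. In the minimal example below, `raw_object`, `raw_handle`, `c_api_get_id()` and `Wrapper` are all hypothetical stand-ins for an OpenCL object, its handle type, an OpenCL API call and a Boost.Compute class.

```cpp
// Minimal sketch of a wrapper class with an implicit conversion operator to
// its raw handle type, so it can be passed straight to a C-style API.
struct raw_object { int id; };
typedef raw_object *raw_handle; // opaque pointer type, like cl_context

// C-style function that only understands raw handles
int c_api_get_id(raw_handle h) { return h->id; }

class Wrapper {
public:
    explicit Wrapper(int id) : m_handle(new raw_object{id}) {}
    ~Wrapper() { delete m_handle; }

    // copying is disabled to keep ownership simple in this sketch
    Wrapper(const Wrapper &) = delete;
    Wrapper &operator=(const Wrapper &) = delete;

    // implicit conversion lets a Wrapper be passed where a raw_handle is expected
    operator raw_handle() const { return m_handle; }

private:
    raw_handle m_handle;
};
```

Boost.Compute's own classes are copyable and manage OpenCL reference counts instead; copies are deleted here only to keep the sketch short.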

[endsect] [/ opencl api interoperability]

[endsect] [/ advanced topics]