[section Object Code]

Let's look at some assembly.  All assembly here was produced with Clang 4.0
with `-O3`.  Given these definitions:

[arithmetic_perf_decls]

Here is a _yap_-based arithmetic function:

[arithmetic_perf_eval_as_yap_expr]

and the assembly it produces:

    arithmetic_perf[0x100001c00] <+0>:  pushq  %rbp
    arithmetic_perf[0x100001c01] <+1>:  movq   %rsp, %rbp
    arithmetic_perf[0x100001c04] <+4>:  mulsd  %xmm1, %xmm0
    arithmetic_perf[0x100001c08] <+8>:  addsd  %xmm2, %xmm0
    arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001c10] <+16>: mulsd  %xmm1, %xmm1
    arithmetic_perf[0x100001c14] <+20>: addsd  %xmm0, %xmm1
    arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001c1c] <+28>: popq   %rbp
    arithmetic_perf[0x100001c1d] <+29>: retq

And for the equivalent function using builtin expressions:

[arithmetic_perf_eval_as_cpp_expr]

the assembly is:

    arithmetic_perf[0x100001e10] <+0>:  pushq  %rbp
    arithmetic_perf[0x100001e11] <+1>:  movq   %rsp, %rbp
    arithmetic_perf[0x100001e14] <+4>:  mulsd  %xmm1, %xmm0
    arithmetic_perf[0x100001e18] <+8>:  addsd  %xmm2, %xmm0
    arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001e20] <+16>: mulsd  %xmm1, %xmm1
    arithmetic_perf[0x100001e24] <+20>: addsd  %xmm0, %xmm1
    arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001e2c] <+28>: popq   %rbp
    arithmetic_perf[0x100001e2d] <+29>: retq

If we increase the number of terminals by a factor of four:

[arithmetic_perf_eval_as_yap_expr_4x]

the results are the same: in this simple case, the _yap_ and builtin
expressions result in the same object code.

However, when the number of terminals is increased by an additional factor of
2.5 (for a total of 90 terminals), the inliner can no longer do as well for
_yap_ expressions as for builtin ones.
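
To make the comparison above concrete without pulling in the library, here is a minimal hand-rolled expression-template sketch. It is not _yap_ code: the `sketch` namespace and the `terminal`, `plus_expr`, and `times_expr` types are hypothetical stand-ins, and the expression `(a * x + y) * (a * x + y) + (a * x + y)` is inferred from the assembly shown earlier rather than taken from the snippets themselves. The point is only that every node's `eval()` is small and visible to the inliner, which is why an optimizer can plausibly flatten the tree into the same arithmetic as the builtin spelling.

```cpp
#include <cassert>

namespace sketch {

    // Leaf node: holds one value.
    struct terminal
    {
        double value;
        double eval() const { return value; }
    };

    // Interior nodes store their operands by value and evaluate recursively.
    template <typename L, typename R>
    struct plus_expr
    {
        L lhs;
        R rhs;
        double eval() const { return lhs.eval() + rhs.eval(); }
    };

    template <typename L, typename R>
    struct times_expr
    {
        L lhs;
        R rhs;
        double eval() const { return lhs.eval() * rhs.eval(); }
    };

    // Found through argument-dependent lookup when an operand is a sketch type.
    template <typename L, typename R>
    plus_expr<L, R> operator+(L lhs, R rhs) { return {lhs, rhs}; }

    template <typename L, typename R>
    times_expr<L, R> operator*(L lhs, R rhs) { return {lhs, rhs}; }

}

// Builds a nine-terminal expression tree, then evaluates it.
double eval_as_expr_template(double a_, double x_, double y_)
{
    sketch::terminal a{a_};
    sketch::terminal x{x_};
    sketch::terminal y{y_};
    auto expr = (a * x + y) * (a * x + y) + (a * x + y);
    return expr.eval();
}

// The same arithmetic written directly with builtin doubles.
double eval_as_builtin_expr(double a, double x, double y)
{
    return (a * x + y) * (a * x + y) + (a * x + y);
}

// Sanity check on inputs whose result is exactly representable.
void check_agreement()
{
    assert(eval_as_expr_template(2.0, 3.0, 4.0) == eval_as_builtin_expr(2.0, 3.0, 4.0));
}
```

Whether a given compiler emits identical object code for the two functions depends on its inlining budget, which is the effect described above for larger expressions.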

More complex nonarithmetic code produces more mixed results.  For example, here
is a function using code from the Map Assign example:

    std::map<std::string, int> make_map_with_boost_yap ()
    {
        return map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

By contrast, here is the Boost.Assign version of the same function:

    std::map<std::string, int> make_map_with_boost_assign ()
    {
        return boost::assign::map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }
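
The chained-call syntax used in the two functions above may look unusual, so here is a rough sketch of one way such an interface can work. This is neither the _yap_ Map Assign implementation nor the Boost.Assign one: `map_builder` and `make_map_with_sketch_builder` are hypothetical names introduced only for illustration. Each call to `operator()` records a pair and returns the builder, and a conversion operator produces the map at the end of the chain.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical builder for illustration; not the Boost.YAP or Boost.Assign code.
class map_builder
{
public:
    // Record one key/value pair, then return *this so another call can follow.
    map_builder & operator()(std::string key, int value)
    {
        entries_.emplace(std::move(key), value);
        return *this;
    }

    // Convert the accumulated entries to the map the caller asked for.
    operator std::map<std::string, int>() const { return entries_; }

private:
    std::map<std::string, int> entries_;
};

std::map<std::string, int> make_map_with_sketch_builder()
{
    return map_builder()
        ("<", 1)
        ("<=", 2)
        (">", 3)
        (">=", 4)
        ("=", 5)
        ("<>", 6)
        ;
}

// Sanity check: all six entries arrive with their values intact.
void check_sketch_builder()
{
    std::map<std::string, int> const m = make_map_with_sketch_builder();
    assert(m.size() == 6);
    assert(m.at(">=") == 4);
}
```

A builder like this pays for one extra copy of the map in the conversion operator, which is the kind of overhead the runtime comparison in this section is meant to expose.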

Here is how you might do it "manually":

    std::map<std::string, int> make_map_manually ()
    {
        std::map<std::string, int> retval;
        retval.emplace("<", 1);
        retval.emplace("<=",2);
        retval.emplace(">", 3);
        retval.emplace(">=",4);
        retval.emplace("=", 5);
        retval.emplace("<>",6);
        return retval;
    }

Finally, here is the same map created from an initializer list:

    std::map<std::string, int> make_map_inializer_list ()
    {
        std::map<std::string, int> retval = {
            {"<", 1},
            {"<=",2},
            {">", 3},
            {">=",4},
            {"=", 5},
            {"<>",6}
        };
        return retval;
    }

All of these produce roughly the same number of assembly instructions.
Benchmarking these four functions with Google Benchmark yields these results:

[table Runtimes of Different Map Constructions
    [[Function] [Time (ns)]]
    [[make_map_with_boost_yap()] [1285]]
    [[make_map_with_boost_assign()] [1459]]
    [[make_map_manually()] [985]]
    [[make_map_inializer_list()] [954]]
]

The _yap_-based implementation finishes in the middle of the pack: faster than
the Boost.Assign version, but slower than the manual and initializer-list
versions.
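
For readers who want to try a comparison of this kind without installing Google Benchmark, the following is a crude stand-in built on `std::chrono`. It is a sketch under stated assumptions, not the harness that produced the table: `mean_ns_per_call` is a hypothetical helper, only the two functions that need nothing beyond the standard library are reproduced, and a simple loop like this lacks the warm-up and statistical handling a real benchmarking library provides, so its numbers should not be expected to match the ones above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

std::map<std::string, int> make_map_manually()
{
    std::map<std::string, int> retval;
    retval.emplace("<", 1);
    retval.emplace("<=", 2);
    retval.emplace(">", 3);
    retval.emplace(">=", 4);
    retval.emplace("=", 5);
    retval.emplace("<>", 6);
    return retval;
}

std::map<std::string, int> make_map_inializer_list()
{
    std::map<std::string, int> retval = {
        {"<", 1}, {"<=", 2}, {">", 3}, {">=", 4}, {"=", 5}, {"<>", 6}
    };
    return retval;
}

// Hypothetical helper: mean wall-clock nanoseconds per call of make_map.
template <typename F>
double mean_ns_per_call(F make_map, int iterations)
{
    assert(iterations > 0);
    std::size_t total_size = 0;
    auto const start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        total_size += make_map().size();
    auto const stop = std::chrono::steady_clock::now();

    // Use the accumulated result so the optimizer is discouraged from
    // discarding the loop body.
    volatile std::size_t sink = total_size;
    (void)sink;

    std::chrono::duration<double, std::nano> const elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling, for example, `mean_ns_per_call(make_map_manually, 100000)` and `mean_ns_per_call(make_map_inializer_list, 100000)` in an optimized build gives a rough per-call figure for each variant.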

In general, the expression trees produced by _yap_ get evaluated down to
something close to the hand-written equivalent.  There is an abstraction
penalty, but it is small for reasonably sized expressions.

[endsect]