- [section Object Code]
- Let's look at some assembly. All assembly here was produced with Clang 4.0
- with `-O3`. Given these definitions:
- [arithmetic_perf_decls]
- Here is a _yap_-based arithmetic function:
- [arithmetic_perf_eval_as_yap_expr]
- and the assembly it produces:
-     arithmetic_perf[0x100001c00] <+0>: pushq %rbp
-     arithmetic_perf[0x100001c01] <+1>: movq %rsp, %rbp
-     arithmetic_perf[0x100001c04] <+4>: mulsd %xmm1, %xmm0
-     arithmetic_perf[0x100001c08] <+8>: addsd %xmm2, %xmm0
-     arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
-     arithmetic_perf[0x100001c10] <+16>: mulsd %xmm1, %xmm1
-     arithmetic_perf[0x100001c14] <+20>: addsd %xmm0, %xmm1
-     arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
-     arithmetic_perf[0x100001c1c] <+28>: popq %rbp
-     arithmetic_perf[0x100001c1d] <+29>: retq
- And for the equivalent function using builtin expressions:
- [arithmetic_perf_eval_as_cpp_expr]
- the assembly is:
-     arithmetic_perf[0x100001e10] <+0>: pushq %rbp
-     arithmetic_perf[0x100001e11] <+1>: movq %rsp, %rbp
-     arithmetic_perf[0x100001e14] <+4>: mulsd %xmm1, %xmm0
-     arithmetic_perf[0x100001e18] <+8>: addsd %xmm2, %xmm0
-     arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
-     arithmetic_perf[0x100001e20] <+16>: mulsd %xmm1, %xmm1
-     arithmetic_perf[0x100001e24] <+20>: addsd %xmm0, %xmm1
-     arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
-     arithmetic_perf[0x100001e2c] <+28>: popq %rbp
-     arithmetic_perf[0x100001e2d] <+29>: retq
- If we increase the number of terminals by a factor of four:
- [arithmetic_perf_eval_as_yap_expr_4x]
- the results are the same: in this simple case, the _yap_ and builtin
- expressions result in the same object code.
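To see why an optimizer is able to collapse an expression tree like this, it can help to look at a stripped-down expression template. The following is only a sketch: `terminal`, `plus_expr`, `times_expr`, and `is_expr` are hypothetical hand-rolled types, not _yap_'s, and the expression itself is an assumption read off the assembly above (`t = a * x + y`, then `t * t + t`). Because the whole tree is encoded in the type of `expr`, every `eval()` call is a candidate for inlining.

```cpp
#include <type_traits>

// Hypothetical, hand-rolled expression-template nodes (NOT Boost.YAP's types).
template <typename T>
struct terminal
{
    T value;
    constexpr T eval() const { return value; }
};

template <typename L, typename R>
struct plus_expr
{
    L lhs;
    R rhs;
    constexpr auto eval() const { return lhs.eval() + rhs.eval(); }
};

template <typename L, typename R>
struct times_expr
{
    L lhs;
    R rhs;
    constexpr auto eval() const { return lhs.eval() * rhs.eval(); }
};

// Trait used to restrict the operators below to the node types above.
template <typename T>
struct is_expr : std::false_type {};
template <typename T>
struct is_expr<terminal<T>> : std::true_type {};
template <typename L, typename R>
struct is_expr<plus_expr<L, R>> : std::true_type {};
template <typename L, typename R>
struct is_expr<times_expr<L, R>> : std::true_type {};

// The operators only build tree nodes; no arithmetic happens until eval().
template <
    typename L,
    typename R,
    typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
constexpr plus_expr<L, R> operator+(L lhs, R rhs)
{
    return {lhs, rhs};
}

template <
    typename L,
    typename R,
    typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
constexpr times_expr<L, R> operator*(L lhs, R rhs)
{
    return {lhs, rhs};
}

// Expression-template version (expression assumed from the assembly above).
double eval_as_sketch_expr(double a, double x, double y)
{
    terminal<double> A{a}, X{x}, Y{y};
    auto expr = (A * X + Y) * (A * X + Y) + (A * X + Y);
    return expr.eval();
}

// Builtin-expression version of the same computation.
double eval_as_cpp_expr(double a, double x, double y)
{
    return (a * x + y) * (a * x + y) + (a * x + y);
}
```

For example, `eval_as_sketch_expr(2.0, 3.0, 4.0)` and `eval_as_cpp_expr(2.0, 3.0, 4.0)` both return `110.0`. Whether a given compiler emits identical instructions for the two is something to check by inspecting the assembly, as above.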
- However, when the number of terminals is increased by a further factor of 2.5
- (for a total of 90 terminals), the inliner can no longer do as well for _yap_
- expressions as for builtin ones.
- More complex nonarithmetic code produces more mixed results. For example, here
- is a function using code from the Map Assign example:
-     std::map<std::string, int> make_map_with_boost_yap ()
-     {
-         return map_list_of
-             ("<", 1)
-             ("<=",2)
-             (">", 3)
-             (">=",4)
-             ("=", 5)
-             ("<>",6)
-             ;
-     }
- By contrast, here is the Boost.Assign version of the same function:
-     std::map<std::string, int> make_map_with_boost_assign ()
-     {
-         return boost::assign::map_list_of
-             ("<", 1)
-             ("<=",2)
-             (">", 3)
-             (">=",4)
-             ("=", 5)
-             ("<>",6)
-             ;
-     }
- Here is how you might do it "manually":
-     std::map<std::string, int> make_map_manually ()
-     {
-         std::map<std::string, int> retval;
-         retval.emplace("<", 1);
-         retval.emplace("<=",2);
-         retval.emplace(">", 3);
-         retval.emplace(">=",4);
-         retval.emplace("=", 5);
-         retval.emplace("<>",6);
-         return retval;
-     }
- Finally, here is the same map created from an initializer list:
-     std::map<std::string, int> make_map_inializer_list ()
-     {
-         std::map<std::string, int> retval = {
-             {"<", 1},
-             {"<=",2},
-             {">", 3},
-             {">=",4},
-             {"=", 5},
-             {"<>",6}
-         };
-         return retval;
-     }
- All of these produce roughly the same number of assembly instructions.
- Benchmarking these four functions with Google Benchmark yields these results:
- [table Runtimes of Different Map Constructions
- [[Function] [Time (ns)]]
- [[make_map_with_boost_yap()] [1285]]
- [[make_map_with_boost_assign()] [1459]]
- [[make_map_manually()] [985]]
- [[make_map_inializer_list()] [954]]
- ]
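- The _yap_-based implementation finishes in the middle of the pack: it is
- faster than the Boost.Assign version, but roughly 30-35% slower than the
- manual and initializer-list versions.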
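The timings above come from Google Benchmark. As a rough, standard-library-only stand-in, the sketch below times two of the four functions with `std::chrono`. `average_ns` is a hypothetical helper, not part of any library, and a hand-rolled loop like this is far less careful than a real benchmark harness, so treat any numbers it produces as indicative only.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// The two standard-library-only constructions from the text above.
std::map<std::string, int> make_map_manually()
{
    std::map<std::string, int> retval;
    retval.emplace("<", 1);
    retval.emplace("<=", 2);
    retval.emplace(">", 3);
    retval.emplace(">=", 4);
    retval.emplace("=", 5);
    retval.emplace("<>", 6);
    return retval;
}

std::map<std::string, int> make_map_inializer_list()
{
    std::map<std::string, int> retval = {
        {"<", 1}, {"<=", 2}, {">", 3}, {">=", 4}, {"=", 5}, {"<>", 6}};
    return retval;
}

// Hypothetical helper: average wall-clock nanoseconds per call of f.
template <typename F>
double average_ns(F f, int iterations)
{
    std::size_t sink = 0; // accumulate results so the calls have a visible effect
    auto const start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink += f().size();
    auto const stop = std::chrono::steady_clock::now();
    volatile std::size_t keep = sink; // discourage the optimizer from dropping the loop
    (void)keep;
    return std::chrono::duration<double, std::nano>(stop - start).count() /
           iterations;
}
```

A call such as `average_ns(make_map_manually, 100000)` returns the mean time per call in nanoseconds; comparing it against `average_ns(make_map_inializer_list, 100000)` gives a crude version of the last two rows of the table.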
- In general, the expression trees produced by _yap_ get evaluated down to
- something close to the hand-written equivalent. There is an abstraction
- penalty, but it is small for reasonably sized expressions.
- [endsect]