まずは環境。
( ՞ਊ ՞) :~/dev/crystal $ crystal --version Crystal 0.19.4 [7f82f79] (2016-10-07) ( ՞ਊ ՞) :~/dev/crystal $ cat /etc/issue Ubuntu 16.10 \n \l ( ՞ਊ ՞) :~/dev/crystal $ cat /proc/version Linux version 4.8.0-26-generic (buildd@lgw01-58) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #28-Ubuntu SMP Tue Oct 18 14:39:52 UTC 2016
こういう雑なコードを書く。
arr = [] of Int32 (0..1e7).each do |i| arr << i end
-d つきでビルドしてDWARF吐くように。
( ՞ਊ ՞) :~/dev/crystal $ crystal build -d arr_push.cr ( ՞ਊ ՞) :~/dev/crystal $ crystal build --release -d arr_push.cr # リリースビルド
time (bash組み込み)
まあ一番雑に
( ՞ਊ ՞) :~/dev/crystal $ time ./arr_push real 0m0.198s user 0m0.212s sys 0m0.020s
こちらは releaseビルドの結果。
( ՞ਊ ՞) :~/dev/crystal $ time ./arr_push real 0m0.105s user 0m0.096s sys 0m0.040s
/usr/bin/time
もう少し細かく。見方は man time
見ればなんとなくわかるはず。
( ՞ਊ ՞) :~/dev/crystal $ /usr/bin/time -v ./arr_push Command being timed: "./arr_push" User time (seconds): 0.20 System time (seconds): 0.02 Percent of CPU this job got: 115% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.19 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 108504 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 17448 Voluntary context switches: 192 Involuntary context switches: 9 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0
perf
Linuxならなんだかんだこれが楽ですよね。
( ՞ਊ ՞) :~/dev/crystal $ perf record ./arr_push [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.053 MB perf.data (936 samples) ] ( ՞ਊ ՞) :~/dev/crystal $ perf report | head -20 [kernel.kallsyms] with build id 500e786445f3c205298601de61824e582ab15bef not found, continuing without symbols # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 17 # # Samples: 936 of event 'cycles:ppp' # Event count (approx.): 559710139 # # Overhead Command Shared Object Symbol # ........ ........ .................. .................................................................... # 16.59% arr_push arr_push [.] GC_mark_from 15.30% arr_push arr_push [.] *Array(Int32)@Array(T)#push<Int32>:Array(Int32) 12.96% arr_push arr_push [.] *Array(Int32)@Array(T)#check_needs_resize:(Pointer(Int32) | Nil) 12.06% arr_push arr_push [.] __crystal_main 7.65% arr_push arr_push [.] *Pointer(Int32)@Pointer(T)#[]=<Int32, Int32>:Int32 7.10% arr_push arr_push [.] *Array(Int32)@Array(T)#<<<Int32>:Array(Int32) 6.67% arr_push arr_push [.] *Pointer(Int32)@Pointer(T)#+<Int32>:Pointer(Int32) 4.70% arr_push [unknown] [k] 0x00007ff8b3229f0a 2.75% arr_push arr_push [.] *Int32@Int#succ:Int32
GC_mark_from
が一番時間かかっていることがわかる。
releaseビルドも同様。
( ՞ਊ ՞) :~/dev/crystal $ perf report | head -20 [kernel.kallsyms] with build id 500e786445f3c205298601de61824e582ab15bef not found, continuing without symbols # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 8 # # Samples: 504 of event 'cycles:ppp' # Event count (approx.): 279905518 # # Overhead Command Shared Object Symbol # ........ ........ ................. ................................................... # 32.09% arr_push arr_push [.] GC_mark_from 23.87% arr_push arr_push [.] *Array(Int32)@Array(T)#push<Int32>:Array(Int32) 9.43% arr_push [unknown] [k] 0x00007f8aa3ab5f0a 7.99% arr_push arr_push [.] main 4.57% arr_push libc-2.24.so [.] __memmove_avx_unaligned_erms 3.10% arr_push [kernel.kallsyms] [k] 0xffffffff9ea3c907 1.44% arr_push [kernel.kallsyms] [.] 0xffffffff9ee9fa57 0.81% arr_push libc-2.24.so [.] __memset_avx2_erms 0.61% arr_push [kernel.kallsyms] [k] 0xffffffff9e7e0114
check_needs_resize
とかなくなってるけど、inline展開されたんだろうか。
( ՞ਊ ՞) :~/dev/crystal $ readelf -s ./arr_push| grep check_needs_resize 419: 0000000000021400 197 FUNC GLOBAL DEFAULT 15 *Array(Pointer(Void))@Array(T)#check_needs_resize:(Pointer(Pointer(Void)) | Nil) 683: 000000000002c2f0 197 FUNC GLOBAL DEFAULT 15 *Array(String)@Array(T)#check_needs_resize:(Pointer(String) | Nil) 847: 0000000000032bf0 197 FUNC GLOBAL DEFAULT 15 *Array(Proc(Int32, Nil))@Array(T)#check_needs_resize:(Pointer(Proc(Int32, Nil)) | Nil) 1972: 0000000000034580 197 FUNC GLOBAL DEFAULT 15 *Array(Int32)@Array(T)#check_needs_resize:(Pointer(Int32) | Nil)
いや、関数本体は残るのだっけ。objdump -drS
でみてみよう。
0000000000021310 <*Array(Pointer(Void))@Array(T)#push<Pointer(Void)>:Array(Pointer(Void))>: # # a = ["a", "b"] of (Int32 | String) # a.push("c") # => ["a", "b", "c"] # a.push(1) # => ["a", "b", "c", 1] # ``` def push(value : T) 21310: 41 57 push %r15 21312: 41 56 push %r14 21314: 41 54 push %r12 21316: 53 push %rbx 21317: 50 push %rax 21318: 49 89 f6 mov %rsi,%r14 2131b: 48 89 fb mov %rdi,%rbx zip?(other) { |x, y| pairs << {x, y} } pairs end private def check_needs_resize double_capacity if @size == @capacity 2131e: 8b 43 04 mov 0x4(%rbx),%eax 21321: 3b 43 08 cmp 0x8(%rbx),%eax 21324: 75 2d jne 21353 <*Array(Pointer(Void))@Array(T)#push<Pointer(Void)>:Array(Pointer(Void))+0x43> end private def double_capacity resize_to_capacity(@capacity == 0 ? 3 : (@capacity * 2)) 21326: 8d 0c 00 lea (%rax,%rax,1),%ecx 21329: 85 c0 test %eax,%eax 2132b: 41 bf 03 00 00 00 mov $0x3,%r15d 21331: 44 0f 45 f9 cmovne %ecx,%r15d end
おお、なんかinline化されてる雰囲気だ。LLVMさまさままだな。
- 追記
valgrind(callgrind) を失念していた。。
Rustのcargo profilerとかの裏側はこれ。
( ՞ਊ ՞) :~/dev/crystal $ valgrind --tool=callgrind ./arr_push ==30942== Callgrind, a call-graph generating cache profiler ==30942== Copyright (C) 2002-2015, and GNU GPL'd, by Josef Weidendorfer et al. ==30942== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info ==30942== Command: ./arr_push ==30942== ==30942== For interactive control, run 'callgrind_control -h'. ==30942== brk segment overflow in thread #1: can't grow to 0x4bea000 ==30942== (see section Limitations in user manual) ==30942== ==30942== Events : Ir ==30942== Collected : 1026463326 ==30942== ==30942== I refs: 1,026,463,326
出力された*.outみたいなファイルをkcachegrindに食わせるとGUIないい感じのインターフェースが出てくる。
$ kcachegrind callgrind.out.30942