Performance issue with ReturnedBuffer
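Branch codepoint-properties at commit #f203c6e2 has a modified xch that writes its output to a ReturnedBuffer() rather than stdout.
However, this appears to be a performance bottleneck, with excessive synchronization time required in p2s: in the run below, p2s_8 accounts for 74.1% of all cycles, with 65.2% of its time attributed to SYNC and 32.6% to EXPD, and only 2.1% to EXEC.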
cameron@cs-osl-08:~/parabix-devel/build12$ sudo perf stat bin/xch -prop=slc ~/Wikibooks/wiki-books-all.xml -EnableCycleCounter >wislc1
CYCLE COUNTER:
# NAME ITEMS CYCLES RATE SYNC PART EXPD COPY PIPE EXEC %
1 mmap_source16384@8 649901238 9971611 0.0 41.6 0.0 0.0 0.0 0.0 27.3 0.1 +- 0.0
2 s2p8 649901238 546614405 0.8 0.3 0.0 0.0 7.5 0.0 90.9 3.4 +- 0.0
3 cc_7f8e068d851d690f52bf8d67c46df1bb0ffdeee1+CMCompressed 649901238 87582778 0.1 2.7 0.0 0.1 8.4 0.0 84.3 0.6 +- 0.0
4 adjust_bixnum1x1+CMCompressed 649901238 183158518 0.3 1.1 0.1 0.0 0.0 0.0 96.6 1.2 +- 0.0
5 unitInsertionExtractionMasks1x1_Before 649901238 28898199 0.0 6.0 0.0 0.0 0.0 0.0 85.8 0.2 +- 0.0
6 PopCountP8192 1299802476 20029050 0.0 3.0 0.0 0.0 0.0 0.0 85.9 0.1 +- 0.0
7 FilterByMask64__select_<i1>[1]@0:0_:_select_<i1>[1]@0:0_ 1299802476 239533649 0.2 1.9 0.0 0.0 3.7 0.0 92.8 1.5 +- 0.0
8 PopCountP2048 649901242 18436352 0.0 5.0 0.3 0.0 0.1 0.0 73.8 0.1 +- 0.0
9 streamExpand4:64_8:8 649901242 173845768 0.3 0.8 0.0 0.0 0.5 0.0 96.2 1.1 +- 0.0
10 FieldDeposit64_8 649901242 302462530 0.5 2.1 0.0 0.0 18.5 0.0 77.8 1.9 +- 0.0
11 cc_8c7de60ab46889c81d5f889be46cb35d44eb1bb0+CMCompressed 649901242 153222112 0.2 3.2 0.0 0.0 6.4 0.0 88.0 1.0 +- 0.0
12 cc_d7a768472fc15029ed17c5abad56ff65b4dc4693+CMCompressed 649901242 651987912 1.0 14.2 0.0 0.0 16.6 0.0 68.4 4.1 +- 0.0
13 UTF8_BytePosition+CMCompressed 649901242 201231964 0.3 0.6 0.0 0.0 4.7 0.0 91.9 1.3 +- 0.0
14 u8_delmask2x1+CMCompressed 649901242 189572156 0.3 1.6 0.0 0.0 2.7 0.0 93.6 1.2 +- 0.0
15 PopCountP2048 649901242 18526422 0.0 5.9 0.0 0.0 2.1 0.0 76.9 0.1 +- 0.0
16 UTF8_Target_Class3x1+ins+del+CMCompressed 649901242 177515785 0.3 0.3 0.0 0.0 0.9 0.0 95.9 1.1 +- 0.0
17 u8_transform_bits_16x1+ins+del+CMCompressed 649901242 572807532 0.9 3.3 0.0 0.0 0.0 0.0 95.7 3.6 +- 0.0
18 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_ 649901242 231152280 0.4 0.4 0.0 0.0 0.0 0.0 97.7 1.5 +- 0.0
19 streamCompress64_8 649901242 294243971 0.5 0.2 0.0 0.0 3.6 0.0 94.0 1.8 +- 0.0
20 p2s_8 649889520 11796413425 18.2 65.2 0.0 32.6 0.0 0.0 2.1 74.1 +- 0.1
TOTAL: 49.3 0.0 24.2 1.6 0.0 24.2 99.9
xlated buffer length: 649889520
Performance counter stats for 'bin/xch -prop=slc /home/cameron/Wikibooks/wiki-books-all.xml -EnableCycleCounter':
5,661.78 msec task-clock # 2.816 CPUs utilized
67 context-switches # 11.834 /sec
0 cpu-migrations # 0.000 /sec
487,093 page-faults # 86.032 K/sec
15,706,883,693 cycles # 2.774 GHz
20,438,537,075 instructions # 1.30 insn per cycle
4,525,549,271 branches # 799.315 M/sec
2,807,389 branch-misses # 0.06% of all branches
2.010657317 seconds time elapsed
4.601109000 seconds user
1.060255000 seconds sys
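To compare kernels across runs, the cycle counter rows can be ranked by any column. The following is a minimal sketch, assuming each kernel row has the whitespace-separated layout shown in the report above (index, name, items, cycles, eight numeric columns, `+-`, deviation). The helpers `parse_row` and `rank_by` are hypothetical and not part of Parabix.

```python
# Numeric columns that follow ITEMS and CYCLES in the report header.
COLUMNS = ["RATE", "SYNC", "PART", "EXPD", "COPY", "PIPE", "EXEC", "%"]


def parse_row(line):
    """Parse one kernel row of the cycle counter report, or return None."""
    tokens = line.split()
    # Assumed layout: index, name, items, cycles, eight numbers, "+-", deviation.
    if len(tokens) != 14 or tokens[12] != "+-":
        return None
    row = {
        "index": int(tokens[0]),
        "name": tokens[1],
        "items": int(tokens[2]),
        "cycles": int(tokens[3]),
    }
    row.update(zip(COLUMNS, (float(t) for t in tokens[4:12])))
    return row


def rank_by(rows, column):
    """Return the rows sorted by the given column, largest value first."""
    return sorted(rows, key=lambda r: r[column], reverse=True)


# Three rows copied from the report above.
SAMPLE = [
    "1 mmap_source16384@8 649901238 9971611 0.0 41.6 0.0 0.0 0.0 0.0 27.3 0.1 +- 0.0",
    "12 cc_d7a768472fc15029ed17c5abad56ff65b4dc4693+CMCompressed 649901242 651987912 1.0 14.2 0.0 0.0 16.6 0.0 68.4 4.1 +- 0.0",
    "20 p2s_8 649889520 11796413425 18.2 65.2 0.0 32.6 0.0 0.0 2.1 74.1 +- 0.1",
]

if __name__ == "__main__":
    rows = [r for r in map(parse_row, SAMPLE) if r is not None]
    for r in rank_by(rows, "SYNC"):
        print(f'{r["name"]:<60} SYNC={r["SYNC"]:5.1f} share={r["%"]:5.1f}')
```

Ranking by `SYNC` puts p2s_8 first among these sample rows. Feeding the full report through `parse_row` skips the header and `TOTAL` lines, since they do not match the assumed 14-token layout.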