Performance issues: gb18030
The gb18030 application transcodes files in the GB18030 character encoding into equivalent UTF-8 files. It is an interesting application with a fairly long pipeline: the cycle-counter listings below show 53 kernels.
Performance Study
We use the GB18030 file zhwikibooks-20141225-pages-articles.xml in QA/gb18030/TestFiles for performance studies. This is an 18 MB file that can be converted to UTF-8 format with the following command:
bin/gb18030 ../QA/gb18030/TestFiles/zhwikibooks-20141225-pages-articles.xml > zhw.u8
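As an independent cross-check of the transcoding itself, the same conversion can be sketched with Python's built-in `gb18030` codec. This is not how the gb18030 application works internally; the function name `transcode_gb18030_to_utf8` and the chunk size are assumptions made for illustration only.

```python
def transcode_gb18030_to_utf8(src_path, dst_path, chunk_chars=1 << 16):
    """Stream-decode a GB18030 file and write it back out as UTF-8.

    Hypothetical helper for cross-checking output; returns the number of
    characters (code points) written.
    """
    total = 0
    # newline="" disables newline translation so line endings pass through unchanged.
    with open(src_path, "r", encoding="gb18030", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
            total += len(chunk)
    return total
```

The resulting file could then be compared byte for byte with zhw.u8, for example with `cmp`. The two tools may treat invalid input sequences differently, so a mismatch on malformed data would not by itself indicate a bug.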
Single-Threaded Performance Regression
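Current single-threaded performance is substantially poorer than that of earlier check-ins. In the two cycle-counter runs below, the same conversion takes 0.395 s total in the slower run and 0.193 s in the faster one, roughly a factor of two.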
2f0cd368 Revision
git-cameron@Robs-iMac-Pro build9 % time bin/gb18030 -thread-num=1 -EnableCycleCounter ../QA/gb18030/TestFiles/zhwikibooks-20141225-pages-articles.xml -segment-size=16384 >zh.u8
CYCLE COUNTER:
# NAME ITEMS CYCLES RATE SYNC PART EXPD COPY PIPE EXEC %
1 mmap_source16384@8 1.82E+07 3.75E+05 0.02 5.9 0.0 0.0 0.0 31.7 62.4 0.0
2 s2p8 1.82E+07 3.00E+07 1.64 0.1 0.0 0.9 0.0 0.9 98.1 2.6
3 GB_18030_Parser+CMCompressed 1.82E+07 7.17E+06 0.39 0.3 0.0 2.3 0.0 2.4 94.9 0.6
4 GB_18030_ExtractionMasks+CMCompressed 1.82E+07 8.84E+06 0.48 0.2 0.0 3.4 0.0 7.9 88.5 0.8
5 ErrorMonitorKernel1 1.82E+07 1.04E+07 0.57 0.2 0.0 2.2 0.0 10.8 86.7 0.9
6 PopCountP256 1.82E+07 1.48E+06 0.08 1.5 0.0 3.6 0.0 18.1 76.9 0.1
7 PopCountP256 1.82E+07 1.32E+06 0.07 2.6 0.0 2.9 0.0 12.2 82.3 0.1
8 PopCountP256 1.82E+07 1.37E+06 0.08 2.3 0.0 2.6 0.0 15.9 79.2 0.1
9 PopCountP256 1.82E+07 1.38E+06 0.08 2.6 0.0 2.3 0.0 15.8 79.4 0.1
10 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_ 1.82E+07 4.04E+06 0.22 0.8 0.0 0.0 0.0 10.9 88.3 0.3
11 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_ 1.82E+07 2.48E+06 0.14 1.1 0.0 0.0 0.0 9.5 89.4 0.2
12 streamCompress64_8 1.82E+07 3.06E+07 1.68 0.1 0.0 23.6 0.0 1.4 74.9 2.6
13 streamCompress64_4 1.82E+07 1.91E+07 1.05 0.1 0.0 17.9 0.0 1.2 80.8 1.6
14 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_ 1.82E+07 2.53E+06 0.14 1.6 0.0 0.0 0.0 9.4 89.0 0.2
15 streamCompress64_4 1.82E+07 1.89E+07 1.03 0.2 0.0 17.8 0.0 1.7 80.4 1.6
16 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_ 1.82E+07 4.00E+06 0.22 0.6 0.0 0.0 0.0 8.9 90.5 0.3
17 streamCompress64_8 1.82E+07 2.97E+07 1.63 0.1 0.0 22.7 0.0 0.9 76.3 2.5
18 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[1]@0:0_ 1.82E+07 1.33E+06 0.07 1.7 0.0 0.0 0.0 17.1 81.2 0.1
19 streamCompress64_1 1.82E+07 1.04E+07 0.57 0.2 0.0 8.5 0.0 2.5 88.8 0.9
20 GB_18030_InitializeASCII+CMCompressed 1.53E+07 4.91E+06 0.32 0.5 0.0 3.1 0.0 6.6 89.9 0.4
21 GB_18030_DoubleByteIndex+CMCompressed 1.53E+07 1.22E+07 0.79 0.2 0.0 0.0 0.0 4.8 95.0 1.0
22 GB_18030_DoubleByteRangeKernel0-2047+CMCompressed 1.53E+07 2.43E+07 1.59 0.1 0.0 0.0 0.0 1.1 98.8 2.1
23 GB_18030_DoubleByteRangeKernel2048-4095+CMCompressed 1.53E+07 2.47E+07 1.61 0.1 0.0 0.0 0.0 0.9 99.0 2.1
24 GB_18030_DoubleByteRangeKernel4096-6143+CMCompressed 1.53E+07 2.06E+07 1.34 0.2 0.0 0.0 0.0 1.3 98.5 1.8
25 GB_18030_DoubleByteRangeKernel6144-8191+CMCompressed 1.53E+07 2.76E+07 1.80 0.1 0.0 0.0 0.0 0.8 99.2 2.4
26 GB_18030_DoubleByteRangeKernel8192-10239+CMCompressed 1.53E+07 1.21E+08 7.87 0.0 0.0 0.0 0.0 0.2 99.7 10.3
27 GB_18030_DoubleByteRangeKernel10240-12287+CMCompressed 1.53E+07 1.61E+08 10.49 0.0 0.0 0.0 0.0 0.2 99.8 13.7
28 GB_18030_DoubleByteRangeKernel12288-14335+CMCompressed 1.53E+07 1.48E+08 9.67 0.0 0.0 0.0 0.0 0.1 99.9 12.6
29 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed 1.53E+07 1.80E+08 11.76 0.0 0.0 0.0 0.0 0.2 99.8 15.3
30 GB_18030_DoubleByteRangeKernel16384-18431+CMCompressed 1.53E+07 3.91E+07 2.55 0.1 0.0 0.0 0.0 0.7 99.2 3.3
31 GB_18030_DoubleByteRangeKernel18432-20479+CMCompressed 1.53E+07 2.03E+07 1.32 0.1 0.0 0.0 0.0 1.3 98.6 1.7
32 GB_18030_DoubleByteRangeKernel20480-22527+CMCompressed 1.53E+07 1.73E+07 1.13 0.2 0.0 0.0 0.0 1.6 98.2 1.5
33 GB_18030_DoubleByteRangeKernel22528-24575+CMCompressed 1.53E+07 1.33E+07 0.87 0.2 0.0 1.1 0.0 2.5 96.2 1.1
34 GB_18030_FourByteLogic+CMCompressed 1.53E+07 2.67E+07 1.74 0.3 0.0 0.2 0.0 3.2 96.3 2.3
35 u8depositMask 1.53E+07 7.14E+06 0.47 0.3 0.0 0.0 0.0 4.1 95.6 0.6
36 PopCountP256 6.13E+07 4.12E+06 0.07 0.5 0.0 5.3 0.0 4.8 89.4 0.4
37 streamCompress64_1 6.13E+07 2.98E+07 0.49 0.1 0.0 0.2 0.0 0.6 99.1 2.5
38 PopCountP256 2.05E+07 1.97E+06 0.10 1.1 0.0 3.6 0.0 32.2 63.2 0.2
39 streamExpand64_21:6 2.05E+07 8.20E+06 0.40 0.3 0.0 0.0 0.0 5.6 94.1 0.7
40 FieldDeposit64_6 2.05E+07 3.58E+06 0.17 0.6 0.0 1.3 0.0 4.8 93.4 0.3
41 UTF8_DepositMasks+CMCompressed 2.05E+07 3.41E+06 0.17 0.7 0.0 1.7 0.0 11.2 86.4 0.3
42 PopCountP256 2.05E+07 1.51E+06 0.07 1.5 0.0 2.7 0.0 13.9 81.8 0.1
43 PopCountP256 2.05E+07 1.52E+06 0.07 1.5 0.0 2.6 0.0 15.3 80.6 0.1
44 PopCountP256 2.05E+07 1.56E+06 0.08 1.4 0.0 2.3 0.0 14.9 81.4 0.1
45 streamExpand64_21:3 2.05E+07 6.35E+06 0.31 0.4 0.0 0.0 0.0 8.2 91.5 0.5
46 FieldDeposit64_3 2.05E+07 2.04E+06 0.10 1.1 0.0 0.0 0.0 10.2 88.7 0.2
47 streamExpand64_21:6 2.05E+07 7.73E+06 0.38 0.3 0.0 0.0 0.0 4.2 95.6 0.7
48 FieldDeposit64_6 2.05E+07 3.30E+06 0.16 0.7 0.0 0.0 0.0 5.5 93.8 0.3
49 streamExpand64_21:6 2.05E+07 7.50E+06 0.37 0.3 0.0 0.0 0.0 3.8 95.9 0.6
50 FieldDeposit64_6 2.05E+07 3.62E+06 0.18 1.3 0.0 0.0 0.0 6.5 92.2 0.3
51 UTF8assembly+CMCompressed 2.05E+07 6.02E+06 0.29 0.7 0.0 0.0 0.0 10.0 89.2 0.5
52 p2s_8 2.05E+07 8.15E+06 0.40 0.3 0.0 4.6 0.0 1.3 93.8 0.7
53 stdout8 2.05E+07 4.05E+07 1.97 0.1 0.0 0.0 0.0 0.6 99.3 3.4
bin/gb18030 -thread-num=1 -EnableCycleCounter -segment-size=16384 > zh.u8 0.36s user 0.03s system 98% cpu 0.395 total
2f0cd368 Revision
git-cameron@Robs-iMac-Pro build9 % time bin/gb18030 -thread-num=1 -EnableCycleCounter ../QA/gb18030/TestFiles/zhwikibooks-20141225-pages-articles.xml -segment-size=16384 >zh.u8
CYCLE COUNTER:
# NAME ITEMS CYCLES RATE SYNC BUFF COPY PIPE EXEC %
1 mmap_source16384@8 1.82E+07 2.77E+05 0.02 0.0 0.0 0.0 47.6 52.4 0.1
2 s2p8 1.82E+07 2.74E+07 1.50 0.0 0.0 0.0 0.9 99.1 5.0
3 GB_18030_Parser+CMCompressed 1.82E+07 6.80E+06 0.37 0.0 0.2 0.3 7.3 92.2 1.2
4 GB_18030_ExtractionMasks+CMCompressed 1.82E+07 4.52E+06 0.25 0.0 0.0 0.0 13.8 86.2 0.8
5 ErrorMonitorKernel1 1.82E+07 4.59E+06 0.25 0.0 0.0 0.0 21.1 78.9 0.8
6 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[1]@0:0_ 1.82E+07 1.02E+06 0.06 0.0 0.0 0.0 39.2 60.8 0.2
7 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_ 1.82E+07 2.97E+06 0.16 0.0 0.0 0.0 11.6 88.4 0.5
8 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_ 1.82E+07 1.92E+06 0.11 0.0 0.0 0.0 17.4 82.6 0.4
9 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_ 1.82E+07 2.88E+06 0.16 0.0 0.0 0.0 10.7 89.3 0.5
10 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_ 1.82E+07 1.63E+06 0.09 0.0 0.0 0.0 18.0 82.0 0.3
11 PopCountP256 1.82E+07 1.10E+06 0.06 0.0 0.0 0.3 14.7 85.0 0.2
12 streamCompress64_1 1.82E+07 6.99E+06 0.38 0.0 0.0 0.1 12.6 87.3 1.3
13 streamCompress64_8 1.82E+07 1.60E+07 0.88 0.0 0.0 0.1 5.6 94.3 2.9
14 GB_18030_InitializeASCII+CMCompressed 1.53E+07 4.14E+06 0.27 0.0 0.0 0.0 10.3 89.7 0.8
15 PopCountP256 1.82E+07 1.20E+06 0.07 0.0 0.0 0.3 19.7 80.1 0.2
16 streamCompress64_4 1.82E+07 9.02E+06 0.49 0.0 0.0 0.1 8.9 91.0 1.6
17 PopCountP256 1.82E+07 1.15E+06 0.06 0.0 0.0 0.3 19.5 80.2 0.2
18 streamCompress64_8 1.82E+07 1.52E+07 0.83 0.0 0.0 0.1 5.9 94.1 2.8
19 GB_18030_DoubleByteIndex+CMCompressed 1.53E+07 6.56E+06 0.43 0.0 0.0 0.0 7.9 92.1 1.2
20 GB_18030_DoubleByteRangeKernel0-2047+CMCompressed 1.53E+07 1.04E+07 0.68 0.0 0.0 0.0 2.5 97.5 1.9
21 GB_18030_DoubleByteRangeKernel2048-4095+CMCompressed 1.53E+07 1.01E+07 0.66 0.0 0.0 0.0 2.3 97.7 1.8
22 GB_18030_DoubleByteRangeKernel4096-6143+CMCompressed 1.53E+07 9.03E+06 0.59 0.0 0.0 0.0 2.4 97.6 1.6
23 GB_18030_DoubleByteRangeKernel6144-8191+CMCompressed 1.53E+07 9.89E+06 0.65 0.0 0.0 0.0 2.4 97.6 1.8
24 GB_18030_DoubleByteRangeKernel8192-10239+CMCompressed 1.53E+07 4.01E+07 2.62 0.0 0.0 0.0 0.6 99.4 7.3
25 GB_18030_DoubleByteRangeKernel10240-12287+CMCompressed 1.53E+07 5.74E+07 3.74 0.0 0.0 0.0 0.4 99.6 10.4
26 GB_18030_DoubleByteRangeKernel12288-14335+CMCompressed 1.53E+07 5.26E+07 3.43 0.0 0.0 0.0 0.5 99.5 9.6
27 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed 1.53E+07 6.31E+07 4.12 0.0 0.0 0.0 0.4 99.6 11.5
28 GB_18030_DoubleByteRangeKernel16384-18431+CMCompressed 1.53E+07 1.45E+07 0.95 0.0 0.0 0.0 1.8 98.2 2.6
29 GB_18030_DoubleByteRangeKernel18432-20479+CMCompressed 1.53E+07 9.21E+06 0.60 0.0 0.0 0.0 3.0 97.0 1.7
30 GB_18030_DoubleByteRangeKernel20480-22527+CMCompressed 1.53E+07 8.22E+06 0.54 0.0 0.0 0.0 2.9 97.1 1.5
31 GB_18030_DoubleByteRangeKernel22528-24575+CMCompressed 1.53E+07 7.25E+06 0.47 0.0 0.0 0.0 4.6 95.4 1.3
32 PopCountP256 1.82E+07 1.28E+06 0.07 0.0 0.0 0.2 20.1 79.7 0.2
33 streamCompress64_4 1.82E+07 9.42E+06 0.52 0.0 0.0 0.2 11.5 88.3 1.7
34 GB_18030_FourByteLogic+CMCompressed 1.53E+07 1.34E+07 0.87 0.0 0.0 1.4 8.6 90.0 2.4
35 u8depositMask 1.53E+07 6.46E+06 0.42 0.0 0.0 0.0 4.9 95.1 1.2
36 PopCountP256 6.13E+07 2.96E+06 0.05 0.0 0.0 0.2 9.0 90.8 0.5
37 streamCompress64_1 6.13E+07 1.91E+07 0.31 0.0 0.0 0.0 3.5 96.4 3.5
38 UTF8_DepositMasks+CMCompressed 2.05E+07 2.37E+06 0.12 0.0 0.0 0.0 21.3 78.7 0.4
39 PopCountP256 2.05E+07 1.28E+06 0.06 0.0 0.0 0.2 25.3 74.5 0.2
40 PopCountP256 2.05E+07 1.33E+06 0.06 0.0 0.0 0.2 27.5 72.3 0.2
41 PopCountP256 2.05E+07 1.26E+06 0.06 0.0 0.0 0.2 26.7 73.1 0.2
42 PopCountP256 2.05E+07 1.28E+06 0.06 0.0 0.0 0.2 24.6 75.3 0.2
43 streamExpand64_21:3 2.05E+07 5.62E+06 0.27 0.0 0.0 0.0 12.1 87.9 1.0
44 FieldDeposit64_3 2.05E+07 1.56E+06 0.08 0.0 0.0 0.0 18.8 81.2 0.3
45 streamExpand64_21:6 2.05E+07 7.32E+06 0.36 0.0 0.0 0.0 9.2 90.8 1.3
46 FieldDeposit64_6 2.05E+07 2.70E+06 0.13 0.0 0.0 0.0 13.8 86.2 0.5
47 streamExpand64_21:6 2.05E+07 7.39E+06 0.36 0.0 0.0 0.0 9.2 90.8 1.3
48 FieldDeposit64_6 2.05E+07 2.73E+06 0.13 0.0 0.0 0.0 12.2 87.8 0.5
49 streamExpand64_21:6 2.05E+07 7.42E+06 0.36 0.0 0.0 0.0 10.1 89.9 1.3
50 FieldDeposit64_6 2.05E+07 2.81E+06 0.14 0.0 0.0 0.0 14.6 85.4 0.5
51 UTF8assembly+CMCompressed 2.05E+07 5.34E+06 0.26 0.0 0.0 0.0 18.8 81.2 1.0
52 p2s_8 2.05E+07 6.53E+06 0.32 0.0 0.0 0.0 3.2 96.8 1.2
53 stdout8 2.05E+07 3.29E+07 1.61 0.0 0.0 0.0 0.3 99.7 6.0
bin/gb18030 -thread-num=1 -EnableCycleCounter -segment-size=16384 > zh.u8 0.17s user 0.02s system 98% cpu 0.193 total
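To see which kernels account for the difference between the two runs, the listings can be compared kernel by kernel. The following is a minimal sketch that assumes each data row is whitespace-separated with the index, kernel name, ITEMS and CYCLES in the first four fields, as in the listings above; the function names are hypothetical and not part of any Parabix tooling. Because several kernel names repeat within one run (for example PopCountP256), cycles are summed per name before the ratio is taken.

```python
def parse_cycle_rows(text):
    """Extract (index, name, items, cycles) from cycle-counter output lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer index; headers, prompts and
        # the trailing `time` summary line are skipped.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        rows.append({
            "index": int(fields[0]),
            "name": fields[1],
            "items": float(fields[2]),
            "cycles": float(fields[3]),
        })
    return rows


def cycles_by_name(rows):
    """Sum cycles over all kernels that share a name."""
    totals = {}
    for row in rows:
        totals[row["name"]] = totals.get(row["name"], 0.0) + row["cycles"]
    return totals


def slowdown(slow_text, fast_text):
    """Per-kernel ratio of cycles in the slow run to cycles in the fast run."""
    slow = cycles_by_name(parse_cycle_rows(slow_text))
    fast = cycles_by_name(parse_cycle_rows(fast_text))
    return {name: slow[name] / fast[name]
            for name in slow if name in fast and fast[name] > 0}


# Two rows copied from each of the listings above.
slow_run = """\
2 s2p8 1.82E+07 3.00E+07 1.64 0.1 0.0 0.9 0.0 0.9 98.1 2.6
29 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed 1.53E+07 1.80E+08 11.76 0.0 0.0 0.0 0.0 0.2 99.8 15.3
"""
fast_run = """\
2 s2p8 1.82E+07 2.74E+07 1.50 0.0 0.0 0.0 0.9 99.1 5.0
27 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed 1.53E+07 6.31E+07 4.12 0.0 0.0 0.0 0.4 99.6 11.5
"""
for name, ratio in sorted(slowdown(slow_run, fast_run).items(), key=lambda kv: -kv[1]):
    print(f"{ratio:6.2f}x  {name}")
```

On these sample rows, s2p8 is nearly unchanged between the runs, while the DoubleByteRangeKernel for 14336-16383 costs close to three times as many cycles in the slower run. Running the same comparison over the full listings would show whether the regression is concentrated in the DoubleByteRange kernels.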