Created Mar 04, 2021 by cameron (@cameron, Maintainer)

Performance issues: gb18030

The gb18030 application transcodes files encoded in GB18030 into equivalent UTF-8 files. It is an interesting application with a fairly long pipeline (53 kernels in the cycle-counter reports below).

Performance Study

We use the GB18030 file zhwikibooks-20141225-pages-articles.xml in QA/gb18030/TestFiles for performance studies. This is an 18 MB file that can be converted to UTF-8 with the following command:

bin/gb18030 ../QA/gb18030/TestFiles/zhwikibooks-20141225-pages-articles.xml > zhw.u8
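When comparing revisions it helps to confirm that the output is unchanged as well as the timing. The following is a minimal sketch of an independent reference transcoder built on Python's standard `gb18030` codec. It is not part of the project: the function name and chunk size are illustrative, and byte-for-byte agreement with bin/gb18030 is only an expectation for valid input, since error handling may differ.

```python
def transcode_gb18030_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Stream-decode a GB18030 file and re-encode it as UTF-8.

    newline="" disables newline translation on both sides, so CR/LF
    sequences in the input are carried through unchanged.
    """
    with open(src_path, "r", encoding="gb18030", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # reads characters, not bytes
            if not chunk:
                break
            dst.write(chunk)
```

Calling this on the test file with a hypothetical output name such as zhw.ref.u8 and then comparing that file against zhw.u8 with `cmp` gives a quick sanity check.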

Single-Threaded Performance Regression

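Current single-threaded performance is substantially poorer than in earlier check-ins. Of the two runs below, the first takes 0.395 s in total and the second 0.193 s, roughly a factor of two.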

Revision git-2f0cd368

cameron@Robs-iMac-Pro build9 % time bin/gb18030 -thread-num=1 -EnableCycleCounter ../QA/gb18030/TestFiles/zhwikibooks-20141225-pages-articles.xml -segment-size=16384  >zh.u8
CYCLE COUNTER:

  # NAME                                                                     ITEMS     CYCLES       RATE  SYNC  PART  EXPD  COPY  PIPE   EXEC     %
  1 mmap_source16384@8                                                    1.82E+07   3.75E+05       0.02   5.9   0.0   0.0   0.0  31.7   62.4   0.0
  2 s2p8                                                                  1.82E+07   3.00E+07       1.64   0.1   0.0   0.9   0.0   0.9   98.1   2.6
  3 GB_18030_Parser+CMCompressed                                          1.82E+07   7.17E+06       0.39   0.3   0.0   2.3   0.0   2.4   94.9   0.6
  4 GB_18030_ExtractionMasks+CMCompressed                                 1.82E+07   8.84E+06       0.48   0.2   0.0   3.4   0.0   7.9   88.5   0.8
  5 ErrorMonitorKernel1                                                   1.82E+07   1.04E+07       0.57   0.2   0.0   2.2   0.0  10.8   86.7   0.9
  6 PopCountP256                                                          1.82E+07   1.48E+06       0.08   1.5   0.0   3.6   0.0  18.1   76.9   0.1
  7 PopCountP256                                                          1.82E+07   1.32E+06       0.07   2.6   0.0   2.9   0.0  12.2   82.3   0.1
  8 PopCountP256                                                          1.82E+07   1.37E+06       0.08   2.3   0.0   2.6   0.0  15.9   79.2   0.1
  9 PopCountP256                                                          1.82E+07   1.38E+06       0.08   2.6   0.0   2.3   0.0  15.8   79.4   0.1
 10 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_      1.82E+07   4.04E+06       0.22   0.8   0.0   0.0   0.0  10.9   88.3   0.3
 11 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_          1.82E+07   2.48E+06       0.14   1.1   0.0   0.0   0.0   9.5   89.4   0.2
 12 streamCompress64_8                                                    1.82E+07   3.06E+07       1.68   0.1   0.0  23.6   0.0   1.4   74.9   2.6
 13 streamCompress64_4                                                    1.82E+07   1.91E+07       1.05   0.1   0.0  17.9   0.0   1.2   80.8   1.6
 14 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_          1.82E+07   2.53E+06       0.14   1.6   0.0   0.0   0.0   9.4   89.0   0.2
 15 streamCompress64_4                                                    1.82E+07   1.89E+07       1.03   0.2   0.0  17.8   0.0   1.7   80.4   1.6
 16 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_      1.82E+07   4.00E+06       0.22   0.6   0.0   0.0   0.0   8.9   90.5   0.3
 17 streamCompress64_8                                                    1.82E+07   2.97E+07       1.63   0.1   0.0  22.7   0.0   0.9   76.3   2.5
 18 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[1]@0:0_             1.82E+07   1.33E+06       0.07   1.7   0.0   0.0   0.0  17.1   81.2   0.1
 19 streamCompress64_1                                                    1.82E+07   1.04E+07       0.57   0.2   0.0   8.5   0.0   2.5   88.8   0.9
 20 GB_18030_InitializeASCII+CMCompressed                                 1.53E+07   4.91E+06       0.32   0.5   0.0   3.1   0.0   6.6   89.9   0.4
 21 GB_18030_DoubleByteIndex+CMCompressed                                 1.53E+07   1.22E+07       0.79   0.2   0.0   0.0   0.0   4.8   95.0   1.0
 22 GB_18030_DoubleByteRangeKernel0-2047+CMCompressed                     1.53E+07   2.43E+07       1.59   0.1   0.0   0.0   0.0   1.1   98.8   2.1
 23 GB_18030_DoubleByteRangeKernel2048-4095+CMCompressed                  1.53E+07   2.47E+07       1.61   0.1   0.0   0.0   0.0   0.9   99.0   2.1
 24 GB_18030_DoubleByteRangeKernel4096-6143+CMCompressed                  1.53E+07   2.06E+07       1.34   0.2   0.0   0.0   0.0   1.3   98.5   1.8
 25 GB_18030_DoubleByteRangeKernel6144-8191+CMCompressed                  1.53E+07   2.76E+07       1.80   0.1   0.0   0.0   0.0   0.8   99.2   2.4
 26 GB_18030_DoubleByteRangeKernel8192-10239+CMCompressed                 1.53E+07   1.21E+08       7.87   0.0   0.0   0.0   0.0   0.2   99.7  10.3
 27 GB_18030_DoubleByteRangeKernel10240-12287+CMCompressed                1.53E+07   1.61E+08      10.49   0.0   0.0   0.0   0.0   0.2   99.8  13.7
 28 GB_18030_DoubleByteRangeKernel12288-14335+CMCompressed                1.53E+07   1.48E+08       9.67   0.0   0.0   0.0   0.0   0.1   99.9  12.6
 29 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed                1.53E+07   1.80E+08      11.76   0.0   0.0   0.0   0.0   0.2   99.8  15.3
 30 GB_18030_DoubleByteRangeKernel16384-18431+CMCompressed                1.53E+07   3.91E+07       2.55   0.1   0.0   0.0   0.0   0.7   99.2   3.3
 31 GB_18030_DoubleByteRangeKernel18432-20479+CMCompressed                1.53E+07   2.03E+07       1.32   0.1   0.0   0.0   0.0   1.3   98.6   1.7
 32 GB_18030_DoubleByteRangeKernel20480-22527+CMCompressed                1.53E+07   1.73E+07       1.13   0.2   0.0   0.0   0.0   1.6   98.2   1.5
 33 GB_18030_DoubleByteRangeKernel22528-24575+CMCompressed                1.53E+07   1.33E+07       0.87   0.2   0.0   1.1   0.0   2.5   96.2   1.1
 34 GB_18030_FourByteLogic+CMCompressed                                   1.53E+07   2.67E+07       1.74   0.3   0.0   0.2   0.0   3.2   96.3   2.3
 35 u8depositMask                                                         1.53E+07   7.14E+06       0.47   0.3   0.0   0.0   0.0   4.1   95.6   0.6
 36 PopCountP256                                                          6.13E+07   4.12E+06       0.07   0.5   0.0   5.3   0.0   4.8   89.4   0.4
 37 streamCompress64_1                                                    6.13E+07   2.98E+07       0.49   0.1   0.0   0.2   0.0   0.6   99.1   2.5
 38 PopCountP256                                                          2.05E+07   1.97E+06       0.10   1.1   0.0   3.6   0.0  32.2   63.2   0.2
 39 streamExpand64_21:6                                                   2.05E+07   8.20E+06       0.40   0.3   0.0   0.0   0.0   5.6   94.1   0.7
 40 FieldDeposit64_6                                                      2.05E+07   3.58E+06       0.17   0.6   0.0   1.3   0.0   4.8   93.4   0.3
 41 UTF8_DepositMasks+CMCompressed                                        2.05E+07   3.41E+06       0.17   0.7   0.0   1.7   0.0  11.2   86.4   0.3
 42 PopCountP256                                                          2.05E+07   1.51E+06       0.07   1.5   0.0   2.7   0.0  13.9   81.8   0.1
 43 PopCountP256                                                          2.05E+07   1.52E+06       0.07   1.5   0.0   2.6   0.0  15.3   80.6   0.1
 44 PopCountP256                                                          2.05E+07   1.56E+06       0.08   1.4   0.0   2.3   0.0  14.9   81.4   0.1
 45 streamExpand64_21:3                                                   2.05E+07   6.35E+06       0.31   0.4   0.0   0.0   0.0   8.2   91.5   0.5
 46 FieldDeposit64_3                                                      2.05E+07   2.04E+06       0.10   1.1   0.0   0.0   0.0  10.2   88.7   0.2
 47 streamExpand64_21:6                                                   2.05E+07   7.73E+06       0.38   0.3   0.0   0.0   0.0   4.2   95.6   0.7
 48 FieldDeposit64_6                                                      2.05E+07   3.30E+06       0.16   0.7   0.0   0.0   0.0   5.5   93.8   0.3
 49 streamExpand64_21:6                                                   2.05E+07   7.50E+06       0.37   0.3   0.0   0.0   0.0   3.8   95.9   0.6
 50 FieldDeposit64_6                                                      2.05E+07   3.62E+06       0.18   1.3   0.0   0.0   0.0   6.5   92.2   0.3
 51 UTF8assembly+CMCompressed                                             2.05E+07   6.02E+06       0.29   0.7   0.0   0.0   0.0  10.0   89.2   0.5
 52 p2s_8                                                                 2.05E+07   8.15E+06       0.40   0.3   0.0   4.6   0.0   1.3   93.8   0.7
 53 stdout8                                                               2.05E+07   4.05E+07       1.97   0.1   0.0   0.0   0.0   0.6   99.3   3.4

bin/gb18030 -thread-num=1 -EnableCycleCounter  -segment-size=16384 > zh.u8  0.36s user 0.03s system 98% cpu 0.395 total
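Going by the % column, the twelve GB_18030_DoubleByteRangeKernel rows account for roughly two thirds of the cycles in this run. A small script can recompute such shares from the CYCLES column of a saved report. This is a sketch that assumes the row layout shown above (index, name, ITEMS, CYCLES, ...) and kernel names without spaces. The helper names are made up for illustration, and the excerpt keeps only the first five columns of three rows.

```python
def parse_rows(report):
    """Return (kernel name, cycles) for every kernel row in a report."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Kernel rows start with an integer index; headers and the shell's
        # timing line do not, so they are skipped here.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        try:
            rows.append((fields[1], float(fields[3])))
        except ValueError:
            continue
    return rows


def cycle_share(report, name_fragment):
    """Fraction of all cycles spent in kernels whose name contains name_fragment."""
    rows = parse_rows(report)
    total = sum(cycles for _, cycles in rows)
    matched = sum(cycles for name, cycles in rows if name_fragment in name)
    return matched / total if total else 0.0


# Excerpt only: three rows from the run above, trailing columns omitted.
EXCERPT = """
  # NAME                                                  ITEMS     CYCLES       RATE
  2 s2p8                                                1.82E+07   3.00E+07       1.64
 26 GB_18030_DoubleByteRangeKernel8192-10239+CMCompressed 1.53E+07   1.21E+08       7.87
 53 stdout8                                             2.05E+07   4.05E+07       1.97
"""

print(f"{cycle_share(EXCERPT, 'DoubleByteRangeKernel'):.1%}")
```

Feeding the full report text instead of the excerpt gives the share over all 53 kernels.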

Revision git-2f0cd368

cameron@Robs-iMac-Pro build9 % time bin/gb18030 -thread-num=1 -EnableCycleCounter ../QA/gb18030/TestFiles/zhwikibooks-20141225-pages-articles.xml -segment-size=16384  >zh.u8
CYCLE COUNTER:

  # NAME                                                                     ITEMS     CYCLES       RATE  SYNC  BUFF  COPY  PIPE   EXEC     %
  1 mmap_source16384@8                                                    1.82E+07   2.77E+05       0.02   0.0   0.0   0.0  47.6   52.4   0.1
  2 s2p8                                                                  1.82E+07   2.74E+07       1.50   0.0   0.0   0.0   0.9   99.1   5.0
  3 GB_18030_Parser+CMCompressed                                          1.82E+07   6.80E+06       0.37   0.0   0.2   0.3   7.3   92.2   1.2
  4 GB_18030_ExtractionMasks+CMCompressed                                 1.82E+07   4.52E+06       0.25   0.0   0.0   0.0  13.8   86.2   0.8
  5 ErrorMonitorKernel1                                                   1.82E+07   4.59E+06       0.25   0.0   0.0   0.0  21.1   78.9   0.8
  6 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[1]@0:0_             1.82E+07   1.02E+06       0.06   0.0   0.0   0.0  39.2   60.8   0.2
  7 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_      1.82E+07   2.97E+06       0.16   0.0   0.0   0.0  11.6   88.4   0.5
  8 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_          1.82E+07   1.92E+06       0.11   0.0   0.0   0.0  17.4   82.6   0.4
  9 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_      1.82E+07   2.88E+06       0.16   0.0   0.0   0.0  10.7   89.3   0.5
 10 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:0123_          1.82E+07   1.63E+06       0.09   0.0   0.0   0.0  18.0   82.0   0.3
 11 PopCountP256                                                          1.82E+07   1.10E+06       0.06   0.0   0.0   0.3  14.7   85.0   0.2
 12 streamCompress64_1                                                    1.82E+07   6.99E+06       0.38   0.0   0.0   0.1  12.6   87.3   1.3
 13 streamCompress64_8                                                    1.82E+07   1.60E+07       0.88   0.0   0.0   0.1   5.6   94.3   2.9
 14 GB_18030_InitializeASCII+CMCompressed                                 1.53E+07   4.14E+06       0.27   0.0   0.0   0.0  10.3   89.7   0.8
 15 PopCountP256                                                          1.82E+07   1.20E+06       0.07   0.0   0.0   0.3  19.7   80.1   0.2
 16 streamCompress64_4                                                    1.82E+07   9.02E+06       0.49   0.0   0.0   0.1   8.9   91.0   1.6
 17 PopCountP256                                                          1.82E+07   1.15E+06       0.06   0.0   0.0   0.3  19.5   80.2   0.2
 18 streamCompress64_8                                                    1.82E+07   1.52E+07       0.83   0.0   0.0   0.1   5.9   94.1   2.8
 19 GB_18030_DoubleByteIndex+CMCompressed                                 1.53E+07   6.56E+06       0.43   0.0   0.0   0.0   7.9   92.1   1.2
 20 GB_18030_DoubleByteRangeKernel0-2047+CMCompressed                     1.53E+07   1.04E+07       0.68   0.0   0.0   0.0   2.5   97.5   1.9
 21 GB_18030_DoubleByteRangeKernel2048-4095+CMCompressed                  1.53E+07   1.01E+07       0.66   0.0   0.0   0.0   2.3   97.7   1.8
 22 GB_18030_DoubleByteRangeKernel4096-6143+CMCompressed                  1.53E+07   9.03E+06       0.59   0.0   0.0   0.0   2.4   97.6   1.6
 23 GB_18030_DoubleByteRangeKernel6144-8191+CMCompressed                  1.53E+07   9.89E+06       0.65   0.0   0.0   0.0   2.4   97.6   1.8
 24 GB_18030_DoubleByteRangeKernel8192-10239+CMCompressed                 1.53E+07   4.01E+07       2.62   0.0   0.0   0.0   0.6   99.4   7.3
 25 GB_18030_DoubleByteRangeKernel10240-12287+CMCompressed                1.53E+07   5.74E+07       3.74   0.0   0.0   0.0   0.4   99.6  10.4
 26 GB_18030_DoubleByteRangeKernel12288-14335+CMCompressed                1.53E+07   5.26E+07       3.43   0.0   0.0   0.0   0.5   99.5   9.6
 27 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed                1.53E+07   6.31E+07       4.12   0.0   0.0   0.0   0.4   99.6  11.5
 28 GB_18030_DoubleByteRangeKernel16384-18431+CMCompressed                1.53E+07   1.45E+07       0.95   0.0   0.0   0.0   1.8   98.2   2.6
 29 GB_18030_DoubleByteRangeKernel18432-20479+CMCompressed                1.53E+07   9.21E+06       0.60   0.0   0.0   0.0   3.0   97.0   1.7
 30 GB_18030_DoubleByteRangeKernel20480-22527+CMCompressed                1.53E+07   8.22E+06       0.54   0.0   0.0   0.0   2.9   97.1   1.5
 31 GB_18030_DoubleByteRangeKernel22528-24575+CMCompressed                1.53E+07   7.25E+06       0.47   0.0   0.0   0.0   4.6   95.4   1.3
 32 PopCountP256                                                          1.82E+07   1.28E+06       0.07   0.0   0.0   0.2  20.1   79.7   0.2
 33 streamCompress64_4                                                    1.82E+07   9.42E+06       0.52   0.0   0.0   0.2  11.5   88.3   1.7
 34 GB_18030_FourByteLogic+CMCompressed                                   1.53E+07   1.34E+07       0.87   0.0   0.0   1.4   8.6   90.0   2.4
 35 u8depositMask                                                         1.53E+07   6.46E+06       0.42   0.0   0.0   0.0   4.9   95.1   1.2
 36 PopCountP256                                                          6.13E+07   2.96E+06       0.05   0.0   0.0   0.2   9.0   90.8   0.5
 37 streamCompress64_1                                                    6.13E+07   1.91E+07       0.31   0.0   0.0   0.0   3.5   96.4   3.5
 38 UTF8_DepositMasks+CMCompressed                                        2.05E+07   2.37E+06       0.12   0.0   0.0   0.0  21.3   78.7   0.4
 39 PopCountP256                                                          2.05E+07   1.28E+06       0.06   0.0   0.0   0.2  25.3   74.5   0.2
 40 PopCountP256                                                          2.05E+07   1.33E+06       0.06   0.0   0.0   0.2  27.5   72.3   0.2
 41 PopCountP256                                                          2.05E+07   1.26E+06       0.06   0.0   0.0   0.2  26.7   73.1   0.2
 42 PopCountP256                                                          2.05E+07   1.28E+06       0.06   0.0   0.0   0.2  24.6   75.3   0.2
 43 streamExpand64_21:3                                                   2.05E+07   5.62E+06       0.27   0.0   0.0   0.0  12.1   87.9   1.0
 44 FieldDeposit64_3                                                      2.05E+07   1.56E+06       0.08   0.0   0.0   0.0  18.8   81.2   0.3
 45 streamExpand64_21:6                                                   2.05E+07   7.32E+06       0.36   0.0   0.0   0.0   9.2   90.8   1.3
 46 FieldDeposit64_6                                                      2.05E+07   2.70E+06       0.13   0.0   0.0   0.0  13.8   86.2   0.5
 47 streamExpand64_21:6                                                   2.05E+07   7.39E+06       0.36   0.0   0.0   0.0   9.2   90.8   1.3
 48 FieldDeposit64_6                                                      2.05E+07   2.73E+06       0.13   0.0   0.0   0.0  12.2   87.8   0.5
 49 streamExpand64_21:6                                                   2.05E+07   7.42E+06       0.36   0.0   0.0   0.0  10.1   89.9   1.3
 50 FieldDeposit64_6                                                      2.05E+07   2.81E+06       0.14   0.0   0.0   0.0  14.6   85.4   0.5
 51 UTF8assembly+CMCompressed                                             2.05E+07   5.34E+06       0.26   0.0   0.0   0.0  18.8   81.2   1.0
 52 p2s_8                                                                 2.05E+07   6.53E+06       0.32   0.0   0.0   0.0   3.2   96.8   1.2
 53 stdout8                                                               2.05E+07   3.29E+07       1.61   0.0   0.0   0.0   0.3   99.7   6.0

bin/gb18030 -thread-num=1 -EnableCycleCounter  -segment-size=16384 > zh.u8  0.17s user 0.02s system 98% cpu 0.193 total
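Because the two reports list kernels in a different order and with different column sets, a per-kernel comparison is easier to read when cycles are summed by kernel name first. The sketch below does that and ranks kernels by absolute cycle increase. It assumes only that the first four fields of each row are index, name, ITEMS and CYCLES, which holds for both layouts above. The function names are illustrative, and the sample data is a three-kernel excerpt, not the full reports.

```python
from collections import defaultdict


def cycles_by_kernel(report):
    """Sum CYCLES per kernel name (names such as streamCompress64_8 repeat)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header, blank line, or shell timing output
        try:
            totals[fields[1]] += float(fields[3])
        except ValueError:
            continue
    return dict(totals)


def slowdowns(slow_report, fast_report):
    """Return (kernel, cycle ratio) pairs, largest absolute increase first."""
    slow = cycles_by_kernel(slow_report)
    fast = cycles_by_kernel(fast_report)
    common = [k for k in slow if fast.get(k, 0.0) > 0.0]
    common.sort(key=lambda k: slow[k] - fast[k], reverse=True)
    return [(k, slow[k] / fast[k]) for k in common]


# Excerpts from the slower (first) and faster (second) run; trailing columns omitted.
SLOW = """
  2 s2p8                                                   1.82E+07   3.00E+07
 12 streamCompress64_8                                     1.82E+07   3.06E+07
 17 streamCompress64_8                                     1.82E+07   2.97E+07
 29 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed 1.53E+07   1.80E+08
"""
FAST = """
  2 s2p8                                                   1.82E+07   2.74E+07
 13 streamCompress64_8                                     1.82E+07   1.60E+07
 18 streamCompress64_8                                     1.82E+07   1.52E+07
 27 GB_18030_DoubleByteRangeKernel14336-16383+CMCompressed 1.53E+07   6.31E+07
"""

for name, ratio in slowdowns(SLOW, FAST):
    print(f"{ratio:5.2f}x  {name}")
```

Sorting by absolute increase rather than by ratio keeps small kernels with noisy counts from dominating the list.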