cameron / parabix-devel / Issues / #52

Created Feb 11, 2024 by cameron (@cameron, Maintainer)

Performance issue with ReturnedBuffer

Branch codepoint-properties at commit #f203c6e2 has a modified xch that writes its output to a ReturnedBuffer() rather than to stdout.

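However, this seems to be a performance bottleneck, with excessive synchronization time required in p2s. In the cycle counter output below, p2s_8 accounts for 74.1% of all cycles, and 65.2% of its own cycles are reported under SYNC: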

cameron@cs-osl-08:~/parabix-devel/build12$ sudo perf stat bin/xch -prop=slc ~/Wikibooks/wiki-books-all.xml -EnableCycleCounter >wislc1
CYCLE COUNTER:

  # NAME                                                                  ITEMS      CYCLES RATE  SYNC  PART  EXPD  COPY  PIPE  EXEC     %
  1 mmap_source16384@8                                                649901238     9971611  0.0  41.6   0.0   0.0   0.0   0.0  27.3   0.1 +-  0.0
  2 s2p8                                                              649901238   546614405  0.8   0.3   0.0   0.0   7.5   0.0  90.9   3.4 +-  0.0
  3 cc_7f8e068d851d690f52bf8d67c46df1bb0ffdeee1+CMCompressed          649901238    87582778  0.1   2.7   0.0   0.1   8.4   0.0  84.3   0.6 +-  0.0
  4 adjust_bixnum1x1+CMCompressed                                     649901238   183158518  0.3   1.1   0.1   0.0   0.0   0.0  96.6   1.2 +-  0.0
  5 unitInsertionExtractionMasks1x1_Before                            649901238    28898199  0.0   6.0   0.0   0.0   0.0   0.0  85.8   0.2 +-  0.0
  6 PopCountP8192                                                    1299802476    20029050  0.0   3.0   0.0   0.0   0.0   0.0  85.9   0.1 +-  0.0
  7 FilterByMask64__select_<i1>[1]@0:0_:_select_<i1>[1]@0:0_         1299802476   239533649  0.2   1.9   0.0   0.0   3.7   0.0  92.8   1.5 +-  0.0
  8 PopCountP2048                                                     649901242    18436352  0.0   5.0   0.3   0.0   0.1   0.0  73.8   0.1 +-  0.0
  9 streamExpand4:64_8:8                                              649901242   173845768  0.3   0.8   0.0   0.0   0.5   0.0  96.2   1.1 +-  0.0
 10 FieldDeposit64_8                                                  649901242   302462530  0.5   2.1   0.0   0.0  18.5   0.0  77.8   1.9 +-  0.0
 11 cc_8c7de60ab46889c81d5f889be46cb35d44eb1bb0+CMCompressed          649901242   153222112  0.2   3.2   0.0   0.0   6.4   0.0  88.0   1.0 +-  0.0
 12 cc_d7a768472fc15029ed17c5abad56ff65b4dc4693+CMCompressed          649901242   651987912  1.0  14.2   0.0   0.0  16.6   0.0  68.4   4.1 +-  0.0
 13 UTF8_BytePosition+CMCompressed                                    649901242   201231964  0.3   0.6   0.0   0.0   4.7   0.0  91.9   1.3 +-  0.0
 14 u8_delmask2x1+CMCompressed                                        649901242   189572156  0.3   1.6   0.0   0.0   2.7   0.0  93.6   1.2 +-  0.0
 15 PopCountP2048                                                     649901242    18526422  0.0   5.9   0.0   0.0   2.1   0.0  76.9   0.1 +-  0.0
 16 UTF8_Target_Class3x1+ins+del+CMCompressed                         649901242   177515785  0.3   0.3   0.0   0.0   0.9   0.0  95.9   1.1 +-  0.0
 17 u8_transform_bits_16x1+ins+del+CMCompressed                       649901242   572807532  0.9   3.3   0.0   0.0   0.0   0.0  95.7   3.6 +-  0.0
 18 fieldCompress64__select_<i1>[1]@0:0_:_select_<i1>[8]@0:01234567_  649901242   231152280  0.4   0.4   0.0   0.0   0.0   0.0  97.7   1.5 +-  0.0
 19 streamCompress64_8                                                649901242   294243971  0.5   0.2   0.0   0.0   3.6   0.0  94.0   1.8 +-  0.0
 20 p2s_8                                                             649889520 11796413425 18.2  65.2   0.0  32.6   0.0   0.0   2.1  74.1 +-  0.1

                                                                                 TOTAL:  49.3   0.0  24.2   1.6   0.0  24.2  99.9
xlated buffer length: 649889520

 Performance counter stats for 'bin/xch -prop=slc /home/cameron/Wikibooks/wiki-books-all.xml -EnableCycleCounter':

          5,661.78 msec task-clock                       #    2.816 CPUs utilized             
                67      context-switches                 #   11.834 /sec                      
                 0      cpu-migrations                   #    0.000 /sec                      
           487,093      page-faults                      #   86.032 K/sec                     
    15,706,883,693      cycles                           #    2.774 GHz                       
    20,438,537,075      instructions                     #    1.30  insn per cycle            
     4,525,549,271      branches                         #  799.315 M/sec                     
         2,807,389      branch-misses                    #    0.06% of all branches           

       2.010657317 seconds time elapsed

       4.601109000 seconds user
       1.060255000 seconds sys
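
To compare kernels by absolute synchronization cost rather than by percentage, the table rows can be parsed and the SYNC share multiplied by each kernel's cycle count. The following is a minimal sketch, not part of Parabix: the `parse_rows` and `sync_cycles` helpers are hypothetical, and it assumes the SYNC column is a percentage of that kernel's own CYCLES (which is consistent with each row's SYNC..EXEC columns summing to roughly 100). Only three rows are copied in as sample data.

```python
import re

# Three rows copied verbatim from the cycle counter table above.
SAMPLE = """\
  2 s2p8                                                              649901238   546614405  0.8   0.3   0.0   0.0   7.5   0.0  90.9   3.4 +-  0.0
 12 cc_d7a768472fc15029ed17c5abad56ff65b4dc4693+CMCompressed          649901242   651987912  1.0  14.2   0.0   0.0  16.6   0.0  68.4   4.1 +-  0.0
 20 p2s_8                                                             649889520 11796413425 18.2  65.2   0.0  32.6   0.0   0.0   2.1  74.1 +-  0.1
"""

# Columns: index, NAME, ITEMS, CYCLES, RATE, SYNC, PART, EXPD, COPY, PIPE, EXEC, %
ROW = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<name>\S+)\s+(?P<items>\d+)\s+(?P<cycles>\d+)\s+"
    r"(?P<rate>[\d.]+)\s+(?P<sync>[\d.]+)\s+(?P<part>[\d.]+)\s+(?P<expd>[\d.]+)\s+"
    r"(?P<copy>[\d.]+)\s+(?P<pipe>[\d.]+)\s+(?P<exec>[\d.]+)\s+(?P<pct>[\d.]+)"
)


def parse_rows(text):
    """Return one dict per kernel row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        rows.append({
            "name": m.group("name"),
            "cycles": int(m.group("cycles")),
            "sync_pct": float(m.group("sync")),
            "pct": float(m.group("pct")),
        })
    return rows


def sync_cycles(row):
    """Estimated cycles spent synchronizing (assumes SYNC is % of CYCLES)."""
    return row["cycles"] * row["sync_pct"] / 100.0


if __name__ == "__main__":
    # Rank kernels by estimated synchronization cycles, largest first.
    for row in sorted(parse_rows(SAMPLE), key=sync_cycles, reverse=True):
        print(f'{row["name"][:40]:40s} {sync_cycles(row):>16,.0f}')
```

Under that assumption, p2s_8 spends roughly 7.7 billion cycles in synchronization, far more than any other kernel in the sample.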