Improve pack
Reimplements hsimd_packh and hsimd_packl in terms of the uzp1 and uzp2 NEON instructions. The commit history shows some other approaches I tried. Of those, the uzp implementation appears to perform the best.
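
For reviewers less familiar with the unzip instructions, here is a rough scalar model of what uzp1 and uzp2 do on byte lanes of a 128-bit register, and why that lines up with a pack at fw = 16 on a little-endian target. This is only an illustration, not the code in this PR: the model_* and sketch_* names are hypothetical, and the operand order (second argument landing in the low half of the result) is an assumption about the pack convention. On AArch64 the corresponding intrinsics would be vuzp1q_u8 and vuzp2q_u8 from arm_neon.h.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* bytes in one 128-bit vector */

/* Scalar model of UZP1 on byte lanes: the even-indexed bytes of n fill the
 * low half of out, the even-indexed bytes of m fill the high half.
 * out must not alias n or m. */
static void model_uzp1_u8(const uint8_t n[LANES], const uint8_t m[LANES],
                          uint8_t out[LANES]) {
    for (size_t i = 0; i < LANES / 2; i++) {
        out[i] = n[2 * i];
        out[i + LANES / 2] = m[2 * i];
    }
}

/* Scalar model of UZP2: same layout, but taking the odd-indexed bytes. */
static void model_uzp2_u8(const uint8_t n[LANES], const uint8_t m[LANES],
                          uint8_t out[LANES]) {
    for (size_t i = 0; i < LANES / 2; i++) {
        out[i] = n[2 * i + 1];
        out[i + LANES / 2] = m[2 * i + 1];
    }
}

/* Hypothetical fw = 16 pack helpers. With little-endian 16-bit fields, the
 * low byte of each field sits at an even byte index and the high byte at an
 * odd one, so "pack low" is an even-byte unzip and "pack high" an odd-byte
 * unzip. ASSUMPTION: b supplies the low half of the result, a the high half. */
static void sketch_packl_fw16(const uint8_t a[LANES], const uint8_t b[LANES],
                              uint8_t out[LANES]) {
    model_uzp1_u8(b, a, out);
}

static void sketch_packh_fw16(const uint8_t a[LANES], const uint8_t b[LANES],
                              uint8_t out[LANES]) {
    model_uzp2_u8(b, a, out);
}
```

Neither model clamps anything: each output byte is copied unchanged from an input byte, which is the behaviour wanted for the non-saturating packs.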

I think the saturating versions of hsimd_packl (hsimd_packus and hsimd_packss) should also be given a once-over, but I don't believe we'll be able to use uzp instructions, as they don't saturate the values. Instead, we should use uqxtn and sqxtn. The existing implementation of hsimd_packus does this already, but it could be expanded for fw >= 16 && fw <= 64. I will make a note of that when I make a list of functions to update, but thought it was worth mentioning here.
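
To make the saturation point concrete, here is a small per-lane scalar sketch for the 16-bit to 8-bit case. It contrasts plain truncation (what a uzp1-style pack yields for each field) with the clamping that uqxtn and sqxtn perform. The model_* names are hypothetical and this is not the existing hsimd_packus code. As far as I know, the arm_neon.h intrinsics that map to these instructions at this width are vqmovn_u16 and vqmovn_s16.

```c
#include <stdint.h>

/* Plain truncation: keep the low 8 bits of the field, no clamping.
 * This is what an unzip-based pack produces for each 16-bit field. */
static uint8_t model_truncate_u16(uint16_t x) {
    return (uint8_t)(x & 0xFFu);
}

/* Per-lane model of UQXTN (16 -> 8): unsigned input, clamped to [0, 255]. */
static uint8_t model_uqxtn_u16(uint16_t x) {
    return x > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)x;
}

/* Per-lane model of SQXTN (16 -> 8): signed input, clamped to [-128, 127]. */
static int8_t model_sqxtn_s16(int16_t x) {
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}
```

For an input field of 300, truncation gives 44 while the unsigned saturating narrow gives 255, which is why the unzip instructions cannot stand in for the saturating packs.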