I see a difference in the performance of the assembly function crc_t10dif_pcl
between two different kernels: 5.1.20 and the newer 5.5.6.
Here is the function in each version:
- 5.1.20: https://elixir.bootlin.com/linux/v5.1.20/source/arch/x86/crypto/crct10dif-pcl-asm_64.S#L98
- 5.5.6: https://elixir.bootlin.com/linux/v5.5.6/source/arch/x86/crypto/crct10dif-pcl-asm_64.S#L98
I do not see any significant difference between the two sources.
Yet the function finishes in ~400 microseconds in v5.1.20, and in ~100 microseconds in v5.5.6.
Both measurements were taken on the same AMD EPYC 7351 16-Core Processor.
I am not sure where the difference comes from.
Is there a CPU feature that could explain this, or something else?
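
To help rule out the measurement itself, here is a minimal userspace sketch of a timing cross-check. It assumes a plain bitwise CRC-T10DIF (polynomial 0x8BB7, initial value 0) rather than the kernel's PCLMULQDQ routine, and the names crc_t10dif_ref and time_crc_us are hypothetical. It is not the harness that produced the numbers in this post, only an illustration of timing a single pass with a monotonic clock.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Bitwise CRC-T10DIF reference (polynomial 0x8BB7, initial value 0,
 * no bit reflection). Hypothetical helper for comparison only; this is
 * NOT the kernel's PCLMULQDQ-accelerated crc_t10dif_pcl. */
static uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);          /* feed next byte, MSB first */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Time one CRC pass over buf with CLOCK_MONOTONIC.
 * Stores the CRC in *out and returns the elapsed time in microseconds. */
static double time_crc_us(const unsigned char *buf, size_t len, uint16_t *out)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *out = crc_t10dif_ref(0, buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1e6 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
}
```

Calling time_crc_us several times over the same buffer and comparing the first call against later ones gives a rough idea of how much of a single-shot figure is warm-up (cache misses, frequency ramp-up) rather than the routine itself.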