I'm trying to measure the performance of my code in the Linux kernel using the PMU. To test the PMU first, I created a simple loop of a few operations in the kernel and placed it under a spinlock with interrupts disabled, so that my test code can't be preempted. Then I printed the cycle counter to check how many CPU cycles the loop takes. But I see a very different value on each print: 100, 500, 1000, 200, ...

My question is: why do I see such different values every time?

PS: in contrast to the cycle counter, the PMU's instruction counter is stable, and I see the same value every time. I also tried the ARM timer, but it shows varying values as well, similar to the PMU's cycle counter. Here is how I use the ARM timer to measure performance:
unsigned long long ticks_start, ticks_end;
int i = 0, j;
unsigned long flags;

/* lock is a spinlock_t (declaration not shown) */
spin_lock_irqsave(&lock, flags);
while (i++ < 100) {
	j = 0;
	/* read the physical counter before the inner loop */
	asm volatile("mrs %0, CNTPCT_EL0" : "=r" (ticks_start));
	while (j++ < 10000) {
		asm volatile ("nop");
	}
	/* read the physical counter after the inner loop */
	asm volatile("mrs %0, CNTPCT_EL0" : "=r" (ticks_end));
	printk("ticks %d are: %llu\n", i, ticks_end - ticks_start);
}
spin_unlock_irqrestore(&lock, flags);
The output on a real device (Cortex-A57) is:
...
ticks 31 are: 2287
ticks 32 are: 2287
ticks 33 are: 2287
ticks 34 are: 1984
ticks 35 are: 457
ticks 36 are: 1604
ticks 37 are: 2287
...
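For anyone who wants to reproduce the spread outside the kernel, here is a minimal userspace sketch of the same measurement pattern. It is not the kernel code from this question. It assumes that on AArch64 the virtual counter `CNTVCT_EL0` is readable from userspace (usually the case on Linux, but not guaranteed), and it falls back to `clock_gettime` on other architectures. The helper names `read_ticks`, `collect_samples` and `median_u64` are hypothetical, and the `isb` before the counter read is an addition that the kernel snippet does not have.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: read a monotonically increasing tick count.
 * Assumption: on AArch64, CNTVCT_EL0 is accessible from EL0. The ISB keeps
 * the counter read from being executed ahead of earlier instructions.
 * On other architectures this returns CLOCK_MONOTONIC in nanoseconds. */
static inline uint64_t read_ticks(void)
{
#if defined(__aarch64__)
    uint64_t t;
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(t) : : "memory");
    return t;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Time the same 10000-iteration nop loop n times and store each delta. */
static void collect_samples(uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t start = read_ticks();
        for (int j = 0; j < 10000; j++)
            __asm__ volatile("nop");
        out[i] = read_ticks() - start;
    }
}

/* qsort comparator for uint64_t values, ascending. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place and return the middle element
 * (the upper of the two middle elements when n is even). */
static uint64_t median_u64(uint64_t *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_u64);
    return v[n / 2];
}
```

Calling `collect_samples` on a buffer of 100 entries and then `median_u64` on it gives one representative figure. After the sort, the first and last elements of the buffer hold the minimum and maximum, which makes the spread easy to see.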