
My preference is to write benchmark tools myself, in assembly, and then to disassemble the compiled machine code for verification. How else do you know what the CPU is actually doing? It's usually impractical to do this, but I'll share one case where you can.

It is a procedure for measuring CPU clock speed using an unrolled No-Operation (NOP) loop. It's intended to be simple, minimizing variation caused by cache misses, stall cycles, and branch misprediction. The result provides a baseline before more complex CPU benchmarks are tried. This is for CPU benchmarking, and I use it along with other tools (eg, sysbench, lmbench), and an active benchmarking approach.

Since I'm running this on Linux I could just read /proc/cpuinfo, but I don't completely trust it in virtualized environments (which can fake the cpuid). An approach I trust much more is to simply read cycle counters from the CPU Performance Monitoring Unit (eg, using perf), but I have limited or no access to these in virtualized environments. This instance happens to be a Xen guest on AWS EC2.

First, instead of writing assembly from scratch, let's take a shortcut. Now compile to assembly (with -O0, for fewer optimizations), and look for the loop:

16 addl $1, -4(%rbp): add, of size long, the constant 1 to the value at the memory address given by the rbp register minus 4 bytes (this will be the local variable, i).
18 cmpl $999999, -4(%rbp): compare, of size long, the constant 999,999 to the earlier variable.
... label if the compare was less-than or equal-to.
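As a rough illustration of the shortcut, here is a minimal sketch of the kind of C loop that could produce the addl and cmpl instructions described above. The function name spin_loop and the ITERATIONS macro are assumptions made for this example and do not come from the original program. The bound is chosen so that the comparison constant comes out as 999,999. The exact assembly depends on the compiler and target, so treat the comments as what GCC on x86-64 at -O0 typically emits rather than a guarantee.

```c
/* Hypothetical example: an empty counting loop to compile to assembly. */
#define ITERATIONS 1000000

/*
 * Count from 0 up to ITERATIONS with an empty loop body.
 * At -O0 the local variable i is typically kept on the stack
 * (for example at -4(%rbp) on x86-64), so each pass is an add to
 * memory, a compare against ITERATIONS - 1, and a conditional jump.
 */
int spin_loop(void)
{
    int i;

    for (i = 0; i < ITERATIONS; i++)
        ;   /* empty body: the loop overhead is all we want to see */

    return i;   /* final counter value, equal to ITERATIONS */
}
```

Saved as a file such as loop.c (an assumed name), this can be compiled to assembly with gcc -O0 -S loop.c and the loop located in the resulting loop.s. No main function is needed just to inspect the generated instructions.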

