Hello,
I am analyzing the "Memory Bound" metric of my code with VTune. According to section B.3.2.3 of the Intel® 64 and IA-32 Architectures Optimization Reference Manual:
%L2 Bound = (CYCLE_ACTIVITY.STALLS_L1D_PENDING - CYCLE_ACTIVITY.STALLS_L2_PENDING) / CLOCKS
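To make the arithmetic concrete, here is a minimal Python sketch of the formula. The function name and the counter values are hypothetical and only illustrate the calculation. They are not taken from my VTune run. Whenever STALLS_L1D_PENDING is smaller than STALLS_L2_PENDING, the result is negative, which makes no sense for a fraction of cycles.

```python
def l2_bound(stalls_l1d_pending: int, stalls_l2_pending: int, clocks: int) -> float:
    """%L2 Bound as in B.3.2.3, returned as a fraction of CLOCKS.

    Cycles stalled with an L1D miss pending, minus cycles stalled with an
    L2 miss pending, divided by total clocks. Multiply by 100 for a percentage.
    """
    if clocks <= 0:
        raise ValueError("clocks must be positive")
    return (stalls_l1d_pending - stalls_l2_pending) / clocks


# Hypothetical counter values, for illustration only.
expected_case = l2_bound(3_000_000, 1_000_000, 10_000_000)  # L1D_PENDING > L2_PENDING
observed_case = l2_bound(1_000_000, 3_000_000, 10_000_000)  # L1D_PENDING < L2_PENDING

print(expected_case)  # 0.2
print(observed_case)  # -0.2
```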
However, in my VTune results, CYCLE_ACTIVITY.STALLS_L1D_PENDING is smaller than CYCLE_ACTIVITY.STALLS_L2_PENDING, which makes %L2 Bound negative. Why?