
Measuring cache misses.

Last post 07-24-2008, 2:30 PM by tim18. 1 replies.
 07-24-2008, 9:30 AM 30259950  


I am analyzing a program that spends most of its time doing linear algebra. I suspect the algorithm is inefficient in the way it accesses matrices. I have sampled the program, measuring L1 and L2 cache load misses. Are these the appropriate events to measure? I also see that L2 and L3 read misses can be measured. How are these different?

The tuning assistant shows that the program spends 10% and 25% of its time on L1 and L2 misses, respectively. This seems very large. Never having done a measurement like this, I am not sure what to expect.

 
 07-24-2008, 2:30 PM 30259973 in reply to 30259950  

Re: Measuring cache misses.

Currently, only enterprise MP platforms (usually 4 CPU sockets) have L3.

If you have many L1 and L2 misses, TLB misses may be more (or less) important to performance. We can't guess your CPU model. If you have an option to select cache misses retired, those are the ones that count. Some of the events may be available separately for instructions and data, or by cache lines.

High data cache miss rates are possible if you haven't organized your program to use sequential memory access. You would want to do that anyway, to enable vectorization. Intel Fortran will make a few of those changes automatically at -O3.
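
To illustrate what sequential access means for a matrix, here is a minimal sketch in C. It is not taken from the original poster's program; the function names sum_sequential and sum_strided are made up for this example, and it assumes a square matrix stored row by row in one contiguous array. Both functions compute the same sum, but only the first walks memory in order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum an n-by-n matrix stored row by row.
 * The inner loop moves to the adjacent element each time, so every
 * cache line that is loaded gets fully used before moving on. */
double sum_sequential(const double *a, size_t n)
{
    assert(a != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += a[i * n + j];   /* stride: 1 element */
    return total;
}

/* Same sum with the loops interchanged. The inner loop now jumps
 * n elements at a time, so for large n each access is likely to
 * land on a different cache line. */
double sum_strided(const double *a, size_t n)
{
    assert(a != NULL);
    double total = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            total += a[i * n + j];   /* stride: n elements */
    return total;
}
```

Note that Fortran stores arrays column by column, so there the first index should vary fastest in the inner loop, the reverse of the C case shown here. Sampling the two variants on a large matrix is one way to see what a poor access pattern looks like in the miss counts.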

Tuning assistant performance assessments are rough; they could be off by a factor of 2.

 