Abstract
An important issue when designing numerical code in High Performance Computing is cache optimisation, which is needed to exploit the performance potential of a given target architecture. This includes techniques to improve memory access locality as well as prefetching. Inherent algorithmic constraints often limit the first approach, which typically uses a blocking technique. While automatic prefetching mechanisms exist in hardware and/or compilers, they cannot complement blocking with additional prefetching. We provide an infrastructure for off-loading application controlled prefetching on a chip multiprocessor, which makes it possible to further improve numerical code already optimised by standard cache optimisation. Clear benefits are shown for real workloads on existing hardware.
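The combination the abstract describes, a blocked traversal plus prefetching issued by the application from a second core, can be illustrated with a minimal sketch. This is not the authors' infrastructure. The names `prefetch_ctx`, `prefetch_helper` and `blocked_sum`, the 64-byte cache line and the one-block lookahead are all assumptions made for illustration. The sketch uses POSIX threads, C11 atomics and the GCC/Clang builtin `__builtin_prefetch`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Hypothetical shared state between the compute thread and the helper. */
typedef struct {
    const double *data;    /* array traversed block by block */
    size_t n;              /* number of elements */
    size_t block;          /* elements per block */
    atomic_size_t current; /* block the compute thread is working on */
    atomic_int done;       /* set once the computation has finished */
} prefetch_ctx;

/* Helper thread: stays one block ahead of the compute thread and issues
 * prefetch hints for that block, one hint per assumed cache line. */
static void *prefetch_helper(void *arg)
{
    prefetch_ctx *ctx = arg;
    size_t nblocks = (ctx->n + ctx->block - 1) / ctx->block;
    size_t next = 1;

    while (!atomic_load(&ctx->done) && next < nblocks) {
        size_t cur = atomic_load(&ctx->current);
        if (next <= cur)
            next = cur + 1; /* fell behind: skip blocks already in use */
        if (next > cur + 1 || next >= nblocks)
            continue;       /* far enough ahead: spin until progress */

        size_t begin = next * ctx->block;
        size_t end = begin + ctx->block;
        if (end > ctx->n)
            end = ctx->n;
        for (size_t i = begin; i < end; i += CACHE_LINE / sizeof(double))
            __builtin_prefetch(&ctx->data[i], 0, 3); /* read, keep in cache */
        next++;
    }
    return NULL;
}

/* Compute thread: a blocked reduction standing in for a numerical kernel. */
double blocked_sum(const double *data, size_t n, size_t block)
{
    assert(block > 0);

    prefetch_ctx ctx;
    ctx.data = data;
    ctx.n = n;
    ctx.block = block;
    atomic_init(&ctx.current, 0);
    atomic_init(&ctx.done, 0);

    pthread_t helper;
    /* If the helper cannot be started, the kernel simply runs without it. */
    int have_helper = (pthread_create(&helper, NULL, prefetch_helper, &ctx) == 0);

    double sum = 0.0;
    size_t nblocks = (n + block - 1) / block;
    for (size_t b = 0; b < nblocks; b++) {
        atomic_store(&ctx.current, b); /* publish progress to the helper */
        size_t end = (b + 1) * block < n ? (b + 1) * block : n;
        for (size_t i = b * block; i < end; i++)
            sum += data[i];
    }

    atomic_store(&ctx.done, 1);
    if (have_helper)
        pthread_join(helper, NULL);
    return sum;
}
```

Prefetch hints never change program results, so the helper can only affect timing. Whether it helps at all depends on the two threads running on cores that share a cache level, which this sketch does not enforce.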
| Original language | English |
|---|---|
| Pages (from-to) | 22-28 |
| Journal | International Journal of Computational Science and Engineering |
| Volume | 4 |
| Issue number | 1 |
| Publication status | Published - 1 Nov 2008 |
Keywords
- cache optimisation
Title
Off-loading application controlled data prefetching in numerical codes for multi-core processors