Cache-oblivious matrix algorithms in the age of multicores and many cores

  • Carsten Trinitis
  • Alexander Heinecke
Research Output: Contribution to journal Article Peer-review

Abstract

This article highlights the issue of upcoming wider single-instruction, multiple-data units as well as steadily increasing core counts on contemporary and future processor architectures. We present the recent port to and latest results of cache-oblivious algorithms and implementations of our TifaMMy code on four architectures: SGI's UltraViolet distributed shared-memory machine, Intel's latest x86 architecture code-named Sandy Bridge, AMD's new Bulldozer architecture, and Intel's future Many Integrated Core architecture. TifaMMy's matrix multiplication and LU decomposition routines have been adapted and tuned with regard to these architectures. Results are discussed and compared with vendors’ architecture-specific and optimized libraries, Math Kernel Library and AMD Core Math Library, for both a standard C++ version with vectorization compiler switches and TifaMMy's highly optimized vector intrinsics version. We provide insights into architectural properties and comment on the feasibility of heterogeneous cores and accelerators, namely graphics processing units. Besides bare-metal performance, the test platforms’ ease of use is analyzed in detail, and the portability of our approach to new and upcoming silicon is discussed with regard to required effort on code change abstraction levels.

Publication Information

Output type

Research Output: Contribution to journal Article Peer-review

Original language

English

Pages from-to (Number of pages)

Pages 2215-2234

Journal (Volume, Issue Number)

Concurrency and Computation: Practice and Experience (Volume 27, Issue 9)

Publication milestones

  • Published - 01/01/2012

Publication status

Published - 01/01/2012

ISSN

1532-0626

External Publication IDs

  • handle.net: 10547/275816
  • Scopus: 84929893702
