TY - JOUR
T1 - Code optimizations for complex microprocessors applied to CFD software
AU - Hauser, Th.
AU - Mattox, T. I.
AU - Lebeau, R. P.
AU - Dietz, H. G.
AU - Huang, P. G.
PY - 2003
Y1 - 2003
N2 - Improving large-scale, time-dependent numerical simulation of the Navier-Stokes equations is critical for the future of computational fluid dynamics (CFD) in engineering applications. Unfortunately, these computations require massive, and generally expensive, computing resources. With the continuing advances in commodity computer hardware, an alternative approach to computationally expensive CFD computations is emerging in the form of PC clusters. However, to take advantage of clusters, most CFD programs require extensive modifications so that they run efficiently on cache-based microprocessor systems. This paper presents techniques and tools that we have developed and used to optimize Navier-Stokes solvers on a single node of a PC cluster through the example CFD code DNSTool. After describing DNSTool, the paper demonstrates how this code is tuned to improve performance through profiling the computational cost of each of the subroutines, adapting the code for cache-based memory systems, and including SWAR (SIMD within a register) based routines. The effect of these improvements is to halve the computational cost on a single node, which in turn significantly increases the performance of the code on a PC cluster.
AB - Improving large-scale, time-dependent numerical simulation of the Navier-Stokes equations is critical for the future of computational fluid dynamics (CFD) in engineering applications. Unfortunately, these computations require massive, and generally expensive, computing resources. With the continuing advances in commodity computer hardware, an alternative approach to computationally expensive CFD computations is emerging in the form of PC clusters. However, to take advantage of clusters, most CFD programs require extensive modifications so that they run efficiently on cache-based microprocessor systems. This paper presents techniques and tools that we have developed and used to optimize Navier-Stokes solvers on a single node of a PC cluster through the example CFD code DNSTool. After describing DNSTool, the paper demonstrates how this code is tuned to improve performance through profiling the computational cost of each of the subroutines, adapting the code for cache-based memory systems, and including SWAR (SIMD within a register) based routines. The effect of these improvements is to halve the computational cost on a single node, which in turn significantly increases the performance of the code on a PC cluster.
KW - Cache-based microprocessors
KW - Fluid dynamics
KW - Performance analysis
UR - http://www.scopus.com/inward/record.url?scp=4043088395&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4043088395&partnerID=8YFLogxK
U2 - 10.1137/S1064827502410530
DO - 10.1137/S1064827502410530
M3 - Article
AN - SCOPUS:4043088395
SN - 1064-8275
VL - 25
SP - 1461
EP - 1477
JO - SIAM Journal on Scientific Computing
JF - SIAM Journal on Scientific Computing
IS - 4
ER -