TY - JOUR
T1 - Low-cost microarchitectural support for improved floating-point accuracy
AU - Dieter, William R.
AU - Kaveti, Akil
AU - Dietz, Henry G.
PY - 2007/1
Y1 - 2007/1
N2 - Some processors designed for consumer applications, such as Graphics Processing Units (GPUs) and the CELL processor, promise outstanding floating-point performance for scientific applications at commodity prices. However, IEEE single precision is the most precise floating-point data type these processors directly support in hardware. Pairs of native floating-point numbers can be used to represent a base result and a residual term to increase accuracy, but the resulting order-of-magnitude slowdown dramatically reduces the price/performance advantage of these systems. By adding a few simple microarchitectural features, acceptable accuracy can be obtained with relatively little performance penalty. To reduce the cost of native-pair arithmetic, a residual register is used to hold information that would normally have been discarded after each floating-point computation. The residual register dramatically simplifies the code, providing both lower latency and better instruction-level parallelism.
AB - Some processors designed for consumer applications, such as Graphics Processing Units (GPUs) and the CELL processor, promise outstanding floating-point performance for scientific applications at commodity prices. However, IEEE single precision is the most precise floating-point data type these processors directly support in hardware. Pairs of native floating-point numbers can be used to represent a base result and a residual term to increase accuracy, but the resulting order-of-magnitude slowdown dramatically reduces the price/performance advantage of these systems. By adding a few simple microarchitectural features, acceptable accuracy can be obtained with relatively little performance penalty. To reduce the cost of native-pair arithmetic, a residual register is used to hold information that would normally have been discarded after each floating-point computation. The residual register dramatically simplifies the code, providing both lower latency and better instruction-level parallelism.
UR - http://www.scopus.com/inward/record.url?scp=34547677745&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547677745&partnerID=8YFLogxK
U2 - 10.1109/L-CA.2007.1
DO - 10.1109/L-CA.2007.1
M3 - Article
AN - SCOPUS:34547677745
SN - 1556-6056
VL - 6
SP - 13
EP - 16
JO - IEEE Computer Architecture Letters
JF - IEEE Computer Architecture Letters
IS - 1
ER -