Abstract
Some processors designed for consumer applications, such as Graphics Processing Units (GPUs) and the CELL processor, promise outstanding floating-point performance for scientific applications at commodity prices. However, IEEE single precision is the most precise floating-point data type these processors directly support in hardware. Pairs of native floating-point numbers can be used to represent a base result and a residual term to increase accuracy, but the resulting order-of-magnitude slowdown dramatically reduces the price/performance advantage of these systems. With a few simple microarchitectural features added, acceptable accuracy can be obtained with relatively little performance penalty. To reduce the cost of native-pair arithmetic, a residual register is used to hold information that would normally have been discarded after each floating-point computation. The residual register dramatically simplifies the code, providing both lower latency and better instruction-level parallelism.
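As a rough illustration of the native-pair idea described above, the sketch below emulates single-precision arithmetic in Python and computes a base result together with its residual using Knuth's TwoSum sequence. This is a minimal sketch, not the paper's code: the helper names `f32` and `two_sum` are hypothetical, and TwoSum is a standard software technique chosen here only to show what "base result plus residual" means.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a, b):
    """Knuth's TwoSum in emulated single precision (illustrative helper).

    Returns (s, r) where s is the rounded single-precision sum of a and b,
    and r is the rounding error, so that s + r equals a + b exactly.
    Every intermediate result is rounded to single precision with f32().
    """
    s = f32(a + b)                 # base result: the rounded sum
    bb = f32(s - a)                # the part of b that was absorbed into s
    err_a = f32(a - f32(s - bb))   # what was lost from a
    err_b = f32(b - bb)            # what was lost from b
    r = f32(err_a + err_b)         # residual: information the rounding discarded
    return s, r


# Usage: b is too small to change 1.0 in single precision,
# so the base result is 1.0 and the whole of b survives in the residual.
a = 1.0
b = f32(1e-8)
s, r = two_sum(a, b)
print(s, r)
```

In software, recovering the residual costs several dependent floating-point operations on top of the addition itself, which gives a sense of where the slowdown of native-pair arithmetic comes from. The residual register described in the abstract is meant to retain this kind of otherwise discarded information so that it does not have to be recomputed.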
| Original language | English |
|---|---|
| Pages (from-to) | 13-16 |
| Number of pages | 4 |
| Journal | IEEE Computer Architecture Letters |
| Volume | 6 |
| Issue number | 1 |
| State | Published - Jan 2007 |
ASJC Scopus subject areas
- Hardware and Architecture