TY - GEN
T1 - Compiler techniques for fine-grain execution on workstation clusters using PAPERS
AU - Dietz, H. G.
AU - Cohen, W. E.
AU - Muhammad, T.
AU - Mattox, T. I.
PY - 1995
Y1 - 1995
N2 - Just a few years ago, parallel computers were tightly-coupled SIMD, VLIW, or MIMD machines. Now, they are clusters of workstations connected by communication networks yielding ever-higher bandwidth (e.g., Ethernet, FDDI, HiPPI, ATM). For these clusters, compiler research centers on techniques for hiding large synchronization and communication latencies; in general, on making parallel programs based on fine-grain aggregate operations fit an existing network execution model that is optimized for point-to-point block transfers. In contrast, we suggest that the network execution model can and should be altered to support fine-grain aggregate operations more directly. By augmenting workstation hardware with a simple barrier mechanism (PAPERS: Purdue's Adapter for Parallel Execution and Rapid Synchronization) and appropriate operating system hooks for its direct use from user processes, the user gains a variety of efficient aggregate operations and the compiler is given a more static (i.e., more predictable), lower-latency target execution model. This paper centers on compiler techniques that use this new target model to achieve more efficient parallel execution: first, techniques that statically schedule aggregate operations across processors; second, techniques that implement SIMD and VLIW execution. This work was supported in part by the Office of Naval Research (ONR) under grant number N00014-91-J-4013 and by the National Science Foundation (NSF) under award number 9015696-CDA.
AB - Just a few years ago, parallel computers were tightly-coupled SIMD, VLIW, or MIMD machines. Now, they are clusters of workstations connected by communication networks yielding ever-higher bandwidth (e.g., Ethernet, FDDI, HiPPI, ATM). For these clusters, compiler research centers on techniques for hiding large synchronization and communication latencies; in general, on making parallel programs based on fine-grain aggregate operations fit an existing network execution model that is optimized for point-to-point block transfers. In contrast, we suggest that the network execution model can and should be altered to support fine-grain aggregate operations more directly. By augmenting workstation hardware with a simple barrier mechanism (PAPERS: Purdue's Adapter for Parallel Execution and Rapid Synchronization) and appropriate operating system hooks for its direct use from user processes, the user gains a variety of efficient aggregate operations and the compiler is given a more static (i.e., more predictable), lower-latency target execution model. This paper centers on compiler techniques that use this new target model to achieve more efficient parallel execution: first, techniques that statically schedule aggregate operations across processors; second, techniques that implement SIMD and VLIW execution. This work was supported in part by the Office of Naval Research (ONR) under grant number N00014-91-J-4013 and by the National Science Foundation (NSF) under award number 9015696-CDA.
UR - http://www.scopus.com/inward/record.url?scp=84947721898&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947721898&partnerID=8YFLogxK
U2 - 10.1007/bfb0025869
DO - 10.1007/bfb0025869
M3 - Conference contribution
AN - SCOPUS:84947721898
SN - 354058868X
SN - 9783540588689
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 31
EP - 45
BT - Languages and Compilers for Parallel Computing - 7th International Workshop, 1994, Proceedings
A2 - Pingali, Keshav
A2 - Banerjee, Utpal
A2 - Gelernter, David
A2 - Nicolau, Alex
A2 - Padua, David
T2 - 7th International Workshop on Languages and Compilers for Parallel Computing, 1994
Y2 - 8 August 1994 through 10 August 1994
ER -