Although barrier synchronization has long been considered a useful construct for parallel programming, it has generally been either layered on top of a communication system or used as a completely independent mechanism. Instead, we propose that all communication be made a side-effect of barrier synchronization. This is done by extending the barrier synchronization unit to collect a datum from each processor, compute an aggregate function, and return the corresponding result to each processor. This paper describes a scalable prototype implementation of PAPERS (Purdue's Adapter for Parallel Execution and Rapid Synchronization). Although the prototype is implemented as very simple TTL hardware connecting conventional workstations, its measured performance on fine-grain parallel communication operations is far superior to that obtained using conventional workstation networks and is comparable to that of commercially available supercomputers.
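The collect, aggregate, and return cycle described in the abstract can be illustrated with a small software model. This is only a sketch using Python threads, not the PAPERS TTL hardware or its actual interface. The class name `AggregateBarrier`, the method `sync`, the helper `run_demo`, and the choice of bitwise OR as the aggregate function are all assumptions made for illustration.

```python
import threading
from functools import reduce
from operator import or_


class AggregateBarrier:
    """Software model of a barrier that also combines one datum per party.

    Hypothetical illustration only; it does not reproduce the PAPERS hardware.
    """

    def __init__(self, parties, combine):
        self._slots = [None] * parties   # one datum per participating "processor"
        self._combine = combine          # binary aggregate function (assumed associative)
        self._result = None
        # The barrier action runs once per round, in a single thread, after
        # every party has arrived and before any party is released.
        self._barrier = threading.Barrier(parties, action=self._aggregate)

    def _aggregate(self):
        # Fold all collected data into one aggregate value.
        self._result = reduce(self._combine, self._slots)

    def sync(self, pid, datum):
        """Contribute `datum`, wait for all parties, and return the aggregate."""
        self._slots[pid] = datum
        self._barrier.wait()
        return self._result


def run_demo(parties=4):
    """Each thread contributes a distinct bit; all should see the same OR."""
    barrier = AggregateBarrier(parties, or_)
    seen = [None] * parties

    def worker(pid):
        seen[pid] = barrier.sync(pid, 1 << pid)

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

In this model, every call to `sync` is both a synchronization point and the only communication step: no thread receives data except as the return value of the barrier. Calling `run_demo(4)` has each thread contribute `1 << pid`, so every thread receives the same combined value.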
|Title of host publication||Architecture|
|Number of pages||4|
|State||Published - 1996|
|Event||25th International Conference on Parallel Processing, ICPP 1996 - Ithaca, United States (Aug 12 1996 → Aug 16 1996)|
|Name||Proceedings of the International Conference on Parallel Processing|
|Conference||25th International Conference on Parallel Processing, ICPP 1996|
|Period||8/12/96 → 8/16/96|
Bibliographical note
Funding Information:
This work was supported in part by ONR Grant No. N0001-91-J-4013 and NSF Grant No. CDA-9015696.
© 1996 IEEE.
ASJC Scopus subject areas
- Mathematics (all)
- Hardware and Architecture