Abstract
University research computing centers are increasingly faced with the need to support applications that are better suited to cloud infrastructure than to HPC infrastructure. A common approach is to shoehorn cloud-based applications onto the university’s existing HPC system, which has been done with varying levels of success. Another approach has been to create separate stand-alone HPC systems and private cloud systems, resulting in ineffective use of resources. A more recent approach has been to use hybrid systems in which the HPC system “bursts” excess jobs to private cloud nodes configured as bare-metal nodes built from the same (expensive) hardware as the HPC system. This paper explores another model, namely the use of private cloud infrastructure (built from inexpensive commodity networks and storage systems) to host both HPC jobs and VMs simultaneously. Utilizing VMs allows these emerging applications to leverage cloud frameworks specifically designed for them (e.g., OpenStack, Kubernetes, Mesos, Hadoop, and Spark), while at the same time effectively supporting a growing percentage of HPC jobs (e.g., single-node jobs and embarrassingly parallel jobs). Because the system can be constructed from commodity cloud networks and storage, it makes cost-effective use of resources, in contrast to HPC systems that run jobs which do not use (and therefore waste) their expensive resources. To demonstrate the advantages of using cloud infrastructure for both cloud applications and HPC applications, we describe a system that can dynamically launch OpenHPC systems on commodity OpenStack infrastructure. Moreover, users can use the system to deploy “personal” OpenHPC clusters customized to their application’s needs (e.g., number of nodes, cores per node, memory per node).
We have used the system to effectively run OpenHPC workloads on a cluster of large-memory OpenStack nodes. For example, users can create a large-memory HPC-style cluster of 500 GB nodes running OpenHPC and a cluster of 1 TB VMs, both operating simultaneously. Performance degradation due to virtualization has been insignificant, particularly when compared to the advantages of being able to use optimized frameworks running on cost-effective hardware.
Original language | English |
---|---|
Title of host publication | Proceedings of the Practice and Experience in Advanced Research Computing |
Subtitle of host publication | Rise of the Machines (Learning), PEARC 2019 |
ISBN (Electronic) | 9781450372275 |
State | Published - Jul 28 2019 |
Event | 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 - Chicago, United States. Duration: Jul 28 2019 → Aug 1 2019 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 7/28/19 → 8/1/19 |
Bibliographical note
Publisher Copyright: © 2019 Association for Computing Machinery.
ASJC Scopus subject areas
- Software
- Human-Computer Interaction
- Computer Vision and Pattern Recognition
- Computer Networks and Communications