Navigating the unexpected realities of big data transfers in a cloud-based world

Sergio Rivera, Mami Hayashida, Jacob Chappell, James Griffioen, Pinyi Shi, Yongwook Song, Zongming Fei, Bhushan Chitre, Lowell Pike, Charles Carpenter, Hussamuddin Nasir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The emergence of big data has created new challenges for researchers transmitting big data sets across campus networks to local (HPC) cloud resources, or over wide area networks to public cloud services. Unlike conventional HPC systems where the network is carefully architected (e.g., a high speed local interconnect, or a wide area connection between Data Transfer Nodes), today's big data communication often occurs over shared network infrastructures with many external and uncontrolled factors influencing performance. This paper describes our efforts to understand and characterize the performance of various big data transfer tools such as rclone, cyberduck, and other provider-specific CLI tools when moving data to/from public and private cloud resources. We analyze the various parameter settings available on each of these tools and their impact on performance. Our experimental results give insights into the performance of cloud providers and transfer tools, and provide guidance for parameter settings when using cloud transfer tools. We also explore performance when coming from HPC DTN nodes as well as researcher machines located deep in the campus network, and show that emerging SDN approaches such as the VIP Lanes system can deliver excellent performance even from researchers' machines.

Original language: English
Title of host publication: Practice and Experience in Advanced Research Computing 2018
Subtitle of host publication: Seamless Creativity, PEARC 2018
State: Published - Jul 22 2018
Event: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018 - Pittsburgh, United States
Duration: Jul 22 2018 to Jul 26 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018
Country/Territory: United States
City: Pittsburgh
Period: 7/22/18 to 7/26/18

Bibliographical note

Publisher Copyright:
© 2018 Association for Computing Machinery.

Keywords

  • Big Data Flows
  • Data Transfer Tools
  • Software-Defined Networks

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
