Hadoop-EDF: Large-scale Distributed Processing of Electrophysiological Signal Data in Hadoop MapReduce

Yuanyuan Wu, Xiaojin Li, Jinze Liu, Licong Cui

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

A rapidly growing volume of electrophysiological signal data is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing electrophysiological signals. However, existing signal analysis tools load large EDF files sequentially before performing any analysis, which becomes the bottleneck when handling large-scale datasets. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF's scalability and performance using two datasets from the National Sleep Research Resource and running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF was about 26 times faster than sequential processing for 200 small-size files and about 47 times faster for 200 large-size files. The results demonstrate that Hadoop-EDF is especially suitable and effective for processing large EDF files.
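The abstract does not describe the partition algorithm itself. As a rough illustration of why EDF data can be split for parallel loading, the following Python sketch relies only on the public EDF layout: a 256-byte fixed header, 256 header bytes per signal, and fixed-size data records of 16-bit samples. It is not the authors' method; the function names `parse_edf_layout` and `partition_records` are hypothetical, and the sketch assumes the record count in the header is known (not -1).

```python
FIXED_HEADER_BYTES = 256   # size of the EDF fixed header
BYTES_PER_SAMPLE = 2       # EDF samples are 16-bit integers
# Per-signal header bytes that precede the "samples per record" field:
# label(16) + transducer(80) + dimension(8) + physical min/max(8+8)
# + digital min/max(8+8) + prefiltering(80)
BYTES_BEFORE_SAMPLE_COUNTS = 216


def parse_edf_layout(header: bytes) -> dict:
    """Read the header fields needed to locate data records in an EDF file."""
    header_bytes = int(header[184:192].decode("ascii"))
    num_records = int(header[236:244].decode("ascii"))
    num_signals = int(header[252:256].decode("ascii"))
    offset = FIXED_HEADER_BYTES + BYTES_BEFORE_SAMPLE_COUNTS * num_signals
    samples_per_record = [
        int(header[offset + 8 * i: offset + 8 * (i + 1)].decode("ascii"))
        for i in range(num_signals)
    ]
    return {
        "header_bytes": header_bytes,
        "num_records": num_records,
        "record_bytes": BYTES_PER_SAMPLE * sum(samples_per_record),
    }


def partition_records(header_bytes, record_bytes, num_records, num_splits):
    """Return (byte_offset, byte_length) splits aligned to record boundaries."""
    base, extra = divmod(num_records, num_splits)
    splits = []
    first_record = 0
    for i in range(num_splits):
        count = base + (1 if i < extra else 0)  # spread the remainder evenly
        if count == 0:
            continue  # more splits requested than records available
        splits.append((header_bytes + first_record * record_bytes,
                       count * record_bytes))
        first_record += count
    return splits
```

Because every split starts and ends on a data-record boundary, each worker can decode its byte range independently once it has the header. How Hadoop-EDF maps such ranges onto MapReduce input splits is not stated in this record.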

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Editors: Illhoi Yoo, Jinbo Bi, Xiaohua Tony Hu
Pages: 2265-2271
Number of pages: 7
ISBN (Electronic): 9781728118673
State: Published - Nov 2019
Event: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 - San Diego, United States
Duration: Nov 18 2019 to Nov 21 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

Conference

Conference: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Country/Territory: United States
City: San Diego
Period: 11/18/19 to 11/21/19

Bibliographical note

Funding Information:
This work was supported by the US National Institutes of Health under grants R24HL114473 and U01NS090408. Correspondence: li-cong.cui@uth.tmc.edu

Publisher Copyright:
© 2019 IEEE.

Keywords

  • Cloud Computing
  • Electrophysiological Signals
  • European Data Format
  • Hadoop MapReduce

ASJC Scopus subject areas

  • Biochemistry
  • Biotechnology
  • Molecular Medicine
  • Modeling and Simulation
  • Health Informatics
  • Pharmacology (medical)
  • Public Health, Environmental and Occupational Health
