Hadoop-EDF: Large-scale Distributed Processing of Electrophysiological Signal Data in Hadoop MapReduce

Yuanyuan Wu, Xiaojin Li, Jinze Liu, Licong Cui

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

A rapidly growing volume of electrophysiological signal data is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing electrophysiological signals. However, existing signal analysis tools become a bottleneck on large-scale datasets because they load large EDF files sequentially before performing signal analyses. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF's scalability and performance using two datasets from the National Sleep Research Resource, running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF was about 26 times and 47 times faster than sequential processing of 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective for processing large EDF files.
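The abstract does not describe the partition algorithm itself. As a rough illustration of why an EDF file can be split for parallel loading at all, the sketch below reads the record layout from a standard EDF header and computes byte ranges that start and end on data-record boundaries, so each range could be handed to a separate worker. This is not the authors' method: the function names (`parse_edf_layout`, `record_aligned_splits`, `synthetic_header`) are hypothetical, and the only thing assumed is the published EDF header layout (a 256-byte fixed header followed by 256 bytes per signal, with 2-byte samples).

```python
def parse_edf_layout(header: bytes):
    """Return (header_bytes, n_records, record_bytes) from raw EDF header bytes."""
    def as_int(start, end):
        return int(header[start:end].decode("ascii").strip())

    header_bytes = as_int(184, 192)   # total header size in bytes
    n_records = as_int(236, 244)      # number of data records (-1 if unknown)
    n_signals = as_int(252, 256)      # number of signals per record
    if n_records < 0:
        raise ValueError("record count is unknown (-1) in this header")

    # Signal fields are stored field by field. The samples-per-record
    # field follows 216 bytes of other fields for each signal.
    base = 256 + n_signals * 216
    samples = [as_int(base + 8 * i, base + 8 * (i + 1)) for i in range(n_signals)]
    record_bytes = 2 * sum(samples)   # each sample is a 2-byte integer
    return header_bytes, n_records, record_bytes


def record_aligned_splits(header_bytes, n_records, record_bytes, n_splits):
    """Divide the data records into at most n_splits contiguous byte ranges.

    Returns tuples (first_record, record_count, start_byte, end_byte),
    where end_byte is exclusive.
    """
    base, extra = divmod(n_records, n_splits)
    splits, first = [], 0
    for i in range(n_splits):
        count = base + (1 if i < extra else 0)
        if count == 0:
            continue
        start = header_bytes + first * record_bytes
        splits.append((first, count, start, start + count * record_bytes))
        first += count
    return splits


def synthetic_header(n_records, samples_per_record):
    """Build a minimal EDF-style header for demonstration only."""
    def field(value, width):
        return str(value).ljust(width).encode("ascii")

    ns = len(samples_per_record)
    fixed = (field("0", 8) + field("", 80) + field("", 80)
             + field("01.01.19", 8) + field("00.00.00", 8)
             + field(256 * (ns + 1), 8) + field("", 44)
             + field(n_records, 8) + field(1, 8) + field(ns, 4))
    signals = b"".join(field("", w) * ns for w in (16, 80, 8, 8, 8, 8, 8, 80))
    signals += b"".join(field(s, 8) for s in samples_per_record)
    signals += field("", 32) * ns
    return fixed + signals


if __name__ == "__main__":
    header = synthetic_header(10, [256, 1])
    layout = parse_edf_layout(header)
    for split in record_aligned_splits(*layout, n_splits=3):
        print(split)
```

Because every range begins on a record boundary, a worker needs only the header values and its own byte range to decode its records, which is the property that makes a fixed-layout file like EDF amenable to a MapReduce-style input split.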

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Editors: Illhoi Yoo, Jinbo Bi, Xiaohua Tony Hu
Pages: 2265-2271
Number of pages: 7
ISBN (Electronic): 9781728118673
State: Published - Nov 2019
Event: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 - San Diego, United States
Duration: Nov 18 2019 to Nov 21 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

Conference

Conference: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Country/Territory: United States
City: San Diego
Period: 11/18/19 to 11/21/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Funding

This work was supported by the US National Institutes of Health under grants R24HL114473 and U01NS090408. Correspondence: [email protected]

Funder: National Institutes of Health (NIH)
Funder numbers: U01NS090408, R24HL114473

    Keywords

    • Cloud Computing
    • Electrophysiological Signals
    • European Data Format
    • Hadoop MapReduce

    ASJC Scopus subject areas

    • Biochemistry
    • Biotechnology
    • Molecular Medicine
    • Modeling and Simulation
    • Health Informatics
    • Pharmacology (medical)
    • Public Health, Environmental and Occupational Health
