Abstract
A rapidly growing volume of electrophysiological signals is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing electrophysiological signals. However, existing signal analysis tools load large EDF files sequentially before analyzing them, which becomes a bottleneck on large-scale datasets. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF's scalability and performance using two datasets from the National Sleep Research Resource, running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF was about 26 times and 47 times faster than sequential processing of 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective for processing large EDF files.
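The abstract does not describe the partition algorithm itself. As a rough illustration of why EDF data can be split at all, the sketch below relies only on the public EDF layout: an ASCII header of 256 + 256 × (number of signals) bytes, followed by fixed-size data records of 2-byte samples. It computes byte ranges aligned to data-record boundaries so that each range could be handed to a separate worker. This is not the Hadoop-EDF algorithm; the function names `read_layout`, `partition`, and `build_edf` are hypothetical, and the sketch assumes the header states a known record count (not -1).

```python
def build_edf(n_records, samples_per_record):
    """Build a tiny synthetic EDF-like byte string (hypothetical helper for the demo)."""
    ns = len(samples_per_record)
    header_bytes = 256 + 256 * ns

    def field(value, width):
        # EDF header fields are fixed-width, space-padded ASCII.
        return str(value).ljust(width).encode("ascii")

    fixed = (field("0", 8) + field("", 80) + field("", 80)
             + field("01.01.19", 8) + field("00.00.00", 8)
             + field(header_bytes, 8) + field("", 44)
             + field(n_records, 8) + field(1, 8) + field(ns, 4))
    # Per-signal fields that precede the sample counts: 216 bytes per signal.
    widths = [16, 80, 8, 8, 8, 8, 8, 80]
    per_signal = b"".join(field("", w) * ns for w in widths)
    per_signal += b"".join(field(n, 8) for n in samples_per_record)
    per_signal += field("", 32) * ns
    # Each data record holds every signal's samples as 2-byte integers (zeros here).
    record = bytes(2 * sum(samples_per_record))
    return fixed + per_signal + record * n_records


def read_layout(buf):
    """Return (header_bytes, n_records, record_bytes) parsed from the EDF header."""
    def number(start, end):
        return int(buf[start:end].decode("ascii").strip())

    header_bytes = number(184, 192)
    n_records = number(236, 244)
    ns = number(252, 256)
    offset = 256 + ns * 216  # start of the "samples per data record" fields
    samples = [number(offset + 8 * i, offset + 8 * (i + 1)) for i in range(ns)]
    return header_bytes, n_records, 2 * sum(samples)


def partition(header_bytes, n_records, record_bytes, n_splits):
    """Return (byte_offset, byte_length, first_record, record_count) per split."""
    base, extra = divmod(n_records, n_splits)
    splits, first = [], 0
    for i in range(n_splits):
        count = base + (1 if i < extra else 0)
        if count == 0:
            continue  # more splits requested than records available
        splits.append((header_bytes + first * record_bytes,
                       count * record_bytes, first, count))
        first += count
    return splits


if __name__ == "__main__":
    edf = build_edf(n_records=10, samples_per_record=[4, 2])
    for split in partition(*read_layout(edf), n_splits=3):
        print(split)
```

Because every split starts on a record boundary and the header is parsed once, a worker can decode its range without reading the bytes that precede it, which is the property a parallel loader needs.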
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 |
Editors | Illhoi Yoo, Jinbo Bi, Xiaohua Tony Hu |
Pages | 2265-2271 |
Number of pages | 7 |
ISBN (Electronic) | 9781728118673 |
State | Published - Nov 2019 |
Event | 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 - San Diego, United States; Duration: Nov 18 2019 → Nov 21 2019 |
Publication series
Name | Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 |
---|
Conference
Conference | 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 11/18/19 → 11/21/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Funding
This work was supported by the US National Institutes of Health under grants R24HL114473 and U01NS090408.
Funders | Funder number |
---|---|
National Institutes of Health (NIH) | U01NS090408, R24HL114473 |
Keywords
- Cloud Computing
- Electrophysiological Signals
- European Data Format
- Hadoop MapReduce
ASJC Scopus subject areas
- Biochemistry
- Biotechnology
- Molecular Medicine
- Modeling and Simulation
- Health Informatics
- Pharmacology (medical)
- Public Health, Environmental and Occupational Health