SeqOthello: querying RNA-seq experiments at scale

Ye Yu, Jinpeng Liu, Xinan Liu, Yi Zhang, Eamonn Magner, Erik Lehnert, Chen Qian, Jinze Liu

Research output: Contribution to journal (Article, peer-review)

30 Scopus citations

Abstract

We present SeqOthello, an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello takes only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of the tier-1 fusions curated by the TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.

Original language: English
Article number: 167
Journal: Genome Biology
Volume: 19
Issue number: 1
State: Published, Oct 19 2018

Bibliographical note

Publisher Copyright:
© 2018 The Author(s).

Funding

This work was supported by the US National Science Foundation [award numbers 1054631 to J.L., CNS-1717948 and CNS-1750704 to C.Q.] and the National Institutes of Health [grant numbers P30CA177558 and 1UL1TR001998-01 to J.L.]. The Seven Bridges Cancer Genomics Cloud [47] has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Contract No. HHSN261201400008C and ID/IQ Agreement No. 17X146 under Contract No. HHSN261201500003I.

Funders and funder numbers

• U.S. Department of Energy, Chinese Academy of Sciences, Guangzhou Municipal Science and Technology Project, Oak Ridge National Laboratory, Extreme Science and Engineering Discovery Environment, National Science Foundation, National Energy Research Scientific Computing Center, National Natural Science Foundation of China: CNS-1717948, 1750704, 1717948, 1054631
• National Institutes of Health (NIH): P30CA177558, 1UL1TR001998-01
• National Childhood Cancer Registry – National Cancer Institute: HHSN261201500003I, 17X146

    Keywords

    • Compression
    • Gene fusion
    • Othello
    • Pan-cancer
    • Query
    • RNA-seq
    • SeqOthello
    • TCGA

    ASJC Scopus subject areas

    • Ecology, Evolution, Behavior and Systematics
    • Genetics
    • Cell Biology
