SeqOthello: querying RNA-seq experiments at scale

Ye Yu, Jinpeng Liu, Xinan Liu, Yi Zhang, Eamonn Magner, Erik Lehnert, Chen Qian, Jinze Liu

Research output: Contribution to journal › Article › peer-review

29 Scopus citations

Abstract

We present SeqOthello, an ultra-fast and memory-efficient indexing structure to support arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello takes only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of tier-1 fusions curated by the TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which appear to be tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.
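
To make the idea of alignment-free sequence search concrete, here is a minimal sketch in Python. It assumes a k-mer-based scheme: each experiment is reduced to the set of k-mers in its reads, and a query is scored by the fraction of its k-mers found in each experiment. The function names, the toy reads, and the plain dictionary index are illustrative assumptions. This is not SeqOthello's implementation, which relies on a compressed Othello-based structure rather than an in-memory dictionary.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq, in order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(experiments, k):
    """Map each k-mer to the set of experiment IDs whose reads contain it.

    `experiments` is a hypothetical dict: experiment ID -> list of read strings.
    """
    index = defaultdict(set)
    for exp_id, reads in experiments.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(exp_id)
    return index


def query(index, seq, k, experiment_ids):
    """Return, per experiment, the fraction of the query's k-mers it contains."""
    query_kmers = list(kmers(seq, k))
    if not query_kmers:  # query shorter than k: nothing to look up
        return {exp_id: 0.0 for exp_id in experiment_ids}
    hits = {exp_id: 0 for exp_id in experiment_ids}
    for kmer in query_kmers:
        # .get avoids inserting empty entries into the defaultdict
        for exp_id in index.get(kmer, ()):
            hits[exp_id] += 1
    return {exp_id: n / len(query_kmers) for exp_id, n in hits.items()}


# Toy data (made up for illustration only).
experiments = {
    "tumor_1": ["ACGTACGTGA", "TTGACCA"],
    "normal_1": ["ACGTTTTTGA"],
}
index = build_index(experiments, k=4)
result = query(index, "ACGTACGT", k=4, experiment_ids=experiments)
```

No reference genome or alignment step is involved: the query is compared to the experiments purely through exact k-mer lookups. In this toy case the query sequence is fully covered by `tumor_1` and only partially by `normal_1`, which is the kind of signal a fusion survey would threshold on.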

Original language: English
Article number: 167
Journal: Genome Biology
Volume: 19
Issue number: 1
State: Published - Oct 19 2018

Bibliographical note

Publisher Copyright:
© 2018 The Author(s).

Funding

This work was supported by the US National Science Foundation [grant numbers 1054631 to J.L., CNS-1717948 and CNS-1750704 to C.Q.] and the National Institutes of Health [grant numbers P30CA177558 and 1UL1TR001998-01 to J.L.]. The Seven Bridges Cancer Genomics Cloud [47] has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Contract No. HHSN261201400008C and ID/IQ Agreement No. 17X146 under Contract No. HHSN261201500003I.

Funders and funder numbers

National Science Foundation Arctic Social Science Program: CNS-1717948, 1750704, 1717948, 1054631
National Institutes of Health (NIH): P30CA177558, 1UL1TR001998-01
National Childhood Cancer Registry – National Cancer Institute: HHSN261201500003I, 17X146

    Keywords

    • Compression
    • Gene fusion
    • Othello
    • Pan-cancer
    • Query
    • RNA-seq
    • SeqOthello
    • TCGA

    ASJC Scopus subject areas

    • Ecology, Evolution, Behavior and Systematics
    • Genetics
    • Cell Biology
