Abstract
We present SeqOthello, an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. It takes SeqOthello only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of tier-1 fusions curated by the TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are present as tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.
| Original language | English |
|---|---|
| Article number | 167 |
| Journal | Genome Biology |
| Volume | 19 |
| Issue number | 1 |
| State | Published - Oct 19 2018 |
Bibliographical note
Publisher Copyright: © 2018 The Author(s).
Funding
This work was supported by the US National Science Foundation [grant numbers 1054631 to J.L., CNS-1717948 and CNS-1750704 to C.Q.] and the National Institutes of Health [grant numbers P30CA177558 and 1UL1TR001998-01 to J.L.]. The Seven Bridges Cancer Genomics Cloud [47] has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Contract No. HHSN261201400008C and ID/IQ Agreement No. 17X146 under Contract No. HHSN261201500003I.
| Funders | Funder number |
|---|---|
| U.S. Department of Energy; Chinese Academy of Sciences; Guangzhou Municipal Science and Technology Project; Oak Ridge National Laboratory; Extreme Science and Engineering Discovery Environment; National Science Foundation; National Energy Research Scientific Computing Center; National Natural Science Foundation of China | CNS-1717948, 1750704, 1717948, 1054631 |
| National Institutes of Health (NIH) | P30CA177558, 1UL1TR001998-01 |
| National Childhood Cancer Registry – National Cancer Institute | HHSN261201500003I, 17X146 |
Keywords
- Compression
- Gene fusion
- Othello
- Pan-cancer
- Query
- RNA-seq
- SeqOthello
- TCGA
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Genetics
- Cell Biology