Abstract
We present SeqOthello, an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello takes only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of the tier-1 fusions curated by the TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.
Original language | English |
---|---|
Article number | 167 |
Journal | Genome Biology |
Volume | 19 |
Issue number | 1 |
State | Published - Oct 19 2018 |
Bibliographical note
Publisher Copyright: © 2018 The Author(s).
Keywords
- Compression
- Gene fusion
- Othello
- Pan-cancer
- Query
- RNA-seq
- SeqOthello
- TCGA
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Genetics
- Cell Biology