TY - JOUR
T1 - RNA-seq mixology
T2 - Designing realistic control experiments to compare protocols and analysis methods
AU - Holik, Aliaksei Z.
AU - Law, Charity W.
AU - Liu, Ruijie
AU - Wang, Zeya
AU - Wang, Wenyi
AU - Ahn, Jaeil
AU - Asselin-Labat, Marie Liesse
AU - Smyth, Gordon K.
AU - Ritchie, Matthew E.
N1 - Publisher Copyright: © The Author(s) 2016.
PY - 2017/3/17
Y1 - 2017/3/17
N2 - Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.
AB - Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.
UR - http://www.scopus.com/inward/record.url?scp=85018289156&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018289156&partnerID=8YFLogxK
U2 - 10.1093/nar/gkw1063
DO - 10.1093/nar/gkw1063
M3 - Article
C2 - 27899618
AN - SCOPUS:85018289156
SN - 0305-1048
VL - 45
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 5
M1 - 1063
ER -