Ultrafast convolution/superposition using tabulated and exponential kernels on GPU

Quan Chen, Mingli Chen, Weiguo Lu

Research output: Contribution to journal › Article › peer-review

36 Scopus citations


Purpose: Collapsed-cone convolution/superposition (CCCS) dose calculation is the workhorse for IMRT dose calculation. The authors present a novel algorithm for computing CCCS dose on the modern graphics processing unit (GPU). Methods: The GPU algorithm includes a novel TERMA calculation that has no write conflicts and has linear computational complexity. The CCCS algorithm uses either tabulated or exponential cumulative-cumulative kernels (CCKs), as reported in the literature. The authors demonstrate that the use of exponential kernels can reduce the computational complexity by an order of dimension while achieving excellent accuracy. Special attention is paid to the unique architecture of the GPU, especially the memory access pattern; this increases performance more than tenfold. Results: The tabulated-kernel implementation on the GPU is two to three times faster than other GPU implementations reported in the literature. The CCCS implementation showed significant speedup on the GPU over a single-core CPU: with the tabulated CCK, speedups as high as 70-fold are observed; with the exponential CCK, speedups as high as 90-fold are observed. Conclusions: Overall, the GPU algorithm using the exponential CCK is 1000 to 3000 times faster than a highly optimized single-threaded CPU implementation using the tabulated CCK, while the dose differences are within 0.5% and 0.5 mm. This ultrafast CCCS algorithm will allow many time-sensitive applications to use accurate dose calculation.
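The complexity reduction attributed to exponential kernels can be illustrated with a deliberately simplified sketch. It is not the authors' implementation and does not use their CCK formulation: it assumes a single ray through a homogeneous medium and a hypothetical one-sided kernel exp(-a·r). The function names, the decay constant `a`, and the voxel spacing `dx` are all assumptions made for illustration. Because an exponential decays by a fixed factor per voxel, the sum over all upstream voxels can be carried as a running total, so a ray of N voxels costs O(N) rather than O(N²).

```python
import math

def direct_line_dose(terma, a, dx):
    """O(N^2) reference: every voxel sums contributions from all upstream voxels."""
    n = len(terma)
    dose = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            # Kernel value at distance (i - j) * dx; hypothetical exp(-a * r) form.
            dose[i] += terma[j] * math.exp(-a * (i - j) * dx)
    return dose

def recursive_line_dose(terma, a, dx):
    """O(N): carry a running sum that decays by a constant factor per voxel."""
    decay = math.exp(-a * dx)
    dose = []
    running = 0.0
    for t in terma:
        # Everything deposited so far is attenuated one step, then the local term is added.
        running = running * decay + t
        dose.append(running)
    return dose

# Illustrative values only; not data from the paper.
terma = [1.0, 0.8, 0.0, 0.5, 0.3, 0.9]
ref = direct_line_dose(terma, a=0.7, dx=0.25)
fast = recursive_line_dose(terma, a=0.7, dx=0.25)
max_diff = max(abs(x - y) for x, y in zip(ref, fast))
```

The two functions agree to floating-point precision. A tabulated kernel has no such fixed per-step factor in general, so it cannot be folded into a running sum this way.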

Original language: English
Pages (from-to): 1150-1161
Number of pages: 12
Journal: Medical Physics
Issue number: 3
State: Published - Mar 2011


  • GPU
  • convolution superposition
  • dose calculation
  • exponential kernels
  • tabulated kernels
  • treatment planning

ASJC Scopus subject areas

  • Biophysics
  • Radiology, Nuclear Medicine and Imaging

