Detecting Large Deletions at Base Pair Level by Combining Split Read and Paired Read Data.
Department of Physics and Computer Science - Dual Degree Engineering
Background: Genomic structural variants (SVs) play a significant role in the onset and progression of cancer. Genomic deletions can create oncogenic fusion genes, or they can downregulate tumor suppressor genes; the resulting loss of tumor suppressor function can lead to tumorigenesis. Detecting these variants is clinically important for the treatment of disease, as is locating their breakpoint boundaries at high resolution. We generalized the framework of a previously published algorithm that located translocations, and we applied that framework to develop a method that locates deletions at base pair level using next-generation sequencing data. Our method first uses abnormally mapped read pairs and then maps split reads to identify precise breakpoints.
Results: On a primary prostate cancer dataset and a simulated dataset, our method predicted the number, type, and breakpoints of biologically validated SVs with high accuracy. It also outperformed two existing algorithms on precise breakpoint prediction, which is clinically important.
Conclusion: Our algorithm, called Pegasus, accurately calls deletion breakpoints. However, the method must still be extended to support germline variant filtering and heterozygous deletion detection. The source code that implements Pegasus can be downloaded from the following URL: http://github.com/mhayes20/Pegasus.
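The two-stage strategy summarized above (candidate regions from abnormally mapped read pairs, then breakpoint refinement from split reads) can be illustrated with a minimal Python sketch. It is not the Pegasus implementation. The `ReadPair` record, the functions `deletion_candidates`, `cluster_pairs`, and `refine_breakpoints`, the mean + k·sd insert-size cutoff, the greedy clustering window, and the split-read voting rule are all hypothetical simplifications chosen for illustration, and reads are reduced to plain coordinates rather than parsed from alignment files.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal read-pair record (not parsed from a real BAM)."""
    chrom: str
    left_pos: int        # mapping position of the upstream mate
    right_pos: int       # mapping position of the downstream mate
    left_forward: bool   # upstream mate maps to the forward strand
    right_reverse: bool  # downstream mate maps to the reverse strand


def deletion_candidates(pairs, mean_insert, sd_insert, k=3.0):
    """Keep forward/reverse pairs whose apparent insert size exceeds
    mean + k * sd. The apparent size is simplified here to the distance
    between the two mapping positions (read length is ignored)."""
    cutoff = mean_insert + k * sd_insert
    return [p for p in pairs
            if p.left_forward and p.right_reverse
            and p.right_pos - p.left_pos > cutoff]


def cluster_pairs(pairs, window):
    """Greedy clustering: a pair joins the current cluster when both of its
    ends lie within `window` bp of that cluster's first (seed) pair."""
    clusters = []
    for p in sorted(pairs, key=lambda q: (q.chrom, q.left_pos)):
        if clusters:
            seed = clusters[-1][0]
            if (p.chrom == seed.chrom
                    and abs(p.left_pos - seed.left_pos) <= window
                    and abs(p.right_pos - seed.right_pos) <= window):
                clusters[-1].append(p)
                continue
        clusters.append([p])
    return clusters


def refine_breakpoints(cluster, split_reads, max_insert):
    """Return the (left, right) junction reported by the most split reads
    inside the region bounded by the cluster, or None if none fall there.
    Each split read is a hypothetical (chrom, left_break, right_break) tuple."""
    chrom = cluster[0].chrom
    # The left breakpoint should lie downstream of every upstream mate, and
    # the right breakpoint upstream of every downstream mate.
    lo_left = max(p.left_pos for p in cluster)
    hi_left = min(p.left_pos for p in cluster) + max_insert
    lo_right = max(p.right_pos for p in cluster) - max_insert
    hi_right = min(p.right_pos for p in cluster)
    votes = Counter(
        (left, right) for c, left, right in split_reads
        if c == chrom
        and lo_left <= left <= hi_left
        and lo_right <= right <= hi_right)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Toy data: a deletion spanning roughly chr1:10,000-15,000.
    pairs = [
        ReadPair("chr1", 9700, 15200, True, True),
        ReadPair("chr1", 9750, 15250, True, True),
        ReadPair("chr1", 9800, 15300, True, True),
        ReadPair("chr1", 20000, 20350, True, True),  # concordant pair
    ]
    splits = [("chr1", 10000, 15000), ("chr1", 10000, 15000),
              ("chr1", 10003, 15003), ("chr1", 50000, 60000)]
    for cluster in cluster_pairs(deletion_candidates(pairs, 400, 40), 500):
        print(len(cluster), refine_breakpoints(cluster, splits, 600))
```

Running the toy example prints one line per cluster, the number of supporting pairs followed by the voted junction: `3 (10000, 15000)`. Voting among pre-computed split-read junctions stands in here for the split-read mapping step; a real tool would align the clipped portions of reads against the reference around each candidate region.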
Hayes, Matthew and Pearson, J. S., "Detecting Large Deletions at Base Pair Level by Combining Split Read and Paired Read Data." (2017). Faculty and Staff Publications. 76.