Structural variant detection: a novel approach

152740-Thumbnail Image.png
Description
Genomic structural variation (SV) is defined as gross alterations in the genome broadly classified as insertions/duplications, deletions inversions and translocations. DNA sequencing ushered structural variant discovery beyond laboratory detection techniques to high resolution informatics approaches. Bioinformatics tools for computational discovery

Genomic structural variation (SV) is defined as gross alterations in the genome broadly classified as insertions/duplications, deletions inversions and translocations. DNA sequencing ushered structural variant discovery beyond laboratory detection techniques to high resolution informatics approaches. Bioinformatics tools for computational discovery of SVs however are still missing variants in the complex cancer genome. This study aimed to define genomic context leading to tool failure and design novel algorithm addressing this context. Methods: The study tested the widely held but unproven hypothesis that tools fail to detect variants which lie in repeat regions. Publicly available 1000-Genomes dataset with experimentally validated variants was tested with SVDetect-tool for presence of true positives (TP) SVs versus false negative (FN) SVs, expecting that FNs would be overrepresented in repeat regions. Further, the novel algorithm designed to informatically capture the biological etiology of translocations (non-allelic homologous recombination and 3&ndashD; placement of chromosomes in cells –context) was tested using simulated dataset. Translocations were created in known translocation hotspots and the novel&ndashalgorithm; tool compared with SVDetect and BreakDancer. Results: 53% of false negative (FN) deletions were within repeat structure compared to 81% true positive (TP) deletions. Similarly, 33% FN insertions versus 42% TP, 26% FN duplication versus 57% TP and 54% FN novel sequences versus 62% TP were within repeats. Repeat structure was not driving the tool's inability to detect variants and could not be used as context. The novel algorithm with a redefined context, when tested against SVDetect and BreakDancer was able to detect 10/10 simulated translocations with 30X coverage dataset and 100% allele frequency, while SVDetect captured 4/10 and BreakDancer detected 6/10. For 15X coverage dataset with 100% allele frequency, novel algorithm was able to detect all ten translocations albeit with fewer reads supporting the same. BreakDancer detected 4/10 and SVDetect detected 2/10 Conclusion: This study showed that presence of repetitive elements in general within a structural variant did not influence the tool's ability to capture it. This context-based algorithm proved better than current tools even with half the genome coverage than accepted protocol and provides an important first step for novel translocation discovery in cancer genome.
Date Created
2014
Agent