Next generation sequence characterization of complex genome structural variation
Dept. Computer Engineering, Bilkent University, Ankara, Turkey
Structural variation, in the broadest sense, is defined as the genomic changes among individuals that are not single nucleotide variants. Rapid computational methods are needed to comprehensively detect and characterize specific classes of structural variation using next-generation sequencing technology. We have developed a suite of tools using a new aligner, mrFAST, and algorithms focused on the characterization of structural variants that have been more difficult to assay: (i) deletions, small insertions, inversions and mobile element insertions using read-pair signatures (VariationHunter), (ii) novel sequence insertions by coupling read-pair data with local sequence assembly (NovelSeq), and (iii) absolute copy number of duplicated genes using read-depth analysis coupled with singly-unique nucleotide (SUN) identifiers. I will present a summary of our results from 9 high-coverage human genomes for these classes of structural variation and compare them with other datasets. I will also summarize our read-depth analysis of 159 low-coverage human genomes for copy number variation of duplicated genes. We use these data to correct CNV genotypes for copy number and location and to discover previously hidden patterns of complex polymorphism. Our results demonstrate, for the first time, the ability to assay both copy and content of complex regions of the human genome, opening these regions to disease association studies and to further population and evolutionary analyses. The algorithms we have developed provide a much-needed step towards a highly reliable and comprehensive structural variation discovery framework, which, in turn, will enable genomics researchers to better understand the variation in the genomes of newly sequenced human individuals, including patient genomes.
Can Alkan has been an Assistant Professor in the Department of Computer Engineering at Bilkent University since January 2012. He graduated from the Department of Computer Engineering at Bilkent University in 2000, and received his Ph.D. in Computer Science from Case Western Reserve University in 2005. During his Ph.D., he worked on the evolution of centromeric DNA, RNA-RNA interaction prediction, and RNA folding problems. He then joined the lab of Evan Eichler at the Department of Genome Sciences of the University of Washington as a postdoctoral fellow. Since then, his work has included computational prediction of human genomic structural variation, and characterization of segmental duplications and copy-number polymorphisms using next-generation sequencing data.