
Department of Biomedical Informatics... G. Özer, Dec. 01, 13:40, L030

 
 H. Gulcin Ozer, PhD

  Department of Biomedical Informatics, The Ohio State University Medical Center

Biomedical Informatics Shared Resource, The Ohio State University Comprehensive Cancer Center

Gulcin Ozer received her undergraduate degree from the Bogazici University Computer Engineering Department in 2000. She received her master's degree from the Gaziantep University Biophysics Department in 2003 and her PhD from the Ohio State University Biophysics Department in 2008. She currently works in the Department of Biomedical Informatics at the Ohio State University Comprehensive Cancer Center. Her research interests focus mainly on the analysis of next generation sequencing data and ChIP-seq data.

 

Analysis of Next Generation Sequencing Data


Next Generation Sequencing (NGS) technologies can sequence tens of millions of DNA segments in a single experiment. They have been widely used in genome-wide studies such as genome re-sequencing, protein-DNA interaction (ChIP-seq), RNA sequencing and microRNA screening. Analysis of NGS data is divided into three steps: primary, secondary and tertiary. Primary data analysis covers processing of the raw images and generation of the sequence reads for the experiment. Most of the time this step is completed on site, in real time, by the instrument vendor's software. Secondary data analysis covers alignment of the sequence reads to a reference genome. The reference can take the form of individual chromosomes, a subset of a genome such as the expressed genes, or bacterial artificial chromosomes (BACs). The alignment algorithm and its parameters are adjusted to the application. This step is computationally expensive and should be optimized for each application type. At the end of secondary analysis, the raw sequence reads have become well-characterized and biologically meaningful datasets. However, these datasets are still too large to interpret directly. Application-specific tertiary data analysis is therefore needed to turn the data into information. In this step, the alignment data from secondary analysis can be summarized, visualized, combined or compared with other datasets. The most common analyses are peak detection, motif finding, SNP analysis, gene annotation, expression analysis, variant detection and normalization.
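As a small illustration of the summarization that tertiary analysis starts from, the sketch below counts aligned reads that overlap annotated regions. It is not the pipeline described in the abstract: the function name, the tuple layout for reads and regions, and the half-open coordinate convention are all assumptions made for this example.

```python
def count_reads_per_region(reads, regions):
    """Count aligned reads overlapping each region.

    reads   -- iterable of (chrom, start, end) tuples (hypothetical layout)
    regions -- dict mapping region name -> (chrom, start, end)
    Coordinates are assumed to be half-open: [start, end).
    """
    counts = {name: 0 for name in regions}
    for chrom, start, end in reads:
        for name, (r_chrom, r_start, r_end) in regions.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == r_chrom and start < r_end and end > r_start:
                counts[name] += 1
    return counts


# Toy alignment output and annotation, invented for illustration only.
example_reads = [
    ("chr1", 100, 136),
    ("chr1", 150, 186),
    ("chr2", 10, 46),
    ("chr1", 990, 1026),
]
example_regions = {
    "geneX": ("chr1", 90, 160),
    "geneY": ("chr2", 0, 100),
    "geneZ": ("chr1", 1026, 2000),
}
example_counts = count_reads_per_region(example_reads, example_regions)
```

The nested loop compares every read with every region, which is only workable for toy inputs. With tens of millions of reads, an indexed structure such as sorted intervals or an interval tree would be the usual choice.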

 

Comparing Multiple Protein Binding Profiles in ChIP-seq Experiments

 

Next Generation Sequencing (NGS) technologies can generate millions of short sequences in a single experiment. As the size of the data increases, comparing multiple experiments on different cell lines under different experimental conditions becomes a major challenge. In this study, we investigate ways to compare multiple ChIP-seq experiments. We specifically study epigenetic regulation of breast cancer and the effect of estrogen, using 50 ChIP-seq datasets from the Illumina Genome Analyzer II. First, we evaluate the correlation among experiments, focusing on the total number of reads in transcribed regions of the genome. Then, we adopt the method used to identify the most stable genes in RT-PCR experiments, both to understand the background signal across all experiments and to identify the most variably transcribed regions of the genome. Gene ontology and function enrichment analysis of the 100 most variable genes demonstrates the biological relevance of the results. We present a method that can effectively select differentially transcribed regions based on protein binding profiles over multiple experiments, using real data points without any normalization among the samples.
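The two steps above can be sketched roughly as follows. The abstract does not name the RT-PCR stability method, so this example assumes a geNorm-style measure: for each region, the average standard deviation of its log2 count ratio against every other region across experiments. Because only ratios within an experiment are used, no normalization between samples is needed. The function names, the pseudocount and the toy read counts are all hypothetical.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between the read-count vectors of two experiments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined here; this sketch reports 0.0
    return cov / (sx * sy)


def stability_scores(counts, pseudocount=1.0):
    """geNorm-style score per region (an assumed stand-in for the unnamed method).

    counts -- dict mapping region name -> list of read counts, one per experiment
    A low score means the region behaves like background; a high score means
    it varies relative to the other regions.
    """
    logs = {g: [math.log2(c + pseudocount) for c in v] for g, v in counts.items()}
    scores = {}
    for g, lg in logs.items():
        spreads = []
        for h, lh in logs.items():
            if h == g:
                continue
            # Difference of logs is the log2 ratio between the two regions.
            log_ratios = [a - b for a, b in zip(lg, lh)]
            spreads.append(stdev(log_ratios))
        scores[g] = mean(spreads)
    return scores


def most_variable(scores, n):
    """Return the n region names with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy read counts for four regions over four experiments (invented values).
example_counts = {
    "geneA": [100, 200, 400, 800],
    "geneB": [50, 100, 200, 400],
    "geneC": [10, 20, 40, 80],
    "geneD": [500, 20, 900, 5],
}
example_scores = stability_scores(example_counts)
```

In the toy data, geneA, geneB and geneC rise and fall together, so their ratios stay nearly constant, while geneD moves independently and receives the highest score.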

 
