
SEMINAR: Domain Knowledge-Based Feature Grouping and Scoring for Omics...

Time: July 10, 2024, 2:40, FENS 2019 (Hybrid)


Please find the abstract of the talk and the short bio of the speaker below.


Abstract: Identifying significant sets of biomolecules that are dysregulated under specific conditions is vital to understanding disease development mechanisms at the molecular level. Along this line, numerous studies have used various computational feature selection (CFS) methods to analyze omics datasets. These methods primarily rely on statistical techniques to rank the features individually and then either discard the lower-ranked ones or retain the higher-ranked ones. However, CFS methods neglect the biological domain knowledge buried in these features. Moreover, eliminating or retaining features individually fails to account for dependencies and correlations among them, potentially resulting in the inclusion of redundant and irrelevant features. Consequently, CFS methods struggle with biological interpretation and yield limited biological insights. The incorporation of biological knowledge into machine-learning (ML) algorithms has shifted purely data-oriented studies toward domain-knowledge-driven ML approaches.
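To make the limitation of individual ranking concrete, here is a minimal sketch of a univariate filter. It is an illustration only, not a method from the talk: the Welch t-statistic is one possible scoring choice, and the gene names and values are made-up toy data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(X, y):
    """Score every feature on its own and sort by score, best first."""
    scores = {}
    for name, values in X.items():
        pos = [v for v, label in zip(values, y) if label == 1]
        neg = [v for v, label in zip(values, y) if label == 0]
        scores[name] = welch_t(pos, neg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_top_k(X, y, k):
    """Keep the k highest-ranked features and discard the rest."""
    return [name for name, _ in rank_features(X, y)[:k]]


# Toy data (hypothetical): 3 cases (label 1) and 3 controls (label 0).
y = [1, 1, 1, 0, 0, 0]
X = {
    "gene_a": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],
    "gene_a_copy": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],  # perfectly correlated with gene_a
    "gene_b": [2.0, 2.5, 2.2, 1.5, 1.4, 1.9],
    "gene_c": [1.0, 2.0, 1.5, 1.1, 1.9, 1.6],
}

selected = select_top_k(X, y, 2)
print(selected)
```

Because each feature is scored in isolation, both top slots go to `gene_a` and its perfectly correlated copy, while `gene_b`, which carries different information, is dropped. This is the redundancy problem described above.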

Recently, our group proposed the Grouping–Scoring–Modeling (G-S-M) approach to select groups of features (i.e., genes, microRNAs, microorganisms, etc.) rather than evaluating the features individually. The feature groups can be generated either by utilizing pre-existing domain knowledge (DK) stored in a biological database or through a fully data-driven approach using statistical measures. This approach has been explored in the development of several bioinformatics tools such as miRcorrNet, PriPath, miRModuleNet, 3Mint, GeNetOntology, mirDisNet, microBiomeGSM, and AMP-GSM.
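The three G-S-M steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation used in any of the tools listed: the group dictionary stands in for a biological database, a nearest-centroid classifier with leave-one-out accuracy is an assumed scoring choice, and all names and values are hypothetical toy data.

```python
from math import dist


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


def predict(train_rows, train_y, row):
    """Nearest-centroid prediction for a single sample."""
    cents = {
        label: centroid([r for r, l in zip(train_rows, train_y) if l == label])
        for label in set(train_y)
    }
    return min(cents, key=lambda label: dist(cents[label], row))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given features."""
    rows = [[X[f][i] for f in features] for i in range(len(y))]
    hits = 0
    for i in range(len(y)):
        train_rows = rows[:i] + rows[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_rows, train_y, rows[i]) == y[i]
    return hits / len(y)


def score_groups(X, y, groups):
    """Scoring step: rank each feature group by its classification accuracy."""
    scores = {}
    for name, members in groups.items():
        present = [f for f in members if f in X]
        if present:
            scores[name] = loo_accuracy(X, y, present)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_features(ranked, groups, X, top_n):
    """Collect the features of the top_n groups, without duplicates."""
    selected = []
    for name, _ in ranked[:top_n]:
        for f in groups[name]:
            if f in X and f not in selected:
                selected.append(f)
    return selected


# Toy data (hypothetical): 3 cases (label 1) and 3 controls (label 0).
y = [1, 1, 1, 0, 0, 0]
X = {
    "gene_a": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],
    "gene_b": [2.0, 2.5, 2.2, 1.5, 1.4, 1.9],
    "gene_c": [1.0, 2.0, 1.5, 1.1, 1.9, 1.6],
    "gene_d": [3.0, 3.1, 2.9, 3.1, 2.9, 3.0],
}

# Grouping step: here a hand-written mapping plays the role of domain knowledge.
groups = {
    "term_signal": ["gene_a", "gene_b"],
    "term_noise": ["gene_c", "gene_d"],
}

ranked = score_groups(X, y, groups)
selected = select_features(ranked, groups, X, top_n=1)

# Modeling step: fit the final classifier on the selected features only.
rows = [[X[f][i] for f in selected] for i in range(len(y))]
new_sample_label = predict(rows, y, [5.0, 2.1])
print(ranked[0][0], selected, new_sample_label)
```

The unit of selection here is the group, not the single feature, so the output of the scoring step is a ranked list of named groups (for example ontology terms or pathways) that can be inspected for biological meaning before the final model is built.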

In this talk, I will introduce the G-S-M approach and its applications to different -omics datasets. First, I will focus on the development of the GeNetOntology tool and its application to a glioblastoma gene expression dataset. GeNetOntology uses Gene Ontology as external biological information to select the most relevant genes from transcriptomic datasets. Our approach has been tested on 10 other datasets, and the experimental findings showed that GeNetOntology successfully identified important disease-related ontology terms and their associated genes for use in the classification model. Hence, scientists can discover biomarkers that can assist disease diagnosis and targeted treatment strategies.

Second, I will talk about 3Mint, which integratively analyzes multi-omics data (i.e., gene expression (mRNA), microRNA (miRNA), and methylation (CpG)). We have evaluated 3Mint on the breast cancer (BRCA) molecular subtype identification problem, using patient profiles obtained from The Cancer Genome Atlas. The possible roles of the identified genes, miRNAs, and CpGs in distinguishing the Luminal and ER-negative groups of BRCA are assessed with respect to the literature. The relationships between the mRNA, miRNA, and methylation markers over cross-validation iterations provide insight into the basis of the disease, its mechanism of action, and the detection of disease state.

Lastly, I will talk about another recent method developed by our group, Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE). We have evaluated RCE-IFE on metagenomics datasets associated with colorectal cancer (CRC), type 2 diabetes, and inflammatory bowel disease. Our findings demonstrate that the proposed strategy effectively reduces the size of the feature set and improves model performance. Since the intestinal microbiota can also be used as a prognostic biomarker in CRC patients and is effective in determining therapy response and drug resistance, the developed method can be used for such purposes in addition to disease diagnosis. Such studies are also important in determining diets and prebiotic and probiotic mixtures that can change the intestinal microbiome to help prevent and treat CRC. In addition, discovering potential pathogens of CRC contributes to fecal microbiota transplantation (FMT) studies, which are used to restore the microbiota, and we hope to guide such studies by identifying these pathogens.

In summary, our group aims to elucidate the main molecular mechanisms behind disease development and progression by developing machine-learning methods. The identified cellular mechanisms can be exploited further in early disease detection methods for clinical diagnostics, or as druggable targets, to help physicians develop new medical treatment plans and personalized therapies.

Keywords: biological domain-knowledge based feature selection, feature grouping, multi-omics data analysis, biomarker identification.

Bio: Burcu Bakir Gungor received her B.Sc. degree in Biological Sciences and Bioengineering from Sabanci University; her M.Sc. degree in Bioinformatics from Georgia Institute of Technology; and her PhD degree from Georgia Institute of Technology/Sabanci University. She held a researcher position at the Center of Excellence in Bioinformatics, University at Buffalo in 2003; and in Bioinformatics Research Center, Rat Genome Database, Human and Molecular Genetics Center, Medical College of Wisconsin from 2007 to 2009. From 2009 to 2011, she worked at the Department of Computer Engineering, Bahcesehir University. She acted as an Assistant Professor at the Department of Genetics and Bioinformatics, at the same university. From 2012 to 2013, she was part of the Advanced Genomics and Bioinformatics Research Center, UEKAE, BILGEM, TUBITAK. Currently, she works as an Associate Professor at the Department of Computer Engineering at Abdullah Gul University.

In 2022, she received a prestigious award and the L'ORÉAL-UNESCO National Fellowship as part of the For Women in Science Programme. In 2019, as part of the Bioinfo4Women - Outstanding Young Female Bioinformaticians Programme, she was invited to Barcelona Supercomputing Center, and recently she was selected as an International Mentor as part of this program. She is the recipient of "Best Paper" awards at the UBMK 2020 and 4th EvoBIO Conferences. She is an Editorial Board member of the PeerJ journal; she has acted as a reviewer for several prestigious journals and conferences including Bioinformatics, PLoS One, Nucleic Acids Research, PeerJ Computer Science, Journal of Computational Biology, and Frontiers in Genetics; and she is a Technical Program Committee member of the SIU, UBMK, ASYU and HIBIT conferences. She acted as a member of the bioinformatics advisory board of the Turkish Genome Project. She has worked as a PI, researcher and advisor in many international and national projects supported by the National Institutes of Health (NIH), the European Union (EU), TÜBİTAK, and TUSEB. Her research interests include bioinformatics, computational genomics, and applications of machine learning and data science in bioinformatics.
