SEMINAR: Domain Knowledge-Based Feature Grouping and Scoring for Omics...

Time: July 10, 2024, 2:40, FENS 2019 (Hybrid)

Join Zoom Meeting: https://sabanciuniv.zoom.us/my/lozturk

Please find the abstract of the talk and the short bio of the speaker below.

Abstract: Identifying significant sets of biomolecules that are dysregulated under specific conditions is vital to understanding disease development mechanisms at the molecular level. Along this line, numerous studies have utilized various computational feature selection (CFS) methods while analyzing omics datasets. These methods primarily rely on statistical techniques to individually rank the features and then either discard lower-ranked ones or retain highly-ranked ones. However, CFS methods neglect the biological domain knowledge buried in these features. Moreover, the process of eliminating or retaining features individually fails to account for dependencies and correlations among them, potentially resulting in the inclusion of redundant and irrelevant features. Consequently, CFS methods struggle with biological interpretation, leading to limited biological insights. The incorporation of biological knowledge into machine-learning (ML) algorithms has shifted purely data-oriented studies toward domain-knowledge-driven ML approaches.

Recently, our group proposed the Grouping–Scoring–Modeling (G-S-M) approach to select groups of features (i.e., genes, microRNAs, microorganisms, etc.) rather than evaluating the features individually. The feature groups can be either generated by utilizing pre-existing domain knowledge (DK) stored in a biological database or through a fully data-driven approach using statistical measures. This approach has been explored in the development of several bioinformatics tools such as miRcorrNet, PriPath, miRModuleNet, 3Mint, GeNetOntology, mirDisNet, microBiomeGSM, and AMP-GSM.
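For readers new to the idea, the grouping, scoring, and modeling steps described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the gene names, the two groups ("term_1", "term_2"), the toy expression values, and the choice of a leave-one-out nearest-centroid classifier as the group scorer. It is not the implementation used in any of the tools listed above.

```python
from statistics import mean


def loo_centroid_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the feature columns in `cols`."""
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from all samples except sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[c] for r in rows) for c in cols]
        # Predict the class whose centroid is closest (squared Euclidean).
        pred = min(
            centroids,
            key=lambda lab: sum(
                (X[i][c] - m) ** 2 for c, m in zip(cols, centroids[lab])
            ),
        )
        correct += pred == y[i]
    return correct / len(X)


def gsm_select(X, y, feature_names, groups, top_k=1):
    """Score each feature group as a unit, keep the top_k groups,
    and return the union of their features for the modeling step."""
    index = {name: k for k, name in enumerate(feature_names)}
    scores = {}
    for group_name, members in groups.items():
        cols = [index[m] for m in members if m in index]
        if cols:  # skip groups with no measured features
            scores[group_name] = loo_centroid_accuracy(X, y, cols)
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:top_k]
    selected = sorted({m for g in top for m in groups[g] if m in index})
    return scores, top, selected


# Hypothetical toy data: 6 samples x 4 genes, two classes.
feature_names = ["geneA", "geneB", "geneC", "geneD"]
X = [
    [1.0, 0.5, 1.0, 0.3],
    [1.2, 0.4, 2.0, 0.9],
    [0.9, 0.6, 1.5, 0.5],
    [3.0, 2.0, 1.1, 0.8],
    [3.1, 2.2, 1.9, 0.4],
    [2.8, 1.9, 1.4, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

# Hypothetical groups, standing in for domain knowledge such as ontology terms.
groups = {"term_1": ["geneA", "geneB"], "term_2": ["geneC", "geneD"]}

scores, top, selected = gsm_select(X, y, feature_names, groups, top_k=1)
print(scores, top, selected)
```

In this sketch the group is the unit of selection: a gene is kept because its group scores well as a whole, not because of its individual rank. The selected features would then be passed to whatever classifier is used in the modeling step.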

In this talk, I will introduce the G-S-M approach and its applications to different omics datasets. Firstly, I will focus on the development of the GeNetOntology tool and its application to a glioblastoma gene expression dataset. GeNetOntology utilizes Gene Ontology as external biological information to select the most relevant genes from transcriptomic datasets. Our approach has been tested on 10 other datasets, and the experimental findings showed that GeNetOntology successfully identified important disease-related ontology terms and their associated genes to be used in the classification model. Hence, scientists can discover biomarkers that can assist disease diagnosis and targeted treatment strategies.

Secondly, I will talk about 3Mint, which integratively analyzes multi-omics data (i.e., gene expression (mRNA), microRNA (miRNA), and methylation (CpG) data). We have tested 3Mint using the patient profiles obtained from The Cancer Genome Atlas for the breast cancer (BRCA) molecular subtype identification problem. The possible roles of the identified genes, miRNAs, and CpGs in distinguishing the Luminal and ER-negative groups of BRCA are assessed with respect to the literature. The relationships between the mRNA, miRNA, and methylation markers over cross-validation iterations provide insight into the basis of the disease, its mechanism of action, and the detection of the disease state.

Lastly, I will talk about another recent method, Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE), developed by our group. We have tested RCE-IFE on metagenomics datasets associated with colorectal cancer (CRC), type 2 diabetes, and inflammatory bowel disease. Our findings demonstrate that the proposed strategy effectively reduces the size of the feature set and improves the model performance. Since the intestinal microbiota can also be used as a prognostic biomarker in CRC patients and is effective in determining therapy response and drug resistance, the developed method can be used for such purposes in addition to disease diagnosis. Such studies are also important in determining diets and prebiotic and probiotic mixtures that can change the intestinal microbiome to help prevent and treat CRC. In addition, discovering potential pathogens of CRC contributes to fecal microbiota transplantation (FMT) studies, which are used to restore the microbiota. We hope to guide FMT studies by identifying potential pathogens of CRC.

In summary, our group aims to elucidate the main molecular mechanisms behind disease development and progression by developing machine-learning methods. The identified cellular mechanisms can be exploited further for early disease detection methods in clinical diagnostics or as druggable targets, in order to help physicians develop new medical treatment plans and personalized therapies.

Keywords: biological domain-knowledge based feature selection, feature grouping, multi-omics data analysis, biomarker identification.

Bio: Burcu Bakir Gungor received her B.Sc. degree in Biological Sciences and Bioengineering from Sabanci University; her M.Sc. degree in Bioinformatics from Georgia Institute of Technology; and her PhD degree from Georgia Institute of Technology/Sabanci University. She held a researcher position at the Center of Excellence in Bioinformatics, University at Buffalo in 2003; and at the Bioinformatics Research Center, Rat Genome Database, Human and Molecular Genetics Center, Medical College of Wisconsin from 2007 to 2009. From 2009 to 2011, she worked at the Department of Computer Engineering, Bahcesehir University. She served as an Assistant Professor at the Department of Genetics and Bioinformatics at the same university. From 2012 to 2013, she was part of the Advanced Genomics and Bioinformatics Research Center, UEKAE, BILGEM, TUBITAK. Currently, she works as an Associate Professor at the Department of Computer Engineering at Abdullah Gul University.

In 2022, she received a prestigious award and the L'ORÉAL – UNESCO National Fellowship as part of the For Women in Science Programme. In 2019, as part of the Bioinfo4Women - Outstanding Young Female Bioinformaticians Programme, she was invited to Barcelona Supercomputing Center, and recently she was selected as an International Mentor as part of this programme. She is the recipient of "Best Paper" awards at the UBMK 2020 and 4th EvoBIO Conferences. She is an Editorial Board member of the PeerJ journal; she has acted as a reviewer for several prestigious journals and conferences including Bioinformatics, PLoS ONE, Nucleic Acids Research, PeerJ Computer Science, Journal of Computational Biology, and Frontiers in Genetics; and she is a Technical Program Committee member of the SIU, UBMK, ASYU and HIBIT conferences. She acted as a member of the bioinformatics advisory board of the Turkish Genome Project. She has worked as a PI, researcher and advisor in many international and national projects supported by the National Institutes of Health (NIH), the European Union (EU), TÜBİTAK, and TUSEB. Her research interests include bioinformatics, computational genomics, and applications of machine learning and data science in bioinformatics.