We’re a computational genomics group at the Genome Institute of Singapore. Our main focus is transcriptomics data: we’re developing algorithms for third-generation sequencing technology, and we analyse data from thousands of patient samples.
The main focus of our group is the analysis of transcriptomics data. We’re developing algorithms for large-scale data analysis, modelling of batch effects, and normalisation of technical biases. Our work aims to identify alternative splicing events, retrotransposons, and novel RNAs that contribute to human diseases (see Lu et al. (2013) Nat. Cell Biol., Lu et al. (2014) Nat. Struct. Mol. Biol., Göke et al. (2015) Cell Stem Cell, Göke and Ng (2016) EMBO Reports). In collaboration with wet labs at GIS and worldwide, we’re focusing on cancer and neurodegenerative disease models (see Lin et al. (2016) Cell Reports, Jo et al. (2016) Cell Stem Cell). Current research focuses on large-scale data sets: we’re analysing thousands of samples from bulk tissues and single cells.
Cell Identity and Cellular Heterogeneity
Even though all cells of the human body essentially share the same genetic information, the cell types that form the organs and tissues are very diverse and have distinct properties. Together with wet lab groups at GIS, we aim to understand the molecular determinants of cellular identity during development and differentiation, and how cellular identity is impaired in disease. We investigate transcript diversity, alternative splicing, gene regulation, and epigenetics to identify key elements and mechanisms involved in the maintenance and conversion of cellular identities. We have been working extensively with embryonic stem cells to understand how cell states and complex cellular systems can be induced. Our group has contributed to the discovery of naive human pluripotency (Chan et al. (2013) Cell Stem Cell) and the characterisation of the first human midbrain organoids (Jo et al. (2016) Cell Stem Cell).
A cell’s genetic information is encoded in its DNA. Using string kernels and support vector machines, we learn models that predict functional elements from the DNA sequence alone. Our group has developed the first mismatch string kernel using a statistical model that corrects for the expected sequence composition (Göke et al. (2012) Bioinformatics). We are also applying machine learning techniques to clinical data to understand the power and limitations of genomics data for personalised medicine.
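To make the idea of a string kernel on DNA concrete, here is a minimal sketch of a plain (k, m)-mismatch kernel in Python. It is an illustration only, not the method published by the group: it omits the statistical correction for expected sequence composition, and the function names, the default values of `k` and `m`, and the brute-force enumeration of all 4^k candidate k-mers are assumptions chosen for readability (practical only for small `k`).

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumed DNA alphabet for this illustration


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(seq, k, m):
    """Count, for every k-mer over ALPHABET, how many k-mers of `seq`
    lie within Hamming distance m of it."""
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Brute force: enumerate all 4**k candidate k-mers (fine for small k only).
    candidates = ["".join(p) for p in product(ALPHABET, repeat=k)]
    features = Counter()
    for kmer, count in observed.items():
        for cand in candidates:
            if hamming(kmer, cand) <= m:
                features[cand] += count
    return features


def mismatch_kernel(a, b, k=3, m=1):
    """Inner product of the mismatch feature vectors of two sequences."""
    fa = mismatch_features(a, k, m)
    fb = mismatch_features(b, k, m)
    return sum(value * fb[key] for key, value in fa.items())


def normalized_kernel(a, b, k=3, m=1):
    """Cosine-normalised kernel, so that a sequence scores 1.0 against itself."""
    denom = sqrt(mismatch_kernel(a, a, k, m) * mismatch_kernel(b, b, k, m))
    return mismatch_kernel(a, b, k, m) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to show the call pattern.
    seqs = ["ACGTACGTAC", "ACGTTCGTAC", "GGGGCCCCGG"]
    for s in seqs:
        print([round(normalized_kernel(s, t), 3) for t in seqs])
```

A matrix of such kernel values between all pairs of training sequences could then be passed to a support vector machine that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.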