We’re a team of computational biologists at the Genome Institute of Singapore that focuses on methods development and analysis of third-generation long-read transcriptomics data. We work together with experimental teams and clinicians to apply and translate our methods.

03-08-2021: Please come back soon for an update of the research website! The most up-to-date overview can be found in the publication section and on our GitHub page.



Alternative splicing in ESCs (see Lu et al. (2013) Nat. Cell Biol.)

The main focus of our group is the analysis of transcriptomics data. We’re developing algorithms for large-scale data analysis, modeling of batch effects, and normalisation of technical biases. Our work aims to identify alternative splicing events, retrotransposons, and novel RNAs that contribute to human diseases (see Lu et al. (2013) Nat. Cell Biol., Lu et al. (2014) Nat. Struct. Mol. Biol., Göke et al. (2015) Cell Stem Cell, Göke and Ng (2016) EMBO Reports). In collaboration with wet labs at GIS and worldwide, we’re focusing on cancer and neurodegenerative disease models (see Lin et al. (2016) Cell Reports, Jo et al. (2016) Cell Stem Cell). Current research focuses on large-scale data sets and third-generation direct RNA sequencing technology.


Retrotransposons and their contribution to the coding and non-coding transcriptome (see Göke et al. (2016) EMBO Reports)

Machine Learning

All information is encoded in the DNA. Using string kernels and support vector machines, we learn models to predict functional elements from the DNA sequence alone. Our group has developed the first mismatch string kernel using a statistical model that corrects for the expected sequence composition (Göke et al. (2012) Bioinformatics). We are also applying machine learning techniques to clinical data to understand the power and limitations of genomics data for personalised medicine.
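To make the idea of a string kernel concrete, here is a minimal Python sketch of a plain mismatch k-mer kernel: two DNA sequences are compared by counting the k-mers they share, where each k-mer also counts for every k-mer within a small number of mismatches. This is only an illustration under stated assumptions. It does not include the statistical correction for expected sequence composition described above, and the function names, the default `k=3` and the default of one mismatch are hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: sequences contain only these four letters


def neighbours(kmer, max_mismatches):
    """Return every k-mer within Hamming distance <= max_mismatches of kmer."""
    found = {kmer}
    for n_mis in range(1, max_mismatches + 1):
        # Choose which positions to substitute, then which letters to place there.
        for positions in combinations(range(len(kmer)), n_mis):
            for letters in product(ALPHABET, repeat=n_mis):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                found.add("".join(candidate))
    return found


def mismatch_features(seq, k=3, max_mismatches=1):
    """Map a sequence to counts over k-mers, crediting near matches as well."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for neighbour in neighbours(seq[i:i + k], max_mismatches):
            features[neighbour] += 1
    return features


def mismatch_kernel(seq_a, seq_b, k=3, max_mismatches=1):
    """Kernel value: inner product of the two mismatch feature vectors."""
    fa = mismatch_features(seq_a, k, max_mismatches)
    fb = mismatch_features(seq_b, k, max_mismatches)
    return sum(count * fb[kmer] for kmer, count in fa.items())
```

A matrix of such kernel values over a set of labelled sequences could then be passed to a support vector machine that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.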


Machine learning with genomics data: classification of regulatory function from DNA sequences (left), and stratification of breast cancer patients into groups with different outcomes using genomics data (right).

Cell Identity and Cellular Heterogeneity


Conversion of embryonic stem cell states (see Chan et al. (2013) Cell Stem Cell)

Even though all cells of the human body essentially share the same genetic information, the cell types that form the organs and tissues appear very diverse and have distinct properties. Together with wet lab groups at the GIS, we aim to understand the molecular determinants of cellular identity during development and differentiation, and how cellular identity is impaired in disease. We investigate transcript diversity, alternative splicing, gene regulation, and epigenetics to identify key elements and mechanisms involved in the maintenance and conversion of cellular identities. We have been working extensively with embryonic stem cells to understand how cell states and complex cellular systems can be induced. Our group has contributed to the discovery of naive human pluripotency (Chan et al. (2013) Cell Stem Cell) and the characterisation of the first human midbrain organoids (Jo et al. (2016) Cell Stem Cell).