I currently work in three interdisciplinary research teams to understand gene expression. Gene expression is a crucial regulatory component in each of our cells: while (almost) every cell in our body has the same pool of building blocks, the genome, the characteristics of each cell are determined by which building blocks (genes) are chosen from it. This choice determines whether a cell is a nerve cell, a muscle cell, or a tumor cell.
While the exact number of genes in the human genome is unknown, it is estimated to be about 20,000. When this number was first published, it perplexed many scientists, since it puts our gene count close to that of less complex organisms such as the worm C. elegans. One hypothesis states that complexity arises not from the number of genes, but from the combinatorial choice of the even smaller building units that make up a gene. This choice, called alternative splicing, can yield several distinct gene products from a single gene. I am driven to understand this complex system using a combination of novel experimental, quantification, and data science approaches.
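To make the combinatorial idea concrete, here is a minimal sketch under a deliberately simplified assumption: a gene has a fixed first and last building unit (exon) and a few optional units in between, each of which can be independently included or skipped. The exon names and the function `enumerate_isoforms` are hypothetical, and real splicing is far more constrained, so the count is an upper bound within this toy model rather than a biological prediction.

```python
from itertools import product


def enumerate_isoforms(first_exon, optional_exons, last_exon):
    """List every gene product obtainable by independently including
    or skipping each optional exon (toy model, not real biology)."""
    isoforms = []
    for choices in product([False, True], repeat=len(optional_exons)):
        included = [exon for exon, keep in zip(optional_exons, choices) if keep]
        isoforms.append([first_exon, *included, last_exon])
    return isoforms


# Hypothetical gene with three optional exons between E1 and E5.
isoforms = enumerate_isoforms("E1", ["E2", "E3", "E4"], "E5")
for isoform in isoforms:
    print("-".join(isoform))
print(len(isoforms))  # 2**3 = 8 possible products from one gene
```

In this model the number of possible products doubles with every optional exon, which illustrates how a modest number of genes could still give rise to a much larger repertoire of gene products.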
What governs alternative splicing?
Many studies on alternative splicing focus on deciphering the choices made in one specific cell type or one species. While this is clearly important, we still don’t understand the base layer of alternative splicing regulation.
I teamed up with Hugo Bowne-Anderson, a mathematician in the Howard laboratory at Yale University, to identify regulatory features of alternative splicing that apply not to one, but to a wide range of cell types and species. To characterize this base layer of regulation, we mine publicly available genomic data sets from numerous species and cell types. We use probabilistic modeling to characterize the statistical footprint of alternative splicing, which in turn will allow us to identify candidate regulatory processes, and we apply machine learning to the mined data sets to identify regulating features.
The combination of these approaches will allow us to characterize the base layer of alternative splicing regulation, which is crucial for understanding any form of alternative splicing, including its misregulation in disease.
What is the spatial organization of alternative splicing?
While genomic experiments are extremely helpful for quantifying average metrics of cell or organism populations, they often do not reveal how these averages arise. If, for example, a genomic experiment reports the presence of two gene isoforms, it remains unknown whether these isoforms co-exist in one cell or exist in two distinct cell types.
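The ambiguity can be illustrated with a small sketch. The transcript counts below are invented for illustration only, and the helper names `bulk_fraction` and `per_cell_fractions` are hypothetical. Two very different cell populations give exactly the same pooled (population-averaged) isoform fraction.

```python
def bulk_fraction(cells):
    """Fraction of isoform A after pooling transcripts from all cells,
    as a population-averaged measurement would report it."""
    total_a = sum(a for a, b in cells)
    total = sum(a + b for a, b in cells)
    return total_a / total


def per_cell_fractions(cells):
    """Fraction of isoform A in each individual cell."""
    return [a / (a + b) for a, b in cells]


# Each tuple is (isoform A count, isoform B count) for one cell.
# Scenario 1: both isoforms co-exist in every cell.
mixed_cells = [(50, 50), (50, 50), (50, 50), (50, 50)]
# Scenario 2: each cell expresses only one of the two isoforms.
separate_cells = [(100, 0), (100, 0), (0, 100), (0, 100)]

print(bulk_fraction(mixed_cells), bulk_fraction(separate_cells))  # 0.5 0.5
print(per_cell_fractions(mixed_cells))     # [0.5, 0.5, 0.5, 0.5]
print(per_cell_fractions(separate_cells))  # [1.0, 1.0, 0.0, 0.0]
```

The pooled value is identical in both scenarios; only a measurement with single-cell resolution can tell them apart.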
To understand the spatial organization of alternative splicing, I collaborate with biologists in Shirin Bahmanyar’s laboratory at Yale University and with a computational scientist, Stephan Preibisch, from Robert Singer’s laboratory at the Albert Einstein College of Medicine. Together we combine genomic experiments with single-molecule imaging of alternative splicing.
This approach will allow us to i) identify interesting genes and ii) characterize their spatial expression with respect to tissue, cell type, and cell. In combination with statistical models, this will allow us to characterize mechanisms fundamental to spatially specific gene expression.
How fast is splicing?
An obvious way to achieve the expression of gene A over B is to make the generation of A faster than that of B. Unfortunately, it is currently impossible to measure the speed (kinetics) of the splicing reaction in the living cell. Lydia Herzel, from the lab of my PhD advisor Karla Neugebauer at Yale University, and I developed a sequencing-based method to perform exactly this measurement for single genes. This will allow us to i) determine the gene-specific speed of splicing and ii) identify regulating features using machine learning techniques.