Case study: Tammo Rukat
Tammo studied physics for his undergraduate and master’s degrees and knew early on that he wanted to pursue research. However, he also had a strong interest in medical issues, and felt that biological research questions were often more appealing than those in pure physics. Work for his master’s thesis on medical imaging techniques gave Tammo skills in statistics and modelling, and encouraged him to start thinking about the question: How do we learn from data? He discovered that it was this aspect, rather than the medical imaging itself, that interested him most.
When it came to choosing a DPhil course, Tammo was keen to focus on something that combined his interests and that he had not studied in detail before. The CDT course provided the opportunity to do this. The training offered in the first year of the Systems Approaches to Biomedical Science CDT is particularly targeted at people with a quantitative background who need to learn skills in biology. Tammo found that he was already familiar with much of the biology content because he had spent a year at medical school, but thought that overall the first year taught a really good set of skills for DPhil study. In particular, he felt that the modules on introduction to drug discovery and on structural biology were very useful.
After first-year projects at Roche and in brain imaging at the Oxford Centre for Human Brain Activity, Tammo chose to focus his DPhil on statistical machine learning in the context of human genomics – specifically, the gene expression profiles of single cells. Perhaps surprisingly, human cells that are situated right next to each other in what looks like uniform tissue can have very different gene expression profiles; in other words, neighbouring cells can behave quite differently. This is especially true of tumour cells, which can be extremely heterogeneous. Measuring the gene expression of a group of cells gives a result that has been averaged across all the cells, and so obscures information about the individual cells.
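As a toy illustration of this averaging effect, the short Python sketch below compares two hypothetical tissue samples. The expression values, sample names and helper functions are invented for illustration and are not taken from Tammo’s work or from real measurements. One sample mixes cells in which a gene is switched off with cells in which it is highly expressed; the other is genuinely uniform. A bulk-style average cannot tell them apart, whereas the cell-to-cell spread can.

```python
from statistics import mean, pstdev

# Hypothetical expression levels of a single gene in six neighbouring cells
# (arbitrary units, invented for illustration; not real measurements).
heterogeneous_tissue = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
uniform_tissue = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]


def bulk_measurement(cells):
    """Mimic a bulk assay: only the average across all cells is reported."""
    return mean(cells)


def cell_to_cell_spread(cells):
    """Population standard deviation, available only from single-cell data."""
    return pstdev(cells)


for name, cells in [("heterogeneous", heterogeneous_tissue),
                    ("uniform", uniform_tissue)]:
    print(name,
          "bulk average:", bulk_measurement(cells),
          "cell-to-cell spread:", cell_to_cell_spread(cells))
```

Both samples give the same bulk average, so the difference between them only becomes visible once each cell is measured individually.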
New techniques, however, allow for single-cell sequencing and so enable researchers to study the expression of thousands of different genes in thousands of single cells. This generates a vast amount of data, which is difficult both to process and to interpret: the data need to be compressed while still retaining the key information. For Tammo, the real challenge is to design algorithms that are scalable and can be applied to huge datasets. He is developing probabilistic models that attempt to identify latent variables – variables that cannot be observed directly, but which are the hidden factors generating patterns and structure in the data. While these tools may help to understand biological complexity, they are generic enough to be applicable to a wide range of problems in other areas.
Because of the topic of Tammo’s DPhil, the industrial partnership aspect of the CDT is perhaps less directly relevant to him than it might be to other DPhil students. However, Tammo found it very interesting in his first-year rotations to get first-hand experience at Roche of how industry works, and to discover how its approach differs from that of academia. He feels that a major advantage of the CDT course is the ability to get to know supervisors through the first-year projects. Other benefits are the freedom to try out projects in different areas, being able to collaborate across departments, and having a cohort of students who form a strong support network throughout the DPhil course.