Prediction of Disease Risk in Personalised Medicine

We have developed novel methodology to improve disease prediction from genetic data by using information across multiple phenotypes. There is widespread evidence of genetic correlations among a large number of phenotypes, and our approach uses this information by combining summary statistics from many different traits. We have demonstrated that the improvements in prediction through this approach are similar to those gained by combining full sets of genotype-phenotype information on all individuals. As combining individual-level genotype-phenotype data from multiple disease consortia is impossible under current data sharing restrictions, we expect this summary statistic approach to be widely used in the future. We are now developing Bayesian methods that account for biological knowledge to further improve prediction accuracy, and frameworks to apply these approaches in a clinical setting using other forms of information within an individual’s health record.

The Genetics of Human Ageing and Age-at-onset

This project develops novel association mapping techniques for time-to-event data. It aims to better understand the genetic basis of age-at-onset for common complex diseases, asking the question: why do some people suffer symptoms earlier in life?

Within this framework, we also hope to fully investigate the genetic basis of lifespan, avoiding the complication of having to assume that the phenotype is normally distributed. This will enable us to empirically test theories for the genetic basis of ageing, in an experimental design that is not biased by confounders such as common/shared environment. We are also working to identify trait-associated loci, testing for age-specific effects, and then replicating our results across other studies. We will test whether age-specific genetic effects involve rare or common loci, whether they cluster into specific genomic annotations, whether the same pathways have the same direction of effects across traits throughout life, and whether the regions identified overlap with those that influence gene expression and gene methylation.

Biomarkers for disease risk

Data characterising gene expression, protein structure, or epigenetic modifications such as DNA methylation, histone marks and nucleosome positioning are becoming increasingly available. Epigenetic marks reflect a wide range of environmental exposures and genetic influences, are critical for regulating gene and non-coding RNA expression, and have been shown to influence disease susceptibility. The identification of clinically relevant epigenetic loci can provide insight into the molecular underpinnings of disease, leading to identification of biologically relevant therapeutic targets and potentially epigenetic-guided clinical decision making. We are developing a new approach to identify epigenetic biomarkers, based on Bayesian inference, that: (i) estimates probe effects on an outcome jointly and conditionally on each other whilst controlling for other covariates such as sex and age, which avoids model overfitting and produces effect size estimates that are not biased by data structure (including cell-count effects) or by the correlations among probes; (ii) estimates the total proportion of disease risk accounted for by the probe effects (cumulative proportion of variance explained); (iii) does not require any knowledge of cell-type composition or any selection of proxy confounder variables; (iv) estimates probe effects conditional on other sources of data such as single nucleotide polymorphism data, enabling a determination of the unique contribution of different data types; (v) gives an in-depth understanding of the genome-wide range of probe effects on the phenotype, in terms of the likely number of independent effects and their variance explained; (vi) enables unique enrichment analyses, describing the variance explained and number of trait-associated probes of each annotation; and (vii) provides improved estimation of biomarker effects, which could be used for disease risk assessment.

Population Genetic Differentiation

We are developing population genetic models to fully model genetic differentiation across populations, clinal gradients, groups, and ethnicities. We have developed an approach to examine allele frequency differentiation of loci across traits, testing whether loci cluster into specific genomic annotations, whether the same pathways have the same direction of effects across traits throughout life, and whether the regions identified overlap with those that influence gene expression and gene methylation. Through this framework, we hope to better understand differences among human population groups in genetic risk of disease, and how history has shaped the phenotypic diversity that we currently observe in the human population.
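
As a point of reference for what "allele frequency differentiation" means at a single locus, the sketch below computes Wright's F_ST for one biallelic variant in two populations. It is a textbook illustration only, not the modelling approach described above: it assumes equally sized populations and known allele frequencies, and the function name `wright_fst` is hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # the locus is monomorphic overall, so no differentiation
    return (h_total - h_within) / h_total
```

Identical frequencies give a value of 0, and alternate fixation (frequencies of 0 and 1) gives a value of 1.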


Assortative mating and mate choice

Assortative mating occurs when individuals exhibit a preference for pairing with those who are either similar or dissimilar to themselves. In human populations, assortment is almost universally in the same direction, with observed similarities between spouses for quantitative phenotypes, common disease, behaviour, social factors, and personality. Assortative mating can arise from phenotypic assortment based on mate choice, from partner interaction and convergence in phenotype over time, or because individuals pair on social or environmental background, referred to as social homogamy. Distinguishing among these mechanisms is a long-standing question. In previous work, we developed a new design and analytical approach to show that assortative mating for many human phenotypes occurs through phenotypic assortment based on mate choice across human populations.


Genotype-covariate interactions

Whole-genome interaction effects have been little studied for human complex traits. We are interested in developing methods to test for interactive effects, such as whether the environment modifies genetic predisposition to common disease. This work has so far focussed on body mass index, but further methodological improvement is required, as well as application to many more phenotypes.
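
To make the idea of an environment modifying a genetic effect concrete, the sketch below estimates a genotype-by-covariate interaction for a single variant and a binary covariate. It is a minimal illustration under stated assumptions, not the whole-genome methods described above: the covariate is coded 0/1, the genotype is an allele count, and the function names `slope` and `interaction_effect` are hypothetical. With a binary covariate, the interaction coefficient in the model phenotype ~ genotype + covariate + genotype × covariate equals the difference between the per-stratum regression slopes, which is what the code computes.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def interaction_effect(genotype, covariate, phenotype):
    """Difference in the genotype slope between covariate == 1 and covariate == 0."""
    strata = {0: ([], []), 1: ([], [])}
    for g, e, y in zip(genotype, covariate, phenotype):
        strata[e][0].append(g)
        strata[e][1].append(y)
    return slope(*strata[1]) - slope(*strata[0])


# Toy data: the per-allele effect is 0.5 when unexposed and 1.5 when exposed.
genotype = [0, 1, 2, 0, 1, 2]
covariate = [0, 0, 0, 1, 1, 1]
phenotype = [1.0, 1.5, 2.0, 2.0, 3.5, 5.0]
effect = interaction_effect(genotype, covariate, phenotype)
```

A value of zero would indicate that the per-allele effect is the same in both covariate groups. A real analysis would also need standard errors and adjustment for confounders, which this sketch omits.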