Conference

Last updated 14 April
Central European Summer Time (CEST) applies

A simplified version of this schedule is available here.

16 April

08:50 - 09:00 Welcome by Zoltán Kutalik & Matthew Robinson
09:00 - 09:30 Invited speaker - Jim Wilson: Exome-wide association study of 1102 proteins
09:30 - 10:30 Session 1: Omics analysis
Chairs: Heather Cordell, Jérôme Goudet
09:30 - 09:45 Maarja Lepamets: Copy number variation associations with plasma protein levels in the Estonian population
Lepamets M.1, Kalnapenkis A.1, Kals M.2, Mägi R.1, Esko T.1, Võsa U.1

1. Estonian Genome Centre, University of Tartu; 2. Institute for Molecular Medicine Finland (FIMM), University of Helsinki

Proteins are the primary functional units in the human body, serving as intermediaries between genetic variants and complex traits. Identifying variants that act as protein quantitative trait loci (pQTLs) can yield novel insights into the molecular mechanisms leading to those traits. While multiple studies have concentrated on SNP pQTLs and their overlap with corresponding expression QTLs (eQTLs), the aim of this project is to do the same for copy number variations (CNVs).
In our analysis of 500 Estonian Biobank individuals, we identified 12 unique genome-wide significant (P<1.1e-7) associations between CNVs and plasma protein levels. Seven of the associated CNVs were located within 500 kbp of the transcription start site of the respective protein-coding gene (cis). The strongest cis-association was between SULT1A1 and a CNV overlapping the gene encoding the same protein (P=1.1e-20). Additionally, we identified five trans-associations. Four of them involved an intergenic deletion in the 3q12.1 region, which decreased the levels of ICAM2 (P=9.0e-30), FLT4 (P=1.5e-24), PDCD1LG2 (P=3.0e-15) and IL1R1 (P=6.4e-8), all encoded on different chromosomes. Furthermore, we identified a trans-association between the proinflammatory cytokine IL18 and a CNV overlapping the NAIP gene (P=2.6e-10), which encodes a sensor component of the NLRC4 inflammasome.
Finally, we cross-referenced CNV pQTLs with CNV eQTLs that were identified from the same individuals. We found an overlap of three CNV-gene/protein pairs, possibly reflecting the distinct sources of gene expression in blood and protein expression in plasma. Altogether, our results emphasize the importance of structural variation on the genetic variability of plasma protein levels.

09:45 - 10:00 Eleonora Porcu: Determining the role of gene expression on human sexual dimorphism
Porcu E.1, Claringbould A.2, Lepik K.3, Bios Consortium, Franke L.2, Santoni F.1, Reymond A.1, Kutalik Z.1

1. University of Lausanne; 2. University of Groningen; 3. University of Tartu

Although the prevalence of many diseases differs between women and men, only a few published genome-wide association studies (GWAS) have been performed in a sex-stratified manner, and the molecular bases of sex-associated differences in complex traits remain poorly understood. We hypothesized that, given the marked causal involvement of gene expression levels in complex traits, sex-biased trait associations might be driven by sex-biased eQTLs. To test this hypothesis, we performed a genome-wide analysis of sex-specific whole blood RNA-seq eQTLs from 3,447 individuals. Amongst the 9 million pre-selected SNP-gene pairs (based on sex-combined association), we identified 18 genes with significantly (FDR 5%) different eQTL effects in men and women. PheWAS analyses for these 18 eGenes on >700 traits revealed that sex-biased eQTLs in CDIP1 and PSMD5 translate into sex-specific trait associations for trunk predicted mass. However, such examples are sporadic, and sex-specific expression regulation does not systematically propagate to high-level traits. Next, we applied a sex-specific transcriptome-wide Mendelian Randomization approach (TWMR) by combining sex-specific summary statistics for both eQTLs and complex traits, and observed a compensatory effect downstream of gene expression, e.g. genes with stronger eQTLs in women have weaker woman-specific causal effects on complex traits. Finally, we show that the sex-specific GWAS associations are not driven by sex-biased eQTLs. Our findings suggest that sex-specific trait associations rarely arise as a consequence of sex-specific gene expression regulation in whole blood; hence other omics data are necessary to better understand the genetic basis of sexual dimorphism.

10:00 - 10:15 Sebastian May-Wilson: Integrating transcriptomics and proteomics data into pathWAS allows prediction of pathway functionality
May-Wilson S.1, Macdonald-Dunlop E.1, Wilson J.1, Pirastu N.1

1. University of Edinburgh

Rationale: Understanding complex traits, multifactorial disease and their underlying biology is a primary goal of genetics. GWAS has seen much success in the discovery of loci related to complex traits; however, it can be challenging to pinpoint causal SNPs and genes. One method to tackle this has been the incorporation of eQTL data with GWAS loci to improve the power of discovery for causal genes: TWAS. This method has the limitation of only examining genes in isolation and not in the context of relevant biological pathways.
Method: Here we present a method in which we stratify transcribed genes by biological pathways. We combine per-gene polygenic scores (PRS), based on GTEx eQTLs in the PredictDB dataset, into one overall pathway score. By then exploiting a defined and measured protein end-point, it is possible to estimate the relative contribution of each gene to pathway functionality. The PRSpathway scores were trained in the isolated cohort ORCADES using elastic net penalised regression and tested in the Vis cohort.
Results: In the testing set the method was successful in predicting the protein end point in six instances, significant after Bonferroni correction. These pathways are primarily those related to infectious disease response, including NOD-like receptor signalling, with a predictor for the MAPK signalling pathway significant in a linear regression model.
Conclusion: Gene pathway scoring offers the prospect of more powerful and holistic analysis of GWAS results, with the potential to investigate and discover causal pathways for complex traits.

10:15 - 10:30 Nele Taba: Causal relationships between dietary items and blood metabolites using Mendelian randomization
Taba N.1, Fischer K.1, Esko T.1, Metspalu A.1, Pirastu N.2

1. University of Tartu; 2. University of Edinburgh

Nutrition plays an important role in the development and progress of several diseases, which in turn creates a high burden for individuals, society and health-care. In many cases the mechanism by which food acts on health is still unclear. Blood metabolites are one candidate for filling this gap. Thus detecting causal relationships between dietary choices and biomarkers might give more insight into the mechanism by which food affects health.
To try to answer these questions we turned to Mendelian Randomization. The food exposure SNPs were selected amongst those coming from a recent food consumption GWAS performed in UK Biobank (Pirastu et al., 2019). We included 25 individual and 14 principal component (PC) traits reflecting different levels of dietary patterns. To identify which SNPs directly influence the exposure of interest, we additionally used the Corrected to Raw ratio filtering (Pirastu et al., 2019). As outcomes we used the results from a GWAS conducted on 123 blood metabolites (Kettunen et al., 2016).
After correction for multiple testing, 48 food-metabolite pairs were significant. For example, Psychoactive PC1 (which reflects higher consumption of both alcohol and coffee) increases cholesterol levels in low density lipoprotein and in intermediate density lipoprotein, which are both known to have adverse effects on cardiovascular health. Furthermore, we saw a significant effect of Psychoactive PC1 on Apolipoprotein-B, suggesting that part of the effect of alcohol on CVD may be mediated through this mechanism. This indicates the potential of our approach in explaining food-health relationships.
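As an illustration of the kind of calculation involved, the sketch below computes an inverse-variance weighted (IVW) Mendelian randomization estimate from per-SNP summary statistics. IVW is a standard estimator chosen here for illustration; the abstract does not state which estimator the authors used, and the function name and all numbers are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the exposure (e.g. a dietary trait)
    beta_outcome:  SNP effects on the outcome (e.g. a blood metabolite)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error).
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den, (1.0 / den) ** 0.5


# Hypothetical summary statistics for three instrument SNPs.
bx = [0.10, 0.05, 0.08]
by = [0.020, 0.011, 0.015]
se = [0.005, 0.004, 0.006]
estimate, std_err = ivw_estimate(bx, by, se)
print(f"IVW estimate: {estimate:.3f} (SE {std_err:.3f})")
```

The estimate is a weighted regression of outcome effects on exposure effects through the origin, so SNPs with more precisely measured outcome effects contribute more.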

10:30 - 11:00 Coffee break and Questions
~10:30 – 10:45 Separate rooms will be set up for Jim Wilson and each of the Session 1 speakers for additional questions and discussion
11:00 - 12:00 Session 2: Genetic mechanisms
Chairs: Jim Wilson, Nabila Bouatia-Naji
11:00 - 11:15 Andrew Whalen: Large scale livestock genomic data can be used to accurately detect recombination in hundreds of thousands of individuals
Whalen A.1, Johnsson M.1, Ros-Freixedes R.1, Chen C.-Y.2, Herring W.2, Hickey J.1

1. The Roslin Institute, University of Edinburgh; 2. Genus PLC

Estimating the frequency and position of recombination is important for understanding the evolutionary history of the genome and detecting adaptive loci. Livestock pedigrees, particularly those with large numbers of genotyped individuals, present a valuable resource for understanding the recombination landscape. However, scaling statistical methods to detect recombination, such as multi-locus iterative peeling, in pedigrees of these sizes is challenging.
In this project, we developed a fast and accurate approximation of multi-locus iterative peeling which can scale to pedigrees with hundreds of thousands of genotyped individuals. We then applied this method to estimate where recombination occurred in 150,000 pigs across 9 commercial breeding lines.
We found that when using these data, we could accurately estimate (i) the total genetic map length for each chromosome, (ii) the within-chromosome recombination landscape, and (iii) sex differences in both the genetic map length and recombination landscape. We found large similarities in the number and locations of recombination events between breeding lines. Recombination was mildly heritable (h2 < 0.1), and a GWAS found three regions that were associated with an individual’s genome-wide recombination rate. This project demonstrates the feasibility of using large-scale pedigree and genomic data to accurately estimate recombination, and provides an important advance in applying multi-locus iterative peeling to populations of hundreds of thousands of genotyped individuals.

11:15 - 11:30 Ferdinando Insalata: Survival of the densest explains the expansion of mitochondrial deletions in skeletal muscle fibres
Insalata F.1, Hoitzing H.1, Aryaman J.1, Jones N.1

1. Imperial College London; 2. University of Cambridge

The expansion of deleted mitochondrial DNA (mtDNA) molecules has been linked to ageing, particularly in skeletal muscle fibres. Despite three decades of research, the mechanism underlying this phenomenon has remained unclear. Previous accounts have assigned a selective advantage to the mitochondrial deletions, but, in fact, cells can selectively remove defective mtDNA. Justified by our microscopic understanding of mtDNA genetics, we introduce the mechanism of stochastic survival of the densest, adding noise and spatial structure to the well-known generalised Lotka-Volterra model. Our physically motivated model reproduces the expansion of deletions without free parameters based on the enhanced carrying capacity of mutants, notably even if they are assigned a selective disadvantage. We establish that the expansion takes place in a wave-like fashion and provide a functional form for the wavespeed that predicts that it drops with copy number, in agreement with experimental data. This functional form suggests the relevance of existing drugs for slowing the waves of mutants. Our model is approximated by a reaction-diffusion system whose reaction term stems from the combination of noise and increased mutant carrying capacity. In parallel, we show that a standard model based on a replicative advantage for mutants cannot reproduce the features of the expansion. We show how survival of the densest can also account for the evolution of altruism and conclude by proposing skeletal muscle ageing as a candidate exemplar for the role of noise and spatial structure in yielding novel evolutionary phenomena.

11:30 - 11:45 Stefan Böhringer: Closed form haplotype frequency estimation with application to the KIR loci
van der Burg L.1, Baldauf H.2, Schetelig J.2, de Wreede L.1, Böhringer S.1

1. Leiden University Medical Center; 2. DKMS

Haplotype analysis can complement SNP based analyses, especially in genetically complex regions such as immune loci (KIR). However, such analyses are hampered by the computational burden, including the exponential growth in the number of possible haplotypes with the number of loci and slow convergence of expectation-maximization (EM) algorithms that are popular for problems with missing information, here phase information. We address these problems by developing a closed form formula for haplotype frequency estimation and by developing efficient grouping strategies that limit the number of haplotypes.
The closed form formula is developed by solving the problem for the bi-allelic, bi-locus setting where haplotype frequencies can be related to the sample covariance between SNPs and allele frequencies. The solution is then generalized to any number of loci and alleles. We prove consistency of the estimator and show in simulations that the estimator is fully efficient. The presented algorithm allows for grouping of haplotypes either in a data dependent way (e.g. haplotype frequencies), pre-specified classes (e.g. phylogenetic trees) or a combination thereof.
To illustrate the approach, we analyze a KIR (Killer-cell Immunoglobulin-like Receptor) data set measured on a cohort of healthy donors who donated hematopoietic cells for patients with acute myeloid leukemia or myelodysplastic syndromes. We demonstrate the effect of different grouping strategies to reduce the number of predictors in downstream regression analyses. Compared with a standard EM algorithm, we show a speedup of more than 100-fold for more than 6 loci.
Our algorithms make it possible to perform genome-wide haplotype analyses on a routine basis.
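The bi-allelic, two-locus case mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes random mating, under which the covariance between genotype allele counts at two SNPs equals twice the linkage disequilibrium coefficient D, and the function name is hypothetical.

```python
from statistics import mean


def two_locus_haplotype_freqs(g1, g2):
    """Closed-form haplotype frequency estimates for two bi-allelic SNPs.

    g1, g2: per-individual counts (0, 1 or 2) of the alternative allele.
    Assumes random mating, so that Cov(g1, g2) = 2 * D.
    Keys give the allele at locus 1 then locus 2 (1 = alternative allele).
    """
    n = len(g1)
    m1, m2 = mean(g1), mean(g2)
    p1, p2 = m1 / 2, m2 / 2  # alternative allele frequencies
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2)) / n
    d = cov / 2  # linkage disequilibrium coefficient D
    return {
        "11": p1 * p2 + d,
        "10": p1 * (1 - p2) - d,
        "01": (1 - p1) * p2 - d,
        "00": (1 - p1) * (1 - p2) + d,
    }


# Two SNPs in complete linkage disequilibrium: only haplotypes 11 and 00 occur.
print(two_locus_haplotype_freqs([0, 1, 1, 2], [0, 1, 1, 2]))
```

No iteration is needed, in contrast to EM; in finite samples an estimate may fall slightly outside [0, 1] and would need truncation, which this sketch omits.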

11:45 - 12:00 Ranran Zhai: Extracellular vesicles with specific surface proteins are associated with waist visceral fat
Shen X.1, Zhai R.2, Yang Z.2, Li T.2, Ning Z.3, Pawitan Y.3, Wilson J.1, Wu D.4

1. University of Edinburgh; 2. Sun Yat-sen University; 3. Karolinska Institutet; 4. Vesicode AB

The field of genomics has devoted tremendous effort to discovering the genetic regulation and mechanisms underlying complex traits and diseases. While many associations have been found, our understanding remains far from satisfactory, due to the high complexity of genetic architecture. Omics techniques provide an opportunity to look at the problem with better resolution, e.g. integration of genome-wide association results and single-cell omics information can sometimes give us a clue as to the cells in which diseases may develop.
As a proxy of cell-level biology, extracellular vesicles (EVs) have become popular candidate biological complexes to study the source of cell regulation of complex diseases. EVs carry a lot of biological information and are largely enriched in human plasma. Here, we utilized a novel technology to detect the presence of 120 candidate proteins across millions of single EVs. By integration with GWAS summary statistics, we identified combinations of genes coding for EV surface proteins that are associated with obesity-related traits such as waist circumference. We subsequently verified such associations by quantifying the EVs with these particular protein profiles and testing their associations with body fat measured by DEXA scans in 96 individuals from Orkney. We found that a lower abundance of EVs that carry both ITGB6 and ITGB8 indicates larger waist circumference, as well as more waist visceral fat. Our findings provide the first evidence that EVs with specific surface proteins are associated with obesity, suggesting visceral fat can be tested using plasma and shedding light on future EV biomarker discovery.

12:00 - 13:00 Lunch break and Questions
~12:00 – 12:15 Separate rooms will be set up for each of the Session 2 speakers for additional questions and discussion
13:00 - 13:45 Flash talks parallel sessions
Session 1 - Multi-trait methods
  • 13:00 Paul Timmers: Multivariate genomic scan of human ageing traits reveals novel loci and identifies haem metabolism as a human ageing pathway
  • 13:05 Gulnara Svishcheva: A new method for combining of genetically correlated traits by maximizing of their shared heritability
  • 13:10 Oluyomi Adesoji: A Simulation Study to Evaluate Existing Pleiotropy Detection Methods
  • 13:15 Igor Pupko: Epigenome-wide association study of longitudinal changes in blood metabolite levels from young- to middle adulthood
  • 13:20 Ayse Ulgen: Relationship with Breast Cancer Subtypes and Potential Predictive Factors from a North Cyprus Cohort Study
  • 13:25 Questions
Session 2 - Sequencing and imputation
  • 13:00 Anthony Herzig: Evaluating short-read whole-genome sequencing accuracy through pseudo-replication
  • 13:05 Cathal Ormond: A Comparison of Two Software Tools for Disease-Gene Prioritization for Family-Based Sequencing Studies
  • 13:10 Claire Dandine-Roulland: Use of external controls for Whole Genome Sequencing data – Quality Control considerations
  • 13:15 Zhi Ming Xu: Designing a Tanzanian specific SNP array add-on to capture population-specific genetic variations and to improve genotype imputation
  • 13:20 Gopal Krishna Dila: Circular code and genetic code
  • 13:25 Julia Höglund: Improved power and precision with whole genome sequencing data in genome-wide association studies of inflammatory biomarkers
  • 13:30 Questions
Session 3 - Mendelian randomisation
  • 13:00 Zhijian Yang: Triangulation of analysis strategies links complex traits to specific tissues and cell types
  • 13:05 Richard Howey: Application of Bayesian Networks to Rheumatoid Arthritis and Intermediate Biological Marker Data
  • 13:10 Maria Carolina Borges: The causal relevance of fatty acids in the development of cardiovascular diseases
  • 13:15 Takiy eddine Berrandou: Assessment of a potential mediation effect of FMD on the genetic association with SCAD
  • 13:20 Torgny Karlsson: Visceral adiposity and its impact on type 2 diabetes development – the “rise-and-fall” of a causal effect
  • 13:25 Questions
13:45 - 14:00 Break
14:00 - 14:45 Keynote lecture - Aurélie Labbe: MDiNE: A model to estimate differential co-occurrence networks in microbiome studies
14:45 - 15:45 Session 3: Multi-trait methods
Chairs: Eleonora Porcu, Reedik Mägi
14:45 - 15:00 Hélène Ruffieux: A global-local variational approach for detecting hotspots in molecular quantitative trait locus studies
Ruffieux H.1

1. MRC Biostatistics Unit, University of Cambridge

We tackle modelling and inference for variable selection in the context of molecular quantitative trait locus (QTL) studies. We focus on detecting hotspots, i.e., genetic variants which, by controlling remotely the levels of many gene products, may shape the architecture of the genome and initiate decisive functional mechanisms underlying disease endpoints.
Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to the dimensions of the predictor and response spaces encountered in QTL applications.
We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable for the current QTL studies. Our novel framework allows information-sharing across outcomes and variants, thereby enhancing the detection of weak effects, and directly controls the hotspot propensity via a dedicated top-level representation. In particular, it implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior: its global-local formulation shrinks noise globally and hence accommodates the highly sparse nature of genetic analyses, while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
We illustrate the merits of our approach in an expression QTL study of monocytes after immune stimulation.
Software is available at https://github.com/hruffieux/atlasqtl.

15:00 - 15:15 Marika Kaakinen: imputeSCOPA: a fast, random forest-based phenotype imputation tool for large-scale studies
Kaakinen M.1, Anasanti M.2, Jarvelin M.-R.2, Prokopenko I.1

1. University of Surrey; 2. Imperial College London

Missing data are ubiquitous but often ignored, leading to loss of power and biased parameter estimates. We investigated properties of missing data imputation methods within multi-phenotype genome-wide association studies (MP-GWAS), focussing on single and multiple imputation (SI/MI) using a Bayesian approach (MICE) and expectation-maximisation bootstrapping (EMB), k-nearest neighbour (kNN), a left-censored imputation method (QRILC) and random forest (RF). We simulated genetic data for 5,000/50,000/500,000 individuals using Hapgen2, together with highly (r=0.64) and moderately (r=0.33) correlated phenotypes (3/9/30/120). We randomly selected common, low-frequency and rare variants to be significantly (P<5×10−8) associated with the simulated phenotypes. We considered several proportions of missing data (1/5/20/50%) under missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) mechanisms. We used the Root Mean Squared Error (RMSE) to evaluate the estimates from imputed vs. full data analysis. RF and MI-EMB had the least biased estimates under MCAR. RF also performed best under MAR, whereas QRILC performed best under MNAR. RF was applied to the Northern Finland Birth Cohorts (NFBC) 1966 and 1986 (N=4,955 and N=2,687, respectively) for the imputation of anthropometric and glycaemic measurements and 149 serum metabolite levels. MP-GWAS of 31 amino acids showed a novel association at ADAMTS after imputation with RF (P=2.61×10−11 vs. P=5.68×10−7 in complete case analysis) and improved power at FCGR3B (P=1.86×10−9 vs. P=1.72×10−8). We implemented RF in imputeSCOPA, a user-friendly and computationally efficient software tool which is eight times faster than any currently available phenotype imputation software and is applicable to large-scale datasets such as the UK Biobank.
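The evaluation design described above (mask values, impute, compare with the full data by RMSE) can be sketched as follows. This is only an illustration under simplifying assumptions: mean imputation stands in for the methods actually compared, the RMSE is computed on imputed values and not on association estimates, and all function names are hypothetical.

```python
import random
from math import sqrt


def mask_mcar(values, proportion, rng):
    """Set a fixed proportion of entries to None, completely at random (MCAR)."""
    n_missing = round(proportion * len(values))
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


def impute_mean(values):
    """Placeholder imputation: replace missing entries with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def rmse(estimates, reference):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / len(reference))


rng = random.Random(1)
full = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # simulated phenotype
for proportion in (0.01, 0.05, 0.20, 0.50):
    imputed = impute_mean(mask_mcar(full, proportion, rng))
    print(proportion, round(rmse(imputed, full), 3))
```

Under MAR or MNAR the masking step would depend on observed or unobserved values respectively, which is where the choice of imputation method starts to matter.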

15:15 - 15:30 Ting Li: An atlas of pleiotropy for the human genome
Li T.1, Ning Z.2, Yang Z.1, Zhai R.1, Shen X.3

1. Sun Yat-sen University; 2. Karolinska Institutet; 3. University of Edinburgh

Pleiotropy describes a shared genetic basis across complex traits. A ubiquitous phenomenon across the genome, pleiotropy has been studied for over 100 years. Understanding the magnitude of pleiotropy of different genomic loci, genes, and genetic variants is of fundamental importance for dissecting the complexity of genetic architecture. Nevertheless, in order to measure the level of pleiotropy for a given gene or genetic variant, a statistical significance threshold has to be applied to quantify the number of regulated phenotypes. In addition, widespread phenotypic correlations need to be properly accounted for. We introduce a novel threshold-free modeling technique to estimate the “pleiotropicity” of a gene or genetic variant. We show that for different pleiotropic scenarios, the pleiotropicity parameter can be estimated without bias even if the signal-to-noise ratio of the genetic effects is very small. We perform a genome-wide evaluation of pleiotropicity and identify top pleiotropic loci of the human genome, such as the MHC. We also investigate the connection between linkage disequilibrium and pleiotropicity and provide useful functional insights. Integration of the estimated pleiotropicity and established GWAS summary statistics allows us to reveal more about the genetic basis of complex traits and diseases; for instance, our results show that the more complex the genetic architecture of the phenotypes, the more their heritabilities are enriched in highly pleiotropic regions of the human genome.

15:30 - 15:45 Jennifer Asimit: A Flexible and shared information Bayesian joint fine-mapping approach for multiple quantitative traits
Hernández N.1, Newcombe P.1, Sandhu M.2, Wallace C.1,2, Asimit J.1

1. MRC Biostatistics Unit, University of Cambridge; 2. Department of Medicine, University of Cambridge

Hundreds of genetic variants have been identified as associated with a spectrum of diseases and related traits, but pinpointing likely causal variants (fine-mapping) has been complicated by extended linkage disequilibrium (LD) between genetic variants, as well as finite sample sizes. Fine-mapping inaccuracies often occur when there are two or more distinct causal variants that are both correlated with a single non-causal variant. This motivates the development of joint fine-mapping that leverages information between multiple traits with shared aetiology in a Bayesian framework. Prior model probabilities for each trait can be formulated to favour combinations of models which share causal variants, enabling information to be borrowed between outcomes. Using summary statistics, flashfm (FLexible And SHared information Fine-Mapping) fine-maps association signals for multiple quantitative traits, allowing for missing trait measurements and related individuals. Our method does not assume that the traits must share any or all causal variants. Simulation studies demonstrate that this joint approach has greater accuracy than single outcome analyses when shared causal variants are present, yet no precision loss if there is no sharing. We jointly fine-map association signals for 34 cardiometabolic traits in a Uganda cohort that includes related individuals. Our proposed approach is computationally efficient and exploits relatedness of traits in current sample sizes to increase fine-mapping resolution, at lower cost and more feasibly than collecting larger samples.

15:45 - 16:15 Coffee break and Questions
~15:45 – 16:00 Separate rooms will be set up for Aurélie Labbe and each of the Session 3 speakers for additional questions and discussion
16:15 - 16:45 Invited speaker - Jonathan Marchini: Efficient whole genome regression of binary and quantitative phenotypes
16:45 - 17:45 Session 4: Various statistical approaches
Chairs: Stefan Böhringer, Jennifer Asimit
16:45 - 17:00 Yakov Tsepilov: Loci and genes involved in chronic musculoskeletal pain identified via analysis of genetically orthogonal pain phenotypes
Tsepilov Y.1, Freidin M.2, Shadrina A.1, Elgaeva E.1, Sharapov S.1, van Zunder J.3, Karssen L.4, Suri P.5, Williams F.2, Aulchenko Y.1

1. Novosibirsk State University; 2. King’s College London; 3. Maastricht University Medical Centre; 4. PolyOmica, The Netherlands; 5. University of Washington, Seattle

Chronic musculoskeletal pain has a negative impact on all aspects of human life. Genetic studies of pain are complicated by the high complexity and heterogeneity of pain phenotypes. In this research, we aimed to reduce phenotype heterogeneity and reveal genes and pathways shared by chronic musculoskeletal pain at four locations: back, neck/shoulder, hip, and knee. Our study was based on the results of genome-wide association studies performed using UK Biobank data with a total sample size of 456,000 individuals. We applied principal component analysis based on the matrix of genetic covariances between the studied pain traits and constructed four genetically independent phenotypes (GIPs). The leading GIP (GIP1) explained the largest proportion of the genetic variance (78.4%). We identified and replicated five loci associated with GIP1 and one locus associated with GIP2. The genes confidently prioritized for the GIP1-associated loci were SLC39A8, ECM1, and FOXP2. For the remaining two GIP1-associated loci, we proposed several candidates but were unable to prioritize any of them convincingly. The most likely causal gene in the locus associated with GIP2 was GDF5. For GIP1, gene set/tissue/cell type enrichment analyses identified multiple terms related to the nervous system. Genetic correlation analysis revealed a genetic overlap between GIP1 and osteoarthritis as well as a set of anthropometric, sociodemographic and psychiatric/personality traits. We suggest that GIP1 represents a biopsychological component of chronic musculoskeletal pain, related to physiological and psychological aspects and possibly reflecting pain perception and processing. The research has been conducted using the UK Biobank (project #18219).
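The construction of a leading genetically independent phenotype can be illustrated with a small sketch: take the leading eigenvector of a genetic covariance matrix as trait weights and report the share of total genetic variance it explains. This is an assumption-laden illustration, not the authors' pipeline; the power-iteration routine, its name and the 2x2 matrix are hypothetical.

```python
from math import sqrt


def leading_component(cov, iterations=500):
    """Leading eigenvector of a symmetric covariance matrix by power iteration.

    Returns (weights, eigenvalue, proportion_of_variance_explained).
    """
    n = len(cov)
    v = [1.0] + [0.5] * (n - 1)  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
    explained = eigenvalue / sum(cov[i][i] for i in range(n))
    return v, eigenvalue, explained


# Hypothetical genetic covariance matrix for two pain traits.
genetic_cov = [[2.0, 1.0],
               [1.0, 2.0]]
weights, eigenvalue, explained = leading_component(genetic_cov)
print([round(w, 3) for w in weights], round(eigenvalue, 3), round(explained, 3))
```

The remaining components would be obtained from the other eigenvectors, which are orthogonal to the first and therefore genetically uncorrelated with it.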

17:00 - 17:15 Kaur Alasoo: Matrix factorisation reveals cell-specific trans-acting regulatory variants controlling modules of co-expressed genes
Kolberg L.1, Alasoo K.1

1. University of Tartu

A major challenge in human genetics is translating non-coding GWAS loci into a mechanistic understanding of the disease-causing processes. Local gene expression quantitative trait loci (cis-eQTLs) regularly implicate multiple putative target genes whose disease relevance and function are often poorly understood. In contrast, genetic variants that are associated with the expression of multiple target genes in trans have the potential to directly identify the cellular processes affected by disease variants. However, trans-eQTLs are difficult to detect due to the small sample sizes of current eQTL datasets and the large number of genes tested. Here, we have jointly analysed five eQTL datasets profiling gene expression in naive and stimulated B-cells, T-cells, monocytes, neutrophils and platelets from up to 710 individuals. To improve interpretability of trans-eQTLs and reduce the multiple testing burden, we used five matrix factorisation techniques to infer gene co-expression modules from expression data. We find that trans-eQTLs regulating co-expression modules are highly cell type specific and are often detected by a single matrix factorisation approach. These include established trans-eQTLs, such as the platelet-specific ARHGEF3 locus associated with mean platelet volume and the monocyte-specific IFNB1 locus associated with activation of genes downstream of the type 1 interferon signalling pathway upon LPS stimulation. Co-expression modules under cell-type specific genetic control also exhibit higher variance in the cell types where the associations are detected, suggesting that our results are not simply an artefact of limited power. Thus the contexts in which trans-eQTLs are active are likely to be missed when studying bulk tissues such as whole blood.

17:15 - 17:30 Xia Shen: High-definition likelihood inference of genetic correlations across human complex traits
Ning Z.1, Pawitan Y.1, Shen X.2

1. Karolinska Institutet; 2. University of Edinburgh

Genetic correlation is a central parameter for understanding the shared genetic architecture between complex traits and diseases. Making use of summary-level genome-wide association study (GWAS) data resources, LD Score regression (LDSC) was developed for unbiased estimation of genetic correlation. Though easy to use, LDSC only uses a small part of all the linkage disequilibrium (LD) information in the modeling of summary association statistics. In contrast, by fully accounting for LD information across the human genome, we develop a High-Definition Likelihood (HDL) method to improve the precision in genetic correlation estimation. Compared to LDSC, HDL reduces the variance of a genetic correlation estimate by about 60%, which is equivalent to a 2.5-fold increase in sample size. We implement HDL and LDSC to estimate 435 genetic correlations amongst 30 behavioral and disease-related phenotypes measured in UK Biobank. In addition to 154 genetic correlations significant for both methods, HDL identifies another 57 significant genetic correlations compared to only another 2 by LDSC. In summary, HDL brings more power to genome-wide analyses and can better reveal the underlying connections across human complex traits.
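The two figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the variance of an estimate scales inversely with sample size, which is the usual large-sample approximation and not something the abstract states; the function name is hypothetical.

```python
from math import comb


def equivalent_sample_size_factor(variance_reduction):
    """Sample-size multiplier that would give the same variance reduction,
    assuming the variance of the estimate is proportional to 1/N."""
    return 1.0 / (1.0 - variance_reduction)


# A 60% reduction in variance corresponds to about a 2.5-fold larger sample.
print(equivalent_sample_size_factor(0.60))

# Number of distinct trait pairs amongst 30 phenotypes.
print(comb(30, 2))  # 435
```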

17:30 - 17:45 Simone Rubinacci: Fast imputation of low coverage sequencing data from very large reference panels
Rubinacci S.1, Ribeiro D.1, Hofmeister R.1, Delaneau O.1

1. University of Lausanne

Introduction:
Low coverage whole-genome sequencing (0.5x-1x) followed by imputation has been shown to recapitulate the same signals and discover new variants compared to imputation of SNP arrays (Pasaniuc, 2012; Gilly, 2019). However, imputation methods are computationally expensive and large reference panels cannot be used due to model constraints.
Material and methods:
We describe Low Coverage Caller (LCC), a method for genotype imputation of low coverage sequencing datasets. The model, based on a hidden Markov model (Li and Stephens, 2003), has two key features. First, it uses a linear-time sampling algorithm for haplotype configurations. Second, it uses a procedure to reduce the state space by selecting a subset of highly confident haplotypes. This allows LCC to be efficient while leveraging information from very large reference panels of haplotypes.
Results:
We use high-coverage data from the 1000 Genomes Project and run LCC and Beagle4.1 on down-sampled coverages in the range 0.1x-8.0x. We also perform imputation on 35 different SNP array models using Beagle5.1. For all the experiments, we use the HRC as a reference panel. We show that our method is more accurate and orders of magnitude faster than other low-coverage sequencing imputation methods. We also show that imputation from 0.5x and 0.8x outperforms imputation of Illumina Global Screening Array and Omni2.5, respectively. This is particularly true at extremely rare variants, where there is an accuracy boost of ~20%.
Conclusions:
LCC has a limited computational overhead and outperforms standard imputation from SNP arrays, allowing large-scale association studies to be based on low coverage sequencing.

17:45 - 18:00 Additional Questions
~17:45 – 18:00 Separate rooms will be set up for Jonathan Marchini and each of the Session 4 speakers for additional questions and discussion

17 April

9:00 - 9:30 Invited lecture - Jack Bowden: Design and analysis strategies to account for weak instrument bias, pleiotropy and winner’s curse in Mendelian randomization investigations
9:30 - 10:30 Session 5: Mendelian randomisation (I)
Chairs: Krista Fischer, Xia Shen
9:30 - 9:45 Eleanor Sanderson: Testing and Correcting for Weak and Pleiotropic Instruments in Two-Sample Multivariable Mendelian Randomisation
Sanderson E.1, Spiller W.1, Bowden J.2

1. University of Bristol; 2. University of Exeter

Multivariable Mendelian Randomisation (MVMR) is a form of instrumental variable analysis which estimates the direct effect of multiple exposures on an outcome using genetic variants as instruments. Mendelian Randomisation (MR) and MVMR are frequently conducted using two-sample summary data where the associations of the genetic variants with the exposure and outcome are obtained from separate samples. If the genetic variants are only weakly associated with the exposures either individually or conditionally, given the other exposures in the model, then standard inverse variance weighting will yield biased estimates for the effect of each exposure, i.e. there will be ‘weak instrument bias’. We develop a two-sample conditional F-statistic to test whether the genetic variants strongly predict each exposure conditional on the other exposures included in an MVMR model. We show that this test is equivalent to the individual level data conditional F-statistic, indicating that the conventional rule-of-thumb of F > 10 can be used to test for weak instruments. We then demonstrate how reliable estimates of the causal effect of each exposure on the outcome can be obtained in the presence of weak instruments and pleiotropy, through minimisation of an appropriate heterogeneity Q-statistic. Furthermore, this same Q-statistic yields an exact test for heterogeneity due to pleiotropy. We illustrate our methods and how both the Q and F statistics can be used to guide the choice of genetic instruments and exposure variables in an MVMR analysis with an application to estimate the effect of blood lipid fractions on age-related macular degeneration.
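
For orientation, the "standard inverse variance weighting" that the abstract above takes as its starting point can be sketched as a weighted least-squares regression of SNP-outcome associations on SNP-exposure associations, with no intercept. This is a minimal illustration on simulated summary statistics, not the authors' conditional F-statistic or Q-statistic method; the function name, the simulated effect sizes and the standard errors are all assumptions. It also treats the SNP-exposure associations as known without error, which is exactly the simplification that gives rise to weak instrument bias.

```python
import numpy as np


def mvmr_ivw(beta_exposures, beta_outcome, se_outcome):
    """Multivariable IVW estimate from two-sample summary data.

    beta_exposures: (n_snps, n_exposures) SNP-exposure associations
    beta_outcome:   (n_snps,) SNP-outcome associations
    se_outcome:     (n_snps,) standard errors of the SNP-outcome associations
    Returns the weighted least-squares coefficients (no intercept),
    using weights 1 / se_outcome**2.
    """
    X = np.asarray(beta_exposures, dtype=float)
    y = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2
    XtWX = X.T @ (X * w[:, None])
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)


# Simulated summary statistics for two exposures (illustrative values only).
rng = np.random.default_rng(0)
n_snps = 200
bx = rng.normal(0.0, 0.1, size=(n_snps, 2))
true_effects = np.array([0.3, -0.2])
se_y = np.full(n_snps, 0.01)
by = bx @ true_effects + rng.normal(0.0, se_y)

estimates = mvmr_ivw(bx, by, se_y)
print(estimates)
```

With strong simulated instruments the estimates land close to the simulated effects; shrinking the spread of `bx` towards the size of its (here ignored) estimation error is the regime the abstract is concerned with.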

9:45 - 10:00 Tom Richardson: Leveraging naturally occurring genetic variation to disentangle the effects of multiple risk factors on disease
Richardson T.1, Sanderson E.1, Tilling K.1, Holmes M.2, Davey Smith G.1

1. MRC Integrative Epidemiology Unit; 2. MRC Population Health Research Unit

Differentiating between causal and correlated risk factors is of critical importance for disease prevention. Genetic variation inherited at birth is typically robust to confounding and reverse causation and can be harnessed to separate the effects of multiple risk factors on disease using an approach called multivariable Mendelian randomization.
We first demonstrate this by investigating the effects of various lipid traits on coronary heart disease (CHD). For example, when analysed independently HDL cholesterol appears to be protective towards CHD (OR=0.80, 95% CI=0.75-0.86), but there was only weak evidence of a direct effect when adjusting for other lipids (OR=0.91, 95% CI=0.74-1.12). In contrast, our results suggest that apolipoprotein B is the predominant trait which accounts for the aetiological relationship between lipids and CHD (OR=1.68, 95% CI=1.54-1.84).
The effects of childhood and adult obesity on disease are likewise extremely challenging to disentangle in an observational setting. However, separating them using genetic variation provides evidence that childhood adiposity has an indirect effect on CHD and type 2 diabetes along the causal pathway via adult adiposity (OR=1.49, 95% CI=1.33-1.68 and OR=2.32, 95% CI=1.76-3.05 respectively). This suggests that the detrimental impact of childhood obesity on these outcomes is likely attributable to individuals remaining overweight into adulthood. Conversely, childhood adiposity appeared to have a more direct effect on breast cancer risk (OR=0.59, 95% CI=0.50-0.71), which we postulate is due to its influence on earlier timing of puberty and sex hormone levels.
The current wealth of phenotypically rich datasets provides an unprecedented opportunity to dissect disease pathways using Mendelian randomization.

10:00 - 10:15 Nicola Pirastu: Using genetics to disentangle the complex relationship between food choices and health status
Pirastu N.1, McDonnell C.1, Grzeszkowiak E.1, Mounier N.2, Imamura F.3, Day F.3, Zheng J.4, Taba N.5, Esko T.5, Joshi P.1 et al.

1. Usher Institute, University of Edinburgh; 2. Center for Primary Care and Public Health, University of Lausanne; 3. MRC Epidemiology Unit, Institute of Metabolic Science, Cambridge Biomedical Campus, University of Cambridge School of Clinical Medicine; 4. MRC Integrative Epidemiology Unit, Bristol Medical School; 5. Estonian Genome Center, Institute of Genomics, University of Tartu

Food choices are one of the most important factors influencing health, thus understanding their genetic determinants may shed light on important biological mechanisms underlying disease. However, nutritional epidemiology is plagued by biases and confounding, where for example, health status or education affect both behaviour and reporting of consumption. We thus applied a novel post-GWAS Mendelian Randomisation (MR)-based correction method to measure and adjust for the effect of health-related traits on food choice/reporting. We show that these risk factors and diseases account for up to 42% of the genetic variance of reported food intake in UK Biobank. Bias-corrected GWAS on 29 food consumption traits in up to 445,779 individuals identifies numerous robustly associated loci. Simulations and results from genes with known function show that the ratio between the corrected and raw results (CRR) can be used to distinguish loci directly influencing food choice, allowing selection of more valid instrumental variables for MR and better gene prioritisation for functional follow-up.
We show that the use of direct-effect-only instruments can substantially change the results of MR analysis. The corrected results show numerous causal food-health relationships, shedding new light on nutritional epidemiology. Our new framework is a powerful tool to circumvent bias and confounding in nutritional epidemiology and can be used in all those cases where the exposure and the outcome are mutually causal. Moreover, it helps to identify those genes which directly influence the outcome of interest, guiding the prioritisation for functional studies.

10:15 - 10:30 Liza Darrous: Simultaneous estimation of bi-directional causal effects and heritable confounding from GWAS summary statistics
Darrous L.1, Mounier N.1, Kutalik Z.1

1. University Center for Primary Care and Public Health, University of Lausanne

Introduction: Mendelian Randomisation (MR), an increasingly popular method that estimates the causal effects of risk factors on complex human traits, has seen several extensions that relax its basic assumptions. However, most of these extensions suffer from two major limitations: under-exploitation of genome-wide markers, and sensitivity to the presence of a heritable confounder of the exposure-outcome relationship. To overcome these limitations, we propose a Latent Heritable Confounder MR (LHC-MR) method applicable to association summary statistics, which simultaneously estimates bi-directional causal effects, direct heritability, and confounder effects while accounting for sample overlap.
Results: We demonstrate that LHC-MR outperforms several existing MR methods in terms of bias and variance for a wide range of simulation settings and apply it to summary statistics of 13 complex traits. Besides several concordant results, LHC-MR unravelled new mechanisms (how being diagnosed for certain diseases might lead to improved lifestyle) and revealed potential false positive findings of standard MR methods (apparent causal effect of body mass index on educational attainment may be driven by a strong ignored confounder). Phenome-wide search to identify LHC-implied heritable confounders showed remarkable agreement between the LHC-estimated causal effects of the latent confounder and those for the potentially identified ones. Finally, LHC-MR naturally decomposes genetic correlation to causal effect-driven and confounder-driven contributions, demonstrating that the genetic correlation between systolic blood pressure and diabetes is predominantly confounder-driven.
Conclusion: LHC-MR, a structural equation mixed effect model, is a novel method that tackles both causal inference with increased precision and the genetic architecture of complex disease.

10:30 - 11:00 Coffee break and Questions
~10:30 – 10:45 Separate rooms will be set up for Jack Bowden and each of the Session 5 speakers for additional questions and discussion
11:00 - 12:00 Session 6: Mendelian randomisation (II)
Chairs: Jack Bowden, Matthew Robinson
11:00 - 11:15 Chin Yang Shapland: Profile-likelihood Bayesian model averaging for two-sample summary data Mendelian randomization in the presence of horizontal pleiotropy
Shapland C.1, Zhao Q.2, Bowden J.3

1. University of Bristol; 2. University of Cambridge; 3. University of Exeter

Two-sample summary data Mendelian randomisation is a popular method for assessing causality in epidemiology, by using multiple genetic variants as instrumental variables. If genetic variants exert pleiotropic effects on the outcome not through the exposure of interest, this leads to heterogeneous and potentially biased estimates of causal effect. It is possible in theory to detect and remove outlying variants, but this can lead to an under-estimation of the standard error and conflate weak instrument bias with pleiotropy.
Rather than detecting and removing outlying estimates or attempting to search all possible subsets, we investigate the use of Bayesian model averaging to preferentially search the space of models with the highest posterior likelihood. We develop a bespoke Metropolis-Hastings algorithm to perform the search and use the profile likelihood of Zhao et al. to define a posterior distribution that efficiently accounts for pleiotropic and weak instrument bias. In keeping with the Bayesian framework, our method also allows prior knowledge on the validity of each variant to be seamlessly included. We demonstrate how our general modelling approach can be extended from a standard one-parameter causal model to a two-parameter model, which allows a large proportion of SNPs to violate the Instrument Strength Independent of Direct Effect (InSIDE) assumption.
We use Monte Carlo simulations and real data examples to illustrate our approach and compare it to several related approaches, highlighting its relative strengths and weaknesses in outlier detection and causal estimation.

11:15 - 11:30 Jonathan Sulc: Heterogeneity in obesity and its consequences on health
Sulc J.1, Sonrel A.1,2, Mounier N.1, Draganski B.3, Kutalik Z.1

1. University of Lausanne; 2. University of Zürich; 3. Lausanne University Hospital, Switzerland

Obesity-associated SNPs have mostly been tested for only one trait in isolation and their joint impact on fat/lean mass accumulation/distribution and downstream effects on health and quality of life remain poorly understood.
We applied principal component analysis on the effect estimates of SNPs on fourteen measures of body morphology from the UK Biobank to identify the genetic axes of variation giving rise to differences in body shape and composition. This provided three independent components affecting overall body size, body composition, and body fat distribution, respectively. Our method developed for composite trait Mendelian randomization revealed that these components have both shared and specific effects on health outcomes and quality of life. Of particular interest is the component shifting subcutaneous to visceral fat, which increased the risk of many obesity-related diseases (such as diabetes, hypertension, hypercholesterolemia, and coronary artery disease) despite being neutral in terms of body mass index and total body fat percentage. A shift in mass from lean to adipose prominently impacted lifestyle, increasing alcohol consumption and smoking. Sex-stratified analyses showed that increased adiposity leads to a greater increase in the risk of heart disease in men than in women. Enrichment analyses suggest that brain and nervous tissues contribute most to body size and composition, whereas genes highly expressed in adipose tissue and during development are more likely to affect body fat distribution. These genetic components provide a basis to better understand the mechanisms underlying inter-individual differences in body fat accumulation and distribution, as well as the consequences they have on health.
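
The first step described above, principal component analysis of a SNP-by-trait matrix of effect estimates, can be sketched as follows. This is a generic illustration on random numbers, not the authors' pipeline: the abstract does not say how effects were scaled or centred, so the column centring, the matrix dimensions and the function name are assumptions.

```python
import numpy as np


def pca_of_effects(effects, n_components=3):
    """PCA of a (n_snps, n_traits) matrix of SNP effect estimates.

    Returns the trait loadings of the leading components (one row per
    component) and the fraction of variance each one explains.
    """
    Z = np.asarray(effects, dtype=float)
    Z = Z - Z.mean(axis=0)  # centre each trait column (assumed preprocessing)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return Vt[:n_components], explained[:n_components]


# Placeholder data: 500 SNPs by 14 traits, standing in for real effect estimates.
rng = np.random.default_rng(1)
effects = rng.normal(size=(500, 14))

loadings, explained = pca_of_effects(effects)
print(loadings.shape, explained)
```

Each row of `loadings` is one axis of variation expressed as weights on the fourteen traits; the rows are mutually orthogonal, which is the sense in which the components are independent.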

11:30 - 11:45 Linda Zollner: Mapuche ancestry and gallbladder cancer risk: Causality or endogeneity bias?
Zollner L.1, Lorenzo Bermejo J.1

1. Institute of Medical Biometry and Informatics, University of Heidelberg

The incidence of gallbladder cancer (GBC) in Chile is the highest in the world and has been associated with the individual proportion of Native American ancestry, in particular with Mapuche ancestry (Mapuche Chileans live in the south of the country). Since association does not imply causation, individuals with large proportions of Mapuche ancestry could show specific risk exposures and worse access to the health system. Therefore, we took advantage of ancestry informative markers (AIMs) and applied Mendelian randomisation (MR) to test causality between Mapuche ancestry and GBC risk.
By estimating the informativeness for assignment measure (IN), we selected IN-AIMs with distinct allele frequencies in Mapuche and other original Chilean populations, namely Europeans, Africans and Native Americans from northern Chile (Aymara and Quechua). The selected IN-AIMs were utilized as instrumental variables for the individual proportion of Mapuche ancestry in two-sample-MR (sample 1: 1,800 Chileans from the whole country, sample 2: 255 Chilean case-control pairs). We found evidence for a causal effect of Mapuche ancestry on GBC risk: inverse-variance-weighted OR per 1% increase in Mapuche proportion 1.02, 95% CI (1.01-1.03), Pval = 0.0001. Radial-MR was applied to identify and subsequently exclude outlying instruments and we checked different combinations of genetic principal components to examine the potential effects of population stratification unrelated to Mapuche ancestry. The results of these sensitivity analyses confirmed the causal association between Mapuche ancestry and GBC risk. We are currently applying two-step-MR to investigate the mediating effect of BMI on this causal relationship.

11:45 - 12:00 Linda Repetto: A genomic meta-analysis of 184 neuroproteins and their implied causality on psychiatric disorders
Repetto L.1, Navarro P.1, Wilson J.1, Shen X.2

1. The University of Edinburgh; 2. Sun Yat-sen University

Protein quantitative trait loci (pQTL) are essential to study the molecular basis of complex diseases, as they provide insights on the role of genetic variation in determining protein levels that modulate an individual’s metabolic state. We quantified 184 proteins involved in neurological processes using the Olink Neurology and Neuro-exploratory panels in 1070 individuals of the Orkney Complex Disease Study (ORCADES) with genotypic information. For each protein, we performed a GWAS looking for loci associated with protein levels, both in the proximity of the protein-coding gene (in cis) and distantly (in trans). We discovered 48 cis- and 59 trans-pQTL for 95 neuroproteins. Enrichment analyses on the proteins' expression levels show that the neuroproteins with cis-pQTL display enriched expression in the brain. We then investigated the potential causal effect of protein level variation on psychiatric disorders, including major depression, schizophrenia, and bipolar disorder for the proteins with cis-pQTL. Using our pQTL study and summary-level data from a large 2018 GWAS for major depressive disorder (Wray et al., 2018), we discovered 12 additional loci associated with the disease. Two of these 12 loci were also discovered in a later 2019 GWAS meta-analysis for major depression (Howard et al., 2019), strengthening our novel results, which highlighted loci through neurological pQTL with potentially relevant roles in psychiatric disorders. With these promising results, we set up a SCALLOP meta-analysis of levels of the same proteins in seven cohorts with a maximum sample size of 12,000 individuals to reveal more biology underlying psychiatric diseases.

12:00 - 13:00 Lunch break and Questions
~12:00 – 12:15 Separate rooms will be set up for each of the Session 6 speakers for additional questions and discussion
13:00 - 13:45 Flash talks parallel sessions II
Session 4 - Omics analysis
  • 13:00 Arianna Landini: GWAS of transferrin N-glycans: one step closer to understanding the genetics of protein glycosylation
  • 13:05 Grace Png: Exploring the Genetic Architecture of the Human Neurological Proteome using Whole Genome Sequencing
  • 13:10 Kaur Alasoo: eQTL Catalogue: a compendium of uniformly processed human gene expression and splicing QTLs
  • 13:15 Georgia Katsoula: A comprehensive transcriptional map of knee osteoarthritis
  • 13:20 Carl Beuchel: Large-scale identification of links between the human blood metabolome and transcriptome
  • 13:25 Questions
Session 5 - Genetic associations
  • 13:00 Olivia Leavy: Genome-wide SNP-by-rs35705950 interaction analysis of susceptibility to idiopathic pulmonary fibrosis
  • 13:05 Sarah Djebali: Integrating three-dimensional organization of the genome into association studies: single SNP, SNP set and SNP-SNP interaction tests
  • 13:10 Saswati Saha: Two-stage GWAS approach for detecting the genetic variants and their epistatic interactions responsible for cardiac aging in Drosophila
  • 13:15 Anastassia Kolde: Martingale residual based approach for Cox modeling from high-dimensional data
  • 13:20 Mattia Tomasoni: The Genetics of Retinal Microvascular Features
  • 13:25 Questions
Session 6 - Genomic prediction
  • 13:00 Krista Fischer: Prediction and interpretation of the disease and mortality risks in biobank cohorts
  • 13:05 Sofia Khan: Predicting risk of neuropsychiatric disorders using genotypes and neural networks
  • 13:10 Ciarán Kelly: Exploring and Improving Deep Learning Methods for Genomic Prediction
  • 13:15 Hervé Perdry: Minimum variance unbiased estimator of genomic relatedness matrices
  • 13:20 Harmen Draisma: Simulation study of multi-predictor epigenome-wide association method performance
  • 13:25 Questions
14:00 - 14:30 Invited speaker - Tim Frayling: Genetic epidemiology for everyone’s medicine
14:30 - 15:15 Session 7: Methods leveraging genome annotations
Chairs: Peter Joshi, Sven Bergmann
14:30 - 14:45 Maria-Alexandra Katsara: Incorporating and validating the impact of priors on DNA prediction of external visible characteristics
Katsara M.-A.1, Nothnagel M.1, Branicki W.2, Pospiech E.2, Kayser M.3, Walsh S.4, Hysi P.5

1. University of Cologne – Cologne Center for Genomics; 2. Jagellonian University – Malopolska Centre of Biotechnology; 3. Erasmus MC University Medical Centre Rotterdam – Department of Genetic Identification; 4. Indiana University Purdue University – Department of Biology; 5. St Thomas Hospital – Department of Twin Research & Genetic Epidemiology

Predicting externally visible characteristics (EVCs) from genetic data, often referred to as Forensic DNA Phenotyping, has attracted major interest in forensic genetics in recent years. Several studies have recently developed and forensically validated predictive tools, comprising laboratory and statistical tools, for traits such as eye, hair and skin color. Using prior information, e.g. obtained from the trait prevalence across geographic regions or populations, may potentially improve prediction accuracy, but has not been investigated thus far. Here, we performed a systematic assessment of the impact of incorporating prior values on the prediction for a number of EVCs, including eye, hair and skin color, hair structure and freckles, with respect to commonly used performance measures. We also compared the performance of the prior-incorporating model to that of the prior-free model. We show that prediction is affected to a different degree for the different EVCs and also for single EVC categories. We further show that, although priors have the potential to increase prediction performance, misspecification of prior values can lead to dramatic losses in overall accuracy. Our results emphasize the importance of a precise specification of priors in order to achieve valid and accurate results. This research has received funding from the EU Horizon 2020 VISAGE project (no. 740580).

14:45 - 15:00 Marion Patxot: Conditional estimation of the contribution of genomic annotations to common complex diseases
Patxot M.1, Kousathanas A.1, Ojavee S.1, Trejo Banos D.1, Robinson M. R.1

1. University of Lausanne

Despite efforts to estimate the contribution of genomic regions to common complex trait variation, the distribution of effect sizes across functional annotations remains unknown. Previous studies estimate enrichment across annotations in a series of follow-up analyses rather than utilizing functional information to assess enrichment conditional on the rest of the genome. Here, we introduce a scalable Bayesian model that utilizes genomic annotations and individual-level data to jointly estimate marker effects while accounting for LD. SNP effects for each annotation group are modelled as a series of normal distributions providing inference on the genetic architecture of genomic enrichment. Effect size distributions can be compared across annotations and the genetic variance for each group simultaneously quantified. In simulations we demonstrate our method gives unbiased estimates of annotation enrichment under extreme LD scenarios. We apply the model to height, body mass index (BMI), type-2-diabetes (T2D) and coronary artery disease (CAD) in the UK Biobank. We find a stark contrast between BMI and height: BMI being highly polygenic with tiny effect sizes located 10 to 1000 kb from exons whereas height shows a variety of effect sizes in coding and non-coding regions. T2D and CAD are highly polygenic and we also observe that promoter regions contribute less than 5% to the phenotypic variance of all four traits. This method provides a full quantification of genetic architectures underlying complex traits and determines which genomic regions are influential, improving disease risk prediction.

15:00 - 15:15 Ozvan Bocher: Finding the best testing units in rare variant association tests: an approach using variant pathogenicity scores
Bocher O.1, Ludwig T.2, Tournier-Lasserve E.3, Marenne G.4, Génin E.4

1. Université de Bretagne Occidentale, UMR1078; 2. CHU Brest, UMR1078; 3. UMR‐S1161; 4. UMR1078

Different methods have been proposed to test for association between disease and rare variants when sequence data are available on cases and controls. These methods require first the definition of the testing units and then the selection of eligible variants into these testing units. When the focus is on the exonic parts of the genome, typical testing units are the different genes delimited by their beginning and ending positions on the genome. However, this is not always the optimal way of defining testing units, as different parts of a gene may have different functionality. As an alternative, sliding window approaches have been developed that do not require pre-selecting the testing units based on functional annotations. These methods, however, are computationally intensive, with results that could differ depending on the choice of window sizes. Here, we propose to group rare variants into genomic units using pathogenicity scores of variants observed in gnomAD. We show on an exome-wide analysis of Moyamoya disease that using these regions makes it possible to find more significant regions than using the gene definition, by pinpointing the specific region of the gene with the highest association signal. In this example, our method outperformed WGScan, a recently proposed sliding window approach. These encouraging results obtained on the coding regions of the genome suggest that a similar approach could also be used in the non-coding regions of the genome, where we dramatically lack functional annotations to choose rare variant testing units.

15:15 - 15:45 Coffee break and Questions
~15:15 – 15:30 Separate rooms will be set up for Tim Frayling and each of the Session 7 speakers for additional questions and discussion
15:45 - 16:30 Session 8: Association methods
Chairs: Zoltán Kutalik, Nicola Pirastu
15:45 - 16:00 Sven Erik Ojavee: Discovery, estimation and prediction analysis using a Bayesian survival model for complex traits
Ojavee S.1, Trejo-Banos D.1, Patxot M.1, Fischer K.2, Kousathanas A.1, Robinson M.1

1. University of Lausanne; 2. University of Tartu

Time-to-event analysis using genotypic data may enable a better understanding of the mechanisms and underlying genetic architecture behind the onset and development of common complex disease. Typically, time-to-event data are not normally distributed, are right censored and in genomic studies, the number of covariates strongly exceeds the number of observations. Previous work has not adequately addressed these issues, potentially leading to underpowered models and biased effect size estimates.
Here, we propose a hierarchical Bayesian model which assumes that time-to-event has a Weibull distribution, handles sparsity with spike and slab variable selection and variance partitioning, considers right censoring, and yields estimates of the proportion of genetic variance explained (SNP heritability). Through a series of computational advances, the model can handle datasets of hundreds of thousands of people and millions of genetic markers by combining synchronous parallel Gibbs sampling and adaptive rejection sampling.
In simulations, our approach outperformed previous approaches and provided better genetic effect estimation. We then applied our model to UK Biobank data for a wide range of traits, including time-to-death, time to cardiovascular disease and time-to-menopause. We achieved 68% higher SNP heritability estimates compared to previous results for both time-to-menopause and time-to-menarche (SNP heritabilities of 0.26 and 0.42 respectively).
Our general framework enables more accurate discovery and estimation of the survival-related genomic marker effects providing novel insight into the genetic architecture of any very large-scale time-to-event data, and unprecedented statistical power to predict risk for medically important traits.
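
Two ingredients named in the abstract above, a Weibull time-to-event distribution and right censoring, combine in a standard way in the likelihood: an observed event contributes the log density, while a censored observation contributes only the log survival probability. The sketch below shows that likelihood for fixed shape and scale parameters. It is not the authors' hierarchical model (there are no SNP effects, priors or sampling here), and the parameterisation and function name are assumptions.

```python
import numpy as np


def weibull_loglik(times, events, shape, scale):
    """Log-likelihood of right-censored data under a Weibull(shape, scale).

    times:  observed follow-up times (> 0)
    events: 1 if the event was observed, 0 if the time is right-censored
    Survival function: S(t) = exp(-(t / scale) ** shape).
    """
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=float)
    cumulative_hazard = (t / scale) ** shape
    log_hazard = np.log(shape / scale) + (shape - 1.0) * np.log(t / scale)
    # Events contribute log hazard + log survival; censored times only log survival.
    return float(np.sum(d * log_hazard - cumulative_hazard))


# Tiny example: one observed event at t=1 and one censored time at t=2.
ll = weibull_loglik(times=[1.0, 2.0], events=[1, 0], shape=1.0, scale=1.0)
print(ll)
```

With shape 1 the Weibull reduces to an exponential distribution, which makes the example easy to verify by hand: the event contributes -1 and the censored observation -2.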

16:00 - 16:15 Märt Möls: Using offspring genotype instead of true genotype for survival analysis
Möls M.1, Mändul M.1, Fischer K.1

1. University of Tartu

As the follow-up time for biobanks is still relatively short, the number of observed events can be quite small, despite the impressive size of the overall cohort. Therefore, it has been proposed to use parental survival times as outcomes in such analyses, whereas offspring genotypes are used as covariates. However, estimating the genotype effect using offspring genotypes in a Cox proportional hazards model will lead to biased estimates. This simple approach is also theoretically questionable, as the proportional hazards model will not be a valid model for offspring genotypes if it holds for parental genotypes. We will demonstrate how one can modify the Cox proportional hazards model to be compatible with offspring genotypes. The modified model will also provide unbiased estimates for genotype effects (comparable to the results one would get by using the true genotypes in survival analysis).
The methodology will be illustrated using the Estonian Biobank data.

16:15 - 16:30 Peter Joshi: Sibling difference analyses reveal polygenic risk score confounding
Joshi P.1, Timmer P.1, Clark D.1, Wilson J.1

1. University of Edinburgh

Motivation: Polygenic risk scores are increasingly discussed for predicting disease risk and phenotypes (1), but confounding may still be present. Analysis of the effects of polygenic risk scores within sibling pairs (PRSsib) cannot plausibly be confounded by any postulated confounder we are aware of. At the same time, commercial ventures have started looking into in vitro genetic embryo selection amongst sibling embryos (PIGS (3)). PRSsib analysis can thus disentangle causality, confounding and genetic nurture, and reveal the magnitude of plausible effects of PIGS.
Method: Published independent summary statistics were used to analyse the effect of PRSsib in siblings from UK Biobank, and to see the possible effect of PIGS if the higher PRS of two sibs had been selected for.
Findings: We find the relative effect (ratio/se) of PRSsib as against PRS on the selected outcome is 95%/5% for height and 103%/4% for LDL, but 47%/4% for educational attainment (EA). In this sample, PIGS using these PRS and two siblings would have increased height/EA by {mean(se)} 0.8cm(0.1)/0.0 years(.02).
Conclusion: Embryonic screening for complex, especially social, traits appears ineffective.

16:30 - 17:15 Keynote lecture - Bogdan Pasaniuc: Integrative methods for biobank-scale genetic studies
17:15 - 17:30 EMGM 2021
17:30 - 17:45 Awards & farewell
17:45 - 18:00 Final Questions
~17:45 – 18:00 Separate rooms will be set up for Bogdan Pasaniuc and each of the Session 8 speakers for additional questions and discussion