Statistics Data Literacy: Applications of Statistics: Statistics and Genetics: A fruitful relationship

Tuesday, 23 July 2019


https://youtu.be/5lFHlPWcGMg

Contents of the presentations:

Title: Statistics and Genetics: A Fruitful Relationship - A brief history of the relationship between the two disciplines
Authors: Heather Cordell - Newcastle University

In this presentation, I shall briefly summarize the history of the relationship between the two disciplines of Statistics and Genetics, with a focus on selected highlights. Starting from the work of Gregor Mendel in 1866 (rediscovered independently at the turn of the century by Correns, de Vries, von Tschermak-Seysenegg and Spillman), it soon became clear that inherited traits obey simple statistical rules. However, these simple rules of Mendelian inheritance seemed at odds with the biometric approaches developed by Francis Galton (and promoted by Raphael Weldon and Karl Pearson), until R.A. Fisher’s seminal 1918 paper (and the subsequent work arising from it) effectively resolved the conundrum. Some specific areas in which genetics has impacted statistics or, conversely, statistics has impacted genetics will be discussed. These include epistasis, likelihood-based techniques (including REML), sequential testing, Approximate Bayesian Computation (ABC) and the False Discovery Rate (FDR).

-----------------------------------------------------------

Title: Inferring causality from genome-wide association studies

Authors: Toby Johnson - GlaxoSmithKline, United Kingdom

Over the last decade, genome-wide association studies (GWAS) have revolutionized our knowledge of DNA sequence variants that have strong and robust statistical associations with human diseases and traits. This is primarily due to increased sample sizes, which in turn result from lower costs of genotyping and DNA sequencing, broad adoption of collaborative meta-analytic approaches, and widespread access to data from the UK Biobank. Samples this large will almost surely include genetically related individuals, and making full use of these data motivates finding computationally tractable ways to fit billions of random effects models.
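The point about related individuals can be made a little more concrete. In a linear mixed model for GWAS data, y = Xβ + g + e, the random genetic effect g is usually given covariance σ_g²A, where A is a genetic relationship matrix (GRM) estimated from the genotypes, and variance components such as σ_g² are commonly estimated by REML. The sketch below computes one common, simplified form of the GRM from 0/1/2 allele counts, assuming allele frequencies are estimated from the sample itself; it is an illustration under those assumptions, not the method described in the talk, and the function name is hypothetical.

```python
import math
from typing import List


def genetic_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Standardised GRM from allele counts (hypothetical helper, for illustration).

    genotypes[i][j] is the count (0, 1 or 2) of the reference allele carried by
    individual i at SNP j. Allele frequencies are estimated from this sample.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Estimated allele frequency per SNP: total allele count / (2n).
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Drop monomorphic SNPs (p = 0 or 1), whose variance 2p(1-p) is zero.
    snps = [j for j, p in enumerate(freqs) if 0 < p < 1]
    if not snps:
        raise ValueError("no polymorphic SNPs")
    # Standardise each genotype: z_ij = (x_ij - 2p_j) / sqrt(2 p_j (1 - p_j)).
    z = [
        [(ind[j] - 2 * freqs[j]) / math.sqrt(2 * freqs[j] * (1 - freqs[j])) for j in snps]
        for ind in genotypes
    ]
    k = len(snps)
    # A[r][c] is the average over SNPs of z_rj * z_cj.
    return [[sum(a * b for a, b in zip(z[r], z[c])) / k for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    # Toy data: individuals 0 and 1 have identical genotypes.
    toy = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1]]
    for row in genetic_relationship_matrix(toy):
        print([round(v, 3) for v in row])
```

On the toy data the rows print as [0.5, 0.5, -1.0], [0.5, 0.5, -1.0] and [-1.0, -1.0, 2.0]: the two identical individuals get identical rows, and relatedness is measured relative to this small sample. The matrix has n² entries, which hints at why biobank-scale analyses need more tractable approaches.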
Although genetic associations are typically not susceptible to reverse causation or uncorrected confounding, inferring which genes or proteins (as opposed to DNA sequence variants) are causal for human diseases remains a challenging and unduly neglected problem. Some fully or approximately Bayesian methods show promise in this regard. I will describe our approaches to solving these problems, with application to selecting and validating targets for future drug discovery projects.

-----------------------------------------------------------

Title: Statistical challenges of the "Let's Measure Everything" era of human genetics

Authors: Luke Jostins-Dean - University of Oxford

Over the last decade, human geneticists have discovered a very large number of genetic variants that affect "headline" human traits (e.g. risk of common diseases). In parallel, our ability to measure broader "genome-adjacent" human phenotypes has exploded: researchers often measure gene expression levels across multiple tissues, epigenetic marks, metabolites, gut microbe abundances, etc. Geneticists analyse these diverse "-omics" datasets in order to trace causal pathways from genetic variants, through cellular and systemic phenotypes, to headline traits. This "Let's Measure Everything" approach to human genetics raises a new set of statistical challenges. In this talk, I will discuss some examples of these challenges and the attempts that have been made to address them. First, I will discuss the "large p, small N" problem in analysing gene expression data, where a large number of genes are tested using a small number of samples, and how variance stabilisation techniques have allowed these experiments to proceed successfully.
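Why a small number of samples is a problem can be shown with a short sketch. With only a few replicates, a gene's sample variance can be tiny by chance, which inflates its t-statistic. One widely used remedy, in the spirit of the empirical Bayes "moderated t" in the limma package, shrinks each gene's variance towards a value shared across all genes. Here the prior degrees of freedom d0 are fixed by hand and the prior variance is simply the mean of the gene-wise variances; limma estimates both from the data, so this is a simplified illustration and not necessarily the technique discussed in the talk. The test is one-sample (mean versus zero, e.g. for log fold changes), and the function name is hypothetical.

```python
import math
from statistics import mean, variance
from typing import List


def moderated_t(expr: List[List[float]], d0: float = 4.0) -> List[float]:
    """One-sample t-statistics with variances shrunk towards a common value.

    expr[g] holds the replicate measurements (e.g. log fold changes) for gene g.
    d0 is a hand-chosen prior degrees of freedom (an assumption; limma
    estimates it from the data). d0 = 0 gives ordinary t-statistics.
    """
    s2 = [variance(g) for g in expr]  # gene-wise sample variances
    s0_2 = mean(s2)                   # crude common prior variance (assumption)
    stats = []
    for g, s2_g in zip(expr, s2):
        n = len(g)
        d = n - 1                     # residual degrees of freedom
        # Shrunken variance: weighted average of prior and gene-wise variance.
        s2_post = (d0 * s0_2 + d * s2_g) / (d0 + d)
        stats.append(mean(g) / math.sqrt(s2_post / n))
    return stats


if __name__ == "__main__":
    genes = [
        [1.0, 1.01, 0.99],  # suspiciously low spread across replicates
        [1.0, 2.0, 0.0],
        [0.5, 0.6, 0.4],
    ]
    print("ordinary: ", [round(t, 1) for t in moderated_t(genes, d0=0.0)])
    print("moderated:", [round(t, 1) for t in moderated_t(genes)])
```

Shrinkage pulls the first gene's very large ordinary t-statistic down to a modest value while raising the statistic of the noisy second gene, so the ranking no longer depends on chance fluctuations in three replicates.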
I will also discuss the problem of modelling covariance between large numbers of observed variables in the presence of potential confounding, including cases where no unique solution exists, and how penalised likelihood approaches can provide solutions. Finally, I will discuss some currently unsolved problems in statistical genetics and how new developments in statistics could help drive genetics forward.
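Testing thousands of genes at once, as in the expression studies described above, is also where the False Discovery Rate from Cordell's list of topics comes in. A minimal sketch of the Benjamini-Hochberg step-up procedure, the standard way to control the FDR at level q for independent (or positively dependent) tests, follows; the function name and the example p-values are purely illustrative.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], q: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level q."""
    m = len(pvalues)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.02, 0.03, 0.036, 0.9], q=0.05))
```

This prints [True, True, True, False]. The step-up design matters here: 0.02 and 0.03 exceed their own thresholds (0.0125 and 0.025), but the third-smallest p-value, 0.036, is below 3/4 × 0.05 = 0.0375, so all three smallest p-values are rejected.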