Methods for Genome-wide Association Studies

Genome-wide association studies (GWASs) collect genotype and phenotype information on the same set of individuals and aim to systematically identify genetic variants that are associated with, and therefore potentially causal for, the phenotypes of interest. These studies are large-scale, typically involving tens of thousands of individuals and millions of genetic variants, and the phenotypes are often affected by confounding factors (e.g. population stratification and individual relatedness) and sometimes by study design (e.g. ascertainment in case-control studies). We are developing statistical methods and computational algorithms to overcome these challenges, in order to facilitate the identification of causal variants and to better model the relationship between phenotypes and genotypes. Examples include chip heritability estimation, phenotype/risk prediction, association analysis in the presence of population stratification and individual relatedness, and modeling of multiple correlated phenotypes. We distribute open-source software, GEMMA, that implements these methods.

Manhattan Plot

Methods for Functional Genomics Sequencing Studies

Functional genomics sequencing studies collect a variety of data types through large-scale sequencing projects to provide intermediate phenotypes between genetic variants and final outcome traits (e.g. disease status). With these intermediate phenotypes, scientists hope to understand the underlying gene regulatory mechanisms and to provide mechanistic explanations of how genetic variants affect final outcome traits. These intermediate phenotypes are obtained genome-wide (i.e. they consist of measurements on approximately three billion base pairs) through a variety of techniques (e.g. RNA-seq, ChIP-seq, DNase-seq, Hi-C, and single-cell sequencing), and are heavily influenced by many measured or unmeasured confounding factors (e.g. batch effects). In active collaboration with many experts in the field, we are developing methods and analytic tools to better harness the information from different data types and study designs.

Aggregate Plot