Development of this software was supported in part by the ENDGAME (Enhancing Development of Genome-wide Association Methods) project U01 CA 125489, "Dissecting complex traits with diverse resources." Investigators on this project are James Dai, Li Hsu, Charles Kooperberg, Michael LeBlanc, Hua Tang, and Yingye Zheng.
Charles Kooperberg has developed code for approximate power calculations for identifying gene x gene and gene x environment interactions
in genome-wide association studies using a two-stage analysis.
The code is bundled in the R package powerGWASinteraction, available from CRAN.
Hua Tang has a variety of genomic software programs on her
web site. In particular,
the SABER program, a computationally efficient, R-based program that
infers locus-specific ancestry in admixed individuals, taking into account
background LD within ancestral populations, was developed as part of
our ENDGAME project.
Li Hsu has developed a program, hybrid.r, which simultaneously estimates the effects of environmental risk
factors, candidate genes, and their interactions. The program outputs log-odds ratio
estimates, standard error estimates, and p-values for all covariates
using data on case families only, case-unrelated controls only, and
combined case families and unrelated controls.
Mike LeBlanc has developed software
for Adaptively Weighted Association Statistics (AWAS).
This program implements adaptive selection and weighting to potentially improve
the power of tests for association between genetic factors and disease outcome.
The strategy is based on the often plausible assumption that genetic associations
may be stronger within subgroups of subjects in epidemiologic or clinical studies.
The least angle regression (LAR) method (Efron et al., 2004) is used to adaptively
select or weight the score test statistics.
James Dai has developed software for SNP-Haplotype Adaptive Regression (SHARE)
to perform multi-locus analysis that accounts for the LD patterns observed in the human genome.
The challenge is to choose a model that exploits the local dependence of SNPs without introducing too many parameters.
SHARE uses a novel strategy, based on a statistical learning framework, to select an optimal set of SNPs that captures the genetic association in the targeted region. The model search process resembles CART. Depending on the evolutionary history of the disease mutation and the markers,
the optimal set may contain a single SNP, or several SNPs that lay the foundation for a haplotype analysis.
The algorithm
is implemented in the R package SHARE, available from CRAN.