INFORMATION ABOUT OUR NIH GRANT SUPPORT

A summary of our federal funding from the NIH RePORT website can be found here. We briefly summarize below our primary NIH R01-funded research projects.



BIOINFORMATICS STRATEGIES FOR GENOME-WIDE ASSOCIATION STUDIES

Funded by NIH R01 LM010098 (PI - Moore, with Asselbergs and Williams)


Genome-wide association studies (GWAS) are commonplace despite the lack of a comprehensive bioinformatics approach to the analysis of the data. The common method of analysis is to employ parametric statistics and then adjust for the large number of tests performed to limit false-positives (i.e. type 1 errors). This agnostic approach is preferred by some because no assumptions are made about which genes or genomic regions might be important. This logic suggests that the data should tell us where the important genetic variants are. The goal of our proposed research program is to specifically compare this agnostic approach with a bioinformatics approach that selects associated SNPs based on expert knowledge about biochemical pathways and gene function. We propose to develop a bioinformatics approach for selecting SNPs from a GWAS using knowledge about the biology of the genes being studied and the molecular pathology of disease (AIM 1). We will modify and extend the Exploratory Visual Analysis (EVA) database and software that was originally designed for microarray studies with pilot funding from the NLM BISTI program. We will then use this bioinformatics approach along with an agnostic statistical approach for detecting SNPs associated with plasma levels of tissue plasminogen activator (t-PA) and plasminogen activator inhibitor one (PAI-1) in a large population-based sample of Caucasians (n=2000) from the PREVEND study in Groningen, The Netherlands (AIM 2). Those SNPs identified by both methods in the PREVEND study will be evaluated first for replication in an independent population-based sample of Caucasians (n=2000) from the Rotterdam Study in the Netherlands and then for validation in a population-based sample of Blacks (n=2000) from the HeART Study in Ghana, Africa (AIM 3). Finally, we will specifically compare how many and which SNPs replicate and validate using the statistical approach and the bioinformatics approach (AIM 4). 
Our working hypothesis is that the bioinformatics approach will yield more validated, and hence more likely genuine, SNP associations than the agnostic statistical approach.
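
As an illustration of the multiple-testing adjustment that the agnostic approach relies on, the following minimal sketch applies a Bonferroni correction. The abstract does not name a specific correction, so Bonferroni is used here only as a common example. The panel size of 500,000 SNPs and the toy p-values are assumptions chosen for illustration, not figures from the funded study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Return the indices of tests whose p-value passes the adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


if __name__ == "__main__":
    # Hypothetical GWAS panel: 500,000 SNPs, each tested once.
    print(bonferroni_threshold(0.05, 500_000))
    # Made-up p-values for four tests; the adjusted threshold is 0.05 / 4.
    print(bonferroni_significant([0.001, 0.04, 0.2, 0.0000001]))
```

With a threshold this strict, a SNP with a modest but real effect can easily be missed. That is the motivation for weighing statistical evidence alongside biological knowledge.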



MACHINE LEARNING PREDICTION OF CANCER SUSCEPTIBILITY

Funded by NIH R01 LM009012 (PI - Moore)


Susceptibility to sporadic forms of cancer is determined by numerous genetic factors that interact in a nonlinear manner in the context of an individual's age and environmental exposure. This complex genetic architecture has important implications for the use of genome-wide association studies (GWAS) for identifying susceptibility genes. The assumption of a simple architecture supports a strategy of testing each single-nucleotide polymorphism (SNP) individually using traditional univariate statistics followed by a correction for multiple tests. However, a complex genetic architecture that is characteristic of most types of cancer requires methods that specifically model combinations of SNPs and environmental exposures. While novel methods are available for modeling interactions, exhaustive testing of all combinations of SNPs is not feasible on a genome-wide scale because the number of comparisons is effectively infinite. Thus, it is critical that we develop intelligent strategies for selecting subsets of SNPs prior to combinatorial modeling. The objective of this research program is to continue the development, evaluation, distribution, and support of machine learning algorithms and open-source software for detecting and characterizing gene-gene and gene-environment interactions on a genome-wide scale. All methods developed as part of this proposal will be applied to gene-gene and gene-environment interaction analysis of bladder cancer susceptibility within the framework of GWAS.
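
To give a sense of scale for the combinatorial problem described above, the short sketch below counts the number of distinct SNP combinations at each interaction order. The count is finite, but it grows so quickly with the order that exhaustive search soon becomes impractical. The panel size of 500,000 SNPs is an assumed, illustrative figure, not one taken from the project.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of the given size (order of interaction)."""
    return comb(n_snps, order)


if __name__ == "__main__":
    n_snps = 500_000  # hypothetical genome-wide panel size
    for order in (1, 2, 3, 4):
        print(f"order {order}: {n_combinations(n_snps, order):,} candidate models")
```

Each additional SNP in a combination multiplies the search space by a factor on the order of the panel size, which is why filtering SNPs before combinatorial modeling matters.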



BIOINFORMATICS STRATEGIES FOR MULTIDIMENSIONAL BRAIN IMAGING GENETICS

Funded by NIH R01 LM011360 (PI - Moore, with Shen and Saykin)


Today's generation of multi-modal imaging systems produces massive high-dimensional data sets, which, when coupled with high-throughput genotyping data such as single nucleotide polymorphisms (SNPs), provide exciting opportunities to enhance our understanding of phenotypic characteristics and the genetic architecture of human diseases. However, the unprecedented scale and complexity of these data sets have presented critical computational bottlenecks requiring new concepts and enabling tools. To address these challenges, using the study of Alzheimer's disease (AD) as a test bed, this project will develop and validate novel bioinformatics strategies for multidimensional brain imaging genetics. Aim 1 is to develop a novel bi-multivariate analysis strategy, S3K-CCA, for studying imaging genetic associations. Existing imaging genetics methods are typically designed to discover associations between a single SNP and a single quantitative trait (single-SNP-single-QT), single-SNP-multi-QT or multi-SNP-single-QT associations, and have limited power in revealing complex relationships between interlinked genetic markers and correlated brain phenotypes. To overcome this limitation, S3K-CCA is designed to be a sparse bi-multivariate learning model that simultaneously uses multiple response variables with multiple predictors for analyzing large-scale multi-modal neurogenomic data. Aim 2 is to develop HD-BIG, a visualization and systems biology framework for integrative analysis of High-Dimensional Brain Imaging Genetics data. Machine learning strategies that seamlessly incorporate valuable domain knowledge to produce biologically meaningful results remain an under-explored area in imaging genetics. In this aim, we will develop a user-friendly heat map interface to visualize high-dimensional results, adjust learning parameters and strategies, interact with existing bioinformatics resources and tools, and facilitate visual exploratory and systems biology analysis. 
A novel imaging genetic enrichment analysis (IGEA) method will be developed to identify relevant genetic pathways and associated brain circuits, and to reveal complex relationships among them. Aim 3 is to evaluate the proposed S3K-CCA and IGEA methods and the HD-BIG framework using both simulated and real imaging genetics data. This project is expected to produce novel bioinformatics algorithms and tools for comprehensive joint analysis of large scale heterogeneous imaging genetics data. The availability of these powerful methods is critical to the success of many imaging genetics initiatives. In addition, they can also help enable new computational applications in other areas of biomedical research where systematic and integrative analysis of large-scale multi-modal data is critical. Using AD as an exemplar, the proposed methods will demonstrate the potential for enhancing mechanistic understanding of complex disorders, which can benefit public health outcomes by facilitating diagnostic and therapeutic progress.



BIOINFORMATICS APPROACHES TO VISUAL DISEASE GENETICS

Funded by NIH R01 EY022300 (PI - Moore)


It is now recognized that many visual diseases are influenced by complex interactions between multiple different genetic variants. As a result, our ability to predict susceptibility to visual diseases will depend critically on the computational, mathematical and statistical modeling methods and software that are available for making sense of high-dimensional genetic data. We propose here a systems-based bioinformatics research project to develop network modeling approaches for identifying combinations of genetic biomarkers associated with visual disease endpoints. Our working hypothesis is that a systems-based bioinformatics approach using network modeling will play a very important role in confronting the complexity of the relationship between genomic variation and visual diseases. We will first develop and evaluate modeling methods to infer large-scale genetic interaction networks from genome-wide association studies (AIM 1). We will then apply the modeling methods developed in AIM 1 to the inference of genetic interaction networks from genome-wide association data in subjects with and without visual diseases (AIM 2). Next, we will utilize the inferred genetic interaction networks to guide the development of predictive genetic models of visual diseases (AIM 3). Finally, all network modeling methods will be released to the vision research community as part of a popular user-friendly, freely available and open-source software package (AIM 4). We anticipate that the network modeling methods and software developed and distributed as part of this project will play an important role in the development of the genetic tests that will be necessary to identify those at risk for visual diseases.



BIOINFORMATICS STRATEGIES FOR BIODEFENSE VACCINE RESEARCH

Funded by NIH R01 AI59694 (PI - Moore)


Infectious bioterrorism agents such as smallpox and anthrax represent a critical public health concern. Important goals of biodefense research include the development of predictors of pathogenicity of bioterrorism agents for rapid response and the prediction of clinical outcomes such as adverse events following vaccination. Our success in these biodefense endeavors will depend critically on the bioinformatics methods and software that are available for making sense of high-dimensional data generated by technologies such as DNA microarrays and mass spectrometry. The goal of this research program is to continue the development, evaluation, distribution and support of our successful open-source Multifactor Dimensionality Reduction (MDR) software package for identifying combinations of genetic and environmental predictors of clinically important biodefense outcomes. We will first evaluate new methods from our research group and those that have been proposed by other research groups and assess the best approaches for inclusion in new versions of the MDR software (AIM 1). The inclusion of new methods such as stochastic search algorithms for genome-wide analysis and linear models for continuous endpoints will ensure that the MDR software stays on the cutting edge. Second, we propose to develop a web server that biodefense researchers can use as a source of expert knowledge in the form of gene weights that are generated from biochemical pathways, Gene Ontology (GO), chromosomal location and protein-protein interactions, for example (AIM 2). Expert knowledge files generated by the web server will be used by the MDR software to prioritize single nucleotide polymorphisms (SNPs) for interaction analysis in genome-wide association studies or GWAS. These additions will ensure that MDR is ready for application to GWAS that are now commonplace. We will then apply these methods to GWAS data from an ongoing study of adverse events following vaccination for smallpox (AIM 3). 
Finally, we will identify opportunities to address other important bioterrorism research questions with our software that are consistent with the research objectives of the NIAID/NIH (AIM 4). All bioinformatics methods and tools will be provided in a timely manner for free as open-source software.
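
For readers unfamiliar with Multifactor Dimensionality Reduction, the sketch below illustrates the core idea for pairs of SNPs. Each two-locus genotype cell is labeled high risk or low risk by comparing its case:control ratio with the overall ratio. This reduces two dimensions to a single binary attribute, and the pair whose labeling best separates cases from controls is reported. It is a simplified illustration under stated assumptions, not the code of the MDR software package. It omits the cross-validation and permutation testing used in practice, scores models on the training data, and treats cells whose ratio ties the threshold as low risk. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mdr_fit(genotypes, labels, snp_pair):
    """Label each two-locus genotype cell as high risk (1) or low risk (0)."""
    a, b = snp_pair
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = (row[a], row[b])
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    risk = {}
    for cell in set(cases) | set(controls):
        # High risk if the cell's case:control ratio exceeds the overall ratio.
        # Cross-multiplied to avoid dividing by zero in cells with no controls.
        risk[cell] = int(cases[cell] * n_controls > controls[cell] * n_cases)
    return risk


def balanced_accuracy(genotypes, labels, snp_pair, risk):
    """Mean of sensitivity and specificity for the high/low risk labeling."""
    a, b = snp_pair
    tp = tn = fp = fn = 0
    for row, y in zip(genotypes, labels):
        pred = risk.get((row[a], row[b]), 0)  # unseen cells default to low risk
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def best_pair(genotypes, labels):
    """Exhaustively score every SNP pair and return (accuracy, pair) for the best."""
    n_snps = len(genotypes[0])
    scored = []
    for pair in combinations(range(n_snps), 2):
        risk = mdr_fit(genotypes, labels, pair)
        scored.append((balanced_accuracy(genotypes, labels, pair, risk), pair))
    return max(scored)


if __name__ == "__main__":
    # Toy data (made up): disease status depends on SNP 0 and SNP 1 jointly,
    # with no marginal effect of either; SNP 2 is uninformative.
    toy_genotypes = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
                     [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
    toy_labels = [0, 0, 1, 1, 1, 1, 0, 0]
    print(best_pair(toy_genotypes, toy_labels))
```

The exhaustive loop in best_pair is what becomes infeasible genome-wide. In the program described above, stochastic search and expert-knowledge weights are the proposed ways to prioritize which combinations to evaluate.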