
We
are committed to releasing free open-source software to the research
community in a timely manner. Our current open-source software
projects are listed below.
I. BioSymphony (BioSym)
See www.BioSymphony.org for more information.
Solving complex biomedical problems relies more and more on collaborative
efforts among two or more investigators each with different skills and
expertise. It is often the case that the most useful collaborations result from
chance interactions among investigators from different departments at the same
institution or among investigators at different institutions from the same
region. The goal of the BioSymphony
project is to develop and make freely available software for facilitating
biomedical research collaborations. We are establishing a database of biomedical
investigators at Dartmouth and throughout Northern New England that consists of
annotated information about each investigator mined from PubMed that can be used
to predict fruitful collaborations. Our hope is that this resource will result
in meaningful collaborations that otherwise might happen only by
chance. The BioSymphony (BioSym) database and software are
in early alpha testing at Dartmouth Medical School. Check the BioSym web page or our blog (Epistasis
Blog) for updates.
Development of BioSym is supported by generous funds from the Norris-Cotton Cancer Center.
II. Exploratory Visual Analysis (EVA)
See www.exploratoryvisualanalysis.org for more information.
EVA
is a database and GUI for the exploratory visual analysis of statistical results
(not raw data) from high-throughput genetic and genomic experiments. How often
have you been handed an Excel spreadsheet with >30,000 Affymetrix gene IDs
and p-values from a statistical analysis and been left with the daunting
challenge of extracting something biologically meaningful? The EVA system allows you to store these
results in a database alongside knowledge about each gene from public databases such as
Entrez
Gene. The GUI allows you to visually explore the
p-values in the context of Gene Ontology, biochemical pathway, protein domain,
chromosomal location, or phenotype, thus facilitating biological interpretation.
The first paper describing EVA was published in the 2005 proceedings of the
Pacific Symposium on
Biocomputing. Example applications of EVA can be found in recent publications in Oncology Reports and Diabetes. The prototype EVA database was programmed in
Oracle while the prototype EVA GUI was programmed in Visual Basic. An
open-source version of EVA in Java is under development and is available upon request. Check
this web page, www.exploratoryvisualanalysis.org, or our blog (Epistasis
Blog) for updates.
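The idea of viewing p-values in the context of gene annotations can be illustrated with a minimal sketch. This is not EVA code: the function name, the dictionary layout, the gene IDs, and the 0.05 cutoff are all assumptions made for illustration. It simply groups per-gene p-values by annotation category (for example, a Gene Ontology term or a pathway) and summarizes each group.

```python
from collections import defaultdict


def summarize_by_category(p_values, annotations, alpha=0.05):
    """Group per-gene p-values by annotation category (hypothetical helper).

    p_values:    dict mapping gene ID -> p-value from a statistical analysis
    annotations: dict mapping gene ID -> list of category labels
    alpha:       assumed significance cutoff used only for counting
    """
    groups = defaultdict(list)
    for gene, p in p_values.items():
        # Genes without any annotation are skipped.
        for category in annotations.get(gene, []):
            groups[category].append(p)

    summary = {}
    for category, ps in groups.items():
        summary[category] = {
            "n_genes": len(ps),
            "n_significant": sum(p < alpha for p in ps),
            "min_p": min(ps),
        }
    return summary


# Made-up gene IDs and categories, for illustration only.
example_p = {"gene_a": 0.001, "gene_b": 0.20, "gene_c": 0.03}
example_annotations = {
    "gene_a": ["apoptosis"],
    "gene_b": ["apoptosis", "cell cycle"],
    "gene_c": ["cell cycle"],
}
example_summary = summarize_by_category(example_p, example_annotations)
```

A table like `example_summary` is the kind of per-category view that a GUI could then display graphically.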
Development of EVA is supported by generous funds from the Norris-Cotton Cancer Center. The prototype for EVA was supported by NIH grant P20-LM007613.
III. Multifactor Dimensionality Reduction (MDR)
The open-source MDR software package can be freely downloaded from SourceForge.net.
See www.multifactordimensionalityreduction.org for more information.
MDR
is a nonparametric and genetic model-free data mining alternative to logistic regression for
detecting and characterizing nonlinear interactions among discrete genetic and
environmental attributes. The MDR method
combines attribute selection, attribute construction, and classification with
cross-validation and permutation testing to provide a comprehensive and powerful
approach to detecting nonlinear interactions.
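The attribute construction step can be sketched as follows. This is a simplified illustration, not the MDR software package itself: the function name is hypothetical, only two attributes are combined, and the high-risk threshold is assumed to be the overall case:control ratio in the data. Cross-validation and permutation testing are omitted.

```python
from collections import Counter


def mdr_construct(genotypes_a, genotypes_b, status):
    """Collapse two discrete attributes into one high/low-risk attribute.

    genotypes_a, genotypes_b: per-subject genotype codes (e.g. 0, 1, 2)
    status: per-subject labels, 1 = case and 0 = control
    Returns the set of high-risk genotype combinations and the
    constructed attribute (1 = high risk, 0 = low risk) per subject.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(genotypes_a, genotypes_b, status):
        if y == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1

    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    if n_controls == 0:
        raise ValueError("at least one control is required")
    # Assumption: a cell is high risk when its case:control ratio
    # exceeds the overall case:control ratio.
    threshold = n_cases / n_controls

    high_risk = set()
    for cell in set(cases) | set(controls):
        # Equivalent to cases/controls > threshold, without dividing by zero.
        if cases[cell] > threshold * controls[cell]:
            high_risk.add(cell)

    constructed = [
        1 if (a, b) in high_risk else 0
        for a, b in zip(genotypes_a, genotypes_b)
    ]
    return high_risk, constructed


# Toy data with an XOR-like interaction: neither attribute alone predicts status.
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]
high_risk_cells, new_attribute = mdr_construct(snp_a, snp_b, status)
```

In this toy example the single constructed attribute separates cases from controls even though each original attribute on its own carries no signal, which is the kind of nonlinear interaction the method is designed to detect.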
See our 2006 paper in the Journal of Theoretical Biology for a recent review, and the MDR entry in Wikipedia for a description of the basic method. MDR publications can also be found through PubMed, Google, and Google Scholar searches. See the publications page on this website for a comprehensive list of our MDR papers, and our blog or www.multifactordimensionalityreduction.org for updates and news about the latest developments with MDR.
Development of MDR is supported by NIH grants R01-AI59694, R01-HD047447, and R01-LM009012 as well as by generous funds from the Norris-Cotton Cancer Center.
IV. Symbolic Modeler (SyMod)
See www.symbolicmodeler.org for more information.
The SyMod software package will provide open-source access to
two different methods. The first method, Symbolic Discriminant
Analysis (SDA), was developed by our team as a nonlinear alternative to
Fisher's Linear Discriminant Analysis (LDA). The goal of SDA is to identify the optimal combination of attributes and
mathematical functions for predicting a discrete endpoint. Unlike LDA, SDA makes no assumptions about
the functional form of the model. Given
a list of attributes (e.g. gene expression variables) and mathematical functions
(e.g. +, -, *, /, log, sqrt, abs, AND, OR, <, >, etc.), SDA optimizes
model discovery using any wrapper algorithm.
We have used genetic programming as a wrapper for SDA although
other stochastic search methods such as simulated annealing could be used. We
have a new paper on SDA that will appear soon in a special issue of
Human Heredity. The second method that will be included in SyMod
is symbolic regression. Symbolic regression is similar to SDA but
is used for continuous endpoints. The alpha
version of SyMod is ready for public testing. Check
this web page, www.symbolicmodeler.org, or our blog (Epistasis
Blog) for updates.
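The search over attributes and mathematical functions described above can be sketched in a few lines. This is an illustration only, not SyMod code: plain random search stands in for genetic programming as the simplest possible wrapper, the function set is a small subset of the one listed above, division is "protected" against zero, and the fixed classification threshold of 0 is an assumption. All names are hypothetical.

```python
import random

# Function set: name -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
    "/": (2, lambda x, y: x / y if y != 0 else 1.0),  # protected division
    "abs": (1, abs),
}


def random_tree(n_attributes, depth, rng):
    """Build a random expression tree as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_attributes))  # leaf: attribute index
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][0]
    children = tuple(
        random_tree(n_attributes, depth - 1, rng) for _ in range(arity)
    )
    return (name,) + children


def evaluate(tree, row):
    """Compute the discriminant score of one observation."""
    if tree[0] == "x":
        return row[tree[1]]
    func = FUNCTIONS[tree[0]][1]
    return func(*(evaluate(child, row) for child in tree[1:]))


def accuracy(tree, rows, labels, threshold=0.0):
    """Fraction of observations classified correctly (score > threshold -> 1)."""
    predictions = [1 if evaluate(tree, row) > threshold else 0 for row in rows]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def random_search(rows, labels, n_trials=500, depth=3, seed=0):
    """Wrapper: keep the best-scoring of n_trials random expression trees."""
    rng = random.Random(seed)
    best_tree, best_score = None, -1.0
    for _ in range(n_trials):
        tree = random_tree(len(rows[0]), depth, rng)
        score = accuracy(tree, rows, labels)
        if score > best_score:
            best_tree, best_score = tree, score
    return best_tree, best_score


# Toy data: class 1 when the first attribute exceeds the second.
rows = [[3.0, 1.0], [1.0, 3.0], [5.0, 2.0], [2.0, 4.0]]
labels = [1, 0, 1, 0]
best_tree, best_score = random_search(rows, labels, n_trials=200)
```

Replacing `random_search` with genetic programming or simulated annealing changes only how candidate trees are proposed; the representation and the scoring function stay the same. For a continuous endpoint, as in symbolic regression, the accuracy score would be replaced by an error measure such as mean squared error.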
Development of SyMod is supported by NIH grant R01-AI59694 and by generous funds from the Norris-Cotton Cancer Center.
V. Weka-CG
The open-source Weka-CG software package is available for download.
Weka is an
open-source data mining software package with a number of powerful
machine learning methods such as decision trees, neural networks and
support vector machines. A recent book about data mining with
Weka is also available.
We are distributing our own version of Weka with integrated tools
for computational genetics (CG). The first new tool added to
Weka-CG is our multifactor dimensionality reduction (MDR) method.
Here, MDR has been added to Weka-CG as a filter for constructive
induction so that constructed attributes (i.e. SNP combinations) can be
analyzed with any number of different methods included in Weka (e.g.
logistic regression).
Development of Weka-CG is supported by NIH grants R01-AI59694 and R01-LM009012 as well as by generous funds from the Norris-Cotton Cancer Center.
Last updated by JHM on March 2, 2008