General
1. About the multiple hypothesis testing issue in GO analysis: Zhong S, Tian L, Li C, Storch FK and Wong WH (2004). Comparative Analysis of Gene Sets in the Gene Ontology Space under the Multiple Hypothesis Testing Framework. Proc IEEE Comp Systems Bioinformatics 2004:425-435.
2. About the software in general: Zhong S, Storch F, Lipan O, Kao MJ, Weitz C, Wong WH (2004). GoSurfer: a graphical interactive tool for comparative analysis of large gene sets in Gene Ontology space. Applied Bioinformatics. 3(4):261-4.
Are there other related articles?
1. Khatri P, Draghici S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005, Adv Access
2. Storch KF, et al. Extensive and divergent circadian gene expression in liver and heart. Nature 2002 Aug 8;418(6898):665.
What if I use other gene IDs, such as LocusLink IDs, instead of Affymetrix probe set IDs?
Please use the GeneCruiser program to convert your IDs into Affymetrix probe set IDs.
When several Affymetrix probe sets are linked to the same gene, will the statistical tests in GoSurfer be affected?
No. GoSurfer handles this case: the statistical tests are based on the number of genes, not on the number of probe sets.
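As an illustration of gene-level counting, here is a minimal Python sketch. It is not GoSurfer's own code; the mapping `probe_to_gene`, the function `count_genes`, and all IDs are hypothetical and exist only to show the idea.

```python
# Hypothetical probe-set-to-gene mapping; the IDs are made up for illustration.
probe_to_gene = {
    "probe_a": "gene_1",
    "probe_b": "gene_1",  # a second probe set for the same gene
    "probe_c": "gene_2",
}


def count_genes(probe_ids, probe_to_gene):
    """Count the distinct genes represented by a list of probe set IDs."""
    # A set keeps each gene once, however many probe sets point to it.
    genes = {probe_to_gene[p] for p in probe_ids if p in probe_to_gene}
    return len(genes)


selected = ["probe_a", "probe_b", "probe_c"]
n_genes = count_genes(selected, probe_to_gene)
print(n_genes)
```

Three probe sets go in, but only two genes are counted, so a gene covered by many probe sets carries no extra weight in a test based on these counts.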
What if there are replicated entries in the input file(s)?
GoSurfer removes replicated entries and keeps only one copy of each. Several different Affymetrix probe sets that are linked to one gene are not replicated entries; this is allowed.
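The removal of replicated entries can be pictured with the following Python sketch. It assumes one ID per input line; the function name `drop_replicated_entries` and the whitespace trimming are assumptions made for this example, not a description of GoSurfer's internals.

```python
def drop_replicated_entries(ids):
    """Keep the first occurrence of each ID and preserve the input order."""
    seen = set()
    unique = []
    for raw in ids:
        entry = raw.strip()  # assumption: surrounding whitespace is ignored
        if entry and entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


# Hypothetical input with one repeated probe set ID.
entries = ["probe_a", "probe_b", "probe_a", "probe_c"]
print(drop_replicated_entries(entries))
```

Distinct probe sets such as `probe_a` and `probe_b` are both kept even if they map to the same gene, because they are different entries.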
What is a Gene Information File?
A Gene Information File is a mapping file that records the mapping among different gene identifiers and the association between genes and GO terms.
How were Affymetrix IDs mapped to GO terms?
The mapping between Affymetrix probe set IDs and GO terms is generated by the ChipInfo software. For every probe set ID, ChipInfo retrieves the corresponding UniGene ID, LocusLink ID, and GO annotations (the lowest-level GO terms), and then retrieves all ancestor GO terms of those lowest-level terms. Finally, it links the gene IDs to all the GO terms, including the ancestor terms.
To obtain the mapping between UniGene/LocusLink IDs and GO terms, we used the information from the previous step. The gene-to-GO-term mappings were pooled across several Affymetrix arrays for the same species. As a result, even though not all UniGene/LocusLink IDs are associated with GO annotations, a fairly large number of genes is covered. The exact numbers are stated on the Download page, beside every Gene Information File.
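The two steps described above (adding ancestor terms, then pooling across arrays) can be sketched in Python as follows. This is only an illustration under stated assumptions: it does not reproduce ChipInfo, and the parent table `go_parents`, the term labels, the gene and probe IDs, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term lists its parent terms.
go_parents = {
    "term_a": [],           # root
    "term_b": ["term_a"],
    "term_c": ["term_b"],
    "term_d": ["term_a"],
}


def with_ancestors(terms, parents):
    """Return the given terms together with all of their ancestor terms."""
    result = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents.get(term, []))
    return result


def pool_gene_to_go(arrays, parents):
    """Pool gene-to-GO mappings over several arrays of the same species.

    Each array maps a probe set ID to (gene ID, lowest-level GO terms).
    """
    pooled = defaultdict(set)
    for array in arrays:
        for gene_id, direct_terms in array.values():
            pooled[gene_id] |= with_ancestors(direct_terms, parents)
    return dict(pooled)


# Two hypothetical arrays; gene_2 has no GO annotation at all.
array_1 = {"probe_a": ("gene_1", ["term_c"])}
array_2 = {"probe_x": ("gene_1", ["term_d"]), "probe_y": ("gene_2", [])}
pooled = pool_gene_to_go([array_1, array_2], go_parents)
print(sorted(pooled["gene_1"]))
```

In this toy case `gene_1` ends up with all four terms because its annotations from both arrays are merged, while `gene_2` stays in the mapping with an empty term set, mirroring genes that have no GO annotation.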
View
Why a tree, not a directed acyclic graph (DAG)?
Reason 1: When there are many GO terms to display, a DAG becomes very cluttered on the screen. The advantage of a tree display is that it provides a clear global view while still allowing users to interrogate every node. The disadvantage is that a GO term with multiple parent terms has to be replicated in the display. We chose the tree display because many people feel that the advantage outweighs the disadvantage.
Reason 2: Whereas every node in a DAG uniquely represents a GO term, every node in a tree uniquely represents a GO path, that is, a GO term together with the chain of ancestor terms leading to it. For example, "cell growth and/or maintenance" is a GO term, and "biological process > cellular process > cell growth and/or maintenance" is a GO path.
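The relation between DAG nodes and tree nodes can be made concrete with a short Python sketch that lists every GO path ending at a term. It is a sketch, not GoSurfer's display code; the function `go_paths` is hypothetical, and the entries "toy branch" and "toy term" are invented to show a term with two parents.

```python
def go_paths(term, parents):
    """Return every path from a root term down to `term` as a list of names."""
    term_parents = parents.get(term, [])
    if not term_parents:
        return [[term]]  # a root term is a path by itself
    paths = []
    for parent in term_parents:
        for path in go_paths(parent, parents):
            paths.append(path + [term])
    return paths


parents = {
    "biological process": [],
    "cellular process": ["biological process"],
    "cell growth and/or maintenance": ["cellular process"],
    # Invented entries: a term with two parents appears twice in a tree.
    "toy branch": ["biological process"],
    "toy term": ["cellular process", "toy branch"],
}

for path in go_paths("cell growth and/or maintenance", parents):
    print(" > ".join(path))
for path in go_paths("toy term", parents):
    print(" > ".join(path))
```

A term with a single chain of ancestors yields one path and therefore one tree node, while "toy term" yields two paths and would be drawn twice in the tree.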
This site was last updated 08/04/05