Manual


Introduction
Downloading
Starting GoSurfer and getting help
Input
Views
Biological Process / Molecular Function / Cellular Component
Using mouse and arrow keys
Options / Hiding nodes
Gene list / GO list
Highlighting
Simple highlighting / statistical test
Changing colors and branch width
Output
Acknowledgement

Introduction
Gene Ontology (GO) structures biological knowledge by using a controlled vocabulary consisting of GO terms. GO terms are organized in three general categories, “biological process,” “molecular function,” and “cellular component,” and the terms within each category are linked in defined parent-child relationships that reflect current biological knowledge.

GoSurfer uses Gene Ontology (GO) information in the analysis of gene sets obtained from genome-wide computations or microarray analysis. It retrieves GO information for the genes supplied by the user and represents this structured information graphically. Every GO category is visualized as a tree, and users can interact with and manipulate the tree by various means.

GoSurfer allows researchers to make comparisons among different sets of genes. Typically, two sets of genes, such as genes from two tissues, are mapped onto a Gene Ontology tree. Statistical tests can be performed to determine which GO paths are associated with significantly more genes in a particular gene set. The significant paths can be highlighted. All tree graphs and text information can be exported.

Downloading GoSurfer
Users need to download three files:
1. GS.exe, GoSurfer's main executable program.
2. GOpath.txt, a required input file.
3. A gene information file. Several gene information files are available on the "Download" page. Please choose one according to your gene identifier type or microarray type.

Starting GoSurfer and getting help
Click on GS.exe to start the program. On the main menu, click "Help => Online help" to connect to GoSurfer's webpage.

Input
GoSurfer accepts one text file, with gene identifiers in the first column, as its input file. Gene identifiers can be Affymetrix probe set IDs, Locuslink IDs or Unigene IDs. Affymetrix probe set IDs are preferred (currently some functions are only available for Affymetrix IDs). Please use three "NONE"s on three consecutive lines as the separator between gene list groups.

An example of an input file:
1433844_a_at
1425028_a_at
1435382_at
... ...
1449231_at
NONE
NONE
NONE
1456590_x_at
1423327_at
1437239_x_at
1425458_a_at
... ...
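The separator convention above can be illustrated with a minimal Python sketch. It is not part of GoSurfer; the function name `split_gene_groups` is hypothetical, and the handling of blank lines and of stray "NONE" lines (fewer than three in a row) is an assumption.

```python
def split_gene_groups(lines):
    """Split input lines into gene list groups.

    A new group starts after three consecutive "NONE" lines.
    Only the first column of each line is used.
    """
    groups = [[]]
    none_run = 0  # number of consecutive "NONE" lines seen so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        gene_id = line.split()[0]  # first column only
        if gene_id == "NONE":
            none_run += 1
            if none_run == 3:
                groups.append([])  # separator complete: start the next group
                none_run = 0
            continue
        none_run = 0  # assumption: an incomplete run of "NONE" lines is dropped
        groups[-1].append(gene_id)
    return groups


sample_lines = [
    "1433844_a_at",
    "1425028_a_at",
    "NONE",
    "NONE",
    "NONE",
    "1456590_x_at",
    "1423327_at",
]
sample_groups = split_gene_groups(sample_lines)
```

Here `sample_groups` holds two lists, one per gene list group, in the order in which the groups appear in the file.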

From the menu, choose "File => File input". A file input dialog appears. Click the "Browse" button and choose the input file(s). At the bottom of this dialog, please choose which GO terms should be used. The user can either use all the GO terms that are associated with any gene in either of the gene lists, or use all the GO terms that are associated with any gene on an Affymetrix GeneChip array. The first option generally leads to a more efficient analysis and is therefore recommended. Please refer to "Views" for more information.

Click "Step 2" at the top of the dialog. Please specify the gene identifier type used in the input gene list, and enter the file name of the Gene Information File. A Gene Information File is a mapping file that associates different gene identifiers with GO annotations. Please note that the Gene Information File must contain the gene identifiers used in the input gene lists specified in step 1. The appropriate Gene Information File can be downloaded from the "Download" page. For details about how Unigene/Locuslink/Affymetrix IDs were mapped to GO terms, please see "FAQ".

Please also indicate the path of the GO Structure File (GOpath.txt), which can be downloaded from the "Download" page.

Click the "OK" button to start the input. The following messages will be shown on screen:
reading gene information ...
reading gene ontology ... ...

When the input completes, gene information will be shown on screen, and the message "Input complete" will appear in the lower left corner of the screen.

For questions concerning different probe sets linking to the same gene, and replicated identifiers in the input file(s), please see FAQ.

Views
Click "View => Biological Process" to view the tree structure of the GO terms related to "Biological Process". If, in step 1 of the input dialog, the user chose the first option, "Any gene in the union of all genes in input files", the "Biological Process" tree will be drawn using only the GO terms that are associated with the genes in the input files. If the user chose the second option, "Any gene on an Affymetrix GeneChip array", a very large GO tree will be drawn, representing all GO terms that are associated with any gene in the Gene Information File specified in step 2. Please note that drawing such a figure can be very time consuming (~1 minute on a current mainstream PC). After such a tree is drawn, the user can use "Analysis => Highlight => Simple" to highlight the GO terms that are associated with the genes in the input gene lists.

Click "View => Molecular Function" and "View => Cellular Component" to show the corresponding GO trees.

Every node in a GO tree represents a GO term. The hierarchical structure of all GO terms is represented by the tree structure. The root of a tree must be one of the three GO terms Biological Process, Molecular Function, and Cellular Component, because they are the topmost ancestor terms. The sequence of terms from the root to a node in the tree is called a GO path.
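As an illustration of the GO path definition, the following Python sketch walks from a node up to the root and returns the terms in root-to-node order. It is only a sketch: the function `go_path`, the child-to-parent dictionary, and the toy tree are hypothetical and do not reflect GoSurfer's internal data structures or the contents of GOpath.txt.

```python
def go_path(node, parent_of):
    """Return the list of terms from the root down to `node`.

    `parent_of` maps each non-root term to its parent term;
    the tree is assumed to contain no cycles.
    """
    path = [node]
    while path[-1] in parent_of:  # the root has no entry in parent_of
        path.append(parent_of[path[-1]])
    path.reverse()  # collected node-to-root, so flip to root-to-node
    return path


# Toy tree for illustration only (child -> parent).
toy_parent_of = {
    "cellular process": "biological process",
    "cell communication": "cellular process",
}
toy_path = go_path("cell communication", toy_parent_of)
```

With this toy tree, `toy_path` starts at "biological process" and ends at "cell communication".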

Mouse and Arrow keys
When the mouse is pointing at a node, the name of the corresponding GO path is shown at the bottom of the screen. Double-click a node to show the genes that are associated with it. Pressing the arrow keys expands or contracts the tree.

Options: "View => Options".

Choose how the image should be displayed.
"Highlight branches together with nodes" takes effect when the user uses the "Analysis => Highlight" function. When selected, both the node and the branch connecting it to its parent term will be highlighted together. "Enable quick display" will speed up display, but the GO path will not be shown when the mouse is pointing at a node. However, the user can click on a node to show its GO path.

"Hide nodes containing less than __ genes" trims the GO tree by cutting nodes (GO terms) that are associated with a small number of genes from the input gene list(s).
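The trimming idea can be sketched in Python as follows. This is a hypothetical illustration, not GoSurfer's code: the function `trim_tree` and its arguments are invented names, and it assumes that the root is always kept and that hiding a node also hides everything below it.

```python
def trim_tree(root, children, gene_count, min_genes):
    """Return the set of nodes that remain visible after trimming.

    `children` maps a term to its list of child terms, and
    `gene_count` maps a term to the number of input genes associated with it.
    """
    kept = set()
    stack = [root]
    while stack:
        node = stack.pop()
        # Assumption: the root is always shown, whatever its count.
        if node != root and gene_count.get(node, 0) < min_genes:
            continue  # hidden node: its subtree is skipped as well
        kept.add(node)
        stack.extend(children.get(node, []))
    return kept


# Toy data for illustration only.
toy_children = {
    "biological process": ["cellular process", "development"],
    "cellular process": ["cell communication"],
}
toy_counts = {
    "biological process": 10,
    "cellular process": 6,
    "development": 1,
    "cell communication": 4,
}
toy_visible = trim_tree("biological process", toy_children, toy_counts, 2)
```

In this toy example only "development" falls below the threshold of 2 genes, so it is the only node left out of `toy_visible`.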

Viewing gene list and GO list
"View => gene list" shows gene information.
"View => GO list" shows Gene Ontology information.

Highlighting: "Analysis => Highlight"

"Analysis => Highlight" uses one of the following methods to highlight the GO tree.
1. Simple. A node is colored magenta if at least one gene in input gene list group 1 is associated with it, but no group 2 gene is. A node is colored blue if vice versa. A node is colored gray if both groups have genes associated with it.
2. Test for significance. Every node is tested for whether either of the two input gene lists has a significantly larger proportion of genes associated with it. A Pearson's Chi-square test is performed. A node is colored magenta if group 1 has a significantly larger proportion of genes associated with it, compared with group 2. A node is colored blue if vice versa, and gray if the difference is not significant. When there is only one group of input genes, the test for significance can still be performed. In this case, the group of input genes is compared with all the genes contained in the Gene Information File, as if all the genes in the Gene Information File were the second input gene list. Note that this function usually takes several minutes to run.
3. Significance test, adjusting for multiple testing. A statistic called the q-value is used to find GO terms with better statistical confidence.
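The first two highlighting rules can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not GoSurfer's implementation: the function names are hypothetical, the chi-square statistic is computed on a 2x2 table without continuity correction, and the significance threshold `alpha=0.05` is an assumed value that the manual does not specify.

```python
import math


def simple_color(n_group1, n_group2):
    """Color of a node under the "Simple" rule, given per-group gene counts."""
    if n_group1 > 0 and n_group2 == 0:
        return "magenta"
    if n_group2 > 0 and n_group1 == 0:
        return "blue"
    if n_group1 > 0 and n_group2 > 0:
        return "gray"
    return None  # no associated genes; the manual does not define this case


def chi_square_2x2(hit1, total1, hit2, total2):
    """Pearson's chi-square statistic and p-value (1 degree of freedom).

    hit1/hit2: genes of each group associated with the node.
    total1/total2: total number of genes in each group.
    """
    a, b = hit1, total1 - hit1
    c, d = hit2, total2 - hit2
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: treated as not significant
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def significance_color(hit1, total1, hit2, total2, alpha=0.05):
    """Color of a node under the significance rule (alpha is an assumption)."""
    _, p_value = chi_square_2x2(hit1, total1, hit2, total2)
    if p_value >= alpha:
        return "gray"
    return "magenta" if hit1 / total1 > hit2 / total2 else "blue"


example_stat, example_p = chi_square_2x2(30, 100, 10, 100)
```

For a node associated with 30 of 100 genes in group 1 and 10 of 100 genes in group 2, the statistic is 12.5 and `significance_color(30, 100, 10, 100)` returns "magenta".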

[Figure: an example of a GO tree after highlighting]

Changing colors and branch width
The colors, the branch width, and the node radius can be changed by clicking "View => Color and Width".

Output
1. Image output. Users can save the image currently on screen by clicking the "File => Export => Current Image" menu.
2. Gene ontology information output. Click "File => Export => GO list", and all GO information, including the GO ID, the GO term, the genes of each group that are associated with this GO term, and the p-value of every GO term, will be exported to one tab-delimited file. If there is a lot of GO information to output, this function can take several minutes to finish. The output progress is shown in the lower left corner of the screen.
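Because the exported GO list is a tab-delimited text file, it can be read by most scripting tools. The Python sketch below uses the standard `csv` module and makes no assumption about the column order. The function name `read_go_export` is hypothetical, and the sample line is made up for illustration; the actual column layout of the export is not documented here.

```python
import csv
import io


def read_go_export(handle):
    """Read a tab-delimited file into a list of rows (lists of strings)."""
    return [row for row in csv.reader(handle, delimiter="\t") if row]


# Made-up line for illustration; a real export may order its columns differently.
sample_export = io.StringIO(
    "GO:0008150\tbiological_process\t1433844_a_at\t1456590_x_at\t0.5\n"
)
export_rows = read_go_export(sample_export)
```

For a real file, pass an open file object instead of the in-memory sample, for example `open(path, newline="")`.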

Acknowledgement
GoSurfer is a product of the Wong Lab. We are very grateful to our collaborators who kindly provided us with data, suggestions and help:
Cheng Li, Florian Storch, Ovidiu Lipan, Ming-Chih Kao, Charles Weitz.

This site was last updated 12/11/06