NACEP is a model-based, open-source tool for time-course data analysis. It explicitly uses co-expression network information in the comparison of temporal gene expression data.
Description
NACEP explicitly uses co-expression network information when comparing temporal gene expression data under different experimental conditions. Instead of assigning each gene to a single cluster, NACEP retains the probability that the gene belongs to each cluster. These probabilities and the mean expression pattern of each cluster are used in the final step, which compares the temporal expression patterns of a gene.
Figure 1. Flow chart of NACEP in a temporal expression data comparison analysis.
Parameter inference is implemented with a Gibbs sampler. The algorithm is compiled from C and runs on the R platform. NACEP is easy to use and returns several kinds of results that may be of interest to users.
The reference for NACEP and its supplementary materials are listed under "Reference and Acknowledgment" below.
Downloading and Installation
NACEP is available for both Windows (32- and 64-bit) and Linux (64-bit) platforms. The source file is also available for users who need to build the library themselves:
- NACEP Package for 32-bit Windows
- NACEP Package for 64-bit Linux (built under Ubuntu)
- NACEP Package for 64-bit Windows
- Source for gibbssampling.dll / gibbssampling.so
The package contains four files.
- "NACEP.r": the main function file;
- "gibbssampling.dll" (Windows) / "gibbssampling.so" (Linux): dynamic library for the Gibbs sampling algorithm;
- "Data.txt": a sample dataset for testing purposes;
- "Example.r": an example of how to use the algorithm.
To install the package:
- Download the latest version of R for your platform from http://cran.r-project.org/.
- Install the "splines" package in R.
- Download the package for your platform and extract its files to a new folder.
How to Use the Program
Data File Format
Expression data files must follow a specific format to be loaded into NACEP. All data files should be saved as tab-delimited text files. The first row of the data file describes each column: the gene name, followed by the serial array numbers (or array names). See "Data.txt" for further information.
Standard data file:
Gene | array1 | array2 | array3 | array4 | array5 | array6 | . . . |
---|---|---|---|---|---|---|---|
1 | 10.2 | 11.3 | 11.5 | 12.1 | 13.6 | 15.3 | |
2 | 6.9 | 2.1 | 1.7 | 3.1 | 4.2 | 5.3 | |
3 | 18.2 | 17.2 | 16.8 | 15.1 | 10.7 | 9.2 | |
. . . | | | | | | | |
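As an illustration of the file layout described above, the following minimal sketch parses a tab-delimited table of this shape. It is written in Python purely for illustration and is not part of the NACEP package; the function name `read_expression_table` and the inline sample text are assumptions made for this example.

```python
import csv
import io


def read_expression_table(handle):
    """Parse a tab-delimited expression table.

    The first row is treated as column descriptions (gene column, then
    array names); every later row is a gene identifier followed by
    numeric expression values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        gene = record[0]
        values = [float(v) for v in record[1:]]
        if len(values) != len(header) - 1:
            raise ValueError(
                f"gene {gene}: expected {len(header) - 1} values, got {len(values)}"
            )
        rows[gene] = values
    return header, rows


# Inline copy of the standard data file shown above (illustrative only).
sample = (
    "Gene\tarray1\tarray2\tarray3\tarray4\tarray5\tarray6\n"
    "1\t10.2\t11.3\t11.5\t12.1\t13.6\t15.3\n"
    "2\t6.9\t2.1\t1.7\t3.1\t4.2\t5.3\n"
    "3\t18.2\t17.2\t16.8\t15.1\t10.7\t9.2\n"
)
header, rows = read_expression_table(io.StringIO(sample))
```

A check like this can catch rows with missing values before the file is passed to NACEP.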
NACEP requires that all experimental conditions have the same number of time points. In the data file, the time points of one experimental condition must be placed in consecutive columns, in time-course order. Given the parameters spcNum and Timelength, NACEP separates the columns into their corresponding conditions and time points. For the standard data file shown above, if we set spcNum = 2 and Timelength = 3, NACEP will interpret the data in the following way (any extra columns to the right of the table are discarded):
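Condition | Condition 1 | Condition 1 | Condition 1 | Condition 2 | Condition 2 | Condition 2 |
---|---|---|---|---|---|---|
Time point | t0 | t1 | t2 | t0 | t1 | t2 |
Gene | array1 | array2 | array3 | array4 | array5 | array6 |
1 | 10.2 | 11.3 | 11.5 | 12.1 | 13.6 | 15.3 |
2 | 6.9 | 2.1 | 1.7 | 3.1 | 4.2 | 5.3 |
3 | 18.2 | 17.2 | 16.8 | 15.1 | 10.7 | 9.2 |
. . . | | | | | | |
Because the input for NACEP is gene expression values, it can be used on normalized high-throughput sequencing data as well.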
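The column assignment described above can be sketched as follows. This is a Python illustration of the layout rule only, not NACEP's own code; `split_by_condition`, `spc_num`, and `time_length` are hypothetical names that mirror the spcNum and Timelength parameters.

```python
def split_by_condition(values, spc_num, time_length):
    """Split one gene's row into per-condition time courses.

    Columns are assumed to be ordered condition by condition, with the
    time points of each condition in consecutive columns. Extra columns
    to the right are ignored, matching the behaviour described above.
    """
    needed = spc_num * time_length
    if len(values) < needed:
        raise ValueError(f"need at least {needed} values, got {len(values)}")
    used = values[:needed]
    return [
        used[c * time_length:(c + 1) * time_length]
        for c in range(spc_num)
    ]


# Gene 1 from the standard data file, with spcNum = 2 and Timelength = 3.
row = [10.2, 11.3, 11.5, 12.1, 13.6, 15.3]
conditions = split_by_condition(row, spc_num=2, time_length=3)
# conditions[0] holds t0..t2 for condition 1, conditions[1] for condition 2.
```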
Interpreting the Result
Converged results of NACEP can be used to reveal and compare gene expression differences between conditions. The σ² value (low and stable across steps) and the number of clusters (stable across steps) in Step_<step>.txt can be used to judge whether the results have converged.
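One informal way to apply the stability criterion is to check how much each quantity varies over the most recent steps. The sketch below is a Python illustration under stated assumptions: the traces are made-up numbers, the layout of Step_<step>.txt is not assumed (the values are taken to have been collected by hand), and the window size and tolerances are arbitrary choices rather than recommendations from the NACEP authors. It checks stability only; whether σ² is "low" still needs to be judged for the data at hand.

```python
def is_stable(trace, window, rel_tol):
    """Return True if the last `window` values vary by at most `rel_tol`
    relative to the largest magnitude among them."""
    tail = trace[-window:]
    if len(tail) < window:
        return False  # not enough steps to judge
    spread = max(tail) - min(tail)
    scale = max(abs(v) for v in tail)
    return scale == 0 or spread / scale <= rel_tol


# Hypothetical traces, one value per recorded step (not real NACEP output).
sigma2_trace = [4.1, 2.3, 1.2, 0.81, 0.80, 0.79, 0.80]
cluster_trace = [30, 22, 15, 12, 12, 11, 12]

looks_converged = (
    is_stable(sigma2_trace, window=4, rel_tol=0.05)
    and is_stable(cluster_trace, window=4, rel_tol=0.10)
)
```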
By comparing the values in Distances_avg.txt, the genes most and least affected by the condition can be identified and the degree of the effect can be quantified.
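As a small illustration of this comparison, the sketch below ranks genes by distance. It is written in Python with made-up gene names and values; the layout of Distances_avg.txt is not assumed, so the distances are taken to have been loaded into a dictionary already.

```python
# Hypothetical averaged distances per gene (not real NACEP output).
distances = {"geneA": 0.12, "geneB": 2.45, "geneC": 0.87, "geneD": 1.60}

# Sort from largest to smallest distance.
ranked = sorted(distances.items(), key=lambda item: item[1], reverse=True)

most_affected = ranked[0]    # gene with the largest distance
least_affected = ranked[-1]  # gene with the smallest distance
```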
Experimental design may enable further analysis of NACEP distance data. For example, if multiple knock-out experiments of different transcription factors were performed, Pearson correlation values can be calculated between the distances from those experiments to infer the relationships among transcription factors in the regulatory network. (See the reference for an analysis example.)
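The Pearson correlation mentioned above can be computed as in the following Python sketch. The two distance vectors are made-up values for two hypothetical knock-out experiments (labelled TF1 and TF2 here) over the same genes in the same order; this is a generic implementation of the standard formula, not the analysis pipeline used in the reference.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene distances from two knock-out experiments.
tf1_distances = [0.12, 2.45, 0.87, 1.60, 0.33]
tf2_distances = [0.20, 2.10, 1.05, 1.40, 0.25]

r = pearson(tf1_distances, tf2_distances)
```

A value of `r` close to 1 would indicate that the two knock-outs affect the same genes to a similar degree.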
Results can also be used to gain insight into gene expression patterns: by examining the clustering results in Step_<step>.txt, genes that behave in a similar manner across conditions can be found in the same cluster.
Examples
After launching R and setting its working directory to the package folder, open "Example.r" and execute it. The run may take up to several minutes, depending on your machine's specifications, and the output files will be written to the same directory.
Update Notes
6/2/2014:
- Added "Interpreting the Result" and updated "How to Use the Program" to help result interpretation.
12/5/2011:
- Minor change in text.
- Updated grant information in "Reference and Acknowledgment"
10/13/2011:
- Added 64-bit and Linux support.
- Re-structured the web page for easy navigation. Added explanation for input data structure.
Reference and Acknowledgment
Wei Huang, Xiaoyi Cao and Sheng Zhong (2010). "Network-based comparison of temporal gene expression patterns." Bioinformatics 26(23): 2944-2951.
[Figures in Manuscript] [Supplementary Text] [Supplementary Figures]
This work is supported by NSF grant DEB 0848386.
Contact
Contact Xiaoyi Cao (xcao3@illinois.edu) with any problems or comments concerning NACEP.