NACEP: Network Based Comparison of Temporal Expression Patterns


NACEP is a model-based, open-source tool for time-course data analysis. It explicitly uses co-expression network information when comparing temporal gene expression data.

Description

NACEP explicitly uses co-expression network information when comparing temporal gene expression data obtained under different experimental conditions. Instead of assigning each gene to a single cluster, NACEP retains the probability that the gene belongs to each cluster. These probabilities, together with the mean expression pattern of every cluster, are used in the final step, where the temporal expression patterns of a gene are compared.
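
To make the idea of retained membership probabilities concrete, here is a minimal sketch. It assumes the probabilities and cluster means are combined as a probability-weighted average; that combination rule and all numbers are assumptions for illustration, not necessarily NACEP's exact computation.

```r
# Hypothetical numbers: 2 clusters, 3 time points.
cluster_means <- rbind(c(1, 2, 3),    # mean pattern of cluster 1
                       c(3, 2, 1))    # mean pattern of cluster 2
membership <- c(0.8, 0.2)             # P(gene belongs to cluster k); sums to 1

# Probability-weighted pattern for the gene (assumed combination rule).
weighted_pattern <- as.vector(membership %*% cluster_means)
print(weighted_pattern)
```

A hard assignment would use only the first row of cluster_means here; keeping the probabilities lets the second cluster contribute in proportion to its weight.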

Figure 1: Flow-chart of NACEP in a temporal expression data comparison analysis

Parameter inference is implemented with a Gibbs sampler. The sampler is written in C, compiled into a dynamic library, and called from R. NACEP is run through a single function call and writes several kinds of results that users may need.

The reference for NACEP and its supplementary materials are listed under "Reference and Acknowledgment" below.

Downloading and Installation

NACEP is available for both Windows (32- and 64-bit) and Linux (64-bit) platforms. The source code is also available for building to suit specific needs.

The package contains four files.

  1. "NACEP.r": the main function file;
  2. "gibbssampling.dll" (Windows) / "gibbssampling.so" (Linux): dynamic library for the Gibbs sampling algorithm;
  3. "Data.txt": a sample dataset for testing purposes;
  4. "Example.r": an example of how to use the algorithm.

To install the package:

  1. Download the latest version of R for your platform from http://cran.r-project.org/.
  2. Make sure the "splines" package is available in R (it is distributed with R itself, so a standard installation already includes it).
  3. Download the package for your platform and extract its files into a new folder.
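
As a quick sanity check after installation, the following sketch verifies from within R that "splines" can be loaded and reports whether the running R build is 64-bit. It uses only base R; the variable names are illustrative.

```r
# splines is distributed with R itself, so this check should normally pass.
has_splines <- requireNamespace("splines", quietly = TRUE)
if (!has_splines) {
  stop("The 'splines' package is missing; repair or reinstall R.")
}

# Pointer size is 8 bytes on a 64-bit build of R and 4 bytes on a 32-bit one.
is_64bit <- .Machine$sizeof.pointer == 8
message("R ", getRversion(), ", 64-bit build: ", is_64bit)
```

The 64-bit flag matters because the NACEP package you download has to match the R executable that loads its dynamic library.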

How to Use the Program

  1. Make sure the splines package is available in your R installation.
  2. Launch R and set the working directory to a new folder. (Note that you must run the 64-bit R executable to use the 64-bit NACEP package.)
  3. Extract NACEP.r and gibbssampling.dll (gibbssampling.so for the Linux version) from the NACEP package file. Put these two files and your data file (in the tab-delimited format described in "Data File Format") into that folder.
    The data must already be normalized; NACEP does not normalize it, so use other software for that step.
  4. Open a new R code file and execute:
    >source("NACEP.r")
  5. Call the main function NACEP with the appropriate arguments.
    A NACEP function call takes 8 parameters in total:
    >NACEP(filename, spcNum, Timelength, Knot, loop=500, compStart, compIntvl, alpha=50)
    1. filename is the name of your tab-delimited text data file.
    2. spcNum is the number of experiments (experimental conditions). Use 2 for a two-group comparison and 1 for clustering only. More than two groups can be used, in which case NACEP reports the average pairwise distance between all groups.
    3. Timelength is the number of time points in each experiment.
    4. Knot is the number of knots used to construct the spline design matrix.
    5. loop is the number of loops (iterations) the Gibbs sampler runs. The default is 500.
    6. compStart is the first loop from which comparison results are taken.
    7. compIntvl is the interval, in loops, between two successive comparison results.
    8. alpha is a parameter controlling the clustering strength. The default is 50.
  6. Collect the results written by NACEP.
    The processed clustering results are saved in a folder named after the current date (yyyy-mm-dd).
    The folder contains the following files:
    1. YHat_Step_<step>.txt contains the inferred gene expression pattern from Dirichlet Process clustering at the <step>-th interval.
    2. Step_<step>.txt contains the clustering details at the <step>-th interval, including the σ² value, the number of clusters, and the cluster each gene belongs to.
    3. Distances.txt contains the comparison results between experimental conditions. Each column in this file holds the Euclidean distances between the gene expression patterns under the different conditions, as given in the corresponding YHat_Step_<step>.txt.
    4. Distances_avg.txt contains the average distances computed from Distances.txt.
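
The sketch below illustrates two of the ideas in the steps above: which loops contribute comparison results, and how a Euclidean distance between two condition-specific patterns is computed. It assumes that results are taken at compStart and then every compIntvl loops up to loop; that reading, and all values, are assumptions for illustration.

```r
# Hypothetical settings, mirroring the NACEP() arguments described above.
loop      <- 500
compStart <- 300
compIntvl <- 50

# Assumed selection rule: start at compStart, then every compIntvl-th loop.
sampled_loops <- seq(from = compStart, to = loop, by = compIntvl)
print(sampled_loops)  # 300 350 400 450 500

# Euclidean distance between the two fitted patterns of one gene
# (made-up values standing in for entries of a YHat_Step_<step>.txt file).
yhat_cond1 <- c(10.0, 11.0, 12.0)
yhat_cond2 <- c(12.0, 13.0, 13.0)
distance <- sqrt(sum((yhat_cond1 - yhat_cond2)^2))
print(distance)  # 3
```

Choosing a compStart well past the early loops keeps results from the sampler's initial, unsettled phase out of the comparison.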

Data File Format

Expression data files must follow a specific format to be loaded into NACEP. All data files should be saved as tab-delimited text files. The first row of the data file describes each column: the gene name, followed by the serial array numbers (or array names). See "Data.txt" for an example.

Standard data file:

Gene    array1  array2  array3  array4  array5  array6  ...
1       10.2    11.3    11.5    12.1    13.6    15.3
2       6.9     2.1     1.7     3.1     4.2     5.3
3       18.2    17.2    16.8    15.1    10.7    9.2
...

NACEP requires all experimental conditions to have the same number of time points. In the data file, the time points of one experimental condition must occupy consecutive columns, in time-course order. From the parameters spcNum and Timelength, NACEP assigns each column to its condition and time point. For the standard data file shown above, with spcNum = 2 and Timelength = 3, NACEP interprets the data as follows (any extra columns to the right are discarded):

        Condition 1             Condition 2
        t0      t1      t2      t0      t1      t2
Gene    array1  array2  array3  array4  array5  array6  ...
1       10.2    11.3    11.5    12.1    13.6    15.3
2       6.9     2.1     1.7     3.1     4.2     5.3
3       18.2    17.2    16.8    15.1    10.7    9.2
...     ...     ...     ...     ...     ...     ...
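
The column layout can be reproduced in a few lines of R. This is a sketch of the described interpretation, not NACEP's own code: it rebuilds the sample rows from the table, adds one made-up extra column (array7) to show that surplus columns are dropped, and splits the rest into conditions.

```r
# The three sample rows from the table, plus a made-up array7 column.
expr <- data.frame(
  Gene   = 1:3,
  array1 = c(10.2, 6.9, 18.2),
  array2 = c(11.3, 2.1, 17.2),
  array3 = c(11.5, 1.7, 16.8),
  array4 = c(12.1, 3.1, 15.1),
  array5 = c(13.6, 4.2, 10.7),
  array6 = c(15.3, 5.3, 9.2),
  array7 = c(0, 0, 0)
)

spcNum     <- 2
Timelength <- 3

# Keep only the first spcNum * Timelength data columns (skip the Gene column).
values <- as.matrix(expr[, 1 + seq_len(spcNum * Timelength)])

# Each consecutive block of Timelength columns forms one condition.
condition_of_column <- rep(seq_len(spcNum), each = Timelength)
by_condition <- lapply(seq_len(spcNum), function(k) {
  block <- values[, condition_of_column == k, drop = FALSE]
  colnames(block) <- paste0("t", seq_len(Timelength) - 1)
  rownames(block) <- expr$Gene
  block
})
names(by_condition) <- paste("Condition", seq_len(spcNum))
print(by_condition)
```

In practice the data frame would come from a tab-delimited file, for example via read.delim, rather than being typed in by hand.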

Because the input for NACEP is gene expression values, it can be used on normalized high-throughput sequencing data as well.

Interpreting the Result

Once NACEP has converged, its results can be used to reveal and compare gene expression differences between conditions. Two quantities in Step_<step>.txt help judge convergence: the σ² value, which should be low and roughly constant across steps, and the number of clusters, which should also be roughly constant across steps.
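
One simple way to turn "not changing a lot across steps" into a check is to look at the relative spread of the last few values. The sketch below does this on made-up traces; the helper is_stable, its window size, and its tolerance are illustrative choices, not part of NACEP.

```r
# Hypothetical traces read off successive Step_<step>.txt files.
sigma2     <- c(0.92, 0.41, 0.22, 0.21, 0.20, 0.21, 0.20)
n_clusters <- c(35, 22, 15, 14, 14, 15, 14)

# Relative spread over the last n values; a small spread means the trace
# has settled. Window (n) and tolerance (tol) are arbitrary choices.
is_stable <- function(x, n = 4, tol = 0.1) {
  tail_x <- tail(x, n)
  (max(tail_x) - min(tail_x)) / mean(tail_x) <= tol
}

converged <- is_stable(sigma2) && is_stable(n_clusters)
print(converged)
```

A check like this only flags obvious non-convergence; plotting both traces against the step number remains a useful complement.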

Comparing the values in Distances_avg.txt identifies the genes most and least affected by the change in condition, and the distance itself quantifies how strongly each gene is affected.
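
Ranking genes by their average distance is a one-line sort in R. The values and gene names below are made up, and the exact layout of Distances_avg.txt is not assumed here.

```r
# Made-up average distances, one per gene.
avg_distance <- c(geneA = 0.8, geneB = 4.6, geneC = 2.1, geneD = 0.3)

# Sort from most to least affected by the change in condition.
ranked <- sort(avg_distance, decreasing = TRUE)
print(names(ranked))  # "geneB" "geneC" "geneA" "geneD"

most_affected  <- names(ranked)[1]
least_affected <- names(ranked)[length(ranked)]
```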

The experimental design may enable further analysis of the NACEP distance data. For example, if several transcription factors were knocked out in separate experiments, Pearson correlations between the distances from those experiments can be used to infer relationships between the transcription factors in the regulatory network. (See the reference for an example analysis.)
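
A minimal sketch of the correlation step, using base R's cor function. The distance values and the knock-out names (TF1_KO and so on) are placeholders; in a real analysis each column would hold the per-gene distances from one knock-out comparison.

```r
# Hypothetical distances: rows are genes, columns are knock-out experiments.
distances <- cbind(
  TF1_KO = c(0.5, 3.2, 1.1, 4.0, 0.2),
  TF2_KO = c(0.7, 2.9, 1.4, 3.6, 0.4),
  TF3_KO = c(3.8, 0.3, 2.5, 0.6, 4.1)
)

# Pairwise Pearson correlation between the distance profiles.
tf_cor <- cor(distances, method = "pearson")
print(round(tf_cor, 2))
```

In this toy data the first two knock-outs disturb the same genes and so correlate strongly, while the third disturbs a different set and correlates negatively with both.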

Results can also give insight into gene expression patterns: by pooling the clustering results in Step_<step>.txt, genes that behave in a similar manner across conditions can be found in the same cluster.
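
Grouping genes by their assigned cluster can be sketched with base R's split. The assignments below are made up for one sampled step, and the data frame layout is an assumption rather than the actual format of Step_<step>.txt.

```r
# Made-up cluster assignments for one sampled step.
assignments <- data.frame(
  gene    = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  cluster = c(1, 2, 1, 3, 2),
  stringsAsFactors = FALSE
)

# List the genes that share each cluster.
genes_by_cluster <- split(assignments$gene, assignments$cluster)
print(genes_by_cluster)
```

Repeating this over several sampled steps and keeping the gene pairs that co-occur most often is one way to pool the clustering results.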

Examples

After launching R and setting its working directory to the package folder, open "Example.r" and execute it. The run can take up to several minutes, depending on your machine, and the output files are written to the same directory.

Update Notes

6/2/2014:

12/5/2011:

10/13/2011:

Reference and Acknowledgment

Wei Huang, Xiaoyi Cao and Sheng Zhong (2010). "Network-based comparison of temporal gene expression patterns." Bioinformatics 26(23): 2944-2951.
[Figures in Manuscript] [Supplementary Text] [Supplementary Figures]

This work is supported by NSF grant DEB 0848386.

 


Contact

Contact Xiaoyi Cao (xcao3@illinois.edu) with any problems or comments concerning NACEP.