The Soft Cross Species Clustering (SCSC) software is available for interactive use via the web. Alternatively, you can download the executable software for Windows.
All features of the two-species clustering program in the downloadable version are controlled through a parameter file named 'ParameterSetting.txt' (sample), which defines the names of the input and output files and the parameter settings used by the program:
Features in file: 'ParameterSetting.txt' | Description |
---|---|
dataFile_hm | microarray expression data for species I |
dataFile_ms | microarray expression data for species II |
k | number of clusters |
annotFile_hm | annotation file for species I |
annotFile_ms | annotation file for species II |
orthologFile | ortholog file mapping probes to probes across species |
outputFile_hm | output file containing the clustering results of species I |
outputFile_ms | output file containing the clustering results of species II |
k_by_k_output_clusters | output file containing the combined clustering results for the two species |
gene_pair_result | output file containing the information of each ortholog pair and its cluster index |
prep | yes: perform preprocessing; no: skip preprocessing |
CVThr | (0-100) all genes whose coefficient of variation is less than this value will be filtered out. |
maxExpr | (50-200) all genes whose maximum expression level is less than this value will be filtered out. |
nMaxAlgRun | (1-150) number of times the algorithm is repeated with random restarts. |
stoppingThr | (0.001-0.00001) the threshold used in the stopping criterion: if the relative increase in the score between iterations is less than this value, the algorithm stops. |
psudo_uniform_counts | pseudo-counts added to the scatter prior. 0: no scatter cluster; >=1 (integer): the scatter cluster exists. |
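
The ranges listed in the table can be checked before a run. The following is a minimal sketch in Python, not part of the SCSC software: the function name `check_parameters`, the dictionary layout, and the choice to treat the range endpoints as inclusive are all assumptions made for illustration. It does not read or write 'ParameterSetting.txt' itself, whose exact syntax is not described here.

```python
# Ranges copied from the parameter table; treating the endpoints as
# inclusive is an assumption.
NUMERIC_RANGES = {
    "CVThr": (0, 100),
    "maxExpr": (50, 200),
    "nMaxAlgRun": (1, 150),
    "stoppingThr": (0.00001, 0.001),
}


def _as_number(value):
    """Convert a value to float, or return None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_parameters(params):
    """Return a list of problems found; an empty list means all checks passed.

    `params` is a hypothetical dict mapping parameter names to values.
    Parameters that are absent from the dict are simply not checked.
    """
    problems = []

    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in params:
            continue
        value = _as_number(params[name])
        if value is None or not low <= value <= high:
            problems.append(f"{name} must be a number between {low} and {high}")

    if "prep" in params and params["prep"] not in ("yes", "no"):
        problems.append("prep must be 'yes' or 'no'")

    if "psudo_uniform_counts" in params:
        counts = _as_number(params["psudo_uniform_counts"])
        if counts is None or counts < 0 or not counts.is_integer():
            problems.append("psudo_uniform_counts must be 0 or a positive integer")

    return problems
```

For example, `check_parameters({"CVThr": 150, "prep": "yes"})` returns a one-item list reporting that CVThr is out of range.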
The two-species clustering program takes as input (sample) a data file for each species (dataFile_hm and dataFile_ms), an annotation file for the probes in each species (annotFile_hm and annotFile_ms), and a file of ortholog pairs between the two species (orthologFile). The output files (sample) include the clustering result for each species (outputFile_hm and outputFile_ms), the combined clustering results for the two species (k_by_k_output_clusters), and a cluster index file for each ortholog pair (gene_pair_result). Clusters in the output files are separated by three lines marked "NONE".

The output files can be viewed with the TreeView software. In the TreeView plot, the left panel shows one species and the right panel shows the other. Clusters are separated by three blank rows and are sorted by the pair (cluster index in species one, cluster index in species two): (1,1), (1,2), (1,3), ..., (2,1), (2,2), (2,3), ....
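
The cluster ordering and the "NONE" separators described above can be illustrated with a short Python sketch. This is not part of the SCSC software. The function names are hypothetical, and the sketch assumes that the output is tab-delimited, that a separator line is one whose first field is exactly "NONE", and that any header line has already been removed; none of these details are confirmed by the description above.

```python
from itertools import product


def cluster_pair_order(k):
    """List the (species-one index, species-two index) pairs in the order
    described above: (1,1), (1,2), ..., (1,k), (2,1), ..., (k,k)."""
    return list(product(range(1, k + 1), repeat=2))


def split_clusters(lines, marker="NONE"):
    """Group data lines into clusters, using separator lines as boundaries.

    Assumption: a separator line is one whose first tab-delimited field
    equals `marker`. Consecutive separator lines count as one boundary,
    so an empty cluster would not appear in the result.
    """
    clusters, current = [], []
    for line in lines:
        text = line.rstrip("\n")
        first_field = text.split("\t")[0].strip()
        if first_field == marker:
            # Close the cluster collected so far, if any.
            if current:
                clusters.append(current)
                current = []
        else:
            current.append(text)
    if current:
        clusters.append(current)
    return clusters
```

Because empty clusters would be dropped by this sketch, the blocks returned by `split_clusters` should not be paired blindly with the entries of `cluster_pair_order(k)` unless all k x k clusters are known to be non-empty.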