APEG: Affinity Prediction by Epigenome and Genome
The program uses a biophysical model to analyze transcription factor (TF)-DNA binding data, such as ChIP-seq data, by incorporating epigenomic modifications and genome sequence data. From genome-wide TF binding and epigenomic data, the model can learn synergistic and antagonistic interactions between specific TFs and epigenomic modifications.
The program requires the GNU Scientific Library (GSL). If it is not installed on your system, download it from http://www.gnu.org/software/gsl/. After installing GSL, you need to update the start-up script of your shell (e.g., .bash_profile in your home directory if you use bash) so that the GSL shared libraries can be found at run time. Suppose the GSL installation directory is /raid/apps/gsl-1.15/lib; then add:
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/raid/apps/gsl-1.15/lib
export LD_LIBRARY_PATH
After extracting the program, set GSL_DIR in src/Makefile to your GSL installation directory, e.g.:
GSL_DIR = my_gsl_dir
Then, in the src directory, simply type:
make
Run the program
./seq2binding -s <seqFile> -d <dataFile> -m <motifFile> -nep <number of epigenomic marks> -ep <epiFile1> [<epiFile2> ...]
If you have multiple epigenomic mark files, list all of them after -ep, separated by spaces.
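When scripting many runs, it is easy to let the -nep count drift out of sync with the files listed after -ep. The sketch below assembles the argument list in Python. The helper name build_apeg_command is hypothetical, and it assumes (the README does not say so explicitly) that -nep should equal the number of files given after -ep.

```python
from typing import List, Sequence


def build_apeg_command(seq_file: str, data_file: str, motif_file: str,
                       epi_files: Sequence[str],
                       binary: str = "./seq2binding") -> List[str]:
    """Assemble a seq2binding argument list (hypothetical helper).

    Assumption: -nep must equal the number of epigenomic files after -ep,
    so it is derived from epi_files instead of being typed by hand.
    """
    cmd = [binary, "-s", seq_file, "-d", data_file, "-m", motif_file,
           "-nep", str(len(epi_files)), "-ep"]
    cmd.extend(epi_files)  # several files follow -ep, space-separated
    return cmd


if __name__ == "__main__":
    cmd = build_apeg_command("seq.fa", "data.txt", "motif.txt",
                             ["epi1.wig", "epi2.wig"])
    # Prints: ./seq2binding -s seq.fa -d data.txt -m motif.txt -nep 2 -ep epi1.wig epi2.wig
    print(" ".join(cmd))
    # To launch the program, pass cmd to subprocess.run(cmd, check=True).
```

The file names above are placeholders; substitute the files from the example dataset.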
The program takes the following input files. See the example dataset for reference.
seqFile: the sequences in FASTA format.
dataFile: the binding data for all sequences in seqFile, in tab-delimited (bed-like) format. The first column is the sequence id (it must match the ids used in seqFile, in the same order), and the second column is the measured binding strength. For example:
chr1:136351629-136351631[tab]312
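Because the ids in dataFile must match seqFile in both content and order, a quick consistency check before a long run can save time. This is a minimal sketch, assuming the FASTA id is the first whitespace-delimited token after '>' and that dataFile columns are separated by a real tab character (shown as [tab] above). The function names are hypothetical.

```python
from typing import List, Tuple


def fasta_ids(fasta_text: str) -> List[str]:
    """Return FASTA ids in file order (first token after '>')."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def read_binding_data(data_text: str) -> List[Tuple[str, float]]:
    """Read (sequence id, binding strength) pairs from tab-delimited text."""
    rows = []
    for line in data_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        seq_id, value = line.split("\t")[:2]
        rows.append((seq_id, float(value)))
    return rows


def check_order(fasta_text: str, data_text: str) -> List[str]:
    """Return a list of problems; an empty list means ids match in order."""
    seq_ids = fasta_ids(fasta_text)
    data_ids = [sid for sid, _ in read_binding_data(data_text)]
    problems = []
    if len(seq_ids) != len(data_ids):
        problems.append(f"{len(seq_ids)} sequences but {len(data_ids)} data rows")
    for i, (a, b) in enumerate(zip(seq_ids, data_ids), start=1):
        if a != b:
            problems.append(f"entry {i}: seqFile id {a!r} != dataFile id {b!r}")
    return problems
```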
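motifFile: the motif of the TF. The header line gives the motif name, the motif length and the pseudocount (0.5 should work for most motifs), followed by one line per motif position with the counts of the four nucleotides. For example: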
>Nanog 9 0.5
20 225 46 209
70 0 19 411
50 66 381 3
434 45 0 21
55 5 66 374
17 32 222 229
74 18 325 83
8 243 146 103
48 145 6 301
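To see what the header fields mean in practice, the sketch below parses a motif file and turns the counts into per-position frequencies. It is illustrative only: it assumes the pseudocount is added to every count before normalising, which is a common convention but not necessarily how seq2binding uses it internally. Columns are kept in file order, since the README does not state the nucleotide order (A, C, G, T is common). The function name parse_motif is hypothetical.

```python
from typing import List, Tuple


def parse_motif(text: str) -> Tuple[str, float, List[List[float]]]:
    """Parse a motif file into (name, pseudocount, frequency matrix).

    Assumption: frequency = (count + pseudocount) / (row total + 4 * pseudocount).
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = lines[0]
    if not header[0].startswith(">"):
        raise ValueError("motif file must start with a '>' header line")
    name, length, pseudo = header[0][1:], int(header[1]), float(header[2])
    rows = [[float(x) for x in fields] for fields in lines[1:1 + length]]
    if len(rows) != length or any(len(r) != 4 for r in rows):
        raise ValueError(f"expected {length} rows of 4 counts")
    freqs = []
    for counts in rows:
        total = sum(counts) + 4 * pseudo
        freqs.append([(c + pseudo) / total for c in counts])
    return name, pseudo, freqs


NANOG = """>Nanog 9 0.5
20 225 46 209
70 0 19 411
50 66 381 3
434 45 0 21
55 5 66 374
17 32 222 229
74 18 325 83
8 243 146 103
48 145 6 301
"""

if __name__ == "__main__":
    name, pseudo, freqs = parse_motif(NANOG)
    print(name, len(freqs))  # Nanog 9
```

The pseudocount matters for rows like the second one, where a zero count would otherwise give that nucleotide a probability of exactly zero at that position.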
epiFile: the epigenomic mark data in wig format. It should cover the whole genome; otherwise, the program will not be able to find epigenomic data to annotate binding sites in the uncovered regions. For example, a fixedStep header line:
fixedStep chrom=chr1 start=0 step=25
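The coverage requirement is easiest to see with a lookup: a position that falls outside every covered interval simply has no value. The sketch below reads fixedStep wig sections and looks up the signal at a position. Assumptions: only fixedStep sections are handled, span defaults to 1 base as in the UCSC wig definition, and coordinates are used exactly as written in the file (the example header uses start=0). The names parse_fixed_step and signal_at are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int, float]  # (start, end_exclusive, value)


def parse_fixed_step(text: str) -> Dict[str, List[Interval]]:
    """Parse fixedStep wig text into sorted intervals per chromosome."""
    tracks: Dict[str, List[Interval]] = {}
    chrom: Optional[str] = None
    start = step = span = index = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            raise NotImplementedError("this sketch handles fixedStep only")
        if line.startswith("fixedStep"):
            fields = dict(kv.split("=", 1) for kv in line.split()[1:])
            chrom = fields["chrom"]
            start, step = int(fields["start"]), int(fields["step"])
            span = int(fields.get("span", 1))  # wig default span is 1 base
            index = 0
            tracks.setdefault(chrom, [])
            continue
        if chrom is None:
            raise ValueError("data value before any fixedStep header")
        pos = start + index * step
        tracks[chrom].append((pos, pos + span, float(line)))
        index += 1
    for intervals in tracks.values():
        intervals.sort()
    return tracks


def signal_at(tracks: Dict[str, List[Interval]], chrom: str,
              pos: int) -> Optional[float]:
    """Return the value covering pos, or None if pos is not covered."""
    intervals = tracks.get(chrom, [])
    # Rebuilds the start list on each call; cache it for genome-scale use.
    k = bisect.bisect_right([iv[0] for iv in intervals], pos) - 1
    if k >= 0 and intervals[k][0] <= pos < intervals[k][1]:
        return intervals[k][2]
    return None  # no epigenomic data for this position
```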
The program outputs:
(1) Estimated parameters: the binding parameter (how strongly the TF binds to its binding site) and the interaction parameters between the TF and each epigenomic mark, which are greater than 1 for a favorable interaction, less than 1 for an unfavorable interaction, and equal to 1 for no interaction.
(2) Pearson correlation between predicted binding and observed binding.
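The reported score is the standard Pearson coefficient r = cov(x, y) / (sd(x) sd(y)) between predicted and observed binding. The sketch below computes it in plain Python, which can be handy for recomputing the score on your own subsets of sequences. How you obtain the predicted values (for example from a prediction file) is left open here, because the README does not specify that file's layout.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```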
Optional arguments:
-ts <testSeqFile>: test the trained model on additional test sequences. The format of testSeqFile is the same as that of seqFile.
-td <testDataFile>: the binding data used to test the trained model on the additional sequences. The format of testDataFile is the same as that of dataFile.
-p <trainPredictionFile>: write the predicted binding intensities of the training sequences (in seqFile) to trainPredictionFile.
-tp <testPredictionFile>: write the predicted binding intensities of the test sequences (in testSeqFile) to testPredictionFile.
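The optional flags have implicit dependencies: test sequences are only useful together with their binding data, and test predictions need test sequences. The sketch below collects the optional flags and rejects inconsistent combinations. These pairing rules are assumptions inferred from the descriptions above, not documented program behaviour, and optional_apeg_args is a hypothetical name.

```python
from typing import List, Optional


def optional_apeg_args(test_seq: Optional[str] = None,
                       test_data: Optional[str] = None,
                       train_pred: Optional[str] = None,
                       test_pred: Optional[str] = None) -> List[str]:
    """Build optional seq2binding flags (hypothetical helper).

    Assumptions: -ts and -td are given together; -tp requires test data.
    """
    if (test_seq is None) != (test_data is None):
        raise ValueError("-ts and -td should be given together")
    if test_pred is not None and test_seq is None:
        raise ValueError("-tp needs test sequences and data (-ts/-td)")
    args: List[str] = []
    if test_seq is not None and test_data is not None:
        args += ["-ts", test_seq, "-td", test_data]
    if train_pred is not None:
        args += ["-p", train_pred]
    if test_pred is not None:
        args += ["-tp", test_pred]
    return args
```

Append the returned list to the required arguments shown in the usage line.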
Contact Chieh-Chun Chen (cchen63 AT illinois DOT edu) with any questions about APEG.