APEG: Affinity Prediction by Epigenome and Genome
Description
The program uses a biophysical model to analyze transcription factor (TF)-DNA
binding data, such as ChIP-seq data, by incorporating epigenomic modifications
and genome sequence data. The model can learn synergistic and antagonistic
interactions between specific TFs and epigenomic modifications from genome-wide
TF binding and epigenomic data.
Download
Installation
The program requires the GNU Scientific Library (GSL). If it is not installed
on your system, get it from http://www.gnu.org/software/gsl/.
After installing GSL, add its library directory to LD_LIBRARY_PATH in the
start-up script of your shell, e.g., .bash_profile in your home directory if
you are using bash.
For example, if the GSL library directory is /raid/apps/gsl-1.15/lib:
LD_LIBRARY_PATH=/raid/apps/gsl-1.15/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
After extracting the program, change the GSL directory in src/Makefile, e.g.:
GSL_DIR = my_gsl_dir
Then simply type:
make
Run the program
Usage:
./seq2binding -s <seqFile> -d <dataFile> -m <motifFile> -nep <number of epigenomic marks> -ep <epiFile1> (<epiFile2> ..)
If you have multiple epiFiles, list them after the -ep option, separated by
spaces.
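For scripted runs, the argument list can be assembled programmatically. The following is a minimal Python sketch, not part of APEG: build_command is a hypothetical helper name and the file names are placeholders. It sets -nep to the number of epiFiles and lists them after -ep, following the usage line above.

```python
def build_command(seq_file, data_file, motif_file, epi_files, exe="./seq2binding"):
    """Assemble the seq2binding argument list from the documented usage line."""
    # Assumption: at least one epiFile is given, as the usage line suggests.
    if not epi_files:
        raise ValueError("at least one epiFile is expected")
    cmd = [exe, "-s", seq_file, "-d", data_file, "-m", motif_file,
           "-nep", str(len(epi_files)), "-ep"]
    cmd.extend(epi_files)  # epiFiles follow -ep as separate arguments
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with your own data.
    print(" ".join(build_command("seqs.fa", "binding.txt", "motif.txt",
                                 ["mark1.wig", "mark2.wig"])))
```

The returned list can be passed to subprocess.run once the program has been built.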
The program takes the following input files. See example dataset for reference.
seqFile: the sequences in FASTA format. See example data.
>chr1:136351629-136351631
gtggtgatgcccaaccacagaattattttgttgctactttataactgtaattttgatcct
>chr3:122137593-122137593
atttctagttccagtgactgggagactgaaacaagagagtcacttgagtacaggagtgca
dataFile: the binding data in bed format, covering all sequences in the
seqFile. The first column is the sequence id (the ids must be the same as those
used in seqFile, and in the same order), and the second column is the measured
binding strength. The columns are separated by a tab.
chr1:136351629-136351631[tab]312
chr3:122137593-122137593[tab]307
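Because the ids in dataFile must match those in seqFile in the same order, it can help to check this before running the program. The following is a minimal Python sketch, not part of APEG; the function names are hypothetical, and it assumes the sequence id is the full FASTA header text after ">".

```python
def fasta_ids(lines):
    """Return sequence ids (header text after '>') in file order."""
    return [ln[1:].strip() for ln in lines if ln.startswith(">")]


def data_ids(lines):
    """Return the first tab-separated column of each non-empty line."""
    return [ln.split("\t")[0].strip() for ln in lines if ln.strip()]


def first_mismatch(seq_lines, data_lines):
    """Return the index of the first id mismatch, or None if ids agree in order."""
    s, d = fasta_ids(seq_lines), data_ids(data_lines)
    for i, (a, b) in enumerate(zip(s, d)):
        if a != b:
            return i
    if len(s) != len(d):
        # One file has extra records after the shared prefix.
        return min(len(s), len(d))
    return None
```

To use it on real files, pass the lines of each file, for example first_mismatch(open(seq_path).read().splitlines(), open(data_path).read().splitlines()), where seq_path and data_path are your own file paths.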
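motifFile: the motif of the TF. The header line consists of the motif name,
length, and pseudocount (0.5 should be OK for most motifs). In the example
below, the header is followed by one row of four values per motif position and
a closing "<" line.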
>Nanog 9 0.5
20 225 46 209
70 0 19 411
50 66 381 3
434 45 0 21
55 5 66 374
17 32 222 229
74 18 325 83
8 243 146 103
48 145 6 301
<
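To inspect a motif before using it, the file can be parsed and its rows converted to frequencies. The following is a minimal Python sketch, not part of APEG. The function names are hypothetical, and the smoothing formula (count + pseudocount) / (row total + 4 * pseudocount) is a common convention assumed here; how APEG itself applies the pseudocount, and which nucleotide each column stands for, is not stated in this document.

```python
def parse_motif(lines):
    """Parse one motif: '>name length pseudocount', value rows, closing '<'."""
    header = lines[0].lstrip(">").split()
    name, length, pseudocount = header[0], int(header[1]), float(header[2])
    rows = []
    for ln in lines[1:]:
        ln = ln.strip()
        if ln == "<":
            break  # end of this motif
        if ln:
            rows.append([float(x) for x in ln.split()])
    if len(rows) != length:
        raise ValueError(f"header says {length} positions, found {len(rows)}")
    return name, pseudocount, rows


def to_frequencies(rows, pseudocount):
    """Assumed smoothing: (count + pc) / (row total + number of columns * pc)."""
    return [[(c + pseudocount) / (sum(r) + len(r) * pseudocount) for c in r]
            for r in rows]
```

With a positive pseudocount, positions with a zero count (such as the second row of the Nanog example) still get a small nonzero frequency.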
epiFile: the epigenomic mark data in wig format. It should cover the whole
genome; otherwise, the program won't be able to find the epigenomic data needed
to annotate the binding sites.
track name=mouse
fixedStep chrom=chr1 start=0 step=25
0.00
0.00
1.50
1.50
1.50
0.00
0.00
0.00
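To see which genomic positions the values in a fixedStep block refer to, the file can be read as follows. This is a minimal Python sketch, not part of APEG: parse_fixed_step is a hypothetical name, only fixedStep blocks are handled, and the i-th value of a block is placed at start + i * step. How APEG maps these values onto binding sites is not described in this document.

```python
def parse_fixed_step(lines):
    """Yield (chrom, position, value) for each data line of fixedStep wig input."""
    chrom = start = step = None
    index = 0
    for ln in lines:
        ln = ln.strip()
        if not ln or ln.startswith("track"):
            continue  # skip blank lines and the track definition line
        if ln.startswith("fixedStep"):
            fields = dict(f.split("=", 1) for f in ln.split()[1:])
            chrom = fields["chrom"]
            start = int(fields["start"])
            step = int(fields["step"])
            index = 0  # each declaration starts a new block
            continue
        if chrom is None:
            raise ValueError("data value before any fixedStep declaration")
        yield chrom, start + index * step, float(ln)
        index += 1
```

For the example above, the third value (1.50) is placed at position 50 on chr1.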
Output:
(1) Estimated parameters: the binding parameter (how strongly the TF binds to
its binding site) and the interaction parameters between the TF and each
epigenomic mark. An interaction parameter is greater than 1 for a favorable
interaction, less than 1 for an unfavorable one, and equal to 1 for no
interaction.
(2) Pearson correlation between predicted and observed binding.
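The Pearson correlation can be recomputed independently from predicted and observed values. The following is a minimal Python sketch, not part of APEG; the layout of the prediction files is not described in this document, so the hypothetical pearson function takes two plain lists of numbers in matching order.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sums of squared deviations and of cross products.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)
```

A value near 1 means the predicted binding tracks the observed binding closely; a value near 0 means there is little linear relationship.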
Advanced options
-ts <testSeqFile>: test the trained model on additional test data. The format
of testSeqFile is the same as that of seqFile.
-td <testDataFile>: test the trained model on additional test data. The format
of testDataFile is the same as that of dataFile.
-p <trainPredictionFile>: write the predicted binding intensities of the
training sequences (those in seqFile) to the file trainPredictionFile.
-tp <testPredictionFile>: write the predicted binding intensities of the test
sequences (those in testSeqFile) to the file testPredictionFile.
Contact Chieh-Chun Chen (cchen63 AT illinois DOT edu) for any problem concerning APEG.