perEditor
A tool to create personalized genome sequences
Usage
perEditor
This tool only works with phased SNPs/indels (that is, for diploid organisms, we know whether each variant lies on the maternal or the paternal chromosome). The syntax is:
perEditor ref_sequence.fa snps_indels.vcf allele individual output_sequence.fa
Where:
ref_sequence.fa (fasta file) Reference DNA sequence.
snps_indels.vcf (VCF file) File containing SNP and indel information in the VCF format version 4.1.
allele (mother, father) Maternal or paternal allele used to build the new reference sequence.
individual (integer) Number identifying the particular individual among the samples present in the VCF file.
output_sequence.fa (fasta file) Name of the output file where the customized DNA sequence will be stored.
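As a rough illustration of the requirements above, the following Python sketch checks whether a genotype field is phased and assembles the perEditor command line in the order given in the synopsis. The function names are hypothetical and are not part of perEditor. The phasing check relies only on the VCF 4.1 convention that phased alleles are separated by "|" and unphased ones by "/".

```python
# Hypothetical helpers; none of these names come from perEditor itself.
VALID_ALLELES = ("mother", "father")


def is_phased_gt(sample_field):
    """Return True if the GT subfield of a VCF sample column is phased.

    VCF 4.1 separates phased alleles with '|' (e.g. '0|1') and
    unphased alleles with '/' (e.g. '0/1'). When GT is present it is
    the first colon-separated subfield of the sample column.
    Haploid calls such as '1' contain neither separator and are
    reported as not phased by this sketch.
    """
    gt = sample_field.split(":")[0]
    return "|" in gt and "/" not in gt


def build_pereditor_command(ref_fasta, vcf, allele, individual, out_fasta):
    """Build the argument list following the synopsis:
    perEditor ref_sequence.fa snps_indels.vcf allele individual output_sequence.fa
    """
    if allele not in VALID_ALLELES:
        raise ValueError("allele must be 'mother' or 'father'")
    return ["perEditor", ref_fasta, vcf, allele, str(int(individual)), out_fasta]


cmd = build_pereditor_command(
    "ref_sequence.fa", "snps_indels.vcf", "mother", 1, "output_sequence.fa"
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once perEditor is installed. How the integer maps to a sample column in the VCF file is not specified here, so the value 1 is only a placeholder.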
perEditor_ra
perEditor_ra allows the user to take chromosome rearrangement information into account when generating personalized reference genomes. perEditor_ra only works with phased data. To use perEditor_ra, save the fasta files of all the chromosomes involved in the chromosome rearrangement in the same folder. Then run perEditor_ra in that folder using the following syntax:
perEditor_ra rearrangements.vcf chr_length.bed centromeres.bed individual allele
Where:
rearrangements.vcf (VCF4.1 file) Rearrangement annotation file formatted according to the VCF format version 4.1.
chr_length.bed (BED file) File containing the length (total number of nucleotides) of each chromosome of the genome (see the Tutorial for an example of the format of this file).
centromeres.bed (BED file) File containing the position of the centromere of each chromosome (see the Tutorial for an example of the format of this file).
This information is necessary because the centromere coordinates are used to determine the identity of the new chromosome. That is, when two chromosomes are merged by a rearrangement, the resulting chromosome is named after the chromosome that contributed its centromere. If a new chromosome inherits two centromeres, its name will contain the names of both parental chromosomes.
allele (mother, father) Maternal or paternal allele used to build the new reference sequence.
individual (integer) Number identifying the particular individual among the samples present in the VCF file.
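The naming rule described for centromeres.bed can be sketched in Python as follows. This is only an illustration of the rule, not perEditor_ra's actual code. The function name, the representation of a new chromosome as a list of (chromosome, start, end) segments, the requirement that a centromere lie entirely inside a segment, and the "_" separator between two parental names are all assumptions made for the example.

```python
def name_rearranged_chromosome(segments, centromeres, sep="_"):
    """Name a rearranged chromosome after the source(s) of its centromere(s).

    segments:    ordered list of (chrom, start, end) pieces of the original
                 chromosomes that make up the new chromosome.
    centromeres: dict mapping chrom -> (cen_start, cen_end), as could be
                 read from centromeres.bed.
    sep:         separator used when two centromeres are inherited
                 (an assumption; the real naming format may differ).
    """
    names = []
    for chrom, start, end in segments:
        if chrom not in centromeres:
            continue
        cen_start, cen_end = centromeres[chrom]
        # The segment carries the centromere only if it fully contains it.
        if start <= cen_start and cen_end <= end and chrom not in names:
            names.append(chrom)
    if not names:
        # A chromosome without a centromere is not covered by the rule above.
        return None
    return sep.join(names)


# Illustrative coordinates only, not real centromere positions.
centromeres = {"chr1": (100, 120), "chr2": (50, 60)}
one = name_rearranged_chromosome([("chr1", 0, 200), ("chr2", 80, 300)], centromeres)
two = name_rearranged_chromosome([("chr1", 0, 200), ("chr2", 0, 300)], centromeres)
print(one, two)
```

In the first call only the chr1 segment contains a centromere, so the new chromosome takes the name of chr1. In the second call both centromeres are inherited and both names appear.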
Both BED files and the VCF file must be in the same working folder. perEditor_ra creates copies of the original chromosomes: each new chromosome (in fasta format) keeps its original name, with the string _new appended to it.
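Since perEditor_ra expects all of its inputs in one working folder, a small pre-flight check can help before launching it. The Python sketch below lists any missing input files and builds the command in the order shown in the synopsis (individual before allele, unlike perEditor). The function names are hypothetical, and the file names are the placeholders used in the synopsis.

```python
from pathlib import Path

# Hypothetical helpers; none of these names come from perEditor_ra itself.
VALID_ALLELES = ("mother", "father")


def missing_inputs(folder, filenames):
    """Return the names in `filenames` that are not regular files in `folder`."""
    base = Path(folder)
    return [name for name in filenames if not (base / name).is_file()]


def build_pereditor_ra_command(vcf, chr_length_bed, centromeres_bed, individual, allele):
    """Build the argument list following the synopsis:
    perEditor_ra rearrangements.vcf chr_length.bed centromeres.bed individual allele
    """
    if allele not in VALID_ALLELES:
        raise ValueError("allele must be 'mother' or 'father'")
    return [
        "perEditor_ra",
        vcf,
        chr_length_bed,
        centromeres_bed,
        str(int(individual)),
        allele,
    ]


inputs = ["rearrangements.vcf", "chr_length.bed", "centromeres.bed"]
absent = missing_inputs(".", inputs)
if absent:
    print("Missing from working folder:", ", ".join(absent))
ra_cmd = build_pereditor_ra_command(*inputs, 1, "father")
print(" ".join(ra_cmd))
```

With the files in place, subprocess.run(ra_cmd, cwd=folder, check=True) would run the tool inside that folder. The chromosome fasta files also need to be present there, but their names depend on the data set, so the sketch does not check for them.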