perEditor


A tool to create personalized genome sequences


 


 

Usage

 

perEditor

 

This tool only works with phased SNPs and indels (that is, for diploid organisms, it must be known whether each variant lies on the maternal or the paternal chromosome). The syntax is:

 

perEditor ref_sequence.fa snps_indels.vcf allele individual output_sequence.fa

 

Where:

 

ref_sequence.fa    (fasta file)     Reference DNA sequence.

snps_indels.vcf    (VCF file)       File containing SNP and indel information in VCF format version 4.1.

allele             (mother, father) Maternal or paternal allele used to build the new reference sequence.

individual         (integer)        Particular individual among all the samples present in the VCF file.

output_sequence.fa (fasta file)     Name of the output file where the customized DNA sequence will be stored.
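To illustrate why phasing matters, here is a minimal Python sketch of how a phased genotype selects one allele per variant. It is not perEditor's implementation. The function name, the 0-based sample index, and the haplotype argument (0 for the allele left of the `|`, 1 for the allele right of it) are assumptions made for this example; which side corresponds to mother or father is defined by the tool, not here. Indels are skipped because they shift coordinates.

```python
def apply_phased_snps(reference, vcf_lines, sample_index, haplotype):
    """Return `reference` with one haplotype's SNP alleles substituted.

    reference    -- sequence string of a single chromosome
    vcf_lines    -- iterable of tab-separated VCF data lines
    sample_index -- 0-based sample column index (hypothetical convention)
    haplotype    -- 0 = allele left of '|', 1 = allele right of '|'
    """
    seq = list(reference)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alts = int(fields[1]), fields[3], fields[4].split(",")
        fmt_keys = fields[8].split(":")
        sample = fields[9 + sample_index].split(":")
        genotype = sample[fmt_keys.index("GT")]
        if "|" not in genotype:
            # Unphased calls ('/') do not say which chromosome carries the variant.
            raise ValueError(f"unphased genotype at position {pos}: {genotype}")
        allele_code = genotype.split("|")[haplotype]
        if allele_code in ("0", "."):
            continue  # reference allele or missing call: keep the base
        alt = alts[int(allele_code) - 1]
        if len(ref) != 1 or len(alt) != 1:
            continue  # indel or symbolic allele: outside the scope of this sketch
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"REF mismatch at position {pos}")
        seq[pos - 1] = alt  # VCF positions are 1-based
    return "".join(seq)
```

With a genotype of `0|1`, haplotype 0 leaves the reference base untouched while haplotype 1 receives the alternate base, so the two personalized sequences differ at that position.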

 

perEditor_ra

 

perEditor_ra takes chromosome rearrangement information into account when generating personalized reference genomes. It only works with phased data. To use perEditor_ra, save the fasta files of all chromosomes involved in the rearrangement in the same folder. Then run perEditor_ra from that folder using the following syntax:

 

perEditor_ra rearrangements.vcf chr_length.bed centromeres.bed individual allele

 

Where:

 

rearrangements.vcf (VCF4.1 file)    Rearrangement annotation file formatted according to VCF format version 4.1.

chr_length.bed     (BED file)       File containing the length (total number of nucleotides) of each chromosome of the genome (see the Tutorial for an example of this file's format).

centromeres.bed    (BED file)       File containing the position of the centromere of each chromosome (see the Tutorial for an example of this file's format).

This information is necessary because the centromere coordinates are used to determine the identity of the new chromosome. That is, when two chromosomes are merged by a rearrangement, the resulting chromosome is named after the chromosome that supplied its centromere. If a new chromosome inherits two centromeres, its name contains the names of both parental chromosomes.

individual         (integer)        Particular individual among all the samples present in the VCF file.

allele             (mother, father) Maternal or paternal allele used to build the new reference sequence.
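The centromere-based naming rule described under centromeres.bed can be sketched in Python as follows. This is only an illustration of the rule, not the tool's code. The segment layout (chromosome, start, end), the 0-based half-open coordinates, and the "_" separator for two-centromere names are all assumptions made for this example. The documentation does not say what happens when a breakpoint falls inside a centromere or when no centromere is inherited, so the sketch treats the first case as not carrying the centromere and raises an error in the second.

```python
def name_rearranged_chromosome(segments, centromeres, separator="_"):
    """Name a rearranged chromosome after the source(s) of its centromere(s).

    segments    -- ordered list of (chrom, start, end) pieces that make up
                   the new chromosome (assumed layout, 0-based half-open)
    centromeres -- dict mapping chrom -> (cen_start, cen_end)
    separator   -- hypothetical joiner used when two centromeres are inherited
    """
    sources = []
    for chrom, start, end in segments:
        cen_start, cen_end = centromeres[chrom]
        # A piece carries the centromere only if it spans it entirely.
        if start <= cen_start and cen_end <= end and chrom not in sources:
            sources.append(chrom)
    if not sources:
        # Case not covered by the documentation; flagged rather than guessed.
        raise ValueError("no segment contains a centromere")
    return separator.join(sources)
```

For instance, a new chromosome built from the centromere-bearing part of chr1 and a centromere-free arm of chr2 would be named chr1 under this sketch, while one that keeps both centromeres would carry both names.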

 

Both BED files and the VCF file must be in the same working folder. perEditor_ra creates copies of the original chromosomes; each new chromosome (in fasta format) keeps its original name with the string _new appended to it.