concatenator
Concatenates multiple alignments into a single alignment.
Use cases
A typical use case is concatenating alignments of 1-1 orthologs into one large alignment that is then used to reconstruct a species phylogeny.
Options
This section describes the available command-line options.
General options
- -h, --help
Prints a help message with a short description of each available option.
- -i, --in <FILES>
A list of alignment files that will be concatenated.
- -d, --directory <DIRECTORY>
A directory containing the alignment files. Only files with the specified ending (see --ending) will be used.
- -e <STRING>, --ending <STRING>
The file ending to use together with the --directory option (default: .fa).
- -l, --lenient
If a sequence is missing from an alignment, its place is filled with gaps.
- -o, --out <FILE>
The file to write the output to. If none is given, the output will be printed to the terminal.
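As a rough illustration of what concatenation with --lenient means, the sketch below joins alignments held as plain dictionaries. It is not the tool's implementation: the function name concatenate, the gap character "-", and raising an error when a sequence is missing without lenient mode are all assumptions made for this example.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # sequence name -> aligned sequence


def concatenate(alignments: List[Alignment], lenient: bool = False, gap: str = "-") -> Alignment:
    """Join alignments column-wise, matching sequences by their full name."""
    # Collect every sequence name in order of first appearance.
    names: List[str] = []
    for aln in alignments:
        for name in aln:
            if name not in names:
                names.append(name)

    result = {name: "" for name in names}
    for aln in alignments:
        # All sequences within one alignment are assumed to share one length.
        width = len(next(iter(aln.values()), ""))
        for name in names:
            if name in aln:
                result[name] += aln[name]
            elif lenient:
                # Missing sequence: pad with gaps so the columns stay aligned.
                result[name] += gap * width
            else:
                # Assumed behavior for this sketch; the tool may react differently.
                raise KeyError(f"sequence {name!r} is missing from an alignment")
    return result


aln1 = {"Mnat": "acgt", "Znev": "ac-t"}
aln2 = {"Mnat": "ggca"}  # Znev is missing here
print(concatenate([aln1, aln2], lenient=True))
# {'Mnat': 'acgtggca', 'Znev': 'ac-t----'}
```

Padding with gaps keeps every row of the result the same length, which downstream phylogeny software generally expects.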
Name detection options
To concatenate alignments, each sequence must be joined with its counterpart in the other alignments. By default, two sequences match only if their names are identical. The options below allow matching on part of the name instead.
- -D <CHAR>, --delimiter <CHAR>
The part of the sequence name before the first occurrence of the delimiter is used to determine the match. If your files look like:
>Mnat_123
acgtacgt
>Znev_456
acgtacgt
>Csec_789
acgtacgt
then you would use -D _. The parts used for matching are then Mnat, Znev and Csec.
- -p <STRINGS>, --pattern <STRINGS>
Provide one pattern for each sequence to be concatenated; each pattern must occur in the corresponding sequence name. If your files look like:
>abc_Mnat_123
acgtacgt
>def_Znev_456
acgtacgt
>ghi_Csec_789
acgtacgt
then you would use -p Mnat Znev Csec.
Note
Only sequences matching the given patterns will be concatenated. Other sequences will be ignored.
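The two matching modes can be pictured as functions that turn a sequence name into a matching key. The sketch below is only illustrative: the helper names key_by_delimiter and key_by_pattern are hypothetical, and it assumes that patterns are plain substrings (not regular expressions) and that a name without the delimiter is used as a whole. Neither point is stated in this documentation.

```python
from typing import List, Optional


def key_by_delimiter(name: str, delimiter: str) -> str:
    """Return the part of the name before the first delimiter (as with -D)."""
    # If the delimiter is absent, the whole name is returned (an assumption).
    return name.split(delimiter, 1)[0]


def key_by_pattern(name: str, patterns: List[str]) -> Optional[str]:
    """Return the first pattern contained in the name, or None (as with -p)."""
    for pattern in patterns:
        if pattern in name:  # plain substring test, assumed for this sketch
            return pattern
    return None  # no pattern matched: such a sequence would be ignored


print([key_by_delimiter(n, "_") for n in ["Mnat_123", "Znev_456", "Csec_789"]])
# ['Mnat', 'Znev', 'Csec']

print([key_by_pattern(n, ["Mnat", "Znev", "Csec"])
       for n in ["abc_Mnat_123", "def_Znev_456", "xyz_Other_1"]])
# ['Mnat', 'Znev', None]
```

Sequences that produce the same key across input files are the ones that would be joined.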
Examples
The input files are in the concatenate example folder, so you can run the commands below and inspect the inputs yourself.
Here are some examples where the matching is simply based on the full sequence name.
# Concatenate alignments, listing the input files explicitly
concatenator -i complete/*.fasta
# Same, but providing the directory and the file ending
concatenator -d complete -e .fasta
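The directory form is expected to pick up the same files as the glob above. A minimal sketch of that selection, assuming the ending is compared against the end of the file name and subdirectories are not searched (the helper files_with_ending is hypothetical and not part of the tool):

```python
import tempfile
from pathlib import Path
from typing import List


def files_with_ending(directory: str, ending: str = ".fa") -> List[Path]:
    """List regular files directly inside `directory` whose names end with `ending`."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and path.name.endswith(ending)
    )


# Demonstration on a throwaway directory with two alignments and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ["aln1.fasta", "aln2.fasta", "notes.txt"]:
        (Path(tmp) / filename).write_text(">seq\nacgt\n")
    print([p.name for p in files_with_ending(tmp, ".fasta")])
    # ['aln1.fasta', 'aln2.fasta']
```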
Here are some examples where the matching is based on a common prefix. The delimiter determines where the prefix ends.
# Concatenate alignments based on their prefix
concatenator -i prefix/aln1.fasta prefix/aln2.fasta -D _
# Concatenate alignments based on their prefix allowing for missing sequences
concatenator -i prefix/aln1.fasta prefix/aln2.fasta prefix/aln3.fasta -D _ -l
Here is an example where the given patterns are used to match the sequences to concatenate.
# Concatenate alignments based on the patterns given
concatenator -i pattern/aln1.fasta pattern/aln2.fasta -p seqA seqC seqG