AlignMe

Frequently Asked Questions

1) General

Q: What are the different types of alignment available, and what are they useful for?
A: AlignMe allows the user to perform pair-wise alignments based on either: (A) two protein sequences or (B) two multiple sequence alignments, using a Needleman-Wunsch algorithm. (A) Detection of conserved segments of distant homologues is possible by measuring the similarity between two proteins based on a weighted combination of substitution matrices, hydrophobicity scales and secondary structure predictions. (B) Transmembrane topologies may be compared by aligning two hydropathy profiles generated from multiple sequence alignments. Each multiple sequence alignment should contain one of the sequences of interest aligned to sequences of homologs from e.g. a PSI-BLAST search.
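
For readers unfamiliar with the Needleman-Wunsch algorithm mentioned above, the following is a minimal sketch of global pairwise alignment. It is not AlignMe's implementation: the function names, the single linear gap penalty and the simple identity score are assumptions made for illustration, whereas AlignMe combines several weighted similarity measures and its own gap penalty sets.

```python
def identity_score(a, b):
    """Toy similarity: +1 for identical residues, -1 otherwise (illustrative only)."""
    return 1.0 if a == b else -1.0


def needleman_wunsch(seq_a, seq_b, score=identity_score, gap=-1.0):
    """Global alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b).
    """
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # match/mismatch
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACD", "AD")
print(total, top, bottom)  # 1.0 ACD A-D
```

The run time of this scheme grows with the product of the two sequence lengths, because every cell of the table is filled once.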

Q: Is it possible to align globular proteins?
A: Yes, it is. However, we have not optimized the gap penalties for globular proteins, so you will need to choose suitable custom penalties yourself.

Q: Why is JavaScript required by your website?
A: AlignMe is a very flexible program with many possible input options. The website therefore also has to be very flexible, which we could only achieve by using JavaScript.

Q: Where can I download AlignMe?
A: You can download the source code of AlignMe, as well as examples and manuals at: www.forrestlab.org

Q: How can I cite AlignMe?
A: If you have used AlignMe in your work, please cite:
For pairwise alignments: Stamm M., Staritzbichler R., Khafizov K. and Forrest L.R. 2013 PLoS ONE
For the alignment of two MSAs: Khafizov K., Staritzbichler R., Stamm M. and Forrest L.R. 2010, Biochemistry

Q: How can I contact you?
A: If you encounter issues that cannot be solved using this FAQ and the help buttons in each section, then write to AlignMe@rzg.mpg.de and we will fix your problem. We appreciate any kind of feedback!

2) Sequence to sequence alignments

Q: What are the basic inputs that I have to provide?
A: You have to provide two sequences in FASTA format in section 1, then choose predefined (or your own) gap penalties in section 2, and finally click on the submit button at the bottom of the page.

Q: Can I also align nucleotide sequences?
A: No. This server is optimized for amino acid sequences and therefore treats C as Cysteine, G as Glycine, A as Alanine and T as Threonine.

Q: Are there any limitations regarding the sequence lengths of two proteins to be aligned?
A: No. The vast majority of membrane proteins contain fewer than 3,000 amino acids, and therefore we do not anticipate serious problems with long sequences. An alignment of the two longest reported human membrane proteins (two Ca2+ channels, 2352 and 2327 amino acids long) took 52 minutes to compute.

Q: How did you obtain the "optimized gap penalties"?
A: We tested several input parameters, on their own and in combination, by systematically scanning different sets of gap penalties. More details can be found in our paper about the AlignMe program: Stamm M., Staritzbichler R., Khafizov K. and Forrest L.R.; 2013; PLoS ONE

Q: Should I use AlignMe PST, AlignMe PS, AlignMe P or fast, but less accurate alignments?
A: This decision depends on your proteins and on the objective of your task. If you want a quick check of whether your proteins might be related, then choose the option "fast, but less accurate alignments". If you are sure that your proteins are related, then you have to consider how closely they are related. AlignMe PST performs well for proteins sharing a sequence identity below 15 %, AlignMe PS for those between 15 and 30 %, and AlignMe P should be used if the proteins are closely related.
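
The rule of thumb above can be written down as a small helper. This is only a sketch: the function name is hypothetical, and treating "closely related" as "above 30 % identity" is an assumption, since the answer does not give an explicit upper threshold.

```python
def suggest_mode(percent_identity):
    """Map an estimated sequence identity (in percent) to an AlignMe mode.

    Assumption: identities above 30 % count as "closely related".
    """
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    if percent_identity < 15:
        return "AlignMe PST"
    if percent_identity <= 30:
        return "AlignMe PS"
    return "AlignMe P"


for identity in (8, 22, 45):
    print(identity, suggest_mode(identity))
```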

Q: Why is there a field in which I can enter weights for the different inputs?
A: Weights are useful for normalizing different input parameters and for preventing a bias of the alignment towards a specific input. Weights may be calculated from the range of values within a given scale, matrix or prediction. For example, a substitution matrix containing values from -3.49 to 0.60 (i.e., with a range of values of 4.09) should be given a relative weight of 5.1 when used in combination with a hydrophobicity scale with values from -6 to 15 (i.e., with a range of values of 21).
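
The arithmetic behind this example is simply the ratio of the two value ranges, 21 / 4.09 ≈ 5.1. A minimal sketch follows; the function names are hypothetical and are not part of AlignMe.

```python
def value_range(minimum, maximum):
    """Width of the interval covered by a scale, matrix or prediction."""
    return maximum - minimum


def relative_weight(reference_range, input_range):
    """Weight that stretches an input so that it spans the reference range."""
    return reference_range / input_range


matrix_range = value_range(-3.49, 0.60)  # about 4.09
scale_range = value_range(-6, 15)        # 21
print(round(relative_weight(scale_range, matrix_range), 1))  # 5.1
```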

Q: What is the advantage of using windows for hydrophobicity profiles?
A: A sliding window averages the hydrophobicity values over a certain number of neighbouring residues. It therefore introduces information into the alignment that reflects a region of the sequence rather than a single amino acid.
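
To illustrate the idea, here is a minimal sketch of a centered sliding-window average. The function name, the rectangular (unweighted) window and the truncation of the window at the sequence ends are all assumptions for illustration; the window shapes and end treatment used by AlignMe may differ.

```python
def window_average(values, window):
    """Centered sliding-window mean over per-residue values.

    `window` must be a positive odd number. Near the ends of the sequence
    the window is truncated to the residues that actually exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)
        stop = min(len(values), i + half + 1)
        smoothed.append(sum(values[start:stop]) / (stop - start))
    return smoothed


# Toy per-residue values, not taken from a real hydrophobicity scale.
print(window_average([1, 2, 3, 4, 5], 3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```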

Q: How long will my alignments take?
A: Alignment time depends on a) the product of the input sequence lengths, and b) the time taken to compute sequence homology searches, secondary structure predictions and/or transmembrane predictions. For the slowest version (AlignMe PST), the maximum expected wall time required is one hour. A breakdown of the contributions to the compute time (minutes) follows:

Sequence lengths   2x PSSM creation   2x SS predictions   2x TM predictions   Alignment   Total Time (mins)
101,101            2:40               1:00                1:45                0:02        4:44
282,349            3:50               2:38                2:19                0:03        9:05
529,535            6:20               4:00                1:00                0:07        11:42
2353,2327          16:00              10:00               23:30               2:30        52:00

Please note that the HP mode (Alignment of two Multiple Sequence Alignments) is very fast, even if large numbers of long sequences are input, because the sequences are used to create an averaged hydropathy profile, which is then pair-wise aligned.

Q: Why do I get an error message that an amino acid does not exist in the scale I want to use?
A: The hydrophobicity scales that we provide on our website contain the 20 standard amino acids in upper case letters (e.g., A, C, D ...). Your sequences in FASTA format therefore also have to be in upper case letters! Moreover, sequences with unknown residues (X) are not supported by these scales. If you want to align such sequences, then please provide a custom scale.
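
Before submitting, you can check a sequence for characters that the provided scales would reject. The sketch below is a hypothetical helper, not part of AlignMe; it assumes the sequence is passed as a plain string without the FASTA header line.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def unsupported_residues(sequence):
    """Return (position, character) pairs for anything outside the 20
    upper case standard amino acids. Positions are 1-based."""
    return [
        (position, char)
        for position, char in enumerate(sequence, start=1)
        if char not in STANDARD_AMINO_ACIDS
    ]


print(unsupported_residues("MKTA"))  # []
print(unsupported_residues("MkXA"))  # [(2, 'k'), (3, 'X')]
```

Lower case letters can usually be repaired with `sequence.upper()`, whereas an X requires a custom scale as described above.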

Q: What outputs do I get?
A: The outputs include a pairwise sequence alignment in ClustalW format, together with the sequence identity and the percentage of matched positions. Additionally, plots are provided if you have used a sliding window on a hydropathy scale, or a membrane or secondary structure prediction. These plots show the position-specific properties for each position within the two aligned sequences.

Q: I do not understand the detailed profiles that are provided on the result page. What do they represent and what do fasta1.ss2 and fasta1.prf stand for?
A: The profiles correspond to the plots that have been created. Columns of fasta1 belong to your first sequence and those of fasta2 to your second sequence. If an automated secondary structure prediction has been used (e.g. in the PS or PST mode), then secondary structure propensity values are displayed in columns that are based on .ss2 files. In .ss2 files, column 5 represents propensities of a residue for being in an alpha-helix (1) or not (0), column 4 for being in a coil and column 6 for being in a beta-sheet. If you have used an automated membrane propensity prediction (e.g. in the PST mode), then membrane propensity values are displayed in columns that are based on .prf files. In this case, column 3 represents the propensities of a residue for being within the membrane (1) or not (0).
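
If you want to process the secondary structure values yourself, the column layout described above (column 4 coil, column 5 helix, column 6 strand) can be read as follows. This is a sketch under assumptions: the function name is hypothetical, the sample text is made up, and the parser simply skips any line that does not have six whitespace-separated fields starting with a residue number.

```python
def parse_ss2(text):
    """Extract per-residue propensities from the text of a .ss2 file."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, comments and blank lines.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        rows.append({
            "position": int(fields[0]),
            "residue": fields[1],
            "coil": float(fields[3]),    # column 4
            "helix": float(fields[4]),   # column 5
            "strand": float(fields[5]),  # column 6
        })
    return rows


# Made-up sample content for illustration only.
sample = """# example header

   1 M C   0.990  0.005  0.005
   2 K H   0.100  0.850  0.050
"""
for row in parse_ss2(sample):
    print(row["position"], row["residue"], row["helix"])
```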

Q: Where are my results? I don't get forwarded to a result page!
A: We have encountered some trouble with timeouts of local proxy servers, especially for alignments that include a PsiPred prediction, which may take about 5 minutes or longer. We therefore recommend that you provide your e-mail address so that you receive a link to the results page by e-mail.

Q: I got results, but instead of two profiles only one profile is plotted. Why is there no second profile?
A: It is possible that there is a hidden command like "/n" in one of the files that you submitted. Be sure that all the files that you submit are correctly formatted. If you still have problems, please contact us and send us your inputs so that we can look into the problem.

3) Alignment of two Multiple Sequence Alignments

Q: How do I obtain two multiple sequence alignments?
A: You can prepare two sets of homologous sequences using BLAST or PSI-BLAST and then align each set separately using a multiple sequence alignment program such as MUSCLE or T-Coffee.

Q: What outputs do I get?
A: You will get a hydropathy plot showing the aligned hydropathy values of the two MSAs and a pairwise sequence alignment of the first sequence of each of the two MSAs. The sequence alignment is formatted in the ClustalW format.

Q: Why do I get only two aligned sequences? Where are the other sequences of the MSAs that I have submitted?
A: The idea of aligning two hydropathy profiles based on multiple sequence alignments is to estimate the structural similarity between two protein families, rather than to align the two multiple sequence alignments themselves (see Khafizov et al., 2010). The resultant family-averaged profile alignments can be analyzed visually using the plotted hydropathy profiles. However, to facilitate a more detailed analysis of the resultant alignment, we recently introduced a feature for extracting two sequences from the hydropathy profile alignment, so that a pair-wise sequence alignment is now also presented.
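
As a rough illustration of what a family-averaged hydropathy profile is, the sketch below averages hydropathy values column by column over an MSA. It is not the published procedure: the function name, the toy scale and the choice to ignore gap characters when averaging are assumptions made for this example (see Khafizov et al., 2010 for the actual method).

```python
def family_averaged_profile(msa_rows, scale):
    """Column-wise mean hydropathy over the rows of an MSA.

    Characters missing from `scale` (e.g. gaps) are ignored; a column with
    no scored residue yields None.
    """
    if not msa_rows:
        return []
    length = len(msa_rows[0])
    if any(len(row) != length for row in msa_rows):
        raise ValueError("all MSA rows must have the same length")
    profile = []
    for column in range(length):
        values = [scale[row[column]] for row in msa_rows if row[column] in scale]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Toy scale with made-up values, for illustration only.
toy_scale = {"A": 1.0, "L": 3.0, "K": -3.0}
print(family_averaged_profile(["AL-", "LK-"], toy_scale))  # [2.0, 0.0, None]
```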

Q: There are different symbols indicating gaps. What is the difference between them?
A: The multiple sequence alignments used as input typically already contain gaps, and these are retained during the alignment procedure: these gaps are indicated using the '.' symbol. Additional gaps are introduced to optimally align the averaged hydropathy profiles and are indicated by the '-' symbol.
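
When post-processing an output alignment, the two gap symbols can be told apart with a simple count. The helper below is hypothetical and only assumes the symbol convention stated above.

```python
def count_gap_types(aligned_sequence):
    """Count gaps inherited from the input MSA ('.') and gaps introduced
    while aligning the averaged hydropathy profiles ('-')."""
    return {
        "from_input_msa": aligned_sequence.count("."),
        "from_profile_alignment": aligned_sequence.count("-"),
    }


print(count_gap_types("MK..TA--LLV.A"))
# {'from_input_msa': 3, 'from_profile_alignment': 2}
```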

Q: Is there a limit on the size of the MSAs?
A: No, there are no restrictions on the number of sequences or on the length of the MSA. Generating a “family-averaged” hydropathy profile and aligning two averaged profiles is very fast (seconds to minutes, even for tens of thousands of positions in the profiles). Moreover, membrane protein sequences are rarely longer than 3,000 amino acids.