Description
Protein Sequence Multiple Alignment


Method :

TBIAT (Tree-base Best-first Iterative Algorithm with Tree-dependent partitioning)

To date, most multiple alignment systems have employed a tree-based algorithm, which combines the results of group-to-group pairwise alignments in a tree-like order of sequence similarity. However, the alignment quality is not high enough when the sequence similarity is low: once an error occurs in the alignment process, it can never be corrected.

Our algorithm iteratively applies group-to-group pairwise alignment to partially aligned sequences to improve their alignment quality whenever two subalignments are merged in the tree-based procedure. The iteration corrects errors that may have occurred in the tree-based alignment process. Such an iterative strategy requires heuristic search methods to solve practical alignment problems. We employed best-first search with tree-dependent partitioning, and parallelized its search step to reduce the execution time of the iterative algorithm.
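The partitioning step can be illustrated with a minimal sketch. It assumes that "tree-dependent partitioning" means splitting the sequences into the two groups obtained by cutting one branch of the guide tree, so that each cut gives one candidate pair of groups for group-to-group realignment. The nested-tuple tree representation, the function names, and the example tree shape are all hypothetical and are not taken from the PAPIA implementation.

```python
from typing import List, Tuple, Union

# A guide tree is either a leaf (sequence label) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
Partition = Tuple[frozenset, frozenset]


def leaves(tree: Tree) -> List[str]:
    """Return the sequence labels under a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def tree_partitions(tree: Tree) -> List[Partition]:
    """Enumerate the two-group splits obtained by cutting each branch once."""
    all_leaves = frozenset(leaves(tree))
    seen = set()
    result: List[Partition] = []

    def visit(node: Tree) -> None:
        group = frozenset(leaves(node))
        rest = all_leaves - group
        if rest:  # the root itself defines no split
            key = frozenset((group, rest))
            if key not in seen:  # the two root branches give the same split
                seen.add(key)
                result.append((group, rest))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return result


# Hypothetical guide tree over the four example sequences.
example_tree = (("CSRC_HUMAN", "CABL_HUMAN"), ("EPH_HUMAN", "FER_HUMAN"))
for group, rest in tree_partitions(example_tree):
    print(sorted(group), "|", sorted(rest))
```

Under this assumption, an iterative refinement would pick one of these splits, realign the two groups against each other, and keep the result if the alignment score improves.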

References :

Y. Totoki, Y. Akiyama, K. Onizuka, T. Noguchi, M. Saito, and M. Ando :
"Employing A* Algorithm in Parallel Multiple Protein Sequence Alignment",
IPSJ SIG Notes, 97-MPS-16-4, pp.19-24 (1997). [in Japanese]

M. Hirosawa, Y. Totoki, M. Hoshida, and M. Ishikawa :
"Comprehensive Study on Iterative Algorithms of Multiple Sequence Alignment",
Comput. Applic. Biosci., Vol.11, No.1, pp.13-18 (1995).


Restrictions :

  1. Sequence length
    Sorry, the length of each sequence is limited: Sequence length <= 1000
    If you want to align longer sequences, please contact us and send your data to papia@m.aist.go.jp.

  2. Sequence data size
    Sorry, the data size is limited according to the searching method:
    (1) Dynamic programming: Maximum length * Number of sequences <= 10000
    (2) A* algorithm: Maximum length * Number of sequences <= 5000
    If you want to align a larger data set, please contact us and send your data to papia@m.aist.go.jp.
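The two restrictions above can be checked before submitting a query. The following is a minimal sketch; the function and constant names are hypothetical, and only the numeric limits are taken from the text above.

```python
from typing import List

MAX_SEQUENCE_LENGTH = 1000

# Limit on (maximum length * number of sequences) for each searching method.
MAX_DATA_SIZE = {
    "Dynamic programming": 10000,
    "A* algorithm": 5000,
}


def check_restrictions(sequences: List[str], method: str) -> List[str]:
    """Return a list of violated restrictions (empty if the input is acceptable)."""
    if method not in MAX_DATA_SIZE:
        raise ValueError(f"unknown searching method: {method!r}")
    if not sequences:
        return ["no sequences given"]

    problems = []
    longest = max(len(seq) for seq in sequences)
    if longest > MAX_SEQUENCE_LENGTH:
        problems.append(
            f"sequence length {longest} exceeds {MAX_SEQUENCE_LENGTH}"
        )
    size = longest * len(sequences)
    if size > MAX_DATA_SIZE[method]:
        problems.append(
            f"data size {size} exceeds {MAX_DATA_SIZE[method]} for {method}"
        )
    return problems


# 20 sequences of length 300 give a data size of 6000.
sample = ["A" * 300] * 20
print(check_restrictions(sample, "Dynamic programming"))
print(check_restrictions(sample, "A* algorithm"))
```

In this example the same data set fits within the dynamic programming limit but exceeds the A* limit, since 300 * 20 = 6000.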

How to use :

  1. See 'Service status' and confirm the service is ON.
  2. Select 'Score Matrix'.
    'BLOSUM45','BLOSUM62','BLOSUM80','PAM120' and 'PAM250' are available.
  3. Set 'Gap Cost'.
    Select 'System default' or 'User defined'.
    If you select 'User defined', fill in each 'Gap' penalty field.
    • Opening Gap : Penalty for opening a gap
    • Extension Gap : Penalty for extending a gap
    • Out Gap : Penalty for an out gap
    Each 'Gap' penalty must be an integer satisfying "0 <= Gap_penalty <= 100".
  4. Set the 'Searching method' used in group-to-group pairwise alignment.
    Select 'Dynamic programming' or 'A* algorithm'.
    If you select 'Dynamic programming', fill in each 'DP Cutoff' field.
    'DP Cutoff' is the cut-off of the search space in the dynamic programming matrix.
    Enter the cut-off ratio in both the 'Tree-base' and 'Iterative' fields.
  5. Each 'DP Cutoff' ratio must be an integer satisfying "0 <= DP_Cutoff <= 100".
  6. Enter any label for your query into the field 'Query label='.
  7. Paste your sequences into the text area 'Input Sequences'.
    The following formats are accepted.
    7.1. Fasta format
    Example :
    >CSRC_HUMAN
    KLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLK
    >CABL_HUMAN
    KLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY
    >EPH_HUMAN
    IGEGEFGEVYRGTLRLPSQDCKTVAIKTLKDTSPGGQWWNFLREATIMGQFSHPHILHLEGVVTKRKPIMIITEFMENGA
    >FER_HUMAN
    LLGKGNFGEVYKGTLKDKTSVAVKTCKEDLPQELKIKFLQEAKILKQYDHPNIVKLIGVCTQRQPVYIIMELVSGGDFLT
    7.2. One-line format : (label) (amino acid sequence)
    Each sequence must be pasted on a single line.
    Example :
    CSRC_HUMAN  KLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLK
    CABL_HUMAN  KLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY
    EPH_HUMAN   IGEGEFGEVYRGTLRLPSQDCKTVAIKTLKDTSPGGQWWNFLREATIMGQFSHPHILHLEGVVTKRKPIMIITEFMENGA
    FER_HUMAN   LLGKGNFGEVYKGTLKDKTSVAVKTCKEDLPQELKIKFLQEAKILKQYDHPNIVKLIGVCTQRQPVYIIMELVSGGDFLT
    7.3. No-label format : (amino acid sequence)
    Each sequence must be pasted on a single line.
    Example :
    KLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLK
    KLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY
    IGEGEFGEVYRGTLRLPSQDCKTVAIKTLKDTSPGGQWWNFLREATIMGQFSHPHILHLEGVVTKRKPIMIITEFMENGA
    LLGKGNFGEVYKGTLKDKTSVAVKTCKEDLPQELKIKFLQEAKILKQYDHPNIVKLIGVCTQRQPVYIIMELVSGGDFLT
  8. If you want to reset the input form, push 'Reset this form'.
  9. Push the 'Service status' button to confirm the service status for your query.
  10. Then, push the 'Submit' button to submit your query to the server.
  11. Results Example
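
The three input formats described in the sequence input step can be told apart mechanically. The sketch below is one possible reading of those formats, not the server's actual parser: the function name is hypothetical, and the automatic labels ("seq1", "seq2", ...) given to unlabeled sequences are an assumption made for illustration.

```python
from typing import List, Tuple


def parse_sequences(text: str) -> List[Tuple[str, str]]:
    """Parse Fasta, one-line, or no-label input into (label, sequence) pairs."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records: List[Tuple[str, str]] = []

    # Fasta format: a '>' header line followed by one or more sequence lines.
    if any(ln.startswith(">") for ln in lines):
        label = None
        chunks: List[str] = []
        for ln in lines:
            if ln.startswith(">"):
                if label is not None:
                    records.append((label, "".join(chunks)))
                label, chunks = ln[1:].strip(), []
            elif label is None:
                raise ValueError("sequence data before the first '>' header")
            else:
                chunks.append(ln)
        records.append((label, "".join(chunks)))
        return records

    # One-line format (label + sequence) or no-label format (sequence only).
    for number, ln in enumerate(lines, start=1):
        fields = ln.split()
        if len(fields) == 2:
            records.append((fields[0], fields[1]))
        elif len(fields) == 1:
            # Assumed placeholder label for unlabeled input.
            records.append((f"seq{number}", fields[0]))
        else:
            raise ValueError(f"line {number}: expected '(label) (sequence)'")
    return records


fasta_input = ">CSRC_HUMAN\nKLGQGCFGEV\n>CABL_HUMAN\nKLGGGQYGEV\n"
one_line_input = "CSRC_HUMAN  KLGQGCFGEV\nCABL_HUMAN  KLGGGQYGEV\n"
print(parse_sequences(fasta_input) == parse_sequences(one_line_input))
```

The example sequences here are truncated to ten residues for brevity. Parsing into a common (label, sequence) form makes it easy to check the length and data size restrictions before pasting the data into the form.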

PAPIA system, papia@m.aist.go.jp
Copyright © 1997-2000, Parallel Application TRC Lab. , RWCP , Japan
Copyright © 2001, Computational Biology Research Center , AIST , Japan