Description
Protein Sequence Homology Search (Using Score Matrix)


Method :

Local alignment
This local-local alignment finds local similarity between two amino acid sequences, based on a Smith-Waterman style dynamic-programming method.
The query sequence is compared to all sequences in the specified database (the SWISS-PROT protein sequence database or the PDB-REPRDB structure database).
The gap cost is calculated as "cost = a + bk", where "a" is the gap opening cost, "b" is the gap extension cost, and "k" is the gap length.
(Local alignment does not take into account the out gaps described below.)
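
As a rough illustration of the local method, the sketch below computes a Smith-Waterman style local score with the gap cost "a + bk", using the standard three-table recursion for affine gaps. It is not the service's implementation: the function names, the default values of "a" and "b", and the simple match/mismatch scores (a stand-in for a real score matrix such as BLOSUM62) are all assumptions made for this example.

```python
def gap_cost(k, a, b):
    """Cost of one gap of length k: cost = a + b*k."""
    return a + b * k


def local_score(query, target, a=10, b=1, match=2, mismatch=-1):
    """Best local alignment score (score only, no traceback).

    match/mismatch are illustrative stand-ins for a score matrix lookup.
    """
    neg_inf = float("-inf")
    n, m = len(query), len(target)
    # H: best score of an alignment ending at (i, j); never below 0 (local).
    # E: best score ending with a gap that consumes a target residue.
    # F: best score ending with a gap that consumes a query residue.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap (-b) or open a new one (-(a + b)).
            E[i][j] = max(E[i][j - 1] - b, H[i][j - 1] - a - b)
            F[i][j] = max(F[i - 1][j] - b, H[i - 1][j] - a - b)
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(gap_cost(3, a=10, b=1))
print(local_score("ACDEFGHIKL", "ACDEFWGHIKL", a=1, b=1))
```

With a small opening cost the best local alignment bridges the extra residue with one gap; with a large opening cost it is cheaper to report only one of the two ungapped segments.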

Global-Local alignment
This global-local alignment finds similarity between the whole length of the query and a local part of the target protein in the database, based on a Needleman-Wunsch style dynamic-programming method.
The query sequence is compared to all sequences in the specified database (the SWISS-PROT protein sequence database or the PDB-REPRDB structure database).
The gap cost is calculated as "cost = a + bk", where "a" is the gap opening cost, "b" is the gap extension cost, and "k" is the gap length. The user can set any coefficient for "a" and "b", though the default settings are recommended for usual cases.
This global-local alignment also takes so-called out gap costs into account. An out gap is a gap that exists outside of the alignment.
If only a very small region of the query matches the (partial) sequence of the target, the out gap cost tends to become large. In this case, the user is recommended to use zero or a smaller coefficient for out gap costs.

Global alignment
This global-global alignment finds whole-length versus whole-length similarity between two amino acid sequences, based on a Needleman-Wunsch style dynamic-programming method.
The query sequence is compared to all sequences in the specified database (the SWISS-PROT protein sequence database or the PDB-REPRDB structure database).
The gap cost is calculated as "cost = a + bk", where "a" is the gap opening cost, "b" is the gap extension cost, and "k" is the gap length. The user can set any coefficient for "a" and "b", though the default settings are recommended for usual cases.
Like the global-local alignment described above, this global alignment also takes so-called out gap costs into account.
An out gap is a gap that exists outside of the alignment.
If the two sequences have very different lengths, the out gap cost tends to become large. In this case, the user is recommended to use zero or a smaller coefficient for out gap costs.
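
For comparison, a minimal sketch of a Needleman-Wunsch style global score with the gap cost "a + bk" is given below. It is only an illustration under stated assumptions: the function name, the default values of "a" and "b", and the match/mismatch scores (a stand-in for a score matrix) are hypothetical, and gaps at the ends of the sequences are charged with the same "a + bk" formula as any other gap. The service's separate out gap coefficient is not modeled, since its exact formula is not given here.

```python
def global_score(query, target, a=10, b=1, match=2, mismatch=-1):
    """Global alignment score over the full length of both sequences.

    Assumption: terminal gaps cost a + b*k like internal gaps; the
    separate out gap coefficient of the service is NOT modeled.
    """
    neg_inf = float("-inf")
    n, m = len(query), len(target)
    H = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # gap consuming target
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # gap consuming query
    H[0][0] = 0
    # First row and column: a single leading gap of length j (or i).
    for j in range(1, m + 1):
        E[0][j] = -(a + b * j)
        H[0][j] = E[0][j]
    for i in range(1, n + 1):
        F[i][0] = -(a + b * i)
        H[i][0] = F[i][0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap (-b) or open a new one (-(a + b)).
            E[i][j] = max(E[i][j - 1] - b, H[i][j - 1] - a - b)
            F[i][j] = max(F[i - 1][j] - b, H[i - 1][j] - a - b)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # No floor at 0: every residue of both sequences is accounted for.
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j])
    return H[n][m]


print(global_score("ACDEF", "ACD", a=1, b=1))
```

In this sketch, a large length difference forces long gaps and therefore a low score, which is the situation in which the text above recommends a zero or smaller out gap coefficient.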

References :

Smith, T.F., Waterman, M.S.: "Identification of common molecular subsequences."
J. Mol. Biol. Vol.147, pp.195-197 (1981).

Needleman, S.B., Wunsch, C.D.: "A general method applicable to the search for similarities in the amino acid sequence of two proteins."
J. Mol. Biol. Vol.48, pp.443-453 (1970).


How to use :

  1. See 'Service status' and confirm the service is ON.
  2. Select 'Database'.
    PDB-REPRDB and SWISS-PROT are available.
  3. Select 'Score Matrix'.
    'BLOSUM45', 'BLOSUM62', 'BLOSUM80', 'PAM120' and 'PAM250' are available.
  4. Set 'Gap Cost'.
    Select 'System default' or 'User defined'.
    (Click 'System default' to see the table of system default gap costs.)
    When you select 'User defined', fill in the 'Gap' penalties.
    • Opening Gap : Cost penalty for opening a gap
    • Extension Gap : Cost penalty for extending a gap
    • Out Gap : Cost penalty for an out gap
    Each 'Gap' penalty must be an integer satisfying "0 <= Gap_penalty <= 100".
  5. Set 'Search Threshold'.
    Select 'Sorted by Homology Score' or 'Sorted by exact match rate'.
    • Homology Score : score of the alignment result with no gap in the query
    • exact match rate : percentage of identical residues between the two sequences
    Put appropriate values in the upper and lower threshold fields.
    (You can omit both thresholds at the same time. The upper and lower thresholds for Homology Score may be filled in automatically with values suitable for BLOSUM62.)
  6. Set 'Max. Output Number'.
    The output will be truncated to this number of entries.
  7. Select 'Method'.
    'Smith-Waterman style DP local local', 'Needleman-Wunsch style DP global local' and 'Needleman-Wunsch style DP global global' are available.
  8. Enter any label for your query into the 'Query label=' field.
  9. Enter your amino acid sequence into the field 'Input Query Sequence'.
    Two formats are acceptable.
    9.1
    A plain amino acid sequence is acceptable.
    Example :
    KLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLK

    9.2
    FASTA format is acceptable.
    Example :
    >CSRC_HUMAN
    KLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLK
    (If the 'Query label=' field is blank, the label after '>' in the first line will be used as the label for the sequence. A '>' character within the sequence terminates the sequence, and other non-alphabetical characters are removed.)

    The maximum length is 2,000 a.a.
    If you selected SWISS-PROT as the database, the sequence length must be <= 300 a.a.

  10. Push the 'Service status' button to confirm the service status for your query.
  11. Push 'Reset this form' only if you want to reset the input form.
  12. Then, push the 'Submit' button to submit your query to the server.
  13. Results Example
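
The input rules given in the steps above (gap penalty range, query format, and length limits) can be sketched as the following checks. This is only an illustration: the helper names are hypothetical, the server's actual handling is not documented here, and using the whole header line after '>' as the label is an assumption.

```python
# Hypothetical helpers illustrating the documented input rules.
# Length limits per database, as stated in the instructions above.
MAX_LENGTH = {"PDB-REPRDB": 2000, "SWISS-PROT": 300}


def parse_query(text, label=""):
    """Return (label, sequence) from plain or FASTA-style input."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        header = lines[0][1:].strip()
        if not label:  # an explicit 'Query label=' takes precedence
            label = header
        lines = lines[1:]
    # A '>' inside the sequence terminates it.
    body = "".join(lines).split(">", 1)[0]
    # All other non-alphabetical characters are removed.
    sequence = "".join(ch for ch in body if ch.isascii() and ch.isalpha())
    return label, sequence


def length_ok(sequence, database):
    """True if the sequence fits the limit for the chosen database."""
    return len(sequence) <= MAX_LENGTH[database]


def gap_penalty_ok(value):
    """True if value is an integer with 0 <= value <= 100."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and 0 <= value <= 100)


label, seq = parse_query(">CSRC_HUMAN\nKLGQ GCF\n")
print(label, seq)  # prints: CSRC_HUMAN KLGQGCF
```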

PAPIA system, papia@m.aist.go.jp
Copyright © 1997-2000, Parallel Application TRC Lab. , RWCP , Japan
Copyright © 2001, Computational Biology Research Center , AIST , Japan