Quick Start

Input format

SlideSort accepts input in FASTA format.

The latest version of SlideSort accepts both DNA and protein sequences.
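As a small illustration of the input format, the following Python sketch writes a FASTA file (a ">" header line followed by the sequence) and reads it back. The helper names write_fasta and read_fasta and the record names seq0, seq1 are hypothetical and are not part of SlideSort.

```python
def write_fasta(path, records):
    """Write (name, sequence) pairs as FASTA: a '>' header line, then the sequence."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n{seq}\n")


def read_fasta(path):
    """Read a FASTA file back into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            else:
                # Sequences may be wrapped over several lines.
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

For example, write_fasta("input_file", [("seq0", "ACGTACGT"), ("seq1", "ACGTACGA")]) produces a two-record file that can be passed to the -i option.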

Execution of SlideSort

  • Typical usage:
    ./slidesort -d 4 -i input_file -o output_file
    • Find similar pairs with distance threshold 4
    • -i name of input file
    • -o name of output file
  • Finding pairs within a part of the input sequences
    ./slidesort -d 4 -p -fst_head 9 -fst_size 100 -i input_file -o output_file
    • 100 sequences, starting from the 10th sequence, are searched.
    • Note that sequential IDs start from "0" (e.g. the first sequence's ID is 0, the second sequence's ID is 1, and so on), so -fst_head 9 selects the 10th sequence.
  • Comparing two datasets
    ./slidesort -d 4 -m -fst_head 9 -fst_size 100 -snd_head 149 -snd_size 30 -i input_file -o output_file
    • 100 sequences starting from the 10th sequence are compared with 30 sequences starting from the 150th sequence.
  • Other options
    • -a : write alignments (Note that this option makes the output much larger.)
    • -c : type of input string. DNA: DNA sequence, PROTEIN: protein sequence, INT: integer sequence (default=DNA)
    • -u : include sequences with unknown characters, e.g. n, N, Z, etc.
    • -g : gap extension cost (default=1, same value as mismatch cost)
    • -G : internal gap open cost (default=0, must be a positive value)
  • using hamming distance
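
Because sequence IDs start from 0, the -fst_head and -snd_head values are one less than the position of the sequence counted from 1. The Python sketch below builds the two region commands shown above from 1-based positions. It only assembles the argument list using the options listed on this page; the helper names head_index and region_args are hypothetical.

```python
import shlex


def head_index(position):
    """Convert a 1-based sequence position (10th, 150th, ...) to a 0-based ID."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return position - 1


def region_args(distance, input_file, output_file, first, second=None):
    """Build a slidesort argument list for one region (-p) or two regions (-m).

    first and second are (1-based start position, number of sequences) tuples.
    """
    args = ["./slidesort", "-d", str(distance)]
    args += ["-p"] if second is None else ["-m"]
    args += ["-fst_head", str(head_index(first[0])), "-fst_size", str(first[1])]
    if second is not None:
        args += ["-snd_head", str(head_index(second[0])), "-snd_size", str(second[1])]
    args += ["-i", input_file, "-o", output_file]
    return args


# 100 sequences starting from the 10th sequence
print(shlex.join(region_args(4, "input_file", "output_file", (10, 100))))
# 100 sequences from the 10th compared with 30 sequences from the 150th
print(shlex.join(region_args(4, "input_file", "output_file", (10, 100), (150, 30))))
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting problems with unusual file names.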

Advanced Usage

SlideSort is also provided as a C/C++ library.

In most large-scale data analyses, the output of SlideSort becomes extremely large, which leads to a huge file I/O overhead.

By using the library version of SlideSort (libSlideSort), similar pairs can be obtained directly through a callback function.

A detailed guide is provided in libSlideSort.
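
The idea of receiving pairs through a callback rather than from an output file can be sketched in Python as follows. This is not the libSlideSort API: the function for_each_similar_pair is a hypothetical brute-force stand-in that compares all pairs, and a unit-cost edit distance is assumed for illustration. It only shows how a callback can consume each pair as it is found, here by counting pairs per distance, without writing anything to disk.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance (Levenshtein) using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # match or mismatch
            ))
        prev = curr
    return prev[-1]


def for_each_similar_pair(seqs, threshold, on_pair):
    """Hypothetical stand-in: call on_pair(i, j, dist) for each pair within threshold."""
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        dist = edit_distance(a, b)
        if dist <= threshold:
            on_pair(i, j, dist)


# Example callback: tally pairs per distance instead of storing or writing them.
counts = {}


def tally(i, j, dist):
    counts[dist] = counts.get(dist, 0) + 1


for_each_similar_pair(["ACGT", "ACGA", "TTTT", "ACG"], 1, tally)
print(counts)
```

The callback sees each pair once and can discard it immediately, so memory and disk usage stay independent of the number of pairs reported.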

SlideSort with GUI

Attached file: visualslidesort.png
Last-modified: 2012-06-05 (Tue) 05:27:17 (3265d). Site admin: Kana Shimizu