SPECTRAL REPEAT FINDER

HELP AND DOCUMENTATION

  1. Input Sequence

    Paste the DNA sequence into the text area, or upload a file containing the nucleotide sequence using this option.

  2. Format of Input Sequence

    If the sequence is in one of the standard formats (EMBL, FASTA, GENBANK, etc.), set `Sequence Format' to `FASTA', `GENBANK', or `OTHER (READSEQ Formats)' as appropriate. The SRF server uses the READSEQ program, developed by D.G. Gilbert (Indiana University), to convert your sequence to FASTA format. If the input sequence is just plain text, set the `Format Type' to `Plain Text (Single Letter Code)'. By default, the server accepts only the single-letter codes for nucleotide bases. The server can also ignore all non-standard characters, such as , * % ! @ $.
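
    The character filtering described above can be sketched in Python as follows. This is only an illustration, not the server's code: the function name clean_sequence is hypothetical, and it assumes that only A, C, G and T count as standard characters, so digits, spaces, punctuation and ambiguity codes such as N are all dropped.

```python
import re

def clean_sequence(raw: str) -> str:
    """Return the input reduced to upper-case A, C, G and T.

    Assumption: anything other than A/C/G/T (digits, whitespace,
    punctuation, ambiguity codes) is treated as non-standard.
    """
    # Upper-case first so that lower-case input is accepted.
    return re.sub(r"[^ACGT]", "", raw.upper())

# A line copied from a numbered sequence listing, plus stray symbols.
print(clean_sequence("1 aaaaattaac tgggtgtggt 60 ,*%!@$%"))
```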

  3. Minimum and Maximum Length of Repeat Unit

    The SRF server allows repeat units from 2 bp to 300 bp in length. By default, the server searches only for repeat units of up to 10 bp.

  4. Minimum % Match (%Identity)

    The minimum percent identity that a DNA segment must share with the repeat unit to be counted as a copy of it.

  5. Minimum Number of Copies

    The minimum number of copies of a repeat unit that the SRF server must find in a region before it reports the pattern as representative of that region.
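
    How the two thresholds above (minimum % match and minimum number of copies) might work together can be sketched in Python. This is a simplified illustration under stated assumptions, not the SRF algorithm: the function name count_tandem_copies is hypothetical, copies are assumed to be adjacent and ungapped, and each copy is compared position by position with the repeat unit.

```python
def count_tandem_copies(seq: str, start: int, unit: str, min_percent: float) -> int:
    """Count consecutive copies of `unit` in `seq`, beginning at `start`.

    Assumption: a segment counts as a copy when its percent identity
    with `unit` is at least `min_percent`; no gaps are allowed.
    """
    k = len(unit)
    copies = 0
    pos = start
    while pos + k <= len(seq):
        segment = seq[pos:pos + k]
        matches = sum(a == b for a, b in zip(unit, segment))
        if 100.0 * matches / k < min_percent:
            break  # first segment below the identity threshold ends the run
        copies += 1
        pos += k
    return copies

seq = "TT" + "CA" * 5 + "GG"
copies = count_tandem_copies(seq, start=2, unit="CA", min_percent=100.0)
min_copies = 3  # hypothetical user setting
print(copies, copies >= min_copies)
```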

  6. FFT Peak Cut-Off

    The default cut-off for a spectral peak is 4. This value was derived from previous evaluations on certain DNA sequences (Tiwari et al., 1997). The user can, however, change this parameter as required.

  7. Window Size for Window Scan FFT

    The SRF server uses different window sizes for DNA sequences of different lengths.
    For a sequence shorter than 300 bp, the window scan FFT is not used.
    For a sequence between 300 bp and 600 bp long, a window size of 100 bp is used.
    For a DNA sequence longer than 600 bp, a window size of 300 bp is used for repeats of 2-150 bp, and a window size of 600 bp is used for repeats of 151-300 bp.
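
    The rules above can be written as a small Python lookup. This is a sketch only: the function name window_size is hypothetical, and it assumes that sequences of exactly 300 bp or exactly 600 bp fall into the middle range, which the text does not state explicitly.

```python
from typing import Optional

def window_size(seq_len: int, repeat_len: int) -> Optional[int]:
    """Return the window size in bp, or None when no window scan is run.

    Assumption: the 300 bp and 600 bp boundaries are both treated as
    part of the middle range.
    """
    if seq_len < 300:
        return None          # window scan FFT is not used
    if seq_len <= 600:
        return 100
    # Sequences longer than 600 bp: window depends on the repeat length.
    return 300 if repeat_len <= 150 else 600

print(window_size(440, 10))    # a 440 bp sequence with short repeats
print(window_size(1000, 200))  # a longer sequence with long repeats
```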

  8. Score

    The score represents the percent identity as a fraction, where a value of 1 denotes 100% identity. The score for a pattern is the number of bases that match the query pattern exactly, divided by the length of the pattern; each exact match adds 1 to the tally. The query pattern is the pattern initially obtained from a region, against which the repeat patterns are compared.
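
    The score calculation described above can be illustrated with a short Python function. This is a minimal sketch: the function name score is hypothetical, and it assumes that the pattern and the query pattern have the same length and are compared position by position without gaps.

```python
def score(query: str, pattern: str) -> float:
    """Fraction of positions at which `pattern` matches `query`.

    Assumption: both strings have the same non-zero length and are
    compared base by base, with no gaps.
    """
    if not query or len(query) != len(pattern):
        raise ValueError("query and pattern must have the same non-zero length")
    # Each exact match adds 1 to the tally.
    matches = sum(1 for q, p in zip(query, pattern) if q == p)
    return matches / len(query)

print(score("ACGT", "ACGT"))  # identical patterns
print(score("ACGT", "ACGA"))  # one mismatch in four bases
```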

  9. Example for SRF Output

    Input Sequence (Accession Number M96445):
        1 aaaaattaac tgggtgtggt agtgtgcacc tgtggttcta gctactcggg aggctgaggt 60
       61 aggaggcttg cttgacccca ggaggtcaag gctgtggtga gctgagattg taccattgca 120
      121 ctctagcctg ggcaacagat ccagaccttg tctctaaatt aaaacaaaac aaaaccaaac 180
      181 aaaaaaacag ctgataagga aggttataac agaacactgt tctctcttta cacacacaca 240
      241 cacacacaca cacacacaca cacacacaca cacacatcac acgtacagga attattttaa 300
      301 cctatcagtt acatggtggt ttcacaggtt tcaacttcat caacccagaa ccacaatcac 360
      361 agattttggc tagactctga ctctcatcta ctagtgataa caacaagttc cctgtggagt 420
      421 ttatagccca cagattatca
    

    Output (With Default Settings):

    RESULT

  10. Fourier Spectrum

    The Fourier spectrum is a plot of power against frequency. The algorithm used in SRF computes the power for each mer size, with the frequency taken as the inverse of the mer size. Any region with a repetitive sequence structure will therefore show a peak above the threshold at the frequency of the repeated mer. For example, a region containing 3-mer repeats will have power above the threshold at the inverse of 3, i.e. 1/3 = 0.333. This value is referred to as the frequency. In the Window Scan FFT spectrum, the peak obtained for a particular mer is plotted along the total length of the sequence. This allows users to observe the occurrence of a repetitive mer along the input sequence.
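
    The relation between mer size and frequency can be illustrated with a small Python sketch that evaluates the Fourier power of a sequence at the frequency 1/mer. This is not the SRF implementation: the function name power_at_mer is hypothetical, the sequence is assumed to be encoded as four 0/1 indicator sequences (one each for A, C, G and T), and the division by the sequence length is an arbitrary normalization chosen for the example. A direct sum is used instead of an FFT to keep the example short.

```python
import cmath

def power_at_mer(seq: str, mer: int) -> float:
    """Fourier power of `seq` at frequency 1/mer.

    Assumptions: the power is summed over the indicator sequences of
    A, C, G and T, and normalized by the sequence length.
    """
    n = len(seq)
    freq = 1.0 / mer  # e.g. a 3-mer corresponds to 1/3 = 0.333
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the 0/1 indicator sequence for this base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * freq * pos)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n

repeat = "ACG" * 20  # a perfect 3-mer repeat, 60 bp long
for mer in (2, 3, 4, 5):
    print(mer, round(power_at_mer(repeat, mer), 3))
```

    For a perfect 3-mer repeat such as this one, the power at mer 3 is far larger than at the neighbouring mer sizes, which is the kind of peak the spectrum is meant to reveal.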