SPECTRAL REPEAT FINDER
The DNA sequence can be pasted into the text area. Alternatively, a file containing the nucleotide sequence can be uploaded using this option.
If the sequence is in one of the standard formats (EMBL, FASTA, GENBANK, etc.), set 'Sequence Format' to 'FASTA', 'GENBANK', or 'OTHER (READSEQ Formats)' as appropriate. The SRF server uses the READSEQ program, developed by D.G. Gilbert (Indiana University), to convert your sequence to FASTA format. If the input sequence is plain text, set 'Format Type' to 'Plain Text (Single Letter Code)'. By default, the server accepts only the single-letter code for nucleotide bases. The server can also ignore non-standard characters such as ,*%!@$%.
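As an illustration of ignoring non-standard characters, here is a minimal Python sketch. It is not the SRF server's code: the function name `clean_sequence` is hypothetical, and treating only a, c, g and t as standard bases is an assumption.

```python
def clean_sequence(raw, alphabet="acgt"):
    """Keep only characters from `alphabet`; drop everything else.

    Assumption: the standard single-letter codes are a, c, g, t.
    Whitespace, digits and symbols such as ,*%!@$% are discarded.
    """
    return "".join(ch for ch in raw.lower() if ch in alphabet)


if __name__ == "__main__":
    print(clean_sequence("AC*G%T!@ ac\n"))
```

A server that accepts ambiguity codes such as n could pass a wider `alphabet`.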
The minimum and maximum sizes of the repeat unit allowed by the SRF server are 2 bp and 300 bp, respectively. By default, the server searches only for repeat units of up to 10 bp.
This represents the minimum identity a DNA segment must have to qualify as a copy of the repeat unit.
This represents the minimum number that the SRF server requires in order to consider a pattern sufficiently representative of that region.
The default cut-off for the spectral peak is 4. This value was derived from previous evaluations on certain DNA sequences (Tiwari et al., 1997). The user can, however, change this parameter as required.
The SRF server uses different window sizes for DNA sequences of different lengths.
For a sequence below 300 bp, the window scan FFT is not used.
For a sequence of length between 300 bp and 600 bp, a window size of 100 bp is used.
For any DNA sequence above 600 bp, a window size of 300 bp is used for repeats of 2-150 bp. For finding repeats of length 151-300 bp, a window size of 600 bp is used.
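The window-size rules above can be summarised in a short Python sketch. The function name `window_plan` and its return format are hypothetical. The text does not say how sequences of exactly 300 bp or 600 bp are handled, so placing both in the 100 bp tier is an assumption. No repeat-size range is given for that tier, so it is left as `None`.

```python
def window_plan(seq_len):
    """Return a list of (window_size_bp, repeat_unit_range) pairs.

    An empty list means the window scan FFT is not used.
    Assumption: lengths of exactly 300 bp and 600 bp fall in the
    100 bp tier; the source text leaves these boundaries open.
    """
    if seq_len < 300:
        return []
    if seq_len <= 600:
        # No repeat-unit range is stated for this tier.
        return [(100, None)]
    return [(300, (2, 150)), (600, (151, 300))]


if __name__ == "__main__":
    for length in (250, 450, 1000):
        print(length, window_plan(length))
```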
This is a representation of the percent identity, where a value of 1 denotes 100% identity. The score for a pattern is calculated as the sum of all perfect matches between the individual bases of the pattern and those of the query pattern, divided by the length of the pattern; each perfect match adds 1 to the tally. The query pattern is the pattern initially obtained from a region, against which the repeat patterns are searched.
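The score calculation can be sketched as follows. This is an illustration, not the server's code: the function name `identity_score` is hypothetical, and the sketch assumes that the two patterns have equal length and are compared position by position without gaps.

```python
def identity_score(query, candidate):
    """Fraction of positions where candidate matches the query pattern.

    Each perfect match adds 1 to the tally; the tally is divided by the
    pattern length, so 1.0 corresponds to 100% identity.
    Assumption: equal-length patterns, compared without gaps.
    """
    if not query or len(query) != len(candidate):
        raise ValueError("patterns must be non-empty and of equal length")
    matches = sum(1 for q, c in zip(query.lower(), candidate.lower()) if q == c)
    return matches / len(query)


if __name__ == "__main__":
    print(identity_score("acgt", "acga"))  # 3 of 4 bases match
```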
1 aaaaattaac tgggtgtggt agtgtgcacc tgtggttcta gctactcggg aggctgaggt 60
61 aggaggcttg cttgacccca ggaggtcaag gctgtggtga gctgagattg taccattgca 120
121 ctctagcctg ggcaacagat ccagaccttg tctctaaatt aaaacaaaac aaaaccaaac 180
181 aaaaaaacag ctgataagga aggttataac agaacactgt tctctcttta cacacacaca 240
241 cacacacaca cacacacaca cacacacaca cacacatcac acgtacagga attattttaa 300
301 cctatcagtt acatggtggt ttcacaggtt tcaacttcat caacccagaa ccacaatcac 360
361 agattttggc tagactctga ctctcatcta ctagtgataa caacaagttc cctgtggagt 420
421 ttatagccca cagattatca
The Fourier spectrum is a plot of power against frequency. The algorithm used in SRF computes the power for different mers, where each mer is represented by a frequency equal to the inverse of its length. Therefore, any region with a repetitive sequence structure will show a peak above the 'threshold' for the mer that is repeated. For example, a region containing 3-mer repeats will have power above the threshold at the inverse of 3, i.e. 1/3 = 0.333. This value is referred to as the frequency. In the Window Scan FFT spectrum, the peak obtained for a particular mer is plotted along the total length of the sequence. This allows users to observe the occurrence of the repetitive mer along the input sequence.
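The relation between a repeated mer and its frequency can be illustrated with a small Python sketch. It is not the SRF implementation. It assumes the common approach of converting the sequence into four binary indicator sequences (one per base) and summing their Fourier power, and it uses a plain discrete Fourier transform rather than an FFT for brevity. The function names and the test sequence are hypothetical, and no normalisation or threshold is applied.

```python
import cmath


def power_spectrum(seq):
    """Return a list of (frequency, power) pairs for k = 1 .. n // 2.

    Assumption: power is the sum, over the bases a, c, g, t, of the
    squared magnitude of the DFT of each base's 0/1 indicator sequence.
    """
    seq = seq.lower()
    n = len(seq)
    spectrum = []
    for k in range(1, n // 2 + 1):
        total = 0.0
        for base in "acgt":
            # DFT coefficient of the indicator sequence for this base.
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * j / n)
                for j, ch in enumerate(seq)
                if ch == base
            )
            total += abs(coeff) ** 2
        spectrum.append((k / n, total))
    return spectrum


def peak_frequency(seq):
    """Frequency with the highest power in the spectrum."""
    return max(power_spectrum(seq), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # A perfect 3-mer repeat should peak at frequency 1/3.
    print(peak_frequency("cag" * 20))
```

Applying the same calculation to successive windows of a longer sequence, and recording the power at one chosen frequency for each window, would give a profile of the kind described for the Window Scan FFT spectrum.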