Pslpred: An SVM-based method for the subcellular localization of prokaryotic proteins

Name of the protein sequence:
This field is optional. The user may enter a name for the sequence or leave it blank.

Input sequence:
A sequence can be submitted in two ways: the user can paste it directly into the input box provided, or upload a file using the "BROWSE" option. Sequences must be entered in the one-letter amino acid code. All non-standard characters in the sequence are ignored.
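The filtering of non-standard characters can be pictured with the short Python sketch below. It is only an illustration, not the server's actual code: the function name clean_sequence is hypothetical, and it assumes that "standard" means the 20 common one-letter amino acid codes and that lowercase letters are accepted and converted to uppercase. It is meant for the residue string itself, not for header lines of a formatted file.

```python
# Hypothetical helper for illustration; not taken from the Pslpred server.
# Assumption: only the 20 common one-letter amino acid codes are "standard".
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def clean_sequence(raw: str) -> str:
    """Return the sequence with every non-standard character dropped."""
    # Assumption: lowercase input is accepted and upper-cased first.
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    # Digits, spaces, gaps, '*' and ambiguity codes such as X are ignored.
    print(clean_sequence("mkt-AX*y 12"))
```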

Sequence format:
Pslpred accepts both formatted and unformatted protein sequences. It uses the ReadSeq routine to parse the input. The user should check the format of the input sequence before submitting it for prediction. The prediction results will be wrong if the wrong format is chosen.

Prediction Approaches:
Five approaches are available for predicting the five types of subcellular localization (cytoplasm, extracellular, inner-membrane, outer-membrane, periplasm) of prokaryotic proteins. Users can choose any one of the available approaches. A brief account of each approach is given below:

Amino acid composition:

An SVM module developed on the basis of the fractions of the 20 types of amino acids present in a protein. The amino acid composition based SVM module predicts cytoplasmic, extracellular, inner-membrane, outer-membrane, and periplasmic localization with 87%, 78%, 87%, 94%, and 80% accuracy, respectively. Calculating the amino acid composition generates a 20-dimensional input vector for each protein sequence; these vectors were used to train five SVM models, one for each of the five subcellular localizations. The composition based SVM module achieved an overall accuracy of 86%.
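As a rough illustration of how such a 20-dimensional vector can be computed, the Python sketch below counts each amino acid and divides by the sequence length. The function name and the alphabetical ordering of the vector components are assumptions made for this example; the server's own ordering and any scaling it applies are not documented here.

```python
from typing import List

# Assumption: components are ordered alphabetically by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each of the 20 amino acids in the sequence."""
    # Non-standard characters are ignored, as described for the input field.
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("AACD")
    print(len(vector))   # number of dimensions
    print(vector[:3])    # fractions of A, C and D
```

Because the components are fractions, they sum to 1 for any non-empty sequence.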

Composition of physico-chemical properties:

An SVM module developed on the basis of the composition of 33 physico-chemical properties of the protein sequences. The SVM module is given an input vector of 33 dimensions for each sequence. The overall accuracy of the properties based SVM module is 83%, about 3% lower than that of the amino acid composition based SVM module.

Dipeptide composition:

The dipeptide composition based SVM module captures information about the amino acid composition along with the local order of amino acids. It uses a fixed-length input vector of 400 dimensions, one for each of the 20 x 20 possible dipeptides. This SVM module achieved an overall accuracy of 86%, similar to the amino acid composition based SVM module.
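The 400-dimensional vector can be sketched in Python as follows. This is a minimal example under stated assumptions, not the server's implementation: it assumes that overlapping dipeptides are counted, that each count is divided by the total number of dipeptides (sequence length minus one), and that components are ordered alphabetically (AA, AC, ..., YY). The function name is hypothetical.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs, in alphabetical order (an assumption).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the fraction of each of the 400 dipeptides in the sequence."""
    seq = "".join(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = len(seq) - 1  # number of overlapping dipeptides
    if total < 1:
        raise ValueError("need at least two standard amino acids")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        counts[seq[i:i + 2]] += 1
    return [counts[dp] / total for dp in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("AAC")
    print(len(vector))            # number of dimensions
    print(vector[0], vector[1])   # fractions of AA and AC
```

Unlike the amino acid composition, this vector distinguishes sequences that have the same residues in a different order, which is the "local order" information mentioned above.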

PSI-BLAST:

Because the homology of a protein to other related sequences provides a broad range of evolutionary information, we have also developed a PSI-BLAST module to predict the subcellular localization of prokaryotic proteins. The performance of this module is poorer than that of the other modules developed in the present study. The SVM module based on this approach predicted the subcellular localization of proteins with an overall accuracy of 68%.

Hybrid based approach:

To enhance the prediction accuracy, we devised a method that captures more comprehensive information about a protein. An SVM-based module, called the hybrid module, was constructed from amino acid composition, dipeptide composition, composition of physico-chemical properties, and PSI-BLAST results. This module uses an input vector of 459 dimensions. The hybrid module achieved an accuracy of 91%, the highest among the five approaches. This result suggests that predicting the subcellular localization of proteins requires a wide range of information about a protein.
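One way to picture the 459-dimensional input is as a concatenation of the individual feature vectors. The Python sketch below is an assumption-laden illustration: the 20, 400, and 33 dimensions come from the descriptions above, while the size of the PSI-BLAST part (6) is only inferred here as the remainder, 459 - (20 + 400 + 33), and is not confirmed by this page. The function name and the order of concatenation are also hypothetical.

```python
from typing import List, Sequence

AAC_DIM = 20          # amino acid composition (stated above)
DIPEPTIDE_DIM = 400   # dipeptide composition (stated above)
PHYSCHEM_DIM = 33     # physico-chemical properties (stated above)
HYBRID_DIM = 459      # hybrid input vector (stated above)
# Inferred remainder, NOT confirmed by the documentation.
PSIBLAST_DIM = HYBRID_DIM - (AAC_DIM + DIPEPTIDE_DIM + PHYSCHEM_DIM)


def build_hybrid_vector(
    aac: Sequence[float],
    dipeptide: Sequence[float],
    physchem: Sequence[float],
    psiblast: Sequence[float],
) -> List[float]:
    """Concatenate the four feature groups into one hybrid vector."""
    parts = [
        ("amino acid composition", aac, AAC_DIM),
        ("dipeptide composition", dipeptide, DIPEPTIDE_DIM),
        ("physico-chemical properties", physchem, PHYSCHEM_DIM),
        ("PSI-BLAST", psiblast, PSIBLAST_DIM),
    ]
    vector: List[float] = []
    for name, values, expected in parts:
        if len(values) != expected:
            raise ValueError(f"{name}: expected {expected} values, got {len(values)}")
        vector.extend(values)  # assumed order; the real order is not documented
    return vector


if __name__ == "__main__":
    demo = build_hybrid_vector([0.0] * 20, [0.0] * 400, [0.0] * 33, [0.0] * PSIBLAST_DIM)
    print(PSIBLAST_DIM, len(demo))
```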

Output:
The output shows the input data as submitted by the user along with the prediction results. It gives the name (if provided), the input sequence, the length of the sequence, and the prediction approach chosen by the user. In addition, the scores generated for each of the five types of location are given. For the hybrid approach, details such as the RI value and expected accuracy are also displayed.
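The per-location scores can be read as follows, assuming the predicted location is simply the one with the highest score. That rule is an assumption made for this sketch and is not stated on this page, and the scores in the usage example are made-up numbers, not real server output. The function name is hypothetical.

```python
from typing import Dict

# Location names as listed under "Prediction Approaches".
LOCATIONS = (
    "cytoplasm",
    "extracellular",
    "inner-membrane",
    "outer-membrane",
    "periplasm",
)


def predict_location(scores: Dict[str, float]) -> str:
    """Return the location whose score is highest (assumed decision rule)."""
    missing = [loc for loc in LOCATIONS if loc not in scores]
    if missing:
        raise ValueError("missing scores for: " + ", ".join(missing))
    return max(LOCATIONS, key=lambda loc: scores[loc])


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example_scores = {
        "cytoplasm": -0.8,
        "extracellular": -1.1,
        "inner-membrane": 0.2,
        "outer-membrane": 1.3,
        "periplasm": -0.4,
    }
    print(predict_location(example_scores))
```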