Our server provides two options for submitting query sequences. With the
first option, the user pastes the sequence into the input box provided. With the other option, the user uploads a
sequence file. The server accepts both
formatted and unformatted (raw) antigenic sequences and uses the ReadSeq
routine to parse the input. Before running the prediction, the user should indicate whether the sequence
uploaded or pasted is plain or formatted.
The results of the prediction will be wrong if the format chosen is incorrect.
Please submit only one sequence at a time.
The dataset used in this study consists of 252 secretory
proteins and 252 non-secretory proteins. This dataset was
used to train and test our method.
Support Vector Machine:-
The support vector machine (SVM) is a
novel machine learning method based on the statistical learning
theory presented by V. N. Vapnik. It has been successfully applied to numerous
classification and pattern recognition problems, such as text categorization,
image recognition and bioinformatics. Training an SVM yields a
globally optimal solution to the classification problem, whereas neural networks trained with gradient-based
algorithms can converge to a local optimum. SVM_light is
a freely downloadable package written by Thorsten Joachims and is available
from http://ais.gmd.de/~thorsten/svm_light/. SVM_light was used to predict
secretory proteins. The SVM modules were developed on the basis of amino acid composition.
Amino acid composition:-
The amino acid composition represents a
protein as a 20-dimensional vector. It is the
fraction of each amino acid in the protein.
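The composition described above can be sketched in a few lines of Python. This is only an illustration, not the server's own code: the function name, the ordering of the 20 residues and the decision to ignore non-standard letters (such as X) are assumptions made for the example.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
# Assumption: the ordering is arbitrary and chosen only for this example.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list holding the fraction of each amino acid."""
    # Remove whitespace and line breaks, and normalise to upper case.
    seq = "".join(sequence.split()).upper()
    counts = Counter(seq)
    # Assumption: non-standard letters are ignored when computing the length.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    vector = amino_acid_composition("MKKLLA")
    print(len(vector), vector[AMINO_ACIDS.index("K")])
```

The fractions sum to 1, so every protein is mapped to a vector of the same length regardless of its sequence length.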
It was observed that the amino acid composition of surface-exposed and non-surface-exposed
proteins was somewhat different. Thus, an SVM-based classifier was
developed in which the amino acid composition was used
as an input vector of dimension 20. Different kernels and SVM parameters were
tried, and a maximum accuracy of 83.2% with an MCC of 0.7 was achieved using the RBF kernel.
Our method was able to predict 80.2% of secretory
proteins (sensitivity) at a specificity of 86.3% for a threshold of 0.0.
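As a rough illustration of how a 20-dimensional composition vector could be passed to SVM_light and how a threshold of 0.0 could be applied to its output, consider the sketch below. SVM_light reads one example per line in the form "<target> <index>:<value> ...", with feature indices starting at 1. The function names, the use of +1 for secretory proteins and the rule "score above threshold means secretory" are assumptions made for the example, not details confirmed by the server.

```python
def to_svmlight_line(label, features):
    """Format one example as an SVM_light input line: '<target> 1:<v1> 2:<v2> ...'."""
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    # Feature indices start at 1 and are written in increasing order.
    # Zero-valued features could also be omitted in this sparse format.
    pairs = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    return f"{label:+d} {pairs}"


def classify(score, threshold=0.0):
    """Turn an SVM decision value into a class name.

    Assumption: scores above the threshold are called secretory.
    """
    return "secretory" if score > threshold else "non-secretory"


if __name__ == "__main__":
    # Hypothetical two-feature vector, shortened for readability;
    # a real composition vector would have 20 values.
    print(to_svmlight_line(1, [0.5, 0.25]))
    print(classify(0.37))
```

Raising the threshold above 0.0 would be expected to trade sensitivity for specificity, and lowering it would do the opposite.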
Evaluation of Performance:-
The prediction quality was examined using the leave-one-out cross-validation
technique. In leave-one-out cross-validation (LOOCV), the classifier is trained on n-1 samples and tested on the remaining one, and this is repeated until every sample has been used once for testing. The accuracy of
the results is commonly measured by the number of True Positives (TP), True Negatives (TN),
False Positives (FP) and False Negatives (FN). For the prediction system, the
total prediction accuracy, Matthews correlation coefficient (MCC), sensitivity
and specificity were calculated using the following equations:
Sensitivity = TP / (TP+FN),
Specificity = TN / (TN+FP),
Accuracy = (TP+TN) / (TP+TN+FP+FN) and
MCC = ((TP*TN)-(FP*FN)) / sqrt((TP+FN)*(TP+FP)*(TN+FP)*(TN+FN)).
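The evaluation procedure and the four measures can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, labels are assumed to be +1 (positive) and -1 (negative), the train and predict callables stand in for whatever classifier is used, and the MCC uses the standard definition with a square root in the denominator. The counts in the example are made up and are not results from the dataset.

```python
import math


def leave_one_out_counts(samples, labels, train, predict):
    """Run LOOCV and return the counts (TP, TN, FP, FN).

    train(xs, ys) returns a model; predict(model, x) returns +1 or -1.
    Both callables are placeholders supplied by the caller.
    """
    tp = tn = fp = fn = 0
    for i in range(len(samples)):
        # Train on all samples except the i-th, then test on the i-th.
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        predicted = predict(model, samples[i])
        if labels[i] == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == 1:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy and MCC from the counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator is 0.
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts, used only to show the call.
    print(performance(tp=8, tn=9, fp=1, fn=2))
```

With n samples, LOOCV trains the classifier n times, so each sample contributes exactly one entry to the TP, TN, FP or FN counts.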