HELP FILE MHCBench Version:1.0
Importance of MHC binding peptides and their prediction:
Antigen-specific T-cells are the main regulatory cells of the immune system. They
recognize their targets only in association with major histocompatibility complex
(MHC) molecules. Thus, the MHC molecules play a pivotal role in determining which
antigens will be visible to T-cells. During antigen processing, an antigen is taken up
by the antigen presenting cells (APCs) and processed into small peptides. Some of these
peptides bind to a given MHC class II molecule and are displayed on the surface of
the APC for recognition by antigen-specific T-helper cells.
Because MHC binding is a necessary condition for T-cell activation, an
understanding of how and which peptides bind to an MHC molecule is critical for
exploring immune reactions. Such a study would find direct application in cellular
immunology, transplantation, vaccine design, immunodiagnostics, immunotherapeutics, and
the molecular understanding of autoimmune susceptibility.
The experimental identification of MHC binding regions involves extensive peptide
synthesis and is thus both time consuming and expensive. Computer-aided
prediction of the MHC binding regions is therefore a practical alternative. The prediction
algorithms can effectively narrow down the number of peptides that need to be
synthesized and assayed.
Approaches used for prediction:
Generally, three basic approaches have been used for the computer-aided prediction of MHC binding regions: the motif-based, the matrix-based, and the artificial neural network (ANN) based approaches (Hagmann, 2000).
A motif consists of two or more key amino acid residues that are thought to be essential for binding. Motifs can be general or allele specific. They are well established and have been used by some of the most popular T-cell epitope prediction algorithms, such as EPIMER and OPTIMER (Meister et al., 1995). The drawback of motifs is that only 60 to 70% of binders contain binding motifs. Thus, their prediction accuracies are low.
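The idea of a motif can be illustrated with a minimal sketch. The anchor positions and residues in TOY_MOTIF, and the function names, are hypothetical and chosen only for illustration; they are not taken from EPIMER, OPTIMER, or any published allele motif.

```python
# Hypothetical motif: 0-based position -> set of residues allowed there.
# These anchors are invented for illustration only.
TOY_MOTIF = {0: set("FWY"), 5: set("ST"), 8: set("LQ")}


def matches_motif(peptide, motif=TOY_MOTIF):
    """Return True if every anchor position holds one of its allowed residues."""
    return all(
        pos < len(peptide) and peptide[pos] in allowed
        for pos, allowed in motif.items()
    )


def motif_hits(sequence, width=9, motif=TOY_MOTIF):
    """Slide a window over a longer sequence and return the matching start offsets."""
    return [
        start
        for start in range(len(sequence) - width + 1)
        if matches_motif(sequence[start:start + width], motif)
    ]
```

A motif gives only a yes/no answer per peptide, which is one way to see why binders that lack the key residues are missed.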
The matrices are essentially refined motifs. A matrix entry (X, Y) represents the probability of occurrence of an amino acid (X) at a specific position (Y). Summing the position-specific values from the matrix for a given peptide yields the predicted binding score, which is used to make the decision. The matrix-based methods provide fast prediction. However, they cannot handle non-linear data.
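The summation described above can be sketched as follows. TOY_MATRIX, its values, the DEFAULT contribution for unlisted residues, and the cutoff argument are all assumptions made for illustration; a real matrix would cover all 20 amino acids at every position of the binding core.

```python
# Hypothetical position-specific matrix for a 3-residue peptide.
# TOY_MATRIX[position][residue] -> contribution; values are invented.
TOY_MATRIX = [
    {"A": 0.1, "F": 0.9, "L": 0.4},
    {"A": 0.5, "F": 0.2, "L": 0.3},
    {"A": 0.2, "F": 0.1, "L": 0.8},
]
DEFAULT = 0.0  # assumed contribution for residues missing from the toy matrix


def matrix_score(peptide, matrix=TOY_MATRIX):
    """Sum the position-specific values for each residue of the peptide."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal the number of matrix positions")
    return sum(column.get(residue, DEFAULT) for column, residue in zip(matrix, peptide))


def predict_binder(peptide, cutoff, matrix=TOY_MATRIX):
    """Call the peptide a binder when its summed score reaches the cutoff."""
    return matrix_score(peptide, matrix) >= cutoff
```

Because each position contributes independently to the sum, a matrix of this kind cannot express interactions between positions, which is the non-linearity limitation mentioned above.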
To handle the non-linearity in the data, methods based on Artificial Neural Networks (ANNs) were developed (Brusic et al., 1998). These methods try to learn the binding patterns from a given set of peptides. The major drawback of ANNs is that they require a large amount of pre-processed data for training.
These approaches have already started proving their mettle (Stassar et al., 2001; Cochlovius et al., 2000).
Need for comparative evaluation:
Over the years, a number of MHC Class II binding peptide prediction methods have been developed. The efficacy of these algorithms is generally defined in terms of their power to discriminate between binders and non-binders. In many cases, there is not enough evidence (except for success against one or two antigens) to support the predictions. Therefore, it is important to know the reliability of these programs. This concerns both users and developers. Lab bench experiments and animal trials are often based on epitope predictions, and they usually require a substantial investment of time and resources. Thus, it is important for a user to know how well a certain algorithm performs and what its strengths and weaknesses are. A developer may be more interested in the current state of the art: the performance of algorithms already in use and the pitfalls or weaknesses that need to be addressed.
THE MHCBENCH SERVER
What is the MHCBench:
The MHCBench is an automatic web service for evaluating the performance of MHC binding peptide prediction methods. It offers users:
- Curated and defined data sets
- Calculation of threshold dependent and threshold independent parameters
- A platform for comparing the performance of new methods with the available methods
The following features are available upon request:
- Submission of new data sets
- Submission of new prediction methods
For all services, you can submit your data in the required input format interactively over the World Wide Web.
The MHCBench offers two classes of evaluation parameters: threshold dependent and
threshold independent. These are the most widely used parameters for evaluating MHC
binding peptide prediction methods. The threshold dependent parameters are calculated
from the confusion matrix generated at a specific threshold, whereas the ROC is a
threshold independent measure calculated as the area under the curve of sensitivity
(true positive proportion) plotted against 1-specificity (false positive proportion).
For details, please click on the parameter.
Prediction methods available at MHCBench:
Currently, the MHCBench contains twelve different HLA-DRB1*0401 binding peptide prediction methods. These include motifs, matrices, and ANNs. The user can find more information about these methods in the original papers.
HOW TO USE THE SERVER
The server requires a very simple but specific input format. Below are a few example sequences that can be directly fed into the server.
The input is arranged as SEQUENCE,ObservedState,PredictedState. The length or type of SEQUENCE may vary. For example the same set can also be fed as:
Please do not use special characters, as they may corrupt the input. It is also important to use a comma (,) as the separator in each input string.
The observed state is the actual experimental value. For example, in the case of MHC binding peptides, a peptide can be either a binder, represented as "1", or a non-binder, represented as "0". Please use only 1 and 0 for this field.
The predicted state is the score calculated by the method. For example, in the case of MHC binding peptides, a peptide is assigned a score according to the values in the matrix or motif.
Please take care that the predicted score lies between 0 and 1. Zero corresponds to the non-binding state and one to the binding state, with all in-between values possible.
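The format rules above can be checked before submission with a minimal sketch like the one below. The function names are hypothetical, the sample records used with it are invented, and treating every character other than a letter or digit as "special" is an assumption; the server's own validation may differ.

```python
def parse_record(line):
    """Parse one 'SEQUENCE,ObservedState,PredictedState' record and validate it."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated fields: %r" % line)
    sequence, observed_text, predicted_text = parts
    # Assumption: letters and digits only; anything else counts as a special character.
    if not sequence.isalnum():
        raise ValueError("special characters in sequence: %r" % sequence)
    if observed_text not in ("0", "1"):
        raise ValueError("observed state must be 0 or 1: %r" % observed_text)
    predicted = float(predicted_text)  # raises ValueError if not a number
    if not 0.0 <= predicted <= 1.0:
        raise ValueError("predicted score must lie between 0 and 1: %r" % predicted_text)
    return sequence, int(observed_text), predicted


def parse_input(text):
    """Parse a block of records, one per line, skipping blank lines."""
    return [parse_record(line) for line in text.splitlines() if line.strip()]
```

For instance, parse_input("PEPTIDEA,1,0.85\nPEPTIDEB,0,0.10") yields two validated records, while a record with an observed state other than 0 or 1 raises a ValueError.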
Before submitting the input data, the user has to choose the type of evaluation.
The Threshold Independent Evaluation (TIE) is performed in terms of the Receiver Operating Characteristic (ROC). It is a single-valued parameter that represents the performance of a predictor over the entire threshold range. The higher the value, the better the algorithm. After the data are submitted, the server calculates the ROC value and presents the output in a simple table. The buttons above and below the table take users to the graphical representation of the ROC plot. The plot gives a visualization of the range in which the predictor performs better or worse.
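As an illustration of how a single ROC value can summarize the whole threshold range, the sketch below builds the ROC curve from observed states and predicted scores and integrates it with the trapezoidal rule. This is one common way to compute the area and is an assumption here; the function names are hypothetical and the server's exact procedure is not documented in this file.

```python
def roc_points(observed, predicted):
    """Return (1 - specificity, sensitivity) points, sweeping the threshold downwards."""
    positives = sum(observed)
    negatives = len(observed) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one binder and one non-binder")
    pairs = sorted(zip(predicted, observed), reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    index = 0
    while index < len(pairs):
        score = pairs[index][0]
        # Peptides with identical scores cross the threshold together.
        while index < len(pairs) and pairs[index][0] == score:
            if pairs[index][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            index += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def roc_auc(observed, predicted):
    """Area under the ROC curve by the trapezoidal rule."""
    points = roc_points(observed, predicted)
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A predictor that ranks every binder above every non-binder reaches an area of 1.0, while one that assigns the same score to all peptides gets 0.5.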
The Threshold Dependent Evaluation (TDE) is performed in terms of parameters calculated from the confusion matrix. Since the numbers in the confusion matrix change with the threshold, these parameters behave as functions of the threshold. The parameters include sensitivity, specificity, positive predictive value (PPV), accuracy, and correlation coefficient, among others. These are some of the most widely used threshold dependent evaluation parameters. Before submitting the data, users can select the cut-off or threshold at which they require the parameters to be calculated.
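A minimal sketch of the threshold dependent calculation follows. It assumes that a peptide is called a binder when its score is greater than or equal to the threshold, and that the correlation coefficient is the Matthews correlation coefficient; neither detail is stated in this file, and the function names are hypothetical.

```python
import math


def confusion_counts(observed, predicted, threshold):
    """Count true/false positives and negatives at the given threshold."""
    tp = fp = tn = fn = 0
    for state, score in zip(observed, predicted):
        called_binder = score >= threshold  # assumed decision rule
        if state == 1 and called_binder:
            tp += 1
        elif state == 1:
            fn += 1
        elif called_binder:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def threshold_parameters(observed, predicted, threshold):
    """Compute threshold dependent parameters from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(observed, predicted, threshold)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        # Assumption: "correlation coefficient" means the Matthews coefficient.
        "correlation": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

Calling threshold_parameters with several different thresholds on the same data shows how each parameter behaves as a function of the threshold.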
After the data are submitted, the results are presented in tabular format.
- Maximum Correlation Coefficient is the threshold at which the correlation
coefficient is highest. At this threshold, the user will have the optimized results.
- Higher coverage is the threshold at which 90% of binders are predicted as binders. At this threshold, the user can be confident that hardly any binder will be missed by the method.
- Generally, when the data set is unbalanced, that is, contains unequal numbers of
binders and non-binders, single-valued parameters (e.g. accuracy) will show bias. To give equal weight to both classes, the DFactor takes into account the sum of the percent sensitivity and specificity. Selecting this option displays evaluation results at the threshold where the DFactor is highest. At this threshold, the user will have the optimized results.
- Maximum Accuracy is the threshold at which the percent of correctly predicted peptides (both binders and non-binders) is highest. At this threshold, the user will have the optimized results.
- Higher PPV is the threshold at which there is a 90% probability that a predicted binder will actually be a binder. At this threshold, the user can be confident that whatever the method predicts as a binder has a good chance of being one.
- Often, when the optimized threshold is unknown, the threshold at which sensitivity and specificity are almost equal is selected as the optimized threshold. Hence, this is the threshold where both are almost equal.
- The 0.5 threshold is the halfway point of the prediction range, since the predicted peptides have scores in the range 0-1.
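Several of the threshold options listed above can be sketched as a simple scan over candidate thresholds. The grid of 101 evenly spaced thresholds, the tie-breaking (the lowest qualifying threshold wins for the maxima), the reading of "higher coverage" as the highest threshold that still recovers at least 90% of binders, and all function names are assumptions; the server may select thresholds differently.

```python
def rates(observed, predicted, threshold):
    """Return (sensitivity, specificity, accuracy) at one threshold."""
    tp = sum(1 for o, s in zip(observed, predicted) if o == 1 and s >= threshold)
    tn = sum(1 for o, s in zip(observed, predicted) if o == 0 and s < threshold)
    binders = sum(observed)
    non_binders = len(observed) - binders
    return tp / binders, tn / non_binders, (tp + tn) / len(observed)


def pick_thresholds(observed, predicted, steps=100):
    """Scan thresholds in [0, 1] and pick one per selection criterion."""
    if sum(observed) in (0, len(observed)):
        raise ValueError("need at least one binder and one non-binder")
    grid = [i / steps for i in range(steps + 1)]
    table = {t: rates(observed, predicted, t) for t in grid}
    return {
        # Threshold with the highest fraction of correctly predicted peptides.
        "max_accuracy": max(grid, key=lambda t: table[t][2]),
        # DFactor: sum of sensitivity and specificity.
        "max_dfactor": max(grid, key=lambda t: table[t][0] + table[t][1]),
        # Threshold where sensitivity and specificity are closest to each other.
        "balanced": min(grid, key=lambda t: abs(table[t][0] - table[t][1])),
        # Highest threshold that still predicts at least 90% of binders as binders.
        # Threshold 0 always has sensitivity 1, so this list is never empty.
        "high_coverage": max(t for t in grid if table[t][0] >= 0.9),
        # Halfway point of the 0-1 score range.
        "fixed": 0.5,
    }
```

The maximum correlation coefficient and 90% PPV options would follow the same pattern, with the corresponding parameter computed at each candidate threshold.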
Below are two example data files. You can either download the files and submit them to the server or move on to the tutorial page.