
Help Documentation ProPred-I  
The Promiscuous MHC Class-I Binding Peptide Prediction Server

 

 

 

About Propred1

Source of Weight Matrices used in Propred1

Algorithm for Prediction of MHC Binders

Prediction of Proteasome Cleavage Sites

Simultaneous Prediction of MHC binders and Proteasome Cleavage sites

Selection of Parameters

Performance of Propred1

A Case Study

Presentation of Results

Limitations of Propred1

 An Example Submission Form

 


 

About Propred1

ProPred1 is an on-line web tool for predicting peptide binding to MHC class-I alleles. It is a matrix-based method that predicts MHC binding sites in an antigenic sequence for 47 MHC class-I alleles. The matrices used in ProPred1 were obtained from the BIMAS server and from the literature. ProPred1 also predicts standard (constitutive) proteasome and immunoproteasome cleavage sites in an antigenic sequence, using the matrices described by Toes et al., 2001, and it can filter the predicted MHC binders to keep those that have a cleavage site at their C terminus. Kessler et al., 2001 and Ayyoub et al., 2002 have demonstrated that most MHC binders with a proteasome cleavage site at their C terminus have a high potential to become T-cell epitopes. In brief, the server assists users in identifying promiscuous potential T-cell epitopes in an antigenic sequence; such epitopes can serve as vaccine candidates. The server presents MHC binding regions and proteasome cleavage sites in graphical or text format, which helps users detect the promiscuous MHC binding regions in their query sequence.

Propred1 is installed on a Sun Server (420E) running UNIX (Solaris 7) and is served by the Apache web server. Most of the programs, including the common gateway interface (CGI) scripts, are written in PERL. Protein sequences can be submitted to ProPred1 by cut-and-paste or by uploading a sequence file. The server uses ReadSeq (developed by Dr. Don Gilbert) to parse the input sequence, so it accepts most of the commonly used sequence formats. The server allows users to select the threshold for their prediction. The threshold plays a vital role in determining the stringency of prediction. The lower the threshold, the higher the stringency of prediction, i.e. a lower rate of false positives and a higher rate of false negatives. In contrast, a higher threshold value (low stringency) corresponds to a higher rate of false positives and a lower rate of false negatives.


 

Source of Weight Matrices used in Propred1

Matrices for the Prediction of MHC Binders: Most of the matrices used in Propred1 for the prediction of MHC binders for 47 MHC class-I alleles were obtained from the BIMAS server; a few were obtained from the published literature. The following table lists the sources of the matrices (see Matrices for details).

  

Name of MHC allele       Reference               Comment
HLA-A2.1                 Ruppert et al., 1993    Addition Matrix
HLA-B*0702               Sidney et al., 1996     Addition Matrix
HLA-B51                  Sidney et al., 1996     Addition Matrix
HLA-B*5301               Sidney et al., 1996     Addition Matrix
HLA-B*5401               Sidney et al., 1996     Addition Matrix
All other MHC alleles    Unpublished             Multiplication Matrices (BIMAS Server)

 

 

Matrices for the Prediction of Cleavage Sites: The weight matrices for both the standard proteasome and the immunoproteasome were derived from Tables 1 and 2 of Toes et al., 2001. Each value in these tables was divided by 1000 to rescale the scores (see Matrices for details).


Algorithm for the Prediction of MHC Binders

ProPred1 uses matrix data in a linear prediction model in which the contribution of each amino acid is summed or multiplied, depending on the type of matrix used for prediction. Peptides whose score exceeds a defined threshold score are assigned as binders. The algorithms are briefly described below.

Computation of Score

Multiplication Matrices: Most of the matrices in Propred1 are of the multiplication type, where the score is calculated by multiplying the scores of each position. For example, the score of the peptide “PACDPGRAA” is calculated by the following equation.

Score = P(1) × A(2) × C(3) × D(4) × P(5) × G(6) × R(7) × A(8) × A(9)                     (1)

where P(1) is the score of P at position 1.

Addition Matrices: The matrices obtained from the literature are “Addition Matrices”, where the score is calculated by summing the scores of each position. For example, the score of the above peptide “PACDPGRAA” is calculated as follows.

Score = P(1) + A(2) + C(3) + D(4) + P(5) + G(6) + R(7) + A(8) + A(9)                     (2)
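
The two scoring rules in equations (1) and (2) can be sketched in Python as follows. This is only an illustration: the function name peptide_score, the toy matrix and the default values used for residues missing from it are assumptions made for the example, not the matrices or code used by the server.

```python
from math import prod


def peptide_score(peptide, matrix, mode, default):
    """Score a peptide against a position-specific matrix.

    matrix[i] maps an amino acid to its coefficient at position i + 1.
    mode="multiply" follows equation (1); mode="add" follows equation (2).
    default is used for residues absent from the (incomplete) toy matrix.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide and matrix must have the same length")
    values = [matrix[i].get(aa, default) for i, aa in enumerate(peptide)]
    if mode == "multiply":
        return prod(values)
    if mode == "add":
        return sum(values)
    raise ValueError("mode must be 'multiply' or 'add'")


# Hypothetical 9-position matrix; real matrices list all 20 residues per position.
TOY_MATRIX = [{"P": 2.0}, {"A": 1.5}, {}, {}, {"P": 0.5}, {}, {}, {}, {"A": 4.0}]

# Multiplication type: 2.0 * 1.5 * 0.5 * 4.0, all other positions contribute 1.0
print(peptide_score("PACDPGRAA", TOY_MATRIX, "multiply", default=1.0))
# Addition type: 2.0 + 1.5 + 0.5 + 4.0, all other positions contribute 0.0
print(peptide_score("PACDPGRAA", TOY_MATRIX, "add", default=0.0))
```

A neutral default (1.0 for multiplication, 0.0 for addition) is used here only so that the incomplete toy matrix still yields a score.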

 

Calibration of Threshold Score for each Allele

One of the crucial steps in matrix-based methods is adjusting the cut-off score, called the threshold score. Because the matrices were obtained from various sources, it is not clear what the threshold score for each should be, and the raw scores are not comparable across matrices. We therefore calibrated the scores so that the user can select the threshold as a percentage, such as 3% or 4%. For example, a 4% threshold means that only the top-scoring 4% of natural 9-mer peptides reach the cut-off, so there is about a 4% chance that a random peptide is predicted as a binder. The threshold score for each allele/matrix was calculated in the following steps.

i) All proteins were obtained from the SWISSPROT database and cut into overlapping peptides of length nine. A protein of length n yields (n - 9 + 1) overlapping peptides.

ii) The scores of all natural 9-mer peptides were calculated using the weight matrix of the allele. The peptides were sorted by score in descending order and the top 1% were extracted. The minimum score among these selected peptides is the threshold score at 1%. Threshold scores at 2%, 3% … 10% were calculated in the same way.

iii) Steps i and ii were repeated for each MHC allele to obtain the threshold scores at the different percentages for every allele used in ProPred1.
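
The calibration steps above can be sketched as follows. The function names, the made-up scores and the rounding rule (the size of the top fraction is rounded down, keeping at least one peptide) are assumptions for illustration; the document does not state how the server handles rounding.

```python
def overlapping_peptides(sequence, length=9):
    """Return all overlapping peptides; a protein of length n gives n - length + 1."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def threshold_scores(scores, percents=range(1, 11)):
    """Return {percent: minimum score among the top `percent` % of peptides}."""
    if not scores:
        raise ValueError("at least one score is required")
    ranked = sorted(scores, reverse=True)
    cutoffs = {}
    for percent in percents:
        # Assumption: round the top fraction down, but keep at least one peptide.
        n_top = max(1, len(ranked) * percent // 100)
        cutoffs[percent] = ranked[n_top - 1]
    return cutoffs


# Made-up scores standing in for the scores of all natural 9-mers for one allele.
demo_scores = [float(s) for s in range(1, 201)]
demo_cutoffs = threshold_scores(demo_scores)
print(demo_cutoffs[4])  # cut-off score at the 4% threshold for the made-up data
```

In practice the scores would come from applying one allele's matrix to every 9-mer returned by overlapping_peptides for every protein in the database, and the whole procedure would be repeated per allele.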

 

Identification of MHC binders

To identify the MHC binders in an antigen sequence, Propred1 first generates all overlapping 9-mer peptides. Next, the score of each 9-mer peptide is calculated using the quantitative matrix of the selected MHC allele. Finally, all peptides with a score greater than the selected threshold score (e.g. at 4%) are considered predicted binders for that allele. Predicted binders are shown on the antigen sequence in a different color or on separate lines along the primary sequence.
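
A minimal sketch of this scan is given below. The function name predict_binders, the toy matrix, the test sequence and the threshold value are all hypothetical; a multiplication-type matrix is assumed by default, with 1.0 used for residues missing from the toy matrix.

```python
from math import prod


def predict_binders(sequence, matrix, threshold, multiply=True, default=1.0):
    """Return (start, peptide, score) for every window scoring above threshold.

    start is the 1-based position of the first residue of the peptide.
    """
    width = len(matrix)  # 9 for the MHC class-I matrices described here
    binders = []
    for start in range(len(sequence) - width + 1):
        peptide = sequence[start:start + width]
        values = [matrix[i].get(aa, default) for i, aa in enumerate(peptide)]
        score = prod(values) if multiply else sum(values)
        if score > threshold:
            binders.append((start + 1, peptide, score))
    return binders


# Hypothetical matrix that rewards L at position 2 and V at position 9.
TOY_MHC_MATRIX = [{}, {"L": 10.0}, {}, {}, {}, {}, {}, {}, {"V": 5.0}]
# Made-up 11-residue antigen containing one window that matches both anchors.
TOY_ANTIGEN = "AL" + "A" * 6 + "V" + "AA"

for start, peptide, score in predict_binders(TOY_ANTIGEN, TOY_MHC_MATRIX, threshold=20.0):
    print(start, peptide, score)
```

The threshold passed in would be the calibrated cut-off score of the chosen allele at the chosen percentage.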

 


Prediction of Proteasome Cleavage Sites

The prediction of proteasome cleavage sites is based on the work of Toes et al., 2001. We derived the matrix for the standard proteasome from Table 1 of Toes et al., 2001. The derived matrix is an “Addition Matrix”, where the score of a peptide is calculated by summing the scores at each position (see the ‘Computation of Score’ subsection of the section ‘Algorithm for the Prediction of MHC Binders’ for details). A similar procedure was adopted to derive the matrix for the immunoproteasome from Table 2 of Toes et al., 2001. The major difference between the proteasome matrices and the MHC matrices is that the proteasome matrices consider peptides of length twelve instead of nine. The cleavage site is at the center of the 12-mer peptide.

 

Calibration of Threshold Score

Threshold scores for proteasome prediction were computed to give users a measure of confidence. The threshold scores for the standard proteasome and the immunoproteasome were calculated at different percentages using the approach described above for the MHC alleles, except that the calculation uses overlapping 12-mer peptides. The matrices and cut-off scores at thresholds of 1%, 2%, … 10% are available at http://www.imtech.res.in/raghava/propred1/matrices/matrix.html .

 

Identification of Cleavage Site

To predict proteasome cleavage sites in an antigenic sequence, overlapping 12-mer peptides are generated from the sequence and scored using the weight matrix of the proteasome. All peptides with a score greater than the selected threshold score (e.g. at 4%) are considered to contain a proteasome cleavage site. The center of each such peptide (six residues to the left and six to the right) is the predicted proteasome cleavage site. The same approach is used to predict immunoproteasome cleavage sites.
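
The 12-mer scan can be sketched in the same way. The names, the toy matrix and the threshold are hypothetical, and reporting a site as the 1-based position of the residue immediately before the cut (the sixth residue of the window) is a convention assumed for this example.

```python
def predict_cleavage_sites(sequence, matrix, threshold, default=0.0):
    """Return 1-based positions of residues after which a cut is predicted.

    Each window is scored with an addition matrix; the cut is placed at the
    window center, i.e. after its sixth residue when the window has 12 residues.
    """
    width = len(matrix)
    half = width // 2
    sites = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i].get(aa, default) for i, aa in enumerate(window))
        if score > threshold:
            # start is 0-based, so start + half is the 1-based index of
            # the last residue on the left side of the cut.
            sites.append(start + half)
    return sites


# Hypothetical 12-position matrix favouring L just before and K just after the cut.
TOY_PROTEASOME_MATRIX = [{} for _ in range(12)]
TOY_PROTEASOME_MATRIX[5] = {"L": 1.0}
TOY_PROTEASOME_MATRIX[6] = {"K": 0.5}
# Made-up 14-residue sequence with L at position 6 and K at position 7.
TOY_SEQUENCE = "G" * 5 + "LK" + "G" * 7

print(predict_cleavage_sites(TOY_SEQUENCE, TOY_PROTEASOME_MATRIX, threshold=1.0))
```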

 


Simultaneous Prediction of MHC Binders and Proteasome Site

One of the powerful features of Propred1 is that it predicts MHC binders for various alleles and proteasome cleavage sites simultaneously. This is based on previous studies demonstrating that most MHC binders with a proteasome cleavage site at their C terminus have a high potential to become T-cell epitopes (Kessler et al., 2001 and Ayyoub et al., 2002). The predicted MHC binders are filtered using the predicted proteasome cleavage sites in the antigenic sequence. First, the server computes the predicted MHC binders and their C-terminal positions for a selected MHC allele. Second, the server predicts the proteasome cleavage sites (standard proteasome, immunoproteasome or both) in the antigenic sequence at a given threshold (e.g. at 4%). Finally, the predicted MHC binding peptides whose C-terminal position coincides with a proteasome cleavage site are retained; these peptides are the predicted potential T-cell epitopes. In other words, MHC binders that do not have a proteasome cleavage site at their C terminus are removed from the list.
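
The filtering step can be sketched as follows, assuming that binders are given as (1-based start, peptide) pairs and that each cleavage site is reported as the 1-based position of the residue immediately before the cut. Both conventions, the function name and the example data are assumptions made for illustration.

```python
def filter_by_cleavage(binders, cleavage_sites):
    """Keep binders whose C-terminal residue coincides with a predicted cut.

    binders: iterable of (start, peptide) with 1-based start positions.
    cleavage_sites: iterable of 1-based positions of the residue before each cut.
    """
    sites = set(cleavage_sites)
    kept = []
    for start, peptide in binders:
        c_terminus = start + len(peptide) - 1  # position of the last residue
        if c_terminus in sites:
            kept.append((start, peptide))
    return kept


# Made-up predictions: the first binder ends at position 9, the second at 13.
toy_binders = [(1, "ALAAAAAAV"), (5, "AAAAVAAKL")]
toy_sites = [9, 20]
print(filter_by_cleavage(toy_binders, toy_sites))
```

To apply both proteasome filters at once, the sites predicted by the standard proteasome and immunoproteasome matrices could simply be merged into one list before filtering.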

 


Selection of Parameters

Selection of Alleles: Propred1 allows the user to select any single allele, a combination of alleles, or all 47 MHC class-I alleles. The server predicts the MHC binders for the selected alleles in the antigen/protein sequence.

 

Threshold Score for the Prediction of MHC Binders: The default threshold is 4%; users may select their own threshold depending on need. We observed that sensitivity and specificity are nearly the same at 4% for most alleles, so we set it as the default. This is a critical parameter and should be varied according to the requirement. For example, a user interested in detecting all possible binders should select a threshold such as 8% or 9% (coverage will be high, but the probability that a prediction is correct will be low), whereas a user interested only in top binders with high confidence should select a threshold such as 1% or 2% (the probability of correct prediction will be high, but coverage will be poor).

 

Type of Display: Propred1 allows the user to display results in four formats: i) HTML-I; ii) HTML-II; iii) Graphical; and iv) Tabular. For details, see the section ‘Presentation of Results’.

 

Proteasome Filters: The user can filter MHC binders to keep those that have a standard proteasome or immunoproteasome cleavage site at their C terminus. The filters are off by default; the user can turn them on to see which MHC binders have a cleavage site at the C terminus.

 

Threshold of Proteasome Filter: A user who turns on the proteasome filters should also select the cut-off threshold, which is 5% by default. Users can select a threshold suited to their requirement (see the ‘Threshold Score for the Prediction of MHC Binders’ subsection for details).

 

 


Performance of Propred1

How well does the server perform for the various alleles, and how far can its predictions be relied on? These are obvious questions for users of any prediction server.

 

Percent Coverage: We calculated the percent coverage (the percentage of binders correctly predicted as binders) for each allele for which sufficient data were available. The binders and non-binders for each MHC allele were extracted from the MHCBN database (http://www.imtech.res.in/raghava/mhcbn/; Bhasin et al., 2002). The number of binders per allele varies from 20 to about 1200. The following table shows the percent coverage at the default threshold of 4% (the threshold at which sensitivity and specificity are nearly the same).

 

  

MHC Alleles (Total binder, % Coverage)

HLA-A*0201 (1221, 75%)    H2-Db (189, 74%)         HLA-B*0702 (79, 92%)
HLA-A*0205 (28, 61%)      H2-Dd (89, 74%)          HLA-B*2705 (145, 98%)
HLA-A*1101 (116, 80%)     H2-Kb (116, 78%)         HLA-B*3501 (254, 84%)
HLA-A*3101 (33, 70%)      H2-Kd (277, 83%)         HLA-B*5101 (51, 92%)
HLA-A1 (128, 77%)         H2-Kk (28, 86%)          HLA-B*5102 (33, 94%)
HLA-A2 (976, 69%)         H2-Ld (113, 60%)         HLA-B*5103 (30, 97%)
HLA-A2.1 (77, 64%)        HLA-B*5401 (60, 100%)    HLA-B8 (130, 75%)
HLA-A24 (60, 70%)         HLA-B61 (22, 95%)        HLA-B62 (29, 55%)
HLA-A3 (191, 64%)         HLA-B14 (81, 75%)        HLA-Cw*0401 (20, 80%)
HLA-B7 (134, 81%)         HLA-B*5301 (64, 95%)

 

 

These results show that percent coverage is 70% or higher for 23 of the 29 alleles and 80% or higher for 14 of them, which is reasonably good. This suggests that the threshold criteria and matrices used in ProPred1 are useful for experimental scientists.

 

Comprehensive Evaluation of ProPred1: Percent coverage is a useful measure of a method's ability to identify binders in a given sequence, but it provides no information about false positives or the overall accuracy of prediction. We therefore also performed a comprehensive evaluation of Propred1 using the following three commonly used parameters.

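Sensitivity = TP / (TP + FN) × 100                                    [1]

Specificity = TN / (TN + FP) × 100                                    [2]

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100                      [3]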

The correlation coefficient (CC) is a rigorous measure of the performance of a method and is commonly used in other fields of science (e.g. secondary structure prediction). The CC is defined as:

CC = (TP × TN - FP × FN) / √[(TP + FN) × (TN + FP) × (TP + FP) × (TN + FN)]    [4]

 

where TP and TN are correctly predicted binders and non-binders, respectively, and FP and FN are wrongly predicted binders and non-binders, respectively.
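
The four measures can be computed from the counts TP, TN, FP and FN as sketched below. The sketch assumes the usual definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/(TP+TN+FP+FN), each expressed as a percentage, and the Matthews form of the correlation coefficient. The function name and the example counts are made up.

```python
from math import sqrt


def evaluate(tp, tn, fp, fn):
    """Return (sensitivity %, specificity %, accuracy %, correlation coefficient)."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denominator = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Assumption: report 0.0 when the coefficient is undefined (zero denominator).
    cc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, accuracy, cc


# Made-up counts: 100 binders (80 found) and 100 non-binders (90 rejected).
sens, spec, acc, cc = evaluate(tp=80, tn=90, fp=10, fn=20)
print(sens, spec, acc, round(cc, 4))
```

When binders greatly outnumber non-binders, accuracy stays close to sensitivity, which is one reason the correlation coefficient is reported alongside it.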

 

We computed all of the above parameters for ProPred1. Evaluating a method requires sufficient data on experimentally proven MHC binders and non-binders. Unfortunately, most alleles have very limited numbers of binders and non-binders. Thus, the comprehensive evaluation of ProPred1 was performed only for two alleles (HLA-A*0201 and H2-Kb) for which sufficient numbers of binders and non-binders were available. The peptides for HLA-A*0201 (1220 binders & 56 non-binders) and H2-Kb (300 binders & 200 non-binders) were obtained from the MHCBN database (Bhasin et al., 2002). The performance of ProPred1 for these two MHC alleles at different percent thresholds is shown in the following table.

 

 

HLA-A*0201

Threshold    Sensitivity (%)    Specificity (%)    Accuracy (%)    Correlation coefficient
1%           36                 98                 38              0.1314
2%           57                 93                 58              0.1854
3%           66                 80                 67              0.1783
4%           75                 78                 75              0.2179
5%           81                 67                 80              0.2151

H2-Kb

Threshold    Sensitivity (%)    Specificity (%)    Accuracy (%)    Correlation coefficient
1%           68                 88                 70              0.372
2%           73                 81                 74              0.3775
3%           78                 81                 78              0.4209
4%           78                 69                 77              0.3367
5%           82                 62                 80              0.3418

 

 


A Case Study

The purpose of ProPred1 is to reduce the number of wet-lab experiments needed to identify potential T-cell epitopes or suitable vaccine candidates. To demonstrate the usefulness of Propred1 in practice, we applied it to an antigen that has been studied extensively and whose MHC binders and T-cell epitopes have been identified experimentally. Kessler et al., 2001 experimentally determined the MHC binders and T-cell epitopes of the tumor-associated antigenic protein PRAME. We analyzed the performance of ProPred1 in identifying the experimentally proven MHC binders and T-cell epitopes of PRAME. The sequence of the PRAME antigenic protein was obtained from the SWISSPROT database.

 

MHC Binder: Kessler et al., 2001 tested 128 peptides and identified 19 as high-affinity binders and 27 as intermediate-affinity binders. ProPred1 was used to predict binding for these 128 PRAME peptides at various thresholds. The following table shows the performance of Propred1.

 

Threshold (%)    Correctly predicted high-affinity binders (out of 19)    Correctly predicted intermediate-affinity binders (out of 27)
1.0              4 (21%)                                                  1 (4%)
2.0              9 (47%)                                                  6 (22%)
3.0              10 (53%)                                                 14 (52%)
4.0              11 (58%)                                                 15 (56%)
5.0              12 (63%)                                                 21 (78%)
6.0              13 (68%)                                                 22 (81%)
7.0              13 (68%)                                                 23 (85%)
8.0              15 (79%)                                                 24 (89%)
9.0              18 (95%)                                                 26 (96%)
10.0             19 (100%)                                                27 (100%)

 

As the table shows, the number of correctly predicted binders (intermediate/high affinity) depends on the percent threshold. ProPred1 predicted all binders correctly at the 10% threshold, which indicates that the server is capable of predicting binders. Even at the default 4% threshold, ProPred1 identified 11 of the 19 high-affinity binders and 15 of the 27 intermediate-affinity binders. The default threshold is the one at which the sensitivity and specificity of the method are nearly the same.

 

Potential T-cell epitopes: It has been demonstrated experimentally that MHC binders with a proteasome cleavage site at their C terminus are mostly responsible for the activation of cytotoxic T lymphocytes (CTLs). Kessler et al., 2001 experimentally identified four regions of PRAME containing HLA-A*0201-restricted T-cell epitopes (A: 90-116; B: 133-159; C: 290-316; D: 415-441). We tested these regions using the ProPred1 server. First, binding regions in PRAME were predicted at the default threshold (4%). Second, all proteasome cleavage sites were predicted at various thresholds. Finally, predicted binders with a proteasome cleavage site at their C terminus were identified. The following table shows how many of the four regions contained such predicted peptides. The entries give the number of predicted regions that agree with the experimentally identified epitopes, with the regions named in brackets.

Correctly predicted T cell epitopes in protein PRAME at different thresholds (out of 4)

Name of filter                             2%         3%         5%           7%
Standard Proteasome                        0          1 (A)      1 (A)        2 (A,D)
Immunoproteasome                           2 (A,D)    2 (A,D)    3 (A,C,D)    3 (A,C,D)
Immunoproteasome or Standard Proteasome    2 (A,D)    2 (A,D)    3 (A,C,D)    3 (A,C,D)

 

 

 

The table shows the regions where the T-cell epitopes predicted by ProPred1 match the experimentally identified T-cell epitopes. With the standard proteasome filter at 7%, the server predicted 50% (2 of 4) of the regions identified experimentally by Kessler et al., 2001. With the immunoproteasome filter at 5%, it identified 75% (3 of 4) of the experimentally determined regions, and the same 75% was obtained when either the standard proteasome or the immunoproteasome filter was allowed at the 5% threshold. This analysis indicates that ProPred1 is worth using for the identification of MHC binding regions with a proteasome cleavage site at their C terminus, i.e. potential T-cell epitopes.

 

Presentation of Results

One of the important aspects of MHC prediction is the representation of the binding peptides found within the antigenic sequence, which requires a good interface to the prediction methods. ProPred1 provides three major options for visualizing results in user-friendly formats, including the most popular tabular format. These options are briefly described below.

 

Graphical Display: The graphical output represents a quantitative estimate of the MHC binding propensity of the antigenic sequence. The server presents results as an X-Y plot in which the amino acid sequence is shown along the X-axis and the peptide score along the Y-axis. The images are generated in GIF format using the GDPlot library (developed by Lincoln D. Stein). Each binder is represented as a peak crossing the dashed threshold line in the image. The server also plots the threshold profile (threshold versus binding peptides), which assists experienced users in selecting a threshold. Looking at the peaks in the graphs for different MHC alleles allows the user to locate the promiscuous regions in the query sequence. An example of the graphical output follows.


Graphical output generated by ProPred1. The peaks (starting from about the 200th amino acid) crossing the red threshold line are the predicted binders.

Text or HTML Format: This option presents the MHC binders within the antigenic sequence in text or HTML format. It has two sub-options. The first sub-option (HTML-I) displays the predicted MHC binders on separate lines along the antigen sequence, which makes overlapping binders easy to detect.



The HTML-I output, with the prediction made at a 7% threshold for illustration. The peptide frames "RTFEREYRT", "FEREYRTRL", and "REYRTRLKT" are shown more clearly than in the single string "RTFEREYRTRLKT".

 

The second sub-option (HTML-II) shows the predicted binders in a different color (blue). The first position of each binder is shown in red so that the user can easily distinguish overlapping peptides. This display is similar to that used by TEPITOPE and ProPred.


The peptide "YLESQLEEL" is predicted to bind to five alleles. The main advantage of this display is that it makes it easy to locate promiscuous regions in the sequence (regions that can bind to a number of MHC alleles). For example, the peptide "QQRTVLEGRLEQLRTFEREYRTRLKTYLESQLEEL" binds to ten of the eleven MHC alleles. Though useful, this option is less expressive in presenting overlapping binding regions.
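
The idea of a promiscuous binder can be made concrete with a short sketch that counts, for each predicted peptide, the alleles it is predicted to bind. The function name, the allele labels and the per-allele predictions below are invented for illustration and are not server output.

```python
from collections import defaultdict


def promiscuous_binders(binders_by_allele, min_alleles=2):
    """Return {peptide: sorted alleles} for peptides predicted for >= min_alleles alleles."""
    alleles_for = defaultdict(set)
    for allele, peptides in binders_by_allele.items():
        for peptide in peptides:
            alleles_for[peptide].add(allele)
    return {
        peptide: sorted(alleles)
        for peptide, alleles in alleles_for.items()
        if len(alleles) >= min_alleles
    }


# Invented predictions for three placeholder alleles.
toy_predictions = {
    "allele_1": ["YLESQLEEL", "RTFEREYRT"],
    "allele_2": ["YLESQLEEL"],
    "allele_3": ["YLESQLEEL", "FEREYRTRL"],
}
print(promiscuous_binders(toy_predictions, min_alleles=2))
```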

 

 

Tabular Format: This is the display option most widely used by MHC prediction web servers. It lists the peptides sorted in descending order of score, with a separate table for each selected allele. An example of the tabular output of Propred1 follows.

 







Limitations of ProPred1

All the matrices used in the server were obtained from other servers and from the literature. Matrices were selected on the basis of availability from a single source, not on performance. Thus, where more than one matrix is available in the literature for an allele, we are not necessarily using the best one. The server predicts only 9-mer peptides, not 8-mers or 10-mers, so ProPred1 may miss potential 8-mer and 10-mer binders. The matrices for predicting proteasome cleavage sites were obtained from Toes et al. (2001), where the values were derived from the enolase-I protein. We use these values for all predictions, and this generalization will not necessarily hold for all proteins.

 


 

An Example Submission Form

 


The following section provides a case study that demonstrates step by step how MHC class-I binders are identified by ProPred1. In this example, we use Antigen 84 of Mycobacterium leprae [gi|15827442|ref|NP_301705.1|] and search for promiscuous MHC class-I binders. We use both the proteasome and immunoproteasome filters, and try to construct a potential immunogenic peptide.

    • The antigen sequence obtained from NCBI (accessible via http://www.ncbi.nlm.nih.gov/) is pasted into the sequence box. Alternatively, you may submit a sequence file.
      The server accepts both formatted and unformatted (RAW) sequences. It uses the ReadSeq routine to parse the input.


 

    • Before scanning, you can modify the prediction conditions, such as the MHC matrices and the percent threshold value. The choice of the percent threshold value may vary based on the overall context, such as the type of analyzed sequence (e.g. protein subunit vs. whole viral genome), the downstream epitope validation capabilities including economic considerations (e.g. number of in vitro stimulation assays that can be performed), or the specific question asked (e.g. search for allele-specific or promiscuous epitopes).