Computational methods for MHC binding peptide prediction available at this server

Several methods exist for predicting MHC binding peptides. However, because of data limitations, it is difficult to evaluate all of them on a single platform. In the current study we selected methods used to predict HLA-DRB1*0401 binding peptides. These methods fall into three categories, each briefly described below.

Motif-based methods

A motif consists of two or more key amino acid residues that are thought to be essential for binding. Motifs can be general or allele specific. We converted every motif into a matrix by replacing each motif residue with a numeric value (dependent on its effect on the binding interaction) and all remaining residues with zero. This matrix was used for prediction. The motifs used in the analysis are:



Chicz et al., 1993: They eluted >200 naturally processed peptides from HLA-DR2, DR3, DR4, DR7 and DR8 molecules and analysed them by mass spectrometry and Edman microsequencing. These peptides were derived from 66 different source proteins and were 15-18 amino acids long. They derived the motifs from an alignment of these peptides.

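Motif Chicz et al., 1993:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -       -       -       -       -       -       -       -       -
C:                   -       -       -       -       -       -       -       -       -
D:                   -       -       -       -       -       -       -       -       -
E:                   -       -       -       -       -       -       -       -       -
F:                   3.0000  -       -       -       -       -       -       -       -
G:                   -       -       -       -       -       -       -       -       -
H:                   -       -       -       -       -       -       -       -       -
I:                   -       -       -       -       -       -       -       -       -
K:                   -       -       -       -       -       -       -       -       -
L:                   3.0000  -       -       -       -       -       -       -       -
M:                   -       -       -       -       -       -       -       -       -
N:                   -       -       -       -       -       -       -       -       3.0000
P:                   -       -       -       -       -       -       -       -       -
Q:                   -       -       -       -       -       -       -       -       3.0000
R:                   -       -       -       -       -       -       -       -       -
S:                   -       -       -       -       -       -       -       -       3.0000
T:                   -       -       -       -       -       -       -       -       3.0000
V:                   3.0000  -       -       -       -       -       -       -       -
W:                   -       -       -       -       -       -       -       -       -
X:                   -       -       -       -       -       -       -       -       -
Y:                   -       -       -       -       -       -       -       -       -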
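
As a rough illustration of the motif-to-matrix conversion described above, the following sketch stores the Chicz et al. motif as a sparse position-specific matrix and sums the matching entries for a 9-residue peptide. The names `CHICZ_MOTIF` and `motif_score`, the example peptides, and the use of a plain sum are assumptions made for illustration; the implementation used by the server is not shown on this page.

```python
# Sparse encoding of the Chicz et al. (1993) motif table: only non-zero
# entries are stored; every other residue/position pair counts as zero.
CHICZ_MOTIF = {
    1: {"F": 3.0, "L": 3.0, "V": 3.0},             # P1 residues in the motif
    9: {"N": 3.0, "Q": 3.0, "S": 3.0, "T": 3.0},   # P9 residues in the motif
}


def motif_score(peptide, motif=CHICZ_MOTIF):
    """Sum the motif values matched by a 9-residue peptide (assumed rule)."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-residue peptide")
    return sum(
        motif.get(pos, {}).get(res, 0.0)
        for pos, res in enumerate(peptide, start=1)
    )


# Hypothetical peptides, chosen only to exercise the two motif positions.
print(motif_score("FAAAAAAAS"))  # matches the motif at P1 and P9
print(motif_score("AAAAAAAAA"))  # matches neither position
```

A peptide would then be called a binder when its score reaches some cut-off; the cut-off itself is not given here and would have to be chosen separately.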


Sette et al., 1993: They screened a large number of peptides to identify high-affinity binders. The peptides were truncated at the N and C termini to deduce the binding core. Finally, single amino acid substitutions were made in HA (301-319) to elucidate the importance of each residue in the selected binding core.



Motif Sette et al., 1993:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -       -       -       -       -       -       -       -       -
C:                   -       -       -       -       -       -       -       -       -
D:                   -       -       -       -       -       -       -       -       -3.000
E:                   -       -       -       -       -       -       -       -       -3.000
F:                   3.0000  -       -       -       -       -       -       -       -
G:                   -       -       -       -       -       -       -       -       -
H:                   -       -       -       -       -       -       -       -       -
I:                   3.0000  -       -       -       -       3.0000  -       -       -
K:                   -       -       -       -3.000  -       -       -3.000  -       -3.000
L:                   3.0000  -       -       -       -       3.0000  -       -       -
M:                   3.0000  -       -       -       -       3.0000  -       -       -
N:                   -       -       -       -       -       -       -       -       -
P:                   -       -       -       -       -       -       -       -       -
Q:                   -       -       -       -       -       -       -       -       -
R:                   -       -       -       -3.000  -       -       -3.000  -       -3.000
S:                   -       -       -       -       -       3.0000  -       -       -
T:                   -       -       -       -       -       3.0000  -       -       -
V:                   3.0000  -       -       -       -       3.0000  -       -       -
W:                   3.0000  -       -       -       -       -       -       -       -
X:                   -       -       -       -       -       -       -       -       -
Y:                   3.0000  -       -       -       -       -       -       -       -


Hammer et al., 1993: They screened an M13 phage display library with purified HLA-DRB1*0401 and HLA-DRB1*0101 molecules and sequenced the peptide-encoding regions of the DR-bound phage. From a set of 51 nine-residue peptides they calculated position-specific probabilities and deduced the motif.

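Motif Hammer et al., 1993:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -       -       -       1.0000  -       -       -       -       -
C:                   -       -       -       -       -       -       -       -       -
D:                   -       -       -       -       -       -       -       -       -
E:                   -       -       -       -       -       -       -       -       -
F:                   -       -       -       -       -       -       -       -       -
G:                   -       -       -       -       -       -       -       -       -
H:                   -       -       -       -       -       -       -       -       -
I:                   -       -       -       -       -       -       -       -       -
K:                   -       -       -       -       -       -       -       -       -
L:                   -       -       -       1.0000  -       -       3.0000  -       -
M:                   -       -       -       3.0000  -       -       1.0000  -       -
N:                   -       -       -       -       -       -       1.0000  -       -
P:                   -       -       -       -       -       -       -       -       -
Q:                   -       -       -       -       -       -       3.0000  -       -
R:                   -       -       -       -       -       -       -       -       -
S:                   -       -       -       -       -       1.0000  -       -       -
T:                   -       -       -       -       -       3.0000  -       -       -
V:                   -       -       -       1.0000  -       -       -       -       -
W:                   3.0000  -       -       -       -       -       -       -       -
X:                   -       -       -       -       -       -       -       -       -
Y:                   3.0000  -       -       -       -       -       -       -       -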


Max et al., 1994: They used Cal (295-309) and analogues of the RAL-1 peptide. Binding assays using self-peptide analogues with single amino acid substitutions led to the development of a DR4Dw4-binding motif with anchor residues at relative positions 1 and 6.

Motif Max et al., 1994:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -       -       -       3.0000  -       -       -       -       -
C:                   -       -       -       -       -       -       -       -       -
D:                   -       -       -       3.0000  -       -       -       -       -3.000
E:                   -       -       -       -       -       -       -       -       -3.000
F:                   -       -       -       -       -       -       -       -       -
G:                   -       -       -       -       -       -       -       -       -
H:                   -       -       -       -       -       -       -       -       -
I:                   3.0000  -       -       -       -       -3.000  3.0000  -       -
K:                   -       -       -       -3.000  -       -       -3.000  -       -3.000
L:                   -       -       -       -       -       -3.000  3.0000  -       -
M:                   -       -       -       -       -       -       -       -       -
N:                   -       -       -       -       -       -       -       -       -
P:                   -       -       -       -       -       -       -       -       -
Q:                   -       -       -       -       -       -       -       -       -
R:                   -       -       -       -3.000  -       -       -3.000  -       -3.000
S:                   -       -       -       -       -       3.0000  -       -       -
T:                   -       -       -       -       -       3.0000  -       -       -
V:                   3.0000  -       -       -       -       -       -       -       -
W:                   3.0000  -       -       -       -       -       -       -       -
X:                   -       -       -       -       -       -       -       -       -
Y:                   3.0000  -       -       -       -       -       -       -       -


Rammensee et al., 1995: They compiled a large database of sequences of natural epitopes that trigger a T-cell response. The sequences were either eluted from various MHC alleles or collected from the literature. From allele-specific alignments of these sequences, they proposed the binding motif.

Motif Rammensee et al., 1995:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -       -       -       2.0000  -       -       1.0000  -       1.0000
C:                   -       -       -       -       -       -       1.0000  -       1.0000
D:                   -       -       -       2.0000  -       -       1.0000  -       -
E:                   -       -       -       2.0000  -       -       1.0000  -       -
F:                   3.0000  -       -       2.0000  -       -       -       -       -
G:                   -       -       -       -       -       -       1.0000  -       1.0000
H:                   -       -       -       -       -       2.0000  1.0000  -       -
I:                   2.0000  -       -       2.0000  -       -       1.0000  -       1.0000
K:                   -       -       -       -3.000  -       -       1.0000  -       1.0000
L:                   2.0000  -       -       2.0000  -       -       1.0000  -       1.0000
M:                   2.0000  -       -       -       -       -       1.0000  -       1.0000
N:                   -       -       -       -       -       2.0000  1.0000  -       1.0000
P:                   -       -       -       -       -       -       1.0000  -       1.0000
Q:                   -       -       -       -       -       2.0000  1.0000  -       1.0000
R:                   -       -       -       -3.000  -       2.0000  1.0000  -       -
S:                   -       -       -       -       -       2.0000  1.0000  -       1.0000
T:                   -       -       -       -       -       2.0000  1.0000  -       1.0000
V:                   2.0000  -       -       2.0000  -       -       1.0000  -       1.0000
W:                   3.0000  -       -       2.0000  -       -       -       -       -
X:                   -       -       -       -       -       -       -       -       -
Y:                   3.0000  -       -       -       -       -       -       -       -



Matrix-based methods

Each matrix entry (X, Y) represents the probability of occurrence of amino acid X at position Y. Summing the position-specific values from the matrix for a given peptide yields the predicted binding score, which is used for the decision. The matrices used in the analysis are briefly described below:

Marshal et al., 1994: In this study a quantitative matrix was generated from 13mer peptides that showed high-affinity binding to HLA-DRB1*0401. The relative effect of each amino acid at each position was determined experimentally in terms of IC50 value. Here, we took positions 3-11 of their matrix to predict binders and non-binders in the data set of 9mer peptides used in our study. This 9mer matrix has already been used by Borras-Cuesta et al. (2000) and shown to give better predictive performance than the 13mer matrix.



Matrix Marshal et al., 1994:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
C:                   1.00    1.86    0.25    0.42    1.00    0.82    0.74    0.80    0.86
D:                   1.00    9.41    1.29    0.90    1.45    7.84    3.48    1.50    41.17
E:                   1.00    12.27   3.82    0.54    0.96    6.41    5.00    1.36    31.00
F:                   1.68    0.50    0.58    0.28    0.80    12.44   1.90    0.60    36.00
G:                   1.00    4.15    0.78    2.05    1.00    1.09    2.04    1.50    1.40
H:                   1.00    0.95    1.57    0.70    0.88    2.86    0.49    0.77    4.02
I:                   6.42    0.42    0.85    0.76    0.70    0.80    1.16    0.69    11.66
K:                   1.00    0.46    2.72    30.0    1.30    13.31   20.00   1.11    23.00
L:                   15.41   0.33    0.40    0.61    1.75    1.44    1.02    1.10    4.20
M:                   8.31    1.43    0.54    0.32    0.68    3.31    0.80    0.80    4.50
N:                   1.00    0.75    2.58    0.16    1.19    0.60    1.20    0.89    5.00
P:                   1.00    544.0   1.72    70.85   2.99    1.50    0.77    0.74    5.43
Q:                   1.00    0.86    4.23    0.60    1.00    1.94    8.45    0.55    5.37
R:                   1.00    0.25    2.94    18.32   1.13    17.22   12.80   1.30    10.67
S:                   1.00    0.70    0.40    0.78    1.00    0.70    1.57    2.61    1.69
T:                   1.00    0.82    1.28    1.18    0.66    0.56    1.67    3.00    6.86
V:                   14.07   0.39    1.10    1.03    0.82    0.95    1.19    0.63    6.15
W:                   1.20    0.52    1.79    1.50    0.68    2.80    1.54    1.23    7.84
X:                   1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
Y:                   1.00    0.61    0.80    0.60    0.53    3.20    0.40    0.42    10.23


Hammer et al., 1994: They analysed preferences observed in a systematic series of peptide binding experiments in which each position in a "minimal" peptide was replaced individually by every amino acid. The resulting matrix was incorporated into a computer program and used to predict HLA-DRB1*0401 binding regions in an antigen sequence. Their prediction algorithm first scans an antigen for nine-residue peptide frames with a P1 anchor (F, I, L, M, V, W, or Y). The selected frames are then assigned scores based on the matrix, and each score is compared with a threshold value to make a decision.


Matrix Hammer et al., 1994:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -999    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
C:                   -999    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
D:                   -999    -1.3    -1.3    1.70    -0.2    0.00    -1.1    -1.1    -2.6
E:                   -999    0.10    -1.2    0.80    -0.1    -1.2    -0.2    -0.2    -1.8
F:                   0.00    0.80    0.80    -0.8    0.30    -1.3    -0.8    0.10    -0.8
G:                   -999    0.50    0.20    -1.5    -0.2    -1.1    -1.5    -0.5    -0.2
H:                   -999    0.80    0.20    0.80    -0.1    -1.6    -0.8    0.00    0.3
I:                   -1.0    1.10    1.50    0.80    0.10    -0.2    -0.2    -0.1    -0.4
K:                   -999    1.10    0.00    -2.2    0.30    -2.3    -1.2    0.90    -0.9
L:                   -1.0    1.00    1.00    -0.6    0.10    -1.3    0.40    0.60    -1.3
M:                   -1.0    1.10    1.40    1.40    0.30    -1.3    0.70    0.40    -0.4
N:                   -999    0.80    0.50    0.50    0.20    1.70    -0.1    0.70    -1.1
P:                   -999    -0.5    0.30    -2.1    0.50    0.10    -0.3    -0.2    -1.6
Q:                   -999    1.20    0.00    1.10    0.10    -1.2    -0.5    1.60    0.7
R:                   -999    2.20    0.70    -1.5    0.00    -2.2    -1.2    0.70    -0.9
S:                   -999    -0.3    0.20    1.10    0.40    1.70    -0.4    0.60    1.2
T:                   -999    0.00    0.00    0.80    0.60    1.90    -0.2    0.50    -0.3
V:                   -1.0    2.10    0.50    0.50    0.40    1.30    0.50    0.40    0.5
W:                   0.00    -0.1    0.00    -1.2    -0.1    -0.9    -1.3    0.60    -0.3
X:                   -999    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Y:                   0.00    0.90    0.80    -1.0    -0.2    -1.1    -0.7    1.30    -1.5
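
The three steps of the Hammer et al. (1994) procedure described above (scan for frames with a P1 anchor, score each frame with the matrix, compare with a threshold) might be sketched as follows. This is only an illustration: the function name `scan_antigen`, the example sequence, and the threshold of 2.0 are hypothetical, and only the matrix rows needed for the example are copied from the table.

```python
P1_ANCHORS = set("FILMVWY")

# Subset of the Hammer et al. (1994) matrix: one row per residue, columns
# P1..P9. Only the residues that occur in the example are included.
HAMMER_1994 = {
    "A": [-999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "L": [-1.0, 1.0, 1.0, -0.6, 0.1, -1.3, 0.4, 0.6, -1.3],
    "S": [-999, -0.3, 0.2, 1.1, 0.4, 1.7, -0.4, 0.6, 1.2],
    "T": [-999, 0.0, 0.0, 0.8, 0.6, 1.9, -0.2, 0.5, -0.3],
    "V": [-1.0, 2.1, 0.5, 0.5, 0.4, 1.3, 0.5, 0.4, 0.5],
    "Y": [0.0, 0.9, 0.8, -1.0, -0.2, -1.1, -0.7, 1.3, -1.5],
}


def scan_antigen(sequence, matrix=HAMMER_1994, threshold=2.0):
    """Return (start, frame, score) for anchored 9-mers scoring >= threshold.

    The threshold default is a placeholder, not a published value.
    """
    hits = []
    for start in range(len(sequence) - 8):
        frame = sequence[start:start + 9]
        if frame[0] not in P1_ANCHORS:
            continue  # step 1: keep only frames with a P1 anchor
        # step 2: add up the matrix value of each residue at its position
        score = sum(matrix[res][i] for i, res in enumerate(frame))
        if score >= threshold:  # step 3: compare with the threshold
            hits.append((start, frame, round(score, 2)))
    return hits


# Hypothetical antigen fragment built from the residues listed above.
print(scan_antigen("SAYVLSATAALT"))
```

Because frames without a P1 anchor are skipped before scoring, the -999 entries in the P1 column never enter the sum in this sketch.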


Southwood et al., 1998: They used the polynomial method to predict HLA-DR4 binding peptides. Average Relative Binding (ARB) values for each residue at each position were estimated from a library of 384 peptides. For prediction, the ARBs for a given sequence are multiplied together and the sequence is classified as a binder or non-binder according to a preselected threshold.


Matrix Southwood et al., 1998:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   1.00    0.96    1.04    1.57    0.59    0.65    0.86    0.82    1.62
C:                   1.00    0.57    0.74    1.12    0.83    0.47    0.94    0.28    1.1
D:                   1.00    0.34    0.33    1.40    0.40    1.00    0.58    0.53    0.24
E:                   1.00    0.31    1.09    0.42    0.42    1.00    0.29    0.61    0.25
F:                   2.33    3.66    1.85    0.80    1.58    1.00    1.84    1.34    1.12
G:                   1.00    1.14    0.64    0.43    0.48    1.00    0.49    1.19    0.52
H:                   1.00    0.78    0.15    1.14    0.93    1.00    13.77   1.40    5.15
I:                   0.79    1.74    1.01    1.91    4.39    0.98    2.36    1.66    2.75
K:                   1.00    1.44    1.25    0.53    0.40    1.00    0.62    0.64    0.55
L:                   0.81    0.86    1.88    1.28    1.11    0.67    1.36    1.08    0.83
M:                   1.14    12.79   1.49    2.77    0.32    0.74    8.11    1.98    4.05
N:                   1.00    0.44    1.72    1.42    1.89    1.00    0.84    0.43    1.64
P:                   1.00    0.56    0.31    1.44    2.46    0.86    2.83    2.12    2.18
Q:                   1.00    0.40    0.38    1.61    2.09    1.00    0.31    0.71    0.62
R:                   1.00    1.09    0.50    0.69    0.39    1.00    0.14    0.41    1.22
S:                   1.00    1.55    1.31    1.29    1.76    1.11    1.23    2.93    1.54
T:                   1.00    1.00    4.34    0.89    1.32    1.86    3.07    1.76    1.64
V:                   0.79    3.34    0.93    1.05    0.70    2.36    0.69    0.54    1.53
W:                   0.82    2.04    2.52    0.21    0.91    1.00    0.39    0.35    0.22
X:                   1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
Y:                   1.07    0.74    1.51    0.39    1.41    1.00    0.44    0.61    0.35
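
In contrast to the additive matrices, the Southwood et al. scheme multiplies the position-specific ARB values. A minimal sketch of that product rule is given below. The names `arb_product` and `is_binder`, the threshold of 1.0, and the assumption that a larger product indicates a binder are all illustrative choices, not values taken from the original paper; only four rows of the table are copied in.

```python
import math

# Subset of the Southwood et al. (1998) ARB table: columns P1..P9.
ARB = {
    "A": [1.00, 0.96, 1.04, 1.57, 0.59, 0.65, 0.86, 0.82, 1.62],
    "D": [1.00, 0.34, 0.33, 1.40, 0.40, 1.00, 0.58, 0.53, 0.24],
    "F": [2.33, 3.66, 1.85, 0.80, 1.58, 1.00, 1.84, 1.34, 1.12],
    "X": [1.00] * 9,
}


def arb_product(peptide, arb=ARB):
    """Multiply the ARB values of a 9-residue peptide, one per position."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-residue peptide")
    return math.prod(arb[res][i] for i, res in enumerate(peptide))


def is_binder(peptide, threshold=1.0, arb=ARB):
    """Classify by comparing the product with a placeholder threshold."""
    return arb_product(peptide, arb) >= threshold


# Hypothetical peptides used only to show the call pattern.
for pep in ("FXXXXXXXX", "DDDDDDDDD"):
    print(pep, arb_product(pep), is_binder(pep))
```

Since every entry of the X row is 1.00, unknown residues leave the product unchanged, which mirrors the zero entries for X in the additive matrices.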


Sturniolo et al., 1999: Their papers described the concept of generating virtual matrices for various MHC alleles. Based on this concept, matrices for 51 HLA-DR alleles were generated from a small set of experiments. A virtual matrix is defined in terms of pocket profiles rather than peptide sequences. A pocket profile is the quantitative effect of each amino acid on the binding affinity of a peptide. Recently, a web server has been developed that uses these virtual matrices to predict MHC binders in an antigen sequence for 51 HLA-DR alleles (Singh and Raghava, 2001).


Matrix Sturniolo et al., 1999:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -999.00 0.00    0.00    0.00    -       0.00    0.00    -       0.00
C:                   -999.00 0.00    0.00    0.00    -       0.00    0.00    -       0.00
D:                   -999.00 -1.30   -1.30   1.40    -       -1.10   -0.30   -       -1.70
E:                   -999.00 0.10    -1.20   1.50    -       -2.40   0.20    -       -1.70
F:                   0.00    0.80    0.80    -0.90   -       -1.10   -1.00   -       -1.00
G:                   -999.00 0.50    0.20    -1.60   -       -1.50   -1.30   -       -1.00
H:                   -999.00 0.80    0.20    1.10    -       -1.40   0.00    -       0.08
I:                   -1.00   1.10    1.50    0.80    -       -0.10   0.08    -       -0.30
K:                   -999.00 1.10    0.00    -1.70   -       -2.40   -0.30   -       -0.30
L:                   -1.00   1.00    1.00    0.80    -       -1.10   0.70    -       -1.00
M:                   -1.00   1.10    1.40    0.90    -       -1.10   0.80    -       -0.40
N:                   -999.00 0.80    0.50    0.90    -       1.30    0.60    -       -1.40
P:                   -999.00 -0.50   0.30    -1.60   -       0.00    -0.70   -       -1.30
Q:                   -999.00 1.20    0.00    0.80    -       -1.50   0.00    -       0.50
R:                   -999.00 2.20    0.70    -1.90   -       -2.40   -1.20   -       -1.00
S:                   -999.00 -0.30   0.20    0.80    -       1.00    -0.20   -       0.70
T:                   -999.00 0.00    0.00    0.70    -       1.90    -0.10   -       -1.20
V:                   -1.00   2.10    0.50    -0.90   -       0.90    0.08    -       -0.70
W:                   0.00    -0.10   0.00    -1.20   -       -1.00   -1.40   -       -1.00
X:                   -999.00 0.00    0.00    0.00    -       0.00    0.00    -       0.00
Y:                   0.00    0.90    0.80    -1.60   -       -1.50   -1.20   -       -1.00


Borras-Cuesta et al., 2000: To compute the contribution of each amino acid to MHC binding, the frequencies of amino acids at each position in a set of binding peptides were compared with the average frequencies of these amino acids in peptides observed in 23406 proteins (obtained from the SWISS-PROT database). Z values were calculated to assess whether the differences between the relative frequency of each amino acid and that in SWISS-PROT were due to chance alone.



Matrix Borras-Cuesta et al., 2000:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8
A:                   -999    0.00    3.93    2.77    0.00    0.00    0.00    0.00
C:                   -999    0.00    0.00    0.00    2.50    2.77    0.00    0.00
D:                   -999    0.00    -4.32   0.00    -2.88   -2.88   -1.94   0.00
E:                   -999    0.00    0.00    0.00    -7.21   0.00    0.00    0.00
F:                   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
G:                   -999    0.00    0.00    0.00    0.00    0.00    -2.52   0.00
H:                   -999    2.43    0.00    0.00    0.00    -1.92   0.00    3.47
I:                   0.00    0.00    0.00    3.47    0.00    0.00    0.00    0.00
K:                   -999    0.00    0.00    -4.32   0.00    0.00    0.00    -4.32
L:                   0.00    0.00    0.00    0.00    0.00    0.00    3.19    1.85
M:                   -999    0.00    0.00    5.55    0.00    0.00    4.16    0.00
N:                   -999    0.00    0.00    -3.6    0.00    0.00    0.00    -2.88
P:                   -999    -12.97  -3.6    -5.77   0.00    0.00    0.00    0.00
Q:                   -999    0.00    0.00    0.00    0.00    -2.88   2.08    0.00
R:                   -999    16.65   0.00    0.00    2.43    0.00    0.00    3.70
S:                   -999    0.00    0.00    0.00    0.00    0.00    0.00    0.00
T:                   -999    -2.88   0.00    0.00    0.00    0.00    0.00    0.00
V:                   0.00    0.00    0.00    4.16    0.00    0.00    0.00    0.00
W:                   0.00    0.00    2.08    2.08    0.00    0.00    0.00    2.77
X:                   -999    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Y:                   0.00    2.08    0.00    -8.65   0.00    0.00    0.00    0.00
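
The exact Z formula used by Borras-Cuesta et al. is not given on this page. One common way to compare an observed positional frequency with a background frequency is a one-proportion z statistic, sketched below purely as an assumption; the function name `z_value` and all numbers in the example are hypothetical.

```python
import math


def z_value(count_at_position, n_peptides, background_freq):
    """One-proportion z statistic (assumed form, not confirmed by the source).

    count_at_position: times a residue occurs at one position in the binders
    n_peptides:        number of binding peptides examined
    background_freq:   average frequency of that residue in the reference set
    """
    observed = count_at_position / n_peptides
    std_err = math.sqrt(background_freq * (1.0 - background_freq) / n_peptides)
    return (observed - background_freq) / std_err


# Hypothetical numbers: a residue seen 20 times at one position among 100
# binders, against a background frequency of 0.05.
print(round(z_value(20, 100, 0.05), 2))
```

Under this assumption a large positive value marks a residue that is over-represented at that position among binders, a large negative value marks one that is under-represented, and values near zero are compatible with chance.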


Brusic et al., 1998: Identifying the binding core is one of the problems in developing a prediction method. They used a genetic algorithm to align HLA-DRB1*0401 binding peptides and evolve a matrix until sufficient discriminative power was achieved. This matrix was not intended to be a prediction algorithm. Nevertheless, since it has been shown to discriminate between binders and non-binders with high accuracy, we included it in the evaluation. It represents another important approach to pattern search, namely the genetic algorithm.



Matrix Brusic et al., 1998:
Amino acid/Position  P1      P2      P3      P4      P5      P6      P7      P8      P9
A:                   -20.00  1.8     0.3     1.1     0.2     0.5     -0.3    1.4     1.1
C:                   -20.00  2.0     -1.2    1.9     -1.2    -1.8    2.1     -0.1    0.9
D:                   -20.00  -2.4    -1.9    0.8     -0.8    -0.7    -1.4    -1.8    -1.9
E:                   -20.00  -1.0    0.7     -2.4    0.2     0.6     0.5     -1.3    -2.2
F:                   0.00    1.4     -0.6    1.9     0.1     0.4     2.0     -2.3    -1.1
G:                   -20.00  -1.2    -0.9    -1.2    0.1     1.1     -0.3    0.5     0.3
H:                   -20.00  -0.6    1.3     0.1     1.2     -0.8    2.0     1.5     -1.0
I:                   -1.00   -0.7    0.6     1.1     0.5     -1.2    0.6     -0.5    -0.2
K:                   -20.00  0.4     -2.4    -2.1    0.9     -0.7    0.7     -0.7    -1.8
L:                   -1.00   -1.8    0.7     0.0     0.1     0.6     0.9     0.2     -2.1
M:                   -1.00   0.2     -2.1    2.5     0.8     -0.3    2.1     0.5     -1.7
N:                   -20.00  0.4     -1.7    0.4     -1.0    0.4     1.9     1.8     -1.8
P:                   -20.00  -0.5    -0.7    0.0     0.3     1.1     0.6     0.2     -1.4
Q:                   -20.00  -0.2    1.0     1.0     -0.9    -2.0    0.4     0.8     0.1
R:                   -20.00  2.5     -1.7    0.1     0.5     -0.1    0.4     0.6     -0.8
S:                   -20.00  0.9     0.2     -1.6    -0.2    0.4     -1.2    1.8     1.2
T:                   -20.00  1.2     2.1     -1.9    -0.9    2.1     -0.4    2.4     -1.5
V:                   -1.00   -2.5    1.3     0.7     0.6     0.7     0.9     1.1     1.0
W:                   0.00    -1.0    0.2     1.1     -2.3    1.5     -1.9    1.8     0.1
X:                   -20.00  -1.0    -1.0    -1.0    -1.0    -1.0    -1.0    -1.0    -1.0
Y:                   0.00    0.2     -0.5    -0.1    -1.1    -1.3    0.0     0.4     1.8



Artificial Neural Network (ANN) based methods

Brusic et al., 1998: To handle the non-linearity and complexity of MHC data, Brusic et al. (1998) used an ANN to discriminate between binders and non-binders. They used the neural network program PlaNet 5.6 to train the ANN on their data set. Since neither the data nor the matrices were available, we implemented and tested the algorithm using the jack-knife method of cross-validation (Afifi and Clark, 1990), as described by Brusic et al., 1998. Training was performed for 300 cycles at a learning rate of 0.2 and a momentum of 0.9, with 3 hidden layers.
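
The jack-knife (leave-one-out) procedure mentioned above can be sketched independently of the network itself. In the example below, `jackknife_accuracy` holds out one peptide at a time, trains on the rest, and tests on the held-out peptide. The `train_p1` and `predict_p1` functions are a deliberately trivial stand-in classifier, not a neural network, and the peptides and labels are hypothetical.

```python
def jackknife_accuracy(samples, labels, train, predict):
    """Leave-one-out: train on all but one sample, test on the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Stand-in "model" (NOT an ANN): remembers which residues occur at P1 of the
# training binders and predicts binder (1) when the test peptide has one.
def train_p1(peptides, labels):
    return {pep[0] for pep, label in zip(peptides, labels) if label == 1}


def predict_p1(model, peptide):
    return 1 if peptide[0] in model else 0


# Hypothetical data: 1 = binder, 0 = non-binder.
peptides = ["YVLSATAAL", "YAAAAAAAA", "VLSATAALT",
            "VAAAAAAAA", "AAAAAAAAA", "SAAAAAAAA"]
labels = [1, 1, 1, 1, 0, 0]
print(jackknife_accuracy(peptides, labels, train_p1, predict_p1))
```

Any classifier with the same `train`/`predict` shape, including an ANN trained with the settings quoted above, could be passed in place of the stand-in.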


Developed by:
Bioinformatics Centre

Institute of Microbial Technology
INDIA