MHC & Prediction Algorithms


These definitions correspond directly to the terminology used in the server; understanding them will help in analyzing the results.

Predicted Binder: Any peptide frame that scores higher than the selected threshold.

Threshold: A preselected numerical value used to differentiate between binders and non-binders. Any peptide frame scoring higher than this value is predicted as a binder; any frame scoring lower is predicted as a non-binder.
The threshold is defined as the 'percentage of best scoring natural peptides'. For example, a threshold of 1% predicts those peptides in a given protein sequence that belong to the 1% best scoring natural peptides. The threshold correlates with the peptide score (Sturniolo et al., 1999) and therefore with the HLA-ligand interaction. More importantly, the threshold is an indicator of the likelihood that a predicted peptide is capable of binding to a given HLA molecule. The lower the threshold (= high stringency), the lower the false positive rate and the higher the false negative rate. In contrast, the higher the threshold (= low stringency), the higher the false positive rate and the lower the false negative rate. In short, for the same input protein sequence, a threshold setting of 1% will predict fewer peptide sequences, and for fewer HLA-II alleles, than a threshold of 2% or higher; however, it also gives a higher likelihood of positive downstream experimental results. Normally, at least for a first round of screening, threshold values higher than 3% are not desirable, since false positives can inflate the predicted repertoire to a size that is impractical for later experimental testing.
The threshold values are derived as follows: (i) The Peptide Frame Score of every valid Peptide Frame in a representative database of natural protein sequences is calculated for a given virtual matrix. (ii) The distribution of Peptide Frame Score values within this database is checked; a database of sufficient size should generate a Gaussian-like distribution. (iii) The Peptide Frames are sorted by score, and the Peptide Frame Score values corresponding to the 1%, 2%, 3%, etc. best scoring peptides are determined. Thus, the threshold correlates with the Peptide Frame Score value and is therefore an indicator of the likelihood that the predicted peptides are capable of binding to a given HLA molecule.
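
Steps (i) to (iii) can be sketched in a few lines of Python. This is a minimal illustration, not the server's implementation: the function name `score_cutoff` and the toy list of background scores are assumptions, and a real run would use the scores of all valid peptide frames from a large database of natural proteins.

```python
import math

def score_cutoff(background_scores, percent):
    """Return the lowest score among the top `percent`% of background scores.

    background_scores: peptide frame scores computed over a reference
    database of natural protein sequences (hypothetical input here).
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ranked = sorted(background_scores, reverse=True)  # best scores first
    # Number of frames that make up the top `percent`%, at least one.
    n_top = max(1, math.floor(len(ranked) * percent / 100))
    return ranked[n_top - 1]

# Toy background: 200 evenly spaced scores from -5.0 to 4.95 (illustrative only).
background = [i / 20 - 5.0 for i in range(200)]

cutoff_1 = score_cutoff(background, 1)  # stringent threshold
cutoff_3 = score_cutoff(background, 3)  # more permissive threshold
print(cutoff_1 > cutoff_3)  # a lower percent threshold gives a higher score cutoff
```

The sketch makes the stringency relation concrete: a 1% threshold maps to a higher score cutoff than a 3% threshold, so fewer frames pass it.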

Peptide Frame: A nine amino acid long peptide generated from the antigen. Every window of nine consecutive residues in the antigen sequence is one peptide frame.

e.g. an antigen sequence


will give


as peptide frames.
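
Frame generation amounts to sliding a nine-residue window along the antigen, so a sequence of length N yields N - 8 overlapping frames. A minimal sketch follows; the function name `peptide_frames` and the 12-residue sequence are illustrative assumptions, not taken from the server.

```python
def peptide_frames(antigen, length=9):
    """Return all overlapping peptide frames with their 1-based start positions."""
    seq = antigen.upper()
    return [(i + 1, seq[i:i + length]) for i in range(len(seq) - length + 1)]

frames = peptide_frames("MFSDFCVGHYKL")  # hypothetical 12-residue sequence
for start, frame in frames:
    print(start, frame)
# 1 MFSDFCVGH
# 2 FSDFCVGHY
# 3 SDFCVGHYK
# 4 DFCVGHYKL
```

A sequence shorter than nine residues produces no frames.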

Peptide Frame Score: A numerical value obtained by summing, for each residue of the peptide frame, the position-specific and side-chain-specific value from the matrix.


For example, the score of the peptide frame "FSDFCVGHY" is calculated from the following table:

Amino acid/Position

Peptide Frame Score: 0.00 + 0.48 - 0.9 + 1.2 + 0.00 - 0.2 - 0.2 + 0.3 + 1.1 = 1.78
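
The same summation can be written as a short Python sketch. It assumes that the nine terms above correspond, in order, to positions 1 to 9 of the frame. The names `PARTIAL_MATRIX` and `frame_score` are hypothetical, and the matrix holds only the nine entries needed for this frame, whereas a real matrix has a value for every amino acid at every position.

```python
# Position-specific values for the residues of "FSDFCVGHY", copied from the
# worked example in the text. Only the entries needed for this frame are shown.
PARTIAL_MATRIX = [
    {"F": 0.00},  # position 1 (P1)
    {"S": 0.48},  # position 2
    {"D": -0.9},  # position 3
    {"F": 1.2},   # position 4
    {"C": 0.00},  # position 5
    {"V": -0.2},  # position 6
    {"G": -0.2},  # position 7
    {"H": 0.3},   # position 8
    {"Y": 1.1},   # position 9
]

def frame_score(frame, matrix):
    """Sum the matrix value of each residue at its position in the frame."""
    if len(frame) != len(matrix):
        raise ValueError("frame length must match the number of matrix positions")
    return sum(matrix[pos][residue] for pos, residue in enumerate(frame))

score = round(frame_score("FSDFCVGHY", PARTIAL_MATRIX), 2)
print(score)  # 1.78
```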

Virtual Matrix: Virtual matrices, like quantitative matrices, provide a detailed model in which the contribution to binding of each amino acid at each pocket/position of the HLA binding cleft is quantified. However, while quantitative matrices are determined individually for each HLA allele, virtual matrices are formed by assigning pocket-specific quantitative binding values derived from one HLA allele to other alleles via HLA sequence comparison, and combining them. Their advantage over quantitative matrices is that they address the problem of HLA polymorphism and enable the systematic prediction of peptide ligands for a broad range of HLA binding specificities (promiscuous peptides). The prediction of promiscuous binding ligands is considered a prerequisite for most subunit vaccine design strategies. Virtual matrix based prediction models have been validated for HLA-II in several retrospective studies. Furthermore, they have been successfully applied to predict T cell epitopes in the context of oncology, allergy and autoimmune diseases.

Pocket Profile: The binding cleft of MHC Class II proteins contains nine so-called pockets (minute indentations at the bottom of the cleft), each of which holds a side chain of the binding ligand (peptide). Different side chains may have different effects on the binding affinity of the ligand: some are positive (i.e. they increase binding affinity), some are negative (i.e. they decrease binding affinity), and others are neutral. A quantitative list of the effects of the different amino acid side chains in a particular pocket is known as a pocket profile.
The advantage of a pocket profile is that, once it is known for a particular pocket, it can be applied without modification to other alleles that have the same pocket, i.e. the pocket profile is independent of HLA polymorphism.
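
The reuse of pocket profiles across alleles can be illustrated with a small lookup structure. Everything in this sketch is hypothetical: the pocket variant identifiers, the allele names and the binding values are invented for illustration, and only two of the nine positions are shown.

```python
# Hypothetical pocket profiles: pocket variant id -> {amino acid: binding value}.
# All values are invented for illustration only.
POCKET_PROFILES = {
    "P4_a": {"D": -0.9, "L": 0.8},
    "P4_b": {"D": 1.3, "L": -0.4},
    "P6_a": {"S": 0.5, "T": 0.2},
}

# Hypothetical alleles, each described by the pocket variant found at a position.
ALLELE_POCKETS = {
    "allele_X": {4: "P4_a", 6: "P6_a"},
    "allele_Y": {4: "P4_b", 6: "P6_a"},  # shares pocket 6 with allele_X
}

def virtual_matrix(allele):
    """Assemble {position: profile} by looking up each pocket variant of the allele."""
    return {pos: POCKET_PROFILES[variant]
            for pos, variant in ALLELE_POCKETS[allele].items()}

vm_x = virtual_matrix("allele_X")
vm_y = virtual_matrix("allele_Y")
print(vm_x[6] is vm_y[6])          # True: the same profile serves both alleles
print(vm_x[4]["D"], vm_y[4]["D"])  # -0.9 1.3: pocket 4 differs between them
```

The point of the design is that a profile is measured once per pocket variant and then shared by every allele that carries that variant.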

Promiscuous Binder: An antigen, or a region of an antigen, that can bind to several HLA alleles. These regions are the most suitable for vaccine development because a single epitope can generate an immune response in a large part of the population.
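
One simple way to locate promiscuous binders from per-allele predictions is to count, for each peptide frame, the number of alleles that predict it as a binder. The sketch below assumes the predictions are available as sets of frame start positions; the allele names, positions and the function name `promiscuous_frames` are hypothetical.

```python
from collections import Counter

def promiscuous_frames(binders_by_allele, min_alleles):
    """Return {frame start: allele count} for frames predicted by >= min_alleles alleles."""
    counts = Counter(
        start
        for starts in binders_by_allele.values()
        for start in set(starts)  # count each frame once per allele
    )
    return {start: n for start, n in sorted(counts.items()) if n >= min_alleles}

# Hypothetical predictions: allele -> start positions of predicted binders.
predicted = {
    "allele_A": {3, 17, 42},
    "allele_B": {3, 42, 58},
    "allele_C": {3, 25},
}
shared = promiscuous_frames(predicted, min_alleles=2)
print(shared)  # {3: 3, 42: 2}
```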

Name of Antigen: This field is optional but helps in keeping a record of prediction results.

Displayed top scorer: The value in this field is the number of highest scoring peptide frames in the query antigen to be displayed. The peptide frame score for each nonamer subsequence is calculated using quantitative matrices. The higher the score of a peptide frame, the greater the probability of its binding to a given MHC molecule. The default value is 10% of the total number of nonameric frames in the query antigen.
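
Selecting the top scorers is a sort followed by a cut. The sketch below is an illustration under stated assumptions: the function name `top_scorers` is hypothetical, the scores are invented, and rounding the 10% default down (with a minimum of one frame) is an assumption, since the rounding rule is not specified here.

```python
def top_scorers(scored_frames, percent=10):
    """Return the best scoring frames, highest score first.

    scored_frames: list of (start_position, frame, score) tuples.
    percent: share of frames to display; rounding down is an assumption.
    """
    n_show = max(1, len(scored_frames) * percent // 100)
    ranked = sorted(scored_frames, key=lambda item: item[2], reverse=True)
    return ranked[:n_show]

# Hypothetical scores for ten frames (start position, frame label, score).
scores = [0.3, 1.8, -0.5, 2.4, 0.0, 1.1, -1.2, 0.9, 2.9, 0.4]
scored = [(i + 1, f"frame_{i + 1}", s) for i, s in enumerate(scores)]

default_view = top_scorers(scored)            # 10% of 10 frames: 1 frame
wider_view = top_scorers(scored, percent=30)  # 3 frames
print([start for start, _, _ in wider_view])  # [9, 4, 2]
```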

Allele: The server has matrices for 51 HLA-DR alleles, which cover more than 90% of the MHC Class II molecules expressed on antigen presenting cells. The user can select a single allele or multiple alleles. The multiple allele option is helpful in locating promiscuous binding regions.

Result Display Format: The server offers several result display formats to ease the identification of promiscuous binders. Peptide ligands are predicted independently for each HLA allele, which allows easy and fast location of promiscuous or allele-specific binders.

HTML view I: Predicted binders are displayed as regions underlined with "*". This display is handy for locating overlapping binding regions and judging the extent of their overlap.
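
A text rendering in the spirit of this view can be sketched as follows. It is only an approximation of the idea, not the server's actual output format; the function name `star_track`, the sequence and the binder positions are hypothetical.

```python
def star_track(sequence, binder_starts, length=9):
    """Return a string with '*' under every residue covered by a predicted binder."""
    marks = [" "] * len(sequence)
    for start in binder_starts:  # 1-based start positions of predicted frames
        for i in range(start - 1, min(start - 1 + length, len(sequence))):
            marks[i] = "*"
    return "".join(marks)

seq = "MFSDFCVGHYKLAAGT"  # hypothetical 16-residue sequence
track = star_track(seq, [2, 4])  # two overlapping predicted frames
print(seq)
print(track)
```

Overlapping frames merge into one continuous run of asterisks, which is what makes the extent of overlap easy to read.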

HTML view II: Predicted binders are displayed as blue regions, with the P1 anchor (the starting residue of each predicted binding frame) shown in red. This display is useful in locating promiscuous binding regions.

Graphical View: The graphical view plots the score distribution profile, the threshold profile and the best scoring subsequence profile (the last only during subsequence analysis). Each peak in the score distribution profile corresponds to the binding affinity of the peptide frame starting at the amino acid under the peak, so the profile represents the binding strength of each peptide frame. Since binding strength has been shown to correlate with immunogenicity, the score distribution profile provides the magnitude of binding affinity, which is helpful in selecting promiscuous binders. The threshold profile is plotted to the right of the score distribution profile. Rather than showing these profiles separately for each allele (as done by TEPITOPE), the server plots them in the same window. The threshold profile, plotted on a meaningful scale of percent threshold vs. number of peptide frames, is helpful in selecting an appropriate threshold for locating promiscuous binding regions. The third profile, the best scoring subsequence profile, is plotted only during subsequence analysis. It is similar to the threshold profile, but only the highest scoring peptide frame is used for the calculations. This plot is the same as the one used by TEPITOPE and is helpful in selecting a threshold for locating promiscuous binding regions.

In addition, the server offers a further result option, the Sorted Top Scorer, in which the peptide frames are sorted by score and the user-selected number of highest scoring peptide frames is displayed together with their location, score and other information. This display option can be selected from the bottom of the prediction results. It is a format commonly used by other servers of this category.

Input sequence format: Both formatted and non-formatted sequences are accepted as input. For formatted sequences the server uses the ReadSeq software, which can read most commonly used standard sequence formats, including FASTA, PIR, EMBL and GENBANK. The user has to specify whether the sequence is in one of these formats or is non-formatted raw/plain text (single letter amino acid codes only).

Paste your sequence below: The user has to paste the antigenic protein sequence in this window. As the server uses the ReadSeq program to read the input sequence, it accepts, in addition to plain single letter amino acid codes, most commonly used standard sequence formats, e.g. FASTA, EMBL and PIR.

Or submit from file : The user can also upload the antigen sequence directly from a file.

NOTE: The server accepts input from only one of these two options at a time, not both.

Obligatory P1 anchor residues: M13 phage display library data and crystallographic evidence indicate that an anchor residue at the P1 position is obligatory for high affinity binding. The amino acids M (Met), F (Phe), I (Ile), L (Leu), V (Val), W (Trp) and Y (Tyr) are found at P1 in a significant proportion of high affinity binders; therefore, following Sturniolo et al., 1999, only these amino acids are allowed at the P1 position.
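
The P1 restriction is straightforward to express as a filter over peptide frames. A minimal sketch; the names `P1_ANCHORS` and `has_valid_p1` and the example frames are illustrative assumptions, while the set of allowed residues is the one listed above.

```python
# Residues allowed at the P1 position, as listed in the text.
P1_ANCHORS = set("MFILVWY")

def has_valid_p1(frame):
    """Return True if the first residue of the frame is an allowed P1 anchor."""
    return bool(frame) and frame[0].upper() in P1_ANCHORS

# Hypothetical overlapping frames from one antigen.
candidate_frames = ["MFSDFCVGH", "FSDFCVGHY", "SDFCVGHYK", "DFCVGHYKL"]
allowed = [f for f in candidate_frames if has_valid_p1(f)]
print(allowed)  # ['MFSDFCVGH', 'FSDFCVGHY']
```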

