Coreceptor Usage Prediction Server

HIVcoPred is a Support Vector Machine (SVM) and BLAST based method for predicting the coreceptor used by HIV-1 from its third hypervariable loop (V3). For each submitted V3 loop, the split amino acid composition (SAAC) is generated and an SVM score is calculated. A BLAST E-value is also generated for each submitted V3 loop and, depending on that E-value, the final SVM score is produced and used for prediction.
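The page does not state how the BLAST E-value changes the SVM score, so the following Python sketch is only an illustration of one possible rule, not the server's actual method. The function name `hybrid_score`, the E-value cutoff, the size of the shift, and the convention that higher scores indicate R5 are all assumptions made for this example.

```python
def hybrid_score(svm_score, blast_evalue, blast_label,
                 evalue_cutoff=1e-3, shift=1.0):
    """Combine an SVM score with the best BLAST hit (hypothetical rule).

    svm_score    : score from the SAAC-based SVM model
    blast_evalue : E-value of the best BLAST hit, or None if no hit
    blast_label  : tropism label of that hit, "R5" or "X4"
    evalue_cutoff, shift : assumed values, not taken from the server
    """
    # No hit, or a hit that is not significant: keep the SVM score as it is.
    if blast_evalue is None or blast_evalue > evalue_cutoff:
        return svm_score
    # Significant hit: push the score toward the class of the hit
    # (assumption: higher scores mean R5, lower scores mean X4).
    if blast_label == "R5":
        return svm_score + shift
    return svm_score - shift


if __name__ == "__main__":
    print(hybrid_score(0.2, None, None))     # no BLAST hit, score unchanged
    print(hybrid_score(0.2, 1e-10, "X4"))    # significant X4 hit lowers the score
```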

Datasets
We used the following datasets in the development of HIVcoPRED:

CCR5 Sequences (N=1799)
CXCR4 Sequences (N=598)

The X4 dataset comprised 246 CXCR4-tropic and 352 R5X4-tropic V3 sequences, making a total of 598 sequences. The individual datasets are given below:

CXCR4 Sequences (N=246)
R5X4 Sequences (N=352)

where 'N' is the number of V3 sequences in each dataset.

To submit a protein sequence for prediction, follow these steps:

Sequence Name:
In this field, the user can enter a name for the query sequence. This is optional; leaving it blank will not affect the prediction.

E-mail Address:
The user can provide an e-mail address to receive the prediction results by e-mail. This is particularly useful for predictions based on the "Hybrid approach (SAAC+BLAST)", because running BLAST for each submitted query sequence and compiling the final results takes time. The results (a link to the final prediction page) will be mailed to the user after all submitted sequences have been processed. This field is optional; leaving it blank will not affect the prediction output.

Input sequence:
There are two ways to submit V3 sequences. The user can either paste the sequences directly into the text box or upload a FASTA file using the 'BROWSE' option. Before uploading, please ensure that the sequences are in FASTA format and use the single-letter amino acid code.
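As a convenience, the input can be checked locally before submission. The Python sketch below parses FASTA text and reports any characters outside the 20 standard single-letter amino acid codes. The function names `parse_fasta` and `invalid_residues` are hypothetical, and the server may apply different or additional checks.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            if name is None:
                raise ValueError("sequence data found before a '>' header line")
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def invalid_residues(seq):
    """Return the sorted characters in seq that are not standard residues."""
    return sorted(set(seq) - VALID_RESIDUES)


if __name__ == "__main__":
    example = ">v3_example\nCTRPNNNTRKSIHI\nGPGRAFYTTGEIIGDIRQAHC\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq), invalid_residues(seq))
```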

Prediction Options:
We developed SVM-based prediction models using various input features, e.g. amino acid composition (AAC), dipeptide composition (DPC), split amino acid composition (SAAC), the hybrid approach (SAAC+BLAST), binary, etc. In the web server, we implemented only the two best models: SAAC and Hybrid (SAAC+BLAST). The SAAC model is fast, whereas the hybrid model is slower but also integrates the BLAST analysis into the final result. If there are more than 10 input sequences, the SAAC model should preferably be used.
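To illustrate the composition features named above, here is a minimal Python sketch of AAC and SAAC. It assumes that composition is expressed as a percentage and that SAAC splits the sequence into three nearly equal parts; the page does not state how the server splits the V3 loop, so the `parts` parameter and both function names are illustrative only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed order of the 20 residues


def aac(seq):
    """Amino acid composition: percentage of each residue (20 values)."""
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * seq.count(a) / n for a in AMINO_ACIDS]


def saac(seq, parts=3):
    """Split amino acid composition: AAC of each part, concatenated.

    The number of parts (3) is an assumption for this sketch.
    """
    size, extra = divmod(len(seq), parts)
    features, start = [], 0
    for i in range(parts):
        # The first `extra` parts get one additional residue.
        end = start + size + (1 if i < extra else 0)
        features.extend(aac(seq[start:end]))
        start = end
    return features


if __name__ == "__main__":
    v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
    print(len(aac(v3)), len(saac(v3)))  # 20 features vs. 20 per part
```

Under these assumptions, SAAC yields 20 values per part, so it retains some positional information that plain AAC discards.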

SVM Threshold:
Selection of the threshold is a very important aspect of SVM-based prediction models. The probability of a correct prediction depends directly on the threshold selected by the user. The SVM will classify the V3 sequence as R5-tropic or X4-tropic, depending on this threshold. The user can select any threshold from -1 to 1; the default threshold is 0.3. If the user wants lower sensitivity but higher specificity (i.e. more CXCR4 sequences predicted correctly), a higher threshold should be specified; if more CCR5 predictions are desired, a lower threshold should be selected. The expected outcome therefore depends on the trade-off between sensitivity and specificity.
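The effect of the threshold can be sketched in a few lines of Python. This example assumes that a score at or above the threshold is called R5 and a score below it is called X4, which is inferred from the description above (a lower threshold gives more CCR5 predictions) and is not confirmed by the server documentation. The function name `classify` is hypothetical.

```python
def classify(svm_score, threshold=0.3):
    """Assign a tropism label from an SVM score (assumed decision rule).

    threshold must lie in the range -1 to 1; 0.3 is the server default.
    """
    if not -1.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between -1 and 1")
    # Assumption: scores at or above the threshold are called R5.
    return "R5" if svm_score >= threshold else "X4"


if __name__ == "__main__":
    score = 0.1
    for t in (-0.5, 0.0, 0.3, 0.8):
        print(t, classify(score, threshold=t))
```

With a fixed score, raising the threshold moves borderline sequences from R5 to X4, which is the sensitivity and specificity trade-off described above.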

Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India