LGEpred
Correlation Analysis and Prediction of Gene Expression from Amino Acid Sequence of Proteins
This server allows users to analyze their gene expression (microarray) data. It calculates the correlation coefficient between the amino acid composition of proteins and the expression level of the corresponding genes, which helps users understand which residues are preferred, and which are not, in an organism under a given condition. The server can also learn from known expression data and predict the expression level of other genes of the same organism under that condition from their protein sequences. The method uses a support vector machine (SVM) for learning and prediction from the dipeptide composition of proteins.
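To make the term "dipeptide composition" concrete, the following is a minimal Python sketch of how a protein sequence could be turned into a 400-element feature vector (one value per ordered pair of the 20 standard residues). The function name `dipeptide_composition`, the percentage normalization, and the rule of skipping pairs that contain non-standard residues are assumptions made for illustration; the server's actual implementation is not described here and may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list with the percentage of each overlapping residue pair.

    Assumption: values are percentages of all counted pairs; pairs containing
    a non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


# Invented example sequence, used only to show the call.
features = dipeptide_composition("MKVLAAGIVK")
```

A vector of this kind, computed for every protein, is the sort of fixed-length input that an SVM can be trained on.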
Frequently Asked Questions (FAQ)
1. How can one predict the expression of a gene from its protein sequence? The expression of a gene depends on a number of environmental conditions and changes over time, whereas protein sequences are static.
Answer: We agree that the expression of a gene depends on the condition of the organism and on a number of other factors. This is why we are not developing a universal method to predict the expression of genes from their protein sequences. We and experts in the field agree that proteins of the same type (similar at the sequence level) in the same organism and under the same condition will have similar expression. Thus, if expression (microarray) data of genes are available for a given condition, it is possible to learn the relationship between expression and protein sequence. Once this relationship is known, it can be used to predict the expression of other genes of that organism under that condition from the amino acid sequences of their proteins. The LGEpred server allows users to derive knowledge/rules (an SVM model) from their own expression data, which they can then use to predict the expression of unknown genes.
2. How can I use the LGEpred server?
Answer: We have provided example data on each submission form to help you understand the format and type of data required by the LGEpred server. To run the server on an example file, download the data from the example section and then submit/upload it. To analyze your own expression data using LGEpred, you need the name of each ORF/gene and the corresponding amino acid sequence in FASTA format. For prediction, your sequence data must also be in FASTA format.
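For readers unfamiliar with FASTA, the sketch below shows one way to read FASTA-formatted text into a mapping from ORF/gene name to amino acid sequence. The function `parse_fasta`, the choice of taking the first word of the header line as the name, and the records `ORF1` and `ORF2` are all hypothetical; check the example files on the submission forms for the exact layout the server expects.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name -> sequence string.

    Assumption: the record name is the first whitespace-delimited token
    after '>' on the header line.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect and join them later.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Invented records, used only to illustrate the format.
example = """>ORF1 hypothetical protein
MKVLAAGIVK
LLA
>ORF2
MSTNPKPQRK
"""
sequences = parse_fasta(example)
```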
3. Why does the LGEpred server require the amino acid sequences of proteins?
Answer: This server is not meant for general analysis of microarray data, such as normalization or clustering. It is a unique server that allows users to exploit the relationship between the amino acid sequences of proteins and the expression levels of their genes.
4. What types of data analysis tools are available on this server?
Answer: This server allows users to perform various types of analysis on microarray data. These may help users understand the relationship between the expression of genes and the amino acid composition of their proteins. A brief description of the options follows.
· Correlation coefficient: This option computes the correlation between amino acid composition and gene expression from microarray data. Users can generate correlation tables for their own microarray data, like Table 2 and Table 6.
· Bin-wise analysis: This option computes the average expression of genes whose proteins have an amino acid composition in a specified range. In effect, it allows comprehensive analysis of binned data. Users can generate average expression tables like Tables 1, 3, 4 and 5.
· Scatter plots of gene expression: This option generates scatter plots of gene expression against amino acid composition or protein length. It lets users visualize the relationship between gene expression and amino acid sequence in their own expression data. An example figure created using the LGEpred server is shown in Figure 1. The user can choose to plot the expression level on either the horizontal or the vertical axis.
· Specific plots of gene expression: A specific plot not only shows a scatter plot of expression level against amino acid composition but also draws the average expression of genes whose amino acid composition falls in a specified range (see Figure 2). Using these graphs, users can easily detect the relationship between expression level and composition in various ranges in their own data.
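The first two options above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's code: the Pearson coefficient is assumed as the correlation measure, composition is expressed as a percentage, bins are of fixed width, and the helper names (`composition`, `pearson`, `binwise_average`) as well as the toy sequences and expression values are invented.

```python
from math import sqrt


def composition(sequence, residue):
    """Percentage of `residue` in `sequence` (sequence must be non-empty)."""
    return 100.0 * sequence.upper().count(residue) / len(sequence)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant data; 0.0 is a convention chosen here
    return cov / sqrt(var_x * var_y)


def binwise_average(values, expression, bin_width):
    """Average expression per composition bin, keyed by the bin's lower edge."""
    bins = {}
    for value, level in zip(values, expression):
        low = int(value // bin_width) * bin_width
        bins.setdefault(low, []).append(level)
    return {low: sum(levels) / len(levels) for low, levels in sorted(bins.items())}


# Invented toy data: four genes with made-up sequences and expression levels.
proteins = {"g1": "AAAAKKLLGG", "g2": "AAKKKKLLGG", "g3": "AKKKKKKLGG", "g4": "GGGGLLLLKA"}
expression = {"g1": 120.0, "g2": 300.0, "g3": 540.0, "g4": 60.0}

genes = sorted(proteins)
lys = [composition(proteins[g], "K") for g in genes]   # lysine percentage per gene
expr = [expression[g] for g in genes]
r = pearson(lys, expr)                                  # correlation for one residue
table = binwise_average(lys, expr, bin_width=25)        # average expression per bin
```

Repeating the `pearson` call for each of the 20 residues would give one row per residue, which is the general shape of a correlation table; the bin-wise dictionary is likewise the raw material for an average expression table or for the averaged line in a specific plot.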
5. Is this server for oligonucleotide array data or cDNA microarray data?
Answer: We designed it for oligonucleotide array data, where one can study the relationship between protein sequence and the absolute gene expression obtained from an oligonucleotide array. It may also be used to analyze cDNA microarray data, in which case the user provides the expression change instead of the expression level. The server may also be useful for detecting which residues are preferred under which conditions, and for exploring why the expression of particular genes changes drastically when conditions change.
6. Can I predict the expression of a gene if I know the amino acid sequence of its protein?
Answer: The answer is no, and partially yes. A) No: you cannot predict the expression of a gene from its sequence alone, because gene expression is not a static quantity. It depends on time, condition, and a number of other factors. B) Partially yes: you can predict it if you know the expression of other genes and their protein sequences. In that case, LGEpred learns from the known expression data and predicts the expression of other genes under the same condition.
7. What types of prediction/evaluation facilities does LGEpred have?
Answer: One of the major features of LGEpred is that it allows users to develop an SVM-based prediction method on their own microarray data. There are three major options for predicting gene or ORF expression.
· Training and prediction: This routine builds an SVM model from the user's microarray data, using the expression levels of genes and the sequences of their proteins. It then uses this model to predict the expression of unknown genes of the same organism under the same condition from their protein sequences.
· Evaluation and prediction: This option allows users to evaluate the SVM method developed on their own microarray data using the LGEpred server. Evaluation is very important in prediction because it gives users confidence in the method of their choice.
· Prediction from model: This option allows users to predict the expression of genes from their protein sequences using an SVM model built with the above options of the LGEpred server.
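The text above does not say how the evaluation is carried out. One common way to evaluate a model of this kind is k-fold cross-validation, sketched below as an assumption rather than as a description of LGEpred. The harness takes any `fit`/`predict` pair; the mean-value learner used in the demonstration is only a placeholder standing in for an SVM, and all names and numbers are invented.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous, nearly equal folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(features, targets, fit, predict, k=5):
    """Return one held-out prediction per sample using k-fold cross-validation."""
    predictions = [None] * len(targets)
    for test_idx in k_fold_indices(len(targets), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(targets)) if i not in held_out]
        # Train only on samples outside the current fold.
        model = fit([features[i] for i in train_idx], [targets[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = predict(model, features[i])
    return predictions


# Placeholder learner (NOT an SVM): always predicts the mean training expression.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]  # invented feature vectors
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]        # invented expression levels
held_out_predictions = cross_validate(X, y, fit_mean, predict_mean, k=3)
```

In practice the samples would usually be shuffled before splitting, and the held-out predictions would be compared with the measured expression levels, for example through a correlation coefficient, to judge how far the model can be trusted.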