After FORESST has searched its database to match a query sequence of secondary structure to a topology family, the user receives an ID number that is used to retrieve the results of the search. Once the ID is submitted back to the FORESST server, a Web page is displayed that lets the user choose among three types of score in order to determine which family fits the structure of the query sequence. The three scores are called rank, z-score, and J-score.
If the query sequence consists of a real secondary structure determined from X-ray or NMR data, then the z-scores will be five to six standard deviations above the norm for the correct family. But one seldom has an experimentally determined structure. Instead, one has to make do with a predicted structure, and even the best secondary structure predictors have an accuracy of only 70-75%, which means that even the best predictions can be expected to assign an incorrect structure to 25-30% of the residues. That fact weakens the power of both rank and z-score to determine a topology family unequivocally from a predicted structure. Also, J-scores tend to cluster around 60-70% unless the prediction is very good. And no procedure can work in the face of a bad prediction. Experience has shown that a significant z-score starts at about 3. The best procedure seems to be to take several good-quality predictions for the query sequence and examine the families with the highest z-scores for each prediction. The scores for each prediction should be high for the same families, or at least for families of very similar structure.
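The comparison across several predictions can be sketched in a few lines of Python. This is only an illustration of the procedure described above, not part of FORESST itself: the function name consensus_families, the top_n cutoff, the predictor and family names, and all z-score values are hypothetical, and the results are assumed to have been copied by hand from the FORESST results page. Only the threshold of 3 comes from the text.

```python
from collections import defaultdict

# "A significant z-score starts at about 3" (from the text above).
Z_THRESHOLD = 3.0


def consensus_families(results, threshold=Z_THRESHOLD, top_n=5):
    """Return families that score significantly for every prediction.

    results maps a predictor name to a dict of {family: z-score}.
    Only the top_n families of each prediction are examined (an
    assumed cutoff, not a FORESST setting). The result is a list of
    (family, lowest z-score across predictions), best first.
    """
    support = defaultdict(list)
    for scores in results.values():
        # Examine the highest-scoring families for this prediction.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        for family, z in ranked[:top_n]:
            if z >= threshold:
                support[family].append(z)

    n_predictions = len(results)
    agreed = [
        (family, min(zs))
        for family, zs in support.items()
        if len(zs) == n_predictions  # significant in every prediction
    ]
    return sorted(agreed, key=lambda item: item[1], reverse=True)


# Hypothetical z-scores for three secondary structure predictions
# of the same query sequence (made-up numbers for illustration).
results = {
    "predictor_A": {"family_1": 4.2, "family_2": 3.1, "family_3": 1.0},
    "predictor_B": {"family_1": 3.8, "family_2": 2.4, "family_3": 0.5},
    "predictor_C": {"family_1": 5.0, "family_2": 2.9, "family_3": 3.3},
}

print(consensus_families(results))  # [('family_1', 3.8)]
```

In this made-up example only family_1 reaches a z-score of 3 or more in all three predictions, so it is the only candidate reported. The sketch requires an exact match of family names; it does not recognize families of very similar structure, which would still have to be judged by eye.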
According to the 1999 paper of V. Geetha et al. (see the bibliography), FORESST does better than BLASTP and certain other procedures in identifying remote homologies when the pairwise sequence identity is under 15%. Experience tells us, however, that several methods should be tried together in order to get a clue to a protein's function.