TopLib - User Manual

TopLib computes the library size required to attain a specified probability of two types of events: The first is "discovering at least one of the top k variants" (i.e., the library includes either the best variant among all distinct variants that could possibly be generated in the experiment, or the second-best, ... or the kth best). The second is the event "the library contains all possible distinct variants that could be generated;" this event is termed "full coverage." The library size computed is a function of the probability specified by the user and the randomization scheme.

TopLib can also run the inverse calculation and compute the probability of either of the above events for a given library size.

Using the Web Server

First page

I want to - Choose the first option (and specify a probability - typically a number close to 1, say 0.95 or 0.99) to compute the required library size; choose the second option (and specify the library size) to compute the corresponding probability.

Probability is of - Either an event of the type "discovering at least one of the top k variants" (in which case k must be specified), or the event "full coverage." The probability of discovering the best variant (i.e., the first option with k = 1) equals the expected proportion of all possible variants that are represented in the library (a quantity sometimes called "completeness," e.g. by Firth and Patrick, Nucleic Acids Res., 2008).
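The completeness quantity mentioned above can be illustrated with a short Python sketch. It is not TopLib's code: it assumes that library members are drawn independently and that variant i is produced in a single draw with probability p_i. The link to the k = 1 probability additionally assumes that every variant is a priori equally likely to be the best one. The function name and the toy numbers are illustrative only.

```python
def expected_completeness(variant_probs, library_size):
    """Expected fraction of distinct variants present in the library.

    Assumption: draws are independent, and variant i appears in a single
    draw with probability variant_probs[i]. It is then absent from the
    whole library with probability (1 - p_i) ** library_size.
    """
    present = [1.0 - (1.0 - p) ** library_size for p in variant_probs]
    return sum(present) / len(present)


# Toy example (hypothetical numbers): one randomized position with
# 20 equiprobable amino acids and a library of 60 clones.
uniform = [1.0 / 20] * 20
completeness = expected_completeness(uniform, 60)
print(round(completeness, 3))
```

With unequal variant probabilities the same function applies unchanged; rare variants simply contribute smaller terms to the average.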

Number of positions randomized - The number of residue positions that are randomized in the experiment. TopLib supports up to 5 randomized positions.

Randomize by - Saturation mutagenesis usually originates at the DNA level, via degenerate primers containing a mixture of sequences at the chosen codons. If this is the case, choose "specifying codon combinations." More sophisticated randomization schemes such as MAX (Hughes et al., J. Mol. Biol., 2003) allow probabilities to be assigned directly to each of the 20 amino acids (or to some predetermined subset thereof), without encoding stop codons. If this is the case, choose "specifying directly probabilities for amino acids."

Randomization is - When randomizing several positions in the same way (say, NNK randomization in all positions, or a 1/20 probability for each of the 20 amino acids in all positions), choose "the same in all positions," so that the randomization scheme needs to be specified (on the next page) only once. Otherwise, choose "different across positions."

Yield - In some experimental settings there is a small per-variant probability that the randomization fails (say, because the plasmid did not receive the insert with the randomized DNA). The term yield is used to denote the per-variant probability that the randomization succeeds.
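One way to see the effect of yield is the following Python sketch. It is a simplified model rather than a description of TopLib's internals: it assumes that clones are independent, that a clone is successfully randomized with probability equal to the yield, and that a clone whose randomization failed never carries the variant of interest. The function and parameter names are hypothetical.

```python
def prob_variant_present(p, library_size, yield_=1.0):
    """Probability that a given variant appears at least once in the library.

    p      : probability that a successfully randomized clone carries the variant
    yield_ : per-clone probability that the randomization succeeds
    Assumption: independent clones; a failed clone never carries the variant.
    """
    if not 0.0 < yield_ <= 1.0:
        raise ValueError("yield must be in the interval (0, 1]")
    # Each clone carries the variant with probability yield_ * p.
    return 1.0 - (1.0 - yield_ * p) ** library_size


# Toy comparison (hypothetical numbers): a variant with p = 1/8000,
# a library of 24,000 clones, perfect yield versus 90% yield.
perfect = prob_variant_present(1 / 8000, 24000)
reduced = prob_variant_present(1 / 8000, 24000, yield_=0.9)
print(perfect > reduced)
```

Under this model a yield below 1 acts like a smaller effective library, so a larger library is needed to reach the same probability.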

Second page - the randomization scheme

If the "Randomize by" field on the previous page was set to "specifying codon combinations," the desired nucleotides need to be specified here. For example, for NNN randomization, all four nucleotides need to be checked at each of the three bases constituting the codon (as is the default); for NNK randomization, all four nucleotides need to be checked at each of the first two bases, and only G and T need to be checked at the third base.
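To make the link between checked nucleotides and the resulting amino acid mixture concrete, here is a small Python sketch. It assumes the standard genetic code and an equimolar mixture of the checked nucleotides at each base; the function name is illustrative and does not come from TopLib.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons in the order TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_distribution(first, second, third):
    """Amino acid (and stop) probabilities for one randomized codon.

    Each argument is a string of the nucleotides checked at that base.
    Assumption: the checked nucleotides are mixed in equal proportions.
    """
    counts = Counter(
        CODON_TABLE[a + b + c] for a, b, c in product(first, second, third)
    )
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# NNK: all four nucleotides at the first two bases, G or T at the third.
nnk = amino_acid_distribution("ACGT", "ACGT", "GT")
for aa, prob in sorted(nnk.items()):
    print(aa, prob)
```

Under these assumptions NNK yields 32 codons that cover all 20 amino acids plus a single stop codon (TAG), which is why the amino acids are not equally likely even though the nucleotides are.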

If the "Randomize by" field on the previous page was set to "specifying directly probabilities for amino acids," these probabilities need to be specified here. The probabilities need not sum to 1 (though they must be non-negative), and may be thought of as weights: for example, to assign probabilities of 0.5 for Alanine, 0.25 for Serine, and 0.25 for Valine, one can enter 2 for Alanine, 1 for Serine, 1 for Valine, and 0 for each of the other amino acids.
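The interpretation of the entered numbers as weights amounts to dividing each by their sum. A minimal Python sketch of that normalization follows; the function name and the one-letter amino acid keys are illustrative assumptions, not TopLib's interface.

```python
def normalize_weights(weights):
    """Turn non-negative weights into probabilities that sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {aa: w / total for aa, w in weights.items()}


# The example from the text: 2 for alanine, 1 for serine, 1 for valine
# (glycine is included with weight 0 to show an excluded amino acid).
probs = normalize_weights({"A": 2, "S": 1, "V": 1, "G": 0})
print(probs)
```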

Third page - result

The library size required to attain the specified probability of the event of interest (or, for the inverse calculation, the probability attained with the specified library size) is reported, next to a summary of the randomization scheme.

Computational Notes

Given a library size, TopLib can compute the desired probability using the methods described in the paper "When second best is good enough: another probabilistic look at saturation mutagenesis" by Yuval Nov. To find the library size given a probability, TopLib first tries increasingly large libraries (the library size is doubled at each iteration) until the resulting probability is greater than the specified probability. Then, using binary search, TopLib finds the minimal library size that corresponds to a probability greater than or equal to the specified probability.
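The doubling-then-binary-search procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not TopLib's implementation: the probability function is a toy model in which all V variants are equally likely and draws are independent, so the probability of finding at least one of the top k variants in a library of size L is 1 - (1 - k/V)^L. For simplicity the doubling phase stops as soon as the probability is at least the target. All names are hypothetical.

```python
def min_library_size(prob_of_size, target):
    """Smallest library size L with prob_of_size(L) >= target.

    Assumption: prob_of_size is non-decreasing in L, equals 0 for an empty
    library, and approaches 1 as L grows.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target probability must be strictly between 0 and 1")
    # Phase 1: double the library size until the target is reached.
    hi = 1
    while prob_of_size(hi) < target:
        hi *= 2
    lo = hi // 2  # the previous size tried (or 0), which is still too small
    # Phase 2: binary search; invariant prob(lo) < target <= prob(hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prob_of_size(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


def toy_top_k_probability(num_variants, k):
    """Toy model: equiprobable variants, independent draws."""
    return lambda size: 1.0 - (1.0 - k / num_variants) ** size


# Hypothetical example: 8000 equiprobable variants, top 2, probability 0.95.
prob = toy_top_k_probability(8000, 2)
size = min_library_size(prob, 0.95)
print(size, prob(size))
```

Because the doubling phase takes about log2(L) evaluations and the binary search takes about as many again, the probability function is evaluated only a few dozen times even for libraries with millions of members.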

When the number of randomized positions or the number of top variants to be considered (k) increases, computing the desired probability exactly may take too long, as it involves a sum with a very large number of summands. In such a case, TopLib quickly computes an approximation to the probability. This approximation is very close to the true value: for example, when randomizing 3 NNN positions, the exact library size required to attain a 0.95 probability of discovering at least one of the top two variants is 25,583, whereas TopLib's approximate answer is 25,585 (the approximate library size is always slightly larger than the exact one). TopLib switches to approximate computation either when the number of top variants to be considered is 5 or more, or when there are more than 10,000,000 summands in the aforementioned sum.

