The Need for Averages
Averages arise everywhere. In sports, we want to project the average number of games that a team is expected to win; in gambling, we want to project the average losses incurred playing blackjack; in business, companies want to calculate their average expected sales for the next quarter.
Molecular biology is not immune from the need for averages. Researchers need to predict the expected number of antibiotic-resistant pathogenic bacteria in a future outbreak, estimate the predicted number of locations in the genome that will match a given motif, and study the distribution of alleles throughout an evolving population. In this problem, we will begin discussing the third issue; first, we need to have a better understanding of what it means to average a random process.
For a random variable X taking integer values between 1 and n, the expected value of X is $E(X) = \sum_{k=1}^{n} k \times \mathrm{Pr}(X = k)$.
The expected value offers us a way of taking the long-term average of a random variable over a large number of trials.
As a motivating example, let X be the number on a six-sided die. Over a large number of rolls, we should expect to obtain an average of 3.5 on the die (even though it’s not possible to roll a 3.5). The formula for expected value confirms that $E(X) = \sum_{k=1}^{6} k \times \mathrm{Pr}(X = k) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$.
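The die calculation can be checked with a few lines of Python. This is only a minimal sketch: the helper name `expected_value` and the dictionary representation of the distribution are illustrative choices, not part of the original problem.

```python
from fractions import Fraction


def expected_value(pmf):
    """Sum k * Pr(X = k) over a dict mapping each outcome k to its probability."""
    return sum(k * p for k, p in pmf.items())


# A fair six-sided die: every face 1..6 has probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
die_mean = expected_value(die)
print(die_mean, float(die_mean))
```

Exact fractions are used here so that the result is not affected by floating-point rounding.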
More generally, a random variable for which every one of a number of equally spaced outcomes has the same probability is called a uniform random variable (in the die example, this “equal spacing” is equal to 1). We can generalize our die example to find that if X is a uniform random variable with minimum possible value a and maximum possible value b, then $E(X) = \frac{a + b}{2}$.
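As a quick sanity check of the uniform case, the mean of equally spaced, equally likely outcomes can be computed directly and compared with the midpoint of the smallest and largest values. The function name `uniform_mean` and its `step` parameter are hypothetical, and the sketch assumes that b - a is a multiple of the spacing so that b really is the largest outcome.

```python
from fractions import Fraction


def uniform_mean(a, b, step=1):
    """Mean of the equally likely outcomes a, a + step, ..., b.

    Assumes (b - a) is a multiple of step, so that b is attained.
    """
    values = list(range(a, b + 1, step))
    return Fraction(sum(values), len(values))


# The die (a = 1, b = 6, spacing 1) and a second example with spacing 5.
print(uniform_mean(1, 6), Fraction(1 + 6, 2))
print(uniform_mean(10, 30, step=5), Fraction(10 + 30, 2))
```

In both cases the directly computed mean agrees with the midpoint of a and b.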
You may also wish to verify that for the die example, if Y is the random variable associated with the outcome of a second die roll, then E(X+Y) = 7.
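One way to carry out this verification is to enumerate all 36 equally likely outcomes of two dice. The sketch below assumes the two rolls are independent and fair; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 ordered pairs (x, y), each with probability 1/36.
outcomes = list(product(faces, repeat=2))
p = Fraction(1, len(outcomes))

e_sum = sum((x + y) * p for x, y in outcomes)  # E(X + Y)
e_x = sum(Fraction(x, 6) for x in faces)       # E(X), which equals E(Y)

print(e_sum, e_x + e_x)
```

The two printed values agree, which illustrates that the expected value of a sum is the sum of the expected values.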
Given: Six nonnegative integers, each of which does not exceed 20,000. The integers correspond to the number of couples in a population possessing each genotype pairing for a given factor. In order, the six given integers represent the number of couples having the following genotypes:
1. AA-AA
2. AA-Aa
3. AA-aa
4. Aa-Aa
5. Aa-aa
6. aa-aa
Return: The expected number of offspring displaying the dominant phenotype in the next generation, under the assumption that every couple has exactly two offspring.
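The probability that a single child shows the dominant phenotype can be derived for each pairing by enumerating a Punnett square. The following is a sketch under stated assumptions: the pairings are taken in the order AA-AA, AA-Aa, AA-aa, Aa-Aa, Aa-aa, aa-aa, each parent passes on either allele with equal probability, and the names `PAIRINGS`, `dominant_probability`, and `expected_dominant` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed order of the six genotype pairings.
PAIRINGS = [("AA", "AA"), ("AA", "Aa"), ("AA", "aa"),
            ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]


def dominant_probability(parent1, parent2):
    """Fraction of Punnett-square cells containing at least one 'A' allele."""
    children = list(product(parent1, parent2))
    dominant = sum(1 for alleles in children if "A" in alleles)
    return Fraction(dominant, len(children))


def expected_dominant(counts, offspring_per_couple=2):
    """Expected number of dominant-phenotype offspring over all couples."""
    return sum(offspring_per_couple * n * dominant_probability(p1, p2)
               for n, (p1, p2) in zip(counts, PAIRINGS))


for p1, p2 in PAIRINGS:
    print(p1, p2, dominant_probability(p1, p2))
```

The per-pairing probabilities come out as 1, 1, 1, 3/4, 1/2, and 0. Multiplying each by two offspring per couple gives the weights 2, 2, 2, 1.5, 1, and 0 on the six couple counts.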
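For this problem, we will be using a Python library called six, which provides compatibility between Python 2 and Python 3:
import six


def iev(a, b, c, d, e, f):
    # Each couple has two offspring. The chance that a child shows the
    # dominant phenotype is 1 for the first three pairings, 3/4 for the
    # fourth, 1/2 for the fifth, and 0 for the sixth (so f is unused).
    return (4 * a + 4 * b + 4 * c + 3 * d + 2 * e) / 2.0


def main():
    # Read the six space-separated couple counts from standard input.
    line = six.moves.input()
    tokens = line.split()
    a, b, c, d, e, f = [int(t) for t in tokens]
    answer = iev(a, b, c, d, e, f)
    print(answer)


if __name__ == "__main__":
    main()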
Call the function with the values given in the dataset:
iev(17849, 18970, 19661, 17001, 18174, 19701)
This problem is taken from rosalind.info. Please visit ROSALIND to find out more about bioinformatics problems. You may also click on the links for definitions of each term.