What's a probability distribution?

Probability distributions are mathematical functions that give the probabilities of a range or set of outcomes. These outcomes can be the result of an experiment or procedure, such as tossing a coin or rolling dice. They can also be the result of a physical measurement, such as measuring the temperature of an object or counting how many electrons are spin up. Broadly speaking, we can classify probability distributions into two categories: discrete probability distributions and continuous probability distributions.

Discrete Probability Distributions

Discrete probability distributions are intuitive to understand. For example, we understand the outcomes of a coin toss very well, and also those of a die roll. For a single coin toss, we know that the probability of getting heads $$(H)$$ is half, or $$P(H) = \frac{1}{2}$$. Similarly, the probability of getting tails $$(T)$$ is $$P(T) = \frac{1}{2}$$. Formally, we can write the probability distribution for such a coin toss as,

$$ P(n) = \begin{matrix} \displaystyle \frac 1 2 &;& n \in \left\{H,T\right\}. \end{matrix} $$

Here, $$n$$ denotes the outcome, and we used the "set notation", $$n \in\left\{H,T\right\}$$, which means "$$n$$ belongs to a set containing $$H$$ and $$T$$". From the above equation, we can also assume that any other outcome for $$n$$ (such as landing on an edge) is incredibly unlikely, impossible, or simply "not allowed" (for example, just toss again if it does land on its edge!).

For a probability distribution, it's important to take note of the set of possibilities, or the domain of the distribution. Here, $$\left\{H,T\right\}$$ is the domain of $$P(n)$$, telling us that $$n$$ can only be either $$H$$ or $$T$$.

If we use a different system, the outcome $$n$$ could mean other things. For example, it could be a number like the outcome of a die roll which has the probability distribution,

$$ P(n) = \begin{matrix} \displaystyle\frac 1 6 &;& n \in [\![1,6]\!] \end{matrix}. $$

This is saying that the probability of $$n$$ being any particular whole number between $$1$$ and $$6$$ is $$1/6$$, and we assume that the probability of getting any other $$n$$ is $$0$$. This is a discrete probability function because $$n$$ is an integer, and thus only takes discrete values.

Both of the above examples are rather boring, because the value of $$P(n)$$ is the same for all $$n$$. For a discrete probability function where the probability actually depends on $$n$$, let $$n$$ be the sum of the numbers on a roll of two dice. In this case, $$P(n)$$ differs between values of $$n$$, as some outcomes like $$n=2$$ can happen in only one way (by getting a $$1$$ on both dice), whereas $$n=4$$ can happen in $$3$$ ways ($$1$$ and $$3$$; or $$2$$ and $$2$$; or $$3$$ and $$1$$).

The example of rolling two dice is a great case study for how we can construct a probability distribution, since the probability varies and it is not immediately obvious how it varies. So let's go ahead and construct it!

Let's first define the domain of our target $$P(n)$$. We know that the lowest sum of two dice is $$2$$ (a $$1$$ on both dice), so $$n \geq 2$$ for sure. Similarly, the maximum is the sum of two sixes, or $$12$$, so $$n \leq 12$$ also.

So now we know the domain of possibilities, i.e., $$n \in [\![2,12]\!]$$. Next, we take a very common approach: for each outcome $$n$$, we count up the number of different ways it can occur. Let's call this number the frequency of $$n$$, $$f(n)$$. We already mentioned that there is only one way to get $$n=2$$, by getting a pair of $$1$$s. By our definition of the function $$f$$, this means that $$f(2)=1$$. For $$n=3$$, we see that there are two possible ways of getting this outcome: the first die shows a $$1$$ and the second a $$2$$, or the first die shows a $$2$$ and the second a $$1$$. Thus, $$f(3)=2$$. If you continue doing this for all $$n$$, you may see a pattern (homework for the reader!). Once you have all the $$f(n)$$, we can visualize it by plotting $$f(n)$$ vs $$n$$, as shown below.

<FIG> Die Roll

We can see from the plot that the most common outcome for the sum of two dice is $$n=7$$, and the further away from $$n=7$$ you get, the less likely the outcome. Good to know, for a prospective gambler!
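The counting procedure described above can also be done by brute force. The following is a minimal Python sketch, not part of the original chapter; the function name `two_dice_frequencies` and the `sides` parameter are made up for illustration. It enumerates every ordered pair of faces and tallies the sums.

```python
from collections import Counter
from itertools import product

def two_dice_frequencies(sides=6):
    """Count how many ordered rolls of two dice give each sum n.

    `sides` is a hypothetical parameter for illustration; the text uses 6.
    """
    faces = range(1, sides + 1)
    # product(faces, repeat=2) yields every ordered pair (first die, second die)
    return Counter(a + b for a, b in product(faces, repeat=2))

f = two_dice_frequencies()
for n in sorted(f):
    print(n, f[n])
```

Printing the table this way gives away the pattern left as homework, so you may want to work it out by hand first.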

Normalization

The $$f(n)$$ plotted above is technically NOT the probability $$P(n)$$ – because we know that the sum of all probabilities should be $$1$$, which clearly isn't the case for $$f(n)$$. But we can get the probability by dividing $$f(n)$$ by the total number of possibilities, $$N$$. For two dice, that is $$N = 6 \times 6 = 36$$, but we could also express it as the sum of all frequencies,

$$ N = \sum_n f(n), $$

which also equals $$36$$ in this case. So, by dividing $$f(n)$$ by $$\sum_n f(n)$$ we get our target probability distribution, $$P(n)$$. This process is called normalization and is crucial for determining almost any probability distribution. So in general, if we have the function $$f(n)$$, we can get the probability as

$$ P(n) = \frac{f(n)}{\displaystyle\sum_{n} f(n)}. $$

Note that $$f(n)$$ does not necessarily have to be the frequency of $$n$$ – it could be any function which is proportional to $$P(n)$$, and the above definition of $$P(n)$$ would still hold. It's easy to check that the sum is now equal to $$1$$, since

$$ \sum_n P(n) = \frac{\displaystyle\sum_{n}f(n)}{\displaystyle\sum_{n} f(n)} = 1. $$
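As a rough illustration of normalization, here is a short Python sketch that rebuilds the two-dice frequencies and divides each by their sum. The helper name `normalize` is an assumption made for this example, and exact fractions are used so that the sum comes out to exactly $$1$$ rather than approximately $$1$$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def normalize(f):
    """Turn a table of weights f(n) into probabilities P(n) = f(n) / sum_n f(n)."""
    total = sum(f.values())
    return {n: Fraction(weight, total) for n, weight in f.items()}

# Frequencies of each sum for two six-sided dice
f = Counter(a + b for a, b in product(range(1, 7), repeat=2))
P = normalize(f)

print(P[7])             # probability of rolling a sum of 7
print(sum(P.values()))  # total probability
```

Because only the ratio matters, multiplying every $$f(n)$$ by the same constant would leave `P` unchanged.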

Once we have the probability function $$P(n)$$, we can calculate all sorts of probabilities. For example, let's say we want to find the probability that $$n$$ will be between two integers $$a$$ and $$b$$, inclusive (that is, including $$a$$ and $$b$$ themselves). For brevity, we will use the notation $$\mathbb{P}(a \leq n \leq b)$$ to denote this probability. To calculate it, we simply have to sum up the probabilities for each value of $$n$$ in that range, i.e.,

$$ \mathbb{P}(a \leq n \leq b) = \sum_{n=a}^{b} P(n). $$
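The range sum can be sketched in a few lines of Python. This is only an illustration for the two-dice example; the function name `prob_between` is hypothetical, and values of $$n$$ outside the domain are assumed to have probability $$0$$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Normalized distribution for the sum of two six-sided dice
f = Counter(a + b for a, b in product(range(1, 7), repeat=2))
total = sum(f.values())
P = {n: Fraction(weight, total) for n, weight in f.items()}

def prob_between(P, a, b):
    """Sum P(n) over all integers n with a <= n <= b (both ends included)."""
    # Outcomes outside the domain are treated as having probability 0
    return sum(P.get(n, 0) for n in range(a, b + 1))

print(prob_between(P, 5, 9))   # (4 + 5 + 6 + 5 + 4) / 36
print(prob_between(P, 2, 12))  # the whole domain
```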

Probability Density Functions

What if instead of a discrete variable $$n$$, we had a continuous variable $$x$$, like temperature or weight? In that case, it doesn't make sense to ask what the probability is of $$x$$ being exactly a particular number. There are infinitely many possible real numbers, after all, so the probability of $$x$$ being exactly any one of them is zero! But it does make sense to ask what the probability is that $$x$$ will be within a certain range of values. For example, one might say that there is a $$50\%$$ chance that the temperature tomorrow at noon will be between $$5$$ and $$15$$, or a $$5\%$$ chance that it will be between $$16$$ and $$16.5$$. But how do we put all that information, for every possible range, in a single function? The answer is to use a probability density function.

What does that mean? Well, suppose $$x$$ is a continuous quantity, and we have a probability density function, $$P(x)$$, which looks like

<FIG> probability density

Now, if we are interested in the probability of the range of values that lie between $$x_0$$ and $$x_0 + dx$$, all we have to do is calculate the area of the green sliver above. This is the defining feature of a probability density function:

the probability of a range of values is the area of the region under the probability density curve which is within that range.

So if $$dx$$ is infinitesimally small, then the area of the green sliver becomes $$P(x_0)dx$$, and hence,

$$ \mathbb{P}(x_0 \leq x \leq x_0 + dx) = P(x_0)dx. $$

So strictly speaking, $$P(x)$$ itself is NOT a probability, but rather the probability is the quantity $$P(x)dx$$, or any area under the curve. That is why we call $$P(x)$$ the probability density at $$x$$, while the actual probability is only defined for ranges of $$x$$.
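As a small worked example (a hypothetical density chosen for illustration, not one from the figure above), consider a density that is constant on the interval $$[0, 1/2]$$ and zero elsewhere:

$$ P(x) = \begin{matrix} 2 &;& x \in [0, 1/2]. \end{matrix} $$

The density equals $$2$$ everywhere on its domain, which would be impossible for a probability, yet every area under the curve is a valid probability. For instance, the region between $$0$$ and $$1/4$$ is a rectangle of height $$2$$ and width $$1/4$$, so

$$ \mathbb{P}(0 \leq x \leq 1/4) = 2 \times \frac{1}{4} = \frac{1}{2}. $$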

Thus, to obtain the probability of $$x$$ lying within a range, we simply integrate $$P(x)$$ between that range, i.e.,

$$ \mathbb{P}(a \leq x \leq b ) = \int_a^b P(x)dx. $$

This is analogous to finding the probability of a range of discrete values from the previous section:

$$ \mathbb{P}(a \leq n \leq b) = \sum_{n=a}^{b} P(n). $$

The fact that all probabilities must sum to $$1$$ translates to

$$ \int_D P(x)dx = 1, $$

where $$D$$ denotes the domain of $$P(x)$$, i.e., the entire range of possible values of $$x$$ for which $$P(x)$$ is defined.
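When the integral cannot be done by hand, it can be approximated numerically. The sketch below uses the midpoint rule and a hypothetical density $$P(x) = 2x$$ on $$D = [0, 1]$$, chosen only because its integrals are easy to verify (the exact answer for a range is $$b^2 - a^2$$). The names `integrate` and `density` are assumptions for this example.

```python
def integrate(p, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of p from a to b."""
    dx = (b - a) / steps
    # Evaluate p at the middle of each of the `steps` slivers of width dx
    return sum(p(a + (i + 0.5) * dx) for i in range(steps)) * dx

def density(x):
    # Hypothetical example density on the domain D = [0, 1]: P(x) = 2x
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

print(integrate(density, 0.5, 1.0))  # probability that x lies in [0.5, 1]
print(integrate(density, 0.0, 1.0))  # integral over the whole domain
```

Each term in the sum plays the role of one sliver $$P(x)dx$$, so the code is a direct discrete stand-in for the integral.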

Normalization of a Density Function

Just like in the discrete case, we often first calculate some density or frequency function $$f(x)$$, which is NOT $$P(x)$$, but proportional to it. We can get the probability density function by normalizing it in a similar way, except that we integrate instead of sum (the bold $$\mathbf{x}$$ indicates that the variable may have more than one component):

$$ P(\mathbf{x}) = \frac{f(\mathbf{x})}{\int_D f(\mathbf{x})d\mathbf{x}}. $$

For example, consider the following Gaussian function (popularly used in normal distributions),

$$ f(x) = e^{-x^2}, $$

which is defined for all real numbers $$x$$. We first integrate it (or do a quick google search, as it is rather tricky) to get

$$ N = \int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}. $$

Dividing $$f(x)$$ by $$N$$ then gives a normalized Gaussian probability density,

$$ P(x) = \frac{1}{N} e^{-x^2} = \frac{1}{\sqrt{\pi}} e^{-x^2}. $$
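The value of $$N$$ can also be checked numerically. The following Python sketch is an illustration rather than a definitive method: it assumes that $$e^{-x^2}$$ is negligible outside $$[-10, 10]$$, so that a finite range can stand in for the whole real line, and it uses a simple midpoint rule with a hypothetical helper named `integrate`.

```python
import math

def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    dx = (b - a) / steps
    return sum(g(a + (i + 0.5) * dx) for i in range(steps)) * dx

def f(x):
    """Unnormalized Gaussian from the text."""
    return math.exp(-x * x)

# Assumption: the tails beyond |x| = 10 contribute a negligible amount
N = integrate(f, -10.0, 10.0)

def P(x):
    """Normalized density P(x) = f(x) / N."""
    return f(x) / N

print(N, math.sqrt(math.pi))      # the two values should agree closely
print(integrate(P, -10.0, 10.0))  # should be very close to 1
```

For this particular density, the probability of a range can be cross-checked against the error function in the standard library, since $$\mathbb{P}(a \leq x \leq b) = \left(\mathrm{erf}(b) - \mathrm{erf}(a)\right)/2$$.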

In general, normalization lets us create a probability distribution out of almost any function $$f(\mathbf{x})$$. There are really only two rules that $$f(\mathbf{x})$$ must satisfy to be a candidate for a probability density function:

  1. The integral of $$f(\mathbf{x})$$ over any subset of $$D$$ (denoted by $$S$$) has to be non-negative (it can be zero): $$ \int_{S}f(\mathbf{x})d\mathbf{x} \geq 0. $$
  2. The following integral must be finite: $$ \int_{D} f(\mathbf{x})d\mathbf{x}. $$
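
To see how a function can fail these rules, here are two illustrative cases (hypothetical examples, not taken from the text above). The function $$f(x) = x$$ on $$D = [-1, 1]$$ breaks the first rule, since for the subset $$S = [-1, 0]$$,

$$ \int_{-1}^{0} x\,dx = -\frac{1}{2} < 0. $$

The constant function $$f(x) = 1$$ on the whole real line breaks the second rule, since

$$ \int_{-\infty}^{\infty} 1\,dx = \infty, $$

so there is no finite $$N$$ to divide by.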

License

Images/Graphics
Text

The text of this chapter was written by K. Shudipto Amin and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Pull Requests

After initial licensing (#560), the following pull requests have modified the text or graphics of this chapter:

  • none