Probability distribution

# Introduction

A probability distribution is a mathematical description of the probabilities of events, which are subsets of the sample space. The sample space, often denoted by $\Omega$, is the set of all possible outcomes of the random phenomenon being observed; it may be any set: a set of real numbers, a set of vectors, a set of arbitrary non-numerical values, etc. For example, the sample space of a coin flip would be $\Omega$ = {heads, tails}. To define probability distributions for the specific case of random variables (so that the sample space can be seen as a numeric set), it is common to distinguish between discrete and continuous random variables. In the discrete case, it is sufficient to specify a probability mass function $p$ assigning a probability to each possible outcome: for example, when throwing a fair die, each of the six values 1 to 6 has probability 1/6. The probability of an event is then defined as the sum of the probabilities of the outcomes that satisfy the event; for example, the probability of the event "the die rolls an even value" is
$p(2)+p(4)+p(6)=\frac16+\frac16+\frac16=\frac12 = 50\%$
Figure: the probability mass function $p(S)$ specifies the probability distribution for the sum $S$ of counts from two dice.
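The discrete case above can be sketched in a few lines of Python. This is a minimal illustration; the `pmf` dictionary and the `event_probability` helper are illustrative names, not part of any library:

```python
from fractions import Fraction

# Probability mass function for a fair six-sided die:
# every outcome in the sample space {1, ..., 6} gets probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(pmf[outcome] for outcome in event)

# Event "the die rolls an even value" = {2, 4, 6}.
p_even = event_probability({2, 4, 6})
print(p_even)  # 1/2
```

Using `Fraction` keeps the arithmetic exact, so the result is literally 1/2 rather than a floating-point approximation.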
In contrast, when a random variable takes values from a continuum, then typically any individual outcome has probability zero, and only events that include infinitely many outcomes, such as intervals, can have positive probability. Continuous probability distributions can be described in several ways. The probability density function describes the infinitesimal probability of any given value, and the probability that the outcome lies in a given interval can be computed by integrating the probability density function over that interval. An alternative description of the distribution is by means of the cumulative distribution function, which describes the probability that the random variable is no larger than a given value (i.e., $P(X \le x)$ for some $x$). The cumulative distribution function is the area under the probability density function from $-\infty$ to $x$.
Figure: the probability density function of the normal distribution, also called the Gaussian or bell curve, the most important continuous probability distribution.
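The relationship between the density and the cumulative distribution function can be illustrated with a small numerical sketch. The `normal_pdf`/`normal_cdf` names and the midpoint-rule integration below are illustrative choices, not a standard API:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density function of the normal (Gaussian) distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0, lo=-10.0, steps=100_000):
    """CDF as the area under the PDF from (effectively) -infinity to x,
    approximated with a simple midpoint Riemann sum starting at `lo`,
    where the standard normal density is already negligible."""
    width = (x - lo) / steps
    return sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

# By symmetry, the standard normal CDF at 0 is exactly 1/2.
print(round(normal_cdf(0.0), 6))  # 0.5
```

The numerical result agrees with the closed form $\frac12\left(1+\operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)$ to several decimal places.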

# Hypergeometric Distribution

In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the probability of $k$ successes (random draws for which the object drawn has a specified feature) in $n$ draws, without replacement, from a finite population of size $N$ that contains exactly $K$ objects with that feature, wherein each draw is either a success or a failure. In contrast, the binomial distribution describes the probability of $k$ successes in $n$ draws with replacement.
To elaborate: the classic application of the hypergeometric distribution is sampling without replacement. Think of an urn with two colors of marbles, red and green. Define drawing a green marble as a success and drawing a red marble as a failure (analogous to the binomial distribution). If the variable $N$ describes the number of all marbles in the urn (see the contingency table below) and $K$ describes the number of green marbles, then $N-K$ corresponds to the number of red marbles. In this example, $X$ is the random variable whose outcome is $k$, the number of green marbles actually drawn in the experiment.
Assume that there are 3 green and 6 red marbles in the urn. Standing next to the urn, you close your eyes and draw 2 marbles without replacement.
What is the probability that exactly 1 of the 2 is green? Note that although we are looking at success/failure, the data are not accurately modeled by the binomial distribution, because the probability of success on each trial is not the same: the size of the remaining population changes as we remove each marble.
|               | Drawn         | Not drawn          | Total       |
| ------------- | ------------- | ------------------ | ----------- |
| Green marbles | $k =\ \ 1$    | $K-k=\ \ 2$        | $K=\ \ 3$   |
| Red marbles   | $n-k=\ \ 1$   | $N+k-n-K=\ \ 5$    | $N-K=\ \ 6$ |
| Total         | $n=\ \ 2$     | $N-n=\ \ 7$        | $N=\ \ 9$   |
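The urn experiment above can also be simulated directly. This is a rough sketch (the function name, trial count, and fixed seed are arbitrary choices), using Python's `random.sample` to model drawing without replacement:

```python
import random

def estimate_one_green(n_green=3, n_red=6, draws=2, trials=100_000, seed=42):
    """Simulate drawing `draws` marbles without replacement from an urn of
    `n_green` green and `n_red` red marbles, and estimate the probability
    that exactly one drawn marble is green."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    urn = ["green"] * n_green + ["red"] * n_red
    hits = sum(rng.sample(urn, draws).count("green") == 1 for _ in range(trials))
    return hits / trials

print(estimate_one_green())  # ≈ 0.5
```

With 100,000 trials the estimate lands within a few tenths of a percent of the exact value of 50% derived in the next section.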

## Issues with Random Distribution Elements

The formula for the probability of drawing exactly $k$ successes from the hypergeometric distribution is as follows:
$P(X=k)=f(k;N,K,n)= \frac{{K \choose k}{N-K \choose n-k}}{{N \choose n}}$
where
${a \choose b} = \frac{a!}{b!(a-b)!}$
is a binomial coefficient. For the urn example:
$P(X=1)=f(1;9,3,2)= \frac{{3 \choose 1}{9-3 \choose 2-1}}{{9 \choose 2}}=\frac{3\times 6}{36}=\frac{18}{36}=0.5=50\%$
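The calculation above can be checked with the standard-library `math.comb`; the helper names below are illustrative. For contrast, the sketch also evaluates the binomial (with-replacement) model with $p = K/N$ on the same urn, which gives a different answer:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement from a
    population of size N containing K success objects."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Urn example: N = 9 marbles, K = 3 green, n = 2 draws, k = 1 green drawn.
print(hypergeom_pmf(1, 9, 3, 2))  # 0.5

def binom_pmf(k, n, p):
    """P(X = k) for n draws WITH replacement, success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Same urn modeled with replacement: p = K/N = 1/3.
print(binom_pmf(1, 2, 3 / 9))  # ≈ 0.444
```

The gap between 0.5 and roughly 0.444 is exactly the point made earlier: the two models disagree because removing a marble changes the odds for the next draw.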
The draw obviously has a 50% chance of success, but reliability is another issue entirely.
Figure: distribution of marbles in an urn.
As you can see from the figure above, the hypergeometric probability itself is reasonable, but the distribution of the marbles is not reliable. You might want to arrange the distribution yourself, but that destroys the randomness entirely; furthermore, it goes against drawing with your eyes closed.
What is the probability that exactly 1 of the 2 is green? The formula is correct, and the answer is certainly 50%. However, the marbles may happen to be distributed favorably for the draws or, conversely, unfavorably. Of course, no one knows, and no one must see, how the marbles are distributed inside the urn. But even with a 50% chance, in some cases you may not succeed even once in 10 tries; conversely, in some cases all 10 attempts may succeed.
The implications of the distribution of the marbles are too significant to dismiss this phenomenon as mere randomness and its results, or as simple luck. Therefore, it is essential to have a reliable basis for the premise that the marble distribution is unbiased. In the case of the binomial distribution, there is no limit to the number of trials, and the more trials there are, the closer the results come to the normal distribution. However, for a probability distribution with a limited number of trials, such as the hypergeometric distribution, the distribution of marbles must be more even, i.e., unbiased, than for a binomial distribution. It is possible to adjust the hypergeometric distribution to fit the normal distribution from the initial state by sampling several times; still, in that case, someone may already know the result.
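To put a number on the "all 10 attempts fail" scenario: if we model the 10 attempts as independent trials with success probability 1/2 each (an assumption, e.g. a freshly refilled urn for every attempt), the chance of a streak of 10 failures, or equally of 10 successes, is $(1/2)^{10}$:

```python
from fractions import Fraction

# Probability of an all-failure (or, symmetrically, all-success) streak in
# 10 independent attempts, each with success probability 1/2.
# Assumption: the urn is restored to its initial state before every attempt.
p_success = Fraction(1, 2)
p_all_fail = (1 - p_success) ** 10
print(p_all_fail, float(p_all_fail))  # 1/1024 0.0009765625
```

A roughly 0.1% event is rare but by no means impossible, which is why the text argues that such outcomes cannot simply be waved away as luck without a verifiable, unbiased source of randomness.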
This means that such processes must disclose their entropy sources and random number generators, the very things that normally should not be transparently disclosed. It is a rather bizarre circumstance, but if we compare randomness to a storm, randomness also includes a calmest part, like the eye of a storm. EntroBeam aims to solve this part of the process, which can also be called the eye of randomness, with the Entropy registry and Entropy chain.