Notes on Bayesian probability from Bishop's PRML book.
1. Bayesian Probabilities
There are two views of probability:
- frequentist view: frequencies of random, repeatable events
- Bayesian view: quantification of uncertainty
Consider, for example, the event that the Arctic ice cap melts away. Since this is not a repeatable event, its probability cannot be computed under the frequentist view. However, if we can observe how fast the ice is melting (e.g., from satellite imagery), we can still assess how likely it is that the ice will disappear.
In such circumstances, we would like to be able to quantify our expression of uncertainty and make precise revisions of uncertainty in the light of new evidence, as well as subsequently to be able to take optimal actions or decisions as a consequence. This can all be achieved through the elegant, and very general, Bayesian interpretation of probability.
Building on the Bayesian view of probability, we can not only express uncertainty about the model parameters $\textbf{w}$ but also reason about which model to select.
Bayes' theorem is as follows.
$$p(\textbf{w}|\mathcal{D}) = \frac{p(\mathcal{D}|\textbf{w})p(\textbf{w})}{p(\mathcal{D})}$$
We capture our assumptions about $\textbf{w}$, before observing the data, in the form of a prior probability distribution $p(\textbf{w})$. The effect of the observed data $\mathcal{D} = \{ t_{1}, \cdots, t_{N}\}$ is expressed through the conditional probability $p(\mathcal{D}|\textbf{w})$.
Bayes' theorem allows us to evaluate the uncertainty in $\textbf{w}$ after we have observed $\mathcal{D}$, in the form of the posterior probability $p(\textbf{w}|\mathcal{D})$.
The quantity $p(\mathcal{D}|\textbf{w})$ on the right-hand side of Bayes' theorem is evaluated for the observed data set $\mathcal{D}$ and can be viewed as a function of the parameter vector $\textbf{w}$, in which case it is called the likelihood function. It expresses how probable the observed data set is for different settings of the parameter vector $\textbf{w}$. Note that the likelihood is not a probability distribution over $\textbf{w}$, and its integral with respect to $\textbf{w}$ does not (necessarily) equal one.
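As a minimal illustration of this last point (a single Bernoulli observation, not an example from the book): if the data set is one coin toss that lands heads, $t = 1$, and $\mu$ is the probability of heads, then the likelihood is $p(t = 1|\mu) = \mu$, and

$$\int_{0}^{1} p(t = 1|\mu)\, d\mu = \int_{0}^{1} \mu \, d\mu = \frac{1}{2} \neq 1$$

so the likelihood is a perfectly valid function of the parameter without being a probability distribution over it.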
The denominator $p(\mathcal{D})$ is the normalization constant, which ensures that the posterior distribution on the left-hand side is a valid probability density and integrates to one. We can express the denominator in Bayes' theorem in terms of the prior distribution and the likelihood function.
$$p(\mathcal{D}) = \int p(\mathcal{D}|\textbf{w})p(\textbf{w})\,d\textbf{w}$$
Bayes' theorem can be expressed in words as
$$\text{posterior} \ \propto \ \text{likelihood} \times \text{prior}$$
Here the posterior, likelihood, and prior are all viewed as functions of $\textbf{w}$.
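As a minimal numerical sketch of this relationship (not from the book: the Bernoulli model, the data counts, and the uniform prior are all arbitrary illustrative choices), the posterior over a coin's heads-probability can be computed on a grid by multiplying likelihood and prior, then normalizing by $p(\mathcal{D})$:

```python
import numpy as np

# Discretize the parameter: mu = probability of heads under a Bernoulli model.
mu = np.linspace(0.0, 1.0, 1001)
dmu = mu[1] - mu[0]

# Observed data D: 6 heads out of 10 tosses (an arbitrary example).
heads, tosses = 6, 10

# Likelihood p(D|mu) = mu^heads * (1 - mu)^tails, viewed as a function of mu.
likelihood = mu**heads * (1 - mu)**(tosses - heads)

# Prior p(mu): uniform on [0, 1].
prior = np.ones_like(mu)

# Evidence p(D) = integral of likelihood * prior, here a Riemann sum.
evidence = np.sum(likelihood * prior) * dmu

# Posterior p(mu|D) = likelihood * prior / p(D).
posterior = likelihood * prior / evidence

print(np.sum(posterior) * dmu)   # ~1.0: the posterior is a valid density
print(mu[np.argmax(posterior)])  # 0.6: with a flat prior, the mode equals the MLE
```

Up to discretization error, dividing by the evidence is exactly what turns the unnormalized product $\text{likelihood} \times \text{prior}$ into a valid posterior density.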
The likelihood function $p(\mathcal{D}|\textbf{w})$ plays a central role in both the Bayesian and frequentist paradigms. However, the manner in which it is used is fundamentally different in the two approaches.
In a frequentist setting, $\textbf{w}$ is considered to be a fixed parameter, whose value is determined by some form of ‘estimator’, and error bars on this estimate are obtained by considering the distribution of possible data sets $\mathcal{D}$.
A widely used frequentist estimator is maximum likelihood, in which $\textbf{w}$ is set to the value that maximizes the likelihood function $p(\mathcal{D}|\textbf{w})$. This corresponds to choosing the value of $\textbf{w}$ for which the probability of the observed data set is maximized.
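As a concrete instance (a Bernoulli coin-toss model, added here as an illustration): if $\mathcal{D}$ consists of $N$ independent tosses of which $m$ land heads, and $\mu$ is the probability of heads, then

$$p(\mathcal{D}|\mu) = \mu^{m}(1-\mu)^{N-m}, \qquad \frac{d}{d\mu}\ln p(\mathcal{D}|\mu) = \frac{m}{\mu} - \frac{N-m}{1-\mu} = 0 \ \Rightarrow \ \mu_{\text{ML}} = \frac{m}{N}$$

i.e., maximum likelihood simply returns the observed fraction of heads.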
From the Bayesian viewpoint there is only a single data set $\mathcal{D}$ (namely the one that is actually observed), and the uncertainty in the parameters is expressed through a probability distribution over $\textbf{w}$.
One advantage of the Bayesian viewpoint is that the inclusion of prior knowledge arises naturally. Suppose, for instance, that a fair-looking coin is tossed three times and lands heads each time. A classical maximum likelihood estimate of the probability of landing heads would give 1, implying that all future tosses will land heads! By contrast, a Bayesian approach with any reasonable prior will lead to a much less extreme conclusion.
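A short sketch of this coin example (the Beta(2, 2) prior is one arbitrary choice of "reasonable prior"; the conjugate Beta-Bernoulli update used here is standard, but the code itself is not from the book):

```python
# Coin example: three tosses, all heads.
heads, tosses = 3, 3
tails = tosses - heads

# Maximum likelihood estimate: the observed fraction of heads.
mu_ml = heads / tosses
print(mu_ml)  # 1.0 -- implies every future toss lands heads

# Bayesian estimate with a conjugate Beta(a, b) prior over mu.
# For Bernoulli data the posterior is Beta(a + heads, b + tails),
# whose mean is (a + heads) / (a + b + tosses).
a, b = 2.0, 2.0  # Beta(2, 2): mildly favors a fair coin
posterior_mean = (a + heads) / (a + b + tosses)
print(posterior_mean)  # ~0.714 -- a much less extreme conclusion
```

Any prior that puts nonzero mass away from $\mu = 1$ pulls the estimate back toward the interior of $[0, 1]$, which is exactly the moderating effect described above.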