Bayesian Analysis — An Introduction

Yash Chaudhary
4 min read · Dec 21, 2020


Bayesian statistics is a particular approach to applying probability to statistical problems. It provides us with mathematical tools to update our beliefs about random events in light of new data or evidence about those events.

We may have a prior belief about an event, but our beliefs are likely to change when new evidence is brought to light. Bayesian statistics gives us a solid mathematical means of incorporating our prior beliefs, and evidence, to produce new posterior beliefs.

Bayesian statistics provides us with mathematical tools to rationally update our subjective beliefs in light of new data or evidence.

This article will introduce readers to Bayesian analysis, one of the must-know topics in data science.

Article Aim

This article will provide an overview of the following concepts:

  1. What Is Bayesian Analysis?
  2. What Is Bayes Theorem?
  3. Bayesian vs Frequentist Analysis
  4. Conclusion

Bayesian analysis is used in many sectors, including insurance, healthcare, e-commerce, sports, and law, among others. It is heavily used in classification algorithms, in which we attempt to assign text or numeric observations to their appropriate classes.

Bayesian Analysis

Bayesian analysis is a statistical paradigm that answers research questions about the unknown parameters of statistical models by using probability statements. It rests on the assumption that all model parameters are random quantities and can thus incorporate prior knowledge.

Bayesian analysis follows a simple rule of probability, the Bayes rule, which provides a formalism for combining prior information with evidence from the data at hand.

The Bayes rule is used to form the posterior distribution of model parameters. The posterior distribution results from updating the prior knowledge about model parameters with evidence from the observed data. Bayesian analysis uses the posterior distribution to form various summaries for the model parameters, including point estimates such as posterior means, medians, and percentiles. Moreover, all statistical tests about model parameters can be expressed as probability statements based on the estimated posterior distribution.
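
To make these summaries concrete, here is a minimal Python sketch using scipy. The setup is an assumption for illustration only: a coin-flip model with a uniform Beta(1, 1) prior over the coin's bias and made-up data of 7 heads in 10 flips. Because the Beta prior is conjugate to the binomial likelihood, the posterior is also a Beta distribution, and we can read off the posterior mean, median, and percentiles directly.

```python
# A minimal sketch: Beta-Binomial conjugate update and posterior summaries.
# All numbers are illustrative assumptions, not from the article.
from scipy import stats

# Prior: Beta(1, 1) is uniform over the coin's bias theta in [0, 1]
prior_a, prior_b = 1, 1

# Hypothetical data: 7 heads out of 10 flips
heads, flips = 7, 10

# Conjugacy: Beta prior + binomial likelihood -> Beta posterior
posterior = stats.beta(prior_a + heads, prior_b + (flips - heads))

print(f"Posterior mean:   {posterior.mean():.3f}")    # point estimate
print(f"Posterior median: {posterior.median():.3f}")  # point estimate
print(f"95% credible interval: "
      f"{posterior.ppf(0.025):.3f} to {posterior.ppf(0.975):.3f}")  # percentiles
```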

Bayes Theorem

In simple terms, Bayes’ theorem calculates the posterior probability of an event from its prior probability and the likelihood of the evidence. It gives us a framework for calculating the probability of one event occurring given that another event has already occurred.
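
For a quick worked example (with made-up numbers, purely for illustration): suppose 1% of a population has a disease, a test detects it 90% of the time, and it falsely flags 5% of healthy people. For a person who tests positive,

P(disease | positive) = P(positive | disease) × P(disease) / P(positive)
= (0.90 × 0.01) / (0.90 × 0.01 + 0.05 × 0.99)
≈ 0.15

Even after a positive result, the posterior probability is only about 15%, because the prior probability of the disease is so low. The rule below formalizes exactly this prior-to-posterior update.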

Bayes Rule for Bayesian Inference

P(θ | D) = P(D | θ) × P(θ) / P(D)

Where:

  • P(θ) is the prior. This is the strength of our belief in θ before the evidence D is considered.
  • P(θ | D) is the posterior. This is the (refined) strength of our belief in θ once the evidence D has been considered.
  • P(D | θ) is the likelihood. This is the probability of observing the data D under a model with parameter θ; that is, how probable what we observed would be if a particular value of θ were true.
  • P(D) is the evidence, a normalizing factor that ensures the posterior is a proper probability distribution (it integrates, or sums, to 1). The sketch after this list shows each of these terms in action.
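
A minimal sketch of this update in Python may help. The particulars are assumptions for illustration: θ is a coin's bias restricted to a coarse grid, the prior is flat, and the data D is 7 heads in 10 flips.

```python
# A minimal sketch of Bayes' rule on a discrete grid of parameter values.
# Illustrative assumptions: theta = coin bias, data D = 7 heads in 10 flips.
import numpy as np
from scipy import stats

theta = np.linspace(0.01, 0.99, 99)        # candidate parameter values
prior = np.ones_like(theta) / len(theta)   # P(theta): flat prior over the grid

heads, flips = 7, 10
likelihood = stats.binom.pmf(heads, flips, theta)  # P(D | theta) for each theta

evidence = np.sum(likelihood * prior)      # P(D): the normalizing factor
posterior = likelihood * prior / evidence  # P(theta | D)

print(f"P(D) = {evidence:.4f}")
print(f"Posterior peaks at theta = {theta[np.argmax(posterior)]:.2f}")
print(f"Posterior sums to {posterior.sum():.4f}")  # 1.0, as a probability must
```

Dividing by the evidence P(D) is what makes the posterior sum to 1; the likelihood alone is not a probability distribution over θ.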

Bayesian vs Frequentist Analysis

Bayesian and frequentist approaches have very different philosophies about what is considered fixed and, therefore, have very different interpretations of the results. The Bayesian approach assumes that the observed data sample is fixed and that model parameters are random. The posterior distribution of parameters is estimated based on the observed data and the prior distribution of parameters, and it is used for inference.

The frequentist approach assumes that the observed data are a repeatable random sample and that parameters are unknown but fixed and constant across the repeated samples. Inference is based on the sampling distribution of the data or of the data characteristics (statistics).

In other words, Bayesian analysis answers questions based on the distribution of parameters conditional on the observed sample, whereas frequentist analysis answers questions based on the distribution of statistics obtained from repeated hypothetical samples, generated by the same process that produced the observed sample under fixed (but unknown) parameter values.

Frequentist analysis is entirely data-driven and strongly depends on whether the data assumptions required by the model are met. On the other hand, Bayesian analysis provides a more robust estimation approach by using not only the data at hand but also some existing information or knowledge about model parameters.

Bayesian inference

  • Uses probabilities for both hypotheses and data.
  • Depends on the prior and likelihood of observed data.
  • Requires one to know or construct a ‘subjective prior’.
  • Dominated statistical practice before the 20th century.
  • May be computationally intensive due to integration over many parameters.

Frequentist inference

  • Never uses or gives the probability of a hypothesis (no prior or posterior).
  • Depends on the likelihood P(D | H) for both observed and unobserved data.
  • Does not require a prior.
  • Dominated statistical practice during the 20th century.
  • Tends to be less computationally intensive.
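
To make the contrast concrete, here is a small Python sketch on the same hypothetical coin-flip data used earlier (7 heads in 10 flips). The Beta(5, 5) prior is an assumption chosen only to show how prior information pulls the Bayesian estimate toward a fair coin, relative to the purely data-driven frequentist estimate.

```python
# A minimal sketch contrasting frequentist and Bayesian estimates
# on the same hypothetical data: 7 heads in 10 flips.
from scipy import stats

heads, flips = 7, 10

# Frequentist: theta is fixed but unknown; the maximum-likelihood
# estimate is simply the sample proportion.
mle = heads / flips
print(f"Frequentist MLE: {mle:.3f}")

# Bayesian: theta is random; combine a Beta(5, 5) prior (a mild belief
# that the coin is near fair) with the likelihood to get a posterior.
posterior = stats.beta(5 + heads, 5 + (flips - heads))
print(f"Bayesian posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: "
      f"{posterior.ppf(0.025):.3f} to {posterior.ppf(0.975):.3f}")
```

With little data, the posterior mean (0.600) sits between the prior mean (0.5) and the MLE (0.700); as more flips accumulate, the likelihood dominates and the two estimates converge.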

Conclusion

The universality of the Bayesian approach is probably its main methodological advantage over the traditional frequentist approach. Bayesian inference is based on a single rule of probability, the Bayes rule, which is applied to all parametric models. This makes the Bayesian approach universal and greatly facilitates its application and interpretation. The frequentist approach, however, relies on a variety of estimation methods designed for specific statistical problems and models; often, inferential methods designed for one class of problems cannot be applied to another.

Despite the conceptual and methodological advantages of the Bayesian approach, its application in practice is still sometimes considered controversial. There are two main reasons for this: the presumed subjectivity in specifying prior information, and the computational challenges of implementing Bayesian methods.
