PMF, PDF and CDF in Machine Learning


Random variables and the various distribution functions which form the foundations of Machine Learning

Table of contents

  • Introduction
  • Random Variable and Its Types
  • PDF (Probability Density Function)
  • PMF (Probability Mass Function)
  • CDF (Cumulative Distribution Function)
  • Example
  • Further Reading

Introduction

PDF and CDF are commonly used in exploratory data analysis (EDA) to understand how variables are distributed and to find probabilistic relationships between them.

Before looking at the example, let us first go through the fundamental concepts: random variables, PMF, PDF and CDF.

Random variable

A random variable is a variable whose value is not known in advance: its value depends on the outcome of a random experiment. (Formally, it is a function that assigns a number to each possible outcome of the experiment.)

For example, when rolling a die, the value of the variable depends on which face comes up.

Random variables are used, for example, in regression analysis to determine the statistical relationships between variables. There are two types of random variable:

1. Continuous random variable

2. Discrete random variable

Continuous random variable:- A variable that can take any value within a range or interval, and therefore has infinitely many possible values. Equivalently, a variable whose values are obtained by measuring. For example, the average height of 100 people, or the amount of rainfall.

Discrete Random Variable:- A variable that takes a countable number of distinct values. Equivalently, a variable whose values are obtained by counting. For example, the number of students present in a class.
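A quick way to see the difference is to simulate both kinds of variable. The sketch below uses Python's standard `random` module; the die is the example used earlier, while modelling rainfall as uniform between 0 and 50 mm is purely a hypothetical assumption for illustration.

```python
import random

rng = random.Random(42)  # seeded so repeated runs give the same values

# Discrete: a die roll can only be one of the countable values 1, 2, ..., 6.
rolls = [rng.randint(1, 6) for _ in range(10)]

# Continuous: a measurement can be any real number in a range.
# Hypothetical model: daily rainfall in mm, uniform on [0, 50].
rainfall_mm = [rng.uniform(0.0, 50.0) for _ in range(5)]

print(rolls)        # integers only, each between 1 and 6
print(rainfall_mm)  # real numbers anywhere between 0 and 50
```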

PDF (Probability Density Function):-

Fig:- Formula for PDF

The PDF is a function that describes the probability distribution of a continuous random variable.
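For reference, a PDF f(x) satisfies the standard conditions below: probabilities are areas under f, f is never negative, and the total area is 1. The second line is the Gaussian PDF, with mean μ and standard deviation σ.

```latex
P(a \le X \le b) = \int_{a}^{b} f(x)\,dx,
\qquad f(x) \ge 0,
\qquad \int_{-\infty}^{\infty} f(x)\,dx = 1

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```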

A very common case is the Gaussian (normal) distribution: if a feature / random variable is Gaussian distributed, its PDF is the familiar bell-shaped Gaussian curve. On a PDF graph, probabilities are areas under the curve, so the probability of any single exact outcome is always zero: a single point is a vertical line of zero width, which covers no area under the curve.
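The sketch below illustrates these points with Python's built-in `statistics.NormalDist`. The standard normal distribution (mean 0, standard deviation 1) is an assumed example feature, and `area_under_pdf` is a hypothetical helper that approximates the area under the curve with the trapezoid rule.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # assumed example: a standard normal feature


def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate P(a <= X <= b) by integrating the PDF with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * h)
    return total * h


# The PDF value at a point is a density, not a probability ...
print(X.pdf(0.0))                     # about 0.399, the peak of the bell curve
# ... and a density can even exceed 1 for a narrow distribution.
print(NormalDist(0.0, 0.1).pdf(0.0))  # about 3.989

# Probabilities are areas under the curve.
print(area_under_pdf(X, -1.0, 1.0))   # about 0.683 (within one std. dev.)
# A single point has zero width, so it covers zero area.
print(area_under_pdf(X, 0.5, 0.5))    # 0.0
```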

PMF (Probability Mass Function):-

Fig:- Formula for PMF

The PMF describes the probability distribution of a discrete random variable: it gives the probability that the variable takes each exact value.
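For reference, the standard conditions a PMF p(x) must satisfy are written out below, with a fair six-sided die as the example.

```latex
p(x) = P(X = x), \qquad 0 \le p(x) \le 1, \qquad \sum_{x} p(x) = 1

\text{Fair die: } p(x) = \tfrac{1}{6} \text{ for } x \in \{1, 2, 3, 4, 5, 6\},
\qquad \sum_{x=1}^{6} \tfrac{1}{6} = 1
```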

People often confuse the PDF and the PMF. The PDF applies to continuous random variables, while the PMF applies to discrete random variables. For example, when rolling a die, the outcome can only be one of the numbers 1 to 6 (a countable set), so its distribution is described by a PMF.
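In code, a PMF can be stored as a simple mapping from each value to its probability. The sketch below, using only the standard library, builds the PMF of the fair die and, as an extra illustration, the PMF of the sum of two independent fair dice; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# PMF of a fair six-sided die: p(x) = P(X = x) = 1/6 for x in 1..6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(die_pmf[3])             # 1/6: a single value has non-zero probability
print(die_pmf.get(7, 0))      # 0: values the die cannot show get probability 0
print(sum(die_pmf.values()))  # 1: probabilities over all values sum to one

# PMF of the sum of two independent fair dice, found by counting the
# 36 equally likely (first, second) outcomes for each possible total.
totals = Counter(a + b for a in range(1, 7) for b in range(1, 7))
two_dice_pmf = {s: Fraction(n, 36) for s, n in sorted(totals.items())}

print(two_dice_pmf[7])        # 1/6: 7 is the most likely total
print(two_dice_pmf[2])        # 1/36: only (1, 1) gives 2
```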

CDF (Cumulative Distribution Function):-

Fig:- Formula for CDF

The PMF is one way to describe a distribution, but it applies only to discrete random variables, not to continuous ones. The cumulative distribution function, F(x) = P(X ≤ x), describes the distribution of any random variable, whether it is continuous or discrete.
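The sketch below shows how one definition, F(x) = P(X ≤ x), covers both cases: for a discrete variable it is a running sum of the PMF, and for a continuous variable it is the area under the PDF to the left of x. It uses the fair die for the discrete case and assumes a standard normal variable for the continuous case; `discrete_cdf` is a hypothetical helper name.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: fair die, F(x) = sum of p(k) over all values k <= x.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def discrete_cdf(pmf, x):
    """Return P(X <= x) for a PMF stored as {value: probability}."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))


print(discrete_cdf(die_pmf, 2))    # 1/3
print(discrete_cdf(die_pmf, 3.5))  # 1/2: F is defined between the integers too
print(discrete_cdf(die_pmf, 6))    # 1

# Continuous case: F(x) is the area under the PDF to the left of x.
X = NormalDist(mu=0.0, sigma=1.0)  # assumed example: standard normal variable
print(X.cdf(0.0))                  # 0.5: half the area lies left of the mean
print(X.cdf(1.96))                 # about 0.975
```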

For example, if X is the height of a person selected at random, then F(x) is the probability that the person is shorter than x. If F(180 cm) = 0.8, there is an 80% chance that a randomly selected person is shorter than 180 cm (equivalently, a 20% chance that they are taller than 180 cm).
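With real data, F(x) is usually estimated by the empirical CDF: the fraction of observations less than or equal to x. The sketch below uses ten made-up heights (hypothetical values, chosen only so that the result matches the 80% in the example) and the standard-library `bisect` module; `empirical_cdf` is a hypothetical helper.

```python
from bisect import bisect_right


def empirical_cdf(sample, x):
    """Fraction of observations in `sample` that are <= x."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


# Hypothetical heights in cm, made up purely for illustration.
heights = [158, 163, 167, 170, 172, 175, 178, 179, 183, 190]

f_180 = empirical_cdf(heights, 180)
print(f"F(180) = {f_180:.1f}")          # F(180) = 0.8 (8 of 10 are <= 180 cm)
print(f"P(X > 180) = {1 - f_180:.1f}")  # P(X > 180) = 0.2 (complement rule)
```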

Python example for PDF and CDF on Iris Dataset:-

The Iris dataset contains the following data:-

Fig:- Flower image from iris dataset

A detailed explanation of the Iris dataset is available here.
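To inspect the dataset in Python, one option is scikit-learn's bundled copy via `sklearn.datasets.load_iris`. This assumes scikit-learn is installed and is not necessarily how the original code loads the data.

```python
from sklearn.datasets import load_iris

iris = load_iris()  # bundled with scikit-learn, no download needed

print(iris.data.shape)          # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)       # sepal/petal length and width, all in cm
print(list(iris.target_names))  # the three species; target 0 is 'setosa'
```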

PDF on Iris:-

Fig:- PDF of petal length where species == 'setosa'

CDF on Iris:-

Fig:- CDF of iris_setosa using petal length

PDF and CDF visualised together:-

Fig:- PDF and CDF

A detailed explanation with Python code is available on GitHub here.
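The GitHub code itself is not reproduced here. As a rough sketch of one common approach, assuming NumPy and scikit-learn are available, the PDF and CDF of setosa petal length can be estimated from a 10-bin histogram; the bin count and variable names are assumptions, not taken from the original code.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Petal length is column 2 of iris.data; target 0 is the 'setosa' species.
setosa_petal_length = iris.data[iris.target == 0, 2]

# Bin the 50 setosa petal lengths into 10 equal-width bins.
counts, bin_edges = np.histogram(setosa_petal_length, bins=10)

# Relative frequency per bin: a histogram-based estimate of the distribution.
# (This is a probability per bin; dividing by the bin width would give a density.)
pdf = counts / counts.sum()

# Empirical CDF at each right bin edge: running total of the bin probabilities.
cdf = np.cumsum(pdf)

for right_edge, p, c in zip(bin_edges[1:], pdf, cdf):
    print(f"petal length <= {right_edge:.2f} cm: bin prob {p:.2f}, CDF {c:.2f}")
```

To draw curves like the figures above, the arrays can be plotted with matplotlib, for example `plt.plot(bin_edges[1:], pdf)` and `plt.plot(bin_edges[1:], cdf)`.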

References:

Iris Dataset (www.ritchieng.com)

Probability Density Function (PDF) (www.investopedia.com)

Cumulative Distribution Function (www.probabilitycourse.com)

What is CDF – Cumulative distribution function? (math.stackexchange.com)
