The Fundamental Theorem of Calculus (FTC) bridges the concepts of differentiation and integration, serving as a cornerstone in mathematical analysis. In probability theory, this theorem establishes a crucial relationship between probability density functions (PDFs) and cumulative distribution functions (CDFs). In this section, we will explore this relationship and demonstrate how the FTC applies to probability distributions.
The FTC consists of two parts:
1. If \( f \) is continuous on \([a, b]\) and \( F \) is an antiderivative of \( f \) on \([a, b]\), then:

   \( \displaystyle \int_a^b f(x) \, dx = F(b) - F(a) \)

2. If \( f \) is continuous on an open interval \( I \) and \( a \) is any point in \( I \), then for every \( x \) in \( I \):

   \( \displaystyle \frac{d}{dx} \left( \int_a^x f(t) \, dt \right) = f(x) \)
In probability theory, the cumulative distribution function (CDF) of a continuous random variable \( X \) is defined as:
\( F_X(x) = P(X \leq x) = \displaystyle \int_{-\infty}^x f_X(t) \, dt \)
where \( f_X(x) \) is the probability density function (PDF) of \( X \).
Using the FTC, we can establish the relationship between PDFs and CDFs:
1. The CDF is the integral of the PDF:

   \( F_X(x) = \displaystyle \int_{-\infty}^x f_X(t) \, dt \)

2. The PDF is the derivative of the CDF, at every point where \( f_X \) is continuous:

   \( f_X(x) = \dfrac{d}{dx} F_X(x) \)
Consider the standard normal distribution with PDF:
\( f_X(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2} \)
The CDF is:
\( F_X(x) = \displaystyle \int_{-\infty}^x \dfrac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt \)
Although this CDF has no closed-form expression in terms of elementary functions, its derivative still recovers the PDF, exactly as the second part of the FTC guarantees:
\( \dfrac{d}{dx} F_X(x) = f_X(x) \)
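We can verify this numerically. The sketch below, assuming SciPy is available, compares a centered finite difference of the standard normal CDF against the PDF:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 13)
h = 1e-5

# Centered finite-difference approximation of dF/dx.
cdf_slope = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

# The slope of the CDF matches the PDF up to discretization error.
print(np.max(np.abs(cdf_slope - norm.pdf(x))))  # on the order of 1e-11
```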
In this exercise, we will:

1. Define a discrete random variable and compute its theoretical mean and variance.
2. Generate random samples from it using inverse transform sampling.
3. Show graphically how the empirical distribution converges to the theoretical distribution as the sample size grows.
4. Compute the empirical mean and variance incrementally with Welford's algorithm and compare them with the theoretical values.
Let's define a discrete random variable \( X \) that takes values in \( \{1, 2, 3, 4, 5\} \) with the following arbitrary probabilities:
| \( x \) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| \( P(X = x) \) | 0.1 | 0.2 | 0.4 | 0.2 | 0.1 |
Mean (\( \mu \)):
\( \mu = E[X] = \displaystyle\sum_{i=1}^{5} x_i P(X = x_i) = (1)(0.1) + (2)(0.2) + (3)(0.4) + (4)(0.2) + (5)(0.1) = 0.1 + 0.4 + 1.2 + 0.8 + 0.5 = 3 \)
Variance (\( \sigma^2 \)):
\( \begin{align*} \sigma^2 &= E[(X - \mu)^2] = \sum_{i=1}^{5} (x_i - \mu)^2 P(X = x_i) \\ &= (1 - 3)^2(0.1) + (2 - 3)^2(0.2) + (3 - 3)^2(0.4) + (4 - 3)^2(0.2) + (5 - 3)^2(0.1) \\ &= (4)(0.1) + (1)(0.2) + (0)(0.4) + (1)(0.2) + (4)(0.1) \\ &= 0.4 + 0.2 + 0 + 0.2 + 0.4 = 1.2 \end{align*} \)
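As a quick sanity check, the same two sums in NumPy:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

mu = np.sum(values * probs)               # E[X] = 3.0
var = np.sum((values - mu) ** 2 * probs)  # E[(X - mu)^2] = 1.2
print(mu, var)
```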
We will use inverse transform sampling to generate random samples from this distribution: draw \( U \sim \mathrm{Uniform}(0, 1) \) and return the smallest value \( x \) whose cumulative probability \( F_X(x) \) reaches \( U \).
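A minimal sketch of such a sampler, assuming NumPy (the helper name `sample_discrete` is our own, not a library function):

```python
import numpy as np

def sample_discrete(n, values, probs, seed=None):
    """Draw n samples via inverse transform sampling."""
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(probs)      # discrete CDF: [0.1, 0.3, 0.7, 0.9, 1.0]
    u = rng.uniform(size=n)     # U ~ Uniform(0, 1)
    # Index of the smallest value whose cumulative probability exceeds u.
    idx = np.searchsorted(cdf, u, side="right")
    return np.asarray(values)[idx]

samples = sample_discrete(10, [1, 2, 3, 4, 5], [0.1, 0.2, 0.4, 0.2, 0.1], seed=0)
print(samples)
```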
We will generate samples of increasing sizes (\( N = 10^2, 10^3, 10^4, 10^5 \)) and graphically show how the empirical distribution converges to the theoretical distribution.
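A minimal, text-only sketch of that convergence check (a bar plot of empirical versus theoretical probabilities for each \( N \) conveys the same idea graphically):

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
cdf = np.cumsum(probs)

print("N        empirical P(X = 1..5)")
for n in (10**2, 10**3, 10**4, 10**5):
    samples = values[np.searchsorted(cdf, rng.uniform(size=n), side="right")]
    freq = np.bincount(samples, minlength=6)[1:] / n  # relative frequencies
    print(f"{n:<8d} {np.round(freq, 3)}")
print("theory  ", probs)
```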
Welford's algorithm provides a numerically stable method for computing the mean and variance incrementally, in a single pass: for each new sample \( x \) it sets \( \delta = x - \bar{x} \), updates \( \bar{x} \leftarrow \bar{x} + \delta/n \), and accumulates \( M_2 \leftarrow M_2 + \delta\,(x - \bar{x}) \) using the updated mean; the variance is then \( M_2 / n \).
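One way to code this up (the class name `Welford` is our own choice):

```python
class Welford:
    """Running mean and variance via Welford's single-pass updates."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # note: uses the *updated* mean

    @property
    def variance(self):
        # Population variance; use (n - 1) in the denominator for the
        # unbiased sample variance.
        return self.m2 / self.n if self.n else float("nan")
```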
We will feed the generated samples through Welford's algorithm and compare the resulting empirical mean and variance with the theoretical values \( \mu = 3 \) and \( \sigma^2 = 1.2 \), discussing how the estimates converge to those values as the sample size grows.
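Putting the pieces together, a short end-to-end sketch (it assumes the `Welford` class defined above):

```python
import numpy as np

rng = np.random.default_rng(1)
values = np.array([1, 2, 3, 4, 5])
cdf = np.cumsum([0.1, 0.2, 0.4, 0.2, 0.1])

w = Welford()  # the class sketched above
for x in values[np.searchsorted(cdf, rng.uniform(size=10**5), side="right")]:
    w.update(x)

print(f"empirical mean     {w.mean:.4f}  (theory 3)")
print(f"empirical variance {w.variance:.4f}  (theory 1.2)")
```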