Proving the Cauchy-Schwarz Inequality and Exploring Statistical Concepts


Introduction

This article covers three fundamental topics in statistics and probability theory:

  1. Proof of the Cauchy-Schwarz Inequality: A simple yet powerful inequality with applications in various fields, including statistics, linear algebra, and analysis.
  2. Independence vs. Uncorrelation: A reflection on the conceptual differences between statistical independence and uncorrelated random variables, along with measures to assess them.
  3. Euler-Maruyama Simulator Enhancement: Developing a unified simulation framework for stochastic differential equations (SDEs) using the Euler-Maruyama method.

Additionally, an optional exercise explores the derivation of regression coefficients using the least squares method and their relationship with the coefficient of determination, \( R^2 \).

1. Proving the Cauchy-Schwarz Inequality

The Cauchy-Schwarz inequality is a fundamental result in linear algebra and analysis. In the context of real-valued sequences or vectors, it states that for any two sequences \( \{x_i\} \) and \( \{y_i\} \):

\( \left( \sum_{i=1}^{n} x_i y_i \right)^2 \leq \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right) \)

A Simple Proof via a Quadratic in \( t \)

Consider any real numbers \( x_i \) and \( y_i \) for \( i = 1, 2, \dots, n \). Define the real-valued function:

\( f(t) = \sum_{i=1}^{n} (x_i - t y_i)^2 \)

Since \( f(t) \) is a sum of squares, \( f(t) \geq 0 \) for every real \( t \). Expand \( f(t) \):

\( f(t) = \sum_{i=1}^{n} (x_i^2 - 2 x_i y_i t + y_i^2 t^2 ) = \sum_{i=1}^{n} x_i^2 - 2t \sum_{i=1}^{n} x_i y_i + t^2 \sum_{i=1}^{n} y_i^2 \)

Consider \( f(t) \) as a quadratic function in \( t \):

\( f(t) = A t^2 - 2B t + C \)

where \( A = \sum_{i=1}^{n} y_i^2 \), \( B = \sum_{i=1}^{n} x_i y_i \), and \( C = \sum_{i=1}^{n} x_i^2 \). If \( A = 0 \), then every \( y_i = 0 \) and the inequality holds trivially, so assume \( A > 0 \).

Since \( f(t) \geq 0 \) for all \( t \), the parabola cannot cross the horizontal axis, so the discriminant of this quadratic must be non-positive:

\( \Delta = (-2B)^2 - 4AC = 4B^2 - 4AC \leq 0 \)

Simplify:

\( 4B^2 - 4AC \leq 0 \quad \Rightarrow \quad B^2 \leq AC \)

Therefore:

\( \left( \sum_{i=1}^{n} x_i y_i \right)^2 \leq \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right) \)
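As a quick numerical sanity check (a minimal sketch assuming NumPy; the random vectors are invented for this illustration), the inequality can be verified directly:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)

lhs = np.dot(x, y) ** 2                # (sum of x_i * y_i)^2
rhs = np.sum(x ** 2) * np.sum(y ** 2)  # (sum of x_i^2) * (sum of y_i^2)
assert lhs <= rhs                      # holds for any real vectors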

Normalization: Bounding the Correlation Coefficient \( r \)

In statistics, the Cauchy-Schwarz inequality underpins the fact that the absolute value of the Pearson correlation coefficient \( r \) does not exceed 1.

The Pearson correlation coefficient is defined as:

\( r = \dfrac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2 \sum_{i=1}^{n} (y_i - \overline{y})^2}} \)

Applying the Cauchy-Schwarz inequality to the centered variables \( x_i - \overline{x} \) and \( y_i - \overline{y} \) bounds the square of the numerator by the product of sums under the square root, which guarantees \( -1 \leq r \leq 1 \).
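As a brief illustration (another minimal sketch assuming NumPy, with synthetic data invented for the example), computing \( r \) for a sample confirms that it lies in \( [-1, 1] \):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)  # partially correlated data

r = np.corrcoef(x, y)[0, 1]         # Pearson correlation coefficient
assert -1.0 <= r <= 1.0             # guaranteed by the Cauchy-Schwarz inequality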

2. Independence vs. Uncorrelation: Conceptual Differences and Measures

Definitions

Statistical Independence: Two random variables \( X \) and \( Y \) are independent if the occurrence of one does not affect the probability distribution of the other. Formally:

\( P(X = x \text{ and } Y = y) = P(X = x) \times P(Y = y) \quad \text{for all } x \text{ and } y \)

(For continuous variables, the analogous statement is that the joint density factors into the product of the marginal densities.)

Uncorrelation: Two random variables \( X \) and \( Y \) are uncorrelated if their covariance is zero:

\( \text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = 0 \)

Conceptual Differences

Dependency vs. Linear Relationship: Independence rules out any relationship between \( X \) and \( Y \), linear or otherwise. Zero correlation only rules out a linear relationship; nonlinear dependence can remain.

Implications: Independence implies zero correlation (whenever the covariance exists), but uncorrelated variables need not be independent. The converse does hold in special cases, for example for jointly Gaussian variables.

Examples

Independent Variables Are Uncorrelated:

Let \( X \) and \( Y \) be independent random variables with finite variances. Independence gives \( E[XY] = E[X]E[Y] \), so \( \text{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 \).

Uncorrelated but Dependent Variables:

Let \( X \) be a standard normal variable, and define \( Y = X^2 \). Then \( E[X] = 0 \), \( E[Y] = E[X^2] = 1 \), and \( E[XY] = E[X^3] = 0 \), so \( \text{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 - 0 \times 1 = 0 \).

\( X \) and \( Y \) are uncorrelated but clearly dependent since \( Y \) is determined by \( X \).
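The following sketch (assuming NumPy; the sample size is chosen purely for illustration) reproduces this example numerically: the sample covariance of \( X \) and \( Y = X^2 \) is close to zero, while correlating \( X^2 \) with \( Y \) exposes the dependence:

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)
y = x ** 2                           # Y is fully determined by X

print(np.cov(x, y)[0, 1])            # close to 0: uncorrelated
print(np.corrcoef(x ** 2, y)[0, 1])  # equal to 1: clearly dependent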

Measures to Assess Independence and Uncorrelation

Uncorrelation is assessed with the sample covariance or the Pearson correlation coefficient, which capture only linear association. Assessing independence requires stronger tools, such as mutual information, chi-square tests of independence for categorical data, or distance correlation, all of which can detect nonlinear dependence.

Implications in Statistical Modeling

Some models require only uncorrelated errors (for example, the Gauss-Markov conditions for ordinary least squares), while others assume full independence. Treating uncorrelated variables as if they were independent can understate the dependence in the data; an important exception is the jointly Gaussian case, where zero correlation does imply independence.

3. Euler-Maruyama Simulator Enhancement

Developing a Unified Simulation Framework for SDEs

The Euler-Maruyama method is a numerical technique used to approximate solutions to stochastic differential equations (SDEs). For an SDE of the form \( dX_t = a(X_t, t)\,dt + b(X_t, t)\,dW_t \), the scheme advances the solution by \( X_{t+\Delta t} = X_t + a(X_t, t)\,\Delta t + b(X_t, t)\,\Delta W_t \), where \( \Delta W_t \sim \mathcal{N}(0, \Delta t) \). Enhancing the simulator involves creating a general framework that can handle various types of SDEs.

Goals

Provide a single, reusable interface that accepts arbitrary drift and diffusion functions, so that different SDEs (Ornstein-Uhlenbeck, geometric Brownian motion, and others) can be simulated with the same code path.

Implementation Steps

Define the Base SDE Class

import numpy as np

class SDE:
    def __init__(self, drift_function, diffusion_function, initial_value, time_grid):
        self.drift = drift_function          # drift term a(x, t)
        self.diffusion = diffusion_function  # diffusion term b(x, t)
        self.x0 = initial_value              # initial condition X(t_0)
        self.time_grid = time_grid           # array of time points
Implement the Euler-Maruyama Method

    # This method belongs to the SDE class defined above
    def simulate(self):
        x = np.zeros(len(self.time_grid))
        x[0] = self.x0
        for i in range(1, len(self.time_grid)):
            t = self.time_grid[i - 1]
            dt = self.time_grid[i] - self.time_grid[i - 1]  # step size (allows non-uniform grids)
            # Euler-Maruyama update: x_i = x_{i-1} + a(x, t) * dt + b(x, t) * dW
            x[i] = x[i - 1] + self.drift(x[i - 1], t) * dt + self.diffusion(x[i - 1], t) * np.sqrt(dt) * np.random.normal()
        return x
Example Usage

Ornstein-Uhlenbeck Process:


def ou_drift(x, t):
    theta = 0.7  # speed of mean reversion
    mu = 1.5     # long-run mean
    return theta * (mu - x)

def ou_diffusion(x, t):
    sigma = 0.2  # constant volatility
    return sigma

sde = SDE(drift_function=ou_drift, diffusion_function=ou_diffusion,
          initial_value=0, time_grid=np.linspace(0, 1, 1000))
simulation = sde.simulate()

Extensibility

Because the drift and diffusion are supplied as plain functions, new processes can be simulated without modifying the simulator itself, as the sketch below illustrates.
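A minimal sketch (the parameter values are illustrative assumptions, not taken from the article) reusing the same SDE class for geometric Brownian motion, \( dX_t = \mu X_t \, dt + \sigma X_t \, dW_t \):

def gbm_drift(x, t):
    mu = 0.05    # illustrative drift rate
    return mu * x

def gbm_diffusion(x, t):
    sigma = 0.3  # illustrative volatility
    return sigma * x

gbm = SDE(drift_function=gbm_drift, diffusion_function=gbm_diffusion,
          initial_value=1.0, time_grid=np.linspace(0, 1, 1000))
gbm_path = gbm.simulate()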

Benefits

A single, well-tested simulation loop serves every model: comparing processes, changing time grids, or swapping parameters requires only new drift and diffusion functions.

Optional Exercise: Deriving Regression Coefficients and Relationship with \( R^2 \)

Deriving the Least Squares Regression Coefficients

For a simple linear regression model:

\( y = a + b x + \varepsilon \)

where \( y \) is the dependent variable, \( x \) is the independent variable, \( a \) is the intercept, \( b \) is the slope, and \( \varepsilon \) is a zero-mean error term.

Coefficients \( a \) (Intercept) and \( b \) (Slope)

Minimize the sum of squared errors:

\( S = \sum_{i=1}^{n} (y_i - a - b x_i)^2 \)

Take partial derivatives with respect to \( a \) and \( b \), set them to zero:

With respect to \( b \):

\( \dfrac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0 \)

With respect to \( a \):

\( \dfrac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \)

Solve the normal equations:

\( \begin{cases} \sum y_i = n a + b \sum x_i \\ \sum x_i y_i = a \sum x_i + b \sum x_i^2 \end{cases} \)

Solving the first equation for \( a \) gives \( a = \overline{y} - b\,\overline{x} \); substituting into the second equation and simplifying yields:

\( b = \dfrac{\sum (x_i - \overline{x})(y_i - \overline{y})}{\sum (x_i - \overline{x})^2} \)

\( a = \overline{y} - b \overline{x} \)
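As a small numerical check (a sketch assuming NumPy; the synthetic data and variable names are invented for this illustration), the closed-form estimates can be computed directly:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)  # noisy linear data

x_bar, y_bar = x.mean(), y.mean()
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope estimate
a = y_bar - b * x_bar                                             # intercept estimate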

Relationship with \( R^2 \) (Coefficient of Determination)

\( R^2 \) measures the proportion of variance in \( y \) explained by \( x \):

\( R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} \)

where \( SST = \sum_{i=1}^{n} (y_i - \overline{y})^2 \) is the total sum of squares, \( SSR = \sum_{i=1}^{n} (\hat{y}_i - \overline{y})^2 \) is the regression (explained) sum of squares, \( SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \) is the residual sum of squares, and \( \hat{y}_i = a + b x_i \) are the fitted values.

Connection between \( R^2 \) and Correlation Coefficient \( r \)

In simple linear regression:

\( R^2 = r^2 \)

Where \( r \) is the Pearson correlation coefficient between \( x \) and \( y \).
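Continuing the sketch above (it assumes the arrays x and y and the coefficients a and b from the previous snippet), \( R^2 \) computed from the sums of squares matches the squared Pearson correlation:

y_hat = a + b * x
sse = np.sum((y - y_hat) ** 2)        # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - sse / sst

r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_squared, r ** 2))  # True: R^2 = r^2 in simple linear regression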

Conclusion

This article provided:

  1. A proof of the Cauchy-Schwarz inequality and its role in bounding the Pearson correlation coefficient.
  2. A comparison of statistical independence and uncorrelation, with examples of uncorrelated but dependent variables.
  3. A unified Euler-Maruyama simulation framework for stochastic differential equations, illustrated with the Ornstein-Uhlenbeck process.
  4. An optional exercise deriving the least squares regression coefficients and their relationship with \( R^2 \).

Understanding these concepts deepens our comprehension of statistical principles and enhances our ability to model and analyze complex systems.