This article covers three fundamental topics in statistics and probability theory: the Cauchy-Schwarz inequality and its role in bounding the Pearson correlation coefficient, the distinction between statistical independence and uncorrelation, and the Euler-Maruyama method for simulating stochastic differential equations.
Additionally, an optional exercise explores the derivation of regression coefficients using the least squares method and their relationship with the coefficient of determination, \( R^2 \).
The Cauchy-Schwarz inequality is a fundamental result in linear algebra and analysis. In the context of real-valued sequences or vectors, it states that for any two sequences \( \{x_i\} \) and \( \{y_i\} \):
\( \left( \sum_{i=1}^{n} x_i y_i \right)^2 \leq \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right) \)
Consider any real numbers \( x_i \) and \( y_i \) for \( i = 1, 2, \dots, n \). Define the real-valued function:
\( f(t) = \sum_{i=1}^{n} (x_i - t y_i)^2 \)
Since each term is a square, \( f(t) \geq 0 \) for every real \( t \). Expanding \( f(t) \):
\( f(t) = \sum_{i=1}^{n} (x_i^2 - 2 x_i y_i t + y_i^2 t^2 ) = \sum_{i=1}^{n} x_i^2 - 2t \sum_{i=1}^{n} x_i y_i + t^2 \sum_{i=1}^{n} y_i^2 \)
Consider \( f(t) \) as a quadratic function in \( t \):
\( f(t) = A t^2 - 2B t + C \)
Where:
\( A = \sum_{i=1}^{n} y_i^2, \quad B = \sum_{i=1}^{n} x_i y_i, \quad C = \sum_{i=1}^{n} x_i^2 \)
Since \( f(t) \geq 0 \) for all \( t \), the quadratic has at most one real root, so its discriminant must be non-positive (if \( A = 0 \), every \( y_i \) is zero and the inequality holds trivially):
\( \Delta = (-2B)^2 - 4AC = 4B^2 - 4AC \leq 0 \)
Simplify:
\( 4B^2 - 4AC \leq 0 \quad \Rightarrow \quad B^2 \leq AC \)
Therefore:
\( \left( \sum_{i=1}^{n} x_i y_i \right)^2 \leq \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right) \)
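As a quick numerical sanity check (not a substitute for the proof above), the inequality can be verified for arbitrary real vectors with NumPy; the random vectors below are purely illustrative:

import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(50)
y = rng.standard_normal(50)

lhs = np.dot(x, y) ** 2              # (sum of x_i y_i)^2
rhs = np.dot(x, x) * np.dot(y, y)    # (sum of x_i^2)(sum of y_i^2)
print(lhs <= rhs)                    # True for any choice of x and y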
In statistics, the Cauchy-Schwarz inequality underpins the fact that the absolute value of the Pearson correlation coefficient \( r \) does not exceed 1.
The Pearson correlation coefficient is defined as:
\( r = \dfrac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2 \sum_{i=1}^{n} (y_i - \overline{y})^2}} \)
Applying the Cauchy-Schwarz inequality to centered variables \( x_i - \overline{x} \) and \( y_i - \overline{y} \), we ensure that \( -1 \leq r \leq 1 \).
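A minimal sketch of this bound in code, assuming NumPy and synthetic data (the linear relationship below is only illustrative): computing \( r \) directly from the centered variables shows \( |r| \leq 1 \) and agrees with np.corrcoef.

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 2.0 * x + rng.standard_normal(200)   # illustrative linear relationship with noise

xc, yc = x - x.mean(), y - y.mean()      # centered variables
r = xc @ yc / np.sqrt((xc @ xc) * (yc @ yc))
print(r, abs(r) <= 1.0)                  # matches np.corrcoef(x, y)[0, 1]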
Statistical Independence: Two random variables \( X \) and \( Y \) are independent if the occurrence of one does not affect the probability distribution of the other. Formally, for discrete variables:
\( P(X = x \text{ and } Y = y) = P(X = x) \times P(Y = y) \quad \text{for all } x, y \)
Uncorrelation: Two random variables \( X \) and \( Y \) are uncorrelated if their covariance is zero:
\( \text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = 0 \)
Let \( X \) and \( Y \) be independent random variables with finite variances. Independence implies \( E[XY] = E[X]E[Y] \), so \( \text{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 \). The converse does not hold, as the following example shows.
Let \( X \) be a standard normal variable and define \( Y = X^2 \). Then \( E[X] = 0 \), \( E[Y] = E[X^2] = 1 \), and \( E[XY] = E[X^3] = 0 \) because the odd moments of a standard normal vanish, so \( \text{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 - 0 \times 1 = 0 \).
\( X \) and \( Y \) are uncorrelated but clearly dependent since \( Y \) is determined by \( X \).
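A short simulation (sample size chosen arbitrarily) makes the point concrete: the sample covariance of \( X \) and \( Y = X^2 \) is near zero even though \( Y \) is completely determined by \( X \).

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2                               # Y is a deterministic function of X

print(np.cov(x, y)[0, 1])                # sample covariance: approximately 0
print(np.corrcoef(np.abs(x), y)[0, 1])   # dependence is obvious once we look at |X|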
Covariance measures the joint variability of two random variables; the Pearson correlation coefficient \( r \) quantifies the strength and direction of their linear relationship.
Checking for zero covariance is therefore insufficient to establish independence; higher moments such as skewness and kurtosis can reveal non-linear dependencies.
Mutual Information: a measure from information theory that quantifies the total amount of information shared between two variables; \( I(X; Y) = 0 \) if and only if \( X \) and \( Y \) are independent.
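As a rough sketch of how mutual information can be estimated in practice (a plug-in estimate from a binned joint histogram; the bin count and sample size are arbitrary choices, and binned estimates are biased), the uncorrelated-but-dependent pair from the example above gives a clearly positive value:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x ** 2

counts, _, _ = np.histogram2d(x, y, bins=30)   # joint histogram
pxy = counts / counts.sum()                    # empirical joint distribution
px = pxy.sum(axis=1, keepdims=True)            # marginal of X
py = pxy.sum(axis=0, keepdims=True)            # marginal of Y
mask = pxy > 0
mi = np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))
print(mi)                                      # positive despite zero covariance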
Chi-Squared Test of Independence: for categorical data, this test assesses whether observed frequencies differ from the frequencies expected under independence.
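For the categorical case, a minimal example using SciPy's chi2_contingency on a hypothetical 2x2 contingency table (the counts are invented for illustration):

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 10],    # hypothetical counts: rows = groups,
                     [20, 20]])   # columns = outcome categories

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value, dof)         # a small p-value argues against independence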
The Euler-Maruyama method is a numerical technique used to approximate solutions to stochastic differential equations (SDEs). Enhancing the simulator involves creating a general framework that can handle various types of SDEs.
import numpy as np

class SDE:
    """Euler-Maruyama simulator for a one-dimensional SDE dX = a(X, t) dt + b(X, t) dW."""
    def __init__(self, drift_function, diffusion_function, initial_value, time_grid):
        self.drift = drift_function          # drift coefficient a(x, t)
        self.diffusion = diffusion_function  # diffusion coefficient b(x, t)
        self.x0 = initial_value
        self.time_grid = time_grid

    def simulate(self):
        x = np.zeros(len(self.time_grid))
        x[0] = self.x0
        for i in range(1, len(self.time_grid)):
            t = self.time_grid[i - 1]
            dt = self.time_grid[i] - self.time_grid[i - 1]  # step size; also handles non-uniform grids
            # Euler-Maruyama update: deterministic drift step plus a Gaussian diffusion increment
            x[i] = (x[i - 1]
                    + self.drift(x[i - 1], t) * dt
                    + self.diffusion(x[i - 1], t) * np.sqrt(dt) * np.random.normal())
        return x
Ornstein-Uhlenbeck Process:
def ou_drift(x, t):
    theta = 0.7   # mean-reversion speed
    mu = 1.5      # long-run mean
    return theta * (mu - x)

def ou_diffusion(x, t):
    sigma = 0.2   # constant volatility
    return sigma

sde = SDE(drift_function=ou_drift, diffusion_function=ou_diffusion,
          initial_value=0, time_grid=np.linspace(0, 1, 1000))
simulation = sde.simulate()
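As a further, hypothetical illustration of the framework's generality (the parameters below are arbitrary and not part of the original example), geometric Brownian motion only requires different drift and diffusion functions:

def gbm_drift(x, t):
    mu = 0.05     # assumed drift rate (illustrative)
    return mu * x

def gbm_diffusion(x, t):
    sigma = 0.2   # assumed volatility (illustrative)
    return sigma * x

gbm = SDE(drift_function=gbm_drift, diffusion_function=gbm_diffusion,
          initial_value=1.0, time_grid=np.linspace(0, 1, 1000))
gbm_path = gbm.simulate()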
For a simple linear regression model:
\( y = a + b x + \varepsilon \)
Where:
\( y \) is the response variable, \( x \) is the predictor, \( a \) is the intercept, \( b \) is the slope, and \( \varepsilon \) is a random error term.
Minimize the sum of squared errors:
\( S = \sum_{i=1}^{n} (y_i - a - b x_i)^2 \)
Take partial derivatives with respect to \( a \) and \( b \), set them to zero:
\( \dfrac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0 \)
\( \dfrac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \)
Solve the normal equations:
\( \begin{cases} \sum y_i = n a + b \sum x_i \\ \sum x_i y_i = a \sum x_i + b \sum x_i^2 \end{cases} \)
Express \( a \) and \( b \):
\( b = \dfrac{\sum (x_i - \overline{x})(y_i - \overline{y})}{\sum (x_i - \overline{x})^2} \)
\( a = \overline{y} - b \overline{x} \)
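A brief numerical check of these formulas on synthetic data, assuming NumPy (the true intercept and slope below are arbitrary), compared against np.polyfit:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 3.0 + 1.5 * x + rng.standard_normal(100)   # illustrative model: a = 3.0, b = 1.5

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(a, b)                      # close to 3.0 and 1.5
print(np.polyfit(x, y, 1))       # [slope, intercept] from NumPy agrees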
\( R^2 \) measures the proportion of variance in \( y \) explained by \( x \):
\( R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} \)
Where:
\( SST = \sum_{i=1}^{n} (y_i - \overline{y})^2 \) is the total sum of squares, \( SSR = \sum_{i=1}^{n} (\hat{y}_i - \overline{y})^2 \) is the regression (explained) sum of squares, and \( SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \) is the residual sum of squares.
In simple linear regression:
\( R^2 = r^2 \)
Where \( r \) is the Pearson correlation coefficient between \( x \) and \( y \).
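A short sketch verifying \( R^2 = r^2 \) on synthetic data (the coefficients are arbitrary illustrative values):

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 100)
y = 2.0 - 0.5 * x + rng.standard_normal(100)   # illustrative data

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)              # total sum of squares
sse = np.sum((y - y_hat) ** 2)                 # residual sum of squares
r2 = 1 - sse / sst
r = np.corrcoef(x, y)[0, 1]
print(r2, r ** 2)                              # equal up to floating-point error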
This article provided a proof of the Cauchy-Schwarz inequality and its application to bounding the Pearson correlation coefficient, a comparison of statistical independence and uncorrelation, an Euler-Maruyama simulator for stochastic differential equations, and an optional derivation of the least squares regression coefficients together with their relationship to \( R^2 \).
Understanding these concepts deepens our comprehension of statistical principles and enhances our ability to model and analyze complex systems.