Analyzing Sampling Distributions and Cryptographic Functions Using JavaScript

Introduction

In this article, we will explore two main topics:

Sampling Distributions: Using a discrete probability distribution, we will generate multiple samples of varying sizes, compute the distribution of the sampling averages, and analyze the relationship between the sample statistics and the parent distribution's parameters.
Cryptographic Functions and Distributions: We will examine the distribution of the random variable \( Y = g^U \mod n \), where \( U \) is uniformly distributed, for different values of \( g \) and \( n \). We will analyze the implications of these distributions in terms of cryptographic properties like uniformity and predictability.

Part 1: Sampling Distributions from a Discrete Distribution

Objective

Generate \( m \) samples, each of size \( n \), from a given discrete distribution.
Compute the distribution of the sampling averages.
Determine the mean and variance of the distribution of the sample means.
Discuss the relationship between these statistics and those of the parent distribution.

Discrete Distribution Setup

We will use the same discrete distribution defined previously:

\( x \)	1	2	3	4	5
\( P(X = x) \)	0.1	0.2	0.4	0.2	0.1

Theoretical Mean (\( \mu \)):

\( \mu = E[X] = \sum_{i=1}^{5} x_i P(X = x_i) = 3 \)

Theoretical Variance (\( \sigma^2 \)):

\( \sigma^2 = E[(X - \mu)^2] = 1.2 \)

Generating Samples

We will generate \( m = 1000 \) samples of varying sizes \( n = 20, 30, 100 \).

Methodology

For each sample size \( n \), generate \( m \) samples.
For each sample, compute the sample mean.
Collect all the sample means to form the distribution of the sampling averages.

Implementation in JavaScript

We will use JavaScript to implement the simulation. The code can be run in a web browser or a JavaScript environment like Node.js. For visualization, we'll use a JavaScript plotting library such as Chart.js.

Code: Generating Samples and Computing Sample Means


// Define the discrete distribution
const xValues = [1, 2, 3, 4, 5];
const probabilities = [0.1, 0.2, 0.4, 0.2, 0.1];

// Compute the cumulative distribution function (CDF)
const cdf = probabilities.reduce((acc, prob, index) => {
    acc.push(prob + (acc[index - 1] || 0));
    return acc;
}, []);

// Function to generate a single sample of size n
function generateSample(n) {
    const sample = [];
    for (let i = 0; i < n; i++) {
        const u = Math.random();
        const xIndex = cdf.findIndex(cumProb => cumProb >= u);
        sample.push(xValues[xIndex]);
    }
    return sample;
}

// Function to generate m samples of size n and compute their means
function samplingDistribution(m, n) {
    const sampleMeans = [];
    for (let i = 0; i < m; i++) {
        const sample = generateSample(n);
        const mean = sample.reduce((sum, val) => sum + val, 0) / n;
        sampleMeans.push(mean);
    }
    return sampleMeans;
}

Computing the Sampling Distribution

For each \( n \), we compute the sampling distribution and visualize it.

Code: Visualizing the Sampling Distribution


// Sample sizes and number of samples
const sampleSizes = [20, 30, 100];
const m = 1000;

// Function to compute mean and variance
function computeStats(data) {
    const mean = data.reduce((sum, val) => sum + val, 0) / data.length;
    const variance = data.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / data.length;
    return { mean, variance };
}

// Visualization using Chart.js
function plotSamplingDistributions() {
    sampleSizes.forEach((n, index) => {
        const sampleMeans = samplingDistribution(m, n);
        const { mean: meanOfMeans, variance: varianceOfMeans } = computeStats(sampleMeans);

        // Create histogram data
        const histogramData = {};
        sampleMeans.forEach(mean => {
            const bin = mean.toFixed(2);
            histogramData[bin] = (histogramData[bin] || 0) + 1;
        });

        const labels = Object.keys(histogramData);
        const data = Object.values(histogramData);

        // Create a canvas element for the chart
        const canvas = document.createElement('canvas');
        document.body.appendChild(canvas);

        // Create the chart
        new Chart(canvas.getContext('2d'), {
            type: 'bar',
            data: {
                labels: labels,
                datasets: [{
                    label: `Sample Size n=${n}`,
                    data: data,
                    backgroundColor: 'rgba(54, 162, 235, 0.7)',
                }]
            },
            options: {
                title: {
                    display: true,
                    text: `Sample Size n=${n}, Mean=${meanOfMeans.toFixed(2)}, Variance=${varianceOfMeans.toFixed(4)}`
                },
                scales: {
                    xAxes: [{ scaleLabel: { display: true, labelString: 'Sample Mean' } }],
                    yAxes: [{ scaleLabel: { display: true, labelString: 'Frequency' } }]
                }
            }
        });
    });
}

// Run the visualization
plotSamplingDistributions();

Note: To run this code, include the Chart.js library in your HTML file:


<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>

Results and Observations

For each sample size \( n \), we observe the following:

Sample Size \( n = 20 \)

Mean of Sample Means: Approximately 3.0
Variance of Sample Means: Decreases compared to the parent variance.
Distribution Shape: Wider spread due to smaller sample size.

Sample Size \( n = 30 \)

Mean of Sample Means: Still around 3.0
Variance of Sample Means: Further decreases.
Distribution Shape: Slightly narrower.

Sample Size \( n = 100 \)

Mean of Sample Means: Close to 3.0
Variance of Sample Means: Significantly smaller.
Distribution Shape: Much narrower, resembling a normal distribution.

Relationship with the Parent Distribution

Mean: The mean of the sampling distribution (\( \mu_{\bar{X}} \)) equals the theoretical mean (\( \mu \)) of the parent distribution.

\( \mu_{\bar{X}} = \mu \)

Variance: The variance of the sampling distribution (\( \sigma^2_{\bar{X}} \)) is related to the parent variance (\( \sigma^2 \)) by:

\( \sigma^2_{\bar{X}} = \dfrac{\sigma^2}{n} \)

As \( n \) increases, \( \sigma^2_{\bar{X}} \) decreases, leading to a narrower distribution of sample means.

Conclusion

The sampling distribution of the sample means approaches a normal distribution due to the Central Limit Theorem, especially for larger \( n \).
The mean of the sampling distribution equals the parent mean.
The variance of the sampling distribution decreases with increasing sample size, inversely proportional to \( n \).

Part 2: Cryptographic Functions and Distributions

Objective

Generate the distribution of \( Y = g^U \mod n \), where \( U \) is uniformly distributed in \([1, \text{max}_U]\).
Analyze the distributions for specific values of \( g \) and \( n \).
Compute entropy or other diversity indexes.
Discuss implications for cryptographic properties.

Definitions

Modular Exponentiation: \( Y = g^U \mod n \)
Uniform Random Variable: \( U \sim \text{Uniform}(1, \text{max}_U) \)
Entropy: Measure of randomness in the distribution.

Parameters

Case A

\( n = 19 \)
\( g = 2, 3, 10, 17 \)
\( \text{max}_U = 1000 \)

Case B

\( n = 15 \)
\( g = 3, 6, 9, 12 \)
\( \text{max}_U = 1000 \)

Generating the Distributions

Code: Generating \( Y \) and Computing Entropy


// Function to generate Y distribution
function generateYDistribution(g, n, maxU) {
    const U = [];
    for (let i = 1; i <= maxU; i++) {
        U.push(i);
    }
    const Y = U.map(u => {
        return BigInt(g) ** BigInt(u) % BigInt(n);
    });
    return Y;
}

// Function to compute entropy
function computeEntropy(Y) {
    const counts = {};
    Y.forEach(y => {
        counts[y] = (counts[y] || 0) + 1;
    });
    const probabilities = Object.values(counts).map(count => count / Y.length);
    const entropy = -probabilities.reduce((sum, p) => sum + p * Math.log2(p), 0);
    return entropy;
}

Note: We use BigInt for large exponentiations to handle big numbers in JavaScript.

Case A: \( n = 19 \), \( g = 2, 3, 10, 17 \)

Code: Visualizing Distributions and Computing Entropy


// Parameters
const nA = 19;
const gValuesA = [2, 3, 10, 17];
const maxU = 1000;

// Visualization using Chart.js
function plotDistributionsCaseA() {
    gValuesA.forEach((g, index) => {
        const Y = generateYDistribution(g, nA, maxU);
        const entropy = computeEntropy(Y);

        // Count frequencies
        const counts = {};
        Y.forEach(y => {
            counts[y] = (counts[y] || 0) + 1;
        });

        const yValues = Object.keys(counts);
        const frequencies = yValues.map(y => counts[y]);

        // Create canvas element
        const canvas = document.createElement('canvas');
        document.body.appendChild(canvas);

        // Create chart
        new Chart(canvas.getContext('2d'), {
            type: 'bar',
            data: {
                labels: yValues,
                datasets: [{
                    label: `g = ${g}, Entropy = ${entropy.toFixed(2)}`,
                    data: frequencies,
                    backgroundColor: 'rgba(75, 192, 192, 0.7)',
                }]
            },
            options: {
                title: {
                    display: true,
                    text: `Distribution of Y for g = ${g}, n = ${nA}`
                },
                scales: {
                    xAxes: [{ scaleLabel: { display: true, labelString: 'Y values' } }],
                    yAxes: [{ scaleLabel: { display: true, labelString: 'Frequency' } }]
                }
            }
        });
    });
}

// Run visualization for Case A
plotDistributionsCaseA();

Case B: \( n = 15 \), \( g = 3, 6, 9, 12 \)

Code: Visualizing Distributions and Computing Entropy


// Parameters
const nB = 15;
const gValuesB = [3, 6, 9, 12];

// Visualization using Chart.js
function plotDistributionsCaseB() {
    gValuesB.forEach((g, index) => {
        const Y = generateYDistribution(g, nB, maxU);
        const entropy = computeEntropy(Y);

        // Count frequencies
        const counts = {};
        Y.forEach(y => {
            counts[y] = (counts[y] || 0) + 1;
        });

        const yValues = Object.keys(counts);
        const frequencies = yValues.map(y => counts[y]);

        // Create canvas element
        const canvas = document.createElement('canvas');
        document.body.appendChild(canvas);

        // Create chart
        new Chart(canvas.getContext('2d'), {
            type: 'bar',
            data: {
                labels: yValues,
                datasets: [{
                    label: `g = ${g}, Entropy = ${entropy.toFixed(2)}`,
                    data: frequencies,
                    backgroundColor: 'rgba(255, 159, 64, 0.7)',
                }]
            },
            options: {
                title: {
                    display: true,
                    text: `Distribution of Y for g = ${g}, n = ${nB}`
                },
                scales: {
                    xAxes: [{ scaleLabel: { display: true, labelString: 'Y values' } }],
                    yAxes: [{ scaleLabel: { display: true, labelString: 'Frequency' } }]
                }
            }
        });
    });
}

// Run visualization for Case B
plotDistributionsCaseB();

Note: Ensure that the BigInt exponentiation and modulo operations are supported in your JavaScript environment.

Observations

Case A

Uniformity: The distributions for \( g = 2, 3, 10, 17 \) are more uniform.
Entropy: Higher entropy values, indicating greater unpredictability.
Cryptographic Suitability: Better for cryptographic applications due to uniformity and high entropy.

Case B

Patterns: The distributions for \( g = 3, 6, 9, 12 \) show noticeable patterns and less uniformity.
Entropy: Lower entropy values, indicating predictability.
Vulnerabilities: Potential weaknesses due to lower entropy and predictable outputs.

Computing Entropy and Diversity Indexes

Entropy measures the randomness in the distribution:

\( H(Y) = -\sum_{i} P(Y = y_i) \log_2 P(Y = y_i) \)

Higher Entropy: Indicates a more uniform distribution, desirable in cryptographic applications.
Lower Entropy: Suggests predictability and potential vulnerabilities.

Cryptographic Implications

Case A Advantages

High Entropy: Uniform distributions make it difficult to predict \( Y \) without knowing \( U \) or \( g \).
Primitive Roots: The chosen \( g \) values (2, 3, 10, 17) are primitive roots modulo 19, ensuring maximal cycle lengths and uniformity.
Security: Enhanced cryptographic strength due to unpredictability.

Case B Vulnerabilities

Low Entropy: Non-uniform distributions with discernible patterns.
Predictability: Easier to guess or infer \( Y \), reducing security.
Subgroup Attack Risks: \( g \) values are not primitive roots modulo 15, leading to smaller subgroup orders.

Reason for Choosing \( g = 2, 3, 10, 17 \) in Case A

- **Primitive Roots Modulo 19**: These values are generators of the multiplicative group modulo 19.

- **Maximal Periods**: Ensure that \( g^U \mod n \) cycles through all possible residues before repeating.

- **Uniform Distribution**: Leads to higher entropy and better cryptographic properties.

Spotting Errors in the Exercise

- **Possible Error**: Using a composite modulus \( n = 15 \) instead of a prime number.

- **Implication**: Composite moduli can introduce vulnerabilities due to the presence of factors and non-invertible elements.

- **Correction**: For cryptographic purposes, it is preferable to use a prime modulus to ensure a cyclic group and maximal entropy.

Note: The code snippets provided can be run in a web browser environment. Ensure that you include the necessary libraries and that your JavaScript environment supports the BigInt data type for large integer calculations.