Analyzing Sampling Distributions and Cryptographic Functions Using JavaScript


Introduction

In this article, we will explore two main topics:

  1. Sampling Distributions: Using a discrete probability distribution, we will generate multiple samples of varying sizes, compute the distribution of the sampling averages, and analyze the relationship between the sample statistics and the parent distribution's parameters.
  2. Cryptographic Functions and Distributions: We will examine the distribution of the random variable \( Y = g^U \mod n \), where \( U \) is uniformly distributed, for different values of \( g \) and \( n \). We will analyze the implications of these distributions in terms of cryptographic properties like uniformity and predictability.

Part 1: Sampling Distributions from a Discrete Distribution

Objective

Discrete Distribution Setup

We will use the same discrete distribution defined previously:

\( x \) 1 2 3 4 5
\( P(X = x) \) 0.1 0.2 0.4 0.2 0.1

Theoretical Mean (\( \mu \)):

\( \mu = E[X] = \sum_{i=1}^{5} x_i P(X = x_i) = 3 \)

Theoretical Variance (\( \sigma^2 \)):

\( \sigma^2 = E[(X - \mu)^2] = 1.2 \)

Generating Samples

We will generate \( m = 1000 \) samples of varying sizes \( n = 20, 30, 100 \).

Methodology

  1. For each sample size \( n \), generate \( m \) samples.
  2. For each sample, compute the sample mean.
  3. Collect all the sample means to form the distribution of the sampling averages.

Implementation in JavaScript

We will use JavaScript to implement the simulation. The code can be run in a web browser or a JavaScript environment like Node.js. For visualization, we'll use a JavaScript plotting library such as Chart.js.

Code: Generating Samples and Computing Sample Means


// Define the discrete distribution
const xValues = [1, 2, 3, 4, 5];
const probabilities = [0.1, 0.2, 0.4, 0.2, 0.1];

// Compute the cumulative distribution function (CDF)
const cdf = probabilities.reduce((acc, prob, index) => {
    acc.push(prob + (acc[index - 1] || 0));
    return acc;
}, []);

// Function to generate a single sample of size n
function generateSample(n) {
    const sample = [];
    for (let i = 0; i < n; i++) {
        const u = Math.random();
        const xIndex = cdf.findIndex(cumProb => cumProb >= u);
        sample.push(xValues[xIndex]);
    }
    return sample;
}

// Function to generate m samples of size n and compute their means
function samplingDistribution(m, n) {
    const sampleMeans = [];
    for (let i = 0; i < m; i++) {
        const sample = generateSample(n);
        const mean = sample.reduce((sum, val) => sum + val, 0) / n;
        sampleMeans.push(mean);
    }
    return sampleMeans;
}
            

Computing the Sampling Distribution

For each \( n \), we compute the sampling distribution and visualize it.

Code: Visualizing the Sampling Distribution


// Sample sizes and number of samples
const sampleSizes = [20, 30, 100];
const m = 1000;

// Function to compute mean and variance
function computeStats(data) {
    const mean = data.reduce((sum, val) => sum + val, 0) / data.length;
    const variance = data.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / data.length;
    return { mean, variance };
}

// Visualization using Chart.js
function plotSamplingDistributions() {
    sampleSizes.forEach((n, index) => {
        const sampleMeans = samplingDistribution(m, n);
        const { mean: meanOfMeans, variance: varianceOfMeans } = computeStats(sampleMeans);

        // Create histogram data
        const histogramData = {};
        sampleMeans.forEach(mean => {
            const bin = mean.toFixed(2);
            histogramData[bin] = (histogramData[bin] || 0) + 1;
        });

        const labels = Object.keys(histogramData);
        const data = Object.values(histogramData);

        // Create a canvas element for the chart
        const canvas = document.createElement('canvas');
        document.body.appendChild(canvas);

        // Create the chart
        new Chart(canvas.getContext('2d'), {
            type: 'bar',
            data: {
                labels: labels,
                datasets: [{
                    label: `Sample Size n=${n}`,
                    data: data,
                    backgroundColor: 'rgba(54, 162, 235, 0.7)',
                }]
            },
            options: {
                title: {
                    display: true,
                    text: `Sample Size n=${n}, Mean=${meanOfMeans.toFixed(2)}, Variance=${varianceOfMeans.toFixed(4)}`
                },
                scales: {
                    xAxes: [{ scaleLabel: { display: true, labelString: 'Sample Mean' } }],
                    yAxes: [{ scaleLabel: { display: true, labelString: 'Frequency' } }]
                }
            }
        });
    });
}

// Run the visualization
plotSamplingDistributions();
            

Note: To run this code, include the Chart.js library in your HTML file:


<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
            

Results and Observations

For each sample size \( n \), we observe the following:

Sample Size \( n = 20 \)

Sample Size \( n = 30 \)

Sample Size \( n = 100 \)

Relationship with the Parent Distribution

Mean: The mean of the sampling distribution (\( \mu_{\bar{X}} \)) equals the theoretical mean (\( \mu \)) of the parent distribution.

\( \mu_{\bar{X}} = \mu \)

Variance: The variance of the sampling distribution (\( \sigma^2_{\bar{X}} \)) is related to the parent variance (\( \sigma^2 \)) by:

\( \sigma^2_{\bar{X}} = \dfrac{\sigma^2}{n} \)

As \( n \) increases, \( \sigma^2_{\bar{X}} \) decreases, leading to a narrower distribution of sample means.

Conclusion


Part 2: Cryptographic Functions and Distributions

Objective

Definitions

Parameters

Case A

Case B

Generating the Distributions

Code: Generating \( Y \) and Computing Entropy


// Function to generate Y distribution
function generateYDistribution(g, n, maxU) {
    const U = [];
    for (let i = 1; i <= maxU; i++) {
        U.push(i);
    }
    const Y = U.map(u => {
        return BigInt(g) ** BigInt(u) % BigInt(n);
    });
    return Y;
}

// Function to compute entropy
function computeEntropy(Y) {
    const counts = {};
    Y.forEach(y => {
        counts[y] = (counts[y] || 0) + 1;
    });
    const probabilities = Object.values(counts).map(count => count / Y.length);
    const entropy = -probabilities.reduce((sum, p) => sum + p * Math.log2(p), 0);
    return entropy;
}
            

Note: We use BigInt for large exponentiations to handle big numbers in JavaScript.

Case A: \( n = 19 \), \( g = 2, 3, 10, 17 \)

Code: Visualizing Distributions and Computing Entropy


// Parameters
const nA = 19;
const gValuesA = [2, 3, 10, 17];
const maxU = 1000;

// Visualization using Chart.js
function plotDistributionsCaseA() {
    gValuesA.forEach((g, index) => {
        const Y = generateYDistribution(g, nA, maxU);
        const entropy = computeEntropy(Y);

        // Count frequencies
        const counts = {};
        Y.forEach(y => {
            counts[y] = (counts[y] || 0) + 1;
        });

        const yValues = Object.keys(counts);
        const frequencies = yValues.map(y => counts[y]);

        // Create canvas element
        const canvas = document.createElement('canvas');
        document.body.appendChild(canvas);

        // Create chart
        new Chart(canvas.getContext('2d'), {
            type: 'bar',
            data: {
                labels: yValues,
                datasets: [{
                    label: `g = ${g}, Entropy = ${entropy.toFixed(2)}`,
                    data: frequencies,
                    backgroundColor: 'rgba(75, 192, 192, 0.7)',
                }]
            },
            options: {
                title: {
                    display: true,
                    text: `Distribution of Y for g = ${g}, n = ${nA}`
                },
                scales: {
                    xAxes: [{ scaleLabel: { display: true, labelString: 'Y values' } }],
                    yAxes: [{ scaleLabel: { display: true, labelString: 'Frequency' } }]
                }
            }
        });
    });
}

// Run visualization for Case A
plotDistributionsCaseA();
            

Case B: \( n = 15 \), \( g = 3, 6, 9, 12 \)

Code: Visualizing Distributions and Computing Entropy


// Parameters
const nB = 15;
const gValuesB = [3, 6, 9, 12];

// Visualization using Chart.js
function plotDistributionsCaseB() {
    gValuesB.forEach((g, index) => {
        const Y = generateYDistribution(g, nB, maxU);
        const entropy = computeEntropy(Y);

        // Count frequencies
        const counts = {};
        Y.forEach(y => {
            counts[y] = (counts[y] || 0) + 1;
        });

        const yValues = Object.keys(counts);
        const frequencies = yValues.map(y => counts[y]);

        // Create canvas element
        const canvas = document.createElement('canvas');
        document.body.appendChild(canvas);

        // Create chart
        new Chart(canvas.getContext('2d'), {
            type: 'bar',
            data: {
                labels: yValues,
                datasets: [{
                    label: `g = ${g}, Entropy = ${entropy.toFixed(2)}`,
                    data: frequencies,
                    backgroundColor: 'rgba(255, 159, 64, 0.7)',
                }]
            },
            options: {
                title: {
                    display: true,
                    text: `Distribution of Y for g = ${g}, n = ${nB}`
                },
                scales: {
                    xAxes: [{ scaleLabel: { display: true, labelString: 'Y values' } }],
                    yAxes: [{ scaleLabel: { display: true, labelString: 'Frequency' } }]
                }
            }
        });
    });
}

// Run visualization for Case B
plotDistributionsCaseB();
            

Note: Ensure that the BigInt exponentiation and modulo operations are supported in your JavaScript environment.

Observations

Case A

Case B

Computing Entropy and Diversity Indexes

Entropy measures the randomness in the distribution:

\( H(Y) = -\sum_{i} P(Y = y_i) \log_2 P(Y = y_i) \)

Cryptographic Implications

Case A Advantages

Case B Vulnerabilities

Reason for Choosing \( g = 2, 3, 10, 17 \) in Case A

- **Primitive Roots Modulo 19**: These values are generators of the multiplicative group modulo 19.

- **Maximal Periods**: Ensure that \( g^U \mod n \) cycles through all possible residues before repeating.

- **Uniform Distribution**: Leads to higher entropy and better cryptographic properties.

Spotting Errors in the Exercise

- **Possible Error**: Using a composite modulus \( n = 15 \) instead of a prime number.

- **Implication**: Composite moduli can introduce vulnerabilities due to the presence of factors and non-invertible elements.

- **Correction**: For cryptographic purposes, it is preferable to use a prime modulus to ensure a cyclic group and maximal entropy.


Note: The code snippets provided can be run in a web browser environment. Ensure that you include the necessary libraries and that your JavaScript environment supports the BigInt data type for large integer calculations.