Statistics Example: Market Research Reliability Assessment

Jun 4, 2025


Welcome back to the AI Bayeslab Statistics series.

Today, we continue exploring sampling distributions, focusing on situations where sample proportions are used to make inferences about population proportions. The case is designed as follows:

1. Case Introduction: Market Research Reliability Assessment

A company conducts customer satisfaction surveys in two regions:

Region X: n₁ = 100, p̂₁ = 69%

Region Y: n₂ = 100, p̂₂ = 60.5%

Objective:

Before comparing proportions, we must check whether the variances of satisfaction in the two regions are equal (a test of homogeneity of variances). This determines whether the data can be pooled for further analysis.

As illustrated in the previous example, in real life we frequently have to decide whether to endorse a particular attitude. Our primary focus today is how to statistically analyze satisfaction survey results that yield a "yes" or "no" outcome. Such data are closely tied to the binomial distribution: each response either exhibits the characteristic of interest (satisfied) or does not, as with the survey responses from Regions X and Y.

Note: Formulas related to the binomial distribution

Expected value: E(X) = np

Variance: D(X) = np(1-p)
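As a quick illustration, the two formulas above can be evaluated for both regions. This is a minimal sketch that assumes the observed sample proportions stand in for the unknown p; the helper name binomial_mean_var is hypothetical and not part of any library.

```python
# Hypothetical helper: evaluates E(X) = np and D(X) = np(1-p) for X ~ B(n, p).
# Assumption: the observed sample proportion p̂ is plugged in for the unknown p.

def binomial_mean_var(n: int, p: float) -> tuple[float, float]:
    """Return (E(X), D(X)) for a binomial count X ~ B(n, p)."""
    return n * p, n * p * (1 - p)

regions = {"X": (100, 0.69), "Y": (100, 0.605)}

for name, (n, p_hat) in regions.items():
    mean, var = binomial_mean_var(n, p_hat)
    # p(1-p) is the variance of a single yes/no (Bernoulli) response
    per_response_var = p_hat * (1 - p_hat)
    print(f"Region {name}: E(X) = {mean:.2f}, D(X) = {var:.4f}, p(1-p) = {per_response_var:.4f}")
```

With these inputs, Region X gives E(X) = 69 and D(X) = 21.39, while Region Y gives E(X) = 60.5 and D(X) ≈ 23.90; the per-response variances p(1-p) are the quantities a variance-homogeneity check would compare.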

Suppose the variances of satisfaction in Region X and Region Y are found to be equal. This indicates that satisfaction levels in the two regions are comparable, with no significant difference, and in that case the data from both regions can be merged for further analysis. A test for homogeneity of variances is used to assess whether the variances of satisfaction in the two regions are equal.

Additionally, we can estimate the difference in satisfaction proportions between the two regions to determine whether it is statistically significant. At a specified significance level α, we calculate a confidence interval (CI) for the difference p₁ − p₂.

If this interval includes zero, there is no significant difference between the two population proportions.

If both endpoints are negative, this suggests p₁ < p₂.

Conversely, if both endpoints are positive, this suggests p₁ > p₂.
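The decision rule above can be sketched in code. This assumes the standard unpooled (Wald) interval for a difference of two proportions, (p̂₁ − p̂₂) ± z_{α/2}·sqrt(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂), which the article does not spell out; the function names diff_proportion_ci and interpret are hypothetical. The critical value comes from Python's standard statistics.NormalDist.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(p1: float, n1: int, p2: float, n2: int, alpha: float = 0.05):
    """Unpooled Wald CI for p1 - p2, assuming the normal approximation holds."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

def interpret(lo: float, hi: float) -> str:
    """Apply the rule from the text: does the interval contain zero?"""
    if lo <= 0 <= hi:
        return "interval includes 0: no significant difference"
    return "p1 > p2" if lo > 0 else "p1 < p2"

lo, hi = diff_proportion_ci(0.69, 100, 0.605, 100)
print(f"95% CI for p1 - p2: [{lo:.4f}, {hi:.4f}] -> {interpret(lo, hi)}")
```

With the survey figures, the 95% interval is roughly [−0.047, 0.217]; it includes zero, so under these assumptions the two regions' satisfaction proportions do not differ significantly at α = 0.05.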

The same approach, judging significance by whether a confidence interval contains the null value, also applies to hypothesis tests on means.

2. Statistical Background: Binomial Proportions & Normal Approximation Condition

Since survey outcomes are binary ("satisfied" or "not satisfied"), the data follow a binomial distribution:

Expected value (mean): E(X) = np

Variance: D(X) = np(1-p)

Sample proportion variance:

\text{Var}(\hat{p}) = \frac{p(1-p)}{n}
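To make the formula concrete, here is a minimal sketch that evaluates Var(p̂) and the standard error for each region. It assumes the observed p̂ is substituted for the unknown p (a plug-in estimate); proportion_variance is a hypothetical helper name.

```python
import math

def proportion_variance(p: float, n: int) -> float:
    """Var(p̂) = p(1-p)/n for a sample proportion based on n responses."""
    return p * (1 - p) / n

for name, n, p_hat in [("X", 100, 0.69), ("Y", 100, 0.605)]:
    var = proportion_variance(p_hat, n)  # plug-in: p̂ used in place of p
    se = math.sqrt(var)                   # standard error of p̂
    print(f"Region {name}: Var(p̂) ≈ {var:.6f}, SE ≈ {se:.4f}")
```

For Region X this gives Var(p̂) = 0.002139 and SE ≈ 0.046; for Region Y, Var(p̂) ≈ 0.00239 and SE ≈ 0.049.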

Normal Approximation Condition:

For large samples (np̂ > 5 and n(1 − p̂) > 5), the binomial distribution is well approximated by a normal distribution, so the sample proportion is approximately normal:

\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)
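The rule of thumb can be checked mechanically before relying on the normal approximation. This is a sketch under the article's threshold of 5; normal_approx_ok is a hypothetical helper, and other texts use a threshold of 10 instead, which can be passed as the threshold parameter.

```python
def normal_approx_ok(n: int, p_hat: float, threshold: float = 5.0) -> bool:
    """Check the rule of thumb n*p̂ > threshold and n*(1 - p̂) > threshold."""
    return n * p_hat > threshold and n * (1 - p_hat) > threshold

for name, n, p_hat in [("X", 100, 0.69), ("Y", 100, 0.605)]:
    successes, failures = n * p_hat, n * (1 - p_hat)
    print(f"Region {name}: np̂ = {successes:.1f}, n(1-p̂) = {failures:.1f}, "
          f"normal approximation OK: {normal_approx_ok(n, p_hat)}")
```

Both regions satisfy the condition comfortably (np̂ and n(1 − p̂) are all above 30), whereas a small sample such as n = 20 with p̂ = 0.1 would fail it.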

Below is a complete explanation of the sampling distribution of the sample proportion, covering its expected value and variance, the conditions for the normal approximation, and confidence intervals:

1) Definition of Symbols

Population proportion: p (unknown parameter)

Sample proportion: p' = X/n (also written p̂), where X is the number of individuals in the sample with the characteristic of interest and n is the sample size.

Random variable X: follows a binomial distribution, X ∼ B(n, p).

2) Expected Value and Variance

Expected value and variance of X:

E(X) = np

D(X) = np(1-p)

Expected value and variance of the sample proportion p':

E(p') = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = p

D(p') = D\left(\frac{X}{n}\right) = \frac{1}{n^2}D(X) = \frac{p(1-p)}{n}
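The derivation E(p') = p and D(p') = p(1-p)/n can be sanity-checked by simulation. This sketch assumes n = 100 and p = 0.69 (borrowed from Region X for illustration) and uses only Python's standard random and statistics modules; simulate_p_hat is a hypothetical helper.

```python
import random
import statistics

def simulate_p_hat(n: int, p: float, reps: int = 10_000, seed: int = 0) -> list[float]:
    """Draw `reps` sample proportions p' = X/n, where X is a sum of n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 100, 0.69  # assumed values for illustration
draws = simulate_p_hat(n, p)
print("simulated E(p'):", round(statistics.fmean(draws), 4), "  theory:", p)
print("simulated D(p'):", round(statistics.pvariance(draws), 6),
      "  theory:", round(p * (1 - p) / n, 6))
```

The simulated mean and variance should land close to p and p(1-p)/n; small discrepancies are Monte Carlo noise and shrink as reps grows.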

Standard deviation (standard error):

\sigma_{p'} = \sqrt{\frac{p(1-p)}{n}}

3) Conditions for Normal Approximation

When the sample size is sufficiently large, the binomial distribution can be approximated by a normal distribution.

As shown in the AI block visualization example:

The bar chart represents the binomial distribution.

The curve represents the normal distribution.

As the sample size increases, the two shapes converge; at n = 100, the binomial bars and the normal curve overlap closely.
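The visual overlap can also be measured numerically by comparing the binomial probabilities P(X = k) with the density of N(np, np(1-p)) at each integer k. This is a sketch: p = 0.69 is an assumed value, max_pmf_pdf_gap is a hypothetical helper, and it relies only on math.comb and statistics.NormalDist from the standard library.

```python
import math
from statistics import NormalDist

def max_pmf_pdf_gap(n: int, p: float) -> float:
    """Largest |P(X = k) - normal density at k| over k = 0..n, using N(np, np(1-p))."""
    normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return max(
        abs(math.comb(n, k) * p**k * (1 - p) ** (n - k) - normal.pdf(k))
        for k in range(n + 1)
    )

for n in (10, 30, 100):
    print(f"n = {n:3d}: max gap = {max_pmf_pdf_gap(n, 0.69):.4f}")
```

The largest gap shrinks as n grows, which is the numerical counterpart of the bars and curve overlapping at n = 100.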

In proportion hypothesis testing, the normal approximation is typically considered adequate when the following conditions hold:

np' > 5 \quad \text{and} \quad n(1-p') > 5

Under these conditions:

X approximately follows a normal distribution:

N(np, np(1-p))

p' approximately follows a normal distribution:

N\left(p, \frac{p(1-p)}{n}\right)

4) Confidence Interval (1 − α Confidence Level)

Under the normal approximation conditions, the confidence interval for the population proportion p is:

p' \pm z_{\alpha/2} \cdot \sqrt{\frac{p'(1-p')}{n}}

Where:

z_{α/2} is the upper α/2 critical value of the standard normal distribution (e.g., for α = 0.05, z_{0.025} ≈ 1.96).

Interval formula:

\left[ p' - z_{\alpha/2} \sqrt{\frac{p'(1-p')}{n}},\; p' + z_{\alpha/2} \sqrt{\frac{p'(1-p')}{n}} \right]
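The interval formula translates directly into code. This minimal sketch computes z_{α/2} with Python's statistics.NormalDist rather than a lookup table and applies the interval to each region's observed proportion; proportion_ci is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Normal-approximation CI: p' ± z_{α/2} * sqrt(p'(1-p')/n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

for name, p_hat in [("X", 0.69), ("Y", 0.605)]:
    lo, hi = proportion_ci(p_hat, 100)
    print(f"Region {name}: 95% CI = [{lo:.4f}, {hi:.4f}]")
```

Under these assumptions, Region X's 95% interval is about [0.599, 0.781] and Region Y's is about [0.509, 0.701].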

5) Notes

Continuity correction: for small samples, or when p is close to 0 or 1, consider applying Yates' continuity correction.
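For a single-proportion interval, a continuity correction in the spirit of Yates' is commonly applied by widening the margin by 1/(2n). The sketch below assumes that form; clipping the endpoints to [0, 1] is an additional design choice, and proportion_ci_cc is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def proportion_ci_cc(p_hat: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wald interval widened by the continuity correction 1/(2n), clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n) + 1 / (2 * n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

print(proportion_ci_cc(0.69, 100))  # large sample: correction adds only 0.005 per side
print(proportion_ci_cc(0.05, 20))   # small sample, p̂ near 0: lower bound clipped to 0
```

The correction matters little at n = 100 but noticeably widens the interval for small samples, which is where the text recommends it.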