Hypothesis Testing Calculator

Perform hypothesis tests using z-test, t-test, chi-square, or proportion test methods. Determine whether to reject or fail to reject the null hypothesis based on your sample data and chosen significance level. See also our T-Test Calculator, Z-Test Calculator, P-Value Calculator, and Confidence Interval Calculator.

How to Use the Hypothesis Testing Calculator

Hypothesis testing is a fundamental statistical method used to make decisions about population parameters based on sample data. The process involves formulating a null hypothesis (H₀) that represents the status quo and an alternative hypothesis (H₁) that represents the claim you are seeking evidence for. This calculator supports four common test types: a z-test for a mean when the population standard deviation is known, a t-test for a mean when it is unknown, a chi-square test for categorical data, and a proportion test for binary outcomes.

To use this calculator, first select the appropriate test type based on your data and research question. For z-tests and t-tests, enter the sample mean, hypothesized population mean, standard deviation, and sample size. For chi-square tests, enter the test statistic and degrees of freedom. For proportion tests, enter the sample proportion, hypothesized population proportion, and sample size. Choose your significance level (α), which represents the probability of rejecting a true null hypothesis (Type I error).

After clicking Calculate, the tool computes the test statistic, critical value, and p-value, and reports a clear decision. If the p-value is less than α, you reject the null hypothesis, indicating statistically significant evidence for the alternative hypothesis. If the p-value is greater than or equal to α, you fail to reject the null hypothesis, meaning there is insufficient evidence to support the alternative. Remember that failing to reject H₀ does not prove H₀ is true; it simply means the data do not provide strong enough evidence against it.
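For the chi-square option, where the inputs are the test statistic and the degrees of freedom, the p-value step can be reproduced without a statistics library. The following is a minimal sketch, not the calculator's actual code: the function name chi_square_p_value is hypothetical, and the series expansion of the regularized incomplete gamma function is one standard way to evaluate the chi-square distribution, assumed here for illustration.

```python
import math

def chi_square_p_value(chi2, df, tol=1e-12, max_terms=10_000):
    """Upper-tail probability P(X >= chi2) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, x) with a = df / 2 and x = chi2 / 2, then returns 1 - P(a, x).
    """
    if chi2 <= 0:
        return 1.0
    a, x = df / 2, chi2 / 2
    term = 1.0 / a          # first series term: 1 / a
    total = term
    n = 1
    while abs(term) > tol * abs(total) and n < max_terms:
        term *= x / (a + n)  # next term: previous * x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1 - lower

alpha = 0.05
p_value = chi_square_p_value(chi2=7.2, df=3)  # illustrative inputs
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The series converges for any positive statistic but needs more terms as the statistic grows, which is why the loop has a term limit.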

Hypothesis Testing Formulas

Z-Test (known σ):

z = (x̄ - μ₀) / (σ / √n)

T-Test (unknown σ; s is the sample standard deviation):

t = (x̄ - μ₀) / (s / √n)

df = n - 1

Chi-Square Test:

χ² = Σ [(Oᵢ - Eᵢ)² / Eᵢ]

df = (rows - 1)(cols - 1) for a test of independence, or k - 1 for a goodness-of-fit test with k categories

Proportion Test:

z = (p̂ - p₀) / √(p₀(1-p₀)/n)

Decision Rule:

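Reject H₀ if |test statistic| > critical value (two-tailed z-test or t-test). For a one-tailed test, compare the signed statistic with the critical value in the direction of H₁; for a chi-square test, reject H₀ if χ² > critical value.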

Or equivalently: Reject H₀ if p-value < α
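The test statistics above translate directly into code. This is a minimal sketch in plain Python; the function names are hypothetical and chosen only for illustration.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x̄ - μ₀) / (σ / √n), for a known population standard deviation."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, mu0, s, n):
    """Return (t, df) with t = (x̄ - μ₀) / (s / √n) and df = n - 1."""
    return (sample_mean - mu0) / (s / math.sqrt(n)), n - 1

def chi_square_statistic(observed, expected):
    """χ² = Σ (Oᵢ - Eᵢ)² / Eᵢ over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def proportion_z_statistic(p_hat, p0, n):
    """z = (p̂ - p₀) / √(p₀(1 - p₀) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
```

Note that the proportion test uses p₀, not p̂, in the standard error, because the standard error is computed under the assumption that H₀ is true.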

Example Calculation

A quality control manager wants to test whether the average weight of cereal boxes differs from the advertised 500g. A random sample of 30 boxes has a mean weight of 498g with a known population standard deviation of 10g. Test at α = 0.05.

Given: x̄ = 498, μ₀ = 500, σ = 10, n = 30, α = 0.05

H₀: μ = 500 (boxes weigh 500g on average)

H₁: μ ≠ 500 (boxes do not weigh 500g on average)

SE = σ/√n = 10/√30 = 1.8257

z = (498 - 500) / 1.8257 = -1.0954

Critical value (two-tailed): ±1.960

P-value = 2 × P(Z > 1.0954) = 2 × 0.1367 = 0.2733

Since |z| = 1.0954 < 1.960 and p = 0.2733 > 0.05:

Decision: Fail to reject H₀

There is insufficient evidence that the mean weight differs from 500g.
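The cereal-box calculation can be checked with Python's standard library. This is a sketch under the same inputs as the example; the variable names are illustrative, and statistics.NormalDist supplies the standard normal CDF and its inverse.

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 498, 500, 10, 30, 0.05

se = sigma / math.sqrt(n)                    # standard error of the mean
z = (x_bar - mu0) / se                       # test statistic
std_normal = NormalDist()                    # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-tailed p-value

decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
print(f"SE = {se:.4f}, z = {z:.4f}, critical = ±{z_crit:.3f}, p = {p_value:.4f}")
print(decision)
```

The printed values should agree with the hand calculation to about three decimal places; small differences in the last digit come from rounding intermediate results by hand.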

Critical Values Reference Table

α (two-tailed)	Z Critical Value	Confidence Level
0.10	±1.645	90%
0.05	±1.960	95%
0.025	±2.241	97.5%
0.01	±2.576	99%
0.005	±2.807	99.5%
0.001	±3.291	99.9%
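The table values come from the inverse of the standard normal CDF evaluated at 1 - α/2. A short sketch that regenerates them (the helper name z_critical_two_tailed is hypothetical):

```python
from statistics import NormalDist

def z_critical_two_tailed(alpha):
    """Positive critical value that leaves alpha / 2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.025, 0.01, 0.005, 0.001):
    print(f"{alpha:<6} ±{z_critical_two_tailed(alpha):.3f}  {100 * (1 - alpha):g}%")
```

For a one-tailed test the whole of α sits in one tail, so the corresponding critical value would be inv_cdf(1 - alpha) instead.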

Step-by-Step Decision Process

Define the research question: What claim are you testing? Is it about a mean, proportion, variance, or association?
Set up H0 and H1: H0 contains the equality (=). H1 reflects what you want to prove (≠, >, or <).
Choose significance level: alpha = 0.05 is standard. Use 0.01 for high-stakes decisions or 0.10 for exploratory research.
Select the appropriate test: z-test for a mean with known σ, t-test for a mean with unknown σ, z-test for proportions, chi-square for categorical data. ANOVA (3+ group means) and the F-test (variances) are not covered by this calculator.
Collect data and compute test statistic: The test statistic measures how far the sample result is from H0.
Determine p-value or compare to critical value: Both approaches give the same conclusion.
State your conclusion in context: Reject or fail to reject H0, and explain what this means for the research question.
Consider Type I and Type II errors: Type I (false positive) = rejecting true H0. Type II (false negative) = failing to reject false H0.
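The claim in the steps above that the p-value and critical-value approaches give the same conclusion can be checked numerically. The sketch below applies both rules to a two-tailed z-test; the function name decide_both_ways and the sample z values are illustrative.

```python
from statistics import NormalDist

def decide_both_ways(z, alpha=0.05):
    """Return (reject_by_critical_value, reject_by_p_value) for a two-tailed z-test."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return abs(z) > critical, p_value < alpha

for z in (-2.5, -1.0954, 0.3, 1.835, 2.1):
    by_critical, by_p_value = decide_both_ways(z)
    print(z, by_critical, by_p_value)
```

The two rules agree because the p-value falls below α exactly when the statistic lies beyond the critical value.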

Additional Solved Examples

Example: Two-Tailed Test for Customer Satisfaction

A company claims average satisfaction is 7.5/10. A survey of 50 customers gives mean = 7.1 with sample SD = 1.8. Because the population SD is unknown, use a t-test. Test the claim at alpha = 0.05.

H0: mu = 7.5, H1: mu ≠ 7.5 (two-tailed)

SE = 1.8/sqrt(50) = 1.8/7.071 = 0.2546

t = (7.1 - 7.5)/0.2546 = -0.4/0.2546 = -1.571

df = 49, t-critical (two-tail, 0.05) = +/-2.010

Since |-1.571| < 2.010, fail to reject H0

p-value = 0.123

Answer: There is insufficient evidence to reject the company's claim (t(49) = -1.57, p = 0.123). The observed mean of 7.1 is not significantly different from the claimed 7.5. Note: this does not prove the claim is true.
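The t statistic and p-value in this example can be reproduced with the standard library alone. Python's statistics module has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule; that numerical approach and the function names are assumptions made for illustration, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3      # P(0 < T < |t|)
    return 1 - 2 * central_area

mean, mu0, s, n = 7.1, 7.5, 1.8, 50
t = (mean - mu0) / (s / math.sqrt(n))
df = n - 1
p = t_two_tailed_p(t, df)
print(f"t({df}) = {t:.3f}, p = {p:.3f}")
```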

Example: One-Tailed Test for Manufacturing Defects

A factory claims defect rate is at most 3%. Inspection of 500 items finds 22 defectives (4.4%). Test at alpha = 0.05.

H0: p ≤ 0.03, H1: p > 0.03 (right-tailed)

p_hat = 22/500 = 0.044

SE = sqrt(0.03 x 0.97/500) = sqrt(0.0000582) = 0.00763

z = (0.044 - 0.03)/0.00763 = 0.014/0.00763 = 1.835

p-value = P(Z > 1.835) = 0.0332

Since 0.0332 < 0.05, reject H0

Answer: The evidence suggests the defect rate exceeds 3% (z = 1.84, p = 0.033). The factory should investigate its production process for quality issues.
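The same right-tailed proportion test, sketched in Python with the example's numbers (variable names are illustrative):

```python
import math
from statistics import NormalDist

defects, n, p0, alpha = 22, 500, 0.03, 0.05

p_hat = defects / n                      # observed defect rate
se = math.sqrt(p0 * (1 - p0) / n)        # standard error under H0
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)        # right-tailed: P(Z > z)
reject = p_value < alpha

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Only the upper tail is used because H1 is p > 0.03; a two-tailed version of this test would double the p-value and would not reject H0 at alpha = 0.05.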

Interpreting Results

Type I and Type II Errors

Decision	H0 True	H0 False
Reject H0	Type I Error (alpha)	Correct (Power = 1-beta)
Fail to Reject H0	Correct	Type II Error (beta)

Power = probability of correctly rejecting a false H0. Power increases with a larger sample size, a larger alpha, or a larger true effect.
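How each of these factors moves power can be seen with a small calculation. The sketch below assumes a two-tailed one-sample z-test with known sigma; the function name z_test_power and the input values are illustrative, with effect denoting the true mean minus the hypothesized mean.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sigma   # mean of the z statistic under H1
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

base = z_test_power(effect=2, sigma=10, n=30)
print(f"baseline:       {base:.3f}")
print(f"larger n:       {z_test_power(2, 10, 100):.3f}")
print(f"larger alpha:   {z_test_power(2, 10, 30, alpha=0.10):.3f}")
print(f"larger effect:  {z_test_power(4, 10, 30):.3f}")
```

When the effect is zero the function returns alpha, which is the Type I error rate.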

Key Takeaways

Hypothesis testing provides a structured framework for making decisions about population parameters using sample data.
The null hypothesis (H0) is assumed true until sufficient evidence (small p-value) contradicts it.
Statistical significance (p < alpha) does not imply practical importance - always consider effect size and context.
Type I error (false positive) is controlled by alpha. Type II error (missed effect) is reduced by increasing sample size.
Never say "accept H0" - the correct phrasing is "fail to reject H0" because we cannot prove the null is true.

Frequently Asked Questions

What is the difference between Type I and Type II errors?

A Type I error (false positive) occurs when you reject a true null hypothesis. The probability of a Type I error equals α (significance level). A Type II error (false negative) occurs when you fail to reject a false null hypothesis. The probability of a Type II error is denoted β. Power (1-β) is the probability of correctly rejecting a false H₀. For a fixed sample size and effect size, reducing α increases β, so there is a trade-off between the two error types.

How do I choose the right significance level (α)?

The most common significance level is α = 0.05, which means a 5% chance of rejecting a true null hypothesis. Use α = 0.01 for more stringent testing (medical research, safety-critical applications). Use α = 0.10 for exploratory research where you want more power to detect effects. The choice depends on the consequences of each error type in your specific context.

When should I use a one-tailed vs two-tailed test?

Use a two-tailed test when you want to detect any difference from the hypothesized value (H₁: μ ≠ μ₀). Use a one-tailed test when you have a specific directional hypothesis (H₁: μ > μ₀ or H₁: μ < μ₀). One-tailed tests have more power in the specified direction but cannot detect effects in the opposite direction. Most researchers recommend two-tailed tests unless there is strong theoretical justification for a directional hypothesis.
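The choice of tail changes only how the p-value is read off the distribution. A minimal sketch for a z statistic (the function name and the tail labels are hypothetical):

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """p-value for a z statistic; tail is 'two', 'right' (H1: >), or 'left' (H1: <)."""
    std_normal = NormalDist()
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")

z = 1.835
for tail in ("two", "right", "left"):
    print(tail, round(z_p_value(z, tail), 4))
```

For a positive z the right-tailed p-value is half the two-tailed one, which is the extra power in the specified direction, while the left-tailed p-value is large, which is why an effect in the unspecified direction goes undetected.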

What does "statistically significant" actually mean?

Statistical significance means the observed result is unlikely to have occurred by chance alone if the null hypothesis were true. It does NOT mean the result is practically important or that the effect is large. A very large sample can produce statistically significant results for trivially small effects. Always consider effect size, confidence intervals, and practical significance alongside p-values.
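The effect of sample size on significance is easy to demonstrate. The sketch below holds a trivially small effect fixed (0.1 units against a standard deviation of 10, values chosen only for illustration) and assumes the sample mean lands exactly on the true mean, so only n changes.

```python
import math
from statistics import NormalDist

def two_tailed_p(effect, sigma, n):
    """Two-tailed z-test p-value when the observed mean differs from mu0 by effect."""
    z = effect / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (30, 1_000, 100_000):
    print(n, round(two_tailed_p(effect=0.1, sigma=10, n=n), 4))
```

The effect size is identical in every row; only the largest sample yields p < 0.05, even though the difference remains practically negligible.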

Can I use this calculator for A/B testing?

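Yes, with a caveat. For A/B testing with conversion rates, use the proportion test: enter the conversion rate of your test group as the sample proportion and the control group's rate as the hypothesized population proportion. This treats the control rate as a fixed, known value; when the control rate is itself estimated from a sample, a two-sample proportion test accounts for the uncertainty in both groups and is generally more appropriate. For continuous metrics (revenue, time on page), use the z-test or t-test. Ensure your sample size is large enough for reliable results, typically at least 100 observations per group for proportion tests.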
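When both the control and test groups are samples, a pooled two-proportion z-test is a common alternative to the one-sample test. It is not one of this calculator's four test types; the sketch below is offered only as an illustration, with a hypothetical function name and made-up conversion counts.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value).

    conv_a / n_a is the control group, conv_b / n_b the test group.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)            # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```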

What assumptions must be met for hypothesis testing?

Key assumptions include: (1) Random sampling from the population, (2) Independence of observations, (3) For z and t-tests: approximately normal distribution of the sample mean (by the CLT, n ≥ 30 is a common rule of thumb, though heavily skewed data may need more), (4) For proportion tests: np₀ ≥ 5 and n(1-p₀) ≥ 5, (5) For chi-square: expected frequencies ≥ 5 in each cell. Violations of these assumptions can lead to incorrect conclusions.
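The two count-based conditions, (4) and (5), are simple to check before running a test. A minimal sketch with hypothetical helper names, using the threshold of 5 stated above:

```python
def proportion_test_ok(n, p0, minimum=5):
    """Check n*p0 >= minimum and n*(1 - p0) >= minimum."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum

def chi_square_ok(expected, minimum=5):
    """Check that every expected cell frequency is at least minimum."""
    return all(e >= minimum for e in expected)

print(proportion_test_ok(n=500, p0=0.03))   # 500 * 0.03 = 15 expected defects
print(proportion_test_ok(n=100, p0=0.03))   # 100 * 0.03 = 3 expected defects
print(chi_square_ok([12.5, 8.0, 4.5]))      # one cell below the threshold
```

Random sampling and independence cannot be verified from the numbers alone; they depend on how the data were collected.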
