Hypergeometric Distribution Interactive Calculator

Name: Hypergeometric Distribution Interactive Calculator
Author: Robbie Dickson

Sampling without replacement from a finite population breaks the assumptions behind the binomial distribution — once an item is drawn, the population composition changes, and every subsequent probability shifts. Use this Hypergeometric Distribution Calculator to calculate exact probabilities for discrete sampling scenarios using population size (N), successes in population (K), sample size (n), and observed successes (k). This matters in quality control lot inspection, pharmaceutical batch release, ecological mark-recapture studies, and clinical trial design — anywhere the sample represents a meaningful fraction of the total population. This page includes the PMF formula, a full worked example, theory on the finite population correction factor, and an FAQ covering approximation rules and Fisher's exact test.

What is the hypergeometric distribution?

The hypergeometric distribution gives you the probability of getting a specific number of successes when drawing items from a finite group — without putting each item back after you draw it. It accounts for the fact that removing items changes the odds for every draw that follows.

Simple Explanation

Imagine a box of 20 light bulbs where 5 are defective. You pull out 6 bulbs without looking and without putting any back. The hypergeometric distribution tells you the probability of getting exactly 2 defective bulbs in those 6 — not a rough estimate, an exact number. Each draw changes what's left in the box, so this distribution tracks that depletion as you go.
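A quick simulation makes the depletion effect concrete. The sketch below assumes Python 3.8+ (for `math.comb`). It draws 6 bulbs without replacement from a box of 20 with 5 defective, repeats that many times, and compares how often exactly 2 defectives appear with the exact formula. The trial count and random seed are arbitrary choices.

```python
import math
import random

def exact_pmf(N, K, n, k):
    """Exact hypergeometric probability P(X = k)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def simulate(N, K, n, k, trials=100_000, seed=1):
    """Estimate P(X = k) by repeatedly drawing n items without replacement."""
    rng = random.Random(seed)
    box = [1] * K + [0] * (N - K)  # 1 = defective bulb, 0 = good bulb
    hits = sum(1 for _ in range(trials) if sum(rng.sample(box, n)) == k)
    return hits / trials

exact = exact_pmf(20, 5, 6, 2)
approx = simulate(20, 5, 6, 2)
print(f"exact {exact:.4f}, simulated {approx:.4f}")
```

The simulated frequency should land close to the exact value of about 0.3522, because `random.sample` never returns the same bulb twice within one draw.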


[Figure: Hypergeometric distribution technical diagram]


How to Use This Calculator

Select a Calculation Mode — choose from exact probability, cumulative, complement, range, expected value, or required sample size.
Enter Population Size (N) and Successes in Population (K) — these define your finite population.
Enter Sample Size (n) and Successes in Sample (k), or the range values (a, b), or the target probability and minimum successes depending on the mode selected.
Click Calculate to see your result.
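The six calculation modes listed above can be sketched as small functions. The function names, and the reading of "required sample size" as the smallest n for which P(X ≥ m) reaches a target probability, are assumptions about how such a calculator could work, not its actual code.

```python
import math

def pmf(N, K, n, k):
    """P(X = k); math.comb gives 0 when k > K or n - k > N - K."""
    if k < 0 or n - k < 0:
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def cdf(N, K, n, k):
    """Cumulative mode: P(X <= k)."""
    return sum(pmf(N, K, n, j) for j in range(0, k + 1))

def complement(N, K, n, k):
    """Complement mode: P(X >= k) = 1 - P(X <= k - 1)."""
    return 1.0 - cdf(N, K, n, k - 1)

def range_prob(N, K, n, a, b):
    """Range mode: P(a <= X <= b)."""
    return sum(pmf(N, K, n, j) for j in range(a, b + 1))

def expected(N, K, n):
    """Expected-value mode: E(X) = n * K / N."""
    return n * K / N

def required_sample_size(N, K, min_successes, target):
    """Hypothetical sample-size mode: smallest n with P(X >= min_successes) >= target."""
    for n in range(1, N + 1):
        if complement(N, K, n, min_successes) >= target:
            return n
    return None

print(required_sample_size(50, 10, 1, 0.95))  # 12
```

In the usage line, 12 is the smallest sample from N = 50, K = 10 that gives at least a 95% chance of catching one or more successes.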

Simple Example

Population size (N) = 50, successes in population (K) = 10, sample size (n) = 8, successes in sample (k) = 2, mode = Exact Probability.

Result: P(X = 2) ≈ 0.3217, roughly a 32% chance of drawing exactly 2 successes in a sample of 8 from that population.
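The arithmetic behind this example can be checked with Python's standard library. `math.comb` (Python 3.8+) supplies the binomial coefficients, and `Fraction` keeps the ratio exact before it is converted to a decimal.

```python
from fractions import Fraction
from math import comb

N, K, n, k = 50, 10, 8, 2
favorable = comb(K, k) * comb(N - K, n - k)   # C(10, 2) * C(40, 6)
total = comb(N, n)                            # C(50, 8)
p = Fraction(favorable, total)                # exact rational probability
print(comb(K, k), comb(N - K, n - k), total)  # 45 3838380 536878650
print(f"P(X = {k}) = {float(p):.4f}")         # P(X = 2) = 0.3217
```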

Hypergeometric Distribution Interactive Visualizer

Watch how sampling without replacement affects probability calculations. See the exact probability distributions and understand why depletion matters in finite populations.

Default settings: Population Size (N) = 50, Successes in Pop. (K) = 15, Sample Size (n) = 10, Target Successes (k) = 3. These give an exact probability P(X = 3) ≈ 0.2979, an expected value E(X) = 3.00, and a variance Var(X) ≈ 1.71.
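The visualizer's comparison can be approximated in plain text. The sketch prints the hypergeometric PMF for the default settings next to a binomial PMF with p = K/N, which is the with-replacement model. It draws a crude bar for each k and then reports the mean and variance from the formulas in the equations section. The bar scaling (one `#` per percentage point) is an arbitrary choice.

```python
import math

def hyper_pmf(N, K, n, k):
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

N, K, n = 50, 15, 10
p = K / N
for k in range(n + 1):
    h, b = hyper_pmf(N, K, n, k), binom_pmf(n, p, k)
    print(f"k={k:2d}  hypergeometric={h:.4f}  binomial={b:.4f}  {'#' * round(h * 100)}")

mean = n * p
var = n * p * (1 - p) * (N - n) / (N - 1)  # includes the finite population correction
print(f"E(X) = {mean:.2f}, Var(X) = {var:.2f}")
```

The hypergeometric column is more concentrated around the mean than the binomial one, which reflects the variance reduction from sampling without replacement.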


Hypergeometric Distribution Equations

Probability Mass Function (PMF)

Use the formula below to calculate the exact probability of k successes in a hypergeometric sampling scenario.

P(X = k) = [C(K, k) × C(N − K, n − k)] / C(N, n)

Where:

N = Total population size (dimensionless)
K = Number of success states in population (dimensionless)
n = Number of draws (sample size) (dimensionless)
k = Number of observed successes in sample (dimensionless)
C(a, b) = Binomial coefficient "a choose b" = a! / [b!(a − b)!]
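The PMF translates directly into code. The following is a minimal sketch that assumes Python 3.8+ for `math.comb`. That function returns 0 when its second argument exceeds the first, so counts above K, or failure counts above N − K, fall out as probability 0 without extra checks.

```python
import math

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """P(X = k) = C(K, k) * C(N - K, n - k) / C(N, n)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    if k < 0 or k > n:
        return 0.0  # a sample of n items cannot hold fewer than 0 or more than n successes
    # math.comb(a, b) is 0 when b > a, so k > K or n - k > N - K also yield 0.
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

print(hypergeom_pmf(20, 5, 6, 2))  # light-bulb example, about 0.3522
```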

Expected Value and Variance

Use the formula below to calculate the expected value and variance of the hypergeometric distribution.

E(X) = n × (K / N)

Var(X) = n × (K / N) × [(N − K) / N] × [(N − n) / (N − 1)]

Where:

E(X) = Expected number of successes in sample (dimensionless)
Var(X) = Variance of the distribution (dimensionless)
(N − n) / (N − 1) = Finite population correction factor
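One way to sanity-check the closed-form mean and variance is to compare them with moments computed directly from the PMF. The sketch below does that for valid inputs with N > 1, since the correction factor divides by N − 1.

```python
import math

def pmf(N, K, n, k):
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def mean_var_formula(N, K, n):
    p = K / N
    fpc = (N - n) / (N - 1)  # finite population correction factor
    return n * p, n * p * (1 - p) * fpc

def mean_var_from_pmf(N, K, n):
    lo, hi = max(0, n - (N - K)), min(n, K)  # every k the sample can produce
    probs = {k: pmf(N, K, n, k) for k in range(lo, hi + 1)}
    mean = sum(k * q for k, q in probs.items())
    var = sum((k - mean) ** 2 * q for k, q in probs.items())
    return mean, var

print(mean_var_formula(50, 15, 10))
print(mean_var_from_pmf(50, 15, 10))
```

Both lines should print essentially the same pair, about (3.0, 1.714).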

Valid Range for k

Use the formula below to calculate the allowable bounds on k before running a probability calculation.

max(0, n − (N − K)) ≤ k ≤ min(n, K)

This constraint ensures physically possible sampling outcomes. The lower bound matters when the sample is larger than the number of failures. If n exceeds N − K, at least n − (N − K) of the drawn items must be successes. The upper bound limits successes to the sample size or the total available successes, whichever is smaller.
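The bounds are easy to compute, and they coincide exactly with the values of k where the PMF is nonzero. The example parameters below are illustrative.

```python
import math

def support(N, K, n):
    """Smallest and largest possible number of successes in the sample."""
    return max(0, n - (N - K)), min(n, K)

def pmf(N, K, n, k):
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# 20 items, 15 successes, sample of 10: only 5 failures exist,
# so at least 10 - 5 = 5 successes must be drawn.
N, K, n = 20, 15, 10
lo, hi = support(N, K, n)
print(lo, hi)  # 5 10
positive = [k for k in range(n + 1) if pmf(N, K, n, k) > 0]
print(positive == list(range(lo, hi + 1)))  # True
```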

Theory & Engineering Applications

The hypergeometric distribution describes the probability of k successes in n draws without replacement from a finite population of size N containing exactly K objects classified as successes. Unlike the binomial distribution where trials are independent and success probability remains constant, the hypergeometric model accounts for the depletion effect where each draw changes the composition of the remaining population. This fundamental difference makes the hypergeometric distribution essential for quality control, acceptance sampling, and any scenario where sample size represents a substantial fraction of the population.

Statistical Foundation and Combinatorial Structure

The hypergeometric probability mass function derives directly from counting principles. The numerator counts favorable outcomes: C(K, k) ways to select k successes from K available successes, multiplied by C(N − K, n − k) ways to select the remaining n − k items from the N − K failures. The denominator C(N, n) represents all possible ways to draw n items from N total items. This ratio provides the exact probability under the assumption of random sampling without replacement.
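The counting argument can be checked by brute force on a small population. The sketch enumerates every n-item subset, counts the subsets with exactly k successes, and compares the result with the formula. Labelling items 0 to K − 1 as successes is an arbitrary convention here, and full enumeration is only practical for very small N.

```python
import math
from itertools import combinations

def brute_force_pmf(N, K, n, k):
    """Fraction of all n-item subsets of {0, ..., N-1} holding exactly k successes.
    Items 0..K-1 are treated as successes."""
    subsets = list(combinations(range(N), n))
    favorable = sum(1 for s in subsets if sum(1 for x in s if x < K) == k)
    return favorable / len(subsets)

def formula_pmf(N, K, n, k):
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

N, K, n = 12, 4, 5
for k in range(0, min(n, K) + 1):
    assert math.isclose(brute_force_pmf(N, K, n, k), formula_pmf(N, K, n, k))
print("counting argument matches formula")
```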

The finite population correction factor (N − n)/(N − 1) appears in the variance formula and quantifies how sampling without replacement reduces variability compared to sampling with replacement. As N approaches infinity while K/N remains constant, the hypergeometric distribution converges to the binomial distribution with parameters n and p = K/N. This convergence typically provides acceptable approximations when n/N is less than 0.05, though engineering applications often require exact calculations for regulatory compliance.
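The convergence to the binomial can be seen numerically. The sketch holds K/N = 0.2 and n = 10 fixed while N grows, and it reports the largest gap between the two PMFs. The specific parameter values are illustrative assumptions.

```python
import math

def hyper_pmf(N, K, n, k):
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def max_abs_gap(N, p, n):
    """Largest |hypergeometric - binomial| over k, with K = p * N."""
    K = round(p * N)
    return max(abs(hyper_pmf(N, K, n, k) - binom_pmf(n, K / N, k)) for k in range(n + 1))

n, p = 10, 0.2
for N in (20, 50, 200, 1000, 10000):
    print(f"N={N:6d}  n/N={n / N:.3f}  max gap={max_abs_gap(N, p, n):.5f}")
```

The gap shrinks roughly in proportion to the sampling fraction n/N.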

Quality Control and Acceptance Sampling

Manufacturing quality assurance extensively employs hypergeometric calculations for lot acceptance sampling. Consider a shipment of 500 precision bearings where the purchaser's specification allows maximum 2% defects. The inspector randomly samples 50 bearings using destructive testing. If the lot actually contains 15 defective bearings (3%), the probability of accepting the lot (finding 0 or 1 defects in the sample) determines the operating characteristic curve of the sampling plan.
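The operating characteristic (OC) curve for this bearing plan can be sketched by evaluating P(X ≤ 1) across several hypothetical lot defect counts D. The plan uses a lot of 500, a sample of 50, and accepts on 0 or 1 defects. D = 15 corresponds to the 3% case in the text, and the other D values are arbitrary points on the curve.

```python
import math

def pmf(N, K, n, k):
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def prob_accept(N, n, c, D):
    """P(lot accepted) = P(X <= c) when the lot holds D defectives."""
    return sum(pmf(N, D, n, k) for k in range(0, min(c, D) + 1))

N, n, c = 500, 50, 1  # lot size, sample size, acceptance number
for D in (0, 5, 10, 15, 20, 30):
    print(f"D={D:2d} ({D / N:.1%} defective)  P(accept)={prob_accept(N, n, c, D):.4f}")
```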

The hypergeometric model provides exact probabilities for these scenarios where lot size is finite and sampling represents a significant fraction. Military Standard 105E and ISO 2859 sampling tables historically used hypergeometric calculations for single sampling plans, though modern standards often employ the Poisson approximation for computational simplicity. However, for critical aerospace, medical device, or nuclear components, engineers must use exact hypergeometric probabilities to certify compliance with stringent quality requirements.

Reliability Engineering and Component Testing

Systems reliability analysis employs hypergeometric distributions when evaluating redundant component configurations with limited spares inventory. A telecommunications switching station maintains 12 backup power modules, of which 3 are unknowingly defective due to a manufacturing batch issue. When a power surge event requires activation of 5 backup modules simultaneously, the hypergeometric distribution calculates the probability that sufficient functional modules activate to maintain service.

This scenario differs fundamentally from Poisson or exponential reliability models because the population (12 modules) is finite and fixed. The probability that at least 4 of the 5 activated modules function properly determines system availability. Engineers designing redundancy must account for common-mode failures creating finite pools of potentially defective components, making hypergeometric analysis essential for accurate system reliability prediction.
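For the switching-station example, "at least 4 of the 5 activated modules work" is the same event as "at most 1 of the 3 defective modules is activated." Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction
from math import comb

def pmf_exact(N, K, n, k):
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, D, n = 12, 3, 5  # spare modules, defective modules, modules activated
# At least 4 of 5 working means at most 1 defective module among those activated.
p_ok = pmf_exact(N, D, n, 0) + pmf_exact(N, D, n, 1)
print(p_ok, f"{float(p_ok):.4f}")  # 7/11 0.6364
```

Under these assumptions the probability is 7/11, about 64%. Roughly one surge in three would leave fewer than 4 working modules.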

Ecological Sampling and Mark-Recapture Methods

Wildlife biologists use hypergeometric distributions in mark-recapture population estimation. Researchers capture and tag K animals from an unknown population of size N, then release them. In a second sampling event, n animals are captured, of which k are tagged. The hypergeometric probability P(X = k | N, K, n) forms the likelihood function for estimating N by maximum likelihood. The Lincoln-Petersen index N̂ = (K × n) / k provides a point estimate, and the hypergeometric distribution supplies confidence intervals and hypothesis tests.
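A likelihood scan over candidate population sizes is one simple way to find the maximum likelihood estimate. The tag counts below are illustrative and do not come from a real study. The likelihood ratio L(N)/L(N − 1) stays at or above 1 exactly while N ≤ K·n/k, so for these numbers the scan lands on the integer part of the Lincoln-Petersen value.

```python
import math

def likelihood(N, K, n, k):
    """P(k tagged in a second sample of n | population N with K tagged)."""
    if N < K + n - k:  # population too small to produce this outcome
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def mle_population(K, n, k, N_max):
    """Return the candidate N with the largest likelihood."""
    N_min = K + n - k  # animals known to exist from the two samples
    return max(range(N_min, N_max + 1), key=lambda N: likelihood(N, K, n, k))

K, n, k = 100, 60, 14  # illustrative numbers
print(f"Lincoln-Petersen: {K * n / k:.1f}")                  # 428.6
print("likelihood scan :", mle_population(K, n, k, N_max=2000))  # 428
```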

This application demonstrates a critical feature: the hypergeometric distribution provides exact probabilities even when sampling represents a large fraction of the population. Binomial approximations, and normal approximations built on the binomial variance, ignore the finite population correction and become unreliable once n/N exceeds about 0.1, yet ecological studies frequently involve such intensive sampling. The exact hypergeometric calculation remains valid regardless of sampling fraction, though computational challenges arise for large populations where binomial coefficients overflow standard floating-point arithmetic.
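The usual workaround for overflow is to work in logarithms through the log-gamma function, using ln C(a, b) = ln Γ(a + 1) − ln Γ(b + 1) − ln Γ(a − b + 1). Python's integers are arbitrary precision, so `math.comb` itself never overflows. However, a double cannot hold C(N, n) for large populations, and many languages have no big-integer fallback. The sketch below uses illustrative parameters and compares the log-space result with an exact big-integer calculation.

```python
import math

def log_comb(a, b):
    """ln C(a, b) via the log-gamma function; stays finite for huge a."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def log_pmf(N, K, n, k):
    return log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)

# Large population: C(N, n) here is far beyond the ~1.8e308 limit of a double.
N, K, n, k = 1_000_000, 20_000, 5_000, 100
p_log = math.exp(log_pmf(N, K, n, k))

# Python int / int division is correctly rounded, so an exact check is possible here.
p_exact = math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)
print(p_log, p_exact)
```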

Network Security and Intrusion Detection

Cybersecurity analysts apply hypergeometric statistics to anomaly detection in network traffic. A firewall monitors 10,000 connection attempts, and an intrusion detection system flags 150 of them as suspicious based on heuristic rules. Security analysts then manually investigate a random sample of n = 20 flagged connections and find k = 8 are genuinely malicious. The 150 flagged connections form a finite population (N = 150) with an unknown number K of malicious connections, and hypergeometric inference estimates K and therefore the precision of the detection rule. Estimating the number of undetected threats would additionally require sampling the unflagged traffic.

This inverse problem, estimating K from the observed n and k with N known, can be handled by maximum likelihood or by Bayesian methods with a hypergeometric likelihood. The solution provides security metrics such as the share of flagged connections that are false alarms. Because network events are discrete and observation windows are finite, hypergeometric models are more appropriate here than Gaussian approximations.
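This minimal Bayesian sketch of the review treats the 150 flagged connections as the population and places a uniform prior over the number of malicious ones, K. It weights that prior by the hypergeometric likelihood of finding 8 malicious connections in a sample of 20. The uniform prior is an assumption, and a different prior would shift the answer.

```python
import math

def likelihood(N, K, n, k):
    if k > K or n - k > N - K:
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

N, n, k = 150, 20, 8  # flagged connections, reviewed sample, confirmed malicious
# Uniform prior over K = 0..N, so the posterior is proportional to the likelihood.
weights = [likelihood(N, K, n, k) for K in range(N + 1)]
total = sum(weights)
posterior = [w / total for w in weights]

post_mean = sum(K * p for K, p in enumerate(posterior))
post_mode = max(range(N + 1), key=lambda K: posterior[K])
print(f"posterior mode K = {post_mode}, mean K = {post_mean:.1f}, "
      f"mean precision = {post_mean / N:.3f}")
```

With a uniform prior, the posterior mode equals the maximum likelihood estimate, here K = 60, or 40% of the flagged set.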

Worked Example: Pharmaceutical Quality Control

A pharmaceutical manufacturer produces a batch of N = 2,400 vaccine vials. Regulatory requirements mandate sampling inspection before release. Historical process capability suggests approximately 48 vials (2%) may have fill volumes below specification due to nozzle wear. The quality control plan calls for randomly sampling n = 89 vials and measuring fill volume with destructive testing.

Problem: Calculate (a) the probability of finding exactly 3 defective vials, (b) the probability of finding 3 or fewer defects, and (c) the expected number of defects and standard deviation.

Solution Part A - Exact Probability:

Given: N = 2,400, K = 48, n = 89, k = 3

First verify k is in the valid range:
min(k) = max(0, 89 − (2400 − 48)) = max(0, −2263) = 0
max(k) = min(89, 48) = 48
Since 0 ≤ 3 ≤ 48, the calculation is valid.

Calculate binomial coefficients:
C(48, 3) = 48! / (3! × 45!) = (48 × 47 × 46) / (3 × 2 × 1) = 17,296
C(2352, 86) = very large number requiring logarithmic computation
C(2400, 89) = very large number requiring logarithmic computation

For practical calculation with large numbers, use logarithms:
ln[P(X = 3)] = ln[C(48,3)] + ln[C(2352,86)] − ln[C(2400,89)]

Evaluating each term with the log-gamma function, ln C(a, b) = ln Γ(a + 1) − ln Γ(b + 1) − ln Γ(a − b + 1), or with exact arbitrary-precision arithmetic:
P(X = 3) ≈ 0.1623 or 16.23%

Solution Part B - Cumulative Probability:

P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)

Computing each term:
P(X = 0) ≈ 0.1601
P(X = 1) ≈ 0.3020
P(X = 2) ≈ 0.2758
P(X = 3) ≈ 0.1623

P(X ≤ 3) ≈ 0.9002 or 90.02%

This indicates a 90.02% probability that the inspection will find 3 or fewer defective vials, so a plan with an acceptance number of 3 would accept this lot about 90% of the time.

Solution Part C - Expected Value and Variance:

Expected number of defects:
E(X) = n × (K / N) = 89 × (48 / 2400) = 89 × 0.02 = 1.78 vials

Variance calculation:
Var(X) = n × (K/N) × [(N − K)/N] × [(N − n)/(N − 1)]
Var(X) = 89 × (48/2400) × (2352/2400) × (2311/2399)
Var(X) = 89 × 0.02 × 0.98 × 0.9633
Var(X) ≈ 1.6804

Standard deviation:
σ = √1.6804 ≈ 1.296 vials

The finite population correction factor (N−n)/(N−1) = 2311/2399 = 0.9633 reduces variance by about 3.67% compared to binomial sampling. This correction becomes negligible only when N is much larger than n, demonstrating why exact hypergeometric calculations matter for this batch size.

Interpretation: On average, inspectors expect to find 1.78 defective vials with a standard deviation of 1.30 vials. The probability mass concentrates between 0 and 4 defects, with exactly 1 or 2 defects most likely. If the acceptance criterion allows up to 3 defects, the lot has a 90.02% probability of passing inspection despite containing 2% defects. That figure is a key metric for designing sampling plans that balance consumer risk and producer risk.
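All three parts of the worked example can be reproduced in a few lines. Python's arbitrary-precision integers let `math.comb` compute C(2352, 86) and C(2400, 89) exactly, so this sketch needs neither logarithms nor Stirling's approximation.

```python
from math import comb, sqrt

N, K, n = 2400, 48, 89

def pmf(k):
    # Exact big-integer coefficients; int / int division returns a correctly rounded float.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p3 = pmf(3)                            # part (a)
cdf3 = sum(pmf(k) for k in range(4))   # part (b)
mean = n * K / N                       # part (c)
var = mean * (N - K) / N * (N - n) / (N - 1)

for k in range(4):
    print(f"P(X = {k}) = {pmf(k):.4f}")
print(f"P(X <= 3) = {cdf3:.4f}")
print(f"E(X) = {mean:.2f}, Var(X) = {var:.4f}, sd = {sqrt(var):.3f}")
```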


Practical Applications

Scenario: Electronics Manufacturing Quality Audit

Marcus, a quality engineer at a semiconductor facility, receives a shipment of 5,000 microcontrollers from a new supplier. The purchase agreement specifies a maximum 0.5% defect rate, but Marcus needs to verify this without testing all units. He randomly selects 200 chips for electrical parameter testing. Using the hypergeometric calculator with N=5000, assumed K=25 (0.5%), and n=200, he determines that finding 3 or more defects has only about a 7.6% probability if the lot sits exactly at the specification limit. When testing reveals 3 defective units, the observed rate of 1.5% points to roughly 75 defects in the lot, three times the contract specification. Although a 7.6% tail probability falls short of a conventional 5% significance threshold, this evidence supports rejecting the shipment and renegotiating with the supplier, potentially saving the company from field failures worth millions in warranty claims.
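Marcus's check is a one-sided tail probability computed under the assumption that the lot sits exactly at the 0.5% limit (K = 25). The sketch computes P(X ≥ 3) for the observed 3 defects, plus the simple proportional estimate of how many defectives the lot holds.

```python
from math import comb

def pmf(N, K, n, k):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, n = 5000, 200
K_spec = 25    # 0.5% of 5,000: the most defectives a conforming lot may hold
observed = 3

# Chance of seeing `observed` or more defects from a lot exactly at the limit.
p_tail = 1 - sum(pmf(N, K_spec, n, k) for k in range(observed))
print(f"P(X >= {observed} | K = {K_spec}) = {p_tail:.4f}")
print(f"point estimate of lot defectives: {observed / n * N:.0f}")
```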

Scenario: Clinical Trial Patient Selection

Dr. Sarah Chen designs a phase III clinical trial for a diabetes medication requiring 120 participants from a registry of 800 eligible patients. The registry includes 240 patients with advanced kidney disease who require different dosing protocols. To ensure the trial sample represents the registry population, Dr. Chen uses hypergeometric probability calculations to determine expected distributions. With N=800, K=240, n=120, she calculates the probability of randomly selecting various numbers of kidney disease patients. The expected value of 36 patients (30%) with kidney complications matches the registry proportion, but she calculates there's about a 12% chance of getting 42 or more such patients, which would bias trial results. This statistical analysis prompts Dr. Chen to implement stratified random sampling instead of pure random selection, ensuring the trial accurately represents the target patient population and regulatory agencies will accept the results.
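The tail probability in Dr. Chen's analysis is a sum of PMF terms from 42 up to the largest possible count. Here is a minimal sketch using the registry figures from the scenario:

```python
from math import comb

def pmf(N, K, n, k):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 800, 240, 120  # registry size, kidney-disease patients, trial size
tail = sum(pmf(N, K, n, k) for k in range(42, min(n, K) + 1))
print(f"E(X) = {n * K / N:.0f}, P(X >= 42) = {tail:.4f}")
```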

Scenario: Fisheries Resource Management

James, a marine biologist, needs to estimate the population of endangered steelhead trout in a remote river system for conservation planning. His team captures, tags, and releases K=156 fish during a spring survey. Three weeks later, they capture n=89 fish and find k=12 are tagged. Using the hypergeometric distribution with these values, James applies maximum likelihood estimation to calculate the most probable total population. The calculator helps him determine that N≈1,157 fish provides the highest probability for observing exactly 12 recaptures. More importantly, by computing probabilities for different N values, he constructs a 95% confidence interval of 890-1,520 fish. This population estimate reveals the river system supports a viable breeding population above the 750-fish conservation threshold, allowing the watershed council to proceed with habitat restoration rather than expensive captive breeding programs.

Frequently Asked Questions

When should I use the hypergeometric distribution instead of the binomial distribution?

How do I handle very large population sizes where binomial coefficients overflow?

What is the finite population correction factor and why does it matter?

Can the hypergeometric distribution be used for multi-category populations?

How accurate are normal approximations to the hypergeometric distribution?

What is the relationship between hypergeometric distribution and Fisher's exact test?


About the Author

Robbie Dickson — Chief Engineer & Founder, FIRGELLI Automations

Robbie Dickson brings over two decades of engineering expertise to FIRGELLI Automations. With a distinguished career at Rolls-Royce, BMW, and Ford, he has deep expertise in mechanical systems, actuator technology, and precision engineering.
