The hypergeometric distribution calculator computes probabilities for sampling without replacement from finite populations with two distinct categories. Unlike the binomial distribution, which assumes independence between trials, the hypergeometric distribution accounts for changing probabilities as items are drawn from a limited population. Engineers use this in quality control sampling, reliability testing, network analysis, and statistical hypothesis testing where sample sizes represent a significant fraction of the total population.
This calculator provides exact probability calculations for discrete sampling scenarios where the composition of success and failure states matters. The distribution appears frequently in acceptance sampling plans, biostatistics, ecological population studies, and manufacturing quality assurance where destructive testing or limited inventory prevents replacement.
Hypergeometric Distribution Equations
Probability Mass Function (PMF)
P(X = k) = [C(K, k) × C(N − K, n − k)] / C(N, n)
Where:
- N = Total population size (dimensionless)
- K = Number of success states in population (dimensionless)
- n = Number of draws (sample size) (dimensionless)
- k = Number of observed successes in sample (dimensionless)
- C(n, r) = Binomial coefficient "n choose r" = n! / [r!(n−r)!]
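As a minimal sketch of the PMF above, the formula can be evaluated directly with Python's exact integer binomial coefficient `math.comb`. The function name `hypergeom_pmf` and the small example population (20 items, 5 successes, 4 draws) are illustrative assumptions, not part of the calculator itself.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for n draws without replacement from N items, K of them successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Exact integer arithmetic in numerator and denominator, one division at the end.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical example: 20 items, 5 successes, 4 draws.
pmf_values = [hypergeom_pmf(k, N=20, K=5, n=4) for k in range(5)]
total = sum(pmf_values)  # should be 1 up to rounding
print(pmf_values, total)
```

Because Python integers have arbitrary precision, this direct form stays exact even for fairly large N; only the final division is rounded to a float.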
Expected Value and Variance
E(X) = n × (K / N)
Var(X) = n × (K / N) × [(N − K) / N] × [(N − n) / (N − 1)]
Where:
- E(X) = Expected number of successes in sample (dimensionless)
- Var(X) = Variance of the distribution (dimensionless)
- (N − n) / (N − 1) = Finite population correction factor
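One way to sanity-check the closed-form mean and variance is to compare them against moments summed directly from the PMF. The sketch below does this for a hypothetical population (N = 50, K = 12, n = 10); the function names are assumptions introduced for illustration.

```python
from math import comb

def hypergeom_moments(N: int, K: int, n: int) -> tuple[float, float]:
    """Closed-form mean and variance (requires N > 1)."""
    p = K / N
    fpc = (N - n) / (N - 1)  # finite population correction factor
    return n * p, n * p * (1 - p) * fpc

def moments_by_summation(N: int, K: int, n: int) -> tuple[float, float]:
    """The same moments computed from the PMF, as a cross-check."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n)
           for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var

closed = hypergeom_moments(N=50, K=12, n=10)
summed = moments_by_summation(N=50, K=12, n=10)
print(closed, summed)  # the two pairs should agree to rounding error
```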
Valid Range for k
max(0, n − (N − K)) ≤ k ≤ min(n, K)
This constraint ensures physically possible sampling outcomes. The lower bound reflects that the sample can contain at most N − K failures, so when n exceeds N − K at least n − (N − K) of the draws must be successes. The upper bound caps successes at the sample size or the number of successes available, whichever is smaller.
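The range check can be expressed as a small helper. This is only a sketch; `support_bounds` is a hypothetical name, and the first example is chosen so that the lower bound is not zero.

```python
def support_bounds(N: int, K: int, n: int) -> tuple[int, int]:
    """Smallest and largest feasible number of successes in the sample."""
    k_min = max(0, n - (N - K))  # draws that cannot be filled by failures
    k_max = min(n, K)            # limited by sample size and available successes
    return k_min, k_max

# Hypothetical case: 10 items, 7 successes, 5 draws.
# Only 3 failures exist, so at least 2 of the 5 draws must be successes.
bounds = support_bounds(N=10, K=7, n=5)
print(bounds)
```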
Theory & Engineering Applications
The hypergeometric distribution describes the probability of k successes in n draws without replacement from a finite population of size N containing exactly K objects classified as successes. Unlike the binomial distribution where trials are independent and success probability remains constant, the hypergeometric model accounts for the depletion effect where each draw changes the composition of the remaining population. This fundamental difference makes the hypergeometric distribution essential for quality control, acceptance sampling, and any scenario where sample size represents a substantial fraction of the population.
Statistical Foundation and Combinatorial Structure
The hypergeometric probability mass function derives directly from counting principles. The numerator counts favorable outcomes: C(K, k) ways to select k successes from K available successes, multiplied by C(N − K, n − k) ways to select the remaining n − k items from the N − K failures. The denominator C(N, n) represents all possible ways to draw n items from N total items. This ratio provides the exact probability under the assumption of random sampling without replacement.
The finite population correction factor (N − n)/(N − 1) appears in the variance formula and quantifies how sampling without replacement reduces variability compared to sampling with replacement. As N approaches infinity while K/N remains constant, the hypergeometric distribution converges to the binomial distribution with parameters n and p = K/N. This convergence typically provides acceptable approximations when n/N is less than 0.05, though engineering applications often require exact calculations for regulatory compliance.
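The convergence to the binomial can be illustrated numerically. The sketch below holds K/N = 0.1 and n = 20 fixed (both arbitrary choices for illustration) and measures the largest pointwise gap between the two PMFs as N grows.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so infeasible k values contribute zero automatically.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_abs_gap(N, p=0.1, n=20):
    """Largest |hypergeometric - binomial| over k = 0..n, with K = round(p * N)."""
    K = round(p * N)
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, K / N))
               for k in range(n + 1))

gaps = {N: max_abs_gap(N) for N in (100, 1000, 10000)}
for N, gap in gaps.items():
    print(f"N={N:>6}  n/N={20 / N:.4f}  max gap={gap:.6f}")
```

The gap shrinks as n/N falls, which is the behavior the 0.05 rule of thumb relies on.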
Quality Control and Acceptance Sampling
Manufacturing quality assurance extensively employs hypergeometric calculations for lot acceptance sampling. Consider a shipment of 500 precision bearings where the purchaser's specification allows maximum 2% defects. The inspector randomly samples 50 bearings using destructive testing. If the lot actually contains 15 defective bearings (3%), the probability of accepting the lot (finding 0 or 1 defects in the sample) determines the operating characteristic curve of the sampling plan.
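The acceptance probability for the bearing example can be sketched as follows. The lot size, sample size, and acceptance number come from the paragraph above; the extra K values used to trace a few points on the operating characteristic curve are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_accept(N, K, n, c):
    """P(X <= c): chance the sample shows at most c defects."""
    return sum(hypergeom_pmf(k, N, K, n) for k in range(min(c, n, K) + 1))

# Lot of 500 bearings with 15 defective, sample of 50, accept on 0 or 1 defects.
p_accept_3pct = prob_accept(N=500, K=15, n=50, c=1)
print(f"P(accept | 15 defective) = {p_accept_3pct:.4f}")

# A few points on the operating characteristic curve (K values are illustrative).
oc_curve = {K: prob_accept(500, K, 50, 1) for K in (5, 10, 15, 25, 40)}
for K, p in oc_curve.items():
    print(f"K={K:>3} ({K / 500:.1%} defective): P(accept) = {p:.4f}")
```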
The hypergeometric model provides exact probabilities for these scenarios where lot size is finite and sampling represents a significant fraction. Military Standard 105E and ISO 2859 sampling tables historically used hypergeometric calculations for single sampling plans, though modern standards often employ the Poisson approximation for computational simplicity. However, for critical aerospace, medical device, or nuclear components, engineers must use exact hypergeometric probabilities to certify compliance with stringent quality requirements.
Reliability Engineering and Component Testing
Systems reliability analysis employs hypergeometric distributions when evaluating redundant component configurations with limited spares inventory. A telecommunications switching station maintains 12 backup power modules, of which 3 are unknowingly defective due to a manufacturing batch issue. When a power surge event requires activation of 5 backup modules simultaneously, the hypergeometric distribution calculates the probability that sufficient functional modules activate to maintain service.
This scenario differs fundamentally from Poisson or exponential reliability models because the population (12 modules) is finite and fixed. The probability that at least 4 of the 5 activated modules function properly determines system availability. Engineers designing redundancy must account for common-mode failures creating finite pools of potentially defective components, making hypergeometric analysis essential for accurate system reliability prediction.
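For a population this small the probability can be computed exactly. The sketch below treats a functional module as a "success" (9 of the 12 modules) and assumes the 5 activated modules are a random draw from the pool, which the scenario implies but does not state outright.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf_exact(k, N, K, n):
    """Exact rational P(X = k)."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 12, 9, 5  # 12 modules, 9 functional, 5 activated
p_at_least_4 = sum(hypergeom_pmf_exact(k, N, K, n) for k in (4, 5))
print(p_at_least_4, float(p_at_least_4))
```

Under these assumptions the result is 7/11, or about 63.6%, so roughly one surge event in three would leave fewer than 4 working modules.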
Ecological Sampling and Mark-Recapture Methods
Wildlife biologists use hypergeometric distributions in mark-recapture population estimation. Researchers capture and tag K animals from an unknown population of size N, then release them. In a second sampling event, n animals are captured, of which k are tagged. The hypergeometric probability P(X = k | N, K, n) forms the likelihood function for estimating N using maximum likelihood estimation. The Lincoln-Petersen index N̂ = (K × n) / k provides a point estimate, but the hypergeometric distribution supplies confidence intervals and hypothesis tests.
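The maximum likelihood step can be sketched as a brute-force scan over candidate population sizes. The survey numbers below (K = 100 tagged, n = 60 recaptured, k = 7 tagged among them) and the search limit `N_max` are hypothetical values chosen for illustration.

```python
from fractions import Fraction
from math import comb

def likelihood(N, K, n, k):
    """Exact hypergeometric likelihood of k tagged recaptures given population N."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def mle_population(K, n, k, N_max):
    """Scan N from the smallest feasible value up to N_max and keep the best."""
    N_min = K + (n - k)  # tagged animals plus untagged animals actually seen
    return max(range(N_min, N_max + 1), key=lambda N: likelihood(N, K, n, k))

K, n, k = 100, 60, 7          # hypothetical survey
N_hat = mle_population(K, n, k, N_max=3000)
print(N_hat, K * n / k)       # scan result next to the ratio K * n / k
```

The scan lands on the integer part of K × n / k. When K × n / k is itself an integer, the likelihood is tied between that value and the one below it, and the scan returns the smaller of the two.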
This application demonstrates a critical feature: the hypergeometric distribution provides exact probabilities even when sampling represents a large fraction of the population. Approximations that ignore the finite population correction become unreliable when n/N exceeds roughly 0.1, yet ecological studies frequently involve such intensive sampling. The exact hypergeometric calculation remains valid regardless of sampling fraction, though computational challenges arise for large populations where binomial coefficients overflow standard floating-point arithmetic.
Network Security and Intrusion Detection
Cybersecurity analysts apply hypergeometric statistics to anomaly detection in network traffic. A firewall monitors 10,000 connection attempts, of which an unknown number K are malicious. An intrusion detection system flags n = 150 connections as suspicious based on heuristic rules. If security analysts manually investigate a random sample of 20 flagged connections and find k = 8 are genuinely malicious, hypergeometric inference estimates the precision of the detection algorithm and the total number of undetected threats.
This inverse problem—estimating K given observations of n, k, and assuming N—requires Bayesian methods with hypergeometric likelihoods. The solution provides critical security metrics: false positive rate, detection efficiency, and expected number of undetected intrusions. Unlike continuous statistical distributions, the discrete nature of network events and finite observation windows make hypergeometric models more appropriate than Gaussian approximations.
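A minimal sketch of the Bayesian step, under two loudly stated assumptions: the population is taken to be the 150 flagged connections (so K is the number of truly malicious connections among them), and the prior on K is uniform over 0..150. Neither choice comes from the text above; a real analysis would pick a prior to match the deployment.

```python
from math import comb

def posterior_over_K(N, n, k):
    """Posterior P(K | k) for K = 0..N under a uniform prior (an assumption)."""
    # Hypergeometric likelihood for each candidate K; the constant C(N, n)
    # cancels in the normalization. math.comb returns 0 for infeasible K.
    weights = [comb(K, k) * comb(N - K, n - k) for K in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

N, n, k = 150, 20, 8  # flagged connections, investigated sample, confirmed malicious
posterior = posterior_over_K(N, n, k)
post_mode = max(range(N + 1), key=posterior.__getitem__)
post_mean = sum(K * p for K, p in enumerate(posterior))
print(post_mode, round(post_mean, 2), round(post_mode / N, 3))
```

The ratio of the posterior mode to N gives a rough estimate of the detector's precision on flagged traffic; credible intervals follow from cumulative sums of `posterior`.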
Worked Example: Pharmaceutical Quality Control
A pharmaceutical manufacturer produces a batch of N = 2,400 vaccine vials. Regulatory requirements mandate sampling inspection before release. Historical process capability suggests approximately 48 vials (2%) may have fill volumes below specification due to nozzle wear. The quality control plan calls for randomly sampling n = 89 vials and measuring fill volume with destructive testing.
Problem: Calculate (a) the probability of finding exactly 3 defective vials, (b) the probability of finding 3 or fewer defects, and (c) the expected number of defects and standard deviation.
Solution Part A - Exact Probability:
Given: N = 2,400, K = 48, n = 89, k = 3
First verify k is in the valid range:
min(k) = max(0, 89 − (2400 − 48)) = max(0, −2263) = 0
max(k) = min(89, 48) = 48
Since 0 ≤ 3 ≤ 48, the calculation is valid.
Calculate binomial coefficients:
C(48, 3) = 48! / (3! × 45!) = (48 × 47 × 46) / (3 × 2 × 1) = 17,296
C(2352, 86) = very large number requiring logarithmic computation
C(2400, 89) = very large number requiring logarithmic computation
For practical calculation with large numbers, use logarithms:
ln[P(X = 3)] = ln[C(48,3)] + ln[C(2352,86)] − ln[C(2400,89)]
Using computational tools or Stirling's approximation:
P(X = 3) ≈ 0.1623 or 16.23%
Solution Part B - Cumulative Probability:
P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
Computing each term:
P(X = 0) ≈ 0.1601
P(X = 1) ≈ 0.3020
P(X = 2) ≈ 0.2758
P(X = 3) ≈ 0.1623
P(X ≤ 3) ≈ 0.9002 or 90.02%
This indicates a 90.02% probability that the inspection will find 3 or fewer defective vials, likely resulting in lot acceptance if the acceptance number is 3.
Solution Part C - Expected Value and Variance:
Expected number of defects:
E(X) = n × (K / N) = 89 × (48 / 2400) = 89 × 0.02 = 1.78 vials
Variance calculation:
Var(X) = n × (K/N) × [(N−K)/N] × [(N−n)/(N−1)]
Var(X) = 89 × (48/2400) × (2352/2400) × (2311/2399)
Var(X) = 89 × 0.02 × 0.98 × 0.9633
Var(X) = 1.6804
Standard deviation:
σ = √1.6804 = 1.296 vials
The finite population correction factor (N−n)/(N−1) = 2311/2399 = 0.9633 reduces variance by about 3.67% compared to binomial sampling. This correction becomes negligible only when N is much larger than n, demonstrating why exact hypergeometric calculations matter for this batch size.
Interpretation: On average, inspectors expect to find 1.78 defective vials with a standard deviation of 1.30 vials. The probability mass concentrates between 0 and 4 defects, with finding exactly 1 or 2 defects most likely. If the acceptance criterion allows up to 3 defects, the lot has a 90.02% probability of passing inspection despite containing 2% defects, a key metric for designing sampling plans that balance consumer risk and producer risk.
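Parts A and B of the worked example can be reproduced in log space. The sketch below uses `math.lgamma` for the logarithms of the binomial coefficients, which is one common way to implement the logarithmic computation mentioned in Part A; it is an illustrative approach, not necessarily how the calculator on this page is implemented.

```python
from math import exp, lgamma

def log_comb(n, r):
    """ln C(n, r) via the log-gamma function, avoiding huge factorials."""
    return lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)

def hypergeom_pmf(k, N, K, n):
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)

N, K, n = 2400, 48, 89
pmf = [hypergeom_pmf(k, N, K, n) for k in range(4)]
p_exactly_3 = pmf[3]     # Part A
p_at_most_3 = sum(pmf)   # Part B
print([round(p, 4) for p in pmf], round(p_at_most_3, 4))
```

Working with sums of logarithms keeps every intermediate value within ordinary floating-point range, even though C(2400, 89) itself has well over a hundred digits.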
Practical Applications
Scenario: Electronics Manufacturing Quality Audit
Marcus, a quality engineer at a semiconductor facility, receives a shipment of 5,000 microcontrollers from a new supplier. The purchase agreement specifies maximum 0.5% defect rate, but Marcus needs to verify this without testing all units. He randomly selects 200 chips for electrical parameter testing. Using the hypergeometric calculator with N=5000, assumed K=25 (0.5%), and n=200, he determines that finding 3 or more defects has only about a 7.6% probability if the lot sits exactly at the specification limit. When testing reveals 3 defective units, the sample defect rate of 1.5% (3/200) points to roughly 75 defects in the lot, three times the contract specification. This statistical evidence supports rejecting the shipment and renegotiating with the supplier, potentially saving the company from field failures worth millions in warranty claims.
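The tail probabilities in this audit can be checked with a short script. This is a sketch using the scenario's N, K, and n; the helper name `upper_tail` is an assumption.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def upper_tail(k_min, N, K, n):
    """P(X >= k_min), computed as one minus the lower cumulative sum."""
    return 1.0 - sum(hypergeom_pmf(k, N, K, n) for k in range(k_min))

N, K, n = 5000, 25, 200  # lot size, defects at the 0.5% limit, sample size
p_ge_2 = upper_tail(2, N, K, n)
p_ge_3 = upper_tail(3, N, K, n)
print(f"P(X >= 2) = {p_ge_2:.4f}, P(X >= 3) = {p_ge_3:.4f}")
```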
Scenario: Clinical Trial Patient Selection
Dr. Sarah Chen designs a phase III clinical trial for a diabetes medication requiring 120 participants from a registry of 800 eligible patients. The registry includes 240 patients with advanced kidney disease who require different dosing protocols. To ensure the trial sample represents the registry population, Dr. Chen uses hypergeometric probability calculations to determine expected distributions. With N=800, K=240, n=120, she calculates the probability of randomly selecting various numbers of kidney disease patients. The expected value of 36 patients (30%) with kidney complications matches the registry proportion, but she calculates there's roughly a 12% chance of getting 42 or more such patients, which could skew trial results. This statistical analysis prompts Dr. Chen to implement stratified random sampling instead of pure random selection, ensuring the trial accurately represents the target patient population and regulatory agencies will accept the results.
Scenario: Fisheries Resource Management
James, a marine biologist, needs to estimate the population of endangered steelhead trout in a remote river system for conservation planning. His team captures, tags, and releases K=156 fish during a spring survey. Three weeks later, they capture n=89 fish and find k=12 are tagged. Using the hypergeometric distribution with these values, James applies maximum likelihood estimation to calculate the most probable total population. The calculator helps him determine that N≈1,157 fish provides the highest probability for observing exactly 12 recaptures. More importantly, by computing probabilities for different N values, he constructs a 95% confidence interval of 890-1,520 fish. This population estimate reveals the river system supports a viable breeding population above the 750-fish conservation threshold, allowing the watershed council to proceed with habitat restoration rather than expensive captive breeding programs.
Frequently Asked Questions
- When should I use the hypergeometric distribution instead of the binomial distribution?
- How do I handle very large population sizes where binomial coefficients overflow?
- What is the finite population correction factor and why does it matter?
- Can the hypergeometric distribution be used for multi-category populations?
- How accurate are normal approximations to the hypergeometric distribution?
- What is the relationship between hypergeometric distribution and Fisher's exact test?
About the Author
Robbie Dickson — Chief Engineer & Founder, FIRGELLI Automations
Robbie Dickson brings over two decades of engineering expertise to FIRGELLI Automations. With a distinguished career at Rolls-Royce, BMW, and Ford, he has deep expertise in mechanical systems, actuator technology, and precision engineering.